A very interesting day. In the introduction the chairman asked for a show of hands to see who worked in IT and who worked in other parts of their organisations: the split was about 50/50.
My main observations are:
- Hadoop is the standard; think of it as ETL on steroids. You will probably still want to feed the results into traditional databases and analytical tools. Hive provides a SQL-like language over the top. You can use Hadoop to make your archives ‘active’.
- Organisations need to know what is being said about them; too often people find out what is happening in their own organisation on social media first.
- Think of the value in the data. For example, car manufacturers are increasing the number of sensors in cars and collecting the data. They understand how you drive, so maybe they could offer you insurance?
- Context is very important when looking at a piece of unstructured data.
- Decision makers need to be given a relevant subset of data.
- Organisations need to monitor global mega-trends. Take a look at http://www.news-spectrum.com/
- If you are analysing email content, the disclaimers often placed at the end of messages can lead to a lot of misleading conclusions.
- “See Lots – Know Little – Do Less” (David Ackroyd, Telefonica); in other words too much information is not useful
- When you have a lot of data you can start looking for hidden patterns
- Prediction: can you spot customers who are about to depart?
- A Big Data initiative needs to offer value. Look for the sweet spot: a conjunction of revenue, cost and risk.
- Make sure Big Data thinking includes an outside-in perspective
- Is Data Art the next big paradigm?
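
As a footnote to the point about email disclaimers, here is a minimal sketch of stripping them before any analysis. It was not shown at the event; the marker phrases, the function names and the sample message are all my own assumptions, and real disclaimers vary enough that you would need to tune the list against your own mail.

```python
import re
from collections import Counter

# Hypothetical phrases that often open a trailing disclaimer (assumed, not exhaustive).
DISCLAIMER_MARKERS = (
    "this email and any attachments",
    "this e-mail and any attachments",
    "this message is confidential",
    "confidentiality notice",
)

# One case-insensitive pattern that matches any of the marker phrases.
DISCLAIMER_RE = re.compile(
    "|".join(re.escape(marker) for marker in DISCLAIMER_MARKERS),
    re.IGNORECASE,
)


def strip_disclaimer(body: str) -> str:
    """Drop everything from the first disclaimer marker to the end of the message."""
    match = DISCLAIMER_RE.search(body)
    if match is None:
        return body.rstrip()
    return body[: match.start()].rstrip()


def word_counts(body: str) -> Counter:
    """Count lower-cased words in the message after the disclaimer is removed."""
    return Counter(re.findall(r"[a-z']+", strip_disclaimer(body).lower()))


# Made-up sample message for illustration only.
message = (
    "Hi team, the launch went well and customers are happy.\n\n"
    "This email and any attachments are confidential and may be legally privileged."
)

print(strip_disclaimer(message))
print(word_counts(message).most_common(3))
```

Without the stripping step, words such as "confidential" and "privileged" would show up in every message and skew any term counts or sentiment scores built on top.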