Tag Archives: Big Data

Social Analysis with DataSift, Google Enterprise and Tableau at #sdwk13 London

A compelling presentation showing how easy it is to use off-the-shelf software (COTS in the old jargon) to go straight from extracting social data to sorting, querying and presenting it.

For those of us from an IT Architecture background, I’ve illustrated the ETL/Data Warehouse type steps that these three offerings bring together (they have already built the integrations). It really did look very straightforward.

social data week 1

What was missing, for me, was the ability to explore or analyse the social network. I spoke to DataSift a few weeks ago about Twitter data and they explained that they did not provide follower data. So, at this stage, those of us wanting to look into the networks are still going to have to write a bit of code.

Time Well Spent at Cloud World Forum and Big Data World Congress

An interesting couple of days.

Google are targeting enterprises as a PaaS provider. Their view is that Digital Natives will bring consumer technologies into the workplace and concentrate on systems of collaboration rather than systems of record. They put a lot of value in the Gartner Nexus of Forces.

Performance Management of Cloud Platforms (PaaS): with elastic computing, sloppy development of inefficient code can be masked by infrastructure, but at what cost? Application Performance Monitoring may be more important in the cloud than on-prem, to help stop money leaks.

The panel discussion of Big Data Skills started with a great quote: “do you have a [Big Data] Problem or is it a Big [Data Problem]?” Well, I found it amusing. The discussion concluded that:

  • Many organisations are struggling to get Big Data out of R&D
  • We need to be careful that it does not become over-intrusive in people’s lives

As we know, there is rarely anything genuinely new: I met the guys from Elasticsearch, who explained that it uses Lucene (as does Neo4j) to index text, and that this relies on something called an inverted index. Well, that brought back a few memories from the early 90s, when I worked for Dataware Technologies integrating BRS into customer solutions.
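For anyone who has not met one, an inverted index simply maps each term to the documents that contain it, so a search becomes a lookup rather than a scan. The sketch below is a toy illustration in plain Python, not how Lucene is implemented; the function names and the sample documents are made up for the example.

```python
from collections import defaultdict
import re


def build_inverted_index(documents):
    """Map each lowercased term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every term in the query (AND search)."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Made-up sample documents, keyed by document id.
docs = {
    1: "Big Data needs big indexes",
    2: "Lucene builds an inverted index",
    3: "An index maps terms to documents",
}
index = build_inverted_index(docs)
print(sorted(search(index, "index")))     # [2, 3]
print(sorted(search(index, "big data")))  # [1]
```

Note that "indexes" in document 1 does not match the query "index": real engines add stemming, ranking and much more on top of this basic structure.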

The Open Data Institute promote the use of government data and offer help (not financial) to start-ups who want to take that data and add value to it.

Talend echoed sentiment from last week’s BDA conference: why Extract – Transform – Load when you can Extract – Load – Transform using Hadoop.
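The difference is only the order of the steps, but it changes what ends up in the store. A minimal sketch in plain Python (in-memory lists standing in for the warehouse or the Hadoop cluster; the events and the `transform` function are invented for illustration):

```python
# Made-up raw events standing in for extracted source data.
raw_events = [
    {"user": "alice", "text": "Loving the new phone!"},
    {"user": "bob", "text": "battery life is poor"},
]


def transform(event):
    """Illustrative transform: keep the user and a word count only."""
    return {"user": event["user"], "words": len(event["text"].split())}


def etl(events):
    """Extract -> Transform -> Load: only the transformed rows are stored."""
    return [transform(e) for e in events]


def elt(events):
    """Extract -> Load -> Transform: raw rows are stored, transforms run later."""
    store = list(events)                  # load the raw data untouched
    view = [transform(e) for e in store]  # transform inside the store, on demand
    return store, view


etl_store = etl(raw_events)
elt_store, elt_view = elt(raw_events)
print(etl_store)
print(elt_store == raw_events, elt_view == etl_store)
```

With ELT the raw text is still there if you later decide you need a different transform; with ETL you would have to go back to the source.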

The Cloud Security Alliance called for more transparency and honesty which should apply to corporations as well as governments. It’s something Enterprise Architects need to consider when they examine proposals that will impact individuals’ anonymity.

The question of whether recent revelations about PRISM will see people move away from Facebook was raised during a panel discussing protection of sensitive data in the cloud. I suspect not, but time will tell.

Digital Innovation Group reinforced the message about the importance of context and of trying to understand the language of people’s social media output. You need to be able to deal with slang, and to tell someone disagreeing with an opinion apart from someone holding it.

Whitehall Media Big Data Analytics, June 2013

A very interesting day. In the introduction the chairman asked for a show of hands as to who worked in IT and who worked in other parts of their organisations: the split was about 50/50.

My main observations are:

  • Hadoop is the standard; think of it as ETL on steroids: you will probably still want to feed the results into traditional databases and analytical tools. Hive provides a SQL-like language over the top. You can use Hadoop to make your archives ‘active’
  • Organisations need to know what is being said about them; too often people find out what is happening in their own organisation on social media first.
  • Think of the value in the data. For example car manufacturers are increasing the number of sensors in cars and collecting the data: they understand how you drive, maybe they could offer you insurance?
  • Context is very important when looking at a piece of unstructured data.
  • Decision makers need to be given a relevant subset of data.
  • Organisations need to monitor global mega-trends. Take a look at http://www.news-spectrum.com/
  • If you are analysing email content, the disclaimers often placed at the end of messages can lead to misleading conclusions
  • “See Lots – Know Little – Do Less” (David Ackroyd, Telefonica); in other words too much information is not useful
  • When you have a lot of data you can start looking for hidden patterns
  • Prediction: can you spot customers who are about to depart?
  • A Big Data initiative needs to offer value. Look for the sweet spot: a conjunction of revenue, cost and risk.
  • Make sure Big Data thinking includes an outside-in perspective
  • Data Art is the next big paradigm?
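On the email disclaimer point above, one simple precaution is to cut the boilerplate footer off before any text analysis. The sketch below is one possible approach, not something presented at the conference; the marker phrases are hypothetical and would need tuning to the disclaimers in your own mail.

```python
import re

# Hypothetical phrases that often open a legal footer; tune for your own mail.
DISCLAIMER_MARKERS = [
    r"this e-?mail (and any attachments )?(is|are) confidential",
    r"if you are not the intended recipient",
    r"please consider the environment before printing",
]
_PATTERN = re.compile("|".join(DISCLAIMER_MARKERS), re.IGNORECASE)


def strip_disclaimer(body):
    """Cut the message at the first line that matches a disclaimer marker."""
    kept = []
    for line in body.splitlines():
        if _PATTERN.search(line):
            break
        kept.append(line)
    return "\n".join(kept).rstrip()


message = (
    "Hi team,\n"
    "The launch went well.\n"
    "\n"
    "This email is confidential and intended solely for the addressee.\n"
    "Any loss arising from its use is at your own risk."
)
print(strip_disclaimer(message))
```

Without this step, words such as "confidential", "loss" and "risk" in the footer would be counted against every message, whatever the sender actually wrote.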