Data – How do we talk about it?
Media and their technologies are the carriers of public conversation and, increasingly these days, also its subject. For example, the content of our talk is transmitted digitally as data, but data itself is also particularly ‘big’ in the public conversation at the moment. Who is talking about ‘big data’, why, and what does the public discourse reveal about how the concept is understood? Analysing public discourse is a way of reading between the lines of big-topic conversations like this. It goes beyond conscious statements and explores bigger themes and resonances that can tell us something about the role big data plays.
Inevitably of course, we need both quantitative and qualitative answers to understand ‘big data’ discourse. First, and only briefly, to the sums: a quick and rough database search for news articles (in Ireland) mentioning “big data” shows that since 2000 it has appeared with increasing frequency. It is not a new concept, but we are talking about it much more now than ever. For the first eleven years of this century, a total of eight articles made reference to ‘big data’. It made headlines but could not be described as ‘big’ in the public consciousness. Then in 2012, ‘big data’ became the ‘next big thing in technology’ and appeared in nine articles in one year. In 2013, this grew to 42, and in the first five months of 2014 over 50 articles focused on ‘big data’, suggesting that the conversation will keep getting bigger, until at some point…well, something will happen. We would need to analyse the data further to predict what direction the discourse might follow and whether the pattern relates to a certain ‘hype cycle’ – a question for another time. For now, I’m more interested in the qualitative aspects: how are we talking about big data?
Extra! Extra! Big Data in Fear v. Promise shock!
Since 2012, news reports on big data have followed a pattern familiar to that of reporting on many other ‘new’ technological phenomena – the promise of what it can deliver and how it might change the world is set against the fear that it is destabilising structures and cultures. Simply replace the words ‘big data’ in some of these headlines with the words ‘social media’, ‘browsers’, ‘internet’ – even ‘electricity’ – to understand the trend.
The overriding message is that “big data” is going to change everything – but no one is quite sure yet how this will happen. There remains some suspicion around “big data”, reflected in the occasional use of “scare quotes”. However, the ramp-up in the number of articles, and their focus of interest, reflects something like a ‘diffusion of innovations’ pattern (after Everett Rogers, 1962). Perhaps this is more accurately described as the loose-predictions-of-how-an-innovation-might-diffuse, prevalent in technology journalism and sometimes part of what Winston (1995) calls the ‘technicist’ commentary. The discourse has moved on from the general ‘what is it?’ discussion of the early century to an examination of how big data might impact on diverse fields, from sports (strategic team/event training and planning), entertainment (Netflix algorithms, predicting a box office smash), science (major advances in the ‘omics’), farming (cheese status updates) and transport (real-time traffic analysis) to politics (little brother is watching) and of course business.
Big data is big business for the new enterprises whose ephemeral data-centred processes and services provide many fast-growth success stories to fill the newshole. And ‘threat’ (opportunity’s long-term partner) provides many more stories on how long-established technology companies are grappling with the change while their capital is tied up in physical hardware and devices. Both the opportunity and the threat coalesce in the idea that big data/brother embodies all our concerns over personal data privacy, authoritarian state surveillance and Snowdenesque revelations about potential uses of social media data banks. Ultimately these discourses associate big data with the mediatisation, and thus the monetisation, of everything we know – data is distilled for its information, which is converted to revenue, for better or worse. Meanwhile the media, which runs on digital data, is part of the diffusion of innovations process, providing communication channels through which and by which the innovation is transmitted over time across a social system. The discourse reports on, and makes early adopters and laggards of, us all.
What is interesting about the coverage is how comprehensive it is across the spectrum from promise to fear and across so many fields of business, culture and farming. But then Ireland is an interesting place to explore public discourse on big data, indeed on anything digital, positioned as we are in a semi-peripheral state of connection to/dependence on so many relevant discourse communities. Ireland hosts world and regional headquarters of some of the largest data-monetising companies globally (Google, Facebook, Microsoft), indeed is friends with them, and yet is also legally responsible for implementing EU rules on data protection over their dark materials. Meanwhile our cool damp weather, while less than useful for most things, is deemed ideal for running lower-cost data centres, generating income and employment at a time when both are sorely needed. Government industrial policy positions Ireland as a digital ‘hub’, with big data needs, reflecting an EU ‘digital agenda’ across all policy platforms. Whether the media is reflecting the results of this policy or playing into its dissemination or implementation is, again, a question for another time. The recent coverage of big data merely follows typical patterns and explores the gamut of reaction to what can be done with big data and what it might do to us. However, it is the earlier discourses from the first ten years of the century that, although thinner on the ground, give deeper insights into what this ‘next big thing’ might actually be.
Mining for metaphors on “big data”
Digital data has been growing in Saganian multiples for over two decades. From 2000 to 2010, much of the public discourse on ‘big data’ was about ‘big databases’ and the emerging area of ‘data mining’. Until recently it was considered just a resource that could be mined but whose potential for extraction, dissemination and use was – like Ireland’s offshore gas fields – just a pipe dream. Any potential beyond this was unknown, as the processing power required for actually doing something with big data was, by today’s standards, very slow, very laborious (for both computers and humans) and very, very expensive.
The persistent use of the term ‘mining’ among contributors to the discourse on big data in its early years is not just a metaphor for the labour-intensive digging required to extract value. It creates obvious but striking analogies with the exploitation of coal mines, especially in the 19th century, and their subsequent impact on industry and society. Coal has always existed as a resource but until the 1800s was extracted and used only in very small, local ways. It became the ‘next big thing’ due in part to a rapid increase in the speed, ease and cost-effectiveness of extraction and distribution. Communication channels again play a part in its diffusion, both physically and metaphorically, as the development of railways – the new media and communications of the time – was intricately connected with the growth of a large-scale coal mining industry. Rail helped to extract and distribute coal, which paid for the railways, which used coal in their engines to travel further to distribute coal further, developing new coal-based industries and coal-reliant markets requiring more coal and more railways…
Here’s another sum: Resources + Communications = Value
Today’s mines are clean, ethereal and float in fluffy white clouds where little blue birds flutter in and out and baby elephants chuckle underneath. Their data resource is invisible and ubiquitous and is easy, quick and cheap to extract, distribute and use. Big data mines are made accessible by the railroads of cable, satellite and invisible waves passing the resource around for all who need it. At least that’s how we talk about it. We are light years beyond dirty resources like coal and oil whose discourse after a century of diffusion is now overwhelmingly about the fear and seldom about the opportunity. These are crude resources compared to data which is, to paraphrase an old master, an elegant resource for a more civilised age. A whole new industry, an economic paradigm, has arisen in the distribution and use of data creating further new innovations, enterprises and forms of employment.
Data analytics – the new mining?
Predictive modelling, sentiment analysis, insights… Public discourse now focuses on how big data can tell us (or mostly others) what we are tweeting, sharing, buying, reading, thinking, feeling and, most importantly, what we might do next. After mining, data analytics forms the next part of a line of distinct influence, where business discourses on technologies bleed into the public discourse. Again, similar to historic discourses on other technological concepts, the conversation on big data is initially broadly inclusive and reflective of interests across all potential fields of diffusion. Slowly, however, the language starts to reflect the field with perhaps the greatest investment in its potential. Discourse communities adopt language, terminology and features from other orders of discourse for a variety of reasons, often to achieve strategic aims in communication such as promotion or association. This is what Fairclough (2009) describes as the transfer of discourses ‘interdiscursively’, which is what happens, for example, when university presidents appear to speak in marketing terms, or politicians sound like PR executives. Phrases such as ‘democratising data’, creating a ‘data culture’ and ‘gaining insights’ started to appear in public discourse on big data in 2012 and can be traced to their origins in the discourses of influential business, recruitment and consulting firms.
“Fostering a data driven culture” Economist Intelligence Unit, February 19, 2013
The Economist reported in 2013 on the importance of acknowledging ‘data culture’ in traditionally non-IT businesses, stating that, for example, until now, “the marketing domain has been dominated by creative types. Now it is as much a quantitative science as it is an exercise in art and design”. This suggests that adopting a ‘data culture’ doesn’t just change business practices but actively privileges quantitative ‘data’ over qualitative information. In doing so it implies that mere qualitative analysis and information cannot create ‘insights’ of the same value as data can.
“Do you like creating insights from data?” asks a well-known professional networking website, which uses data analytics to target potential recruits with jobs they might be interested in, such as data analytics. Frankly, it would be hard for any sentient being to live a functional life without creating insights from data all the time. However, those who answer ‘yes’ and get the job will most likely end up spending time creating insights with the help of a small yellow elephant called Hadoop. This open source data analytics software has been central to the big data revolution, significantly improving the ease and speed of data analysis mainly through a process-concept called ‘map-reduce’. This describes a set of algorithms that systematically map the data into ‘tuples’ (key/value pairs), which are then grouped by key and reduced, mapped again, grouped, reduced…and so on. The processing rapidly takes care of what were once impossibly large computational tasks and there is no questioning its functional value. However, software labelling has a life beyond the process it describes. It influences how we think and talk about all digital data and our relationships with it, to the extent where the narrative of the process – copy/paste – becomes part of the narrative of our lives (see Manovich, 2002).
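For readers curious about the mechanics, the tuple logic of map-reduce can be sketched in a few lines of Python. This is a toy illustration of the concept, not Hadoop itself, and the function names are my own: a classic word count, the ‘hello world’ of the technique.

```python
def map_phase(documents):
    """Map: emit a (key, value) tuple for every word in every document."""
    for doc in documents:
        for word in doc.lower().split():
            yield (word, 1)

def reduce_phase(tuples):
    """Group the tuples by key, then reduce each group by summing its values."""
    counts = {}
    for key, value in tuples:
        counts[key] = counts.get(key, 0) + value
    return counts

docs = ["big data is big", "data about data"]
print(reduce_phase(map_phase(docs)))
# → {'big': 2, 'data': 3, 'is': 1, 'about': 1}
```

At Hadoop’s scale the map and reduce phases are distributed across thousands of machines running in parallel, which is what turns once-impossible computations into routine ones; the underlying logic, however, is no more than this.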
Another sum: Ctrl C +Ctrl V = content is free
The impact of map/reduce on our concept of big data – supposed to be too big for the human brain to conceive let alone process – makes it more manageable in discourse terms. Adopting a map/reduce narrative into our general worldview on the other hand, reduces the content of social and human life to ‘tuples’ and ‘insights’, a binary logic that carries an anthropic principle and takes all the fun out of our multiverse existence. After all, an ‘insight’ is defined (OED) as the ‘capacity to attain deep understanding’, ‘apprehending the true nature’ or a ‘penetrating mental vision’ of a thing. The ‘insights’ arrived at in analyses of big data are generally more directed and intended to predict, plan or provoke action. Thus, they can only produce partial understanding of the concept of big data itself while simultaneously producing more data to add to it. The ‘next big thing’ does not stand still. Map/reduce that.
“Analytics start-ups drink deep from Twitter ‘fire hose’ of data” –
Financial Times, December 4, 2013
Analysts talk of big data like water, another global resource produced freely and naturally (though increasingly received as neither). Data analysts want data on tap, but not on any old domestic tap. They want the ‘fire hose’ torrent of high value public data gathered by Twitter and similar companies, but they complain of limited access to merely a ‘garden hose’ flow. These metaphors effectively convey some key issues in data distribution, control and delivery. But the vessels described are blunt instruments for fire fighting, crowd control, soakage – uses related more to the force of delivery than the specific nature of the resource itself (a powerful jet of frogs can effectively put out a fire).
What these metaphors reveal is an unswerving emphasis on quantity over the qualitative nature of data. The volume and speed of data production and distribution attract uses that reflect its velocity, such as instantaneous targeted advertising, real-time traffic information/transport adjustment and so on. These are fast, voluminous calculations but they are not necessarily based on insights, apart from those that arrive at the business end. When a resource becomes affordably accessible, someone develops a deep understanding of how to monetise it, apprehends its true value to someone and experiences a penetrating mental vision of business innovations in its delivery and reception. These insights come from a qualitative analysis of small trickles of data drunk deep. If water delivery mechanisms remain the metaphors of choice, perhaps the ‘watering can’ is a more appropriate way to create ‘insights’ of deeper value as to what to do with data – seedlings don’t thrive under a fire hose.
Public discourse is dominated by talk about the value of big data, in trickles and as a whole – calculated in one article as €5.1bn today [March 2014] and rising to €53.4bn in 2017 – impossibly large numbers about impossibly large numbers. Who benefits from big data value is only beginning to be discussed, such as in an article published to coincide with the Web Summit, the largest technology conference in Europe, taking place in Dublin, where big data in all its forms was on the menu:
“It used to be that the richest people owned oil fields or transportation systems. Now, the top new wealth is accruing to those who own information hubs. The richest people own mobile phone networks, social networks, or some other kind of platform from which to gather everyone else’s data.”
[Jaron Lanier, Irish Times 31/10/13]
Where does big data live?
No fluffy white clouds here, only cool mega warehouses, the new mines sprouting up all over the world…
Where does data protection live?
What’s missing from the discourse?
So far, the talk about ‘big data’ focuses on quantity: the amount of data being produced, the speed and volume of analysis and the size of the potential business returns. But while many valuable insights have been extracted from big data and its potential still holds much promise, there is always the risk of being subject to what Charles Wheelan (2013) describes as the most common flaw in quantitative analysis: “garbage in – garbage out”. Data analysts are also at risk of being diagnosed with symptoms of apophenia (boyd & Crawford, 2011) – seeing patterns where none may actually exist.
What is missing from the public discourse to date, and may help avoid such risks, is maintaining a link between content, data and information, the latter being where real insights may lie. I believe that the key to transforming content into data and thence into valuable information is looking for stories in the data and developing a better understanding of interactive narratives. But rather than taking data and looking randomly for the narratives it may contain, as analysts are often required to do, we need to start (as academics are usually required to do) with a question – what, where, who, when, why – and answer it (as journalists are usually required to do) methodically, in acknowledgment of the fluid nature of data. Interpreting data in this way makes it available to people so that they can both tell and hear the stories that give us the information we really need.
Bloomberg Billionaires is a web application which uses big data to tell user-selected stories about the world’s richest people, including the ability to select live data on those who increasingly own and benefit most from big data. It’s good to know that at least big data is watching everyone.