Small (random) thoughts on Big Data

By Shriram Venkatraman, on 12 December 2012

Photo: hisperati (Creative Commons)

A casual search for a definition of Big Data throws up results that describe the phenomenon in various ways. Though most agree on size (as the term itself implies), other dimensions are also applied to the term, and their number seems to grow depending on the industry doing the defining. Definitions range from 3V models to 4V models; from a single dataset to multiple datasets; from a single database with multiple datasets to multiple databases with multiple datasets. They vary in the size of each dataset, from gigabytes to exabytes (very relative); in the nature of each dataset; in complexity, not only in terms of the types of data sources but also with respect to the relationships that the data points share; in the speed (or velocity) at which the data is produced; and so on. Apart from the dimensions of size and complexity, it looks like the definition of Big Data is as big as the data itself.

From a universal perspective, most of the definitions that speak about the size of the dataset proclaim that humanity creates 2.5 exabytes of data every day. However, one has to remember that this is only tracked data, defined by technological storage capacity. So what happens to the untracked data? Are these exabytes the data we generate, or only the production that technology can track? Though this figure will definitely grow as data storage technology advances, can technology reach every nook and corner of this world? It seems that a major portion of the description of Big Data is limited to the digital space alone. Though the definition of Big Data seems to grow in a non-linear fashion, the growth of Big Data itself seems to be linear, given its dependency on digital and/or technological growth.

Data can be processed and has the potential to turn into information, and information can be broken down into data. So processing this information is, in a way, producing more data, which is again processed to produce more information, which is data again. In a way, this becomes a vicious cycle of production, storage and processing.

It will definitely be interesting to see what comes out of Big Data research; it might produce big definitions, bigger philosophies and the biggest profits too.

‘Big data’ or ‘Data with a soul’?

By Xin Yuan Wang, on 8 November 2012

Image: Thegreenfly (Creative Commons)

What is big data? In the digital era, the amount of data produced by people on an everyday basis is vast. There is always more data coming into being, and it is growing at an unimaginable rate. People believe that big data will lead to big impact, claiming that big data opens the door to a new approach to understanding people and helps in making decisions. At the 2012 World Economic Forum in Davos, Switzerland, big data was a featured topic, and the forum's report Big Data, Big Impact claimed that big data should be considered a new class of economic asset, like currency or gold. People who are masters at harnessing the big data of the Web (online searches, posts and messages) with Internet advertising stand to make a big fortune.

I love data, so big data sounds brilliant! However, I am not a ‘big fan’ of big data. This is partly because, for me, big data sounds more like a marketing term than an analytical tool, and partly because, being trained as an anthropologist, I am very cautious about going too far out on a limb to make such assumptions. For me, it would be a great pity to see people who fancy formulating big data with brilliant statistics while ignoring the little stories that happen in daily life and have been taken for granted.

For anthropology, to some extent a story is data with a soul, or contextualized data to be exact. There is always a danger that data without context will be confusing and very misleading. For example, in my previous study on the appropriation of Facebook among Taiwanese students in the UK, one thing I discovered is that the Taiwanese use the ‘like’ function on Facebook much more frequently than UK Facebook users do. For a Taiwanese user who has 150-200 friends on Facebook, 20-50 ‘likes’ for each status or posting is very commonplace, and the average number of ‘likes’ that people give to others is 15-35 daily. Such a considerable number of ‘likes’, per se, could possibly lead me to some superficial conclusions, for example, that Taiwanese are more predisposed to admire others online, and so on. However, it was only after long-term participant-observation and several in-depth discussions with each of my informants that I started to realize that both the Chinese normativity of proper social reaction (saving face, reciprocity, renqing) and the moral responsibility taken by individuals in the negotiation of real-life communication practices shape the pattern of Taiwanese online performance.

“For most of the time I ‘like’ people because I have nothing to say about their updates, but I want them to know that I care about them, I follow their lives.”

“Liking is polite, just like saying hello when you meet your friends. Nothing to do with the content which you like.”

“…I kind of think that, the more I like a certain person, the less I want to be really involved into his/her real life. ‘Like’ is easy and safe. You know you still need to give a face to people.”

Also, according to the Chinese principle of “Bao” (reciprocity), people who have been ‘liked’ will try by all means to repay the debt of “Renqing” (favor) to others.

“I would expect ‘likes’ from others on Facebook, you know, which makes me more engaged with them and I will like their posts as often as I can. For those who like or leave comments on my profile, I will reply to them with careful preparation to show my sincerity,” as another key informant said.

It’s so interesting to explore the ways in which “Being Chinese” and Facebook appropriation have been mutually constituted. Facebook is to some extent re-invented by the Taiwanese. If I just count the ‘likes’ and analyze the numbers without looking into the online content and offline context, I will miss the point no matter how big and sophisticated the data is.

So, the question is whether we are looking at ‘big data’ or ‘data with a soul’. Of course, the two are not necessarily mutually exclusive, even though there are some things you can only do with Big Data and others you can only do with ethnographic data. The point is how we can take advantage of the best parts of both and contribute to the understanding of our human society as a whole, which is also a big question mark for all researchers in the digital age.