Small (random) thoughts on Big Data

By Shriram Venkatraman, on 12 December 2012

Photo: hisperati (Creative Commons)

A casual search for a definition of Big Data throws up results that describe the phenomenon in many different ways. Though most agree on size (as the term itself implies), other dimensions are applied to the term, and their number seems to grow depending on the industry doing the defining. Definitions range from 3V models to 4V models; from a single dataset to multiple datasets; from a single database with multiple datasets to multiple databases with multiple datasets; from datasets of gigabytes to datasets of exabytes (all very relative). They also vary on the nature of each dataset; on complexity, not only in the types of data sources but also in the relationships between data points; on the speed (or velocity) at which the data is produced; and so on. Beyond the dimensions of size and complexity, it looks like the definition of Big Data is as big as the data itself.

From a universal perspective, most of the definitions that speak about the size of the dataset proclaim that humanity creates 2.5 exabytes of data every day. However, one has to remember that this counts only tracked data, defined by what technology is able to store. So what happens to the untracked data? Are these exabytes the data we generate, or only the portion of it that technology can track? This figure will definitely grow as storage capacity advances, but can technology reach every nook and corner of this world? It seems that a major portion of the description of Big Data is limited to the digital space alone. Though the definition of Big Data seems to grow in a non-linear fashion, the growth of Big Data itself seems to be linear, tied as it is to digital and technological growth.

Data can be processed and has the potential to turn into information, and information can be broken down into data. So processing this information is, in a way, producing more data, which is again processed to produce more information, which is data again. It becomes, in a way, a vicious cycle of production, storage and processing.

It will definitely be interesting to see what comes out of Big Data research; it might produce big definitions, bigger philosophies and the biggest profits too.