Knowingly or unknowingly, every one of us relies on data to make a myriad of daily decisions. We compare prices of consumables in local grocery stores to maximise our savings. We check the train timetable to work out when we should leave home to catch the next train. We listen to weather reports in the morning to decide if an umbrella would come in handy during the day. While such humble uses of data will continue to have their place, a revolution has already begun in how data is generated and used in modern societies.

Advances in technology and its rapid spread allow data to be generated at an ever-increasing rate. Global data volume reached 2.8 trillion gigabytes in 2012, with 90 percent of this data having been created in the previous two years alone. Global data volume is predicted to grow 50-fold over the decade to 2020. To put this into perspective, global per capita data volume in 2020 will be roughly 5,200 gigabytes. Unfortunately, data is like crude oil: largely unusable in its original form. Valuable insights can only be gained by cleaning, organising and analysing it. However, less than 1 percent of the world’s data has been analysed so far. This ‘big data’ is capable of providing an estimated economic value of US$3 trillion per year in just seven industries, including education, transportation, consumer products, electricity and healthcare.

Big data is a popular term coined to describe large volumes of data of high variety generated at a rapid pace (velocity). Perhaps the most challenging element of big data is its variety. Data comes in a range of types and formats, and may be structured, semi-structured or unstructured in nature. Structured data refers to data that is neatly organised for easy storage, querying and analysis. Some examples of structured data are name, date, address and various identification numbers like a passport number. Unstructured data, on the other hand, does not follow a pre-defined model and is unorganised. In reality, however, most of today’s data falls somewhere between these two ends and is hence called semi-structured data. Good examples are text, images, audio, video and sensor logs. A text message posted on Twitter or Facebook is semi-structured because it is accompanied by organised metadata such as the date and time, username and location.
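To make the distinction concrete, here is a minimal sketch in Python. The Passenger record, the sample post and all of its field names and values are invented for illustration and do not come from any real platform. It shows a structured record with fixed fields next to a semi-structured post, where organised metadata surrounds a free-form message.

```python
import json
from dataclasses import dataclass


# Structured: every record has the same fixed, typed fields.
@dataclass
class Passenger:
    name: str
    date_of_birth: str      # ISO date, e.g. "1980-05-17"
    passport_number: str


# Semi-structured: organised metadata wrapped around free-form text.
# All field names and values here are hypothetical.
post_json = """
{
  "username": "commuter42",
  "posted_at": "2014-03-02T07:45:00",
  "location": "Wollongong",
  "text": "Train late again, should have brought an umbrella"
}
"""

post = json.loads(post_json)

# The metadata can be queried directly, much like structured data...
metadata = {key: value for key, value in post.items() if key != "text"}

# ...but the message body is just a string. Extracting meaning from it
# needs further processing (here, a deliberately naive keyword check).
mentions_train = "train" in post["text"].lower()
```

The metadata fields can be filtered and counted straight away, while the message itself has to be interpreted before it yields anything useful. That extra step is what makes variety so demanding.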

Now, who generates such massive volumes of data? The answer is people and machines. Every time we click on a webpage, search for something on Google, make a call on our mobile phone, post something on Facebook, Twitter or Instagram, swipe our credit card or use a navigation app on our smartphone, we generate streams of data that get stored somewhere. The latest smart devices come packed with sensors that their operating systems and applications rely on to perform key actions. These include not only well-known sensors like the microphone, camera and GPS receiver, but also lesser-known ones such as the accelerometer, gyroscope and environmental sensors. Some consider people armed with smart devices to be the best sensors, hence the emergence of a new sensing paradigm called ‘people as sensors’. However, the majority of today’s data is generated by machines as they communicate with other machines over data networks.

Recent technological advancements have made it possible to sift through mountains of data and uncover hidden intelligence. As noted above, big data has applications in almost every imaginable industry and domain. In the retail sector, ‘market basket analysis’ is a popular data analysis technique used to understand consumer behaviour and aid targeted marketing. Financial organisations perform big data analysis to detect fraudulent transactions. In transportation, data can reveal how people make travel choices and use roads, thus enabling authorities to improve services. Big commercial brands use massive volumes of unstructured text posted on various online platforms to detect public sentiment towards their products.
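As a rough illustration of the idea behind market basket analysis, the sketch below counts how often pairs of items are bought together and derives two common measures, support and confidence. The baskets are made up, and the function names are chosen for this example only. Real retail systems work on millions of transactions and typically use dedicated algorithms such as Apriori rather than this brute-force count.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, one set of items per transaction.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "tea"},
]

item_counts = Counter()
pair_counts = Counter()
for basket in baskets:
    item_counts.update(basket)
    # Sort so that each pair is always counted under the same key.
    pair_counts.update(combinations(sorted(basket), 2))


def support(pair):
    """Share of all baskets that contain both items of the pair."""
    return pair_counts[tuple(sorted(pair))] / len(baskets)


def confidence(antecedent, consequent):
    """Share of baskets with `antecedent` that also contain `consequent`."""
    pair = tuple(sorted((antecedent, consequent)))
    return pair_counts[pair] / item_counts[antecedent]
```

In this toy data, every basket that contains butter also contains bread, so confidence("butter", "bread") is 1.0, while only two of the three bread baskets contain butter. A retailer might read such a pattern as a cue to promote the two products together.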

Although largely forgotten in the presence of big data, small data is critical to extracting the value of big data, as it often forms the core of any analytical framework. Such small data may include the political and administrative areas of a country, the list of customer identifiers held by a commercial entity, Facebook or Twitter users, and the links of a road network. Each of these small data elements has massive volumes of data associated with it. Through big data analysis, in fact, we attempt to draw inferences about things that are finite. In this series, we will discuss how big and small data can be used to generate economic value, improve liveability and safeguard the environment in a number of domains including transportation, health, education, utilities and the retail industry. Finally, we will discuss the technological, institutional and legal frameworks that should be put in place to extract value out of data.

(Dr. Rohan Wickramasuriya is a Research Group Leader at the SMART Infrastructure Facility, University of Wollongong, Australia)