What Is Big Data, and Why Is It Called the “New Oil”?
Big data refers to large volumes of structured or unstructured data. Such data sets are processed with specialized automated tools for statistics, analysis, forecasting, and decision-making.
Clifford Lynch, writing in a 2008 special issue of Nature, is often credited with popularizing the term “big data.” He described the explosive growth of information worldwide. Lynch characterized big data as arrays of heterogeneous data exceeding 150 GB per day, but there is still no single accepted criterion.
Until 2011, big data analysis was carried out only within scientific and statistical research. By early 2012, however, data volumes had grown so large that they needed to be systematized and put to practical use.
Since 2014, leading universities around the world that teach applied engineering and IT specialties have turned their attention to big data. IT corporations such as Microsoft, IBM, Oracle, EMC, Google, Apple, Facebook (banned in Russia by a court decision since March 21, 2022), and Amazon then joined in collecting and analyzing it. Today, big data is used by large companies in every industry, as well as by government agencies.
What are the characteristics of Big Data?
The analyst firm META Group proposed the three main characteristics of big data:
- Volume: 150 GB or more of data per day;
- Velocity: the speed at which data sets accumulate and are processed. Big data is updated regularly, so intelligent technologies are needed to process it in real time;
- Variety: data comes in many types. It can be structured, unstructured, or semi-structured. For example, the data flow in social networks is unstructured: it can consist of text posts, photos, or videos.
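To put the volume figure in perspective, here is a short sketch that converts 150 GB per day into the sustained ingest rate (a velocity) it implies and checks a few daily volumes against that threshold. It assumes decimal gigabytes (10^9 bytes), since the text does not say whether GB or GiB is meant; the constant and function names are illustrative, not taken from any library.

```python
# Assumption: "150 GB" means decimal gigabytes (10**9 bytes); the source
# does not specify GB versus GiB.
GB = 10**9
SECONDS_PER_DAY = 24 * 60 * 60
VOLUME_THRESHOLD_BYTES_PER_DAY = 150 * GB


def sustained_rate_mb_per_s(bytes_per_day: float) -> float:
    """Average ingest rate (decimal MB/s) needed to accumulate bytes_per_day."""
    return bytes_per_day / SECONDS_PER_DAY / 10**6


def meets_volume_threshold(bytes_per_day: float) -> bool:
    """True if a daily volume reaches the 150 GB/day figure cited in the text."""
    return bytes_per_day >= VOLUME_THRESHOLD_BYTES_PER_DAY


if __name__ == "__main__":
    rate = sustained_rate_mb_per_s(VOLUME_THRESHOLD_BYTES_PER_DAY)
    print(f"150 GB/day is about {rate:.2f} MB/s around the clock")
    # Hypothetical daily volumes, in GB, to compare with the threshold.
    for daily_gb in (20, 150, 400):
        print(daily_gb, "GB/day ->", meets_volume_threshold(daily_gb * GB))
```

Running it prints the implied rate (about 1.74 MB/s, sustained for 24 hours) followed by the result of each threshold check.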
In addition to volume, velocity, and variety, three more characteristics are often added today:
- Veracity: the degree to which the data set and the analysis based on it can be trusted;
- Variability: data flows have peaks and valleys driven by seasons or social events. The more unstable the data stream, the harder it is to analyze;
- Value: the usefulness or significance of the data. Like any information, big data can be simple or complex to interpret and analyze. Posts on social networks are an example of simple data, while banking transactions are an example of complex data.
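The text does not say how variability is measured. One common way to quantify how unstable a stream is, assumed here purely for illustration, is the coefficient of variation of event counts per time window: the standard deviation divided by the mean. The hourly counts below are made up, and both streams have the same total volume so that only their shape differs.

```python
from statistics import mean, pstdev


def coefficient_of_variation(counts: list[float]) -> float:
    """Population standard deviation of the counts divided by their mean."""
    avg = mean(counts)
    if avg == 0:
        raise ValueError("mean is zero, so the coefficient of variation is undefined")
    return pstdev(counts) / avg


if __name__ == "__main__":
    # Hypothetical hourly event counts; each stream totals 600 events.
    steady = [100, 102, 98, 101, 99, 100]
    bursty = [20, 30, 400, 25, 15, 110]
    print(f"steady: {coefficient_of_variation(steady):.2f}")
    print(f"bursty: {coefficient_of_variation(bursty):.2f}")
```

Here the steady stream scores near 0 and the bursty one scores above 1. Any cutoff for "too unstable to analyze easily" would be a project-specific choice, not a standard value.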
How is Big Data collected and stored?
Big data makes it possible to take all relevant information into account and reach better-grounded conclusions. It is also used to build simulation models for testing specific ideas, solutions, or products.
Primary sources of big data:
- Internet of Things (IoT) and devices connected to it;
- Social networks, blogs, and media;
- Company data: transactions, orders of goods and services, taxi and car-sharing trips, customer profiles;
- Instrument readings: weather stations, air and water quality meters, satellite data;
- City and state statistics: data on population movements, birth rates, and deaths;
- Medical data: test results, diseases, diagnostic images.
Modern computing systems provide fast access to large data sets, which are stored in specialized data centers with high-performance servers.
In addition to traditional physical servers, they use cloud storage, “data lakes” (repositories that hold large amounts of raw, often unstructured data from many sources in its original format), and Hadoop, a framework consisting of a set of utilities for developing and running distributed computing programs. Working with big data also requires advanced data integration and management methods, as well as preparing the data for analytics.
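Hadoop's original processing model is MapReduce: a map step runs on each chunk of input in parallel, the framework groups the intermediate results by key (the "shuffle"), and a reduce step combines each group. The plain-Python word count below imitates those three phases on a single machine to show the idea. It does not use Hadoop's actual Java API, and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(split: str) -> Iterator[tuple[str, int]]:
    """Emit (word, 1) for every word in one input split."""
    for word in split.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(splits: list[str]) -> dict[str, int]:
    # On a cluster each split would be mapped on a different node; here we loop.
    intermediate = (pair for split in splits for pair in map_phase(split))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    splits = ["big data needs big storage", "data lakes store raw data"]
    print(word_count(splits))
```

Because each map call depends only on its own split and each reduce call only on one key's values, both phases can be spread across many machines; that independence is what lets a framework like Hadoop scale the same logic to very large data sets.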