
IDTDS Center

Don't miss the fourth industrial revolution

What is Big Data? How does it work? What are its uses today? What is its value? In short, how can your business benefit from big data?

The quantitative explosion of digital data has forced researchers to find new ways of seeing and analyzing the world. The challenge is to master new orders of magnitude in the capture, search, sharing, storage, analysis and presentation of data. Thus was born "Big Data", a concept for storing a colossal amount of information on a digital basis. According to the archives of the digital library of the Association for Computing Machinery (ACM), the term first appeared in October 1997, in scientific articles on the technological challenges of visualizing "large data sets".


Big Data: data analysis

Literally, "Big Data" means massive data (rendered in French as "mégadonnées", "grosses données" or "données massives"). The term refers to a set of data so large that no conventional database or information management tool can really process it. Indeed, we generate about 2.5 quintillion bytes of data every day: messages we send each other, videos we publish, climate readings, GPS signals, transaction records of online purchases, and much more. This is what we call Big Data, or massive volumes of data. The web giants, Yahoo first among them (but also Facebook and Google), were the very first to deploy this type of technology.

However, there is no precise, universal definition of Big Data. As a complex, polymorphic object, its definition varies with the communities that take an interest in it as users or service providers. A transdisciplinary approach makes it possible to understand the behavior of the various actors: the designers and suppliers of tools (computer scientists), the categories of users (managers, business leaders, political decision-makers, researchers), and health-sector actors and users.

Big Data is no exception to the rule that applies to every technology: it is a dual-use technical system. It brings benefits, but it can also generate drawbacks. For instance, it is used by speculators on the financial markets in an automated fashion, with the possible end result of creating speculative bubbles.

The arrival of Big Data is now presented in many articles as a new industrial revolution, similar to the discovery of steam (early 19th century), electricity (late 19th century) and computing (late 20th century). Others, a little more measured, describe this phenomenon as the final stage of the third industrial revolution, that of "information". In any case, Big Data is considered a source of profound upheaval in society.


The 3 Vs of Big Data

Invented by the web giants, Big Data presents itself as a solution designed to give everyone real-time access to giant databases. It aims to offer an alternative to classic database and analysis solutions (Business Intelligence platforms on SQL Server, etc.).

According to Gartner, this concept brings together a family of tools that respond to a triple problem known as the 3V rule: a considerable volume of data to process, a wide variety of information (from diverse sources, unstructured, organized, open, etc.), and a certain level of velocity to achieve, in other words the frequency at which this data is created, collected and shared.

The technological innovations that facilitated the advent and growth of Big Data fall broadly into two families: on the one hand, storage technologies, driven in particular by the deployment of Cloud Computing; on the other, adapted processing technologies, especially new databases suited to unstructured data (Hadoop) and the development of high-performance computing models (MapReduce).
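To make the MapReduce model concrete, here is a minimal single-machine sketch of its three classic phases (map, shuffle, reduce) applied to a word count, the canonical example. This is an illustration only: real frameworks such as Hadoop distribute these same phases across a cluster of nodes.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map: emit a (key, value) pair for every word in a document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(pairs):
    """Shuffle: group all values emitted under the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate the values of each key (here, a sum)."""
    return {key: sum(values) for key, values in groups.items()}

documents = ["big data big volume", "big velocity"]
pairs = chain.from_iterable(map_phase(doc) for doc in documents)
counts = reduce_phase(shuffle_phase(pairs))
print(counts)  # {'big': 3, 'data': 1, 'volume': 1, 'velocity': 1}
```

Because each document is mapped independently and each key is reduced independently, both phases can run in parallel on different machines, which is what makes the model scale to massive volumes.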

Several solutions can come into play to optimize processing times on giant databases, namely NoSQL databases (such as MongoDB, Cassandra or Redis), server infrastructures that distribute processing across nodes, and in-memory data storage:

The first makes it possible to implement storage systems considered more efficient than traditional SQL for mass data analysis (key/value-, document-, column- or graph-oriented).

The second is also called massively parallel processing. The Hadoop framework is an example: it combines the HDFS distributed file system, the HBase NoSQL database and the MapReduce algorithm.

As for the last solution, it speeds up the processing of queries by keeping data in RAM rather than on disk.
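To illustrate the key/value model and in-memory storage, here is a deliberately simplified sketch in the spirit of a store like Redis. The class and its methods are hypothetical, invented for this example; real systems add persistence, replication, key expiry and network access.

```python
class InMemoryStore:
    """A toy key/value store: all records live in RAM, so lookups
    avoid disk access entirely."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)

store = InMemoryStore()
store.set("user:42", {"name": "Ada", "purchases": 3})
print(store.get("user:42"))  # {'name': 'Ada', 'purchases': 3}
```

Reading from memory takes on the order of nanoseconds, versus milliseconds for a disk seek, which is why in-memory storage accelerates query processing so dramatically.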