Data: An important resource in the AI revolution
Data might have become the new oil of the 21st century.
Hi!
The other day I was thinking that many people still don't know the value of data, which is surprising given that data is what makes AI possible. Companies like OpenAI have been collecting data for many years to train their models, creating the tools we all know today.
In the coming articles, I'll show you the techniques these companies use to collect data and what you can do with it. But first, let's see why data is so important nowadays.
The concept of data as a strategic asset has been gaining momentum in recent years. However, regular people often can't see the real value in data.
We know big tech companies have been collecting data for a long time. We know that year after year new regulations about the use of data are created. That said, most of us still don't understand the impact data has on our society.
A few years ago, The Economist published an article called "The world's most valuable resource is no longer oil, but data." However, for regular folks, it's still hard to understand how data can be the new oil.
Data and oil have some similarities, but also some differences. Here are some of them.
1. Data and oil need to be refined
Data and oil are rarely used in their raw state.
If oil is unrefined, it cannot be used. For oil to be useful, it has to be extracted, refined, and distributed. The same happens with data. We don't use data as soon as it's extracted; we have to process it first so that it's ready for analysis.
Here's how Clive Humby, the data science entrepreneur who coined the phrase "data is the new oil," compares oil and data.
"Data is the new oil. Like oil, data is valuable, but if unrefined, it cannot really be used. It has to be changed into gas, plastic, chemicals, etc. to create a valuable entity that drives profitable activity. So, must data be broken down, analysed for it to have value."
This is true. Once data is collected, it needs to be cleaned and transformed to get it in the desired format. Why? Well, real-world data is messy, so there might be inaccurate or missing data that we need to deal with.
To put it simply, imagine you have collected data from a survey. You can be confident that the results obtained from the multiple-choice questions don't need much preprocessing, but things change with the open-ended questions because people can answer whatever they want (sometimes without following a common pattern) and even leave an answer blank.
Real-world data is sometimes as messy as those open-ended questions.
This is why raw data isn't enough. Only after the data is "refined" can we make the most of it by making reports, doing analysis, and creating something valuable.
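To make the survey example a bit more concrete, here's a minimal Python sketch of what "refining" open-ended answers might look like. Everything in it is made up for illustration: the answers, the `ALIASES` lookup table, the `MISSING` markers, and the `refine` helper are all assumptions, and a real project would likely need many more rules than this.

```python
from collections import Counter

# Hypothetical raw answers to an open-ended question such as
# "Which country do you live in?" (invented for this example).
raw_answers = ["  USA", "usa", "U.S.A.", "", None, "Peru ", "PERU", "n/a"]

# Assumed lookup table that maps spelling variants to a single label.
ALIASES = {"usa": "United States", "u.s.a.": "United States", "peru": "Peru"}

# Assumed markers that respondents use when they give no real answer.
MISSING = {"", "n/a", "none"}


def refine(answer):
    """Return a cleaned label, or None when the answer is missing."""
    if answer is None:
        return None
    text = answer.strip().lower()  # remove stray spaces, ignore casing
    if text in MISSING:
        return None
    # Unknown answers are kept, just title-cased for consistency.
    return ALIASES.get(text, text.title())


cleaned = [refine(a) for a in raw_answers]
counts = Counter(c for c in cleaned if c is not None)
missing = cleaned.count(None)

print(counts)   # Counter({'United States': 3, 'Peru': 2})
print(missing)  # 3
```

Eight messy answers turn into two clean categories plus a count of blanks. That's the kind of result you can actually put in a report.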
2. Oil is a finite resource, while more and more data is created every day