Data is the New Oil, Again…

Posted on April 2, 2023

The phrase ‘Data is the new oil’ has been around for a while (2006?) and like all analogies it can be pushed too far, but it is fair to say that data has been increasing in value as a commodity.

Like oil, data varies in quality and has to be extracted and refined to convert it from zeros and ones into information, insights and wisdom. Data has some value as a raw material, but a lot more value once processed.

Unlike oil, data is plentiful, even ubiquitous. Data can be transported instantaneously. It can be replicated and copied and its ownership is harder to determine.

Several years ago, the term ‘Big Data’ was as trendy as the term artificial intelligence (AI) is now. Many companies running global ecommerce websites or processing large amounts of data from IoT sensors had more data than they knew what to do with. Data lakes became data oceans, but in many cases the data just sat there in massive storage facilities.

Now, with the rise of the latest generation Large Language Model AIs, those vast data reserves could become as sought after as ‘black gold’ and even more valuable.

The most valuable commodity I know of is information.
Gordon Gekko, Wall Street 1987

What the fictional corporate raider Gordon Gekko was actually saying in the movie Wall Street was that the most valuable commodity is asymmetric information.

Asymmetric information is where some participants in the market have more (or better) information than others. Traditionally, in regulated markets like the Stock Exchange, certain kinds of information were restricted. ‘Inside information’ is a form of asymmetric information that can lead to market distortions. Acting on inside or privileged information is a crime in some jurisdictions.

The usefulness of the information generated by AI models relies heavily on the amount and quality of the data fed into them. In some cases, for an AI to come up with truly useful information, it will need to know specifics.

And so, the companies that have spent the last couple of decades collecting and storing data may be sitting on a very valuable commodity indeed, especially if that data cannot be sourced anywhere else and has been tagged and structured in a way that enables the AI to talk about specifics rather than generic patterns.

Circling back to the idea of asymmetric information within financial markets – Bloomberg has announced their own AI product with one of the largest domain-specific datasets yet. Bloomberg’s data analysts have collected and maintained financial language documents over the span of forty years. The team pulled from this extensive archive of financial data to create a comprehensive 363 billion token dataset.

This data was augmented with a 345 billion token public dataset to create a large training corpus with over 700 billion tokens. Using a portion of this training corpus, the team trained a 50-billion parameter decoder-only causal language model.

Which raises some interesting questions. Where does privileged information end and insight gleaned from an AI begin? Should you keep your data and build your own AI or sell your data, where it might be useful for the public good? Do you have the right to sell the data you have?

Will information in markets be more perfect or more asymmetrical with AI?

As my old economics professor used to say – It depends. AI has the potential to make market information more perfect and reduce information asymmetry between market participants.

AI-powered systems can identify patterns, trends, and insights that might be difficult for humans to discern. AI can reveal hidden information in markets, with the potential to make it more widely available to all participants.

AI could help to reduce the impact of human emotions on market information. These emotions can lead to bias and irrational behaviour and can distort market information, creating information asymmetry. AI could be used to provide objective and unbiased information to all participants.

However, AI can also create new information asymmetries, particularly if certain market participants have access to superior AI technology or data sources like the Bloomberg system.

The effectiveness of AI in reducing information asymmetry will depend on the quality and accuracy of the data used to train these systems, which will in turn be determined by whether organisations share data for the benefit of the commonwealth or withhold their proprietary information.

What is true now and has always been true is that informed decisions are better than ignorant ones.