Data Becomes the New Oil: Businesses Have a Unique Opportunity to Monetize Content and Information as Fuel for AI Models
by Solomon Partners Technology Group
Over the past 10 years, data has become currency as companies across many verticals mine for information to help them solve previously intractable problems and gain greater efficiency.
Most recently, the hunger for usable data has grown sharply as businesses aim to harness the power and potential of artificial intelligence. As fuel for AI language models, high-quality data is essential to critical decision-making. Without it, businesses on the frontlines of AI will struggle to overcome the hurdles of low-quality data sets, and the applications and economic value created through generative AI will be constrained.
In a sign that businesses are committed to utilizing high-value data assets, Google and Reddit recently inked a landmark deal, reportedly worth about $60 million a year, that will likely set the stage for further partnerships. The deal builds on an earlier pioneering partnership: late last year, OpenAI agreed to pay Axel Springer (publisher of Politico, Business Insider, Bild, and Welt) to train its AI models on the company’s news stories.
As part of the Reddit deal, Google will use content from Reddit’s online discussion forums to train its AI models. This partnership will help Google differentiate its products from those of competitors such as Microsoft, which has a major investment in OpenAI, the creator of ChatGPT.
Though the Google-Reddit deal is relatively modest by Silicon Valley standards, it is nevertheless important because it lays the groundwork for what will likely become fertile terrain: businesses partnering with social media companies and other holders of large troves of information to obtain proprietary data for their AI models.
Businesses increasingly recognize that they can’t always create effective language models simply by using publicly available data. Public data may be unreliable, and companies that rely on scraping websites for information leave themselves vulnerable to copyright-infringement claims and other litigation. The New York Times is testing the waters with a landmark federal lawsuit, filed in December 2023, contending that OpenAI and Microsoft used millions of its news stories to train ChatGPT and asking for “billions of dollars in statutory and actual damages.”
This suit should motivate enterprises to invest in exclusive or proprietary data sets that will allow them to differentiate themselves and their products from competitors. We believe that interest and investment in data will remain immense in the years ahead, not just for training chatbots but especially for applications in highly regulated industries, such as financial services and medical diagnostics, where critical decisions are at stake. As AI technology advances, businesses are finding a range of use cases for it, including streamlining processes, optimizing costs, eliminating repetitive tasks prone to human error, and improving customer service.
When trained on high-quality data sets, AI has the potential to outperform humans. Think about how artificial intelligence and machine learning can be used in a field such as medical diagnostics. In a typical clinical setting, radiologists have a miss rate of approximately 4%. Out of one billion radiology examinations performed annually worldwide, that translates into roughly 40 million errors. However, an AI model trained on thousands of high-quality images can pick up early-stage molecular cell changes that a physician might miss. The combination of AI technology and human oversight far exceeds what a physician can do alone. It also offers scale, democratizing access to the highest-quality medical support and improving human outcomes across the globe.
Of course, buyers and sellers both have a lot to gain through these types of content-sharing partnerships. Social media companies such as Reddit can turn latent data sets into new revenue streams while also touting the validation that monetization brings to their content. Not surprisingly, one day before it went public with the Google deal, Reddit announced plans for its initial public offering.
Though the focus is now squarely on generative AI, large organizations have long seen the value in owning companies that gather data. Microsoft purchased LinkedIn in 2016 for $26 billion; Google bought YouTube in 2006 for $1.65 billion. At the time, neither LinkedIn nor YouTube was seen as a data business, but the incoming owners saw value in owning large troves of proprietary data assets that could be monetized for years to come.
We believe that B2B and consumer enterprises will continue seeking partnerships with companies to obtain access to their data sets. Buyers and sellers should think creatively and strategically about these broader arrangements. For example, an exclusive data-sharing deal would give the buyer sole access but limit the seller’s ability to monetize the data further, and so should command a higher price.
Whether a business acquires a company outright or purchases access rights to information, the value of high-quality data will only increase in the coming years. Companies that develop a strategy to harness this resource, either as buyers building differentiated, higher-quality AI products or as sellers monetizing internally created data sets, will be poised to ride this wave and drive rapid revenue and enterprise value growth.