Harnessing the Power and Promise of Data Responsibly in the AI Age
Q&A with Joseph Watson and Barbara C. Matthews, founder and CEO of BCMStrategy, Inc.
From Silicon Valley to Wall Street and businesses across the country, artificial intelligence (AI) is fundamentally transforming corporate America. Across a multitude of industries, businesses have an insatiable need for high-quality, usable data—whether proprietary or sourced from third parties—to fuel the large language models (LLMs) that learn patterns from vast corpora of text.
AI is expected to add $15.7 trillion to the global economy by 2030—more than the current output of China and India combined—and will have a major impact on sectors such as healthcare, financial services, communications, and manufacturing. But with great opportunity come challenges. How will companies safeguard sensitive information and provide transparency in outputs? Can we trust unsupervised machines to provide answers to monumental questions, especially those that affect human lives and livelihoods?
Solomon Partners Managing Director Joseph Watson, who focuses on Data, Analytics, and Information Services, recently hosted a Q&A with Barbara C. Matthews, a nonresident senior fellow at the Atlantic Council’s GeoEconomic Center, a globally recognized public policy and quantitative finance leader, and founder and CEO of BCMStrategy, Inc., a data company using patented language technology to measure public policy volatility and anticipate public policy trajectories.
What might divergent policy approaches in the US vs. EU mean for AI-related applications and development?
BARBARA: In the near term, I don’t expect it to have a material impact, because the American market is moving so fast that corporate America is only just beginning to pivot toward adoption at scale domestically.
We’re at the front end of the growth cycle here, so the business case is driving technological capacity much more so than policy.
On both sides of the Atlantic, the biggest constraint is access to quality data. Vertical integration or partnerships will become a necessity for many AI companies seeking some measure of quality control regarding the data inputs at the granular level.
From the third-party perspective, is there friction between companies and AI vendors where businesses are hesitant to provide access to their systems?
BARBARA: Great question. Yes. I think having dedicated, partitioned servers or providing on-premises delivery is the way to go, especially for Generative AI, because the data isn’t merely the corpus of documents used as inputs but also the queries that users formulate.
With Generative AI, those queries create the risk of data leakage. The closer companies get to deploying AI solutions that use very sensitive personal information about finances or health as inputs, the higher the stakes become.
Fortunately, the necessary infrastructure is well-established to deliver within dedicated servers. In addition to the server issues, most companies also understand the importance of protecting both their data and their exposure to potential liability through inadvertent copyright violations. AI buyers and data buyers are becoming savvy about asking how sellers are protecting against IP theft/data leakage on the output side and copyrighted material on the input side.
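To make that deployment pattern concrete, here is a minimal sketch of on-premises inference, assuming a locally hosted open-weight model. The model name and prompt format are placeholders for illustration; the point is that generation runs on dedicated hardware, so neither the document corpus nor user queries ever leave the company’s infrastructure.

```python
# Minimal sketch: serving a generative model entirely on-premises so that
# neither the document corpus nor user queries ever reach a third-party API.
# The model name and prompt format below are illustrative placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # stand-in for any properly licensed, locally hosted model

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

def answer_locally(query: str, context: str, max_new_tokens: int = 128) -> str:
    """Generate an answer on local hardware; the query never leaves the
    dedicated server, closing off that data-leakage channel."""
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Decode only the newly generated tokens, not the echoed prompt.
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Swapping the stand-in for a domain-specific model does not change the pattern; the guarantee comes from where the computation runs, not from which model runs there.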
Which segments relating to AI development/use cases do you see accruing the greatest value over time?
BARBARA: My view is biased because, as a data and analytics company, I’m chasing a specific value proposition. But I think the capacity to have a conversation with your data—an automated research assistant, for example—is probably one of the most compelling use cases for any kind of corporation.
Think about the knowledge gains. It doesn’t replace people; it just makes a human faster and smarter, enabling experts to connect the dots at a higher level. How people do business when backed by such AI agents will be transformative.
We’re already seeing it in the marketing space, but I think it will become more evident around business processes—how people do their analytical jobs internally and even in client-facing roles.
Who makes the money? Is it the assistant, like an OpenAI or a broad language model, or is it a specific vertical use case, or the hardware provider who makes the chips and processing power? Or is it all of the above?
BARBARA: Clearly, the hardware companies—both servers and chips—will remain highly profitable because they are essential to the computing process.
Small (not large) language models trained on a specialized corpus of content will also support a broad range of use cases that extend well past the current marketing context. We believe that Generative AI companies that focus on targeted domain-specific inputs will deliver extraordinary value (and generate extraordinary revenue) by delivering automated research agents that support most of the analytical job functions currently performed by research assistants, equity analysts, and even chief economists.
This means that the language model market will now begin to focus more on the quality of the language inputs used to train the models. The first generation of LLM companies trained their agents on all available language, which is very general and not as useful for individual use cases inside firms, especially for higher-level analytics such as financial analysis. Data companies that have taken the time to structure and even tokenize domain-specific language data will also profit handsomely by delivering to AI companies the precision inputs required to fine-tune language models with a high degree of accuracy and integrity.
This is not always glamorous work. Language training data must be structured and enriched in a way that is respectful, honest, and transparent. It takes time. It takes resources, but it’s crucial to delivering reliable outputs. This level of data preparation also requires something that Silicon Valley does not have at scale: in-house domain experts.
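As an illustration of that unglamorous preparation step, the sketch below structures raw documents into provenance-tagged training records and drops anything without a data-mining license. The field names and the licensing check are assumptions chosen for illustration, not BCMStrategy’s actual pipeline.

```python
# Minimal sketch of the data-preparation step: turning raw domain documents
# into structured, provenance-tagged training records. The schema (source,
# license, domain) is an illustrative assumption, not a real product pipeline.
import json
from dataclasses import dataclass, asdict

@dataclass
class TrainingRecord:
    text: str     # the domain-specific language sample
    source: str   # where the text came from (supports transparency)
    license: str  # confirms a data-mining license covers this input
    domain: str   # vertical tag used to assemble specialized corpora

def build_corpus(raw_docs: list[dict], out_path: str) -> int:
    """Write only licensed, domain-tagged records to a JSONL training file."""
    kept = 0
    with open(out_path, "w", encoding="utf-8") as f:
        for doc in raw_docs:
            if not doc.get("license"):  # drop anything without a license
                continue
            record = TrainingRecord(
                text=doc["text"].strip(),
                source=doc.get("source", "unknown"),
                license=doc["license"],
                domain=doc.get("domain", "general"),
            )
            f.write(json.dumps(asdict(record)) + "\n")
            kept += 1
    return kept
```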
I think we’re at the front end of exploring what kinds of business combinations can make sense for different use cases.
What’s the general perception of Gen AI and Large Language Models? Are your clients and the people you speak to still as enthusiastic? Has it taken longer than they thought it would, or did they know this was going to be a long road?
BARBARA: You’re 100% right. It’s slow going. I divide the world into pre-OpenAI and post-OpenAI for a very specific reason. Prior to OpenAI’s release of ChatGPT in late 2022, companies were in an exploratory phase. Few were committing resources to deploy any AI (much less Generative AI) at scale.
Much has changed in 18 months. The pace and scale of internal deployments have accelerated considerably. Curiosity and caution have been replaced with initial deployments at scale within a handful of verticals, predominantly in areas that do not involve sensitive customer information, such as sales and marketing.
Financial firms like Goldman Sachs and JPMorgan have also found internal Generative AI research assistants to be a fast track toward implementation, focusing on deployments on dedicated servers that minimize data leakage. Investment analysts are fast-tracking the analysis of earnings transcripts by having Generative AI solutions comb through the language and deliver answers as well as automated summaries.
Having said this, we are still at the beginning of a long growth cycle. Generating automated summaries with footnotes and internal guardrails architected to minimize hallucinations and spurious leaps of logic requires attention to detail on the data input side, which means the road may be longer than many would like. We find customers are very enthusiastic about exploring use cases beyond document summarization. I personally believe that the “killer app” pairs predictive analytics regarding, say, corporate earnings or public policy decisions with Generative AI outputs so that users can query both quantitative and language data. Data vendors that understand how to assemble training data packages that help customers bridge the divide between verbal logic and mathematical logic will make it easy for customers to accelerate along their AI journey.
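One way to picture “summaries with footnotes” is the minimal sketch below. It chunks a transcript, summarizes each chunk with a locally hosted model, and ties every summary line back to the span it came from, so an analyst can verify the output rather than trust it blindly. Character-based chunking and the library’s default summarization model are simplifying assumptions.

```python
# Minimal sketch of an automated summary with footnotes: each summary line
# cites the transcript span it came from, a simple guardrail that makes the
# output verifiable. Chunking by character count and using the pipeline's
# default model are simplifying assumptions for illustration.
from transformers import pipeline

summarizer = pipeline("summarization")  # any locally hosted model works here

def summarize_with_footnotes(transcript: str, chunk_chars: int = 2000) -> str:
    chunks = [transcript[i:i + chunk_chars]
              for i in range(0, len(transcript), chunk_chars)]
    lines, notes = [], []
    for n, chunk in enumerate(chunks, start=1):
        summary = summarizer(chunk, max_length=60, min_length=15)[0]["summary_text"]
        lines.append(f"- {summary.strip()} [^{n}]")  # footnote marker per chunk
        start, end = (n - 1) * chunk_chars, min(n * chunk_chars, len(transcript))
        notes.append(f"[^{n}]: transcript characters {start}-{end}")
    return "\n".join(lines + [""] + notes)
```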
Can you discuss some of the ethical concerns that leaders in AI development and regulators must consider?
BARBARA: On the input side, respecting copyright matters. Training a model on a corpus of language when you didn’t purchase the data-mining license—that’s not just an ethical issue, it’s a legal one.
I make no bones about siding with The New York Times and Dow Jones on this. In fact, at the beginning, although it was terribly expensive, we had data-mining licenses because I believe it’s massively important to have integrity at every step of the process, and that starts with the inputs. So, integrity and honesty.
It is also important for firms to deliver integrity during the training and delivery process. It’s tempting to want to throw spaghetti at the wall to see what sticks. Many have blind faith that because the AI is so good, it will ultimately give you the right answer. And it probably will, but in the meantime, it burns compute resources for your customer. It’s not exactly unethical, but it’s not right. And so, having the integrity to do the hard work at the front end and make sure the data is structured properly—that matters a lot.
Last but certainly not least, we have to have integrity on the output side. The issue is not just about controlling for hallucination. If we are going to trust unsupervised machines to deliver answers in use cases that involve human life, public policy, and financial savings, the tolerance for mistakes must be zero. I know that’s a high bar.
That’s a good segue. What worries you about AI’s potential to disrupt?
BARBARA: The nefarious use cases are legendary, but I don’t think they’re unrealistic: generating disinformation campaigns at scale. AI-powered fabrications make it impossible for people to make data-driven decisions. News reports additionally suggest that some AI applications incorporate feedback loops that turn the AI engine into a data-gathering machine for the vendor, acquiring sensitive information that individuals and firms would not ordinarily share with third parties, much less foreign governments.
Beyond the nefarious, AI’s disruptive capabilities can encourage a level of logical laziness as people outsource reasoning and research to a machine.
Finally, it is important to note that disruption can be a good thing. It forces people to find new and better ways of solving problems.
How did you organize your business to act ethically and responsibly when using AI?
BARBARA: We believe that board and executive leadership are indispensable. In our case, when we created our board of directors, our first step was to adopt a set of board resolutions that commit to having integrity and respecting both copyrights and customer data. We made the commitment from day one that we would never sell our customer data or share it without a court order. We also made the commitment from day one that we would pay for copyrighted language inputs, even though it represented a significant commitment of capital.
The closer you get to data that moves markets and changes people’s lives, the more guardrails must be put into place, and the more you need company leaders to commit to integrity at each step in the process. For use cases in healthcare and finance, for example, adopting a growth-at-any-cost approach is equivalent to flinging spaghetti at the wall. It creates the risk that the AI output is tainted, incorrect, or false. Integrity must be non-negotiable for anyone in this business, which means having a rigorous internal process to validate both model outputs and model inputs. A slower development cycle will translate into more reliable outputs that markets can trust from the beginning.
Thank you, Barbara, for this insightful conversation. I hope we can continue the discussion as the landscape continues to evolve at lightning speed.
Absolutely, Joe. Thank you for this opportunity, and I would love to revisit these topics down the road.