What the FTC’s Investigation Into ChatGPT Means for Data Security

Written by Tim Pullan | Jul 18, 2023 1:28:04 PM

In the many recent conversations I have had with AI entrepreneurs, vendors, consultants, and other industry insiders, the phrase ‘the genie’s out of the bottle’ comes up most often. Broadly speaking, it’s right. No amount of regulatory action, litigation, or congressional investigation will take us back to the world before 30 November 2022. Through the awesome work of Sam Altman’s smart team and many others, we have discovered what happens when you compress the entire Internet using the latest model technology and then polish it with a huge amount of fine-tuning. The tech works, it’s needed for the next wave of productivity improvements, and usage is only going to proliferate.

But that’s not to say the coming legal and business landscape is going to be easy for the owners and distributors of large models. These new tools raise significant legal and ethical questions about content rights and personal information rights that have yet to be explored and resolved. For example, is it acceptable that a foundational model creator copied a user’s valuable copyright material to train an LLM, but now prevents that same user from using the LLM to train another model under license restrictions? For those of us old enough to remember, this will feel akin to the furious rights-based litigation and legislative bouts at the birth of the dot-com era. After many rounds, the clear knock-out winners were the internet-based operators. But what helped them in that fight was the nature of the Internet at that time, with its non-profit governance and democratic principles such as net neutrality. The Internet was generally regarded as a force for human good, even as it swept away many old industries. For-profit foundational model owners may struggle to elicit similar sympathies from courts and legislators.

Another factor in this coming fight is that the world has begun to discover that generative AI is not the technology to end all other technologies, much less an end to humans. The hype of early summer 2023 has given way to a more sober assessment. It is another powerful tool for the modern technology stack, great at some things but not so great at others. At a high level, foundational LLMs use similar technology to the domain-specific LLMs that ThoughtRiver and others have developed for specific industries and use cases. The big difference is the quantity and breadth of data used. We noticed early on that, when it comes to legal interpretation, the foundational models struggle to compete on accuracy with our own LLMs and their highly focused, curated datasets. The FTC probe is another reminder that AI output, when not produced in a controlled environment, is not reliable, and I applaud Sam Altman for coming to the forefront of the conversation and pushing for regulation back in May. The fact of the matter is that accuracy can only be achieved with AI when you control the dataset, and when that dataset is the entire Internet, accuracy is impossible to achieve.

However, neither humans nor computers achieve 100% accuracy on sophisticated, accuracy-dependent tasks such as contract review. In fact, research conducted by ThoughtRiver reveals that lawyers typically operate at around 88% accuracy in contract review tasks, a finding supported by other studies.

The FTC probe is bringing to light the need for governance and for humans and AI to work together. AI is not here to replace humans but to accelerate their capabilities, and the probe is yet another reminder of that.