OpenAI Custom GPTs: A Content Backdoor to Omniscience?

By Christian Prokopp on 2023-11-09

Today, I received access to the new custom GPT feature on ChatGPT, and it appears to do what Sam Altman demonstrated. The implications are far-reaching, beyond the death of the RAG business model. How can OpenAI achieve and capitalise on Omniscience in a world of organisational data silos and increasingly defensive content creators and legislators?

GPTs, mini RAGs, Omniscience, Data Silos and Monopoly

GPTs are mini RAGs

Like my competitors, I spent the summer building an LLM-based agent that could execute actions and answer questions with current, in-depth knowledge using Retrieval Augmented Generation (RAG) from data sources like documentation, user forums, blog posts, etc. I stopped the development because the value proposition did not (yet) align with the cost.
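
For readers unfamiliar with the retrieval step in RAG, here is a minimal sketch, not the agent I built. It assumes a toy bag-of-words similarity in place of real embeddings, and the function names (retrieve, build_prompt) and the prompt wording are hypothetical. The idea is only to show the shape: score the documents against the question, keep the best matches, and put them into the prompt that would be sent to an LLM.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, documents, top_k=2):
    """Return up to top_k documents that share vocabulary with the question."""
    query = Counter(tokenize(question))
    scored = [(cosine_similarity(query, Counter(tokenize(doc))), doc)
              for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop documents with no overlap at all (score of zero).
    return [doc for score, doc in scored[:top_k] if score > 0]


def build_prompt(question, documents, top_k=2):
    """Assemble a prompt that grounds the answer in the retrieved documents.

    The wording of the prompt is an assumption made for illustration.
    """
    context = "\n".join(f"- {doc}" for doc in retrieve(question, documents, top_k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

A production system would replace the word-count similarity with embeddings and a vector index, and would send the assembled prompt to a model; the overall flow stays the same.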

This week, OpenAI killed the basic business model by providing the ability to build custom GPTs with the option to upload knowledge in the form of files and add custom APIs. You define the behaviour of the GPT, or agent, in free text, which OpenAI drafts for you through a setup chat with ChatGPT. It is a smooth user experience and an insight into future user interactions for setup and configuration. As I outlined recently, the basic RAG business idea is dying before it takes off, and startups must find niches, provide services around the new feature, or pivot further to survive. We are only a stone's throw away from companies and users uploading more complex data into a GPT, or pointing it at their website as a source, and receiving a custom RAG app with minimal effort.

Omniscience through a Content Backdoor

OpenAI wants to be Apple and Facebook, not Blackberry and Myspace. Like the iPhone, OpenAI has a lead with ChatGPT, but it will not last. So what is a business to do? Build a platform and marketplace, create an inescapable network effect and then move the value from the user to the platform provider as so many did before (see Cory Doctorow's Ensh_ttification).

However, content providers, from news agencies to popular websites and individual artists, are starting to take note and want a piece of the pie. Legislation in various regions is slowly gaining teeth and scoring small victories, for example, the EU pushing Apple to adopt USB-C. OpenAI might find itself locked out from valuable content that could help its goal of Omniscience. Of course, some individual agreements with leading content sources can be made. But what about the long tail of data and content? And what about the locked-away data silos of corporations and the considerable value in ERPs, CRMs, files, Slack channels, etc.?

OpenAI may have opened a backdoor to some of it. Pay attention when you try out the custom GPT feature in ChatGPT. In the advanced settings of the custom GPT you build, hidden under a fold-out and enabled by default, is the option "Use conversation data in your GPT to improve our models". It is an odd formulation that implies value added to your GPT but means that OpenAI can use your conversational data to train its models. It allows OpenAI to learn the content you upload through user interactions and bypass various technical and legal hurdles. OpenAI allows GPT makers and users to opt out, but the default is opt-in, and that alone will capture most of the data.

This reminds me of Amazon's marketplace, where millions of people sell products, and Amazon gets to mine the sales data. All the marketplace sellers provide a direct revenue stream and act as canaries to identify valuable products and trends, risking their own money and time. When a product is successful, Amazon Basics can source and sell it knowing the pricing and popularity, displacing marketplace sellers.

OpenAI can employ a similar approach. An army of GPT providers can supply hard-to-reach content and long-tail data without risk or cost to OpenAI. The popularity of custom GPTs will indicate valuable sources and teach ChatGPT unknown or otherwise unreachable data.

Data Silos

Next are companies and their hard-to-reach data. There are two limitations. Firstly, integration is a considerable hurdle. For example, how do you plug your 1980s ERP mainframe into ChatGPT? Or, even more mundane, you have Confluence, but it is on-premise, and your IT department lacks the time or skills to connect it to ChatGPT. Whether you should do this, and how, is another question. But there is a good chance that FOMO (Fear Of Missing Out) and the shiny new toy effect will drive a lot of adoption before it is clear whether it makes sense.

The second limitation is legal, for example, data privacy and intellectual property issues. We were already worried about engineers pasting code and executives pasting company information into ChatGPT to make their lives easier. Imagine connecting your data silos directly to it. But what if OpenAI promises to refrain from using the information for training and white-labels the effort? What if Microsoft co-locates ChatGPT LLMs in your Azure cloud and promises not to leak data out of your environments? What if it can tap into the increasingly omniscient knowledge base, make you more money, and let you call your company AI-enabled in your next earnings call? OpenAI is already rolling out internal-only GPTs for Enterprises.

Lastly, the drive to GPT applications will be based on company and customer data and cleverer agents. These should be more sophisticated than current GPTs, which are trivial in some respects. Suppose OpenAI can build an intelligent reasoning framework and more complex but easy-to-use process designers. In that case, some apps may transition to, or be powered by, OpenAI agents and ChatGPT, and new and novel ones may emerge. Zapier's integration of ChatGPT gives a first sense of that potential future. That could be OpenAI's Play or App Store moment after ChatGPT was its iPhone moment; aptly, they named it GPT Store.

Monopoly and Ensh_ttification

If OpenAI is successful, it can keep private users on its platform, create an army of GPT makers sifting through the Internet and all kinds of data sources for kernels of new data and value, and tap into corporations. That alone is unlikely to be enough of a network effect, and the switching cost will be low, assuming comparable offerings emerge from Amazon, Meta, Google or others. Of course, there is a slight chance that ChatGPT will become the Google of LLMs. In that scenario, competitors cannot differentiate themselves enough, and users gravitate to the known and tried product until it becomes synonymous with LLM interaction.

The more defensible position is probably the application route. If OpenAI can translate its lead in dataset, model size and quality into a platform with usable applications, it could capture the market. In that case, the network effect would be strong and switching costs high. For developers, building and releasing an app next to all the other apps, where the users already are, makes sense. Users increasingly gravitate to the platform, and it becomes self-reinforcing. At this point, OpenAI can extract more value through pricing, advertising or other means.

Conclusion

Most of the above is speculation. Even if correct, OpenAI is not operating in a vacuum. What will the other players do that could impact its success? Interestingly, there is yet to be a clear competitor. Amazon is likely to compete on the infrastructure and cloud computing aspect. Meta might create point solutions for its products and attack OpenAI with more open-source models. Google struggles to build enticing end-user products, leaving Microsoft as a potentially surprising in-house competitor. But in this fast-moving market, nothing is inevitable, and it will be fascinating to see the following months and years play out.

Christian Prokopp, PhD, is an experienced data and AI advisor and founder who has worked with Cloud Computing, Data and AI for decades, from hands-on engineering in startups to senior executive positions in global corporations. You can contact him at christian@bolddata.biz for inquiries.