OpenAI GPT-3: Content spam or more?

December 4th, 2022
OpenAI's ChatGPT has made the news recently as a next-generation conversational agent. Its surprising breadth made me wonder: could OpenAI's models generate specific technology content good enough to post, and what would that imply for the future?
Robot on a typewriter in a library (DALL·E generated)

First, I tried the research preview of ChatGPT.

Me: What is AWS Athena?

ChatGPT: Amazon Athena is a serverless query service that allows you to analyze data in Amazon S3 using SQL. Athena is easy to use, and you can get started with it quickly. It allows you to query data stored in Amazon S3 using SQL, and can be used to analyze a wide variety of data such as logs, clickstream data, and IoT data. With Athena, there is no need to set up, manage, or scale a data warehouse, which makes it a cost-effective and convenient option for querying data stored in S3.

Nice. ChatGPT could immediately be useful for querying documentation or even the internet (hey Google) in a better way. But how about something more complex, like writing technical content? If you read Medium or other sources, you know that there is a wide range of human-written content, from the novice and incomprehensible to simple tutorials and analysis to deep, insightful technical domain expertise.

Last week, I wrote about AWS Athena, and in the past, teams I worked with discussed the pros and cons of Athena versus Redshift. They are different tools for different purposes but have some overlap. It is a fairly specialised technical discussion. Surely, a general ML text model will be unable to generate anything meaningful about the topic.

Enter OpenAI 

I used the beta of the OpenAI GPT-3 (text-davinci-003) website and prompted it to write a blog post about the subject.

Prompt: A blog post on AWS Athena and why it is better than AWS Redshift.

OpenAI content: As businesses increasingly move their data and tracking systems to the cloud, there is an increasing need for powerful and efficient ways of querying this stored data. This is where AWS Athena comes in.

AWS Athena is a query service built on top of Amazon’s cloud infrastructure that allows users to easily query and analyze data stored in the cloud in a variety of different formats. With Athena, users can run complex queries against their data without having to manage a separate cluster or rely on managing their own hardware.

One of the primary advantages that sets Athena apart from other cloud query services is its ability to quickly query data stored in the cloud with low latency. Athena queries are able to execute in a matter of seconds, which makes it much faster than other services like AWS Redshift, which can take minutes or even hours to execute. This means that users can quickly access the data they need without having to wait for long periods of time.

Another benefit is that Athena provides users with a wide range of data sources that can be queried, including Amazon S3, Amazon RDS, and other cloud storage services. This makes it much easier to query data from multiple sources in one place.

Finally, Athena is also much cheaper to use than AWS Redshift. With Athena, users only pay for the queries they run, allowing them to save significant amounts of money. By contrast, with Redshift, users have to pay for the cluster as well as the queries, which can add up to significant costs.

In conclusion, AWS Athena provides businesses with a powerful and efficient way of querying data stored in the cloud in a variety of different formats. With its ability to execute query quickly with low latency, support for a range of data sources, and lower costs than AWS Redshift, Athena is a great choice for businesses looking to efficiently query and analyze their cloud data.

Impressive. Some details are either inaccurate or depend on circumstance, but this was mostly correct and had some valuable insight. It is as good (or bad) as some posts I have read that were thrown together by the odd human, but it is also far from what an eloquent domain expert could write. Importantly, it does not come across as artificial or nonsensical.
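
I ran this through the website, but the same experiment could be scripted. The following is a minimal sketch, not what I actually used: it assumes OpenAI's legacy completions endpoint at `/v1/completions`, and the `max_tokens` and `temperature` values are illustrative guesses. The helper names `build_completion_request` and `complete` are hypothetical, and the text-davinci-003 model may no longer be available.

```python
import json
import urllib.request

# Assumption: the legacy text completions endpoint.
API_URL = "https://api.openai.com/v1/completions"


def build_completion_request(prompt, api_key, model="text-davinci-003",
                             max_tokens=512, temperature=0.7):
    """Build (but do not send) an HTTP request asking for a text completion.

    max_tokens and temperature are illustrative values, not the settings
    used for the posts quoted in this article.
    """
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def complete(prompt, api_key):
    """Send the request and return the first generated text.

    Assumes the response carries the text under choices[0]["text"].
    """
    request = build_completion_request(prompt, api_key)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"].strip()
```

Calling `complete("A blog post on AWS Athena and why it is better than AWS Redshift.", api_key)` with a valid key would send the prompt above. Swapping the two product names in the prompt gives the reverse experiment.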

And the reverse?

As I wrote, it is an open argument, so let's see what happens if we favour Redshift in the prompt as the better solution.

Prompt: A blog post about AWS Redshift and why it is better than AWS Athena.

OpenAI content: AWS Redshift is a cloud-based data warehouse service that is part of Amazon's larger suite of cloud-based products. It provides businesses with a scalable, secure, and cost-effective way to store and analyze their data. It offers features such as columnar storage, automatic compression, advanced query optimization, and much more.

Compared to AWS Athena, Redshift offers a number of advantages that make it a better choice for businesses looking to store and analyze their data.

First, Redshift is much faster than Athena. With Redshift, it is easy to quickly analyze large amounts of data. It offers advanced query optimization, allowing queries to be processed faster and more efficiently. Additionally, Redshift offers accelerated databases, which can increase query performance even further.

Second, Redshift is more secure than Athena. It offers improved security measures such as encryption at rest and in transit, role-based access control, and data auditing. With these features, businesses can rest assured that their data is safe and protected.

Third, Redshift offers much more scalability than Athena. Redshift's elasticity and scalability allows businesses to easily scale up or down their data analysis needs as the business grows. This makes it easier to manage and control costs associated with storing and analyzing data.

Finally, Redshift is much more cost-effective than Athena. It offers businesses flexible pricing options, making it easy to select the best plan for their particular needs.

We can see an inconsistency between the two arguments, e.g. each post claims its service is the faster and cheaper one. But with enough know-how and the right scenario, you could argue either way. Is OpenAI that smart, or is it simply repeating common unsubstantiated claims from mediocre articles it learned from? I doubt it is that clever. Still, the point is that it is good enough that you can argue about it and not immediately dismiss it as false or artificial.
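
To illustrate how the "cheaper" claim can go either way, here is a rough break-even sketch. It compares paying per data scanned (the Athena model) with paying for an always-on cluster (the Redshift model). Both prices are assumptions for illustration, not quotes from AWS, and the model ignores storage, reserved pricing, and serverless options.

```python
# Assumed prices, for illustration only. Check current AWS pricing.
ATHENA_USD_PER_TB_SCANNED = 5.00    # assumed per-query price per TB scanned
REDSHIFT_USD_PER_NODE_HOUR = 0.25   # assumed on-demand price of a small node
HOURS_PER_MONTH = 730


def athena_monthly_cost(tb_scanned_per_month):
    """Pay per query: cost grows with the amount of data scanned."""
    return tb_scanned_per_month * ATHENA_USD_PER_TB_SCANNED


def redshift_monthly_cost(nodes):
    """Pay for the cluster: cost is fixed regardless of query volume."""
    return nodes * REDSHIFT_USD_PER_NODE_HOUR * HOURS_PER_MONTH


def break_even_tb(nodes):
    """TB scanned per month at which both options cost the same."""
    return redshift_monthly_cost(nodes) / ATHENA_USD_PER_TB_SCANNED


def cheaper_option(tb_scanned_per_month, nodes):
    """Name the cheaper option for a given workload under this simple model."""
    athena = athena_monthly_cost(tb_scanned_per_month)
    redshift = redshift_monthly_cost(nodes)
    if athena == redshift:
        return "tie"
    return "athena" if athena < redshift else "redshift"
```

Under these assumed prices, a two-node cluster costs 365 USD per month, so the break-even sits at 73 TB scanned per month. A team scanning 10 TB a month would pay less with the per-query model, while one scanning 200 TB would pay less for the cluster. Both generated posts can be "right" depending on the workload.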

AI content: Spam or Gold?

Sticking with OpenAI, I created the image of the robot with the typewriter at the beginning using DALL·E. Here is the kicker: the two blog posts and the image cost only 20 cents and a few minutes of machine time. How much money and time would it cost to hire a mediocre technical writer and artist to create equivalent content?

Highly qualified humans can still create much more insightful and correct content. However, things are about to change on the lower end of content creation, which is about the art of good enough, cost and scale. And where that lower end begins will move rapidly with the quality of the machine learning models improving.

As a pessimist, I imagine the advent of a new wave of low-cost spam content, indistinguishable from mediocre human content, that regurgitates existing, learned content in new variations.

As an optimist, I imagine the advent of new content generation that synthesises unique insight from multiple data sources or makes information more accessible, for example by translating documentation into compelling articles and posts.

Whatever the future holds, visual and text content generation will change dramatically.

Christian Prokopp, Bold Data, Founder
