How to create a Data Dictionary using ChatGPT

February 3rd, 2023
ChatGPT can combine data with natural language and has extensive information about most subjects. That lends itself to novel applications like creating informative data dictionaries.

Let us ask ChatGPT for a public dataset we can use for this how-to.

> List public CSV datasets with links that I could use to demonstrate your ability to create a data dictionary from a CSV file.

From the datasets it suggested, I picked the wine quality dataset, downloaded one of its CSV files and took a sample. ChatGPT can easily create a simple data dictionary table from it. With a more carefully worded prompt, however, it can make some valuable additions: SQL data types, units of measure, descriptions enriched by ChatGPT's general knowledge, and a summary of the table.
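
Taking the sample can be scripted. The following is a minimal sketch, not the exact steps I used: the function name `sample_csv`, the semicolon delimiter, the three column names and the demo values are all assumptions made for illustration, so adjust them to your file.

```python
import csv
import io
import random

# Illustrative rows only; column names and the ";" delimiter are assumptions.
DEMO_CSV = """fixed acidity;pH;quality
7.4;3.51;5
7.8;3.20;5
11.2;3.16;6
7.9;3.30;7
"""


def sample_csv(text, n_rows=10, delimiter=";", seed=0):
    """Return the header plus up to n_rows randomly chosen data rows as CSV text."""
    # Skip blank lines so a trailing newline does not produce an empty row.
    rows = [r for r in csv.reader(io.StringIO(text), delimiter=delimiter) if r]
    header, data = rows[0], rows[1:]
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    picked = rng.sample(data, min(n_rows, len(data)))
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(picked)
    return out.getvalue()


if __name__ == "__main__":
    # Paste the printed text into the chat together with your prompt.
    print(sample_csv(DEMO_CSV, n_rows=2))
```

A small sample is usually enough for ChatGPT to see the column names and value formats, and it keeps the prompt short.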

> Create a data dictionary from the wine quality dataset for the red wine quality.
> Add a column for SQL data types and favour DECIMAL over FLOAT.
> Add a column for the Unit of Measure.
> Create description fields using your knowledge of red wine for each column with at least two sentences each.
> Make them sound natural and not repetitive.
> Precede the data dictionary table with a summary paragraph for data users.

The output is remarkable. Three of the four columns in the data dictionary (SQL data type, unit of measure and description) were added by ChatGPT using context and its knowledge base. Naturally, you would want to verify the details to ensure they fit your purpose, but it is an impressive first draft.
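
One way to verify the SQL data types is to derive a baseline from the sample itself and compare it with ChatGPT's suggestions. This is a rough sketch under stated assumptions: the helper names `infer_sql_type` and `baseline_types` are hypothetical, the delimiter is assumed to be a semicolon, every column is assumed to have at least one value, and only INTEGER, DECIMAL and VARCHAR are considered.

```python
import csv
import io
from decimal import Decimal, InvalidOperation


def infer_sql_type(values):
    """Suggest INTEGER, DECIMAL(p, s) or VARCHAR(n) for a column of strings."""
    try:
        numbers = [Decimal(v) for v in values]
    except InvalidOperation:
        numbers = None  # at least one value is not a number
    if numbers is None or not all(d.is_finite() for d in numbers):
        return f"VARCHAR({max(len(v) for v in values)})"
    # Scale: the largest number of digits after the decimal point.
    scale = max(max(-d.as_tuple().exponent, 0) for d in numbers)
    if scale == 0:
        return "INTEGER"
    # Digits before the decimal point, ignoring the sign.
    int_digits = max(len(str(abs(int(d)))) for d in numbers)
    return f"DECIMAL({int_digits + scale}, {scale})"


def baseline_types(csv_text, delimiter=";"):
    """Map each column name to a type inferred from the sample rows."""
    rows = [r for r in csv.reader(io.StringIO(csv_text), delimiter=delimiter) if r]
    header, data = rows[0], rows[1:]
    columns = zip(*data)  # transpose rows into columns
    return {name: infer_sql_type(list(col)) for name, col in zip(header, columns)}


if __name__ == "__main__":
    # Illustrative values, not taken from the real file.
    demo = "fixed acidity;pH;quality\n7.4;3.51;5\n11.2;3.20;6\n"
    for column, sql_type in baseline_types(demo).items():
        print(f"{column}: {sql_type}")
```

The precision and scale inferred this way are only a lower bound, because the full file may contain wider values than the sample. If ChatGPT proposes a narrower DECIMAL than the baseline, that is worth a closer look.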
