Python TDD with ChatGPT

February 2nd, 2023
Programming with ChatGPT using an iterative approach is difficult, as I have demonstrated previously. Maybe ChatGPT can benefit from test-driven development (TDD). Could it aid LLMs as it does humans?

Previously, I wrote Python code with ChatGPT, starting simple and adding detail step by step. That worked initially and then fell apart over time. Let us try a simpler example and define tests before asking ChatGPT to write code that passes them. Writing tests first is a good practice that helps many developers.

First, I outline the behaviour and ask for the appropriate tests.

> Define a number of pytests for a function called is_asin. The function is_asin takes a string and returns a true or false. It returns true if the passed parameter passes the definition of an ASIN (Amazon Standard Identification Number). An ASIN is an alphanumeric identifier, 10 characters long and always uppercase. Write all the unit tests we need for the function is_asin.

Not bad, but two tests are wrong, and also I prefer the happy path test at the beginning.

> Remove the tests that return false for strings with only numbers or only characters. These are true and not false cases. Also move the test that returns true to the first place.

Better, let us have the unit test code.

> Write the code for the tests using pytest.

One issue is that the happy path only uses letters. We need a better test input string.

> Update the string used in test_valid_asin to use both numbers and letters. Output all the tests.

Good, now we can generate the function code.

> Write the is_asin function to pass all the tests and use type annotation.

Good try, but this is not going to pass. Let us see if ChatGPT can fix it without further details.

> Do you have any specific recommendations to improve is_asin?

The documentation is a good addition and probably something to ask for from the beginning. It should help ChatGPT stay on track as we iterate. However, as expected, the happy path test fails when executed. Let us see if we can get it fixed while staying vague.

> The test test_valid_asin fails. Can you fix is_asin?

It is still failing. Let us give more detail in plain language, without wordsmithing it for ChatGPT's benefit.

> The test test_valid_asin fails because is_asin should check all characters to be numbers or upper case characters. But it checks if they are both at the same time which cannot be true. Can you fix is_asin?
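
The generated code is not reproduced here, so the following is only a hypothetical reconstruction of the kind of bug that prompt describes. The name is_asin_flawed and the exact expression are assumptions; the point is that requiring each character to be a digit and an uppercase letter at once can never succeed.

```python
def is_asin_flawed(value: str) -> bool:
    """Hypothetical buggy version: each character must be a digit AND uppercase."""
    return len(value) == 10 and all(ch.isdigit() and ch.isupper() for ch in value)


# No single character satisfies both checks, because digits have no case.
print("7".isdigit(), "7".isupper())  # True False
print("B".isdigit(), "B".isupper())  # False True

# So even a well-formed mixed string is rejected.
print(is_asin_flawed("B07XJ8C8F5"))  # False
```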

It still fails. Let us state a solution instead to help ChatGPT.

> The test test_valid_asin fails because is_asin should check all characters to be either an upper case letter or a number. Fix it.
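
A version that satisfies the description used in this post could look like the sketch below. It is my own minimal take under the stated definition (10 characters, each an uppercase letter or a digit), not ChatGPT's final answer, and it deliberately restricts the check to ASCII characters, which is an assumption on my part.

```python
import string

# Assumption: only ASCII uppercase letters and digits count as valid characters.
_ALLOWED = frozenset(string.ascii_uppercase + string.digits)


def is_asin(value: str) -> bool:
    """Return True if value is 10 characters long and each one is
    either an uppercase ASCII letter or a digit."""
    return len(value) == 10 and all(ch in _ALLOWED for ch in value)


print(is_asin("B07XJ8C8F5"))  # True
print(is_asin("b07xj8c8f5"))  # False
print(is_asin("B07XJ8C8F"))   # False
```

Checking membership in an explicit set avoids a subtlety of str.isdigit() and str.isupper(), which also accept some non-ASCII characters.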


The great part is that we can generate a good list of tests from a description, plus some decent code to solve the problem. Clearly, that can be useful for developers in the future. ChatGPT also needs precise language where possible, which forces us to reflect on how the problem is broken down and described as part of the development work, and that helps.

However, the gap is that ChatGPT lacks understanding of what it generates. Even simple functions like this one trip it up, so the user needs the experience and ability to understand the code in detail.

As it stands, ChatGPT can help expert users with simple tasks. The opportunity for a significant productivity boost lies in moving both dials, i.e. ideally helping inexpert users with complex tasks. The interesting question is whether this future is one, five, ten or more years away.

Christian Prokopp, Founder
