Bing Chat argues and lies when it gets code wrong

February 11th, 2023
Microsoft could follow Google's $100bn loss in market value. I tried the new Bing Chat (ChatGPT) feature, which was great until it went disastrously wrong. It even argued with me while being wrong and made up source code.

To try Bing Chat, you must join a waiting list, and then you are forced to use Edge. On a positive note, the experience is better than ChatGPT's. Bing Chat is snappier, shows what it searches in the backend, suggests ways to continue the conversation, and occasionally cites references supporting its code and claims. Or so I thought.

I asked it to code a class to query Athena with Python. It looked good at first, but things went wrong when I asked it to stream the results into a Feather file. In particular, it passed an 'append' flag to the 'pyarrow.feather.write_feather' method, a flag that appears neither in the documentation nor in the references it produced. One reference it produced used 'fastparquet.write()', which does have an append flag, so it may have confused the two.

When I gave Bing Chat a chance to correct itself, I was surprised when it wrote, "I am not wrong.", and continued with source code to prove its point, only to unknowingly prove itself wrong. To top it off, when I asked it to show me where it got the code from, it correctly directed me to the 'pyarrow/feather.py' source on GitHub. But there, I found that the source code differs completely from what it showed me.

In summary, Bing Chat generated code for me. And:

  1. It imagined a non-existing flag and feature in an open-source library. 

  2. It refused to back down and argued it was right when given a chance to correct itself.

  3. Its proof (source code) for being right showed it was wrong.

  4. Worse, the proof was made up, and the referenced source is entirely different.

That is devastating. It produced incorrect code, failed to understand its mistake and faked source code with references. It did everything it could to throw me off and get things wrong.


Christian Prokopp, Founder BoldData.org
