Free Amazon bestsellers datasets (May 8th 2022)

May 10th, 2022
Get huge, valuable datasets with 4.9 million Amazon bestsellers for free. No payment, registration or credit card is needed.
All-you-can-eat free data

Download the bestsellers of Amazon.com (2,090,907 products), Amazon.de (1,431,524 products), and Amazon.co.uk (1,377,192 products). Each product record lists all categories in which the product ranks in the top 100, along with the product name, review count, review average, offer price, number of offers, and extra tag data such as author, type or brand.

The datasets* contain all Amazon Germany, UK and US bestsellers, i.e. the top 100 products of every category. Bold Data retrieved them on the 8th and 9th of May 2022. For a detailed list of data attributes (column names) and their descriptions, go to the end of the post.

Note that Bold Data can provide this data updated at any frequency on request, and as time-series data including trends in categories, prices, reviews, rankings, etc. Other data is available on request too. Get in touch by email to start a conversation.

Who uses Amazon bestseller data?

Startups and established businesses use Amazon bestseller data to analyse their pricing, supply, new product and category strategies. Students use it to study data analytics, business intelligence, data science or machine learning, or to take part in online courses and competitions, for example on Coursera or Kaggle. University researchers use it to analyse market changes, retail and e-commerce.

Amazon.com bestsellers dataset

One thousand products

A random sample of one thousand products from Amazon.com.

One million products

A random sample of one million products from Amazon.com.

Full dataset

The full bestseller dataset consisting of 2,090,907 products was retrieved from Amazon.com and stored as a gzipped CSV file. Please send an email request to receive free access.
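Since the full datasets are gzipped CSV files, they can be read row by row without unpacking them to disk first. Below is a minimal sketch using only the Python standard library. It assumes the file is UTF-8 encoded and that its first row holds the column names described under "Data attributes"; neither is stated above, and the function names are made up for illustration.

```python
import csv
import gzip
from typing import Dict, Iterator


def iter_products(path: str) -> Iterator[Dict[str, str]]:
    """Yield one dict per product row from a gzipped CSV file.

    Assumptions (not confirmed by the dataset description): the file is
    UTF-8 encoded and the first row is a header with the column names.
    """
    # "rt" decompresses and decodes on the fly; newline="" leaves the
    # handling of line endings inside quoted fields to the csv module.
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        yield from csv.DictReader(handle)


def count_products(path: str) -> int:
    """Count the product rows without loading the whole file into memory."""
    return sum(1 for _ in iter_products(path))
```

For example, count_products("amazon_com_bestsellers.csv.gz") should return 2,090,907 for the full Amazon.com dataset if the file contains one row per product. The file name here is hypothetical; use the name of the file you received.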

Amazon.co.uk bestsellers dataset

One thousand products

A random sample of one thousand products from Amazon.co.uk.

One million products

A random sample of one million products from Amazon.co.uk.

Full dataset

The full bestseller dataset consisting of 1,377,192 products was retrieved from Amazon.co.uk and stored as a gzipped CSV file. Please send an email request to receive free access.

Amazon.de bestsellers dataset

One thousand products

A random sample of one thousand products from Amazon.de.

One million products

A random sample of one million products from Amazon.de.

Full dataset

The full bestseller dataset consisting of 1,431,524 products was retrieved from Amazon.de and stored as a gzipped CSV file. Please send an email request to receive free access.

Need more data or help?

If you need help with the data, contact Christian, the founder of Bold Data. To stay up to date with information on these and future datasets, subscribe to the email list (see the bottom or top right for links).

If you have specific dataset needs and want to enquire about Bold Data's services, contact Christian. Your request can be specific to Amazon, e.g. frequent updates, detailed product data or different countries. It can also concern completely different websites, datasets or analyses.


Data attributes

Below are the dataset column names and their meaning.

sku: The unique product identifier (ASINs in these datasets).

name: The product name.

review_avg: The average review rating.

review_count: The total number of reviews.

ranks: All category identifiers and the associated bestseller rank for each category.

min_rank: Best (smallest) rank across bestseller categories.

max_rank: Worst (largest) rank across bestseller categories.

ranks_count: Number of bestseller categories the product was found in.

offer: Best offer price in local currency, e.g. GBP, USD, EUR.

offers: Number of offers (may include used or warehouse offers).

tag1: Additional data like product type, author, brand, etc.

tag2: Additional data like product type, author, brand, etc.

request_date: Date the data was retrieved.
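To illustrate how these columns might be used, here is a small Python sketch that turns one CSV row (a dict of strings) into typed values and flags products that rank highly and are well reviewed. Several things are assumptions rather than documented facts: that missing values appear as empty cells, that numeric columns use a plain format such as "4.7" or "250", and the thresholds in is_strong_bestseller, which are arbitrary examples. The ranks column is kept as raw text because its exact format is not described above.

```python
from dataclasses import dataclass
from typing import Dict, Optional


def _to_float(value: Optional[str]) -> Optional[float]:
    """Convert a CSV cell to float; blank or unparsable cells become None."""
    try:
        return float(value) if value not in (None, "") else None
    except (TypeError, ValueError):
        return None


def _to_int(value: Optional[str]) -> Optional[int]:
    """Convert a CSV cell to int; blank or unparsable cells become None."""
    try:
        return int(value) if value not in (None, "") else None
    except (TypeError, ValueError):
        return None


@dataclass
class Product:
    sku: str
    name: str
    review_avg: Optional[float]
    review_count: Optional[int]
    ranks: str  # raw text; the serialisation of this column is not documented
    min_rank: Optional[int]
    max_rank: Optional[int]
    ranks_count: Optional[int]
    offer: Optional[float]  # local currency of the marketplace
    offers: Optional[int]
    tag1: str
    tag2: str
    request_date: str


def parse_product(row: Dict[str, str]) -> Product:
    """Build a Product from a row keyed by the column names listed above."""
    return Product(
        sku=row.get("sku") or "",
        name=row.get("name") or "",
        review_avg=_to_float(row.get("review_avg")),
        review_count=_to_int(row.get("review_count")),
        ranks=row.get("ranks") or "",
        min_rank=_to_int(row.get("min_rank")),
        max_rank=_to_int(row.get("max_rank")),
        ranks_count=_to_int(row.get("ranks_count")),
        offer=_to_float(row.get("offer")),
        offers=_to_int(row.get("offers")),
        tag1=row.get("tag1") or "",
        tag2=row.get("tag2") or "",
        request_date=row.get("request_date") or "",
    )


def is_strong_bestseller(product: Product, max_rank: int = 10,
                         min_reviews: int = 100, min_avg: float = 4.5) -> bool:
    """Example filter with arbitrary thresholds: top rank plus good reviews."""
    if None in (product.min_rank, product.review_count, product.review_avg):
        return False  # not enough data to decide
    return (product.min_rank <= max_rank
            and product.review_count >= min_reviews
            and product.review_avg >= min_avg)
```

Keeping missing values as None rather than 0 avoids treating a product without reviews or without an offer as if it had a rating or price of zero.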


*Note that no guarantees are made about the completeness or accuracy of the data and no liabilities arise from the download or use of the data. The data was collected from public sources as is and may contain restricted data like trademarks, language deemed inappropriate in certain circumstances, erroneous data or other unforeseen limitations.

The data may be used for private or commercial analysis and decision-making use only. Redistribution or resale of the data is prohibited unless explicitly agreed in writing by Bold Data Ltd. Where the data is used, e.g. for analysis, diagrams, charts or otherwise, attribution to the source, e.g. "Bold Data, https://www.bolddata.org" or equivalent, must be made.

    Let's talk

    Do you have a business problem in need of data and analysis? Send us an email.

    Subscribe to updates

    Join Bold Data's email list to receive free data and updates.
