Since high school, for close to thirty years, I have wanted to start something of my own. Back then, a few friends and I tried to develop a computer game. This was long before Unity, YouTube guides, online courses, and other tools. We had no money or experience, and we dove in head-first. Unsurprisingly, it failed, but we learned a few things. It was more complex than we imagined; Dunning-Kruger effect, anyone? We spent too much time discussing how we would distribute our imaginary earnings, which probably sounds familiar to many startups and VCs. Still, we managed to develop a bespoke 3D engine with decent performance, 3D models and level designs before the project collapsed. Most importantly, we stayed friends and still are today.
A bit later, I did some self-employed work during and after my undergraduate studies. Still, the steps to a sustainable business were unclear, and I focused solely on technical problems. For example, I started a PHP framework to simplify development work in PHP's early days. I built the basic framework but had no idea what to do next. Coming from a working-class background, I knew no entrepreneurs and did not know how a business works. Also, this was just after the dot-com bubble burst around 2000, so the timing was not on my side.
I moved from Germany to the UK, Asia, and Australia. In Australia, I did a Master of Commerce (MCom), firstly to get a visa and stay in the country with my girlfriend, and secondly as a compromise. I had a BSc and wanted to learn more about business, but I did not want an MBA. As a young engineer, I saw MBAs as too detached from the "real world". Not that I had quantifiable evidence; it was merely my unfounded opinion. The MCom was worthwhile despite the strain of working in parallel to pay for it. It taught me about accounting, finance, economics, and scientific research, the last of which led to my PhD. Whether the PhD was a good or bad choice is a topic for another post.
I added a research semester to my MCom and worked with SAP Research and a professor on Natural Language Processing (NLP) research. I enjoyed research and wanted to challenge myself. While looking for a PhD placement, I had a conversation with a professor that I still remember. One of my ideas was to research globally distributed databases with fewer consistency and data model restrictions. She specialised in databases but did not find the idea interesting. In the following years, I watched the NoSQL market explode, with new products and funding flooding the headlines. I learned, again, that ideas are one thing, but timing and support are another.
I received a scholarship and did my PhD in information retrieval with a bit of machine learning; you could call it Data Science today. There was a brief interlude where, instead of trying to finish my PhD thesis, I found myself in a country on the verge of civil war with my newborn daughter, no passport and a backpack full of diapers. But that too is a topic for another post. Ultimately, I decided academia was not for me and moved to the UK to work in startups.
The second startup I worked at is particularly relevant to my story. It was what people envision when they hear "startup". We were housed in an old office building, scavenging abandoned but usable furniture from deserted rooms. The engineering team was a handful of bright people working on laptops and continuously releasing software to production in the public cloud, which was still novel at the time. Our most critical piece of infrastructure was the router, which became unreachable whenever someone used the dodgy microwave in the same room as the engineering team. So, no releases at lunchtime. Still, we had a contingency for emergencies: a 3G USB stick. One for a team of six. It was awesome and some of the best fun I have had in my professional life.
At this startup, my work focused on crawling and scraping Amazon's product data to enrich the product catalogue of our B2B e-commerce platform. We used newfangled technology like Hadoop and Hive, in the cloud no less. But data mining was not a core activity or differentiator. Yet when we tried to outsource it, we learned that we outperformed the crawling and scraping market on quality, speed and cost by a considerable margin. By then, we had collected over 250 million products. So we had to keep it in-house.
We also learned that our dataset was valuable to many retailers competing with Amazon. They did not have, and still do not have, the deep insight Amazon gets from its marketplace. Amazon can dip into hundreds of millions of products, their sales history and billions of customer interactions, and use them to analyse and optimise its product catalogue. It is a data monopoly of tremendous competitive value. For example, how does a product category manager responsible for hundreds, maybe thousands, of products compete with Amazon's insight into pricing, demand, supply and customer behaviour? That is where our dataset, and our ability to use it for analytics on pricing, new product development, categories, availability and more, sparked immense interest.
Eventually, Google bought the startup. For a moment, I seriously considered taking my know-how and rebuilding the data mining side of the startup as a new business. But while I had learned a lot more about startups and data, I had only worked as an individual contributor and had no co-founder or money. I did, however, have a family and bills to pay by now. And being bought by Google meant I could choose from many exciting, well-paying job offers that materialised overnight.
At the time, I chose to join a big data consulting startup as principal consultant. It was a huge gamble and intentionally yanked me out from behind a screen and out of my comfort zone. I was thrown into the deep end immediately, working face to face with a customer's team from day one. True to what you would expect from an early startup, it had little to no collateral or processes. It was a challenging, substantial and invaluable learning experience.
I had no knowledge or appreciation of sales, for example. But having to architect, sell and manage major technical programmes in large organisations was an eye-opener on the interpersonal, political and business aspects. I was lucky to work with an experienced, intelligent and technically competent sales director from whom I learned much. We became a well-tuned team, managing technical and commercial challenges in complex conversations and negotiations with numerous stakeholders, anticipating the concerns and needs of our customers, and always trusting and supporting each other.
As the startup grew, I became responsible for the technical teams and parts of the business, helping shape the organisational structure, collateral, processes and intellectual property. Eventually, we exited to Teradata, where I helped scale the business a bit longer. I took away a few lessons. I decided not to scale a business on billable hours again: scaling with people is hard and gets more complex with every order of magnitude. Also, I did not want to shoulder a large part of the senior responsibility again without being a founder. You get less strategic say, no seat at the table for the most critical decisions, and no comparable payout on exits. (Cue the world's smallest violin. I know that I live a privileged life; there is no need to shed a tear for me.)
After Teradata, I took some interim positions at small and global organisations in various industries. I even pitched an idea at a friend's startup accelerator: a Data Science as a Service company with a twist, using consulting only as an enabler and accelerator, with deployed ML services as the recurring revenue stream. It did not get picked up. Looking at companies like DataRobot and today's advances in ML automation, the idea was timely and appropriate. But I dropped it, not wanting to do it alone and being busy making money elsewhere. It is hard to walk away from a comfortable life and risk everything. Notably, you make that risky choice not only for yourself but also for your family and potential co-founders. And the people I considered as co-founders had the same issue: very comfortable positions in market-leading companies. There is a reason why many people start companies early in life, with less to lose, less experience of the complexities and more to gain.
My last job was at a health and wellness retailer, creating a data platform including Data Engineering, Business Intelligence, and Data Science teams and capabilities. I joined because they had hired a seasoned, excellent CTO. His mission was to take them from low-tech and fully outsourced to a high-tech, engineering- and data-driven business, as part of a significant strategic effort to reinvent the company from a brick-and-mortar retailer into a modern enterprise with unified channels and novel services. Business transformation is a worn-out term, but it applies here. It was a tremendous challenge, and the CTO was assembling a great team. What went well, what did not and what I learned are material for future posts.
After one to two years in the role, it struck me: I was looking into sourcing data once again. Category buyers, marketing, supply chain and other teams could benefit from various market insights and were already acquiring bits and pieces here and there. But I found that the data mining industry had not moved much in nearly a decade. Web crawling and scraping with proxies is still the standard method, and prices and complexity remain high. Curated datasets are expensive, less timely, and either custom-made or limited. Inexpensive, accurate, timely data and analysis were as hard to come by as before.
Importantly, no one seems to have taken a step back and tackled the problem as a whole. It is not only a retail or e-commerce problem, nor is it a single-data-source issue. The public Internet contains hard-to-reach data that can provide or enhance analytics to support decisions across countless industries. But that data is only valuable with timely, flexible, easy and inexpensive access. Today, you have to build your own data mining on inadequate services or spend huge sums on cumbersome contracts and datasets from many providers. Then you must invest even more to wrestle something of value from the disparate datasets. It is slow, costly, complex and cumbersome.
Why not help people and systems make better decisions with publicly available but hard-to-reach data by collecting, cleaning, structuring and analysing the data for them with automated, modern, inexpensive methods? In simple terms: data mine the public Internet and wrap an API, data stores and analytics services around it. Customers can use it for countless decisions like pricing, marketing, insurance risk calculations, predicting residual car lease valuations, new product development, category selections, inventory and supply management, finding copyright infringements, and a million other decisions.
The market for data and better decisions exists. It did ten years ago, and it does even more so today. You can see it in the surge of Data Science, Machine Learning (ML) and Artificial Intelligence (AI) startups that focus on the analytics side. But as most practitioners know, good, big data is often more effective than complex algorithms. So why has no one built a global data platform? The best explanation I have is that it is challenging, slow to break into and, on the surface, neither sexy nor cutting-edge. All the money and brains chase the year's hype, be it ML, AI, crypto, NFTs or something else.
Here I was, twenty to thirty years later, with another idea, a little money and some business-relevant experience, but no co-founders. The people I would love to partner with are still doing too well to consider this level of risk-taking a reasonable option. So hubris took hold of me. Maybe, just maybe, I could pull off the first steps myself? It came with a bonus: if things went pear-shaped, I would be guilt-free instead of feeling responsible for someone who had bet their time and money on my idea. Lastly, I could run it my way, as a self-funded shoestring operation with cost as a critical differentiator and survival factor.
In June 2021, I left my job and founded Bold Data Ltd with the mission of building, on my own, a scale-out platform to mine the valuable parts of the public Internet. It sounds ludicrous now. At the time, I immersed myself in work and long days, which kept me from dwelling on the overwhelming scale or indulging in crippling self-doubt. I started in familiar territory: Amazon data.
So was it a good idea, and how am I doing today? I will dive into some details in future posts, but here are the key points one year after solo-founding a startup that takes on internet-scale data mining problems.
So far, it has cost me money and has not made me any. It is demanding, with many long hours and no one to lean on but yourself. I do not tend to get lonely, and I have a network of colleagues and friends I can talk to occasionally. But they have jobs, so you forgo the watercooler talk and the chance to blow off some steam or get ideas from colleagues over coffee or lunch. Luckily, I have a family, and while they could not be less interested in my day-to-day technical work, they are a great source of stability and comfort. I also found it valuable to reach out to people in my extended network for feedback. It is incredible how helpful people are, whether you worked with them a decade ago or they are strangers, if you just ask nicely.
On the technical side, I will not share details to protect my intellectual property, but I found that the world has moved on in some ways and is stuck in others. Most importantly, the supposedly best-scaling technology is not always suitable when you look at the problem diligently from the perspectives of getting stuff done fast, cost and scale. To be fair, I also have an insanely low budget because I plan for the next one or two orders of magnitude of scale, which rules out some technologies and traditional data mining approaches from the start.
A year on, I have developed a robust global multi-cloud solution that runs and self-heals 24/7 without me handholding it, a crucial requirement. It has mined over 200 million products in detail from Amazon US, UK, and Germany and collected over a billion updates. I could make the numbers bigger by spending more money on mining or by counting other aspects of the operation. The point is that the platform works and scales.
Next, I could add more countries, other e-commerce websites or other industries' data altogether. But I took a big gamble by building so much without a customer, and it is time to pivot to getting in front of users as soon as possible. Now I focus on the website, outreach to my network, and content for SEO and LinkedIn to generate inbound leads, either for the existing Amazon data or for additional datasets, and to convert that interest into revenue. It meant a drastic switch from backend engineering to frontend work and more of a marketing and sales focus, with plenty of challenges and opportunities to learn.
I was hesitant to switch gears from building to marketing and selling because there is always another thing that you should, could or must do first, especially if there is no one else to hand over to. But I knew it was time. Yet when I did switch, nothing happened, which I expected but still found daunting. Everything can fail if no one shows up to try what you have built. Getting traction is a marathon of its own. At first, it feels like shouting into the wind in a vast bazaar of ideas.
It is still very early days, and inbound requests have slowly started to arrive. Initially, they came from suspect individuals who ghosted me at the slightest question about their use case, as well as from students, researchers and some small businesses looking for free datasets to get started. Recently, I picked up interest from one of the best-known global FMCG brands and a renowned consultancy. While there is still some way to go to ARR or MRR, the responses are positive, and people are willing to invest time to explore Bold Data, which is a great start.
Would I do it again? A year on, absolutely. I have hit many hurdles and overcome them all. I loved switching from meetings all day back to developing all day in various engineering roles, and now changing again to marketing and selling. I know it is temporary, though. Success or failure, my role will change again.
Would I do it again if you asked me in a year? We will see.