Navigating the Vector Database Boom: A Six-Month Retrospective
Deep dive into usage & community growth
In April 2023, the boom in AI significantly affected the vector database market. These databases are essential for managing and querying large-scale vector data (such as embeddings), a common need in machine learning and AI applications. Many startups in the space raised substantial funding that month. Now, six months later, is a good time to review their progress and growth.
We'll focus on the five leading databases: Pinecone, Chroma, Weaviate, Milvus, and Qdrant, and explore how their user bases and communities have grown.
To measure usage growth, we'll look at download numbers for each vendor's client libraries. These numbers are inflated by CI tools and bots, so they don't map directly to a user count. Dividing them by 100 gives a rough estimate; a more accurate approach is to compare them with the user numbers companies publish and extrapolate from there.
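The rule of thumb above can be written as two small helpers: one applies the fixed 1/100 divisor, the other derives a project-specific divisor from a company that publishes its user count. This is a minimal sketch; the function names are hypothetical, and the calibration step assumes that download behavior (CI runs, bots) is similar across the projects being compared.

```python
# Minimal sketch of the download-to-user heuristic described above.
# Function names are hypothetical illustrations, not the author's tooling.

ROUGH_DOWNLOADS_PER_USER = 100  # the post's rule-of-thumb divisor


def rough_user_estimate(weekly_downloads: int,
                        divisor: float = ROUGH_DOWNLOADS_PER_USER) -> float:
    """Divide raw downloads (inflated by CI tools and bots) by a fixed factor."""
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    return weekly_downloads / divisor


def calibrated_divisor(weekly_downloads: int, published_users: int) -> float:
    """Derive a downloads-per-user factor from a company that publishes its user count.

    Assumption: the resulting factor transfers to other projects with similar
    CI and bot traffic, which may not hold in practice.
    """
    if published_users <= 0:
        raise ValueError("published_users must be positive")
    return weekly_downloads / published_users


# Pinecone's 364,376 weekly downloads with the 1/100 rule of thumb:
print(rough_user_estimate(364_376))  # 3643.76
```

A calibrated divisor is passed back into `rough_user_estimate` in place of the default 100, so the same function covers both the rough and the extrapolated estimate.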
Pinecone is the market leader in the vector database sector, having raised a $100M Series B at a $750M valuation in April. It isn't an open-source project, but you can still gauge its usage through downloads of its client libraries.
It gets 364,376 weekly downloads across Python and JS clients.
In second place is Chroma, which raised $18M in seed funding at a $75M valuation.
In total, Chroma gets 209,385 weekly downloads. The team is also launching a Docker distribution (904 daily pulls so far), which should gain traction over the next few months.
Impressive growth over the last six months puts Chroma in second place, at one-tenth of the leader's valuation.
Weaviate is another vector database, having raised a $50M Series B at a $200M valuation in April. Let’s break it down as well.
It gets 178,883 weekly downloads across its Python and JS clients. Another significant share of downloads comes from Docker, with 8,026 daily pulls (2.5M total).
Weaviate's usage is growing faster, and it has almost caught up with Pinecone in the Python ecosystem.
Zilliz, the company behind Milvus, raised a $60M Series B at a ~$250M valuation.
It gets 112,617 weekly downloads across its Python and JS clients. Milvus has the largest Docker share, with 42,253 daily pulls (6.9M total).
Despite stagnating growth over the last six months, Milvus maintains a large user base.
Qdrant also raised its seed round in April: $7.5M, which suggests a valuation in the $30-40M range.
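The valuation range is an estimate rather than a disclosed figure. Treating it as a post-money valuation (an assumption), the stake Qdrant would have sold in the $7.5M round works out as follows:

```latex
\text{stake sold} = \frac{\text{round size}}{\text{post-money valuation}}
\quad\Rightarrow\quad
\frac{7.5}{40} = 18.75\% \;\le\; \text{stake sold} \;\le\; \frac{7.5}{30} = 25\%
```

In other words, the $30-40M range holds only if the round sold roughly a fifth to a quarter of the company.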
It gets 83,835 weekly downloads across Python and JS. Qdrant also distributes as a Docker container with 15,345 daily pulls (4M total).
Qdrant shows lower usage numbers, and its growth has yet to settle into a clear trajectory.
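Client downloads are quoted per week while Docker pulls are quoted per day, so a small sketch helps put the figures above on one basis. It assumes the quoted daily pull rate holds for every day of the week, which a single snapshot cannot confirm; the class and field names are illustrative, and Pinecone has no Docker value because none is quoted above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UsageSnapshot:
    name: str
    weekly_client_downloads: int       # Python + JS clients combined
    docker_daily_pulls: Optional[int]  # None where no Docker figure is quoted

    @property
    def docker_weekly_pulls(self) -> Optional[int]:
        # Assumption: the quoted daily rate is steady across all seven days.
        if self.docker_daily_pulls is None:
            return None
        return self.docker_daily_pulls * 7


# Figures as quoted in this post.
SNAPSHOTS = [
    UsageSnapshot("Pinecone", 364_376, None),
    UsageSnapshot("Chroma", 209_385, 904),
    UsageSnapshot("Weaviate", 178_883, 8_026),
    UsageSnapshot("Milvus", 112_617, 42_253),
    UsageSnapshot("Qdrant", 83_835, 15_345),
]

for s in sorted(SNAPSHOTS, key=lambda s: s.weekly_client_downloads, reverse=True):
    docker = "n/a" if s.docker_weekly_pulls is None else f"~{s.docker_weekly_pulls:,}"
    print(f"{s.name:<9} client downloads/week: {s.weekly_client_downloads:>8,}"
          f"   Docker pulls/week: {docker}")
```

With this conversion, Milvus's Docker pulls (about 295,771 per week) exceed its 112,617 weekly client downloads, which underlines how much of its usage flows through Docker. Both pulls and client downloads are inflated by CI, so they are best compared within a channel rather than added together.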
GitHub & Community Growth
In the open-source ecosystem, community engagement often reflects the vibrancy and support around a project. GitHub stars serve as a rough measure of a project's popularity, whereas the metric of High Signal stars (stars from active and reputable GitHub users) provides a more refined insight.
Milvus leads the pack in total GitHub stars but has proportionally fewer High Signal stars. Given that Zilliz is a Chinese company, it’s no surprise that most stargazers are located in China. Its High Signal / Total Stars ratio is a lower 61% and its Fake Stars metric is somewhat high, but its Issue and PR rates are good.
Following closely is Qdrant, with a healthier 73% High Signal / Total Stars ratio, indicating strong engagement from the global developer community. As with Milvus, most stargazers are from China. Its Fake Stars metric and Issue / PR rates are average.
Qdrant also runs a Discord server, where activity shows the April 2023 euphoria followed by a decline in interest.
Chroma comes third by total star count. Most of its stargazers are from the US, and its High Signal / Total Stars ratio is 69%. Its commit, PR, and issue rates are somewhat low, though.
Chroma also has a Discord server, where April numbers can be compared with October's. Its overall community size and activity, however, are about five times Qdrant's.
Weaviate comes last on stars but makes up for it in quality: a 71% High Signal / Total Stars ratio, few fake stars, mostly US-based stargazers, and healthy execution numbers.
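The High Signal / Total Stars ratios quoted above can be ranked with a few lines of Python. How a star is classified as high signal is not specified here, so the hypothetical `high_signal_ratio` helper only performs the division once those counts are available; the percentages in the dictionary are the ones quoted for each project.

```python
def high_signal_ratio(high_signal_stars: int, total_stars: int) -> float:
    """Share of a repository's stars that come from active, reputable GitHub users."""
    if total_stars <= 0:
        raise ValueError("total_stars must be positive")
    return high_signal_stars / total_stars


# High Signal / Total Stars ratios (percent) as quoted in this post.
QUOTED_RATIOS = {"Milvus": 61, "Qdrant": 73, "Chroma": 69, "Weaviate": 71}

# Sort project names by their quoted ratio, highest first.
ranking = sorted(QUOTED_RATIOS, key=QUOTED_RATIOS.get, reverse=True)
print(ranking)  # ['Qdrant', 'Weaviate', 'Chroma', 'Milvus']
```

Ranked by this ratio, Milvus drops from first place on total stars to last, while Weaviate climbs from last to second.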
Among the five databases, Chroma stands out with compelling traction and sustained growth, making it a strong contender in the market. Pinecone continues to lead but has been losing market share to rising players such as Weaviate, which now holds third place. Milvus and Qdrant round out the field in fourth and fifth place, respectively.
Couldn’t find your favorite database in the analysis? Fear not! I track over twenty vector DBs, but in this post, we’ve focused on five that provide a solid baseline for understanding market traction. Feel free to reach out if you’re an investor or a tech enthusiast and would like to delve deeper or benchmark another company/sector.
Thanks for reading Unfair Advantage! Subscribe for free to receive new posts and support my work.