SourceScore

Topic hub · 7 claims

Vector databases — storing and searching embeddings at scale

Databases optimized for similarity search over dense vector embeddings. The retrieval backbone of every production RAG pipeline.

Why dedicated vector DBs

Standard databases (Postgres, MySQL, MongoDB) handle exact-match + range queries. Vector queries are different: given a 1536-dimensional query vector, return the K nearest neighbors by cosine similarity from a corpus of millions of vectors, in <100ms. The data structures (HNSW, IVF, PQ) and tuning trade-offs are non-trivial. Dedicated vector DBs ship those primitives.

The four main options

FAISS (Facebook AI 2017) is a library, not a database — fastest, no service to run, embed in your app. Pinecone (founded 2019) is the managed-cloud leader — easiest production deployment, costs scale with index size. Weaviate, Qdrant, and Milvus are open-source + managed-cloud — Qdrant is the easiest local + production option for most teams. Chroma is the simplest dev-loop option (single-file SQLite-backed).

The Postgres option

pgvector — a Postgres extension — has matured enough by 2025 that for teams already on Postgres, adding pgvector beats adding a separate vector DB. Trade-off: pgvector's similarity-search performance lags purpose-built vector DBs at >10M vectors, but is competitive below that threshold.

Defined terms (3)

Vector database
A database optimized for storing and similarity-searching high-dimensional vector embeddings. Foundational to RAG retrieval at scale.
HNSW
Hierarchical Navigable Small World — the dominant approximate-nearest-neighbor algorithm. Used by FAISS, Pinecone, Weaviate, Qdrant, pgvector.
pgvector
Postgres extension that adds vector storage + similarity search. Lets teams already on Postgres avoid a separate vector DB.

All claims in this topic (7)

Related

Framework integrations