How do Computers Search a Million Vectors Without Checking Every Single One?

Marouane Boufarouj
in Algorithms · March 22, 2026 · 25 min

This article was originally written in Arabic and translated into English with the help of AI. If you happen to speak Arabic, I'd love for you to check out the original version.

Imagine you gave ChatGPT a 1,000-page PDF and asked about one tiny detail. In a fraction of a second, it pulls the answer from a sea of words. How did it do it? Did it read the whole book in that instant? Impossible.

The secret is that the book was partitioned and transformed into Vectors (arrays of numbers) using an embedding model. When you ask a question, your query is also converted into a Vector, and the system looks for the 3 or 4 text chunks that are most semantically similar.
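To make "semantically similar" concrete, here is a toy sketch using cosine similarity on made-up 4-dimensional vectors. The chunk names and numbers are invented for illustration; real embedding models output hundreds of dimensions:

```python
def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: 1.0 = same direction
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    return dot / (norm_a * norm_b)

# Hypothetical embeddings (real ones have hundreds of dimensions)
chunk_about_cats = [0.9, 0.1, 0.3, 0.0]
chunk_about_tax  = [0.0, 0.8, 0.1, 0.7]
query            = [0.8, 0.2, 0.4, 0.1]  # e.g. "tell me about cats"

print(cosine_similarity(query, chunk_about_cats))  # high -> relevant
print(cosine_similarity(query, chunk_about_tax))   # low  -> irrelevant
```

The chunk whose vector points in roughly the same direction as the query vector is the one the system hands back to the language model.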

The real challenge: How does this system find these chunks among millions of vectors in the blink of an eye without checking every single one?

The clever solution is to build a Graph—a network of connections between these vectors—and navigate it using a greedy approach until you reach the closest match. This solution is called Navigable Small World (NSW).


Part 1: KNN - The Brute Force Approach

KNN (K-Nearest Neighbors) in the context of vector search is the "manual labor" method: you have a query vector q, you calculate the distance between it and every vector in your dataset, and you pick the top k.

def euclidean_distance(a, b):
    # Straight-line (L2) distance between two vectors
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def brute_force_search(query, all_vectors, k):
    distances = []
    for v in all_vectors:
        d = euclidean_distance(query, v)
        distances.append((v, d))
    # Sort all n vectors by distance and keep the k closest
    distances.sort(key=lambda x: x[1])
    return distances[:k]

The Cost: O(n×d) per query (where n is the number of vectors and d is the number of dimensions).

Each dimension is a single number in the vector, representing a specific aspect of the text's meaning.

If you have a million vectors with 768 dimensions? You'd need 768 million calculations just to answer one question. The result would be 100% accurate, but the system would be too slow for any practical application.

The question NSW answers: Can we organize these vectors into a data structure that allows us to find the top k vectors in O(log n) without scanning the entire dataset?


Part 2: Why a Graph?

Before graphs, people tried various solutions like search trees (KD-trees) or hashing (LSH). However, these methods often fail when vector dimensionality increases—a problem known as the Curse of Dimensionality.

Graphs take a completely different route.

The Core Idea

If you connect every vector to its closest neighbors, "proximity" becomes a Path you can walk. Instead of searching everything, you enter the graph at any random point and ask the neighbors: "Which of you is most similar to this query?"

You move to the closest point and repeat the process (Greedy Search) until you reach the best possible nearest neighbors.
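The walk described above can be sketched in a few lines. This is a minimal greedy descent only, not the full NSW search (no heaps yet), on a hypothetical graph stored as an adjacency dict of node IDs plus a separate ID-to-vector map; `greedy_walk` and its parameters are my own names for illustration:

```python
def euclidean(a, b):
    # Straight-line (L2) distance between two vectors
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def greedy_walk(query, graph, vectors, start):
    """Follow the closest neighbor at each step until no neighbor improves."""
    current = start
    while True:
        # Ask the current node's neighbors: which of you is closest to the query?
        best = min(graph[current],
                   key=lambda n: euclidean(query, vectors[n]),
                   default=current)
        if euclidean(query, vectors[best]) >= euclidean(query, vectors[current]):
            return current  # local minimum: no neighbor is closer than where we stand
        current = best
```

On a chain-shaped graph `0 - 1 - 2 - 3`, a walk starting at node 0 with a query near node 3 hops along the chain and stops at 3. The catch, fixed by the two-heap version later, is that a pure greedy walk can get stuck in a local minimum.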

What is Small World Topology?

This concept means our graph must have two types of connections:

  • Short-range links: These provide precision as you get closer to the target.
  • Long-range links: These allow you to skip across massive distances in the blink of an eye.

Analogy: Think of a road network. The Highway is the long-range link that gets you from one side of the country to the other in record time. City streets are the short-range links that take you to a specific address. Without the highway, you'd have to drive through every single small town along the way.


Part 3: NSW - How We Build the Graph

The original paper (Malkov et al., 2014) suggests a building method that is both simple and brilliant: Sequential Insertion. Vectors enter the graph one by one:

  1. Search the existing Graph: For every new vector, we perform a quick search to find the best M neighbors (the closest existing points).
  2. Create Edges: We create bidirectional links between the new vector and these neighbors.

This process transforms the dataset from scattered points into a Connected Network.

The Code:

def insert(self, new_node: Node):
    # Phase 1: the very first node becomes the entry point
    if len(self.nodes) == 0:
        self.entry_point = new_node
        self.nodes.append(new_node)
        return

    # Phase 2 (bootstrap): with fewer than M nodes, connect to everyone
    if len(self.nodes) < self.m:
        for node in self.nodes:
            new_node.neighbors.append(node)
            node.neighbors.append(new_node)
        self.nodes.append(new_node)
        return

    # Phase 3: standard neighbor search, then bidirectional linking
    neighbors = self.search(new_node.vector, [self.entry_point], self.m)
    for neighbor in neighbors:
        new_node.neighbors.append(neighbor)
        neighbor.neighbors.append(new_node)
    self.nodes.append(new_node)

When we call insert, the code handles three phases:

  1. The First Node: If the graph is empty, this point becomes the entry_point—the foundation for all future searches.
  2. The Bootstrap Phase: While we have fewer than M points, we connect the new point to all existing points. This creates those vital long-range links.
  3. The Greedy Search: Once the graph is established, we use self.search to find the M closest neighbors for the new arrival and link them bidirectionally.

Where do long-range links come from?

There isn't a single line of code that says "make a long link." These links form naturally because the first vectors added to the graph are forced to connect to each other even if they are far apart (because there are no other options). They become the "highways" of the system later on.


Part 4: NSW Search - How We Navigate

Imagine the graph is ready. A user submits a query. Our goal is to find the closest points without looking at all million points. We start at the entry_point and move greedily: always moving to the neighbor that looks closest to the target.

To manage this movement and ensure accuracy, we use two Heaps:

  • Candidates (Min-Heap): The "waiting list." It contains all discovered points whose neighbors haven't been checked yet. We prioritize the point closest to the query.
  • Results (Max-Heap): The "leaderboard." It stores the best k results found so far.

The Smart Stopping Condition

The search doesn't run forever. One specific piece of logic gives NSW its speed:

closest_node, closest_dist = candidates.pop_min()
furthest_node, furthest_dist = results.peek_max()

if closest_dist > furthest_dist:
    break

The logic is simple: If the closest candidate I have (the best possible one I can explore next) is further than the worst result currently on my leaderboard... then I've hit a dead end. Any new neighbor I find will only be further away. We stop and return the results.
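Putting the two heaps and the stopping condition together, here is a minimal sketch of the full search loop using Python's `heapq`. Since `heapq` only provides a min-heap, the results "max-heap" stores negated distances. It assumes nodes carry `.vector` and `.neighbors` attributes as in the insert code; `nsw_search` and the tuple layout are my own choices, not the original implementation:

```python
import heapq

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def nsw_search(query, entry_point, k):
    """Best-first search with a candidates min-heap and a results max-heap."""
    d0 = euclidean(query, entry_point.vector)
    # Tuples are (distance, id, node); id() breaks ties so nodes never compare
    candidates = [(d0, id(entry_point), entry_point)]   # min-heap on distance
    results = [(-d0, id(entry_point), entry_point)]     # max-heap via negation
    visited = {id(entry_point)}

    while candidates:
        closest_dist, _, closest_node = heapq.heappop(candidates)
        furthest_dist = -results[0][0]
        # Stopping condition: the best remaining candidate is already worse
        # than the worst entry on the leaderboard -- nothing can improve it
        if closest_dist > furthest_dist and len(results) >= k:
            break
        for neighbor in closest_node.neighbors:
            if id(neighbor) in visited:
                continue
            visited.add(id(neighbor))
            d = euclidean(query, neighbor.vector)
            if len(results) < k or d < -results[0][0]:
                heapq.heappush(candidates, (d, id(neighbor), neighbor))
                heapq.heappush(results, (-d, id(neighbor), neighbor))
                if len(results) > k:
                    heapq.heappop(results)  # drop the current worst result

    # Sort the leaderboard from closest to furthest before returning
    return [node for _, _, node in sorted(results, reverse=True)]
```

Note the `visited` set: without it, bidirectional edges would make the walk bounce back and forth between the same two nodes forever.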


Part 5: Results — Brute Force vs. NSW

I tested this on a dataset of chess articles. Here are the findings:

Metric                     Brute Force       NSW
Accuracy (Recall)          100%              ~99-100% (High)
Complexity                 Linear O(n)       Logarithmic O(log n)
Search ops (1M vectors)    1,000,000         ~200

  • Recall: On small datasets, NSW provides the exact same results as brute force.
  • Efficiency: With only 25 articles, Brute Force wins because it has zero "build time." But at a million documents, Brute Force becomes prohibitively slow, while NSW remains lightning-fast.

What's Next?

NSW is powerful, but it has one flaw: as data grows, the "entry point" might be very far from the answer. The algorithm wastes time "wandering" through local neighborhoods just to get to the right region.

This is where HNSW (Hierarchical Navigable Small World) comes in, using an idea inspired by Google Maps.

Imagine you are zoomed in to street level in Mexico City, but you want to find a street in Taghazout, Morocco. In Standard NSW, you'd have to slide across the map at the same zoom level, crossing the Atlantic millimeter by millimeter. In HNSW, navigation is multi-layered:

  • Level 1 (The Big Jump): The system zooms out to see continents. One "hop" takes you from America to Africa.
  • Level 2 (Regional Search): The system zooms in to see countries. You move toward the Agadir region.
  • Level 3 (Final Destination): The system zooms in to 100%. You are back at street level, landing exactly in Taghazout.

The HNSW Paper (2018)

Efficient and Robust Approximate Nearest Neighbor Search using Hierarchical Navigable Small World Graphs — This is the legendary paper that set the standard for modern Vector Databases like Milvus, Weaviate, and Pinecone.

The full code for this experiment is available on GitHub — Project name: jiwar.