Common pitfalls

Table of Contents

Score fluctuations
Missing documents and orphaning

Score fluctuations

When setting up semantic vector search or Neural Hybrid Search, you’ll need to decide what type of Solr replica to use. Solr supports three types of replicas: Near Real Time (NRT), Transaction Log (TLOG), and PULL. Lucidworks recommends using TLOG/PULL replicas.
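As a rough illustration of the replica choice above, the sketch below builds a Solr Collections API `CREATE` request that asks for TLOG and PULL replicas and no NRT replicas. The base URL, the collection name `products`, and the replica counts are assumptions made for the example. In a managed deployment the collection may be created through the platform's own tooling rather than by calling Solr directly.

```python
from urllib.parse import urlencode


def create_collection_url(solr_base, name, num_shards=1,
                          tlog_replicas=2, pull_replicas=2):
    """Build a Collections API CREATE URL that uses TLOG/PULL replicas.

    The replica counts are illustrative assumptions, not sizing advice.
    """
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "nrtReplicas": 0,            # no NRT replicas in this sketch
        "tlogReplicas": tlog_replicas,
        "pullReplicas": pull_replicas,
    }
    return f"{solr_base.rstrip('/')}/admin/collections?{urlencode(params)}"


# Hypothetical host and collection name; nothing is sent over the network.
url = create_collection_url("http://localhost:8983/solr", "products")
print(url)
```

The function only assembles the URL, so it can be inspected before anything is sent to a cluster.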

If you use NRT replicas, scores for the same query can shift noticeably depending on which replica serves the request. This is most noticeable with large datasets. These shifts occur because each NRT replica maintains its own index and its own HNSW graph. Because HNSW is an Approximate Nearest Neighbor (ANN) algorithm, differently constructed graphs do not return exactly the same results. You can mitigate this variance for NRT replicas by increasing the HNSW efSearch parameter, which in Solr is controlled through the topK parameter, for example by using topK*2.
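One way to read the topK*2 advice is to request twice as many vector candidates as you intend to return and then trim the result with `rows`. The sketch below builds parameters for Solr's `knn` query parser on that basis. The field name `body_vector`, the example vector, and the helper name are hypothetical, and the factor of 2 is taken from the text above rather than from any tuning.

```python
import json


def build_knn_params(field, vector, rows, oversample=2):
    """Build Solr query parameters for a knn query with an enlarged topK.

    topK is set to rows * oversample so that more of the HNSW graph is
    explored, while `rows` still limits how many documents come back.
    """
    top_k = rows * oversample
    vector_literal = json.dumps(vector)  # e.g. "[0.1, 0.2, 0.3]"
    return {
        "q": f"{{!knn f={field} topK={top_k}}}{vector_literal}",
        "rows": rows,
    }


# Hypothetical field name and vector, for illustration only.
params = build_knn_params("body_vector", [0.1, 0.2, 0.3], rows=10)
print(params["q"])
```

A larger topK generally costs more query time, so it is worth measuring latency alongside score stability when adjusting it.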

For more information, see Neural Hybrid Search replica choice.

Missing documents and orphaning

When you use vector search, nodes in the HNSW graph sometimes become disconnected, and the corresponding documents become unreachable. This issue is more likely to occur when using vector search on its own (and not as part of Neural Hybrid Search). Disconnected nodes, also known as orphaned nodes, are a known issue with Lucene’s implementation of HNSW approximate nearest neighbor search (which Solr’s dense vector search depends on). If you encounter orphaned nodes, increase the HNSW Solr schema parameters hnswBeamWidth and hnswMaxConnections, save the schema, clear your index, and then re-index your collection.
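The schema change described above can be sketched as a Solr Schema API payload that replaces a dense vector field type with larger `hnswMaxConnections` and `hnswBeamWidth` values. The field type name `knn_vector`, the dimension, and the values 32 and 200 are assumptions chosen for illustration, not recommendations. Your deployment may also expect schema edits to be made through its own schema editor instead of the Schema API.

```python
import json


def replace_vector_field_type(name, dimension, similarity="cosine",
                              max_connections=32, beam_width=200):
    """Build a Schema API 'replace-field-type' command for a vector field.

    The HNSW values are illustrative; pick them by testing recall and
    indexing cost on your own data.
    """
    return {
        "replace-field-type": {
            "name": name,
            "class": "solr.DenseVectorField",
            "vectorDimension": dimension,
            "similarityFunction": similarity,
            "knnAlgorithm": "hnsw",
            "hnswMaxConnections": max_connections,
            "hnswBeamWidth": beam_width,
        }
    }


# Hypothetical field type name and dimension; the payload is only printed.
payload = replace_vector_field_type("knn_vector", 384)
print(json.dumps(payload, indent=2))
```

Changing these values only affects graphs built afterwards, which is why the index has to be cleared and the collection re-indexed once the schema is saved.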