High-Performance Query Processing with Auto-Scaling

To further illustrate key concepts about the Fusion 5 architecture, let’s walk through how query execution works and the various microservices involved. There are two primary takeaways from this section. First, a number of microservices are involved in query execution, which illustrates the value and importance of having a robust orchestration layer like Kubernetes. Second, Fusion comes well configured out of the box, so you don’t have to worry about configuring all the details depicted in the diagram below:

Figure 2: Fusion query execution

At point A (far right), background Spark jobs aggregate signals to power the signal boosting stage and analyze signals for query rewriting (head/tail analysis, synonym detection, and so on). At point B, Fusion uses a Solr auto-scaling policy in conjunction with Kubernetes node pools to govern replica placement for various Fusion collections. For instance, to support high-performance query traffic, we typically place the primary collection together with sidecar collections for query rewriting, signal boosting, and rules matching. Solr pods supporting high-volume, low-latency reads are backed by a Horizontal Pod Autoscaler (HPA) linked to CPU or custom metrics in Prometheus. Fusion services store configuration, such as query pipeline definitions, in ZooKeeper (point C, lower left).
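
To make the replica-placement idea concrete, here is a minimal sketch that installs a placement policy through Solr’s autoscaling API (available in the Solr 8.x line that Fusion 5 ships). The policy name, the `solr_zone` system property, and the in-cluster Solr URL are illustrative assumptions, not Fusion’s actual out-of-the-box configuration:

```python
import requests

# Hypothetical in-cluster Solr endpoint; the service name and port are assumptions.
SOLR = "http://solr-svc:8983"

# A replica-placement policy in Solr's autoscaling rule syntax (Solr 8.x).
# The solr_zone sysprop is an assumption: it stands in for a JVM system
# property set on Solr pods scheduled onto the query-serving K8s node pool.
policy = {
    "set-policy": {
        "search": [
            # Keep every replica of collections using this policy on nodes
            # started with -Dsolr_zone=search (the query-serving node pool).
            {"replica": "#ALL", "sysprop.solr_zone": "search"},
            # Spread replicas of each shard across distinct nodes.
            {"replica": "<2", "shard": "#EACH", "node": "#ANY"},
        ]
    }
}

resp = requests.post(f"{SOLR}/api/cluster/autoscaling", json=policy)
resp.raise_for_status()
print(resp.json())
```

Collections created with this policy then place all of their replicas on the query-serving node pool, keeping the primary and sidecar collections co-located.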

At point 1 (far left), a query request comes into the cluster via a Kubernetes Ingress. The Ingress is configured to route requests to the Fusion API Gateway service. The gateway performs authentication and authorization to ensure the user has the correct permissions to execute the query. The Fusion API Gateway then load-balances requests across multiple query pipeline services using native Kubernetes service discovery (point 2).
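
The following sketch shows what this request flow looks like from a client’s point of view. The hostname, app, pipeline, and URL path are illustrative assumptions; check your deployment’s API reference for the exact route:

```python
import requests

# Hypothetical external hostname exposed by the Kubernetes Ingress.
FUSION = "https://fusion.example.com"

# The gateway authenticates the caller and authorizes the query before
# proxying to a query pipeline service. App and pipeline names here
# ("demo", "main") are placeholders.
resp = requests.get(
    f"{FUSION}/api/apps/demo/query/main",
    params={"q": "running shoes"},
    auth=("query-user", "password123"),
)
resp.raise_for_status()
print(resp.json()["response"]["numFound"])  # standard Solr response shape
```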

The gateway issues a JWT that is passed to downstream services (point 3 in the diagram, which traces the path of a single request). The internal JWT holds identifying information about a user, including their roles and permissions, which allows Fusion services to perform fine-grained authorization. The JWT is returned in a Set-Cookie header to improve the performance of subsequent requests. Alternatively, API clients can use the /oauth2/token endpoint in the gateway to obtain the JWT using OAuth2 semantics.
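
For cookie-less API clients, the sketch below obtains a JWT from the gateway’s /oauth2/token endpoint mentioned above and replays it as a Bearer token. The `access_token` response field follows standard OAuth2 conventions and is an assumption here, as are the hostname and credentials:

```python
import requests

FUSION = "https://fusion.example.com"  # hypothetical Ingress hostname

# Exchange credentials for a JWT using OAuth2 semantics. The response body
# shape (an "access_token" field) is the OAuth2 convention, assumed here.
tok = requests.post(
    f"{FUSION}/oauth2/token",
    auth=("query-user", "password123"),
).json()["access_token"]

# Subsequent requests skip the credential check by presenting the JWT
# directly; the gateway validates the token's signature and claims instead.
headers = {"Authorization": f"Bearer {tok}"}
resp = requests.get(
    f"{FUSION}/api/apps/demo/query/main",
    params={"q": "laptop"},
    headers=headers,
)
print(resp.status_code)
```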

At point 4, the query service executes the pipeline stages to enrich the query before sending it to the primary collection. Typically, this involves a number of lookups to sidecar collections, such as the <app>_query_rewrite collection, to perform spell correction, synonym expansion, and rules matching. Your query pipeline may also call out to the Fusion ML Model service to generate predictions, such as determining query intent. The ML Model service may also use an HPA tied to CPU to scale out as needed to support the desired QPS (point 5 in the diagram).
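
Under the hood, these sidecar lookups are ordinary Solr queries. The sketch below shows the general shape of a rewrite lookup against the <app>_query_rewrite collection; the field names (`surface_form`, `output`) are illustrative assumptions about the rewrite documents, not a documented schema:

```python
import requests

SOLR = "http://solr-svc:8983/solr"  # hypothetical in-cluster Solr URL

def rewrite_terms(app: str, user_query: str) -> list[str]:
    """Look up rewrite candidates (spell corrections, synonyms, rules)
    for a query from the app's sidecar collection. Field names are
    assumptions for illustration."""
    resp = requests.get(
        f"{SOLR}/{app}_query_rewrite/select",
        params={"q": f'surface_form:"{user_query}"', "fl": "output", "wt": "json"},
    )
    docs = resp.json()["response"]["docs"]
    return [d["output"] for d in docs if "output" in d]

# e.g. rewrite_terms("demo", "runnig shoes") might return ["running shoes"]
```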

After executing the query against the primary collection, Fusion generates a response signal to track query request parameters and Solr response metrics, such as numFound and qTime (point 6). Raw signals are stored in the signals collection, which typically runs in the analytics partition in order to support high-volume writes.
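
A raw response signal is just a document capturing request parameters and Solr response metrics. The sketch below indexes one such document into a signals collection directly to show its general shape; apart from numFound and qTime, which the text above calls out, the field and collection names are assumptions, not Fusion’s actual signal schema:

```python
import time
import requests

SOLR = "http://solr-svc:8983/solr"  # hypothetical in-cluster Solr URL

# Illustrative shape of a raw "response" signal; real Fusion signal schemas
# may differ, but numFound and qTime come straight from the Solr response.
signal = {
    "type": "response",
    "query": "running shoes",
    "app_id": "demo",
    "num_found": 1234,   # Solr numFound
    "q_time": 42,        # Solr qTime, in milliseconds
    "timestamp": int(time.time() * 1000),
}

# The signals collection runs in the analytics partition to absorb
# high-volume writes without impacting query-serving Solr pods.
requests.post(
    f"{SOLR}/demo_signals/update/json/docs?commitWithin=10000",
    json=signal,
).raise_for_status()
```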

Behind the scenes, every Fusion microservice exposes detailed metrics. Prometheus discovers and scrapes these metrics using pod annotations. The query microservice exposes per-stage metrics to help you understand query performance (point 7). Moreover, every Fusion service ships logs to Logstash, which can be configured to index log messages into the system_logs collection in Solr or to send them to an external service like Elastic (point 8).
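
Because these metrics use the plain Prometheus exposition format, you can inspect them directly without going through Prometheus. The sketch below pulls a query service’s metrics endpoint and filters for per-stage metrics; the service hostname, port, URL path, and metric-name prefix are all assumptions for illustration, so adjust them to match what your pods actually expose:

```python
import requests

# Hypothetical in-cluster address of a query pipeline service's metrics
# endpoint; Prometheus finds the same endpoint via the pod's scrape
# annotations (prometheus.io/scrape, prometheus.io/port, and so on).
METRICS_URL = "http://query-pipeline-svc:8787/metrics"

for line in requests.get(METRICS_URL).text.splitlines():
    # The metric-name prefix is an assumption; filter on whatever naming
    # convention your deployment's per-stage timers use.
    if line.startswith("query_stage_"):
        print(line)
```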