Increase SQL Resource Allocations

To avoid conflicting with the CPU and memory settings used for Fusion driver applications (default and script), the Fusion SQL service uses its own set of configuration properties for allocating CPU and memory to SQL query execution.

You can use the Configurations API to override the default values shown here.
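Before overriding a value, you may want to inspect the current setting. A minimal sketch, assuming the Configurations API also accepts GET requests on the same per-property endpoint used for the PUT examples later in this topic:

```shell
# Read the current value of a Fusion SQL configuration property
# (assumes GET is supported on this endpoint; adjust host/port as needed).
curl "http://localhost:8765/api/v1/configurations/fusion.sql.cores"
```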

Configuration property, default value, and description:

fusion.sql.cores (default: 1)
Sets the maximum number of cores to use across the entire cluster to execute SQL queries. Allocate as many cores as possible while still leaving CPU available for other Fusion jobs.

fusion.sql.executor.cores (default: 1)
Number of cores to use per executor.

fusion.sql.memory (default: 1g)
Memory per executor to use for executing SQL queries.

fusion.sql.default.shuffle.partitions (default: 20)
Default number of partitions when performing a distributed group-by-type operation, such as a JOIN.

fusion.sql.bucket_size_limit.threshold (default: 30,000,000)
Threshold that determines when to use Solr streaming rollup instead of facets to compute aggregations. Rollup can handle high-cardinality dimensions but is much slower than using facets to compute aggregate measures.

fusion.sql.max.no.limit.threshold (default: 10,000)
Row limit applied to SQL queries that select all fields and all rows, that is, select * from table-name.

fusion.sql.max.cache.rows (default: 5,000,000)
Tables larger than this threshold are not cached. If a user sends the cache-table command for a large collection whose row count exceeds this value, the cache operation fails.

fusion.sql.max_scan_rows (default: 2,000,000)
Safeguard mechanism that prevents queries from requesting too many rows from large tables. Queries that read more than this many rows from Solr fail; increase this threshold for larger Solr clusters that can handle streaming more rows concurrently.
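Assuming fusion.sql.cores and fusion.sql.executor.cores map to Spark's cluster-wide and per-executor core settings (an assumption based on how standalone Spark sizes executors, not something stated above), the service launches roughly fusion.sql.cores divided by fusion.sql.executor.cores executors. A minimal sketch with illustrative values:

```shell
# Back-of-the-envelope executor math for the Fusion SQL service.
# Values are hypothetical, not the defaults from the table above.
SQL_CORES=8          # fusion.sql.cores: cores across the whole cluster
EXECUTOR_CORES=2     # fusion.sql.executor.cores: cores per executor
EXECUTOR_MEM_GB=2    # fusion.sql.memory: memory per executor (2g)

EXECUTORS=$(( SQL_CORES / EXECUTOR_CORES ))
TOTAL_MEM_GB=$(( EXECUTORS * EXECUTOR_MEM_GB ))
echo "executors=$EXECUTORS total_memory=${TOTAL_MEM_GB}g"
# prints: executors=4 total_memory=8g
```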

Tip
The Fusion SQL service is designed for executing analytics-style queries over large data sets. You need to provide ample CPU and memory so that queries execute efficiently and can leverage Spark’s in-memory caching for joins and aggregations.

Here’s an example of increasing the resources for the Fusion SQL service:

curl -H 'Content-type:application/json' -X PUT -d '8' "http://localhost:8765/api/v1/configurations/fusion.sql.cores"
curl -H 'Content-type:application/json' -X PUT -d '8' "http://localhost:8765/api/v1/configurations/fusion.sql.executor.cores"
curl -H 'Content-type:application/json' -X PUT -d '2g' "http://localhost:8765/api/v1/configurations/fusion.sql.memory"
curl -H 'Content-type:application/json' -X PUT -d '8' "http://localhost:8765/api/v1/configurations/fusion.sql.default.shuffle.partitions"

If you change any of these settings, you must restart the Fusion SQL service with ./sql restart (on Unix) or sql.cmd restart (on Windows).

The Fusion SQL service is a long-running Spark application and, as such, it holds on to the CPU and memory allocated to it by these settings. Consequently, you might need to reconfigure the CPU and memory allocation for other Fusion Spark jobs to account for the resources given to the Fusion SQL service: any resources you give to the Fusion SQL service are no longer available for running other Fusion Spark jobs. For more information on adjusting the CPU and memory settings for Fusion Spark jobs, see the Spark configuration settings.
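As a back-of-the-envelope illustration of this trade-off (the cluster size here is hypothetical, not from this topic): if a cluster has 16 cores and 16g of Spark memory, and the SQL service holds 8 cores and 8g, only half of each remains for other Fusion Spark jobs.

```shell
# Hypothetical accounting: resources held by the long-running SQL service
# are unavailable to other Fusion Spark jobs until it is restarted or resized.
CLUSTER_CORES=16
CLUSTER_MEM_GB=16
SQL_CORES=8
SQL_MEM_GB=8    # e.g. 4 executors x 2g each (fusion.sql.memory=2g)

echo "cores left for other jobs:  $(( CLUSTER_CORES - SQL_CORES ))"
echo "memory left for other jobs: $(( CLUSTER_MEM_GB - SQL_MEM_GB ))g"
# prints: cores left for other jobs:  8
#         memory left for other jobs: 8g
```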