Starting Installation

If you are using Ambari to deploy Solr and connector JARs to your cluster, installation is simply a matter of adding Solr as a new service and deploying to the required nodes.

The package downloaded by the Ambari process will be located in the /opt/lucidworks-hdpsearch directory.

Ambari will manage the installation of Solr, and will start Solr on each requested node. It will also connect Solr to the running ZooKeeper instance for the cluster.

The additional connector JARs will be available on each node, but must be manually uploaded to HDFS as needed to submit index requests to Solr. The documentation for each connector includes details on where each JAR must be located in order to be used.
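As a rough illustration of the manual upload step, the sketch below builds an "hdfs dfs -put" command for copying a connector JAR from the local install directory into HDFS. The JAR name, its subdirectory, and the HDFS target directory are hypothetical placeholders; check the documentation for each connector for the real locations.

```python
from pathlib import PurePosixPath
from typing import List


def hdfs_put_command(local_jar: str, hdfs_dir: str) -> List[str]:
    """Build an `hdfs dfs -put` command that copies a local JAR into an HDFS directory."""
    jar_name = PurePosixPath(local_jar).name
    target = str(PurePosixPath(hdfs_dir) / jar_name)
    return ["hdfs", "dfs", "-put", local_jar, target]


# Hypothetical paths, for illustration only.
cmd = hdfs_put_command(
    "/opt/lucidworks-hdpsearch/example/example-connector.jar",
    "/user/solr/jars",
)
print(" ".join(cmd))
```

The function only assembles the command; it could be passed to subprocess.run on a node where the hdfs client is installed.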

Setting Configuration and Startup Options

During setup with Ambari, most of the configuration options can be left at their defaults. However, when getting started, there are a few options you may want to consider changing.

These options are available in the Advanced example-collection section of the Configs screen:

  • Create sample collection: This option will create a sample collection the first time Solr starts. By default, this option is true.

    This option is to help you get started more quickly with Solr. However, if you are already familiar with using Solr you can set this option to false and create your collections manually at a later time.

  • Solr configuration directory: The sample configset to use. The default is data_driven_schema_configs, which provides a minimal setup of Solr for you to customize as needed. The section Available Configsets describes the options in more detail.

  • Sample collection name: The name of the sample collection. The default is "collection1", but this can be customized to any name you’d prefer.

The configuration and startup options configured with Ambari expose many Solr settings that would otherwise require editing solr.in.sh.

Other documentation for Solr may instruct you to modify solr.in.sh by hand. However, if you manage the Solr configuration with Ambari, you should not later hand-edit solr.in.sh. Instead, first look for the option in the Ambari UI and change it there, so that your changes do not conflict with settings made via the UI.

Startup Option Reference

The Solr configuration with Ambari also allows several other customizations, described below.

In the Ambari UI, each setting group is a separate section that is collapsed by default. Click the group name in the UI to expand the section to view or modify the settings.

Advanced example-collection

Example Collection Config

This section provides configuration options for an example collection when first starting Solr.

Solr configuration directory

The sample configset to use. The default is data_driven_schema_configs, which provides a minimal setup of Solr for you to customize as needed. The section Available Configsets provides more details about the available defaults and how to make your own configset if needed.

Create sample collection

This option will create a sample collection the first time Solr starts. By default, this option is true.

Sample collection name

The name of the sample collection to create. The default is collection1, but this can be changed to any name you’d prefer.

Number of shards

Shards are logical partitions of the Solr index, distributed through the cluster. This setting defines how many shards to split an index into. The default is 2, meaning that the collection will be split across 2 nodes of the cluster.

Number of replicas

Replicas are physical copies of a shard, and this sets the number of replicas to create for each shard when creating the sample collection. The default is 1, meaning a single copy of each shard will be created.
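If you set "Create sample collection" to false and create collections manually later, the same options map onto parameters of Solr's Collections API. The following is a minimal sketch, assuming Solr is reachable at http://localhost:8983/solr and that the named configset has already been uploaded to ZooKeeper; it only builds the request URL and does not describe how Ambari itself creates the sample collection.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, config_name, num_shards=2, replication_factor=1):
    """Build a Collections API CREATE URL from the example-collection options."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "collection.configName": config_name,
    }
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


# Host and port are assumptions; the other values are the defaults described above.
url = create_collection_url(
    "http://localhost:8983/solr",
    "collection1",
    "data_driven_schema_configs",
)
print(url)
```

The resulting URL can be opened in a browser or requested with any HTTP client once Solr is running.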

Advanced solr-cloud

SolrCloud Config

This section enables SolrCloud and defines the ZooKeeper chroot.

Enable SolrCloud mode

This option starts Solr in SolrCloud mode. This option cannot be changed, because running Solr in Standalone mode is not supported when it is installed with HDP Search.

ZooKeeper directory for Solr files

The root directory for Solr configuration files in ZooKeeper (the chroot). The default is /solr, but it can be modified before initial startup. The root directory should never be changed after initial startup.
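To show how the chroot fits into a ZooKeeper connection string, the sketch below appends it once to the end of a comma-separated host list. The host names are hypothetical and port 2181 is assumed as the ZooKeeper client port; the exact string Ambari generates may differ.

```python
def zk_connect_string(hosts, chroot="/solr", port=2181):
    """Join ZooKeeper hosts and append the chroot once, after the last host."""
    if not chroot.startswith("/"):
        raise ValueError("chroot must start with '/'")
    ensemble = ",".join(f"{host}:{port}" for host in hosts)
    suffix = "" if chroot == "/" else chroot.rstrip("/")
    return ensemble + suffix


# Hypothetical three-node ZooKeeper ensemble.
print(zk_connect_string(["zk1.example.com", "zk2.example.com", "zk3.example.com"]))
```

Note that the chroot appears only once, at the end of the string, not after each host.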

Advanced solr-hdfs

HDFS Config

This section enables configuration options for running Solr on HDFS.

Enable support for Solr on HDFS

This option determines whether Solr is configured to store its indexes in HDFS. It’s recommended to leave this option as true. When enabled, Solr’s start commands and configuration files are modified to set the proper parameters (which are described in more detail in the section HDFS-Specific Changes).

Delete write.lock files on HDFS

Solr tries to protect its indexes by locking them from accidental writes. In normal operation, the lock files are cleaned up without any manual intervention.

In the case of an improper shutdown or another type of hard stop to Solr, these lock files can be left behind and prevent Solr from restarting. This option allows Ambari to automatically remove any lock files on startup, preventing startup issues. It’s recommended to leave this option as enabled.

HDFS directory for Solr indexes

The HDFS directory path where Solr indexes should be stored. This is set to /solr by default, but can be modified to any other path. If the path does not exist, it will be created.
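For orientation, the Apache Solr Reference Guide describes running Solr on HDFS in terms of three system properties: solr.directoryFactory, solr.lock.type, and solr.hdfs.home. The sketch below assembles them from a NameNode URI and the index directory. The NameNode address is a hypothetical placeholder, and the flags Ambari actually passes may include additional parameters.

```python
def hdfs_startup_flags(namenode_uri, index_dir="/solr"):
    """Build the JVM system properties that point Solr's indexes at HDFS."""
    home = namenode_uri.rstrip("/") + "/" + index_dir.strip("/")
    return [
        "-Dsolr.directoryFactory=HdfsDirectoryFactory",
        "-Dsolr.lock.type=hdfs",
        f"-Dsolr.hdfs.home={home}",
    ]


# The NameNode host and port are assumptions for illustration.
for flag in hdfs_startup_flags("hdfs://namenode.example.com:8020"):
    print(flag)
```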

Advanced solr-config-env

Solr Config

This section provides configuration options for customizing ports, JVM heap size and where files and indexes are stored.

Solr configuration directory

The directory path to store Solr’s configuration files.

Solr server directory

The directory path to store Solr’s index files.

Solr log directory

The directory path to store Solr’s log files.

Solr heap size for the JVM

Sets the minimum heap (-Xms) and maximum heap (-Xmx) sizes for Solr’s JVM. The default is 512 MB. This is likely too small for production systems, so you should increase it as your system grows and you add users and content.
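In a hand-maintained solr.in.sh, the equivalent setting is the SOLR_JAVA_MEM variable, which carries both flags. The sketch below renders that line for a given heap size, assuming the same value is used for -Xms and -Xmx as this option does. It is only an illustration of the format; with Ambari, change the value in the UI instead.

```python
import re


def solr_java_mem(heap):
    """Render a SOLR_JAVA_MEM line that uses one size for both -Xms and -Xmx."""
    # Accept JVM-style sizes such as 512m or 2g.
    if not re.fullmatch(r"\d+[kKmMgG]", heap):
        raise ValueError(f"not a JVM heap size: {heap!r}")
    return f'SOLR_JAVA_MEM="-Xms{heap} -Xmx{heap}"'


print(solr_java_mem("512m"))
print(solr_java_mem("2g"))
```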

Solr port

Allows modifying the port that Solr runs on. The default is 8983.

Solr service log directory

The directory path to store Solr’s service logs.

Solr PID directory

The directory path to store a file that contains the process id (pid) of the Solr service.

solr.in.sh template

The default solr.in.sh file that is included with all Solr installations. This allows you to customize properties not covered here.

Several of the properties available in Ambari configuration are also available in solr.in.sh. Take care not to edit properties in both places to avoid unexpected behavior.

Advanced solr-ssl

SSL Config

This section provides configuration options for enabling SSL encryption in Solr. More information about SSL and Solr is available in the Apache Solr Reference Guide section Enabling SSL.

Enable SSL support

Allows enabling SSL encryption for communication with and between Solr nodes. The default is false.

SSL keystore file

The path to the keystore. The default is /etc/solr-ssl.keystore.jks, so you will need to modify this to the path to your keystore.

Solr keystore password

The password to access the keystore.

Need client authentication

If true, the client must authenticate in order to access the system. The default is false.

If you don’t need clients to authenticate, you can alternatively use "Want client authentication", which is less restrictive. Note that if "Need client authentication" is true, "Want client authentication" must be false.

Solr truststore file

The path to the truststore. The default is /etc/solr-ssl.keystore.jks, so you will need to modify this to the path to your truststore.

Solr truststore password

The password to access the truststore.

Want client authentication

If true, the client can authenticate, but is not required to do so in order to access the system. The default is false.

If you need clients to authenticate, you can alternatively use "Need client authentication", which is more restrictive with access. Note that if "Want client authentication" is true, "Need client authentication" must be false.
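The options in this section correspond to the SOLR_SSL_* variables found in solr.in.sh. The sketch below renders those variables and rejects the combination where both client-authentication options are true. The passwords shown are placeholders, and the function is a hypothetical helper, not part of Solr or Ambari.

```python
def ssl_env_lines(keystore, keystore_password, truststore, truststore_password,
                  need_client_auth=False, want_client_auth=False):
    """Render solr.in.sh-style SSL settings, enforcing need/want exclusivity."""
    if need_client_auth and want_client_auth:
        raise ValueError("need and want client authentication cannot both be true")
    settings = {
        "SOLR_SSL_KEY_STORE": keystore,
        "SOLR_SSL_KEY_STORE_PASSWORD": keystore_password,
        "SOLR_SSL_TRUST_STORE": truststore,
        "SOLR_SSL_TRUST_STORE_PASSWORD": truststore_password,
        "SOLR_SSL_NEED_CLIENT_AUTH": str(need_client_auth).lower(),
        "SOLR_SSL_WANT_CLIENT_AUTH": str(want_client_auth).lower(),
    }
    return [f"{key}={value}" for key, value in settings.items()]


# Placeholder passwords; the paths are the defaults described above.
for line in ssl_env_lines("/etc/solr-ssl.keystore.jks", "secret",
                          "/etc/solr-ssl.keystore.jks", "secret"):
    print(line)
```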

Advanced solr-log4j

This section provides a template log4j.properties file to customize Solr’s logging features.

Advanced solr-metrics

This section provides a way to customize the metrics that HDP Search sends to the Ambari Metrics System for aggregation.

HDP Search’s metrics come from Solr, specifically a Solr request handler called the MBeanRequestHandler, which provides access to internal statistics. The handler is described in more detail in the Apache Solr Reference Guide section MBean Request Handler.
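To see the kind of data the handler returns, you can query it directly. The sketch below builds a request URL for the handler's /admin/mbeans path with statistics enabled. The host, port, and core name are assumptions, and the category name (CACHE here) follows the Solr Reference Guide examples and may vary between Solr versions.

```python
from urllib.parse import urlencode


def mbeans_stats_url(base_url, core, category=None):
    """Build a URL that asks the MBeanRequestHandler for statistics as JSON."""
    params = {"stats": "true", "wt": "json"}
    if category:
        # Restrict the response to one category, such as CACHE.
        params["cat"] = category
    return f"{base_url.rstrip('/')}/{core}/admin/mbeans?{urlencode(params)}"


# Assumed local Solr instance and the default sample collection name.
print(mbeans_stats_url("http://localhost:8983/solr", "collection1", "CACHE"))
```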

Solr’s metrics are split into several groups. Ambari’s configuration options allow you to decide what metrics you will collect. There are several options:

Enable Solr Metrics

Allows enabling metrics collection by the Ambari Metrics System.

Solr Metrics configuration directory

The directory where the metrics configuration file is found. The default is /opt/lucidworks-hdpsearch/metrics/conf.

Solr Metrics log directory

The directory where metrics logs will be stored. The default is /opt/lucidworks-hdpsearch/metrics/log.

Solr Metrics PID directory

The directory where the metrics process ID will be stored. The default is /opt/lucidworks-hdpsearch/metrics/var.

Solr Cache stats

Enable this to send metrics for Solr’s caches to the Ambari Metrics Collector. The default Grafana dashboard included with HDP Search has a section for these statistics ("Caches"); if this option is disabled, that section of the default dashboard may return an error.

More information on Solr’s caches is available from the Apache Solr Reference Guide section on Caches.

Solr Core stats

Enable this for metrics for Solr’s searcher to be sent to the Ambari Metrics Collector.

The searcher is responsible for processing all queries. This data is not reflected in the default Grafana dashboard (in preference to individual stats from the QueryHandler, described below).

Solr QueryHandler stats

Enable this to send metrics for Solr’s query request handlers to the Ambari Metrics Collector. The default Grafana dashboard included with HDP Search has a section for these statistics ("Queries"); if this option is disabled, that section of the default dashboard may return an error.

This section of the metrics tracks all queries by the individual request handlers used for each query request. When this option is enabled, data for all request handlers will be sent to the Ambari Metrics System, but only 5 are chosen for display in the Grafana dashboard: /select, /query, /get, /export and /browse.

More information is available from the Apache Solr Reference Guide section RequestHandlers and SearchComponents in SolrConfig.

Solr UpdateHandler stats

Enable this for metrics for Solr’s update handlers to be sent to the Ambari Metrics Collector. The update handler processes all documents a user has requested to be added to the system.

The default Grafana dashboard included with HDP Search has a section for these statistics ("Indexing"); if this option is disabled, that section of the default dashboard may return an error.

Solr System stats

Enable this for metrics for Solr’s overall system health to be sent to the Ambari Metrics Collector. These stats are used for the default Ambari alerts (Solr CPU Usage and Solr Memory Usage); if they are disabled, those alerts will not be able to provide warnings in case critical thresholds are met or exceeded.

solr_metrics_properties

A template properties file to further customize Solr’s metrics collection.

Custom Sections

Custom configuration sections are displayed, but are not currently used. If you enter any local parameters in those sections, they may be ignored.