3.0.0 Release Notes

Release date: 25 January 2017

Component versions:

  • Solr 6.3.0

  • ZooKeeper 3.4.6

  • Spark 1.6.3

New features

  • Launcher

    The new home screen for the Fusion UI provides contextualized tools for search, analytics, and DevOps.


  • Quickstart UI

    The Fusion UI now provides a Quickstart interface for setting up your first datasource. You can try your own dataset, or take a tour of Fusion’s features using a built-in sample dataset.


    The Quickstart launches automatically the first time you log in to Fusion. You can also launch it by navigating to the Launcher and clicking Try the Quickstart. Whether you’re a new Fusion user or an expert, the Quickstart provides a simple way to quickly ingest a dataset before refining the index or query pipelines.

  • Query Workbench

    The Fusion UI introduces the Query Workbench, a tool for developing query pipelines. The Workbench replaces the Search UI and lets users evaluate and fine-tune their search results in real time, combining fully-customizable pipeline configuration and relevancy settings with a live search preview that shows the impact of each change.

  • Improved startup scripts

    The Fusion agent launches and manages all components in a Fusion deployment.

  • Support for desktop uploads

    Now you can use the Fusion UI to select a file on your desktop host and upload it for indexing.

  • New APIs

    • Parsers API

    • Catalog API

      For data analytics applications, the Catalog API provides access to Fusion data assets. It includes endpoints for finding, retrieving, and manipulating projects and assets through basic keyword and metadata-driven search, as well as SQL or Solr queries.

    • Experiments API

      Now you can use the Experiments API to compare different configuration variants and determine which ones are most successful. For example, configure different variants that use different query pipelines or recommendations, then analyze and compare search activity to see which variant best meets your goals.

    • Objects API

      The Objects API lets you migrate objects between Fusion instances.
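      As a rough sketch of how these new APIs might be called, the following constructs (but does not send) two requests against a local Fusion instance. The base URL, endpoint paths, and query parameters here are illustrative assumptions, not taken from these release notes.

```python
import json
import urllib.request

# Hypothetical local Fusion instance; paths below are assumptions.
BASE = "http://localhost:8764/api"

# Catalog API: a SQL-style query against Fusion data assets (assumed path).
catalog_req = urllib.request.Request(
    f"{BASE}/catalog/fusion/query",
    data=json.dumps({"sql": "SELECT id, title FROM my_collection LIMIT 10"}).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Objects API: export objects from one instance for import into another
# (assumed path and parameter name).
export_req = urllib.request.Request(
    f"{BASE}/objects/export?collection.ids=my_collection",
    method="GET",
)

print(catalog_req.get_method(), catalog_req.full_url)
print(export_req.get_method(), export_req.full_url)
```

      Consult the API reference for your Fusion version for the actual endpoint paths and request bodies.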

  • Fusion services as Windows services

    Fusion 3.0 can be installed as a set of Windows services using the bin\install-services.cmd command. Start and stop the Fusion Windows services using bin\start-services.cmd and bin\stop-services.cmd. See Installing Single-Node Fusion for details.

  • New user roles

    • The new developer role has all the read/write permissions required for search.

    • The new search role has the read-only search permissions required for Lucidworks View.

      The previous collection-admin, search, and ui-user roles have been eliminated in this release.


  • Memory footprint reduction (core + JVM)

    In version 3.0, Fusion’s core and JVM consume 20–40% less memory compared to earlier versions. Memory is conserved most effectively when Spark is not used. When Spark is used, memory usage is scaled dynamically in proportion to the size of the dataset.

  • Improved index pipeline development tools

    The Index Pipeline Simulator introduced in Fusion 2.4 is now the Index Workbench, a powerful tool for previewing how parser and index pipeline configurations affect your documents before saving any changes.

  • Improvements to the Hadoop connector

    The Hadoop connector now includes a Fusion client to allow the output from a Hadoop job to go to a Fusion index pipeline, instead of directly to Solr.

    Several new parameters are introduced in 3.0:

    • List of Fusion endpoints (fusion_endpoints): The URL of a Fusion index pipeline endpoint. The host and port are pre-filled, but the rest of the path, including the pipeline and collection, must be supplied.

    • Fusion client's authentication (fusion_realm): The type of authentication the Hadoop job uses when sending documents to the pipeline. The available options are NATIVE and KERBEROS. If NATIVE is chosen, a Fusion username and password must also be provided. If KERBEROS is chosen, a Kerberos principal must be provided.

    • User/Principal (fusion_user): The Fusion username or Kerberos principal to use to authenticate.

    • Password (fusion_password): The password for the Fusion user, if NATIVE auth is being used. If Kerberos auth is used, this can be blank.

    • Login Config (fusion_login_config): The path to the JAAS file that has the Kerberos config to authenticate to Fusion.

    • Config App Name (fusion_login_app_name): The name of the section of the JAAS file with the Kerberos config to authenticate to Fusion.

    We replaced one complex configuration parameter, "Job Jar Args" (job_jar_args), with several more specific parameters for simpler configuration:

    • Input source (hadoop_input): The path to the data in HDFS. This replaces the need to specify -i in the job jar arguments.

    • Mapper (hadoop_mapper): This is now a pull-down list of the available Ingest Mappers. This replaces the need to specify -cls in the job jar arguments.

    • Number of reducers (reducers): This is a Hadoop concept. The default is to use 0 reducers, which is usually adequate. This replaces the need to specify -ur in the job jar arguments.

    • Additional Job Jar arguments (mapper_args): Defined as a key-value list; click the green "plus" icon to add a new name-value pair. Depending on the selected mapper, you may need to add arguments here. The available arguments are provided in a pull-down list.
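    Put together, a Hadoop connector datasource using the new 3.0 parameters might look like the following sketch. The parameter names come from the list above; all values (hostname, pipeline, paths, credentials, mapper arguments) are placeholders, not defaults shipped with Fusion.

```python
# Hypothetical Hadoop connector configuration using the new 3.0 parameters.
# All values are illustrative placeholders.
hadoop_config = {
    "fusion_endpoints": "http://fusion-host:8764/api/index-pipelines/default/collections/mycoll/index",
    "fusion_realm": "NATIVE",          # or "KERBEROS"
    "fusion_user": "admin",            # Fusion username (or Kerberos principal)
    "fusion_password": "password123",  # blank when fusion_realm is KERBEROS
    "fusion_login_config": "",         # path to a JAAS file, KERBEROS only
    "fusion_login_app_name": "",       # JAAS section name, KERBEROS only
    "hadoop_input": "/user/hadoop/input-data",  # replaces -i in the old job jar args
    "hadoop_mapper": "CSV",                     # replaces -cls (a pull-down in the UI)
    "reducers": 0,                              # replaces -ur; 0 is usually adequate
    "mapper_args": {"csvFieldMapping": "0=id,1=title"},  # hypothetical mapper argument
}

# NATIVE auth requires both a username and a password.
assert hadoop_config["fusion_realm"] == "NATIVE"
assert hadoop_config["fusion_user"] and hadoop_config["fusion_password"]
```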

  • Aggregation jobs enabled by default when signals are enabled

    When signals are enabled for any collection, Fusion automatically creates the secondary collections <collection-name>_signals and <collection-name>_signals_aggr, and creates a click signals aggregation job called click-signals-<collection-name> that is scheduled to run every two minutes.
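    The naming convention described above can be sketched as a small helper; the names follow directly from the release notes, though Fusion itself derives them internally rather than exposing a function like this.

```python
# Sketch of the artifacts Fusion creates when signals are enabled
# for a collection, per the naming convention described above.
def signal_artifacts(collection: str) -> dict:
    return {
        "signals_collection": f"{collection}_signals",
        "aggregates_collection": f"{collection}_signals_aggr",
        "aggregation_job": f"click-signals-{collection}",  # runs every two minutes
    }

print(signal_artifacts("products"))
```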

  • Support for Jive groups

    The new places_to_crawl key lets you list Jive spaces, groups, projects, or personal containers to crawl.
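    A Jive datasource fragment using the new key might look like the sketch below. Only the places_to_crawl key name is from these notes; the value format and entries are assumptions for illustration.

```python
# Hypothetical Jive datasource fragment. Only the places_to_crawl key is
# from the release notes; the entry values are placeholders.
jive_config = {
    "places_to_crawl": [
        "spaces/engineering",        # a Jive space
        "groups/product-feedback",   # a group
        "projects/fusion-rollout",   # a project
        "people/jdoe",               # a personal container
    ],
}
print(len(jive_config["places_to_crawl"]))
```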

  • Better OAuth integration for connectors

Other changes

  • The CSV Parsing and JSON Parsing index pipeline stages are deprecated. Their functions are now performed by parsers.

  • Performance during evaluation of JavaScript conditions in pipeline stages is significantly improved. Note that existing complex JavaScript snippets that consist of multiple instructions must be rewritten as explicit functions.

  • Fusion configuration properties are now located in conf/fusion.properties.

  • Logstash and Slack connectors have been removed.

  • The Search UI has been removed; use the Query Workbench instead.

  • Sending pipeline documents to the Index Pipeline API now requires using the special Content-Type: application/vnd.lucidworks-document. Documents with Content-Type: application/json will not be parsed correctly.
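    As a minimal sketch, a request carrying the required content type could be built as follows; the pipeline URL and document shape are placeholders, and the request is constructed but not sent.

```python
import json
import urllib.request

# A placeholder pipeline document; the real document schema is defined
# by the Index Pipeline API, not by these release notes.
doc = [{"id": "doc1", "fields": [{"name": "title", "value": "Hello"}]}]

# The special content type required in 3.0; the URL is a placeholder
# for a real index pipeline endpoint.
req = urllib.request.Request(
    "http://localhost:8764/api/index-pipelines/default/collections/mycoll/index",
    data=json.dumps(doc).encode(),
    headers={"Content-Type": "application/vnd.lucidworks-document"},
    method="POST",
)

print(req.get_header("Content-type"))
```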

  • Pipeline configurations now update correctly when properties are removed.

Known Issues

  • Creating a collection with invalid Solr config can delete all Fusion data.

    A bug in Solr 6.3 creates a risk of data loss when a collection is created using an invalid configuration, resulting in a missing or empty $FUSION_HOME/data directory.

    Fusion 3.0.1 will ship with Solr 6.4, which will include a fix for this bug. Until then, use care when uploading new configurations, or use an earlier version of Solr.

  • SSL configuration is not straightforward.

    Though Fusion 3.0 includes simpler overall configuration using the new fusion.properties file, SSL configuration is not yet simplified. See the instructions for enabling SSL in the Fusion UI to learn how to perform this configuration in Fusion 3.0.0.

  • Aggregation jobs consume too much disk space under apps/spark-dist/work.

    When running in cluster mode with signals enabled, aggregation jobs can cause the disk to fill up. A workaround for this issue is described in this knowledge base article.

  • In the Tika parser and the Tika index stage, the Return parsed content as XML option works only if the input document is itself HTML or XML.