3.1.0 Release Notes

Release date: 20 June 2017

Component versions:

  • Solr 6.5.1

  • ZooKeeper 3.4.6

  • Spark 2.1.1

New features

  • Connectors are now à la carte

    Fusion now ships with a set of basic connectors, reducing the download size. Additional connectors can be downloaded separately. See Connectors Configuration Reference for a list of basic connectors and instructions for installing additional connectors.

  • Spark recommenders

    Fusion’s recommendations are now powered by Spark, with new query pipeline stages and a new recommendation jobs interface.

    See Spark jobs for a list of built-in Spark jobs for generating recommendations.

    The Recommend More Like This query pipeline stage introduced in an earlier release is no longer experimental, and there are three new query pipeline stages for recommenders, listed below.

  • New Recommend Items for Item query pipeline stage

  • Object Explorer

    The Fusion UI now offers a way to visualize Fusion objects and their relationships. You can open the Object Explorer by pressing CTRL-K anywhere in the Fusion UI.

  • New SharePoint Online connector for SharePoint in Office 365

    A new connector supports crawling SharePoint Online sites hosted in Office 365. To crawl locally hosted SharePoint sites, use the pre-existing SharePoint connector.

  • New APIs

    • Links API

      This API manages the links between Fusion objects. You can view the links graphically using the Object Explorer.

    • Jobs API

      This new API configures job schedules. The Scheduler API is deprecated in favor of it.

    • Tasks API

      Tasks are one of the job types that can be scheduled using the new Jobs API or the Fusion UI.

    • Groups API

      A group is a means of tagging objects with a shared, arbitrary identifier. The Groups API creates and configures groups. You can also view and modify groups in the Object Explorer.
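    To illustrate what "tagging objects with a shared, arbitrary identifier" means, here is a toy in-memory model in Python. It does not call the Groups API; the GroupRegistry class and the object IDs are hypothetical and exist only to show the concept.

```python
from collections import defaultdict


class GroupRegistry:
    """Toy model (hypothetical): a group is an arbitrary ID shared by tagged objects."""

    def __init__(self) -> None:
        # group id -> set of object ids tagged with that group
        self._members = defaultdict(set)

    def tag(self, group_id: str, object_id: str) -> None:
        self._members[group_id].add(object_id)

    def untag(self, group_id: str, object_id: str) -> None:
        self._members[group_id].discard(object_id)

    def members(self, group_id: str) -> list:
        # .get() avoids creating an empty group as a side effect; sorted for stable output
        return sorted(self._members.get(group_id, set()))

    def groups_of(self, object_id: str) -> list:
        return sorted(g for g, objs in self._members.items() if object_id in objs)


registry = GroupRegistry()
registry.tag("products-search", "collection:products")
registry.tag("products-search", "index-pipeline:products-default")
registry.tag("products-search", "query-pipeline:products-default")
print(registry.members("products-search"))
```

    Because a group is just a shared label, one object can belong to several groups, and deleting a group does not delete the objects it tags.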

  • New parsers

    • HTML parser

      HTML can now be parsed by a dedicated HTML parser instead of the Apache Tika parser. The new HTML parser supports jsoup selectors for extracting HTML and CSS elements into new documents or fields.

    • XML parser

      XML can also be parsed separately, and XML nodes can be extracted into new documents.
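    Fusion's XML parser is configured through parser settings rather than code, but the idea of extracting XML nodes into new documents can be sketched with Python's standard xml.etree.ElementTree module. This is not the Fusion parser: the sample XML, the record element, and the split_xml helper are hypothetical, and child elements are flattened into fields as a simplifying assumption.

```python
import xml.etree.ElementTree as ET


def split_xml(xml_text: str, node_path: str, id_field: str = "id") -> list:
    """Turn each element matching node_path into a flat document (dict).

    Child elements become fields; the element's id attribute, if present,
    becomes the document ID, otherwise a positional ID is generated.
    """
    root = ET.fromstring(xml_text)
    docs = []
    for i, node in enumerate(root.findall(node_path)):
        doc = {id_field: node.get("id", f"doc-{i}")}
        for child in node:
            # (child.text or "") guards against empty elements such as <price/>
            doc[child.tag] = (child.text or "").strip()
        docs.append(doc)
    return docs


sample = """<catalog>
  <record id="a1"><title>First</title><price>10</price></record>
  <record id="a2"><title>Second</title><price>20</price></record>
</catalog>"""

for doc in split_xml(sample, "./record"):
    print(doc)
```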

  • Security between Fusion and Solr

    Fusion can now communicate with Solr over SSL, and can authenticate to Solr using either basic authentication or Kerberos.

  • New garbage collection configuration options

    New options are available in fusion.properties to control garbage collection. Several options are pre-configured, and you can create new options by defining them in fusion.properties, for example as gcOptions.myCustomOption = myValue.
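    The gcOptions.<name> = <value> convention can be illustrated with a small Python sketch that collects such entries from properties-style text. The option name, the other.setting key, and the JVM flags in the sample are illustrative placeholders, not Fusion's pre-configured options, and the parser handles only simple key = value lines rather than the full Java properties syntax.

```python
def parse_gc_options(properties_text: str) -> dict:
    """Collect gcOptions.* entries from simple 'key = value' lines."""
    prefix = "gcOptions."
    options = {}
    for raw in properties_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments (# or !), as in Java properties files
        if not line or line.startswith(("#", "!")):
            continue
        # Split on the first '=' only, so values may themselves contain '='
        key, sep, value = line.partition("=")
        if not sep:
            continue
        key = key.strip()
        if key.startswith(prefix):
            options[key[len(prefix):]] = value.strip()
    return options


sample = """
# custom GC option (hypothetical name and value)
gcOptions.myCustomOption = -XX:+UseG1GC -XX:MaxGCPauseMillis=200
other.setting = 42
"""
print(parse_gc_options(sample))
```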

Improvements

  • Improved jobs and schedules management

    Jobs and schedules are now separate. In this release, a job is any runnable Fusion object. A job configuration may include zero or more schedules.

    • There are three job types: tasks, Spark jobs, and datasource jobs.

    • The "scheduler jobs" used in previous releases are now a job type called tasks.

      Note
      When upgrading from a previous release, scheduler jobs will remain visible in the (deprecated) Scheduler API, but they will not be visible in the 3.1 UI. A migration script is provided to move scheduler jobs into the new framework.

    • Aggregations and recommendations are now subtypes of the Spark job type, managed using the Spark Jobs API.

    • A new Jobs API manages schedules for jobs.

    • A new interface at Search > Jobs lets you define tasks and Spark jobs, and schedule all types of jobs.

    • The interface at DevOps > Scheduler has been redesigned.
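    The relationship described above (three job types, each job carrying zero or more schedules) can be modeled as follows. This is a hypothetical Python data model for illustration only, not the Jobs API's JSON schema; the field names, job IDs, and the Quartz-style cron strings are assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum


class JobType(Enum):
    TASK = "task"              # formerly "scheduler jobs"
    SPARK = "spark"            # includes aggregations and recommendations
    DATASOURCE = "datasource"


@dataclass
class Schedule:
    # Hypothetical cron-style expression; the real schedule format may differ
    cron: str
    enabled: bool = True


@dataclass
class Job:
    job_id: str
    job_type: JobType
    # A job may have zero or more schedules
    schedules: list = field(default_factory=list)

    def active_schedules(self) -> list:
        return [s for s in self.schedules if s.enabled]


jobs = [
    Job("datasource:web-crawl", JobType.DATASOURCE, [Schedule("0 0 2 * * ?")]),
    Job("spark:daily-aggregation", JobType.SPARK,
        [Schedule("0 0 1 * * ?"), Schedule("0 0 13 * * ?", enabled=False)]),
    Job("task:cleanup", JobType.TASK),  # runnable on demand, no schedule
]

for job in jobs:
    print(job.job_id, job.job_type.value, len(job.active_schedules()))
```

    Keeping schedules as a list on the job, rather than as separate scheduler entries, mirrors the separation of "what runs" from "when it runs" described in this release.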

  • Improved blob store management

    • The Fusion UI now provides blob management features at DevOps > Blobs.

    • The Blob Store API now allows specifying a resourceType when uploading or retrieving blobs.

    • Blob IDs may now contain slashes (/).

  • Distributed pipelines

    Index pipelines can now be invoked on a different node than the one on which the connector is running.

  • Connector enhancements:

    The f.includedMimeTypes and f.excludedMimeTypes parameters have been removed from all connectors that previously supported them.

    • Jira connector enhancements

      Improved fetching performance, support for parsers, support for security trimming.

    • Web connector enhancements

      New parameters:

      • f.crawlJS

        This new option enables the connector to crawl JavaScript-powered websites accurately. See Website Connector and Datasource Configuration.

        Note
        This feature requires Oracle JDK with JavaFX, or OpenJDK with OpenJFX.

      • f.basicAuth

      • f.digestAuth

      • f.formAuth

      • f.ntlmAuth

      • f.samlAuth

      • f.jsAjaxTimeout

      • f.jsPageLoadTimeout

      • f.jsScriptTimeout

      • parserRetryCount

    • Jive connector enhancements

      The Jive connector now parses lists and map data. No special configuration is needed.

    • Box.com connector enhancements

      The Box.com connector has better support for incremental crawling in Fusion 3.1. No special configuration is required; the new batch_incremental_crawling configuration key is "true" by default. The default value of retainOutlinks has changed from "true" to "false". Below is a list of all new configuration parameters for this connector:

      • batch_incremental_crawling

      • f.fs.childrenPageSize

      • f.fs.distributedCrawlCollectionName

      • f.fs.distributedCrawlDatasourceIndex

      • f.fs.excludedExtensions

      • f.fs.nestedFolderDepth

      • f.fs.numDistributedDatasources

      • f.fs.numPreFetchIndexCreationThreads

      • f.fs.partitionBucketCount

      • f.fs.user_excludes

    • The Google Drive connector has these new configuration keys:

      • batch_incremental_crawling

      • f.fs.enableSecurityTrimming

      • f.fs.connectTimeout

      • f.fs.indexTrash

      • f.fs.readTimeout

      • parserRetryCount

    • The Jira connector has these new configuration keys:

      • parserId

      • f.issueFieldsToIndex

      • f.retryDelay

      • f.stopRetry

    • The Drupal connector and the SharePoint connector have one new configuration parameter, parserRetryCount, which specifies the maximum number of times the configured parser will try getting content before giving up.
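    The parserRetryCount behavior (a maximum number of attempts before giving up) amounts to a bounded retry loop. A minimal Python sketch, assuming the count covers total attempts; the fetch_with_retries helper and its fetch callback are hypothetical, and the real connectors may add delays or handle errors differently.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def fetch_with_retries(fetch: Callable[[], T], parser_retry_count: int) -> T:
    """Call fetch() up to parser_retry_count times; re-raise the last error."""
    if parser_retry_count < 1:
        raise ValueError("parser_retry_count must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(1, parser_retry_count + 1):
        try:
            return fetch()
        except Exception as err:  # sketch only: real code would catch narrower errors
            last_error = err
            print(f"attempt {attempt} failed: {err}")
    # Every attempt failed: give up and surface the most recent error
    raise last_error


calls = {"n": 0}


def flaky_fetch() -> str:
    # Fails twice, then succeeds, to simulate a transient problem
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError("transient failure")
    return "parsed content"


print(fetch_with_retries(flaky_fetch, parser_retry_count=3))
```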

  • Query pipeline enhancements:

    • The Recommendation Boosting stage is now called Boost With Signals and has these new configuration parameters:

      • boostingMethod

      • boostingParam

      • scaleRange

    • The User Recommendation Boosting stage is now called Recommend Items for User and has these new configuration parameters:

      • resultsLocation

      • modelIdField

      • scaleRange

      • foldInUpdates

      • boostFieldName

      • boostingMethod

      • boostingParam

      • userIdParam

      • userIdField

      • itemIdField

      • weightField

      • rawSignalsCollection

      For more information, see Recommendations and Boosting.
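    One plausible reading of a scaleRange parameter is that raw signal weights are linearly rescaled into a fixed range before being applied as boosts. The Python sketch below shows min-max scaling under that assumption; the function names, the (lo, hi) form of the range, the item IDs, and the Solr-style id:doc^weight rendering are hypothetical and may not match what the Boost With Signals or Recommend Items for User stages actually emit.

```python
def scale_weights(weights: dict, lo: float, hi: float) -> dict:
    """Linearly map raw weights into [lo, hi] (min-max scaling)."""
    if not weights:
        return {}
    w_min, w_max = min(weights.values()), max(weights.values())
    if w_max == w_min:
        # All weights equal: put every item at the top of the range
        return {k: hi for k in weights}
    span = w_max - w_min
    return {k: lo + (v - w_min) * (hi - lo) / span for k, v in weights.items()}


def boost_query(scaled: dict, id_field: str = "id") -> str:
    """Render an illustrative Solr-style boost clause, highest boost first."""
    ordered = sorted(scaled.items(), key=lambda kv: -kv[1])
    return " ".join(f"{id_field}:{k}^{v:.1f}" for k, v in ordered)


raw = {"doc1": 42.0, "doc2": 7.0, "doc3": 21.0}
scaled = scale_weights(raw, 1.0, 10.0)
print(boost_query(scaled))  # id:doc1^10.0 id:doc3^4.6 id:doc2^1.0
```

    Scaling into a bounded range keeps a handful of very popular items from overwhelming the base relevance score.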

  • Index pipeline enhancements:

    • The Machine Learning index stage has a new parameter, docFeatureFieldName, which specifies a field in your input documents to feed into the machine learning model. It could be a field dedicated to this purpose, or it could be the entire document in one catch-all field, such as body_s. For an example of how to use this parameter, see the blog post Machine Learning in Lucidworks Fusion.
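    A minimal sketch of selecting the model input by field name, in Python. Documents are modeled as plain dicts; feature_text is a hypothetical helper, and joining multi-valued fields with spaces is an assumption, not documented stage behavior.

```python
def feature_text(doc: dict, doc_feature_field_name: str) -> str:
    """Return the text of the configured field, joining multi-valued fields."""
    value = doc.get(doc_feature_field_name)
    if value is None:
        # Missing field: feed an empty string rather than failing
        return ""
    if isinstance(value, (list, tuple)):
        return " ".join(str(v) for v in value)
    return str(value)


doc = {"id": "42", "title_t": "Release notes", "body_s": "Fusion 3.1 adds Spark recommenders."}
print(feature_text(doc, "body_s"))
```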

  • Improved Spark Jobs API

    This release includes new endpoints:

  • Improved Objects API

    The export endpoint has two new parameters:

    • filterPolicy

      One of 'system' (exclude system objects when exporting a particular type of object) or 'none' (export all objects of that type).

    • deep

      Set to 'true' to include all linked objects.
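    The combined effect of the deep and filterPolicy export parameters can be pictured as a traversal over the links that the Links API manages. The Python sketch below is illustrative only: the link graph, the object IDs, and the set of system objects are hypothetical, filtering is applied to every selected object as a simplification, and the real export endpoint's traversal rules may differ.

```python
from collections import deque


def objects_to_export(start, links, system_ids, deep, filter_policy):
    """Pick object IDs for export.

    deep=True follows links transitively (breadth-first);
    filter_policy='system' drops system objects, 'none' keeps everything.
    """
    if filter_policy not in ("system", "none"):
        raise ValueError("filter_policy must be 'system' or 'none'")
    selected = list(start)
    if deep:
        seen = set(start)
        queue = deque(start)
        while queue:
            current = queue.popleft()
            for neighbor in links.get(current, []):
                if neighbor not in seen:  # avoid cycles and duplicates
                    seen.add(neighbor)
                    selected.append(neighbor)
                    queue.append(neighbor)
    if filter_policy == "system":
        selected = [obj for obj in selected if obj not in system_ids]
    return selected


links = {
    "collection:products": ["index-pipeline:products", "query-pipeline:products"],
    "query-pipeline:products": ["collection:system_metrics"],
}
system_ids = {"collection:system_metrics"}

print(objects_to_export(["collection:products"], links, system_ids,
                        deep=True, filter_policy="system"))
```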

  • More compact logging

    Log message output is now less verbose, reducing the required log storage space.

Other changes