3.1.0 Release Notes

Note
A migration tool is available for upgrading to Fusion 3.1 from versions 3.0.x.

Release date: 20 June 2017

Component versions:

  • Solr 6.5.1

  • ZooKeeper 3.4.6

  • Spark 2.1.1

New features

  • Connectors are now a la carte

    Fusion now ships with a set of basic connectors, reducing the download size. Additional connectors can be downloaded separately. See Connectors Configuration Reference for a list of basic connectors and instructions for installing additional connectors.

  • Spark recommenders

    Fusion’s recommendations are now powered by Spark, with new query pipeline stages and a new recommendation jobs interface.

    See Spark jobs for a list of built-in Spark jobs for generating recommendations.

    The Recommend More Like This query pipeline stage introduced in an earlier release is no longer experimental, and there are three new query pipeline stages for recommenders, including the new Recommend Items for Item stage.

  • Object Explorer

    The Fusion UI now offers a means of visualizing Fusion objects and their relationships. You can open Object Explorer by pressing Ctrl+K (on a PC) or Command ⌘+K (on a Mac) anywhere in the Fusion UI.

  • New SharePoint Online connector for SharePoint Office 365

    A new connector supports crawling SharePoint Office 365 online sites. To crawl locally hosted SharePoint sites, use the pre-existing SharePoint connector.

  • New APIs

    • Links API

      This API manages the links between Fusion objects. You can view the links graphically using the Object Explorer.

    • Jobs API

      This API now configures job schedules. The Scheduler API is deprecated in favor of this one.

    • Tasks API

      This API manages tasks, one of the job types that can be scheduled using the new Jobs API or the Fusion UI.

    • Groups API

      A group is a means of tagging objects with a shared, arbitrary identifier. The Groups API creates and configures groups. You can also view and modify groups in the Object Explorer.

  • New parsers

    • HTML parser

      HTML can now be parsed separately, instead of using the Apache Tika parser. The new HTML parser supports the use of JSoup selectors to extract HTML and CSS elements into new documents or fields.

    • XML parser

      XML can also be parsed separately, and XML nodes can be extracted into new documents.

  • Security between Fusion and Solr

    Fusion can now communicate with Solr over SSL, and can authenticate to Solr using basic authentication or Kerberos.

  • New garbage collection configuration options

    New options are available in fusion.properties to control garbage collection. Several options are pre-configured, and you can create new options by defining them in fusion.properties, for example as gcOptions.myCustomOption = myValue.
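The sketch below illustrates two fusion.properties conventions described in these notes: listing custom gcOptions.<name> entries, and appending the -Dcom.lucidworks.connectors.pipelines.embedded=true flag to connectors.jvmOptions (see Non-embedded pipelines under Improvements). It is a minimal Python sketch, not a Fusion tool: the helper names are hypothetical, and it assumes simple `key = value` lines, ignoring the `:` separators and line continuations that Java properties files also allow. Back up conf/fusion.properties before editing it; for the connectors flag, restart the connectors JVM with bin/connectors restart afterwards.

```python
"""Hypothetical helpers for reading and editing conf/fusion.properties.

Assumes simple `key = value` lines; `#` and `!` start comments. Java
properties files also allow `:` separators and line continuations, which
this sketch does not handle.
"""
from __future__ import annotations

# Flag from the release notes that restores the embedded IndexPipeline.
EMBEDDED_FLAG = "-Dcom.lucidworks.connectors.pipelines.embedded=true"


def parse_properties(text: str) -> dict[str, str]:
    """Parse `key = value` lines into a dict, skipping blanks and comments."""
    props: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        # Split on the first '=' only, so values such as -Dx=y stay intact.
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def gc_options(props: dict[str, str]) -> dict[str, str]:
    """Return gcOptions.<name> entries, keyed by <name>."""
    prefix = "gcOptions."
    return {k[len(prefix):]: v for k, v in props.items() if k.startswith(prefix)}


def add_jvm_flag(text: str, key: str, flag: str) -> str:
    """Append `flag` to the value of `key`; all other lines are kept as-is.

    Idempotent: an existing flag is not duplicated. If `key` is absent,
    a new `key = flag` line is appended.
    """
    out: list[str] = []
    found = False
    for raw in text.splitlines():
        line = raw.strip()
        if line and not line.startswith(("#", "!")) and "=" in line:
            k, _, v = line.partition("=")
            if k.strip() == key:
                found = True
                tokens = v.split()
                if flag not in tokens:
                    tokens.append(flag)
                raw = f"{key} = {' '.join(tokens)}"
        out.append(raw)
    if not found:
        out.append(f"{key} = {flag}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Hypothetical excerpt; real fusion.properties values will differ.
    sample = (
        "connectors.jvmOptions = -Xmx1g\n"
        "gcOptions.myCustomOption = myValue\n"
    )
    print(gc_options(parse_properties(sample)))
    print(add_jvm_flag(sample, "connectors.jvmOptions", EMBEDDED_FLAG), end="")
```

Splitting only on the first `=` matters here because JVM system-property flags such as the embedded-pipeline flag contain their own `=`.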

Improvements

  • Improved jobs and schedules management

    Jobs and schedules are now separate. In this release, a job is any runnable Fusion object. A job configuration may include zero or more schedules.

    • There are three job types: tasks, Spark jobs, and datasource jobs.

    • The "scheduler jobs" used in previous releases are now a job type called tasks.

      Note
      When upgrading from a previous release, scheduler jobs will remain visible in the (deprecated) Scheduler API, but they will not be visible in the 3.1 UI. A migration script will be provided to move scheduler jobs into the new framework.

    • Aggregations and recommendations are now subtypes of the Spark job type, managed using the Spark Jobs API.

    • A new Jobs API manages schedules for jobs.

    • A new interface at Search > Jobs lets you define tasks and Spark jobs, and schedule all types of jobs.

    • The interface at DevOps > Scheduler has been redesigned.

  • Improved blob store management

    • The Fusion UI now provides blob management features at DevOps > Blobs.

    • The Blob Store API now allows specifying a resourceType when uploading or retrieving blobs.

    • Blob IDs may now contain slashes (/).

  • Connector enhancements:

    • Non-embedded pipelines

      In Fusion 3.0 and earlier, each connector sent content to an embedded IndexPipeline within the same connector node. In Fusion 3.1, connectors send each document to one of any number of IndexPipeline nodes, using multi-threaded load balancing. This allows you to scale the indexing part of processing. An IndexPipeline node exists wherever the Fusion API service is running.

      To revert to the previous embedded IndexPipeline behavior, open conf/fusion.properties and add the following property to connectors.jvmOptions:

      -Dcom.lucidworks.connectors.pipelines.embedded=true

      Then restart the connectors JVM by running bin/connectors restart.

    • The f.includedMimeTypes and f.excludedMimeTypes parameters have been removed from all connectors that previously supported them.

    • Jira connector enhancements

      Improved fetching performance, support for parsers, support for security trimming.

    • Web connector enhancements

      New parameters:

      • f.crawlJS

        This new option enables the connector to accurately crawl JavaScript-powered websites. See Website Connector and Datasource Configuration.

        Note
        This feature requires Oracle JDK with JavaFX, or OpenJDK with OpenJFX.

      • f.basicAuth

      • f.digestAuth

      • f.formAuth

      • f.ntlmAuth

      • f.samlAuth

      • f.jsAjaxTimeout

      • f.jsPageLoadTimeout

      • f.jsScriptTimeout

      • parserRetryCount

    • Jive connector enhancements

      The Jive connector now parses lists and map data. No special configuration is needed.

    • Box.com connector enhancements

      The Box.com connector has better support for incremental crawling in Fusion 3.1. No special configuration is required; the new batch_incremental_crawling configuration key is "true" by default. The default value of retainOutlinks has changed from "true" to "false". Below is a list of all new configuration parameters for this connector:

      • batch_incremental_crawling

      • f.fs.childrenPageSize

      • f.fs.distributedCrawlCollectionName

      • f.fs.distributedCrawlDatasourceIndex

      • f.fs.excludedExtensions

      • f.fs.nestedFolderDepth

      • f.fs.numDistributedDatasources

      • f.fs.numPreFetchIndexCreationThreads

      • f.fs.partitionBucketCount

      • f.fs.user_excludes

    • The Google Drive connector has these new configuration keys:

      • batch_incremental_crawling

      • f.fs.enableSecurityTrimming

      • f.fs.connectTimeout

      • f.fs.indexTrash

      • f.fs.readTimeout

      • parserRetryCount

    • The Jira connector has these new configuration keys:

      • parserId

      • f.issueFieldsToIndex

      • f.retryDelay

      • f.stopRetry

    • The Drupal connector and the SharePoint connector have one new configuration parameter, parserRetryCount, which specifies the maximum number of times the configured parser will try getting content before giving up.

  • Query pipeline enhancements:

    • The Recommendation Boosting stage is now called Boost With Signals and has these new configuration parameters:

      • boostingMethod

      • boostingParam

      • scaleRange

    • The User Recommendation Boosting stage is now called Recommend Items for User and has these new configuration parameters:

      • resultsLocation

      • modelIdField

      • scaleRange

      • foldInUpdates

      • boostFieldName

      • boostingMethod

      • boostingParam

      • userIdParam

      • userIdField

      • itemIdField

      • weightField

      • rawSignalsCollection

      For more information, see Recommendations and Boosting.

  • Index pipeline enhancements:

    • The Machine Learning index stage has a new parameter, docFeatureFieldName, which specifies a field in your input documents to feed into the machine learning model. It could be a field dedicated to this purpose, or it could be the entire document in one catch-all field, such as body_s. For an example of how to use this parameter, see the blog post Machine Learning in Lucidworks Fusion.

  • Improved Objects API

    • The set of valid values for the type parameter has changed:

      • New object types are group, link, task, job, and spark.

      • The aggregation object type has been removed. To export aggregations, specify the spark object type.

      • The schedule object type has been removed. To export schedules, specify the job object type.

    • The export endpoint has two new parameters:

      • filterPolicy

        One of: 'system' (exclude system objects when exporting a particular type of object) or 'none' (export all objects of that type).

      • deep

        If true, all linked objects are also exported.

  • More compact logging

    Log message output is now less verbose, reducing the required log storage space.
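As an illustration of the Objects API changes listed above, the following Python sketch builds an export request URL. The endpoint path (/api/apollo/objects/export), host, and port are assumptions modeled on the connector schema example under Other changes, and encoding deep as lowercase true/false is also assumed; check the Objects API reference before relying on them. The sketch rejects the removed aggregation and schedule types and names their 3.1 replacements.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumed endpoint; host and port follow the curl example in these notes.
EXPORT_URL = "http://localhost:8764/api/apollo/objects/export"

# Object types removed in 3.1, mapped to the type to request instead.
REMOVED_TYPES = {"aggregation": "spark", "schedule": "job"}

FILTER_POLICIES = {"system", "none"}


def export_url(
    object_type: str,
    filter_policy: Optional[str] = None,
    deep: Optional[bool] = None,
) -> str:
    """Build an Objects API export URL for one object type.

    Types other than the removed ones (for example the new group, link,
    task, job, and spark types) are passed through unchecked.
    """
    if object_type in REMOVED_TYPES:
        raise ValueError(
            f"type '{object_type}' was removed in Fusion 3.1; "
            f"use '{REMOVED_TYPES[object_type]}' instead"
        )
    params = {"type": object_type}
    if filter_policy is not None:
        if filter_policy not in FILTER_POLICIES:
            raise ValueError(f"filterPolicy must be one of {sorted(FILTER_POLICIES)}")
        params["filterPolicy"] = filter_policy
    if deep is not None:
        # Assumed encoding: lowercase true/false.
        params["deep"] = "true" if deep else "false"
    return f"{EXPORT_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Export all Spark jobs (which now include aggregations) plus linked objects.
    print(export_url("spark", filter_policy="none", deep=True))
```

Parameters are only sent when given, so the sketch makes no assumption about the server-side defaults for filterPolicy and deep.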

Other changes

  • The names of some connector plugin types have changed. The table below compares the names of connectors from 3.0 and earlier with their 3.1 equivalents.

    Tip
    To see the configuration schema for any plugin, append a 3.1 plugin type string from the table below to the /api/apollo/connectors/ endpoint, as in this example: curl -u user:pass http://localhost:8764/api/apollo/connectors/plugins/lucid.zendesk/types/zendesk

    Table 1. Connector plugin types

    3.0 and earlier                                    3.1
    plugins/lucid.anda/types/alfresco                  plugins/lucid.cmis/types/cmis
    plugins/lucid.anda/types/box                       plugins/lucid.box/types/box
    plugins/lucid.anda/types/dropbox                   plugins/lucid.dropbox/types/dropbox
    plugins/lucid.anda/types/drupal                    plugins/lucid.drupal/types/drupal
    plugins/lucid.anda/types/file                      plugins/lucid.file/types/file
    plugins/lucid.anda/types/github                    plugins/lucid.github/types/github
    plugins/lucid.anda/types/googledrive               plugins/lucid.googledrive/types/googledrive
    plugins/lucid.anda/types/javascript                plugins/lucid.javascript/types/javascript
    plugins/lucid.anda/types/jira                      plugins/lucid.jira/types/jira
    plugins/lucid.anda/types/sharepoint                plugins/lucid.sharepoint/types/sharepoint
    plugins/lucid.anda/types/subversion                plugins/lucid.subversion/types/subversion
    plugins/lucid.anda/types/web                       plugins/lucid.web/types/web
    plugins/lucid.azure/types/azure                    plugins/lucid.azure/types/azure
    plugins/lucid.couchbase/types/couchbase            plugins/lucid.couchbase/types/couchbase
    plugins/lucid.fileupload/types/fileupload          plugins/lucid.fileupload/types/fileupload
    plugins/lucid.fs/types/ftp                         plugins/lucid.ftp/types/ftp
    plugins/lucid.fs/types/hdfs                        plugins/lucid.hdfs/types/hdfs
    plugins/lucid.fs/types/s3                          plugins/lucid.s3/types/s3
    plugins/lucid.fs/types/smb                         plugins/lucid.smb/types/smb
    plugins/lucid.hadoop.apache2/types/hadoop          plugins/lucid.hadoop-apache2/types/hadoop-apache2
    plugins/lucid.jdbc/types/jdbc                      plugins/lucid.jdbc/types/jdbc
    plugins/lucid.jive/types/jive                      plugins/lucid.jive/types/jive
    plugins/lucid.mongodb/types/mongodb                plugins/lucid.mongodb/types/mongodb
    plugins/lucid.push/types/push                      plugins/lucid.push/types/push
    plugins/lucid.salesforce/types/salesforce          plugins/lucid.salesforce/types/salesforce
    plugins/lucid.servicenow/types/servicenow          plugins/lucid.servicenow/types/servicenow
    plugins/lucid.slack/types/slack                    plugins/lucid.slack/types/slack
    plugins/lucid.solr/types/solr                      plugins/lucid.solr/types/solr
    plugins/lucid.solrxml/types/solrxml                plugins/lucid.solrxml/types/solrxml
    plugins/lucid.twitter.search/types/twitter_search  plugins/lucid.twitter-search/types/twitter-search
    plugins/lucid.twitter.stream/types/twitter_stream  plugins/lucid.twitter-stream/types/twitter-stream
    plugins/lucid.zendesk/types/zendesk                plugins/lucid.zendesk/types/zendesk

  • As of this release, Fusion fully supports machine learning.

  • The Scheduler API and the Aggregator API are deprecated and will be removed in a future release. Use the new Jobs API instead.
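If you maintain scripts that reference connector plugin types, a lookup built from Table 1 can translate 3.0 paths to their 3.1 equivalents. In this minimal Python sketch the helper names are hypothetical, and the base URL reuses the host and port from the Tip's curl example; paths that did not change in 3.1 are returned as-is.

```python
# 3.0 -> 3.1 connector plugin type paths that changed (copied from Table 1).
PLUGIN_TYPE_RENAMES = {
    "plugins/lucid.anda/types/alfresco": "plugins/lucid.cmis/types/cmis",
    "plugins/lucid.anda/types/box": "plugins/lucid.box/types/box",
    "plugins/lucid.anda/types/dropbox": "plugins/lucid.dropbox/types/dropbox",
    "plugins/lucid.anda/types/drupal": "plugins/lucid.drupal/types/drupal",
    "plugins/lucid.anda/types/file": "plugins/lucid.file/types/file",
    "plugins/lucid.anda/types/github": "plugins/lucid.github/types/github",
    "plugins/lucid.anda/types/googledrive": "plugins/lucid.googledrive/types/googledrive",
    "plugins/lucid.anda/types/javascript": "plugins/lucid.javascript/types/javascript",
    "plugins/lucid.anda/types/jira": "plugins/lucid.jira/types/jira",
    "plugins/lucid.anda/types/sharepoint": "plugins/lucid.sharepoint/types/sharepoint",
    "plugins/lucid.anda/types/subversion": "plugins/lucid.subversion/types/subversion",
    "plugins/lucid.anda/types/web": "plugins/lucid.web/types/web",
    "plugins/lucid.fs/types/ftp": "plugins/lucid.ftp/types/ftp",
    "plugins/lucid.fs/types/hdfs": "plugins/lucid.hdfs/types/hdfs",
    "plugins/lucid.fs/types/s3": "plugins/lucid.s3/types/s3",
    "plugins/lucid.fs/types/smb": "plugins/lucid.smb/types/smb",
    "plugins/lucid.hadoop.apache2/types/hadoop": "plugins/lucid.hadoop-apache2/types/hadoop-apache2",
    "plugins/lucid.twitter.search/types/twitter_search": "plugins/lucid.twitter-search/types/twitter-search",
    "plugins/lucid.twitter.stream/types/twitter_stream": "plugins/lucid.twitter-stream/types/twitter-stream",
}

# Host and port taken from the curl example in the Tip.
CONNECTORS_API = "http://localhost:8764/api/apollo/connectors/"


def to_31_plugin_type(path: str) -> str:
    """Return the 3.1 plugin type path; unchanged paths are returned as-is."""
    return PLUGIN_TYPE_RENAMES.get(path, path)


def schema_url(path: str) -> str:
    """URL of a plugin's configuration schema, following the Tip's convention."""
    return CONNECTORS_API + to_31_plugin_type(path)


if __name__ == "__main__":
    print(schema_url("plugins/lucid.fs/types/s3"))
```

Only the renamed paths are stored, so the unchanged rows of Table 1 (for example zendesk or jdbc) need no entries.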

Known issues

  • Time-based partitioning does not work with the searchlogs collection.