Fusion Server 4.2.0 Release Notes

Table of Contents

New features
Improvements
Other changes
Known issues

Release date: 28 February 2019

Component versions:

Solr 7.5

ZooKeeper 3.4.13

Spark 2.3.1

Jetty 9.4.11.v20180605

Ignite 2.3.0

More information about support dates can be found at Lucidworks Fusion Product Lifecycle.

See also the App Studio 4.2.0 release notes.

New features

Dynamic, real-time system visualizations with the new DevOps Center

The DevOps Center generates real-time dashboards for visualizing metrics throughout your Fusion system, plus a log viewer where you can explore events and focus on specific event types. You can also export metrics and events for any timeframe, in CSV format, for external analysis.
Faster query pipeline performance with asynchronous stage processing

Query pipelines can now be forked for parallel processing so that faster stages can proceed while requests to external resources wait for their responses.

In this release, you can enable asynchronous execution in the following query stages:
A new Merge Async Results stage joins the forked pipeline before the final Solr stage.

For complete instructions, see Query Pipelines.
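
The fork-and-merge idea described above can be illustrated with a short Python sketch. This is not how Fusion implements asynchronous stages; the stage functions and their return values are hypothetical stand-ins that only show a slow external call running in the background while a fast stage proceeds, with the results merged before the final step.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_external_lookup(query):
    """Hypothetical stage that waits on an external resource."""
    time.sleep(0.05)  # stands in for network latency
    return {"external_boost": query.upper()}


def fast_local_stage(query):
    """Hypothetical stage that needs no external call."""
    return {"normalized_query": query.strip().lower()}


def run_pipeline(query):
    """Fork the slow stage, run the fast stage meanwhile, then merge."""
    context = {"query": query}
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(slow_external_lookup, query)  # fork
        context.update(fast_local_stage(query))  # proceeds without waiting
        context.update(pending.result())  # merge before the final stage
    return context


result = run_pipeline("  Laptop Bags ")
print(result)
```

The merge step blocks only at the point where the forked result is actually needed, which is the role the notes assign to the Merge Async Results stage.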
Sitecore support

The new Sitecore connector provides full crawl and incremental crawl support for versions 8.x and 9.x of the popular Sitecore CMS, indexing both document content and metadata.

A new Dropbox connector supports the latest Dropbox API.

New Solr Update XML parser

This is a simple parser for Solr’s various update formats (XML, CSV, JSON, and javaBin).

New query pipeline stages

These new query pipeline stages support new Fusion AI 4.2.0 features as well as the asynchronous pipeline processing described above.
- Query rewriting stages:
  
  Text Tagger
  
  Apply Rules
- Response rewriting stages:
  
  Modify Response with Rules
  
  Response Document Exclusion Stage
  
  Response Document Field Redaction Stage
  
  Response Pairwise Swap
  
  Response Shuffle Stage
- Merge Async Results stage for asynchronous processing (see above).
The default query pipeline now includes the new Text Tagger, Apply Rules, and Modify Response with Rules stages.
New collections
- query_rewrite_staging
  
  Rules and certain Spark job results are written to this collection temporarily. See Query Rewriting for details.
- query_rewrite
  
  Query pipelines read rules and job results from this collection in order to perform query rewriting. Docs are migrated to this collection from the query_rewrite_staging collection.
- job_reports
  
  Job histories are now written to this collection.
- _user_prefs
  
  This collection stores App Studio social data, such as user tags, bookmarks, and so on.
- system_monitor
  
  The new system metrics used for the DevOps Center are written to this collection. These new metrics replace the metrics previously written to the system_metrics collection.

New REST APIs and endpoints
- New Custom Rules API
- New Query Rewrite API
- New Webapps Admin API
- New Webapps Appkit API
  
  The Webapps API is deprecated in favor of the new Webapps Appkit API.
- /index-pipelines/{id}/collections/{collection}/indexMultiple submits a set of documents to an index pipeline.
- /spark/reports/{job} gets the job results from a specific job.
- /webapps/{id}/war/manifest gets the .war file manifest for the specified web app.
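
As a rough illustration of how the path templates above might be filled in, the following Python sketch substitutes URL-encoded values into each template. The base URL http://localhost:8764/api, the dictionary keys, and the sample job name are assumptions for illustration only. These notes do not state the HTTP method or request body for any endpoint, so the sketch stops at building the URL.

```python
from urllib.parse import quote

# Path templates copied from the list above; the dictionary keys are hypothetical labels.
ENDPOINTS = {
    "index_multiple": "/index-pipelines/{id}/collections/{collection}/indexMultiple",
    "spark_report": "/spark/reports/{job}",
    "war_manifest": "/webapps/{id}/war/manifest",
}


def endpoint_url(base, name, **params):
    """Fill a path template, URL-encoding each parameter value."""
    encoded = {key: quote(str(value), safe="") for key, value in params.items()}
    return base.rstrip("/") + ENDPOINTS[name].format(**encoded)


# The base URL and the job name are assumed values, not taken from these notes.
url = endpoint_url("http://localhost:8764/api", "spark_report", job="my job")
print(url)
```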

Improvements

Broader access to the Object Explorer

A new Explore button appears in configuration panels for Fusion objects that can be viewed in the Object Explorer. Click the button to see the object’s relationships to other Fusion objects.

Use external management tools to control Fusion configurations

Many of the values in conf/fusion.properties can now be set using environment variables, enabling you to set them using systemd, Docker, Kubernetes, and so on. Default values are also provided. For example, in api.port = ${API_PORT:-8765}, the default value is 8765 unless API_PORT is defined.
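
The substitution rule in that example can be sketched in a few lines of Python. This is an illustration of the ${NAME:-default} pattern only, not Fusion's own resolution code. It assumes a variable counts as "defined" whenever it is present in the environment mapping; how Fusion treats a variable that is set but empty is not covered by these notes.

```python
import re

# Matches placeholders of the form ${NAME:-default}
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*):-([^}]*)\}")


def resolve(value, env):
    """Replace each ${NAME:-default} with env[NAME] if present, else the default."""
    def substitute(match):
        name, default = match.group(1), match.group(2)
        return env.get(name, default)
    return _PLACEHOLDER.sub(substitute, value)


line = "api.port = ${API_PORT:-8765}"
print(resolve(line, {}))                    # api.port = 8765
print(resolve(line, {"API_PORT": "9000"}))  # api.port = 9000
```

In practice the mapping would be the process environment (for example, dict(os.environ)) as populated by systemd, Docker, or Kubernetes.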

Note that ZOOKEEPER_PORT cannot be used, and the value of zookeeper.port in fusion.properties must be the same as the value of clientPort in conf/zookeeper/zoo.cfg.
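
One way to guard against a mismatch between the two files is a small consistency check such as the Python sketch below. It assumes both files use simple key = value lines with # comments. The sample file contents and the port number are made up for illustration; substitute the contents of your own conf/fusion.properties and conf/zookeeper/zoo.cfg.

```python
def read_properties(text):
    """Parse simple key = value lines, skipping blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def zookeeper_ports_match(fusion_properties_text, zoo_cfg_text):
    """Return True only if zookeeper.port is set and equals clientPort."""
    port = read_properties(fusion_properties_text).get("zookeeper.port")
    client_port = read_properties(zoo_cfg_text).get("clientPort")
    return port is not None and port == client_port


# Sample contents with made-up values; read the real files in practice.
sample_fusion = "# ports\napi.port = ${API_PORT:-8765}\nzookeeper.port = 9983\n"
sample_zoo = "tickTime=2000\nclientPort=9983\n"
print(zookeeper_ports_match(sample_fusion, sample_zoo))
```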
Improved connectors functionality
- Improved incremental crawl performance across all connectors with MapDB upgrade.
- Kerberos support in the Jive connector.
- Web connector now supports website authentication credentials files in container path.
- JavaScript evaluation for crawling websites.
- The SharePoint V1 connector now supports bulk import of start link URL lists.
- Confluence connector now supports API token based authentication.
- The SMB2/3 Connector has improved support for crawling distributed file systems.
- The SMB2/3 Connector now saves the original file path and redirected file paths when crawling distributed file systems.
- JDBC V1 connector has improved settings for managing index commits during crawls.
- The OneDrive connector can now crawl user-specific drives.
- Some connectors now give you the option to index metadata about documents that were discarded because they were too large or too small using the new f.index_items_discarded/Index discarded document metadata parameter. A new field, _lw_skipped_reason_s, indicates the reason that the document was skipped during indexing. The new key is available for these connectors:
  - Web: default = false
  - SharePoint: default = true
  - SharePoint Online: default = true
  - SMB2: default = false
  - Box: default = false
  - Google Drive: default = false
  - Dropbox: default = false
  - Local Filesystem: default = false
- Several connectors have a new Enable Plugin Parsing/pluginParsing parameter. When it is enabled, the connector parses raw content before streaming it to the index pipeline. The following connectors support this parameter:
  - Local Filesystem
  - OneDrive
  - Sitecore
  - Windows Share (SMB 2/3)
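
To review which items were skipped when the Index discarded document metadata option is enabled, one approach is to query Solr for documents that carry the _lw_skipped_reason_s field. The Python sketch below only builds the query URL. The Solr base URL, the collection name my_collection, and the assumption that the discarded-item metadata lands in the datasource's target collection are illustrative and not confirmed by these notes.

```python
from urllib.parse import urlencode


def skipped_items_url(solr_base, collection, rows=10):
    """Build a Solr select URL for documents that have a skip reason."""
    params = {
        "q": "_lw_skipped_reason_s:*",    # any document with the field set
        "fl": "id,_lw_skipped_reason_s",  # return only the ID and the reason
        "rows": rows,
    }
    return f"{solr_base}/{collection}/select?{urlencode(params)}"


# Hypothetical host and collection name.
url = skipped_items_url("http://localhost:8983/solr", "my_collection")
print(url)
```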

The dashboards framework has been upgraded to Banana 1.6.23. See the Banana release notes.

For tighter security, CORS is now disallowed by default. You can enable it, if needed, by editing the proxy.corsAllowOrigin property in conf/fusion.properties.

Other changes

The Synonyms UI is no longer available. See the new Synonym Detection feature, available with a Fusion AI license.
The synonyms collection has been replaced with the new query_rewrite collection.

The Recommendations API is deprecated and will be removed in a future release.

Known issues

When under load, the Fusion proxy service can occasionally become stuck, causing user authentication to fail. This is the result of the proxy InputStream failing to close properly.

An upgrade to Fusion 4.2.4 is required to fix this issue. See Upgrade Fusion.
Connectors
- Repeatedly stopping a V2 datasource job and clearing the datasource may result in an out-of-memory condition. To recover from this state, restart the connectors-rpc process:
  fusion/4.2.0/bin/connectors-rpc restart
- In a cluster environment where multiple connectors-rpc nodes are running, a V2 connector (OneDrive connectors) may install and run on only one node after installation instead of propagating to all nodes. If this happens, restart all nodes and then re-install the connector.
- Although the Web V1 connector’s default value for crawlDBType/Crawl database type is "in-memory", this can cause an out-of-memory condition when crawling large sites. Change the value to "on-disk".
- If new items are not picked up when recrawling a Box.com folder, delete the records in the system_box_distributed_crawl collection, like this:
  curl "http://localhost:8983/solr/system_box_distributed_crawl/update?commit=true" -H "Content-Type: text/xml" --data-binary '<delete><query>*:*</query></delete>'
  Then run the datasource job again.
- The Local Filesystem connector may slow down while crawling an empty folder.
- With the SMB2/3 connector, if multiple start links point to some of the same data, then the data is indexed multiple times. Remove redundant start links and use only the "parent" link.
- The FS (V2) connector does not save item metadata in the CrawlDB. As a result, recrawls do not work as expected (some items will be missing or will not be evaluated on the next crawl).
- When re-installing a connector using the same plugin ID but a different file name, a deadlock condition may occur, resulting in a timeout error.
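
For the SMB2/3 start link issue above, redundant links can be spotted before they are saved to the datasource. The Python sketch below keeps only links that are not located beneath another link in the list. It assumes start links look like smb://host/share/path and compares them as plain, case-sensitive strings; it does not account for distributed file system redirects, and the sample links are made up.

```python
def prune_redundant_start_links(links):
    """Keep only links that are not located beneath another link in the list."""
    # Drop trailing slashes and duplicates, then visit shorter (parent) links first.
    normalized = sorted({link.rstrip("/") for link in links}, key=lambda s: (len(s), s))
    kept = []
    for link in normalized:
        if not any(link.startswith(parent + "/") for parent in kept):
            kept.append(link)
    return kept


# Made-up example links.
start_links = [
    "smb://host/share/docs/",
    "smb://host/share",
    "smb://host/share/docs/2019",
    "smb://host/other",
]
parents = prune_redundant_start_links(start_links)
print(parents)
```

Appending "/" to the parent before the prefix test keeps a link such as smb://host/share2 from being treated as a child of smb://host/share.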
Fusion UI
- In the Query Workbench, some document fields may not appear in the dropdown list of fields for faceting. To work around this, enter the name of the field in the text box and press Return. If the field exists in the dataset, it will be added as a facet even though it does not appear in the list.
- When using Compare mode in the Query Workbench, configuring the list of display fields may change the display in both panels instead of only the working panel.
- After logging out using Chrome version 73.0.3683.75, the login page may not automatically appear. To work around this, do a hard refresh (by holding down the CTRL key while clicking the Reload button).
- In the Query Rewriting UI, creating a query rewrite with the same name as an existing query rewrite deletes the existing one. Be sure to create new query rewrites using unique names.
- Jobs may display incorrect information about their current status or the time at which they last ran. To work around this, use the Jobs API to verify a job’s status and history.
- After you upload a .war file to the App Studio interface, the View Published UI button disappears from the App Studio configuration panel. To restore this button, click Edit, then click Return to Fusion.
- New index pipelines may not appear in the Index Workbench until you do a hard refresh (by holding down the CTRL key while clicking the Reload button).
- In the Query Rewriting UI, after selecting multiple business rules where some rules have tags, adding more tags to the selected rules deletes their existing tags. To work around this, add tags to individual rules instead of adding them in bulk.