3.1.3 Release Notes

Release date: 19 December 2017

Component versions:

  • Solr 6.6.2

  • ZooKeeper 3.4.6

  • Spark 2.1.1

New features

  • Solr 6.6.2

    This release incorporates Solr 6.6.2, which includes a critical security fix for the CVE-2017-12629 zero-day exploit. Additional changes are described in the Solr 6.6.2 release notes.

Improvements

  • Web connector improvements

    The Web connector now crawls JavaScript-enabled Web sites faster by using a headless instance of Firefox as a crawling agent. A new f.useFirefox parameter ("true" by default) enables JavaScript evaluation using the Firefox browser. (f.crawlJS must also be "true".) If f.useFirefox is set to "false", then a Java-embedded browser called JBrowserDriver is used instead. We recommend that you use Firefox when possible because it provides more reliable JavaScript evaluation behavior.

    Firefox is now installed as part of the Web connector package; you do not need to install Firefox separately. However, you do need GTK3 as a prerequisite. See the Web connector documentation for instructions.

    Two additional parameters are available when f.useFirefox is "true":

    • f.firefoxBinaryPath configures the path to the Firefox binary.

    • f.firefoxHeadlessBrowser can be set to "false" to display the Firefox browser windows during processing.

    Additionally, a new f.discardLinkURLAnchors parameter has a default value of "true". Set this to "false" when crawling sites with links containing anchors.
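
    The way f.crawlJS and f.useFirefox combine can be sketched in a few lines of Python. This is only an illustration: the parameter names come from these notes, but the resolve_js_browser helper and the dictionary layout are hypothetical, and treating a missing f.crawlJS as "false" is an assumption rather than documented behavior.

```python
# Hypothetical helper, not part of Fusion. It only mirrors the rules
# described in these release notes.
def resolve_js_browser(props):
    """Return the browser that would evaluate JavaScript, or None if JS crawling is off."""
    # Assumption: a missing f.crawlJS is treated as "false".
    if str(props.get("f.crawlJS", "false")).lower() != "true":
        return None
    # f.useFirefox is "true" by default, per the notes above.
    use_firefox = str(props.get("f.useFirefox", "true")).lower()
    return "firefox" if use_firefox == "true" else "jbrowserdriver"


# Example property set for a site whose links contain anchors.
web_props = {
    "f.crawlJS": "true",
    "f.useFirefox": "true",
    "f.firefoxHeadlessBrowser": "true",
    "f.discardLinkURLAnchors": "false",
}

print(resolve_js_browser(web_props))
```

    Setting f.useFirefox to "false" in the same dictionary makes the helper report the JBrowserDriver fallback instead.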

  • Jive connector improvements

    • Events, videos, and ideas can now be indexed.

      The Jive connector now handles content whose type is event, video, or idea.

    • A new fetch_personal_blogs parameter enables fetching user blogs.

    • The default value for places_to_crawl has changed from "all" to "space,group".

    • The Jive connector now stores and retries documents without ACLs.

      When security trimming is enabled, the connector validates the acl_ss field before sending a document to the index pipeline. If the field is null or empty, then the document is stored in the crawlDB as a failed item, but it is also sent to the pipeline. In the next job, failed items (including documents without ACLs) are put in the queue to be processed, so they can be re-indexed with permissions.

    • The connector now updates the index when ACLs have changed.

      Content is re-indexed when an ACL change is detected. If no such change is detected, then only new or modified content is re-indexed.

    • Jive connector logging is now less verbose.
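
    The handling of documents without ACLs described above can be sketched as follows. This is a simplified model, not the connector's actual code: the function names, the dictionary standing in for the crawlDB, and the list standing in for the index pipeline are all hypothetical.

```python
# Simplified, hypothetical model of the ACL check described above.
def process_document(doc, crawl_db, pipeline, security_trimming=True):
    """Record the document's status, then send it to the pipeline either way."""
    acls = doc.get("acl_ss")
    if security_trimming and not acls:
        # Null or empty acl_ss: store as a failed item so it is retried later.
        crawl_db[doc["id"]] = "failed"
    else:
        crawl_db[doc["id"]] = "ok"
    # The document is sent to the pipeline even when it is marked as failed.
    pipeline.append(doc)


def next_job_queue(crawl_db):
    """Return the IDs of failed items, which the next job would reprocess."""
    return [doc_id for doc_id, status in crawl_db.items() if status == "failed"]


crawl_db, pipeline = {}, []
process_document({"id": "doc-1", "acl_ss": ["group:staff"]}, crawl_db, pipeline)
process_document({"id": "doc-2", "acl_ss": []}, crawl_db, pipeline)

print(next_job_queue(crawl_db))
```

    In this model both documents reach the pipeline, and only the one with the empty acl_ss value is queued again for the next job.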

  • JDBC connector improvements

    A new manually_uploaded_driver parameter can be used to specify the JDBC driver class name when a driver is loaded manually into fusion/3.1.3/apps/libs/ instead of into the blob store.

  • SharePoint connector improvements

    The SharePoint connector no longer retries failed crawls by default. To enable and configure retries, use the following parameters in the conf/fusion.properties file:

    • -Dconnectors.sharepoint.retry.enabled

    • -Dconnectors.sharepoint.retry.waitMultiplier

    • -Dconnectors.sharepoint.retry.waitMaxTimeMs

    • -Dconnectors.sharepoint.retry.stopAfterAttempt

    The example below shows the default values, which you can adjust to suit your needs:

    connectors.jvmOptions=-Xmx1g -Xss256k -Dcom.lucidworks.connectors.pipelines.embedded=false -Dconnectors.sharepoint.retry.enabled=false -Dconnectors.sharepoint.retry.waitMultiplier=300 -Dconnectors.sharepoint.retry.waitMaxTimeMs=10000 -Dconnectors.sharepoint.retry.stopAfterAttempt=4
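
    Because all of the retry settings share one long line, it can be useful to confirm which values a given connectors.jvmOptions value actually sets. The following is a minimal sketch using only the Python standard library; the parse_retry_options helper is hypothetical and is not shipped with Fusion.

```python
import shlex


# Hypothetical helper: extracts the retry settings from a jvmOptions value.
def parse_retry_options(jvm_options):
    """Return the -Dconnectors.sharepoint.retry.* settings as a dict of strings."""
    prefix = "-Dconnectors.sharepoint.retry."
    settings = {}
    for token in shlex.split(jvm_options):
        if token.startswith(prefix) and "=" in token:
            key, value = token[len(prefix):].split("=", 1)
            settings[key] = value
    return settings


# The default option string shown in the example above.
jvm_options = (
    "-Xmx1g -Xss256k -Dcom.lucidworks.connectors.pipelines.embedded=false "
    "-Dconnectors.sharepoint.retry.enabled=false "
    "-Dconnectors.sharepoint.retry.waitMultiplier=300 "
    "-Dconnectors.sharepoint.retry.waitMaxTimeMs=10000 "
    "-Dconnectors.sharepoint.retry.stopAfterAttempt=4"
)

print(parse_retry_options(jvm_options))
```

    Options that do not start with the retry prefix, such as -Xmx1g, are ignored by this helper.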

Other changes

  • The job scheduler now works correctly when the environment includes multiple Fusion API service nodes.

  • Copying a field using the Field Mapping index stage no longer causes the Solr Dynamic Field Mapping stage to change the name of the original field.

  • The SharePoint connector now escapes invalid URL characters in documents.

  • The Box.com connector now applies any configured proxy connector before attempting to authenticate with the Box.com service.

  • Passwords for JDBC and Jira datasources are no longer saved in plaintext on disk.

  • The Box.com connector now correctly stores filenames as foo.txt instead of foo.txt/foo.txt.

  • Script jobs no longer fail when their names contain spaces.