The Web V2 connector has been updated to version 2.1.1.
This release delivers feature enhancements and stability improvements to strengthen functionality, streamline configuration, and improve the user experience.
Fixes and improvements:
- Fixed an authentication regression introduced in v2.0.0 that caused certain authenticated datasources to fail with a “Username may not be null” error, even when credentials were configured. Authentication now works as expected for these datasources.
- Restored support for viewport-related configuration properties used during JavaScript evaluation when indexing pages. Viewport width, viewport height, and device screen factor settings are now applied correctly by passing the configured values as command-line arguments to the browser.
- Updated the configuration UI to hide unused properties. Only relevant options are now visible, reducing clutter.
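As a rough illustration of the viewport fix above, the sketch below turns three configured values into browser command-line arguments. It assumes a Chromium-based browser and uses Chromium's `--window-size` and `--force-device-scale-factor` switches. The release note does not name the browser or the arguments the connector actually passes, and the function and parameter names are hypothetical.

```python
def viewport_args(width: int, height: int, scale_factor: float) -> list[str]:
    """Build Chromium-style command-line switches from viewport settings.

    Assumption: these are standard Chromium switches; the arguments the
    Web V2 connector really passes are not stated in the release note.
    """
    if width <= 0 or height <= 0 or scale_factor <= 0:
        raise ValueError("viewport settings must be positive")
    return [
        f"--window-size={width},{height}",
        f"--force-device-scale-factor={scale_factor}",
    ]


print(viewport_args(1280, 800, 2.0))
```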
The Amazon Web Services (AWS) S3 V2 connector has been updated to version 1.5.0.
This release adds support for the AWS S3 Object Lambda Access Point, which lets you customize how data is processed when it is retrieved from AWS S3 buckets.
The Lambda function associated with the access point can perform functions such as decrypting files and redacting personal information from files. For example, if you invoke the AWS S3 V2 connector using the access point and fetch encrypted files, the encrypted files are decrypted and returned to the connector.
In the Fusion UI, the datasource includes the Object Lambda Access Point ARN - GET OBJECT field, where you enter the access point's Amazon Resource Name (ARN). This access point must be associated with the Lambda function on AWS S3 that supports the get-object API call. The connector fetches objects using the access point and triggers the corresponding Lambda function on AWS S3. If the Object Lambda Access Point ARN - GET OBJECT field does not contain a value, the AWS S3 V2 connector retrieves objects normally and the Lambda function is not invoked.
The Web V2 connector has been updated to version 2.1.0.
This release adds JavaScript evaluation for the Web V2 connector, which allows the connector to extract content from a website that is only available after JavaScript has been rendered. The release also rebuilds some of the connector’s capabilities in Selenium Grid, which offers non-headless browser mode and concurrent processing. Additionally, this release restores features from the Web V1 connector.
If you are authenticating to your website when crawling it, you can evaluate JavaScript while crawling websites. Select Evaluate JavaScript during SmartForms/SAML Login.
The headless browsing setting in the Web V2 connector lets you run browsers without displaying a browser window. For websites that render pages on the server, the Headless browser field must be unchecked for the crawl to work correctly and retrieve links. For websites that render pages on the client side, the Headless browser field should be checked.
Important: If you are using Web V2 2.1.0 or later, you must use Selenium Grid as part of your Web V2 connector setup.
JavaScript evaluation
JavaScript evaluation is available for remote and hosted connectors and supports authentication and headless browsing. The capabilities of a browser are essential, and this release introduces Selenium Grid to implement browser-based rendering.
For hosted connectors, Selenium Grid support is available through Kubernetes. For remote connectors, Selenium Grid support is available through Docker Compose. See the Web V2 remote support repository for setup instructions and YAML files.
The Selenium services require an x86 architecture to run properly. Running the Selenium services on an ARM-based system such as Apple Silicon is not supported.
Important: Up to three Web V2 connectors can run simultaneously in a single cluster. This prevents reaching a max concurrency limit per Web V2 connector, which affects how much data can be sent to Selenium Grid at one time.
Improvements
The `depth` property has been restored, allowing you to control the scope of your web crawl. The default value is `-1`, which does not limit the scope of the crawl. Configure this value in the Limit Document Properties section of the Web V2 connector.
If a crawl fails because the start link is invalid, the Web V2 connector now marks the crawl as failed and Fusion logs an exception. This restores functionality from the Web V1 connector.
Bug fix
Previously, the Port field for Basic Authentication did not accept `-1` as a value to accept any port. This is now resolved, and `-1` is an accepted value.
The Web V2 connector has been updated to version 2.0.1.
- Fixed a bug where Web V2 v2.0.0 failed to handle non-HTML responses such as JSON. When a JSON response was returned, the connector would complete with a success response but without indexing any data due to a premature stream closure error. This issue occurred only with Web V2 v2.0.0 on Fusion 5.9.11 and did not affect Web V2 v1.4.0.
- Compatibility for this connector is extended to include all versions of Fusion 5.9.x.
The AWS S3 V2 connector is updated to v1.4.2.
This release resolves a bug where recrawls failed to fetch data from folders. This occurred when a previous crawl saved errors in Crawldb and the datasourceConfig had:
- `objectKeys` configured
- `enableStrayContent` set to `false`
The Web V2 connector has been updated to version 2.0.0. This release resolves a critical issue where the Restrict crawl to start-link checkbox appeared unchecked in the UI, while the setting was enabled in the backend. This discrepancy caused the option to be enforced even when the UI indicated it was disabled, potentially resulting in the indexing job failing silently.
This release also includes general performance and compatibility improvements.
Known issues in version 2.0.0:
- The Link rewrite script option, which allows JavaScript to modify document links before fetching, is currently non-functional. A fix is planned for a future release.
- The Max items setting enforces a limit that is one less than the configured value. For example, if set to 10,000, only 9,999 documents are fetched. A fix is planned for a future release.
Important: Starting with Fusion 5.9.11, you must upgrade to the Web V2 connector v2.0.0 or later. Previous versions (e.g. 1.4.0) are incompatible with Fusion 5.9.11 and later versions of Fusion 5.9 due to changes introduced by the upgraded JDK in Fusion. If you are using an earlier version of Fusion, use Web V2 v1.4.0 instead.
The JDBC V2 Connector is updated to v2.6.2.
This release resolves a bug that caused some updated or newly added records to be missed during delta indexing.
Previously, the connector only looked at the end time of the last indexing job to find new data. This behavior meant any records added or changed during that job could be skipped during delta indexing.
With this fix, the connector now looks at both the start and end times of the last job, ensuring that no records are missed when running a delta indexing job.
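The difference between the two behaviors can be sketched with a simplified model. This is not the connector's code: it assumes each record carries a last-modified timestamp, and it assumes the fix amounts to using the last job's start time as the lower bound. The function names and sample data are hypothetical.

```python
from datetime import datetime


def delta_old(records, last_job_end):
    """Old behavior: only records modified after the last job ended."""
    return [r["id"] for r in records if r["modified"] > last_job_end]


def delta_new(records, last_job_start):
    """Assumed fixed behavior: also re-check records modified while the last job ran."""
    return [r["id"] for r in records if r["modified"] >= last_job_start]


last_start = datetime(2024, 1, 1, 10, 0)
last_end = datetime(2024, 1, 1, 10, 30)
records = [
    {"id": "a", "modified": datetime(2024, 1, 1, 9, 0)},    # before the last job
    {"id": "b", "modified": datetime(2024, 1, 1, 10, 15)},  # changed during the last job
    {"id": "c", "modified": datetime(2024, 1, 1, 11, 0)},   # changed after the last job
]

print(delta_old(records, last_end))    # record "b" is skipped
print(delta_new(records, last_start))  # record "b" is picked up
```

Re-reading the window in which the last job ran can fetch a record twice, so a sketch like this relies on indexing being idempotent per record ID.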
The REST V2 connector is upgraded to v1.1.0.
Summary
A recipe is added for Alfresco to enhance the search experience by implementing a hierarchical request feature that traverses multiple storage levels to locate and index file content. Several new features are added, including introducing a retry mechanism to reattempt requests failing due to server-side errors based on configurable retry counts and delay times. Additionally, this release features a skip indexation option that prevents indexing parent objects when they are only used to discover child objects. The connector now supports recursive requests to automatically retrieve nested objects regardless of depth. This release also allows for limiting documents through exclusion by regular expressions and file size constraints, and includes improvements like enhanced logging with query parameters and the addition of an index field to track the number of documents indexed per request configuration.
New recipe
- A new recipe is included as part of this release: Alfresco.
- This new recipe integrates with the Alfresco information management software to improve the search experience.
- The recipe implements the new hierarchical request feature to locate and index file content at multiple storage levels.
New features
Hierarchical request feature:
- Hierarchical object discovery allows requests to traverse multiple levels by following the natural structure of the data under the source.
Retry mechanism
- Retry Count: The number of times a failed request is retried.
- Max Delay Time: The maximum wait time in milliseconds between retries.
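A minimal sketch of how the Retry Count and Max Delay Time properties above might interact. It assumes the delay doubles after each failure and is capped at the maximum. The release note does not say how the delay is computed, so the backoff strategy, the 100 ms starting delay, and all names here are hypothetical.

```python
import time


class ServerError(Exception):
    """Stands in for a server-side (HTTP 5xx) failure; hypothetical."""


def fetch_with_retry(request, retry_count, max_delay_ms, sleep=time.sleep):
    """Call request(); on ServerError retry up to retry_count times.

    Assumption: the wait doubles each time, starting at 100 ms, and never
    exceeds max_delay_ms.
    """
    delay_ms = 100
    attempt = 0
    while True:
        try:
            return request()
        except ServerError:
            if attempt >= retry_count:
                raise  # retries exhausted, surface the last error
            sleep(min(delay_ms, max_delay_ms) / 1000)
            delay_ms *= 2
            attempt += 1


# Demo: a request that fails twice, then succeeds (no real waiting).
state = {"calls": 0}


def flaky_request():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ServerError("503")
    return "ok"


print(fetch_with_retry(flaky_request, retry_count=3, max_delay_ms=150,
                       sleep=lambda seconds: None))
```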
Skip indexation of objects feature
- When enabled, the response is not indexed. This is useful when objects are requested only to discover their child objects without indexing the parent object.
- Example indexing a list of files with their binary content:
  - Given a parent request (`objectType=FILE`), retrieve a list of file metadata. This request helps discover the IDs of files to be downloaded in a follow-up request.
  - Given a child request (`objectType=FILE-DOWNLOAD` with `parentObjectType=FILE`), download the binary content from previously discovered file metadata.
  - The indexed documents will include file metadata (from the `FILE` request) joined with binary content (from the `FILE-DOWNLOAD` request).
  - Enable Skip Indexation in the `FILE` request to prevent indexing file metadata objects.
Recursive requests
- Enables recursive retrieval of nested objects of the same `ObjectType` using the same request configuration. This is useful when the depth of nesting is unknown, automating the retrieval of all nested objects.
- Example: Detect all nested folders from a system path where the depth of nested folders is unknown. Enable Recursive Requests to retrieve all levels automatically.
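The folder example above can be sketched as a recursive traversal. This is an illustration only: the in-memory `tree` dictionary stands in for REST responses, and `list_children` and `discover_folders` are hypothetical names, not connector functions.

```python
def list_children(tree, folder_id):
    """Hypothetical stand-in for one REST request that lists a folder's subfolders."""
    return tree.get(folder_id, [])


def discover_folders(tree, folder_id, seen=None):
    """Return every nested folder below folder_id, whatever the depth."""
    seen = set() if seen is None else seen
    found = []
    for child in list_children(tree, folder_id):
        if child in seen:
            continue  # guard against cycles in the source data
        seen.add(child)
        found.append(child)
        # Same "request configuration" applied again to the child.
        found.extend(discover_folders(tree, child, seen))
    return found


tree = {"root": ["a", "b"], "a": ["a1"], "a1": ["a2"]}
print(discover_folders(tree, "root"))
```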
Limit documents
- Exclude by RegEx allows specifying a list of key-value pairs to exclude objects from indexing.
  - Key references the field name of the object to exclude and supports `JsonPath` expressions for navigating nested objects (for example, `objects.nested.path`).
  - Value is a regular expression matched against the field value in the object. If the match succeeds, then the entire object is excluded.
- Exclude by File Size allows setting minimum and maximum file sizes (in bytes) to exclude files outside the specified range.
  - Key references the field name of the object containing the file size and supports `JsonPath` expressions (for example, `objects.nested.path`).
  - Minimum File Size: Files smaller than this value will be excluded.
  - Maximum File Size: Files larger than this value will be excluded. (Set to `-1` for no limit.)
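The two exclusion rules above can be sketched as a single filter. This is a simplified model under stated assumptions: a plain dotted-path lookup stands in for JsonPath, the regular expression is applied with a search rather than a full match, and every function and field name is hypothetical.

```python
import re


def get_path(obj, dotted_path):
    """Resolve a dotted path such as 'file.size' (simplified stand-in for JsonPath)."""
    current = obj
    for part in dotted_path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def is_excluded(obj, regex_rules, size_key, min_size, max_size):
    """Return True if the object matches a regex rule or falls outside the size range."""
    for key, pattern in regex_rules.items():
        value = get_path(obj, key)
        if value is not None and re.search(pattern, str(value)):
            return True
    size = get_path(obj, size_key)
    if size is not None:
        if size < min_size:
            return True
        if max_size != -1 and size > max_size:  # -1 means no upper limit
            return True
    return False


objects = [
    {"name": "report.pdf", "file": {"size": 2048}},
    {"name": "draft.tmp", "file": {"size": 2048}},
    {"name": "tiny.txt", "file": {"size": 10}},
    {"name": "huge.bin", "file": {"size": 10_000_000}},
]
rules = {"name": r"\.tmp$"}
kept = [o["name"] for o in objects
        if not is_excluded(o, rules, "file.size", min_size=100, max_size=1_000_000)]
print(kept)
```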
Improvements
- Improved logging to include query parameters in requests.
- Added index field `_lw_rest_object_type_s` to store the value of the `ObjectType` configuration property, which represents the name of the request. This helps track the number of documents indexed per request configuration.
Deprecation
- Property `Service Endpoints` used for object discovery through two-level requests is deprecated. Instead, use `List of Requests Configuration` for configuring multiple request levels.
The REST V2 connector is introduced to let users crawl content from datasources that expose their data through a REST API. It can be configured to communicate with a wide selection of external datasources by making API calls and indexing the responses. As an out-of-the-box V2 connector, it provides a low-code user experience for indexing data from REST API-compatible sources.
The REST V2 connector expedites setting up and indexing various datasources using premade configurations, called recipes. These recipes are templates that include all required parameters for datasource integration and can be easily adjusted to suit your specific needs. Unlike single-purpose connectors, the REST V2 connector can index data from any datasource supported by a corresponding recipe.
- The REST V2 Connector relies on a public GitHub repository to store and manage recipes. Recipes are open-source and accessible to the community for use and contribution.
- Two recipes are included in the initial release: Jira and Confluence. Additional recipes are being developed and will be released as they are finalized.
- This release also includes two forms of authentication: OAuth and Basic Auth.
- Full crawl retrieves all objects from the datasource.
- Recrawl relies on the `strayContentDeletion` feature from the `connectors-service` to ensure deleted objects from the source are also removed from the index.
- Allows defining a root request to retrieve first-level objects.
- Supports a list of child requests (children of the main request) to retrieve second-level objects.
- By default, objects retrieved with the root request and child requests are indexed as individual Solr documents.
- Next page URL uses a URL to fetch the next page of results.
- Batch size and index start use a batch size and starting index to paginate through results.
- When parsing within the plugin, the response is parsed as a JSON object structure using JSONPath. This is the default behavior.
- When parsing with Fusion, the response is emitted directly to Fusion, where Fusion parsers handle the binary data. Enable this feature by setting the property `Send as Binary Data`.
`Child Response Mapping → Custom Solr Field`. This feature works only when both parent and child objects are parsed as JSON objects.
The following variables are used when configuring a datasource:
- `${LW_BATCH_SIZE}`: Used with pagination by batch size. This variable represents the `size` query parameter defined in the property `Pagination By BatchSize → BatchSize`.
- `${LW_INDEX_START}`: Used with pagination by batch size. This variable represents the `start-point` query parameter defined in the property `Pagination By BatchSize → IndexStart`, which is used to traverse the pagination.
- `${LW_PARENT_DATA_KEY}`: Used with the child request configuration. This variable is replaced with the `id` extracted from the root object using the property `Parent Data Key`.
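To illustrate how the pagination variables above could be substituted into a request URL, here is a small sketch using Python's `string.Template`, which happens to share the `${NAME}` syntax. The URL, the query parameter names, the zero-based start index, and the known total are all assumptions for the example. A real crawl would more likely stop when a page comes back empty.

```python
from string import Template


def paged_urls(url_template, batch_size, total):
    """Yield one URL per page by filling in the two pagination variables.

    Assumption: the index starts at 0 and advances by batch_size per page.
    """
    template = Template(url_template)
    for start in range(0, total, batch_size):
        yield template.substitute(LW_BATCH_SIZE=batch_size, LW_INDEX_START=start)


# Hypothetical endpoint; only the ${...} variable names come from the release notes.
url_template = (
    "https://example.com/api/items"
    "?size=${LW_BATCH_SIZE}&start-point=${LW_INDEX_START}"
)
for url in paged_urls(url_template, batch_size=50, total=120):
    print(url)
```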
This connector is compatible with Fusion 5.9.0 and later.
An improvement is made to the LDAP ACLs V2 connector.
- The timeout limit to retrieve ACLs from LDAP services is extended.
This version of the connector is compatible with Fusion 5.9.1 and later.
The Web V2 Connector is upgraded to v1.4.0.
- Incorporates OAuth for compatibility with Ping Identity and Azure.
- Fixes a bug where links listed under BULK START LINKS were not being indexed.
The AWS S3 V2 connector is updated to v1.4.1.
- A new property, retry count, has been introduced to the S3 connector. Its value is passed to the AWS SDK and determines the number of retry attempts for retrieving a file from the AWS S3 bucket. Making this property configurable helps the connector cope with network instability.
The Kaltura V2 connector is updated to v1.4.0.
This version of the connector is compatible with Fusion 5.9.4 and later.
- Adds validation to establish a connection with Kaltura before commencing the crawling process.
- Updates the plugin to replace static Security Trimming with Graph Security Trimming, improving performance.
The AWS S3 V2 connector is updated to v1.4.0.
- Adds an Enable Stray Content Deletion property in the Fusion UI for the S3 connector to toggle stray content deletion on or off. When stray content deletion is enabled, content that was removed from the datasource is deleted from the index in Fusion. When stray content deletion is disabled, content that was removed from the datasource is not deleted from the index in Fusion. This property is enabled by default.
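The effect of the toggle above can be modeled with plain sets. This is a conceptual sketch only, not how Fusion tracks documents internally, and the function and parameter names are hypothetical.

```python
def reconcile(indexed_ids, source_ids, stray_deletion_enabled=True):
    """Return the IDs left in the index after a recrawl.

    Stray content is whatever is still indexed but no longer in the datasource.
    """
    indexed = set(indexed_ids)
    source = set(source_ids)
    stray = indexed - source
    if stray_deletion_enabled:
        indexed -= stray  # drop documents removed from the datasource
    return indexed | source  # everything in the source is (re)indexed


print(sorted(reconcile({"a", "b", "c"}, {"b", "c", "d"}, stray_deletion_enabled=True)))
print(sorted(reconcile({"a", "b", "c"}, {"b", "c", "d"}, stray_deletion_enabled=False)))
```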
The AEM V2 connector is updated to v1.3.0.
This version of the connector is compatible with Fusion 5.9.4 and later.
- Updates the plugin to replace static Security Trimming with Graph Security Trimming, improving performance.
- Fixed an issue where security trimming was enabled by default.
The File Upload V2 connector replaces the File Upload V1 connector for indexing local file contents.
- Provides a convenient way to quickly ingest data from your local filesystem.
- Constructed using the V2 framework and serves as a replacement for the classic version.
The Box.com V2 connector is updated to version 2.2.0.
This version of the connector is compatible with Fusion 5.9.4 and later.
- Updates the plugin to replace static Security Trimming with Graph Security Trimming, improving performance.
- Fixes the invalid RegEx implementation that adjusts the Box URL to retrieve data from the Box datasource so the connector now accepts dashes (`-`) in Box start URLs.
- Corrects relative path field data when indexing documents that are of type ‘File’ in nested folders.
- Fixes a bug where new documents were not being indexed upon incremental crawls.
- Migrates the Box-Java-SDK to version 4.8.0.
An improvement is made to the LDAP ACLs V2 connector.
- When the maximum number of referrals is reached, an exception is now thrown to handle the situation while ensuring the connector does not stop functioning.
The Windows Share SMB 2/3 V2 connector is updated to version 2.0.0.
This version of the connector is compatible with Fusion 5.9.4 and later.
- Upgrades the plugin to use the latest SDK version.
- Updates the plugin to replace static Security Trimming with Graph Security Trimming, improving performance.
- Fixes a bug where an “Error validating datasource” message displayed after trying to save a datasource with the `Enable DFS` connection property.
The AEM V2 connector is updated to version 1.2.0.
- Now you can configure the AEM connector to include child paths when indexing fields. This option is off by default; enable it by selecting Index metadata by child path in your AEM datasource configuration.
- Fixes a bug that prevented running the connector on Windows.
- The basic authentication username and password fields have moved under Authentication Settings > Login Settings.
The LDAP ACLs V2 connector is updated to version 2.1.0.
- Fixes Graph security trimming not working with `Everyone except external` access.
- Allows the connector to reach the maximum number of referrals. The connector throws an exception but does not stop.
The JDBC V2 Connector is updated to v2.6.1.
- Fixed a pagination issue that limited the number of records returned.
- Added support for pagination of IBM Db2 (version 11 and earlier) by using a template sub-query.
- Fixed an issue where UTF-8 characters stored as CLOB data types didn’t index properly.
The JDBC V2 Connector is updated to v2.6.0 and implements a pagination feature. This allows the JDBC connector to ingest a specified number of rows during the crawl process.
This feature is controlled by the following newly added properties:
- `LIMIT` - specifies the number of rows returned in the results.
- `OFFSET` - dictates the number of rows to skip from the beginning of the returned data before presenting the results.
- `disableAutomaticPagination` - disables automatic pagination to ignore limit and offset fields.
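For readers unfamiliar with limit/offset pagination, the sketch below shows the general technique against an in-memory SQLite database. It is not the connector's implementation: the table, column names, and the `fetch_in_pages` helper are hypothetical, and the SQL dialect of your database may differ.

```python
import sqlite3


def fetch_in_pages(conn, limit):
    """Yield pages of rows, advancing OFFSET by LIMIT until no rows remain."""
    offset = 0
    while True:
        rows = conn.execute(
            # A stable ORDER BY keeps pages from overlapping or skipping rows.
            "SELECT id, name FROM items ORDER BY id LIMIT ? OFFSET ?",
            (limit, offset),
        ).fetchall()
        if not rows:
            break
        yield rows
        offset += limit


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items (name) VALUES (?)",
    [(f"item-{i}",) for i in range(1, 8)],
)

for page in fetch_in_pages(conn, limit=3):
    print([row[0] for row in page])
```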
The LDAP ACLs V2 connector v1.5.1 release includes backported fixes.
- Indexes the all-users acl document when indexing from AzureAD.
- Fixes Graph Security Trimming with `Everyone except external` access in SharePoint Online.
The JDBC V2 Connector is updated to v2.5.0.
- Exposes configuration options for the validation timeout.
- Resolves an issue where documents were deleted on incremental crawls when stray delete was enabled.
The Kaltura V2 connector is updated to v1.3.0.
- Restores the Kaltura External Media Entry Compare Attribute fields.
The LDAP ACLs V2 connector is updated to v2.0.0.
LDAP ACLs V2 connector v2.0.0 is compatible with Fusion 5.9.1 and later. See the configuration reference page for the latest version and compatibility details.
The updated connector:
- Supports remote configurations.
- Uses the SDK that correctly processes the NOT-EQUAL phrase in delete-by-query.
- Supports unauthenticated proxy.
The AEM V2 connector is updated to v1.1.0.
- The connector now supports OAuth 2 authentication with JWT tokens.
The JDBC V2 Connector is upgraded to v2.4.0.
- This release expands the types of character large objects (CLOBs) which can be indexed to include IBM CLOBs.
The LDAP ACLs V2 connector is upgraded to v1.5.0.
If you are on Fusion 5.9.0 or later, you must use the LDAP ACLs V2 connector v1.5.0 or later.
- The datasource indexing job will now return an error when invalid access credentials are provided instead of reporting a success.
- The `Security` parameter is now set to `enabled` by default.
The Kaltura V2 connector is upgraded to v1.2.0.
- Fixed an issue that occurred when reaching Kaltura’s API limit of 10,000 documents. Indexing over 10,000 documents should now work as expected.
The JDBC V2 Connector is upgraded to v2.3.0.
- The JDBC V2 connector now supports indexing character large objects (CLOBs).
The Kaltura V2 connector is upgraded to v1.1.0.
- A `NoClassDefFound` error in the connector has been fixed.
The Web V2 Connector is upgraded to v1.3.0.
- A bug with the Rewrite URI configuration option that prevented the configuration from being applied during the indexing job has been fixed.
- The SAML, Form, and NTLM authorization form fields are updated to match those present in the Web V1 connector.
The JDBC V2 Connector is upgraded to v2.2.0.
- A bug that affected JDBC datasources with large document sizes has been fixed. Previously, the indexing job would fail and produce a Solr error message. Indexing large documents should now work as expected.
The LDAP ACLs V2 Connector is upgraded to v1.4.0.
- Added support for the graph security trimming stage for Active Directory in Azure.
This describes how to migrate your pre-Fusion 5.8 Graph Security Trimming query pipeline stage setup to Fusion 5.8 or later. It applies to deployments using:
- SharePoint Optimized V2 connector v1.1.0 or later
- LDAP ACLs V2 connector v1.4.0 or later to crawl Active Directory in Azure
- The LDAP ACLs V2 connector v1.2.0 or later to crawl Active Directory in LDAP
Migration
To migrate a deployment that is crawling Active Directory to Fusion 5.8 or later, follow these steps.
Update the datasource configurations
The SharePoint Optimized V2 and LDAP ACLs V2 datasources must index the content documents and ACL documents to the same collection. Ensure both datasources use the same value, `contentCollection`, for the field ACL Collection ID.
If using SharePoint-Optimized and LDAP ACLs < v2.0.0
Update the ACL Collection Id in the datasource configuration.
The SharePoint-Optimized and LDAP ACLs datasources must index their `content_documents` and `acl_documents` to the same collection. In both datasources, check the property Security -> ACL Collection Id and make sure it points to the same content collection.
- Navigate to Indexing > Datasources.
- Open your SharePoint Optimized V2 or LDAP ACLs V2 datasource.
- Under Security, update the configuration to use `contentCollection` as the ACL Collection ID. The Security checkbox must be checked for this field to appear.
- Save the configuration.
Repeat this process for all required datasources.
If using SharePoint-Optimized and LDAP ACLs >= v2.0.0
Recreate or update the datasources. If you only update a datasource, you cannot revert to the configuration of a previous plugin version.
By default, the LDAP ACLs and SharePoint-Optimized V2 datasources will index the `content_documents` and `acl_documents` to the same collection.
- Navigate to Indexing > Datasources.
- Open your SharePoint Optimized V2 or LDAP ACLs V2 datasource.
- Under Graph Security Filtering Configuration, select Enable security trimming.
Repeat this process for all required datasources.
Clear the datasources and perform a full crawl
- Navigate to Indexing > Datasources.
- Open your SharePoint Optimized V2 or LDAP ACLs V2 datasource.
- Click the Clear Datasource button, and choose yes.
- Navigate to Collections > Collections Manager.
- Verify that the `job_state` collection is empty.
- Return to your datasource.
- Click Run > Start to reindex your data.
Repeat this process for all required datasources.
- A bug that prevented the connector from retrying after a connection error has been fixed.
The JDBC V2 Connector is upgraded to v2.1.0.
- The `statement.setMaxRows()` field was added to resolve a timeout error for large queries on a JDBC data source.
The LDAP ACLs connector is upgraded to v1.3.0.
- Fixed a memory leak that resulted in an `OutOfMemoryError` runtime error, causing recrawls to quickly fail.
The Web V2 Connector is upgraded to v1.2.0.
- Fixed a bug that sometimes resulted in an `OutOfMemory: Java Heap Space` error when the connector was run remotely.
- Fixed a dependency issue that sometimes resulted in an `Error starting controllers` error that caused the job to fail.
The LDAP ACLs V2 Connector is upgraded to v1.2.0.
- Fixed a bug that removed access control lists (ACLs) during incremental crawls in Azure Active Directory. This bug resulted from a deviation from the LDAP ACLs V2 connector’s unique incremental crawl behavior for Azure Active Directory.
- Fixed a bug that sometimes deleted ACLs that the indexing job failed to update during its previous run.
- Known issue: The job may complete with a “Successful” status even when a network communication error occurred.
The Web V2 connector is upgraded to v1.1.0.
- Fixed a bug that sometimes prevented the connector from parsing PDF, Word, Excel, and other file types.
- Some conflicting dependencies were removed to resolve errors.
- A missing dependency, guava-retrying, was added to resolve a `NoClassDefFoundError` error.
- A bug was fixed that resulted in the connector performing full crawls instead of recrawling as expected.
- Known issue: JavaScript evaluation is not functional at this time.
- Known issue: The counter values in the Datasources > DATASOURCE_NAME > Job History view do not reflect actual values when the datasource uses the Web V2 connector.