- Platform versions
- Key differences
Before the release of Fusion 4.2.4, SharePoint and SharePoint Online connectors were offered in platform version V1. For configuration information on the V1 connectors:
V1 Optimized connectors
As of Fusion 4.2.4, SharePoint and SharePoint Online connectors are offered in platform version V1 Optimized. The V1 Optimized platform version is designed to replace the V1 platform version, and users are encouraged to upgrade to Fusion 4.2.4 or higher to take advantage of the V1 Optimized benefits. For configuration information on the V1 Optimized connectors:
CSOM REST API
The V1 platform version uses the SOAP API. This API style has been deprecated as of SharePoint 2013.
The V1 Optimized platform version uses the CSOM REST API. This API style provides several benefits not found with the SOAP API:
- The CSOM REST API supports bulk operations, which speeds up crawls.
- The CSOM REST API uses traffic decorating and is therefore less susceptible to throttling.
- The CSOM REST API is considerably more efficient, resulting in less data being transferred during crawl operations.
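To make "traffic decorating" concrete, the following minimal sketch builds (but does not send) a SharePoint REST request whose `User-Agent` header identifies the calling application. The `ISV|CompanyName|AppName/Version` pattern follows Microsoft's published throttling guidance. The function name, site URL, and company and application names are hypothetical, and nothing here is taken from the connector's own implementation.

```python
from urllib.request import Request


def build_decorated_request(site_url, company, app_name, version):
    """Build (without sending) a REST request that carries a decorated User-Agent."""
    # Assumption: the ISV|CompanyName|AppName/Version pattern from Microsoft's
    # throttling guidance; the connector's real header value is not documented here.
    user_agent = f"ISV|{company}|{app_name}/{version}"
    return Request(
        url=f"{site_url.rstrip('/')}/_api/web/lists",
        headers={
            "User-Agent": user_agent,
            "Accept": "application/json;odata=nometadata",
        },
    )


# Hypothetical site and application names, for illustration only.
req = build_decorated_request(
    "https://contoso.sharepoint.com/sites/docs/", "Contoso", "Crawler", "1.0"
)
print(req.full_url)
print(req.get_header("User-agent"))
```

Requests that identify themselves this way can be recognized by SharePoint Online, whereas undecorated traffic is more likely to be throttled.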
Active Directory Connector for ACLs dependency
The V1 platform version has a key limitation with regard to LDAP/ActiveDirectory access. To look up user group memberships, each SharePoint datasource must perform its own LDAP queries. If multiple SharePoint datasources use a single LDAP/ActiveDirectory backend, the same LDAP lookups are repeated unnecessarily, resulting in excessive LDAP overhead.
In Fusion 4.2.4, the Active Directory (AD) Connector for ACLs was introduced.
The SharePoint V1 Optimized connector works in tandem with the AD Connector for ACLs to create a sidecar collection, which is used in graph security trimming queries. As a result, all LDAP/ActiveDirectory operations are fully dependent on the AD Connector for ACLs.
If you are using SharePoint Online, and it’s not backed by Azure Active Directory or Active Directory Federation Services (ADFS), the V1 Optimized connector does not depend on the AD Connector for ACLs.
Changes API
The V1 platform version does not use the SharePoint Changes API. As a result, a recrawl must revisit every item in order, and incremental crawls of large SharePoint collections take an excessive amount of time.
The V1 Optimized platform version is able to take advantage of the Changes API to perform incremental crawls. The Changes API tracks all additions, updates, and deletions since the previous crawl operation for a collection.
Because only the recorded changes need to be processed, incremental crawls are significantly faster.
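The difference between revisiting every item and replaying a change log can be sketched with a toy in-memory model. Everything below is a hypothetical illustration: `Change`, `incremental_crawl`, and the integer token are stand-ins, not the SharePoint Changes API or the connector's code. The sketch assumes the change log is ordered by token.

```python
from dataclasses import dataclass


@dataclass
class Change:
    token: int          # position in the change log (stand-in for a change token)
    kind: str           # "add", "update", or "delete"
    item_id: str
    content: str = ""


def incremental_crawl(index, change_log, last_token):
    """Apply only the changes recorded after last_token and return the new token."""
    for change in change_log:
        if change.token <= last_token:
            continue  # already handled by an earlier crawl
        if change.kind == "delete":
            index.pop(change.item_id, None)
        else:
            index[change.item_id] = change.content
        last_token = change.token
    return last_token


# State of the index after a previous crawl that stopped at token 2.
index = {"doc1": "v1", "doc2": "v1"}
log = [
    Change(1, "add", "doc1", "v1"),
    Change(2, "add", "doc2", "v1"),
    Change(3, "update", "doc1", "v2"),
    Change(4, "delete", "doc2"),
    Change(5, "add", "doc3", "v1"),
]
new_token = incremental_crawl(index, log, last_token=2)
print(index, new_token)
```

Only the three changes after token 2 are applied. The cost of the crawl depends on how much changed, not on the size of the collection.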
Graph security trimming
The security trimming approach used by the V1 platform version has notable drawbacks:
- LDAP/ActiveDirectory information is stored in an inefficient manner. When a document is fetched for indexing, it returns the users and groups with permission to view the document. However, SharePoint doesn’t explicitly list these users and groups. The security trimming approach requires that all nested LDAP/ActiveDirectory groups be fetched and added to the document ACLs.
  As a result, if the nested LDAP/ActiveDirectory group relationships change, the content must sometimes be reindexed even though it has not changed in SharePoint. This can lead to massive reindexing operations.
- Each SharePoint datasource requires a separate Solr filter. With the V1 platform version, SharePoint datasources are unable to share the same security filter, even if they point to the same SharePoint farm. This restriction can be highly inefficient.
  In a use case with five SharePoint datasources, for example, five Solr filter queries (fqs) would be required. The more fqs you have, the more work Solr must do for each query, resulting in slower queries. This inefficiency scales with the number of SharePoint datasources, and it is not uncommon to have 30-50 datasources in an application.
- SharePoint security filters cannot be shared with other connectors. For example, if a SharePoint datasource and an SMB2 datasource are backed by the same ActiveDirectory, you are still required to have a separate security filter for each datasource. Again, this inefficiency scales with the number of datasources you have.
Unlike the V1 platform version, the V1 Optimized platform version uses a Solr graph query approach. Advantages include:
LDAP/ActiveDirectory information is not stored in nested groups on the content document ACL fields.
ACLs in SharePoint content documents are stored in a field. Each SharePoint document that you crawl contains ACLs. As the document is indexed by Fusion, a field is populated with any role assignments attached to the document to ensure that only users with appropriate permissions can view it. For example, when performing a security-trimmed query, you supply the username of the user performing the search, and a Solr fq is formed with the values that match the ACL field on each document. The documents that are returned are restricted to what the user is permitted to view.
A single filter can perform a security trimming query against datasources backed by the same ActiveDirectory instance. This is not restricted to the SharePoint V1 Optimized connector. Other connectors, such as the SMB2 connector, can use the same filter.
Group membership lookups (LDAP queries) are separated from the SharePoint connector. Now, the AD Connector for ACLs is used to create a separate ACL Solr sidecar collection. First, a Solr graph query is performed to obtain a user’s groups and nested groups from the sidecar collection. Then, a join query is used to match the ACL fields on the content documents.
Note
This process is performed behind-the-scenes. The V1 Optimized connector uses the security trimming stage like all other connectors.
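The two-step lookup described above (expand the user's groups, then match them against document ACLs) can be modeled in plain Python. This is an in-memory sketch only: in Fusion the expansion is done by a Solr graph query over the sidecar collection and the match by a join query. The `member_of` mapping, document names, and function names are hypothetical.

```python
from collections import deque


def expand_groups(user, member_of):
    """Breadth-first walk from a user to every direct and nested group."""
    seen = set()
    queue = deque([user])
    while queue:
        principal = queue.popleft()
        for group in member_of.get(principal, ()):
            if group not in seen:  # also guards against cyclic memberships
                seen.add(group)
                queue.append(group)
    return seen


def trim(docs, user, member_of):
    """Keep only the documents whose ACL names the user or one of their groups."""
    principals = expand_groups(user, member_of) | {user}
    return [doc_id for doc_id, acl in docs.items() if principals & set(acl)]


# Stand-in for the sidecar collection: principal -> groups it belongs to.
member_of = {
    "alice": ["engineering"],
    "engineering": ["staff"],
    "bob": ["sales"],
    "sales": ["staff"],
}
# Stand-in for content documents: document -> ACL field values.
docs = {
    "roadmap": ["engineering"],
    "handbook": ["staff"],
    "pipeline": ["sales"],
    "notes": ["alice"],
}
visible = trim(docs, "alice", member_of)
print(visible)
```

Because the document ACLs hold only the directly assigned principals, a change to nested group membership touches `member_of` alone. No content document needs to be reindexed.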
Multiple crawl phases
The V1 platform version does not support multiple crawl phases.
The V1 Optimized platform version performs crawl operations in two phases:
Pre-fetch phase - The pre-fetch phase utilizes the CSOM REST API to fetch all relevant metadata in large batches. This creates a pre-fetch database, which is exported for use by the post-fetch phase.
Note
The pre-fetch phase does not download the file content of list items. It only fetches the metadata.
Post-fetch phase - After the pre-fetch phase has completed, the crawl operation is ready to index documents during the post-fetch phase. The crawl will iterate through all items identified in the pre-fetch phase and index them into the pipeline. If there is file content associated with a pre-fetch list item, that content will be downloaded and parsed using the Fusion parser.
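The two phases can be sketched as follows. This is a simplified model under stated assumptions: the pre-fetch database is a plain list, `download` stands in for fetching and parsing file content, and all names and sample items are hypothetical rather than taken from the connector.

```python
def pre_fetch(source_items, batch_size):
    """Phase 1: collect metadata only, in batches. File content is never read."""
    database = []
    for start in range(0, len(source_items), batch_size):
        batch = source_items[start:start + batch_size]
        database.extend(
            {
                "id": item["id"],
                "title": item["title"],
                "has_file": item["file"] is not None,
            }
            for item in batch
        )
    return database


def post_fetch(database, download, index):
    """Phase 2: walk the pre-fetch database, downloading content only when present."""
    for record in database:
        doc = {"id": record["id"], "title": record["title"]}
        if record["has_file"]:
            doc["body"] = download(record["id"])
        index.append(doc)


# Hypothetical list items; "file" is None when an item has no attached content.
source = [
    {"id": "1", "title": "Plan", "file": "plan text"},
    {"id": "2", "title": "Task", "file": None},
    {"id": "3", "title": "Spec", "file": "spec text"},
]
files = {item["id"]: item["file"] for item in source}
downloads = []


def download(item_id):
    downloads.append(item_id)  # record which items triggered a content fetch
    return files[item_id]


index = []
post_fetch(pre_fetch(source, batch_size=2), download, index)
print(downloads, index)
```

All three items are indexed, but content is downloaded only for the two that have a file attached.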