Fusion 5.12

    Configure a SharePoint V1 Optimized Datasource

    The SharePoint connector retrieves content and metadata from an on-premises SharePoint repository.

    1. Decide what you need to crawl

    First, determine what you need to crawl, then choose your "Start Links" accordingly.

    Choose one of the following:

    How to crawl an entire SharePoint Web application

    1. Leave the Limit Documents > Fetch all site collections option checked (as it is by default).

    2. Specify the Web application URL as a site.

      For example: https://lucidworks.sharepoint.local/

    Crawling an entire SharePoint Web application requires administrative access to SharePoint.

    How to crawl a subset of SharePoint site collections

    1. Uncheck the Limit Documents > Fetch all site collections option.

    2. Specify a "Start Link" for each site collection that you want to crawl.

      Examples: https://lucidworks.sharepoint.local/sites/site1, https://lucidworks.sharepoint.local/sites/site2, https://lucidworks.sharepoint.local/sites/site3

    How to crawl a specific sub-site, list, or list item

    1. Uncheck the Limit Documents > Fetch all site collections option.

    2. Specify a "Start Link" for each site collection that contains the item you want to fetch.

    3. Specify a non-wildcard Inclusive Regular Expression for each parent.

      For example, to crawl https://lucidworks.sharepoint.local/sites/mysitecol/myparentsite/somesite, you must add an inclusive regex for every parent along the way:

      https\:\/\/lucidworks\.sharepoint\.local\/sites\/mysitecol
      https\:\/\/lucidworks\.sharepoint\.local\/sites\/mysitecol\/myparentsite
      https\:\/\/lucidworks\.sharepoint\.local\/sites\/mysitecol\/myparentsite\/somesite
      https\:\/\/lucidworks\.sharepoint\.local\/sites\/mysitecol\/myparentsite\/somesite\/.*

      If you exclude a parent of the site, the connector does not crawl the site, because it never spiders down to it during the crawl.
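
    The parent-chain rule above can be sketched in a few lines of Python. This is only an illustration: the function name inclusive_regexes is hypothetical, and the escaping comes from Python's re.escape, which escapes fewer characters than the hand-written patterns above (for example, it leaves ":" and "/" unescaped) but matches the same URLs. It assumes the target URL sits inside the given site collection.

```python
import re
from typing import List


def inclusive_regexes(target_url: str, site_collection_url: str) -> List[str]:
    """Return one non-wildcard regex per parent of target_url, starting at the
    site collection, followed by a wildcard regex for everything below the target.

    Hypothetical helper for illustration; it is not part of Fusion.
    """
    target = target_url.rstrip("/")
    root = site_collection_url.rstrip("/")
    if target != root and not target.startswith(root + "/"):
        raise ValueError("target_url must be inside site_collection_url")

    # Path segments between the site collection and the target,
    # for example ["myparentsite", "somesite"].
    segments = [s for s in target[len(root):].split("/") if s]

    # Start with the site collection itself, then add each level in turn.
    current = root
    patterns = [re.escape(current)]
    for segment in segments:
        current = f"{current}/{segment}"
        patterns.append(re.escape(current))

    # Wildcard entry so that items below the target are included as well.
    patterns.append(re.escape(target) + "/.*")
    return patterns


if __name__ == "__main__":
    for pattern in inclusive_regexes(
        "https://lucidworks.sharepoint.local/sites/mysitecol/myparentsite/somesite",
        "https://lucidworks.sharepoint.local/sites/mysitecol",
    ):
        print(pattern)
```

    Generating the list from the target URL makes it harder to skip an intermediate parent by accident. Review the generated patterns before pasting them into the datasource configuration.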

    2. Set up permissions for the crawl

    You have two options here:

    • Set up a crawl account with only as much permission as it needs.

      This approach has the security advantage of giving Fusion only minimal access. However, the crawl account cannot retrieve the list of site collections behind a Web application URL, and it cannot access the SharePoint Tenant Admin API to list all the site collections on your tenant. If you choose this option, you must enter each site collection to crawl in Start Links.

    • Provide administrative access for the crawl.

    How to set up a crawl account

    1. Create a Lucidworks Fusion crawl permission

    1. Navigate to Central Administration > Manage web application > Permission Policy.

    2. Click Add permission policy level. In this example, the permission level is named "fusion_crawl_policy".

    3. If you need to list all site collections in a SharePoint web application, select the option Site Collection Auditor:

      [Figure: Fusion SharePoint Crawl Permissions]

    4. Grant the following permissions:

      List Permissions

      • View Items - View items in lists and documents in document libraries.

      • Open Items - View the source of documents with server-side file handlers.

      • View Versions - View past versions of a list item or document.

      • View Application Pages - View forms, views, and application pages. Enumerate lists.

      Site Permissions

      • Browse Directories - Enumerate files and folders in a Web site using SharePoint Designer and Web DAV interfaces.

      • View Pages - View pages in a Web site.

      • Enumerate Permissions - Enumerate permissions on the Web site, list, folder, document, or list item.

      • Browse User Information - View information about users of the Web site.

      • Use Remote Interfaces - Use SOAP, Web DAV, the Client Object Model or SharePoint Designer interfaces to access the Web site.

      • Open - Allows users to open a Web site, list, or folder in order to access items inside that container.
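
    When you review the policy level later, a small checklist can help you spot a permission that was left unchecked. The sketch below is only an illustration: the names REQUIRED_PERMISSIONS and missing_permissions are hypothetical, and the list of granted permissions has to be typed in by hand from the Central Administration page because the code does not query SharePoint.

```python
from typing import Iterable, List

# Permissions listed in the step above.
REQUIRED_PERMISSIONS = {
    "View Items",
    "Open Items",
    "View Versions",
    "View Application Pages",
    "Browse Directories",
    "View Pages",
    "Enumerate Permissions",
    "Browse User Information",
    "Use Remote Interfaces",
    "Open",
}


def missing_permissions(granted: Iterable[str]) -> List[str]:
    """Return the required permissions that are absent from `granted`, sorted by name."""
    return sorted(REQUIRED_PERMISSIONS - set(granted))


if __name__ == "__main__":
    # Hypothetical input: the permissions currently checked on the policy level.
    checked = ["View Items", "Open Items", "View Pages", "Open"]
    for name in missing_permissions(checked):
        print("Missing:", name)
```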

    2. Grant user permission to the user policy

    1. Navigate to Central Administration > Manage web application > User Policy > Add Users.

    2. Create a new user with the new policy permission level, "fusion_crawl_policy", selected:

      [Figure: SharePoint Permission Policy Level]

    3. Test user permissions

    The following PowerShell script verifies the permissions of the user account created to crawl SharePoint from Fusion.

    The script must be run as the user account on which the permissions were set:

    • If the rights were granted to your own account, run the script yourself to verify that they are set correctly.

    • If the rights were granted to a different user account, the owner of that account must run the script.

    1. Save the script with the following file name: test-sharepoint-permissions.ps1.

    2. Set the $site_col_url variable at the top of the script to the first of the site collection URLs to crawl.

    3. Save the changes.

    Permission verification script

    $site_col_url="https://your.sharepoint.local/sites/mysitecollection"
    
    $cred = (Get-Credential)
    
    if (-not ([System.Management.Automation.PSTypeName]'ServerCertificateValidationCallback').Type)
    {
    $certCallback = @"
        using System;
        using System.Net;
        using System.Net.Security;
        using System.Security.Cryptography.X509Certificates;
        public class ServerCertificateValidationCallback
        {
            public static void Ignore()
            {
                if(ServicePointManager.ServerCertificateValidationCallback ==null)
                {
                    ServicePointManager.ServerCertificateValidationCallback +=
                        delegate
                        (
                            Object obj,
                            X509Certificate certificate,
                            X509Chain chain,
                            SslPolicyErrors errors
                        )
                        {
                            return true;
                        };
                }
            }
        }
    "@
        Add-Type $certCallback
     }
    
    [System.Net.ServicePointManager]::SecurityProtocol = [System.Net.SecurityProtocolType]::Tls12  # force TLS 1.2
    [ServerCertificateValidationCallback]::Ignore()  # accept any server certificate for this session; use for testing only
    
    $headers = New-Object "System.Collections.Generic.Dictionary[[String],[String]]"
    $headers.Add("Content-Type", "text/xml")
    $headers.Add("SOAPAction", "http://schemas.microsoft.com/sharepoint/soap/GetUpdatedFormDigestInformation")
    $headers.Add("X-RequestForceAuthentication", "true")
    $headers.Add("X-FORMS_BASED_AUTH_ACCEPTED", "f")
    
    $body = "<?xml version=`"1.0`" encoding=`"utf-8`"?>`n<soap:Envelope xmlns:xsi=`"http://www.w3.org/2001/XMLSchema-instance`" xmlns:xsd=`"http://www.w3.org/2001/XMLSchema`" xmlns:soap=`"http://schemas.xmlsoap.org/soap/envelope/`">`n  <soap:Body>`n    <GetUpdatedFormDigestInformation xmlns=`"http://schemas.microsoft.com/sharepoint/soap/`" />`n  </soap:Body>`n</soap:Envelope>"
    
    $response = Invoke-RestMethod "${site_col_url}/_vti_bin/sites.asmx" -Method 'POST' -Headers $headers -Body $body -Credential $cred
    
    $digest_value = $response.Envelope.Body.GetUpdatedFormDigestInformationResponse.FirstChild.DigestValue
    
    
    $headers = New-Object "System.Collections.Generic.Dictionary[[String],[String]]"
    $headers.Add("Content-Type", "text/xml")
    $headers.Add("X-RequestForceAuthentication", "true")
    $headers.Add("X-RequestDigest", $digest_value)
    $headers.Add("Accept", "application/json")
    $headers.Add("X-FORMS_BASED_AUTH_ACCEPTED", "f")
    
    $body = @'
    <Request AddExpandoFieldTypeSuffix="true" SchemaVersion="14.0.0.0" LibraryVersion="16.0.0.0"
             ApplicationName=".NET Library" xmlns="http://schemas.microsoft.com/sharepoint/clientquery/2009">
        <Actions>
            <ObjectPath Id="2" ObjectPathId="1"/>
            <ObjectPath Id="4" ObjectPathId="3"/>
            <Query Id="5" ObjectPathId="3">
                <Query SelectAllProperties="false">
                    <Properties>
                        <Property Name="Webs" SelectAll="true">
                            <Query SelectAllProperties="false">
                                <Properties/>
                            </Query>
                        </Property>
                        <Property Name="Title" ScalarProperty="true"/>
                        <Property Name="ServerRelativeUrl" ScalarProperty="true"/>
                        <Property Name="RoleDefinitions" SelectAll="true">
                            <Query SelectAllProperties="false">
                                <Properties/>
                            </Query>
                        </Property>
                        <Property Name="RoleAssignments" SelectAll="true">
                            <Query SelectAllProperties="false">
                                <Properties/>
                            </Query>
                        </Property>
                        <Property Name="HasUniqueRoleAssignments" ScalarProperty="true"/>
                        <Property Name="Description" ScalarProperty="true"/>
                        <Property Name="Id" ScalarProperty="true"/>
                        <Property Name="LastItemModifiedDate" ScalarProperty="true"/>
                    </Properties>
                </Query>
            </Query>
        </Actions>
        <ObjectPaths>
            <StaticProperty Id="1" TypeId="{3747adcd-a3c3-41b9-bfab-4a64dd2f1e0a}" Name="Current"/>
            <Property Id="3" ParentId="1" Name="Web"/>
        </ObjectPaths>
    </Request>
    '@
    
    $response = Invoke-RestMethod "${site_col_url}/_vti_bin/client.svc/ProcessQuery" -Method 'POST' -Headers $headers -Body $body -Credential $cred
    $response | ConvertTo-Json -Depth 100

    Successful query response

    If the test script executes successfully, metadata is returned. The following is a sample of a successful response:

    test-sharepoint-permissions.ps1
    cmdlet Get-Credential at command pipeline position 1
    Supply values for the following parameters:
    [
        {
            "SchemaVersion":  "14.0.0.0",
            "LibraryVersion":  "16.0.10337.12109",
            "ErrorInfo":  null,
            "TraceCorrelationId":  "c419a69f-1c06-b07f-b69b-4d7720fd7756"
        },
        2,
        {
            "IsNull":  false
        },
        4,
        {
            "IsNull":  false
        },
        5,
        {
            "_ObjectType_":  "SP.Web",
            "_ObjectIdentity_":  "c419a69f-1c06-b07f-b69b-4d7720fd7756|740c6a0b-85e2-48a0-a494-e0f1759d4aa7:site:8992a373-cdf0-4262-b240-9527c7174682:web:2080d74c-e181-43df-829f-ad5bee97b6f8",
            "Webs":  {
                         "_ObjectType_":  "SP.WebCollection",
                         "_Child_Items_":  [
                                               {
                                                   "_ObjectType_":  "SP.Web",
           ... truncated for brevity ...
    
            "LastItemModifiedDate":  "\/Date(1603731388000)\/"
        }
    ]
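
    If you save the script output to a file, a short check can confirm that it looks like the successful response above. The following Python sketch is an illustration only: the function name summarize_process_query is hypothetical, and it assumes that a non-null ErrorInfo in the first element signals a server-side error and that a successful response contains an object whose _ObjectType_ is SP.Web, as in the sample.

```python
import json
from typing import Any, Dict, List


def summarize_process_query(payload: List[Any]) -> Dict[str, Any]:
    """Summarize a ProcessQuery response that has been parsed from JSON.

    Hypothetical helper: it inspects only the header element and looks for
    an SP.Web object among the remaining elements.
    """
    header = payload[0] if payload and isinstance(payload[0], dict) else {}
    web = next(
        (
            item
            for item in payload[1:]
            if isinstance(item, dict) and item.get("_ObjectType_") == "SP.Web"
        ),
        None,
    )
    error_info = header.get("ErrorInfo")
    return {
        "error_info": error_info,
        "found_web": web is not None,
        "ok": error_info is None and web is not None,
    }


if __name__ == "__main__":
    # Trimmed version of the sample response; the empty child list is a placeholder.
    sample = json.loads(
        """
        [
          {"SchemaVersion": "14.0.0.0", "LibraryVersion": "16.0.10337.12109", "ErrorInfo": null},
          2, {"IsNull": false},
          4, {"IsNull": false},
          5, {"_ObjectType_": "SP.Web",
              "Webs": {"_ObjectType_": "SP.WebCollection", "_Child_Items_": []}}
        ]
        """
    )
    print(summarize_process_query(sample))
```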

    Failed query response

    If the test script fails, one of the following occurs:

    • An error code is returned, for example 401.

    • An error message with explanatory information is returned. The following is a sample of a failed response:

    Credential
    Invoke-RestMethod : The remote server returned an error: (401) Unauthorized.
    At C:\Users\nicho\Documents\test-sharepoint-permissions.ps1:47 char:13
    + $response = Invoke-RestMethod "${site_col_url}/_vti_bin/sites.asmx" - ...
    +             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        + CategoryInfo          : InvalidOperation: (System.Net.HttpWebRequest:HttpWebRequest) [Invoke-RestMethod], WebExc
       eption
        + FullyQualifiedErrorId : WebCmdletWebResponseException,Microsoft.PowerShell.Commands.InvokeRestMethodCommand
    
    Invoke-RestMethod : The remote server returned an error: (401) Unauthorized.
    At C:\Users\nicho\Documents\test-sharepoint-permissions.ps1:100 char:13
    + $response = Invoke-RestMethod "${site_col_url}/_vti_bin/client.svc/Pr ...
    +             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        + CategoryInfo          : InvalidOperation: (System.Net.HttpWebRequest:HttpWebRequest) [Invoke-RestMethod], WebExc
       eption
        + FullyQualifiedErrorId : WebCmdletWebResponseException,Microsoft.PowerShell.Commands.InvokeRestMethodCommand
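
    To triage a failed run quickly, you can pull the HTTP status codes out of the console output. The Python sketch below is an illustration only: the names http_status_codes and explain are hypothetical, the pattern assumes the "returned an error: (NNN)" wording shown in the sample above, and the hints reflect general HTTP semantics rather than SharePoint-specific diagnostics.

```python
import re
from typing import List

# Matches the status code in lines such as
# "The remote server returned an error: (401) Unauthorized."
STATUS_PATTERN = re.compile(r"returned an error: \((\d{3})\)")

# General HTTP meanings; these are not SharePoint-specific diagnostics.
HINTS = {
    401: "authentication failed; check the user name, password, and domain",
    403: "authenticated, but the account is not allowed to perform the request",
}


def http_status_codes(console_output: str) -> List[int]:
    """Return every HTTP status code found in the console output, in order."""
    return [int(code) for code in STATUS_PATTERN.findall(console_output)]


def explain(console_output: str) -> List[str]:
    """Return one 'code: hint' line per status code found in the output."""
    return [
        f"{code}: {HINTS.get(code, 'no hint available')}"
        for code in http_status_codes(console_output)
    ]


if __name__ == "__main__":
    output = "Invoke-RestMethod : The remote server returned an error: (401) Unauthorized."
    for line in explain(output):
        print(line)
```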