Configure a SharePoint V2 Datasource
1. Decide what to crawl
Determine what to crawl and select one of the following:
-
An entire SharePoint Web application (all site collections in a specific SharePoint URL).
How to crawl an entire SharePoint Web application
-
Verify the Limit Documents > Fetch all site collections option is selected (default).
-
Specify the Web application URL as a site.
For example:
https://lucidworks.sharepoint.local/
Administrative access to SharePoint is required to crawl an entire SharePoint Web application.
How to crawl a subset of SharePoint site collections
-
Uncheck the Limit Documents > Fetch all site collections option.
-
Specify a "Start Link" for each site collection to crawl.
Examples include:
-
https://lucidworks.sharepoint.local/sites/site1
-
https://lucidworks.sharepoint.local/sites/site2
-
https://lucidworks.sharepoint.local/sites/site3
How to crawl a specific sub-site, list, or list item
-
Uncheck the Limit Documents > Fetch all site collections option.
-
Specify a "Start Link" for each site collection that contains the item to fetch.
-
Specify a non-wildcard Inclusive Regular Expression for each parent.
For example, to crawl https://lucidworks.sharepoint.local/sites/mysitecol/myparentsite/somesite, you must include inclusive regexes for every parent, for the site itself, and for its contents:
https\:\/\/lucidworks\.sharepoint\.local\/sites\/mysitecol
https\:\/\/lucidworks\.sharepoint\.local\/sites\/mysitecol\/myparentsite
https\:\/\/lucidworks\.sharepoint\.local\/sites\/mysitecol\/myparentsite\/somesite
https\:\/\/lucidworks\.sharepoint\.local\/sites\/mysitecol\/myparentsite\/somesite\/.*
If you exclude a parent of the site, the connector does not crawl the site, because the crawl never descends to it.
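The parent-by-parent rule above is easy to get wrong by hand. The following Python sketch derives one inclusive regex per level, from the site collection down to the target, plus a wildcard pattern for the target's contents. The function names are hypothetical and the script is not part of Fusion; it only produces strings to paste into the Inclusive Regular Expression fields, and it assumes the same escaping style as the example above.

```python
import re


def escape_literal(text):
    """Backslash-escape every non-alphanumeric character (same style as the example above)."""
    return re.sub(r"([^A-Za-z0-9])", r"\\\1", text)


def inclusive_regexes(site_collection_url, target_url):
    """Return one pattern per level from the site collection to the target,
    followed by a wildcard pattern that matches everything under the target."""
    base = site_collection_url.rstrip("/")
    target = target_url.rstrip("/")
    if not (target == base or target.startswith(base + "/")):
        raise ValueError("target_url must be inside site_collection_url")

    patterns = [escape_literal(base)]
    current = base
    # Walk down one path segment at a time so that no parent is skipped.
    for segment in [s for s in target[len(base):].split("/") if s]:
        current = current + "/" + segment
        patterns.append(escape_literal(current))

    patterns.append(escape_literal(target) + r"\/.*")
    return patterns


if __name__ == "__main__":
    for pattern in inclusive_regexes(
        "https://lucidworks.sharepoint.local/sites/mysitecol",
        "https://lucidworks.sharepoint.local/sites/mysitecol/myparentsite/somesite",
    ):
        print(pattern)
```

For the example URL, the sketch yields four patterns: the site collection, the parent site, the target site, and the target site followed by `\/.*`.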
2. Create permission and user policy for the crawl
The options are:
-
Set up an on-prem crawl account with only as much permission as it needs.
This approach has the security advantage of granting Fusion only minimal access. However, the crawl account cannot retrieve the list of site collections behind a Web application URL.
-
Set up an online crawl account with only as much permission as it needs.
This approach has the security advantage of granting Fusion only minimal access. However, the crawl account cannot retrieve the list of site collections behind a Web application URL.
How to set up an on-prem crawl account
Create a permission policy level
-
Navigate to Central Administration > Manage web application > Permission Policy.
-
Select Add permission policy level. In this example, the permission level is named fusion_crawl_policy.
-
If you need to list all site collections in a SharePoint web application, select the Site Collection Auditor option.
-
Grant the following permissions:
-
View Items - View items in lists and documents in document libraries.
-
Open Items - View the source of documents with server-side file handlers.
-
View Versions - View past versions of a list item or document.
-
View Application Pages - View forms, views, and application pages. Enumerate lists.
Site Permissions
-
Browse Directories - Enumerate files and folders in a Web site using SharePoint Designer and Web DAV interfaces.
-
View Pages - View pages in a Web site.
-
Enumerate Permissions - Enumerate permissions on the Web site, list, folder, document, or list item.
-
Browse User Information - View information about users of the Web site.
-
Use Remote Interfaces - Use SOAP, Web DAV, the Client Object Model or SharePoint Designer interfaces to access the Web site.
-
Open - Allows users to open a Web site, list, or folder in order to access items inside that container.
Grant user permission to the user policy
-
Navigate to Central Administration > Manage web application > User Policy > Add Users.
-
Create a new user with the new fusion_crawl_policy permission level selected.
How to set up an online crawl account
Create a permission policy level
-
Navigate to Site settings > Site permissions > Advanced Permission Settings.
-
Select New permission level. In this example, the permission level is named fusion_crawl_policy.
-
Grant the following permissions:
-
View Items - View items in lists and documents in document libraries.
-
Open Items - View the source of documents with server-side file handlers.
-
View Versions - View past versions of a list item or document.
-
View Application Pages - View forms, views, and application pages. Enumerate lists.
Site Permissions
-
Browse Directories - Enumerate files and folders in a Web site using SharePoint Designer and Web DAV interfaces.
-
View Pages - View pages in a Web site.
-
Enumerate Permissions - Enumerate permissions on the Web site, list, folder, document, or list item.
-
Browse User Information - View information about users of the Web site.
-
Use Remote Interfaces - Use SOAP, Web DAV, the Client Object Model or SharePoint Designer interfaces to access the Web site.
-
Open - Allows users to open a Web site, list, or folder in order to access items inside that container.
Grant user permission
-
Navigate to Site settings > Site permissions > Advanced Permission Settings.
-
Select Grant permissions.
-
Enter the new user name and add the user.
-
Select a value in the Select a permission level field.
-
Select Share.
-
In the Edit Permissions > Choose Permissions section, select the following check boxes:
-
Read. Can view pages and list items and download documents.
-
LW Fusion.
-
Select OK to save the information.
If you grant the service account the Site Collection Auditor permission, the Lucidworks Fusion SharePoint connector has write-level permission and can list:
How to provide admin access to crawl
See the SharePoint documentation for instructions.
3. Test user permissions
The following PowerShell script verifies permissions on the user account created to crawl SharePoint from Fusion.
The script must be run by the user account on which the permissions were set. If rights were granted:
-
Save the script with the following file name: test-sharepoint-permissions.ps1.
-
Enter the first of the site collection URLs to crawl in the $site_col_url variable of the script.
-
Save the changes.
Permission verification script
# URL of the first site collection to crawl.
$site_col_url = "https://your.sharepoint.local/sites/mysitecollection"
# Prompt for the crawl account's credentials.
$cred = (Get-Credential)
# Define a helper type that accepts any server certificate, unless it is already loaded in this session.
if (-not ([System.Management.Automation.PSTypeName]'ServerCertificateValidationCallback').Type)
{
    $certCallback = @"
    using System;
    using System.Net;
    using System.Net.Security;
    using System.Security.Cryptography.X509Certificates;
    public class ServerCertificateValidationCallback
    {
        public static void Ignore()
        {
            if (ServicePointManager.ServerCertificateValidationCallback == null)
            {
                ServicePointManager.ServerCertificateValidationCallback +=
                    delegate
                    (
                        Object obj,
                        X509Certificate certificate,
                        X509Chain chain,
                        SslPolicyErrors errors
                    )
                    {
                        return true;
                    };
            }
        }
    }
"@
    Add-Type $certCallback
}
# Use TLS 1.2 and skip certificate validation for the test requests.
[System.Net.ServicePointManager]::SecurityProtocol = [System.Net.SecurityProtocolType]::Tls12;
[ServerCertificateValidationCallback]::Ignore()
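# First request: ask the Sites web service for a form digest, which the second request sends as X-RequestDigest.
$headers = New-Object "System.Collections.Generic.Dictionary[[String],[String]]"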
$headers.Add("Content-Type", "text/xml")
$headers.Add("SOAPAction", "http://schemas.microsoft.com/sharepoint/soap/GetUpdatedFormDigestInformation")
$headers.Add("X-RequestForceAuthentication", "true")
$headers.Add("X-FORMS_BASED_AUTH_ACCEPTED", "f")
$body = "<?xml version=`"1.0`" encoding=`"utf-8`"?>`n<soap:Envelope xmlns:xsi=`"http://www.w3.org/2001/XMLSchema-instance`" xmlns:xsd=`"http://www.w3.org/2001/XMLSchema`" xmlns:soap=`"http://schemas.xmlsoap.org/soap/envelope/`">`n <soap:Body>`n <GetUpdatedFormDigestInformation xmlns=`"http://schemas.microsoft.com/sharepoint/soap/`" />`n </soap:Body>`n</soap:Envelope>"
$response = Invoke-RestMethod "${site_col_url}/_vti_bin/sites.asmx" -Method 'POST' -Headers $headers -Body $body -Credential $cred
$digest_value = $response.Envelope.Body.GetUpdatedFormDigestInformationResponse.FirstChild.DigestValue
# Second request: query the web at $site_col_url for its sub-webs, role definitions, and role assignments.
$headers = New-Object "System.Collections.Generic.Dictionary[[String],[String]]"
$headers.Add("Content-Type", "text/xml")
$headers.Add("X-RequestForceAuthentication", "true")
$headers.Add("X-RequestDigest", $digest_value)
$headers.Add("Accept", "application/json")
$headers.Add("X-FORMS_BASED_AUTH_ACCEPTED", "f")
$body = @'
<Request AddExpandoFieldTypeSuffix="true" SchemaVersion="14.0.0.0" LibraryVersion="16.0.0.0"
         ApplicationName=".NET Library" xmlns="http://schemas.microsoft.com/sharepoint/clientquery/2009">
    <Actions>
        <ObjectPath Id="2" ObjectPathId="1"/>
        <ObjectPath Id="4" ObjectPathId="3"/>
        <Query Id="5" ObjectPathId="3">
            <Query SelectAllProperties="false">
                <Properties>
                    <Property Name="Webs" SelectAll="true">
                        <Query SelectAllProperties="false">
                            <Properties/>
                        </Query>
                    </Property>
                    <Property Name="Title" ScalarProperty="true"/>
                    <Property Name="ServerRelativeUrl" ScalarProperty="true"/>
                    <Property Name="RoleDefinitions" SelectAll="true">
                        <Query SelectAllProperties="false">
                            <Properties/>
                        </Query>
                    </Property>
                    <Property Name="RoleAssignments" SelectAll="true">
                        <Query SelectAllProperties="false">
                            <Properties/>
                        </Query>
                    </Property>
                    <Property Name="HasUniqueRoleAssignments" ScalarProperty="true"/>
                    <Property Name="Description" ScalarProperty="true"/>
                    <Property Name="Id" ScalarProperty="true"/>
                    <Property Name="LastItemModifiedDate" ScalarProperty="true"/>
                </Properties>
            </Query>
        </Query>
    </Actions>
    <ObjectPaths>
        <StaticProperty Id="1" TypeId="{3747adcd-a3c3-41b9-bfab-4a64dd2f1e0a}" Name="Current"/>
        <Property Id="3" ParentId="1" Name="Web"/>
    </ObjectPaths>
</Request>
'@
# Send the query to the client object model endpoint and print the full response as JSON.
$response = Invoke-RestMethod "${site_col_url}/_vti_bin/client.svc/ProcessQuery" -Method 'POST' -Headers $headers -Body $body -Credential $cred
$response | ConvertTo-Json -Depth 100
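To illustrate how the two requests in the script fit together, the following Python sketch extracts a form digest from a SOAP response and builds the headers for the follow-up query. It performs no network calls. The sample response is made up for illustration (the digest value is a placeholder and the element layout is an assumption based on the script's `FirstChild.DigestValue` lookup), and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by the SharePoint SOAP service in the script above.
SP_NS = "http://schemas.microsoft.com/sharepoint/soap/"

# Hypothetical response body, shaped after the script's lookup path.
SAMPLE_RESPONSE = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <GetUpdatedFormDigestInformationResponse xmlns="http://schemas.microsoft.com/sharepoint/soap/">
      <GetUpdatedFormDigestInformationResult>
        <DigestValue>0xEXAMPLE,01 Jan 2020 00:00:00 -0000</DigestValue>
      </GetUpdatedFormDigestInformationResult>
    </GetUpdatedFormDigestInformationResponse>
  </soap:Body>
</soap:Envelope>"""


def extract_digest(response_xml):
    """Return the text of the first DigestValue element in the response."""
    root = ET.fromstring(response_xml)
    node = root.find(f".//{{{SP_NS}}}DigestValue")
    if node is None or not node.text:
        raise ValueError("no DigestValue found in response")
    return node.text


def query_headers(digest):
    """Mirror the headers the script sends with the second request."""
    return {
        "Content-Type": "text/xml",
        "X-RequestForceAuthentication": "true",
        "X-RequestDigest": digest,
        "Accept": "application/json",
        "X-FORMS_BASED_AUTH_ACCEPTED": "f",
    }


if __name__ == "__main__":
    print(query_headers(extract_digest(SAMPLE_RESPONSE)))
```

If the first request fails, there is no digest to pass along, which is why a permissions problem typically shows up as two errors, one per request.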
Successful query response
If the test script executes successfully, metadata is returned. The following is a sample of a successful response:
test-sharepoint-permissions.ps1
cmdlet Get-Credential at command pipeline position 1
Supply values for the following parameters:
[
    {
        "SchemaVersion": "14.0.0.0",
        "LibraryVersion": "16.0.10337.12109",
        "ErrorInfo": null,
        "TraceCorrelationId": "c419a69f-1c06-b07f-b69b-4d7720fd7756"
    },
    2,
    {
        "IsNull": false
    },
    4,
    {
        "IsNull": false
    },
    5,
    {
        "_ObjectType_": "SP.Web",
        "_ObjectIdentity_": "c419a69f-1c06-b07f-b69b-4d7720fd7756|740c6a0b-85e2-48a0-a494-e0f1759d4aa7:site:8992a373-cdf0-4262-b240-9527c7174682:web:2080d74c-e181-43df-829f-ad5bee97b6f8",
        "Webs": {
            "_ObjectType_": "SP.WebCollection",
            "_Child_Items_": [
                {
                    "_ObjectType_": "SP.Web",
... truncated for brevity ...
        "LastItemModifiedDate": "\/Date(1603731388000)\/"
    }
]
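When the output is long, it can help to check the header object programmatically instead of reading the whole response. The Python sketch below treats a response as successful when the first array element has a null `ErrorInfo`, as in the sample above. It expects only the JSON portion of the output (without the credential prompt lines). The function name is hypothetical, and the `ErrorMessage` field in the failing sample is an assumption about how an error would be reported.

```python
import json


def query_succeeded(response_text):
    """Return (ok, message) for the JSON array printed by the test script."""
    payload = json.loads(response_text)
    if not isinstance(payload, list) or not payload or not isinstance(payload[0], dict):
        return False, "unexpected response shape"

    error = payload[0].get("ErrorInfo")
    if error is None:
        return True, "no error reported"

    # Assumption: error details, when present, carry an ErrorMessage field.
    if isinstance(error, dict):
        return False, error.get("ErrorMessage", "unknown error")
    return False, str(error)


if __name__ == "__main__":
    ok_sample = '[{"SchemaVersion": "14.0.0.0", "ErrorInfo": null}, 2, {"IsNull": false}]'
    bad_sample = '[{"ErrorInfo": {"ErrorMessage": "Access denied."}}]'  # hypothetical
    print(query_succeeded(ok_sample))
    print(query_succeeded(bad_sample))
```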
Failed query response
If the test script fails, either:
-
An error code is generated, for example 401.
-
An error message with explanatory information is returned. The following is a sample of a failed response:
Credential
Invoke-RestMethod : The remote server returned an error: (401) Unauthorized.
At C:\Users\nicho\Documents\test-sharepoint-permissions.ps1:47 char:13
+ $response = Invoke-RestMethod "${site_col_url}/_vti_bin/sites.asmx" - ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo : InvalidOperation: (System.Net.HttpWebRequest:HttpWebRequest) [Invoke-RestMethod], WebException
    + FullyQualifiedErrorId : WebCmdletWebResponseException,Microsoft.PowerShell.Commands.InvokeRestMethodCommand
Invoke-RestMethod : The remote server returned an error: (401) Unauthorized.
At C:\Users\nicho\Documents\test-sharepoint-permissions.ps1:100 char:13
+ $response = Invoke-RestMethod "${site_col_url}/_vti_bin/client.svc/Pr ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo : InvalidOperation: (System.Net.HttpWebRequest:HttpWebRequest) [Invoke-RestMethod], WebException
    + FullyQualifiedErrorId : WebCmdletWebResponseException,Microsoft.PowerShell.Commands.InvokeRestMethodCommand
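If you capture the script output to a file, a short parser can pull out the HTTP status codes so that they are easy to spot. The Python sketch below assumes the error lines follow the wording shown in the sample above; the function name is hypothetical.

```python
import re

# Matches lines such as: "The remote server returned an error: (401) Unauthorized."
ERROR_PATTERN = re.compile(r"The remote server returned an error: \((\d{3})\) ([^.]+)\.")


def http_errors(output):
    """Return a list of (status_code, reason) tuples found in the script output."""
    return [(int(code), reason) for code, reason in ERROR_PATTERN.findall(output)]


if __name__ == "__main__":
    sample = (
        "Invoke-RestMethod : The remote server returned an error: (401) Unauthorized.\n"
        "At C:\\Users\\nicho\\Documents\\test-sharepoint-permissions.ps1:47 char:13\n"
        "Invoke-RestMethod : The remote server returned an error: (401) Unauthorized.\n"
    )
    for code, reason in http_errors(sample):
        print(code, reason)
```

An empty list means that no error line in this format was found; it does not by itself prove that the query succeeded.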