Avoid SharePoint Throttling
When using a SharePoint connector to crawl SharePoint Online, rate limiting can be an issue. Learn more about throttling in SharePoint Online.
You have a few options to avoid throttling:
Decrease the number of threads
If you see many 429/503 errors, you are probably hitting SharePoint Online with too many concurrent fetchers.
- Set Crawl Performance > Fetch Threads to a lower value.
- Set Crawl Performance > Prefetch Threads to a lower value.
Stagger the datasource jobs
If you have multiple SharePoint Online datasource jobs that run at the same time, use the Scheduler to stagger them so they do not all start at once.
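As a rough illustration of staggering, the sketch below spaces out start times for several jobs by a fixed gap. The job IDs, the first start time, and the two-hour gap are hypothetical values chosen for the example; the helper is not part of the Scheduler, which you still configure yourself with the resulting times.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def stagger_start_times(
    job_ids: List[str], first_start: datetime, gap: timedelta
) -> Dict[str, datetime]:
    """Assign each job a start time, spaced `gap` apart, in list order."""
    return {job_id: first_start + i * gap for i, job_id in enumerate(job_ids)}


# Hypothetical datasource job IDs, used only for illustration.
plan = stagger_start_times(
    ["sharepoint-hr", "sharepoint-sales", "sharepoint-eng"],
    first_start=datetime(2024, 1, 1, 1, 0),
    gap=timedelta(hours=2),
)
for job_id, start in plan.items():
    print(job_id, start.strftime("%H:%M"))
```

Choose a gap that is at least as long as a typical crawl so that one job has finished, or nearly finished, before the next begins.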
Increase the number of retries
By default, the connector is configured with retries. This provides a chance for the requests that were rate-limited to run again.
You can increase the number of retries and the interval between them. The retry process uses exponential backoff, which lengthens the delay after each failed attempt to improve the chance that a later retry succeeds. This helps prevent missing documents due to rate limiting.
For SharePoint Optimized V2, retry configuration parameters include:
- Retry Delay
- Maximum Retries
- Delay Factor
- Maximum Delay Time
- Maximum Time Limit
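To see how these parameters might interact, here is a minimal sketch of an exponential backoff schedule. It assumes that Retry Delay is the first wait, Delay Factor multiplies the wait after each attempt, Maximum Delay Time caps any single wait, and Maximum Time Limit caps the total time spent waiting. Those meanings, the millisecond units, and the function name are assumptions made for illustration, not the connector's actual implementation.

```python
from typing import List


def backoff_delays(
    retry_delay_ms: int,
    max_retries: int,
    delay_factor: float,
    max_delay_ms: int,
    max_time_limit_ms: int,
) -> List[int]:
    """Return the wait (in ms) before each retry under exponential backoff.

    Parameter meanings are assumed from the setting names, not taken
    from the connector's source.
    """
    delays: List[int] = []
    elapsed = 0
    delay = float(retry_delay_ms)
    for _ in range(max_retries):
        # Cap each individual wait at the maximum delay.
        wait = int(min(delay, max_delay_ms))
        # Stop once the next wait would exceed the overall time budget.
        if elapsed + wait > max_time_limit_ms:
            break
        delays.append(wait)
        elapsed += wait
        # Grow the next wait by the delay factor.
        delay *= delay_factor
    return delays


print(backoff_delays(1000, 6, 2.0, 10000, 30000))
# prints [1000, 2000, 4000, 8000, 10000]
```

In this example the fifth wait is capped at 10 seconds, and the sixth retry is dropped because it would push the total wait past the 30-second limit. A larger delay factor spreads the same number of retries over a longer period, which gives the throttling more time to ease.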
If you receive many rate limiting errors, too many requests are probably being sent too frequently, and retrying alone may not help. One option is to decrease your traffic instead. If you want to continue sending the maximum number of requests, configure the Retryer backoff multiplier so that the delay grows after every retry. The crawler will slow significantly, which allows SharePoint to relax the throttling.