Process Sitemaps on Web Sites

The Web connector retrieves data from a Web site over HTTP, starting from a specified URL.

The connector can crawl sitemaps. Add the sitemap URL(s) to the f.sitemapURLs property (Sitemap URLs in the UI), and every URL found in each sitemap is added to the list of URLs to crawl. Sitemap indexes (that is, sitemaps that point to other sitemaps) are also supported: the URLs found through each referenced sitemap are added to the same list.
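To illustrate how a sitemap index expands into a list of URLs to crawl, here is a minimal Python sketch. It is not the connector's implementation; the function name collect_urls, the fetch callback, and the example.com documents are all hypothetical and exist only for this illustration. It assumes sitemaps that use the standard sitemaps.org namespace.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap and sitemap index files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def collect_urls(sitemap_url, fetch, seen=None):
    """Return the page URLs listed in a sitemap, following sitemap indexes.

    `fetch` is any callable that maps a URL to the XML text at that URL.
    """
    seen = set() if seen is None else seen
    if sitemap_url in seen:  # guard against indexes that reference each other
        return []
    seen.add(sitemap_url)

    root = ET.fromstring(fetch(sitemap_url))
    locs = [el.text.strip() for el in root.iter(SITEMAP_NS + "loc") if el.text]

    if root.tag == SITEMAP_NS + "sitemapindex":
        # Each <loc> points to another sitemap: expand it recursively.
        urls = []
        for child_url in locs:
            urls.extend(collect_urls(child_url, fetch, seen))
        return urls

    # A plain sitemap: each <loc> is a page URL to crawl.
    return locs


# Hypothetical in-memory documents standing in for HTTP responses.
NS_ATTR = 'xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"'
DOCS = {
    "https://example.com/sitemap_index.xml": (
        f"<sitemapindex {NS_ATTR}>"
        "<sitemap><loc>https://example.com/sitemap-a.xml</loc></sitemap>"
        "<sitemap><loc>https://example.com/sitemap-b.xml</loc></sitemap>"
        "</sitemapindex>"
    ),
    "https://example.com/sitemap-a.xml": (
        f"<urlset {NS_ATTR}>"
        "<url><loc>https://example.com/page1</loc></url>"
        "<url><loc>https://example.com/page2</loc></url>"
        "</urlset>"
    ),
    "https://example.com/sitemap-b.xml": (
        f"<urlset {NS_ATTR}>"
        "<url><loc>https://example.com/page3</loc></url>"
        "</urlset>"
    ),
}

if __name__ == "__main__":
    for url in collect_urls("https://example.com/sitemap_index.xml", DOCS.__getitem__):
        print(url)
```

The in-memory fetch keeps the sketch runnable without network access; a real crawl would retrieve each document over HTTP.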

To configure your datasource to crawl only the sitemap file, add the sitemap URL to both the startLinks property (because startLinks is a required property for a datasource) and the f.sitemapURLs property, so that the connector treats it as a sitemap when it starts.
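As a rough sketch of the sitemap-only setup, the Python snippet below builds the two properties with the same URL. Only the property names startLinks and f.sitemapURLs come from the text above; the flat dictionary layout, the list-valued properties, the helper name sitemap_only_properties, and the example.com URL are assumptions for illustration, and a real datasource configuration may nest these properties or require additional fields.

```python
import json

# Hypothetical sitemap location, used only for this illustration.
SITEMAP_URL = "https://example.com/sitemap.xml"


def sitemap_only_properties(sitemap_url):
    """Build the two properties needed to crawl only a sitemap.

    The flat layout and list values are assumptions, not the
    connector's documented schema.
    """
    return {
        "startLinks": [sitemap_url],     # required for every datasource
        "f.sitemapURLs": [sitemap_url],  # marks the same URL as a sitemap
    }


if __name__ == "__main__":
    print(json.dumps(sitemap_only_properties(SITEMAP_URL), indent=2))
```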