Box V1 Connector

      The Box connector retrieves content from Box, a cloud-based content repository. To fetch content from multiple Box users, you must create a Box app that uses OAuth 2.0 with JWT server authentication. For limited testing with a single user account, you can instead create a Box app that uses Standard OAuth 2.0 authentication.

      How the Box Connector Works

      When you crawl a Fusion datasource that uses the Box connector, the connector crawls the Box repository in two steps:

      1. Build a pre-fetch index – The Box connector crawls file metadata user by user and builds a distributed pre-fetch index that describes the structure of the files in the repository. The pre-fetch index contains only basic file metadata: file IDs and directory relationships. Fusion stores the pre-fetch index in Solr, in a system collection named system_box_distributed_crawl, which is shared by default by all datasources.

        The pre-fetch index lets the Box connector crawl files in random order, file by file, instead of user by user. This helps the connector work within Box rate limits.

      2. Build the file index – The Box connector crawls the repository file by file, using the pre-fetch index to fetch each file's contents and metadata. It then sends the resulting documents through the Fusion index pipeline.
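      The two steps above can be sketched as a small in-memory model. This is a hypothetical illustration, not the connector's code: the names PrefetchEntry, build_prefetch_index, and crawl_files, and the callbacks that stand in for Box API calls and the Fusion index pipeline, are all assumptions. It shows why the pre-fetch index matters: once every file ID is known, step 2 can visit files in any order instead of walking one user's files at a time.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class PrefetchEntry:
    """Hypothetical pre-fetch record: only the file ID and its parent directory."""
    file_id: str
    parent_id: str


def build_prefetch_index(
    users: Iterable[str],
    list_files: Callable[[str], Iterable[PrefetchEntry]],
) -> Dict[str, PrefetchEntry]:
    """Step 1: walk file metadata user by user and record IDs and parents."""
    index: Dict[str, PrefetchEntry] = {}
    for user in users:
        for entry in list_files(user):
            # A file visible to several users is recorded only once.
            index.setdefault(entry.file_id, entry)
    return index


def crawl_files(
    prefetch: Dict[str, PrefetchEntry],
    fetch_file: Callable[[str], dict],
    index_document: Callable[[dict], None],
    seed: int = 0,
) -> List[str]:
    """Step 2: fetch files one at a time, in random order, using the pre-fetch index."""
    file_ids = sorted(prefetch)
    random.Random(seed).shuffle(file_ids)  # random file order, not user by user
    for file_id in file_ids:
        document = fetch_file(file_id)  # stand-in for fetching contents and metadata
        document["parent_id"] = prefetch[file_id].parent_id
        index_document(document)  # stand-in for the Fusion index pipeline
    return file_ids
```

      For example, with files_by_user = {"alice": [PrefetchEntry("11", "0")], "bob": [PrefetchEntry("11", "0"), PrefetchEntry("13", "0")]}, build_prefetch_index(files_by_user, lambda u: files_by_user[u]) records file 11 once, and crawl_files then fetches files 11 and 13 exactly once each.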

      The initial crawl of a Box repository can take a long time (hours or days). After the initial crawl, both the pre-fetch step and the file-indexing step are incremental, so later crawls finish much more quickly.
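      This page does not say how the connector detects changes between crawls. A common approach, assumed here purely for illustration, is to store a version marker per file ID (for example, a modification timestamp) and refetch only files whose marker changed. The function plan_incremental_crawl is hypothetical.

```python
from typing import Dict, List, Tuple


def plan_incremental_crawl(
    previous: Dict[str, str], current: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Compare file ID -> version marker maps from two crawls (hypothetical model).

    Returns (to_fetch, removed):
      to_fetch: files that are new, or whose marker changed since the last crawl
      removed:  files recorded in the previous crawl but absent now
    """
    to_fetch = sorted(
        file_id
        for file_id, marker in current.items()
        if previous.get(file_id) != marker
    )
    removed = sorted(file_id for file_id in previous if file_id not in current)
    return to_fetch, removed
```

      With previous = {"11": "v1", "12": "v1", "13": "v1"} and current = {"11": "v1", "12": "v2", "14": "v1"}, only files 12 and 14 are refetched and file 13 is reported as removed; unchanged file 11 is skipped, which is why incremental crawls run faster than the initial one.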

      Fusion does not automatically delete the pre-fetch data, so if you want to run a new crawl with different start links, you must do one of the following to get new results:

      • Clear the system_box_distributed_crawl collection manually:

        1. Navigate to Collections > Collections Manager.

        2. Hover over system_box_distributed_crawl, and then click the Configure icon.

        3. Click Clear Collection.

      • Create a new distributed crawl collection for the datasource by entering a new name in the Distributed crawl collection name field (f.fs.distributedCrawlCollectionName) of the datasource configuration.
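      As a scripted alternative to the Clear Collection button, the following sketch deletes every document in the pre-fetch collection through Solr's JSON update handler (a delete-by-query on *:* followed by a commit). This swaps in a direct Solr request for the Fusion UI step described above; it assumes Solr is reachable at http://localhost:8983/solr, and the helper names build_clear_request and clear_collection are hypothetical. Check that it matches your deployment before running it.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Solr is reachable directly at this URL; adjust for your deployment.
SOLR_BASE_URL = "http://localhost:8983/solr"
PREFETCH_COLLECTION = "system_box_distributed_crawl"


def build_clear_request(base_url: str, collection: str) -> urllib.request.Request:
    """Build a Solr JSON update request that deletes all documents and commits."""
    url = f"{base_url.rstrip('/')}/{urllib.parse.quote(collection)}/update?commit=true"
    body = json.dumps({"delete": {"query": "*:*"}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def clear_collection(
    base_url: str = SOLR_BASE_URL, collection: str = PREFETCH_COLLECTION
) -> int:
    """Send the delete-all request and return the HTTP status code."""
    with urllib.request.urlopen(build_clear_request(base_url, collection)) as response:
        return response.status
```

      Calling clear_collection() empties system_box_distributed_crawl for every datasource that shares it; the second option, a per-datasource collection name, avoids affecting other datasources.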

      When you use folders as start links, the crawling user (the JWT App User ID) must either:

      • Own the start link folders, or

      • Have access to the start link folders

      The startLinks defined for the datasource must be numeric Box file and directory IDs. The root directory of every Box account has the ID 0 (zero), so to crawl your entire Box repository, enter 0 as the start link.
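      A small pre-flight check can catch start links that are not numeric IDs before a crawl starts. The helpers normalize_start_links and crawls_entire_repository are hypothetical and not part of Fusion. The sketch takes a strict reading of the rule above and rejects anything other than ASCII digits, such as a pasted folder URL.

```python
import re
from typing import Iterable, List

ROOT_FOLDER_ID = "0"  # the root directory of every Box account


def normalize_start_links(start_links: Iterable[str]) -> List[str]:
    """Trim each start link and require it to be a numeric Box file or directory ID."""
    normalized: List[str] = []
    for link in start_links:
        value = str(link).strip()
        # ASCII digits only; str.isdigit() would also accept non-ASCII digits.
        if not re.fullmatch(r"[0-9]+", value):
            raise ValueError(f"start link {link!r} is not a numeric Box ID")
        normalized.append(value)
    return normalized


def crawls_entire_repository(start_links: List[str]) -> bool:
    """True when the root directory (ID 0) is one of the start links."""
    return ROOT_FOLDER_ID in start_links
```

      For example, normalize_start_links([" 0 ", "12345"]) returns ["0", "12345"], while a link such as a folder URL raises ValueError.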