Box.com Connector and Datasource Configuration

The Box connector retrieves data from a Box.com cloud-based data repository. To fetch content from multiple Box users, you must create a Box app that uses OAuth 2.0 with JWT server authentication. For limited testing using a single user account, you can create a Box app that uses Standard OAuth 2.0 authentication.

Note: In Fusion 3.1, the Box connector uses an improved approach for crawling large Box data repositories. If you use the Box connector, we recommend that you upgrade to Fusion 3.1 or later.

How the Box Connector Works

When you crawl a Fusion datasource that uses the Box connector, the Box connector performs a two-step process to crawl a Box data repository:

  1. Build a prefetch index: The Box connector crawls file metadata user-by-user. It creates a distributed prefetch index that describes the structure of files in the repository. The prefetch index contains basic file metadata (file IDs and directory relationships). Fusion stores the prefetch index in Solr as a system collection.

  2. Build the file index: The Box connector crawls files file-by-file. It uses the prefetch index to fetch the contents and metadata of each file, and then indexes the documents through Fusion’s index pipeline.

The prefetch index lets the Box connector crawl files in random order, file-by-file, instead of user-by-user. This helps the crawl work around Box rate limits.

The initial crawl of a Box data repository can take a long time (hours or days). After the initial crawl, both the prefetch and main parts of the crawl are incremental, and they proceed much more quickly.
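
The two-step process described above can be pictured with a small sketch. This is only a conceptual illustration, not the connector's actual implementation: the REPOSITORY data, the function names, and the record fields are all hypothetical, and the real prefetch index lives in a Solr system collection rather than in memory.

```python
import random

# Hypothetical stand-in for a Box repository:
# user -> list of (file_id, parent_folder_id) pairs.
REPOSITORY = {
    "alice": [("101", "0"), ("102", "11")],
    "bob": [("201", "0"), ("202", "21"), ("203", "21")],
}


def build_prefetch_index(repository):
    """Step 1: walk metadata user-by-user, keeping only IDs and parent folders."""
    index = []
    for user, files in repository.items():
        for file_id, parent_id in files:
            index.append({"file_id": file_id, "parent_id": parent_id, "owner": user})
    return index


def crawl_order(prefetch_index, seed=None):
    """Step 2: visit files in shuffled order instead of one user at a time."""
    order = list(prefetch_index)
    random.Random(seed).shuffle(order)
    return order


if __name__ == "__main__":
    index = build_prefetch_index(REPOSITORY)
    for entry in crawl_order(index, seed=7):
        # A real crawler would fetch file contents and metadata here.
        print(entry["owner"], entry["file_id"])
```

Because the second step works from the index rather than from a per-user listing, consecutive requests need not target the same user, which is the property the text credits with getting around rate limits.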

Overview of Steps

Note: These steps are for a multi-user Box.com data repository. For limited testing using a single user account, you can create a Box app that uses Standard OAuth 2.0 authentication.

Following is an overview of the steps required to set up Box and Fusion, and to crawl a Box data repository.

Set Up Box:

  1. Sign up for a Box developer account.

  2. Enable 2-step verification.

  3. Create a Box app that Fusion can use to crawl the Box files.

  4. Configure your app to use a Box service account.

Set Up Fusion:

  1. Install Fusion’s Box Connector.

  2. Create datasources in Fusion that use the Box connector.

Crawl a Box Data Repository:

  1. Crawl the Fusion datasources.

Set Up Box

Set up Box so that Fusion can crawl Box data repositories.

Step 1: Sign Up for a Box Developer Account

If you already have an account, proceed to Step 2.

  1. Open the Box Developers web portal.

  2. In the upper right corner, click Sign Up.

  3. Enter the requested information and click Submit.

  4. Open the confirmation email and click Verify Email.

  5. Log in to your account.

Step 2: Enable 2-Step Verification

  1. Log in to your Box developer account as the Admin.

    1. Open the Box Developers web portal.

    2. In the upper right corner, click Log In.

  2. Enable 2-step verification for unrecognized logins:

    1. Open the Account Settings page. (You can reach this page from the drop-down menu under your initials.)

    2. Under Authentication, select Require 2-step verification for unrecognized logins.

    3. Choose your Country and enter a Mobile Phone Number, and then click Continue.

    4. Enter the verification code you receive, and then click Continue.

    5. If you are using a new mobile device, Box will send you a second code. Enter it, and then click Submit.

    6. Click Save Changes.

Step 3: Create a Box App that Fusion Can Use to Crawl the Box Files

Create a Box app that uses OAuth 2.0 with JWT server authentication.

If you already have an app, skip to Step 4 to configure it.

  1. Log in to your Box developer account as the Admin.

    1. Open the Box Developers web portal.

    2. In the upper right corner, click Log In.

  2. Open the page for creating a new app and click Create New App.

  3. Click Custom App, and then click Next.

  4. Click OAuth 2.0 with JWT (Server Authentication), and then click Next.

  5. Name your app, and then click Create App. The name must be globally unique across all apps created by all Box users.

  6. Click View Your App.

  7. Proceed to step 3 in the next section.

Step 4: Configure Your App to Use a Box Service Account

  1. Log in to your Box developer account as the Admin.

    1. Open the Box Developers web portal.

    2. In the upper right corner, click Log In.

  2. Open the Box Dev Console and click your app.

  3. Click Configuration.

  4. Configure scopes and advanced features:

    1. Under Application Scopes, deselect Manage groups.

    2. Under Advanced Features, deselect Generate User Access Tokens.

    3. Click Save Changes.

  5. Click Generate a Public/Private Keypair. Your browser downloads a file that contains the public/private key pair. This is the only copy of your private key. Store it securely. Click OK.

  6. Under OAuth 2.0 Credentials, click COPY for the Client ID.

  7. Authorize your app:

    1. Open the Box Admin Console.

    2. Click Settings > Enterprise Settings (or Business Settings) > Apps.

    3. Under Custom Applications, click Authorize New App.

    4. In the API Key box, paste the Client ID credential you copied in step 6, and then click Next.

    5. Read the App Authorization dialog and click Authorize.

    6. Close the Admin Console browser tab.

  8. Close the Dev Console browser tab.

Set Up Fusion

Set up Fusion to crawl Box data repositories.

Step 5: Install Fusion’s Box Connector

  1. At the top right of this documentation page, click Connector download.

  2. On the Connectors page, provide the requested contact information, select Box, and then click Submit.

  3. You will receive an email. Open it and click Get My Connectors.

  4. On the Connector Downloads page, click the Box connector to download it. Don’t expand the archive.

  5. Open the Fusion UI and click Devops > Blobs.

  6. Click Add.

  7. Select Connector Plugin.

  8. Click Choose File, select the file, and then click Open.

  9. Click Upload.

Step 6: Create Datasources

Create datasources that use the Box connector to access the Box data repository.

For each datasource:

  1. From the Fusion launcher, click Search > Home > Datasources.

  2. Click Add.

  3. Choose Box.com.

  4. Fill in the form. Note the following regarding configuration settings to use:

    Setting Notes

    Start Links

    Each start link defined for the datasource must consist of a numeric Box file ID or directory ID, preceded by a / (slash). The root directory of any Box account has ID 0 (zero). To crawl your entire Box repository, enter /0. To find a folder ID or file ID, select the folder or file at Box.com and look at the numeric ID in its URL.

    Folder ID: For a folder with ID 34192617287, you would enter the start link /34192617287.

    File ID: For a file with ID 204871656422, you would enter the start link /204871656422.

    API Key

    In the Box Developer Console, select the app. On the Configuration tab under OAuth 2.0 Credentials, use the Client ID.

    API Secret

    In the Box Developer Console, select the app. On the Configuration tab under OAuth 2.0 Credentials, use the Client Secret.

    JWT App User ID

    Email address that you use to sign in to your Box account

    JWT Public Key Id

    In the Box Developer Console, select the app. On the Configuration tab, under Add and Manage Public Keys, use the ID for a public key.

    JWT Private Key File

    Full path to the private-key file that Box downloaded (for the private key that corresponds to the public key you chose for JWT Public Key Id)

    JWT Private Key Password

    Passphrase for the private key (from the private-key file)

    Distributed crawl collection name

    Collection that will contain the index that results from the crawl

    Box.com children responses per page

    Use the default value of 1000.

    Nested folder depth limit

    Generally, you want a number that will crawl all documents, so keep the default value. For testing, you could reduce the number substantially to speed up the crawl.

    Number of partition buckets

    Divide the number of files by 5000. Use that number or 10000, whichever is smaller.

    Number distributed crawl datasources

    Use 1 to 27.

    Number of pre-fetch index creator threads

    A number between 2 and 5. Use 2 for small datasources and 5 for huge datasources (over 10 million files).

  5. Click Save.
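
Two of the settings above follow simple rules that can be checked before you fill in the form. The sketch below is a hypothetical helper, not part of Fusion: the function names are made up for illustration, and rounding the bucket count up (with a minimum of 1) is an assumption, since the guidance does not say how to round.

```python
import math
import re

# A start link is a slash followed by a numeric Box file or folder ID.
START_LINK = re.compile(r"/[0-9]+")


def is_valid_start_link(link: str) -> bool:
    """Return True for links such as /0 or /34192617287."""
    return START_LINK.fullmatch(link) is not None


def partition_buckets(file_count: int) -> int:
    """Files divided by 5000, capped at 10000.

    Rounding up and using at least 1 bucket are assumptions.
    """
    return max(1, min(math.ceil(file_count / 5000), 10000))


if __name__ == "__main__":
    for link in ["/0", "/34192617287", "34192617287", "/docs"]:
        print(link, is_valid_start_link(link))
    print(partition_buckets(1_000_000))
```

For example, a repository of about one million files gives 1,000,000 / 5000 = 200 buckets, well under the 10000 cap.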

Crawl a Box Data Repository

Crawl a Box data repository.

Step 7: Crawl the Fusion Datasources

Crawl the datasources, which use Fusion’s Box connector to access the Box data repository. Fusion’s Box connector uses the prefetch index to fetch the contents of each file from Box.com, get metadata from both the distributed index and Box.com, and index the documents through Fusion’s index pipeline.

You can:

  • Run the crawl now:

    1. From the Fusion launcher, click Search > Home > Datasources.

    2. Click the datasource.

    3. Click Start Crawl.

  • Schedule the crawl:

    1. From the Fusion launcher, click Devops > Home > Scheduler.

    2. Click the row for the job that corresponds to the datasource.

    3. Specify schedule information, and then click Save.

Authentication Using OAuth 2.0

For limited testing using a single user account, you can create a Box app that uses Standard OAuth 2.0 authentication.

  1. Log in to your Box developer account as the Admin.

    1. Open the Box Developers web portal.

    2. In the upper right corner, click Log In.

  2. Open the page for creating a new app and click Create New App.

  3. Click Custom App, and then click Next.

  4. Click Standard OAuth 2.0 (User Authentication), and then click Next.

  5. Name your app, and then click Create App. The name must be globally unique across all apps created by all Box users.

  6. Click View Your App.

  7. On the Configuration page:

    1. For Authentication Method, select Standard OAuth 2.0 (User Authentication).

    2. Set the Redirect URI to http://localhost or http://0.0.0.0. This address is not used by Fusion, but cannot be left blank.

    3. Click Save Changes.

Configuration

Tip
When entering configuration values in the UI, use unescaped characters, such as \t for the tab character. When entering configuration values in the API, use escaped characters, such as \\t for the tab character.
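
One way to read this tip is that API requests carry configuration values inside JSON, where a backslash must itself be escaped. The sketch below illustrates that reading with Python's standard json module. The property name "delimiter" is hypothetical and is not taken from the connector's configuration schema.

```python
import json

# The value as you would type it in the UI: a backslash followed by "t".
ui_value = r"\t"

# Serializing to JSON escapes the backslash, producing \\t in the request body.
payload = json.dumps({"delimiter": ui_value})
print(payload)  # {"delimiter": "\\t"}

# Parsing the body recovers the same two-character value.
assert json.loads(payload)["delimiter"] == ui_value
```

If you build request bodies with a JSON library, as here, the escaping happens automatically. The doubled backslash only needs to be typed by hand when you write the JSON yourself.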