Box.com Connector and Datasource Configuration

The Box connector retrieves data from a Box.com cloud-based data repository. To fetch content from multiple Box users, you must register an application to obtain the proper authorization tokens.

Note: In Fusion 3.1, the Box connector uses an improved approach for crawling large Box data repositories. If you use the Box connector, we recommend that you upgrade to Fusion 3.1 or later.

The startLinks defined for the datasource must include the Box numeric file and directory IDs. The root directory of any Box account has ID 0 (zero). To crawl your entire Box repository, enter '0'.

Box Authorization, Access, and Refresh Tokens

Fusion supports authentication using OAuth2.

Box.com uses a two-step process to issue the tokens needed for access via the Box API. The first step uses a client_id and a client_secret to get an authorization token. The second step uses that authorization token in a request for an access token and a refresh token. The authorization token is only valid for a limited period. The access token is valid for about 60 minutes, so the refresh token is used to request a new, valid access token whenever one is needed to crawl your Box files.
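
The token lifetimes described above can be modeled in a few lines. The following Python sketch is illustrative only and is not Fusion's implementation; the class name AccessToken and the 60-second safety margin are assumptions made for the example. It records when an access token was issued and reports whether a refresh is due.

```python
from dataclasses import dataclass


@dataclass
class AccessToken:
    """Hypothetical holder for an access token and its lifetime."""
    value: str
    issued_at: float   # seconds since the epoch when the token was issued
    expires_in: int    # lifetime in seconds, as reported in the token response

    def needs_refresh(self, now: float, margin: int = 60) -> bool:
        # Refresh slightly early (the margin is an assumed safety buffer)
        # so that no request is sent with an expired token.
        return now >= self.issued_at + self.expires_in - margin


token = AccessToken("ACCESS_TOKEN", issued_at=0.0, expires_in=3600)
print(token.needs_refresh(now=1800.0))  # halfway through the hour
print(token.needs_refresh(now=3590.0))  # inside the safety margin
```

A client built this way would check needs_refresh before each batch of API calls and use the refresh token only when the check returns True.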

Here’s how to get the required tokens:

  1. Open the Box Developer Console and create a new application, giving it a unique application name.

    The next screen is the application editing screen, which contains the client_id and client_secret values used for authorization.

  2. Copy the client_id and client_secret values.

  3. Set the redirect_uri to http://localhost or http://0.0.0.0. This address is not used by Fusion, but cannot be left blank.

  4. Scroll to the bottom of the page and save the application.

  5. Construct a GET request (directly in your browser) that looks like the following, replacing CLIENT_ID and CLIENT_SECRET with the values Box.com provided when you registered your application:

    https://app.box.com/api/oauth2/authorize?response_type=code&client_id=CLIENT_ID&state=security_token=CLIENT_SECRET&redirect_uri=

    When you enter the GET request in your browser, Box.com presents an authentication screen where you grant access to Box.

  6. Enter your Box username and password, and click "Authorize".

    The response will include the authorization code (the "code" property in the response). This code is valid for only 30 seconds. If it expires before you use it, you will need to start over to get a new code.

    Here is a sample response:

    http://0.0.0.0/?state=security_token=MAgknRyPVRFUlaMHyD9h4kWCW4aytbqF&code=EQY21Goi9fv2bKxOqAN48QfxHOwysmXd

    The authorization code must be submitted in a POST request within 30 seconds, before it expires.

  7. Send a POST request that includes the authorization code (AUTH_CODE) together with the CLIENT_ID and CLIENT_SECRET from the earlier request:

    curl -X POST -d 'grant_type=authorization_code&code=AUTH_CODE&client_id=CLIENT_ID&client_secret=CLIENT_SECRET' https://app.box.com/api/oauth2/token

    If the request is successful, the response will look like this:

    {
      "access_token":"qds2amBKVuiT7P2besq5t4sEv9Y9k4CL",
      "expires_in":3652,
      "restricted_to":[],
      "refresh_token":"cHjGe3BVc7UsvKoP6V4O9sayRpJYQVMWFImopVbpZ15omkAuK7oMXC2yOywjSPAu",
      "token_type":"bearer"
    }

  8. Save the refresh_token for use in configuring the Box datasource in Fusion, together with the client_id and client_secret values. Fusion uses the refresh_token to get a valid access token for crawling, because the access token is only good for about 1 hour.
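
Steps 5 through 7 above can also be scripted. The following Python sketch uses only the standard library and the two Box URLs shown in the steps; the function names are hypothetical, and the state value is treated as an opaque string supplied by the caller. It builds the authorization URL, extracts the code from the redirect URL, and prepares (but does not send) the token request.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

# Endpoints as they appear in the steps above.
AUTHORIZE_URL = "https://app.box.com/api/oauth2/authorize"
TOKEN_URL = "https://app.box.com/api/oauth2/token"


def build_authorize_url(client_id: str, state: str, redirect_uri: str) -> str:
    """Step 5: build the URL to open in a browser."""
    params = {
        "response_type": "code",
        "client_id": client_id,
        "state": state,
        "redirect_uri": redirect_uri,
    }
    return AUTHORIZE_URL + "?" + urlencode(params)


def extract_auth_code(redirect_url: str) -> str:
    """Step 6: read the short-lived "code" value from the redirect URL."""
    query = parse_qs(urlsplit(redirect_url).query)
    return query["code"][0]


def build_token_request(auth_code: str, client_id: str, client_secret: str) -> Request:
    """Step 7: prepare the POST that exchanges the code for tokens."""
    body = urlencode({
        "grant_type": "authorization_code",
        "code": auth_code,
        "client_id": client_id,
        "client_secret": client_secret,
    })
    return Request(TOKEN_URL, data=body.encode("ascii"), method="POST")


sample = ("http://0.0.0.0/?state=security_token=MAgknRyPVRFUlaMHyD9h4kWCW4aytbqF"
          "&code=EQY21Goi9fv2bKxOqAN48QfxHOwysmXd")
request = build_token_request(extract_auth_code(sample), "CLIENT_ID", "CLIENT_SECRET")
print(request.get_method(), request.full_url)
```

To send the request, pass it to urllib.request.urlopen and parse the JSON body of the response. Because the code expires after 30 seconds, the extraction and the POST should run back to back.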

The refresh_token is valid for about 60 days. After an initial crawl, if you do not crawl your Box files again within the 60-day window, you will need to repeat the process to get a new refresh_token and update the datasource with the new token. If, however, you crawl your Box files again before the 60 days expire, Fusion will automatically update the refresh_token and store it in fusion/data/connectors/container/lucid.anda/datasourceID, where 'datasourceID' is the ID of the datasource. If you regularly crawl your Box files, you should never have to repeat the process of getting the authorization token, access token and refresh token.
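
As a rough illustration of the 60-day window, the Python sketch below checks whether a stored refresh_token is likely to have lapsed and builds the form body for a refresh request. The function names are hypothetical. The body follows the standard OAuth 2.0 refresh grant; whether Fusion sends exactly this form is an assumption, not something this page states.

```python
from datetime import datetime, timedelta
from urllib.parse import urlencode

# "About 60 days", per the paragraph above.
REFRESH_TOKEN_LIFETIME = timedelta(days=60)


def refresh_token_expired(last_crawl: datetime, now: datetime) -> bool:
    """Return True if more than 60 days have passed since the last crawl."""
    return now - last_crawl > REFRESH_TOKEN_LIFETIME


def build_refresh_body(refresh_token: str, client_id: str, client_secret: str) -> str:
    """Form body for a standard OAuth 2.0 refresh grant (assumed, see above)."""
    return urlencode({
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
        "client_id": client_id,
        "client_secret": client_secret,
    })


last_crawl = datetime(2024, 1, 1)
print(refresh_token_expired(last_crawl, datetime(2024, 2, 15)))  # 45 days later
print(refresh_token_expired(last_crawl, datetime(2024, 3, 15)))  # 74 days later
```

A check like this could run before a scheduled crawl to warn that the token process needs to be repeated.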

Authentication Using OAuth2

In this case, Fusion uses OAuth 2.0 to authenticate as a regular Box user when accessing the Box API.

Fusion needs the inputs below to crawl your Box data.

Required options are highlighted.

UI Label,
API Name
Description

OAuth Refresh Token
f.fs.refreshToken

To obtain the refresh token, you must authenticate with the Box.com account that will perform the crawl. To speed up the process, we created a downloadable utility. Use it like this:

  1. Launch the executable JAR:

    java -jar /path/to/box-oauth-generator-1.0.jar

  2. Enter the requested data.

  3. Click Get Redirect Token.

    The utility obtains the refresh_token from Box.com using the specified credentials; it’s valid for about 60 days.

Box OAuth Refresh Token File
f.fs.refreshTokenFile

The filename in which to save the refresh token.

Tip
If you crawl your Box files at intervals of less than 60 days, Fusion will automatically update the refresh_token and store it in fusion/data/connectors/container/lucid.anda/<datasourceID>, where <datasourceID> is the ID of the datasource.

Configuration