Box.com Connector and Datasource Configuration

The Box connector retrieves data from a Box.com cloud-based data repository. To fetch content from multiple Box users, you must create a Box app that uses OAuth 2.0 with JWT server authentication. For limited testing using a single user account, you can create a Box app that uses Standard OAuth 2.0 authentication.

Note
In Fusion 3.1, the Box connector uses an improved approach for crawling large Box data repositories. If you use the Box connector, we recommend that you upgrade to Fusion 3.1 or later.

The startLinks defined for the datasource must contain the numeric Box IDs of the files and directories to crawl. The root directory of any Box account has ID 0 (zero), so to crawl your entire Box repository, enter 0.
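As a rough illustration of the ID rule above, the following Python sketch normalizes a set of Box IDs into strings and rejects anything that is not numeric. The helper name build_start_links and the sample IDs are hypothetical, and the sketch assumes startLinks is simply a list of ID strings; check the exact format your Fusion version expects.

```python
def build_start_links(box_ids):
    """Return Box IDs as a list of strings, rejecting non-numeric values."""
    links = []
    for box_id in box_ids:
        text = str(box_id).strip()
        # Box file and directory IDs are numeric; "0" is the account root.
        if not (text.isascii() and text.isdigit()):
            raise ValueError(f"Box IDs must be numeric, got {box_id!r}")
        links.append(text)
    return links


# Crawl the entire repository: the root directory has ID 0.
whole_repo = build_start_links([0])

# Crawl selected items only (hypothetical IDs, for illustration).
selected = build_start_links(["1234567890", 987654321])

print(whole_repo)
print(selected)
```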

During the crawl, the human-readable path (e.g., /FolderName/FileName.txt) will be added to each document as the title field.


Box Authorization, Access, and Refresh Tokens

Fusion supports two methods of authentication with the Box API:

  • JSON Web Token (JWT)

  • OAuth2

Authentication Using JWT

Box.com has released Box Developer Edition, which lets app users use an application without having to create their own Box accounts.

App Auth uses the JSON Web Token (JWT) authentication architecture to establish a trusted connection with Box, allowing an application to provision and manage a new type of Box account that removes the friction of multiple logins for users or the difficulty of managing services.

For this option, Fusion needs the inputs below to crawl your Box data.

Required options are highlighted.

UI Label,
API Name
Description

JWT App User ID
f.fs.appUserId

The Developer Edition API App User ID that you want to crawl as.

JWT Public Key ID
f.fs.publicKeyId

The public key prefix registered in Box Auth that you want to use to authenticate with.

JWT Private Key File Path
f.fs.privateKeyFile

The path to the private key file used to authenticate with Box.

JWT Private Key File Password
f.fs.privateKeyPassword

The password that secures the private key.

Tip
The biggest advantage of the JWT App Auth Users approach is that you don’t have to generate new refresh tokens. The public/private key file combination remains valid indefinitely.
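The four JWT options listed above can be collected into a set of datasource properties. The Python sketch below is only an illustration: the property names come from the table, but the sample values and the helper functions are hypothetical, and how the properties are nested inside a full datasource definition is not shown here. Because the highlighting that marks required options is not visible on this page, the sketch simply reports any option that was left empty.

```python
import json

# API names taken from the JWT options table.
JWT_KEYS = (
    "f.fs.appUserId",
    "f.fs.publicKeyId",
    "f.fs.privateKeyFile",
    "f.fs.privateKeyPassword",
)


def jwt_properties(app_user_id, public_key_id, private_key_file, private_key_password):
    """Map the four JWT inputs onto their API property names."""
    values = (app_user_id, public_key_id, private_key_file, private_key_password)
    return dict(zip(JWT_KEYS, values))


def empty_keys(properties):
    """List the JWT properties that have no value yet."""
    return [key for key in JWT_KEYS if not properties.get(key)]


# Placeholder values for illustration only.
props = jwt_properties("123456", "abcd1234", "/path/to/private_key.pem", "")

print(json.dumps(props, indent=2))
print("Still empty:", empty_keys(props))
```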

Authentication Using OAuth2

In this case, Fusion uses OAuth 2.0 to authenticate as a standard Box user when accessing the Box API.

Fusion needs the inputs below to crawl your Box data.

Required options are highlighted.

UI Label,
API Name
Description

OAuth Refresh Token
f.fs.refreshToken

To obtain a refresh token, you must authenticate with the Box.com account that will perform the crawl. To speed up the process, a downloadable utility is available. Use it as follows:

  1. Launch the executable JAR:

    java -jar /path/to/box-oauth-generator-1.0.jar

  2. Enter the requested data.

  3. Click Get Redirect Token.

    The utility obtains the refresh_token from Box.com using the specified credentials; it’s valid for about 60 days.

Box OAuth Refresh Token File
f.fs.refreshTokenFile

The filename in which to save the refresh token.

Tip
If you crawl your Box files at intervals of less than 60 days, Fusion will automatically update the refresh_token and store it in fusion/3.0.x/data/connectors/container/lucid.anda/<datasourceID>, where <datasourceID> is the ID of the datasource.
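The 60-day rule can be checked with simple date arithmetic before scheduling a crawl. The Python sketch below assumes a lifetime of exactly 60 days, although the text only says "about 60 days", and the function name and dates are hypothetical. It does not contact Box or Fusion; it only compares timestamps.

```python
from datetime import datetime, timedelta

# Approximate lifetime of a Box refresh token, per the text above.
REFRESH_TOKEN_LIFETIME = timedelta(days=60)


def token_likely_valid(issued_at, next_crawl_at, lifetime=REFRESH_TOKEN_LIFETIME):
    """Return True if the next crawl starts before the token is expected to expire."""
    return next_crawl_at - issued_at < lifetime


issued = datetime(2024, 1, 1)

# A crawl 45 days after the token was issued should still be able to refresh it.
print(token_likely_valid(issued, issued + timedelta(days=45)))

# A crawl 75 days later is past the window, so a new token would be needed.
print(token_likely_valid(issued, issued + timedelta(days=75)))
```

Leaving some margin below 60 days is a reasonable precaution, since the stated lifetime is approximate.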

Configuration

Tip
When entering configuration values in the UI, use unescaped characters, such as \t for the tab character. When entering configuration values in the API, use escaped characters, such as \\t for the tab character.
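One way to see why the API needs the extra backslash: a value typed in the UI as \t is the two characters backslash and t, and JSON encoding escapes the backslash. The Python sketch below shows this with the standard json module. The property name "delimiter" is hypothetical and is used only for illustration.

```python
import json

# What you would type in the UI: a backslash followed by the letter t.
ui_value = r"\t"

# Serializing the same value for an API request escapes the backslash.
api_body = json.dumps({"delimiter": ui_value})
print(api_body)

# Decoding the request body yields the original two-character value again.
round_trip = json.loads(api_body)["delimiter"]
print(round_trip == ui_value)
```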