Configure a Box.com Datasource
The Box connector retrieves data from a Box.com cloud-based data repository.
Configuration overview
These steps are for a multi-user Box.com data repository. For limited testing using a single user account, you can create a Box app that uses Standard OAuth 2.0 authentication.
The following is an overview of the steps required to set up Box and Managed Fusion and to crawl a Box data repository.
-
Sign up for a Box developer account.
-
Enable 2-step verification.
-
Create a Box app that Managed Fusion can use to crawl the Box files.
-
Configure your app to use a Box service account.
-
Install Managed Fusion’s Box Connector.
-
Create datasources in Managed Fusion that use the Box connector.
-
Crawl the Managed Fusion datasources.
Set Up Box
Set up Box so that Managed Fusion can crawl Box data repositories.
Step 1: Sign Up for a Box Developer Account
If you already have an account, proceed to Step 2.
-
Open the Box Developers Console.
-
In the top right corner, click Sign Up.
-
Select an appropriate Platform Developer plan.
-
Enter the requested information and click Submit.
-
Open the confirmation email and click Verify Email.
-
Log in to your Box account.
Step 2: Enable 2-Step Verification
-
Log in to your Box developer account.
-
Open the Box Developers Console.
-
Log in as the admin.
-
-
Create the Box account that you want to use for crawling.
-
Open the Users page in the Box admin console.
-
Click +Users to create a new user account.
-
Enter the Name and Email for the user, and then click Add user.
-
Click the user you just created to enter its user settings.
-
Make this user a co-admin by selecting the Co-Admin checkbox. A pane titled "User is granted the following administrative privileges" appears. Select all of the following privileges:
-
Manage users
-
Manage groups
-
View users' content
-
Log in to users' accounts
-
Run new reports and access existing reports
-
-
Click Save.
-
Close the Admin Console browser tab.
-
-
Enable 2-step verification for unrecognized logins:
-
Open the Account Settings page. (You can reach this page from the drop-down menu under your initials.)
-
On the Account tab, under Authentication, select Require 2-step verification for unrecognized logins.
-
Choose your Country and enter a Mobile Phone Number, and then click Continue.
-
Enter the verification code you receive, and then click Continue.
-
If you are using a new mobile device, Box will send you a second code. Enter it, and then click Submit.
-
Click Save Changes.
Step 3: Create a Box App that Managed Fusion Can Use to Crawl the Box Files
Create a Box app that uses OAuth 2.0 with JWT server authentication.
If you already have an app, skip ahead to Step 4 and configure it.
-
Open the Box Developers Console.
-
Click Create New App.
-
Select Custom App, and then click Next.
-
Click OAuth 2.0 with JWT (Server Authentication), and then click Next.
-
Name your app, and then click Create App. The name must be globally unique across all apps created by all Box users.
-
Click View Your App.
Step 4: Configure Your App to Use a Box Service Account
-
Use OpenSSL to create a private/public key pair:
-
Install OpenSSL if it is not already installed on your computer.
-
Open a Command Prompt window and run these commands to generate a private/public key pair:
openssl genrsa -aes128 -out private_key.pem 2048
Enter a password for the private key when prompted.
openssl rsa -pubout -in private_key.pem -out public_key.pem
In the current directory of the Command Prompt, you now have two files: private_key.pem (the private key) and public_key.pem (the public key).
-
-
Open the Box Developers Console, log in as Admin if you are asked to log in, and click your app.
-
Configure scopes and advanced features:
-
Under Application Access, select Enterprise.
-
Under Application Scopes, deselect Manage groups.
-
Under Advanced Features, enable Generate User Access Tokens and Perform Actions as Users.
-
Click Save Changes.
-
-
In the Add and Manage Public Keys area, click Add a Public Key and paste the contents of the public_key.pem file that you generated with OpenSSL into the text box. Make a note of the new Public Key ID; you need it when you create the datasource.
-
-
Authorize your app:
-
Open the Box Admin Console.
-
In the left navigation menu, click Settings > Enterprise Settings (or Business Settings) > Apps.
-
Under Custom Applications, click Authorize New App.
-
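In the API Key box, paste your app’s Client ID (in the Box Developers Console, select your app and copy the Client ID from the Configuration tab under OAuth 2.0 Credentials), and then click Next.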
-
Read the App Authorization dialog and click Authorize.
-
Close the Admin Console browser tab.
If you change your app’s configuration later, you must repeat this step to re-authorize your app.
-
Close the Dev Console browser tab.
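When you create the datasource in Step 5, the JWT Private Key setting expects the base64 encoding of the entire private_key.pem file, including its first and last lines. The following is a minimal sketch using only the Python standard library, assuming Python 3 and that private_key.pem is in the current directory; the function name encode_private_key is hypothetical and is not part of Box or Managed Fusion.

```python
import base64
import sys
from pathlib import Path


def encode_private_key(pem_path: str) -> str:
    """Return the base64 encoding of the whole PEM file as a single ASCII line.

    The file is read as raw bytes, so the -----BEGIN ...----- and
    -----END ...----- lines and every line break are included in the encoding.
    """
    raw = Path(pem_path).read_bytes()
    return base64.b64encode(raw).decode("ascii")


if __name__ == "__main__":
    # Assumption: the key was generated in the current directory with the
    # openssl genrsa command shown in Step 4.
    key_file = Path("private_key.pem")
    if key_file.exists():
        # Paste this single line into the JWT Private Key field.
        print(encode_private_key(str(key_file)))
    else:
        print(f"{key_file} not found in the current directory", file=sys.stderr)
```

Reading the file as raw bytes, instead of stripping the header and footer, keeps the BEGIN and END lines in the encoded value, which matches the requirement that the entire file be encoded.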
Set Up Managed Fusion
Set up Managed Fusion to crawl Box data repositories.
Step 5: Create Datasources
Create datasources that use the Box connector to access the Box data repository.
For each datasource:
-
In the Managed Fusion UI, navigate to Indexing > Datasources.
-
Click Add.
-
Select Box (V2) under Not Installed.
-
Select Box (V2) again under Installed.
-
Fill in the form, noting the following about these configuration settings:
Start Links
Each start link defined for the datasource must be a numeric Box file ID or folder ID. The root folder of any Box account has ID 0 (zero), so to crawl your entire Box repository, enter 0. To find the ID of a folder or file, select it at Box.com; the ID is the number at the end of the URL in your browser's address bar. For example:
Folder ID: for a folder whose URL ends in /folder/34192617287, enter the start link 34192617287.
File ID: for a file whose URL ends in /file/204871656422, enter the start link 204871656422.
API Key
In the Box Developers Console, select the app. On the Configuration tab under OAuth 2.0 Credentials, use the Client ID.
API Secret
In the Box Developers Console, select the app. On the Configuration tab under OAuth 2.0 Credentials, use the Client Secret.
JWT App User ID
The email address of the co-admin account that you created in Step 2.
JWT Public Key Id
In the Box Developers Console, select the app. On the Configuration tab, under Add and Manage Public Keys, use the ID of the public key that you added in Step 4.
JWT Private Key
Base64-encoded contents of the private-key file that matches your JWT Public Key Id. Base64 encode the entire contents of the file, including the first and last lines.
JWT Private Key Password
The password that you entered for the private key when you generated private_key.pem with OpenSSL in Step 4.
Distributed crawl collection name
Collection that contains the pre-fetch index.
Box.com children responses per page
Use the default value of 1000.
Nested folder depth limit
Keep the default value so that the crawl reaches documents in all nested folders. For testing, you can reduce this number substantially to speed up the crawl.
Number of partition buckets
Divide the number of files by 5000. Use that number or 10000, whichever is smaller.
Number of distributed crawl datasources
Use a value from 1 to 27.
Number of pre-fetch index creator threads
A number between 2 and 5. Use 2 for small datasources and 5 for huge datasources (over 10 million files).
-
Click Save.
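The sizing guidance in the settings table above can be expressed as a small planning helper. This is a hypothetical Python sketch, not a Managed Fusion API: the function names are invented, and two details are assumptions because the table does not specify them. First, the file count divided by 5000 is rounded up, with a minimum of one bucket. Second, every datasource of 10 million files or fewer gets 2 pre-fetch threads, although the table allows any value up to 5.

```python
import re

FILES_PER_BUCKET = 5_000
MAX_PARTITION_BUCKETS = 10_000
HUGE_DATASOURCE_FILES = 10_000_000  # "huge" means over 10 million files
MIN_DISTRIBUTED_DATASOURCES = 1
MAX_DISTRIBUTED_DATASOURCES = 27


def is_valid_start_link(link: str) -> bool:
    """Start links must be numeric Box file or folder IDs; "0" is the root folder."""
    return re.fullmatch(r"[0-9]+", link) is not None


def partition_buckets(num_files: int) -> int:
    """Number of files divided by 5000, capped at 10000.

    Assumption: the quotient is rounded up and never drops below 1.
    """
    buckets = (num_files + FILES_PER_BUCKET - 1) // FILES_PER_BUCKET
    return max(1, min(buckets, MAX_PARTITION_BUCKETS))


def prefetch_threads(num_files: int) -> int:
    """5 threads for huge datasources (over 10 million files), otherwise 2.

    Assumption: intermediate sizes use the small-datasource value.
    """
    return 5 if num_files > HUGE_DATASOURCE_FILES else 2


def is_valid_distributed_count(count: int) -> bool:
    """The number of distributed crawl datasources must be from 1 to 27."""
    return MIN_DISTRIBUTED_DATASOURCES <= count <= MAX_DISTRIBUTED_DATASOURCES


if __name__ == "__main__":
    files = 2_500_000  # hypothetical repository size
    print(is_valid_start_link("34192617287"))  # True
    print(partition_buckets(files))  # 500
    print(prefetch_threads(files))  # 2
```

For example, a repository of 100 million files yields 20000 buckets before the cap, so the helper returns 10000.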
Crawl a Box Data Repository
Run the datasources that you created to crawl the Box data repository.
Step 6: Crawl the Managed Fusion Datasources
Crawl the datasources, which use Managed Fusion’s Box connector to access the Box data repository. Managed Fusion’s Box connector uses the pre-fetch index to fetch the contents of each file from Box.com, get metadata from both the distributed index and Box.com, and index the documents through Managed Fusion’s index pipeline.
You can:
-
Run the crawl now.
-
From the Managed Fusion launcher, click Search > Home > Datasources.
-
Click the datasource.
-
Click Start Crawl.