Google Drive Connector and Datasource Configuration
- Google Drive Authentication
- Get the Client ID and Client Secret
- Get a Refresh Token
- Googledrive Connector-specific Properties
- Limit Documents
- Crawl Performance
- Recrawl Rules
- Crawl History
- Field Mapping
- ConnectorDb Configuration
- General Configuration
The Google Drive connector is used to index the documents in a Google Drive account.
Since Google Drive uses OAuth for authentication, you must configure your Google Drive account to allow Fusion to access it, and then provide OAuth access keys to Fusion so the crawler can authenticate and get documents.
The startLink used during configuration is a path to a directory. Google provides unique IDs for sub-directories. To index all documents that you have access to, provide 'root' as the
Google Drive Authentication
Fusion must authenticate to Google Drive to fetch the documents for the index. To get the authentication credentials from Google Drive, first register a new application to get a client ID and a client secret, then use those to get a refresh token. The client ID, client secret, and refresh token are all supplied to Fusion when you configure the datasource.
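To show how the three credentials fit together, here is a minimal Python sketch of the standard OAuth 2.0 refresh-token grant against Google's token endpoint. It is not Fusion's own code, and how the connector performs this exchange internally is an assumption; the function names are hypothetical and chosen only for illustration.

```python
import json
import urllib.parse
import urllib.request

# Google's OAuth 2.0 token endpoint.
TOKEN_URL = "https://oauth2.googleapis.com/token"


def build_refresh_request(client_id, client_secret, refresh_token):
    """Build (but do not send) the POST that trades a refresh token for an access token."""
    body = urllib.parse.urlencode({
        "client_id": client_id,
        "client_secret": client_secret,
        "refresh_token": refresh_token,
        "grant_type": "refresh_token",
    }).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def fetch_access_token(client_id, client_secret, refresh_token):
    """Send the request and return the short-lived access token (needs network access)."""
    request = build_refresh_request(client_id, client_secret, refresh_token)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["access_token"]
```

A client that holds the refresh token can repeat this exchange whenever the access token expires, which is why a one-time manual authorization is enough for unattended crawls.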
If you have an organization account with Google and you use your own account to create the Client ID, Client Secret, and Refresh Token, the crawl will be limited to the documents you are authorized to view. If you want to index all documents in your organization, you may need to ask your Google Drive administrator for an account that has additional access rights.
Get the Client ID and Client Secret
The following steps will provide you with a client ID and client secret. You will use these to get the refresh token later.
Create a project in the Google Developers Console at https://console.developers.google.com/project. You can provide any project name and project ID; Google will pre-select a name and ID. Once the project has been created, you will see the Project Dashboard.
In the left-hand navigation bar, choose 'APIs & Auth', then choose 'APIs'.
Find the 'Drive API' in the list of available APIs and toggle it to 'On' on the right side of the screen. You will be asked to accept the Drive terms of service.
Once the Drive API has been enabled, choose 'Credentials' from the left navigation, under 'APIs & Auth'.
From this next screen, select 'Create new Client ID' from the OAuth section. Next choose 'Web Application' and click 'Configure consent screen'.
On the next screen, you only need to provide your email address (it should appear in the list) and a product name. The product name can be anything you want to use. Then click 'Save'.
On the next screen, again choose 'Web Application' and then provide the following URLs:
In the Authorized Redirect URIs box, replace the default text with 'https://developers.google.com/oauthplayground'.
The next screen will show the credentials for the web application you just created. You will need the Client ID and the Client Secret when you configure the datasource.
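Copy-and-paste mistakes in these two values are a common cause of failed authentication later on. As a rough sketch, a small Python check like the one below can catch the obvious ones before you enter the values anywhere. The function name is hypothetical, and the suffix test relies on the assumption that web-application client IDs issued by Google end in `.apps.googleusercontent.com`.

```python
# Assumption: Google-issued OAuth client IDs end with this suffix.
GOOGLE_CLIENT_ID_SUFFIX = ".apps.googleusercontent.com"


def check_client_credentials(client_id, client_secret):
    """Return a list of human-readable problems; an empty list means no obvious issue."""
    problems = []
    for name, value in (("client ID", client_id), ("client secret", client_secret)):
        if not value:
            problems.append(f"{name} is empty")
        elif value != value.strip():
            problems.append(f"{name} has leading or trailing whitespace")
    if client_id and not client_id.strip().endswith(GOOGLE_CLIENT_ID_SUFFIX):
        problems.append(f"client ID does not end with {GOOGLE_CLIENT_ID_SUFFIX}")
    return problems
```

The check only looks at the shape of the values. It cannot tell whether Google will actually accept them.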
Get a Refresh Token
These steps will allow you to get the refresh token. You will need the Client ID and Client Secret from the previous step.
Access Google’s OAuth Playground at https://developers.google.com/oauthplayground/.
Click the gear icon at the upper right to open the Settings window and choose 'Use your own OAuth tokens'. Fill in the Client ID and Client Secret, then close the Settings window.
In the list of APIs on the left ('Step 1'), choose 'Drive APIs', and then select 'https://www.googleapis.com/auth/drive.readonly'. You don’t need to authorize the other APIs, since this is just to get a refresh token and does not impact the APIs that are authorized for your account. Click 'Authorize APIs'. You will be redirected to another page to authorize the API for your application. You may be prompted to log in before you see the authorization page. Authorize the API, and then you will be redirected back to Step 2 of the OAuth Playground.
In Step 2, click 'Exchange authorization code for tokens'. The refresh token will be shown in the right side of the screen, at the end of the JSON response.
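For readers who want to see what the steps above amount to, the following Python sketch builds an authorization URL for the read-only Drive scope and pulls the refresh token out of a token response. The parameters are those of Google's standard OAuth 2.0 web-server flow; whether the OAuth Playground sends exactly this set is an assumption. The function names are hypothetical and the sample response uses made-up placeholder values.

```python
import json
import urllib.parse

AUTH_URL = "https://accounts.google.com/o/oauth2/v2/auth"
REDIRECT_URI = "https://developers.google.com/oauthplayground"
DRIVE_READONLY_SCOPE = "https://www.googleapis.com/auth/drive.readonly"


def build_authorization_url(client_id):
    """Return the URL a user would visit to grant read-only Drive access."""
    params = {
        "client_id": client_id,
        "redirect_uri": REDIRECT_URI,
        "response_type": "code",
        "scope": DRIVE_READONLY_SCOPE,
        "access_type": "offline",  # ask Google to issue a refresh token
        "prompt": "consent",       # show the consent screen even if granted before
    }
    return AUTH_URL + "?" + urllib.parse.urlencode(params)


def extract_refresh_token(token_response_json):
    """Parse the JSON token response and return its refresh_token field."""
    payload = json.loads(token_response_json)
    token = payload.get("refresh_token")
    if not token:
        raise ValueError("response contains no refresh_token")
    return token


# Illustrative response only; real tokens are long opaque strings.
SAMPLE_RESPONSE = (
    '{"access_token": "example-access", "expires_in": 3599, '
    '"token_type": "Bearer", "refresh_token": "example-refresh"}'
)
```

Calling `extract_refresh_token(SAMPLE_RESPONSE)` returns the placeholder refresh token. With a real response, that value is what you supply to Fusion along with the client ID and client secret.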
When entering configuration values in the UI, use unescaped characters, such as