File Upload V2 Connector Configuration Reference

The File Upload V2 connector provides a convenient way to quickly ingest data from your local filesystem.

A common use for the File Upload connector is parsing and indexing information stored locally in a CSV file. It also works with JSON files, PDF files, and more. Files ingested by the File Upload connector are uploaded to the blob store, where a file ID is created, and the contents of the file are indexed into Fusion.

Remote connectors

V2 connectors support running remotely in Fusion versions 5.7.1 and later. Refer to Configure Remote V2 Connectors.

Below is an example configuration showing how to specify the file system to index under the connector-plugins entry in your values.yaml file:

additionalVolumes:
- name: fusion-data1-pvc
  persistentVolumeClaim:
    claimName: fusion-data1-pvc
- name: fusion-data2-pvc
  persistentVolumeClaim:
    claimName: fusion-data2-pvc
additionalVolumeMounts:
- name: fusion-data1-pvc
  mountPath: "/connector/data1"
- name: fusion-data2-pvc
  mountPath: "/connector/data2"

You may also need to specify the user that is authorized to access the file system, as in this example:

securityContext:
    fsGroup: 1002100000
    runAsUser: 1002100000

Configuration

When entering configuration values in the UI, use unescaped characters, such as \t for the tab character. When entering configuration values in the API, use escaped characters, such as \\t for the tab character.
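
The difference between the two forms is consistent with ordinary JSON string escaping. The following is a minimal Python sketch of that idea, assuming the API accepts a JSON request body; the `delimiter` key is a hypothetical property name used only for illustration and does not appear in this reference.

```python
import json

# What you would type into a UI field: a backslash followed by the letter t.
ui_value = r"\t"

def to_api_json(value: str) -> str:
    """Serialize a UI-style value the way it would appear in a JSON body.

    "delimiter" is a hypothetical key, chosen only for this illustration.
    """
    return json.dumps({"delimiter": value})

payload = to_api_json(ui_value)
print(payload)  # prints {"delimiter": "\\t"}

# Decoding the JSON body recovers the original UI-style value.
assert json.loads(payload)["delimiter"] == ui_value
```

The backslash is doubled only in the serialized form; the value itself is unchanged after decoding.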

A connector that can crawl a single file that has been uploaded through Fusion's Blob Store.

description - string

Optional description

<= 125 characters

pipeline - string, required

Name of the IndexPipeline used for processing output.

>= 1 characters

Match pattern: ^[a-zA-Z0-9_-]+$
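
The match pattern allows only letters, digits, underscores, and hyphens, and the same pattern is listed for parserId and id. As a rough illustration, a value could be checked locally before it is submitted. This is a sketch only; the helper name `is_valid_name` is hypothetical and is not part of Fusion.

```python
import re

# Pattern copied from this reference.
NAME_PATTERN = re.compile(r"^[a-zA-Z0-9_-]+$")

def is_valid_name(value: str) -> bool:
    """Return True if value is non-empty and contains only letters,
    digits, underscores, and hyphens."""
    # fullmatch also rejects a trailing newline, which "$" alone would allow.
    return NAME_PATTERN.fullmatch(value) is not None

print(is_valid_name("my-index_pipeline1"))  # True
print(is_valid_name("my pipeline"))         # False (contains a space)
```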

diagnosticLogging - boolean

Enable diagnostic logging; disabled by default

Default: false

parserId - string

The Parser to use in the associated IndexPipeline.

Match pattern: ^[a-zA-Z0-9_-]+$

coreProperties - Core Properties

Common behavior and performance settings.

fetchSettings - Fetch Settings

System level settings for controlling fetch behavior and performance.

indexingThreads - number

Maximum number of indexing threads; defaults to 4. This setting controls the number of threads in the indexing service used for processing content documents emitted by this datasource. Higher values can sometimes help with overall fetch performance.

>= 1

<= 10

exclusiveMinimum: false

exclusiveMaximum: false

Default: 4

Multiple of: 1

pluginInstances - number

Maximum number of plugin instances for distributed fetching. Only the specified number of plugin instances will do fetching. This is useful for distributing load between different instances.

<= 500

exclusiveMinimum: false

exclusiveMaximum: false

Default: 0

Multiple of: 1

fetchResponseScheduledTimeout - number

The maximum amount of time for a response to be scheduled. The task will be canceled if this setting is exceeded.

>= 1000

<= 500000

exclusiveMinimum: false

exclusiveMaximum: false

Default: 300000

Multiple of: 1

indexingInactivityTimeout - number

The maximum amount of time to wait for indexing results (in seconds). If exceeded, the job will fail with an indexing inactivity timeout.

>= 60

<= 691200

exclusiveMinimum: false

exclusiveMaximum: false

Default: 86400

Multiple of: 1

pluginInactivityTimeout - number

The maximum amount of time to wait for plugin activity (in seconds). If exceeded, the job will fail with a plugin inactivity timeout.

>= 60

<= 691200

exclusiveMinimum: false

exclusiveMaximum: false

Default: 600

Multiple of: 1

numFetchThreads - number

Maximum number of fetch threads; defaults to 5. This setting controls the number of threads that call the connector's fetch method. Higher values can sometimes, but not always, help with overall fetch performance.

>= 1

<= 500

exclusiveMinimum: false

exclusiveMaximum: false

Default: 5

Multiple of: 1
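
The numeric settings above all use inclusive bounds and whole-number values. The sketch below shows one way such limits could be checked before a configuration is submitted. It assumes the settings are held in a plain dictionary keyed by property name; the function name `check_fetch_settings` is hypothetical, and Fusion performs its own validation regardless.

```python
# Inclusive (minimum, maximum) bounds copied from this reference.
# pluginInstances lists no minimum, so its lower bound is None.
FETCH_SETTING_LIMITS = {
    "indexingThreads": (1, 10),
    "pluginInstances": (None, 500),
    "fetchResponseScheduledTimeout": (1000, 500000),
    "indexingInactivityTimeout": (60, 691200),
    "pluginInactivityTimeout": (60, 691200),
    "numFetchThreads": (1, 500),
}

def check_fetch_settings(settings):
    """Return a list of human-readable problems; an empty list means
    every recognized numeric setting is within its documented range."""
    errors = []
    for name, value in settings.items():
        limits = FETCH_SETTING_LIMITS.get(name)
        if limits is None:
            continue  # not one of the numeric settings covered here
        low, high = limits
        # "Multiple of: 1" means whole numbers only; bool is excluded
        # explicitly because it is a subclass of int in Python.
        if isinstance(value, bool) or not isinstance(value, int):
            errors.append(f"{name}: expected a whole number, got {value!r}")
        elif low is not None and value < low:
            errors.append(f"{name}: {value} is below the minimum of {low}")
        elif value > high:
            errors.append(f"{name}: {value} is above the maximum of {high}")
    return errors

for problem in check_fetch_settings({"indexingThreads": 11, "numFetchThreads": 5}):
    print(problem)
```

Keys that are not in the table are ignored rather than rejected, so the same dictionary can also carry other settings.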

indexMetadata - boolean

When enabled, the metadata of skipped items will be indexed to the content collection.

Default: false

indexContentFields - boolean

When enabled, content fields will be indexed to the crawl-db collection.

Default: false

asyncParsing - boolean

When enabled, content will be indexed asynchronously.

Default: false

id - string, required

A unique identifier for this Configuration.

>= 1 characters

Match pattern: ^[a-zA-Z0-9_-]+$

properties - File Upload Connector properties

Plugin-specific properties.

fileId - string

The File ID of the blob record that stores this file.

>= 1 characters

mediaType - string

The media type (also known as MIME type or Content-Type) to associate with this file. Useful during file parsing. The default is application/octet-stream. Update this value with the correct media type of the file in the blob store. For example, if the file is JSON, the media type is application/json.

Default: application/octet-stream
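
As an illustration of choosing a mediaType value, the sketch below guesses the media type from a file name with Python's standard `mimetypes` module and falls back to the documented default. The helper names `guess_media_type` and `build_properties` are hypothetical, and the file ID shown is a made-up placeholder. The guess depends only on the file extension, so verify it against the actual file.

```python
import mimetypes

# Default copied from this reference.
DEFAULT_MEDIA_TYPE = "application/octet-stream"

def guess_media_type(filename: str) -> str:
    """Guess a media type from the file extension, falling back to the
    documented default when the extension is not recognized."""
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or DEFAULT_MEDIA_TYPE

def build_properties(file_id: str, filename: str) -> dict:
    """Assemble the fileId and mediaType properties as a dictionary."""
    if not file_id:
        raise ValueError("fileId must contain at least 1 character")
    return {"fileId": file_id, "mediaType": guess_media_type(filename)}

# "example-file-id" is a placeholder, not a real blob ID.
print(build_properties("example-file-id", "report.json"))
```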