> ## Documentation Index
> Fetch the complete documentation index at: https://doc.lucidworks.com/llms.txt
> Use this file to discover all available pages before exploring further.

# File Upload Pro

> The File Upload Pro connector provides a convenient way to quickly ingest data from your local filesystem.

export const schema = {
  "type": "object",
  "title": "File Upload Pro",
  "description": "A connector that can crawl a single file that has been uploaded through Fusion's Blob Store.",
  "required": ["id", "properties", "pipeline"],
  "properties": {
    "id": {
      "type": "string",
      "title": "Configuration ID",
      "description": "A unique identifier for this Configuration.",
      "minLength": 1,
      "pattern": "^[a-zA-Z0-9_-]+$"
    },
    "pipeline": {
      "type": "string",
      "title": "Pipeline ID",
      "description": "Name of the IndexPipeline used for processing output.",
      "minLength": 1,
      "pattern": "^[a-zA-Z0-9_-]+$"
    },
    "parserId": {
      "type": "string",
      "title": "Parser ID",
      "description": "The Parser to use in the associated IndexPipeline.",
      "pattern": "^[a-zA-Z0-9_-]+$"
    },
    "description": {
      "type": "string",
      "title": "Description",
      "description": "Optional description",
      "hints": ["lengthy"],
      "maxLength": 125
    },
    "modified": {
      "type": "string",
      "title": "Date Modified",
      "description": "The date at which this Configuration was last modified.",
      "hints": ["readonly", "hidden"]
    },
    "created": {
      "type": "string",
      "title": "Date Created",
      "description": "The date at which this Configuration was created.",
      "hints": ["readonly", "hidden"]
    },
    "type": {
      "type": "string",
      "title": "Type",
      "description": "A type ID for this connector.",
      "hints": ["readonly", "hidden"]
    },
    "diagnosticLogging": {
      "type": "boolean",
      "title": "Diagnostic Logging",
      "description": "[Deprecated] Enables verbose diagnostic logging for troubleshooting. May increase log volume. Disabled by default.",
      "default": false
    },
    "collection": {
      "type": "string",
      "title": "Collection ID",
      "description": "The associated content Collection.",
      "hints": ["readonly", "hidden"],
      "minLength": 1,
      "pattern": "^[a-zA-Z0-9_-]+$"
    },
    "type_description": {
      "type": "string",
      "title": "Type Description",
      "default": "A connector that can crawl a single file that has been uploaded through Fusion's Blob Store.",
      "hints": ["hidden", "readonly"]
    },
    "category": {
      "type": "string",
      "title": "Category",
      "default": "A connector that can crawl a single file that has been uploaded through Fusion's Blob Store.",
      "hints": ["hidden", "readonly"]
    },
    "connector": {
      "title": "Connector Type",
      "description": "Connector type.",
      "minLength": 1,
      "type": "string",
      "hints": ["hidden"]
    },
    "properties": {
      "type": "object",
      "title": "File Upload Connector properties",
      "description": "Plugin specific properties.",
      "required": ["mediaType", "fileId"],
      "properties": {
        "fileId": {
          "type": "string",
          "title": "File ID",
          "description": "The File ID of the blob record that stores this file.",
          "minLength": 1
        },
        "mediaType": {
          "type": "string",
          "title": "File Media Type",
          "description": "The Media type (aka, MIME type or Content-Type) to associate this file with. Useful during file parsing. Default is application/octet-stream. Update this value with the correct media type of the file in the BlobStore .e.g if file is a json, the media type is application/json.",
          "default": "application/octet-stream"
        }
      }
    },
    "coreProperties": {
      "type": "object",
      "title": "Core Properties",
      "description": "Common behavior and performance settings.",
      "required": [],
      "properties": {
        "fetchSettings": {
          "type": "object",
          "title": "Fetch Settings",
          "description": "System level settings for controlling fetch behavior and performance.",
          "required": [],
          "properties": {
            "numFetchThreads": {
              "type": "number",
              "title": "Fetch Threads",
              "description": "Maximum number of fetch threads; defaults to 5. This setting controls the number of threads that call the Connectors fetch method. Higher values can, but not always, help with overall fetch performance.",
              "default": 5,
              "hints": ["hidden"],
              "maximum": 500,
              "exclusiveMaximum": false,
              "minimum": 1,
              "exclusiveMinimum": false,
              "multipleOf": 1
            },
            "indexingThreads": {
              "type": "number",
              "title": "Index Subscription Threads",
              "description": "Maximum number of indexing threads; defaults to 4. This setting controls the number of threads in the indexing service used for processing content documents emitted by this datasource. Higher values can sometimes help with overall fetch performance.",
              "default": 4,
              "hints": ["hidden"],
              "maximum": 10,
              "exclusiveMaximum": false,
              "minimum": 1,
              "exclusiveMinimum": false,
              "multipleOf": 1
            },
            "pluginInstances": {
              "type": "number",
              "title": "Number of plugin instances for distributed fetching",
              "description": "Maximum number of plugin instances for distributed fetching. Only specified number of plugin instances will do fetching. This is useful for distributing load between different instances.",
              "default": 0,
              "hints": ["hidden"],
              "maximum": 500,
              "exclusiveMaximum": false,
              "minimum": 0,
              "exclusiveMinimum": false,
              "multipleOf": 1
            },
            "fetchResponseScheduledTimeout": {
              "type": "number",
              "title": "Fetch response scheduled timeout(ms)",
              "description": "The maximum amount of time for a response to be scheduled. The task will be canceled if this setting is exceeded.",
              "default": 300000,
              "hints": ["hidden"],
              "maximum": 500000,
              "exclusiveMaximum": false,
              "minimum": 1000,
              "exclusiveMinimum": false,
              "multipleOf": 1
            },
            "indexingInactivityTimeout": {
              "type": "number",
              "title": "Indexing inactivity timeout(seconds)",
              "description": "The maximum amount of time to wait for indexing results (in seconds). If exceeded, the job will fail with an indexing inactivity timeout.",
              "default": 86400,
              "hints": ["hidden"],
              "maximum": 691200,
              "exclusiveMaximum": false,
              "minimum": 60,
              "exclusiveMinimum": false,
              "multipleOf": 1
            },
            "pluginInactivityTimeout": {
              "type": "number",
              "title": "Plugin inactivity timeout(seconds)",
              "description": "The maximum amount of time to wait for plugin activity (in seconds). If exceeded, the job will fail with a plugin inactivity timeout.",
              "default": 600,
              "hints": ["hidden"],
              "maximum": 691200,
              "exclusiveMaximum": false,
              "minimum": 60,
              "exclusiveMinimum": false,
              "multipleOf": 1
            },
            "indexMetadata": {
              "type": "boolean",
              "title": "Index metadata",
              "description": "When enabled the metadata of skipped items will be indexed to the content collection.",
              "default": false,
              "hints": ["hidden"]
            },
            "indexContentFields": {
              "type": "boolean",
              "title": "Index content fields",
              "description": "When enabled, content fields will be indexed to the crawl-db collection.",
              "default": false,
              "hints": ["hidden"]
            },
            "fetchItemQueueSize": {
              "type": "number",
              "title": "Fetch Item Queue Size",
              "description": "Size of the fetch item queue.Larger values result in increased memory usage, but potentially higher performance.Default is 10k.",
              "default": 10000,
              "hints": ["hidden"],
              "maximum": 500000,
              "exclusiveMaximum": false,
              "minimum": 1,
              "exclusiveMinimum": false,
              "multipleOf": 1
            },
            "fetchRequestCheckInterval": {
              "type": "number",
              "title": "Fetch request check interval(ms)",
              "description": "The amount of time to wait before check if a request is done",
              "default": 15000,
              "hints": ["hidden"],
              "maximum": 500000,
              "exclusiveMaximum": false,
              "minimum": 1000,
              "exclusiveMinimum": false,
              "multipleOf": 1
            },
            "fetchResponseCompletedTimeout": {
              "type": "number",
              "title": "Fetch response completion timeout(ms)",
              "description": "The maximum amount of time for a response to be completed. If exceeded, the task will be retried if the job is still running",
              "default": 300000,
              "hints": ["hidden"],
              "maximum": 600000,
              "exclusiveMaximum": false,
              "minimum": 1,
              "exclusiveMinimum": false,
              "multipleOf": 1
            },
            "asyncParsing": {
              "type": "boolean",
              "title": "Async Parsing",
              "description": "When enabled, content will be indexed asynchronously.",
              "default": false
            }
          }
        },
        "skipConfigValidation": {
          "type": "boolean",
          "title": "Skip Validation",
          "description": "Enable to skip configuration validation when it takes too long and causes timeout issue",
          "default": false
        }
      },
      "hints": ["advanced"]
    }
  },
  "category": "Repository",
  "categoryPriority": 1
};

export const SchemaParamFields = ({schema}) => {
  const sanitize = str => {
    if (typeof str !== "string") return str;
    return str.replace(/^"(.*)"$/s, "$1").replace(/\\/g, "").replace(/"/g, "'");
  };
  const formatDescription = str => {
    const s = sanitize(str);
    return (/[.!?]\)*$/).test(s) ? s : `${s}.`;
  };
  const {description, properties = {}, required: requiredProps = []} = schema;
  const visibleProps = useMemo(() => Object.entries(properties).filter(([, prop]) => !prop.hints?.includes("hidden")), [properties]);
  return <div>
      {description && <p>{formatDescription(description)}</p>}

      {visibleProps.map(([name, prop]) => {
    const isRequired = requiredProps.includes(name);
    const hasDefault = prop.default !== undefined;
    const rawDefault = prop.default;
    const isComplexDefault = hasDefault && (typeof rawDefault === "object" || typeof rawDefault === "string" && (rawDefault.length > 20 || rawDefault.includes('"')));
    const fieldProps = {
      key: name,
      body: prop.title || name,
      type: prop.type,
      ...prop.title && ({
        post: [<><span className="text-stone-400 dark:text-stone-500">API property: </span>{name}</>]
      }),
      ...isRequired && ({
        required: true
      }),
      ...!isComplexDefault && hasDefault ? {
        default: sanitize(String(rawDefault))
      } : {}
    };
    const isObject = prop.type === "object" && prop.properties;
    const isArrayOfObjects = prop.type === "array" && prop.items?.type === "object" && prop.items.properties;
    return <ParamField {...fieldProps}>
            {prop.description && <p>{formatDescription(prop.description)}</p>}

            {isComplexDefault && <div className="flex">
                <p>
                  <strong>Default:</strong>
                </p>
                <pre className="!my-0">
                  <code>
                    {JSON.stringify(rawDefault, null, 2)}
                  </code>
                </pre>
              </div>}

            {isArrayOfObjects && <div className="flex">
              <p>
                <strong>Object attributes:</strong>
              </p>
              <pre className="!my-0">
                <code>
                  {'{\n'}
                  {Object.entries(prop.items.properties).map(([iname, iprop]) => <>
                      {`  ${iname}`}
                      {prop.items?.required?.includes(iname) && <span style={{
      color: 'red'
    }}> required</span>}
                      {`: {\n    display name: ${sanitize(iprop.title || '')}\n    type: ${iprop.type}\n  }\n`}
                    </>)}
                  {'}'}
                </code>
              </pre>
              </div>}

            {isObject && <Expandable title="properties">
                <SchemaParamFields schema={{
      properties: prop.properties,
      required: prop.required
    }} />
              </Expandable>}
          </ParamField>;
  })}
    </div>;
};

export const LwTemplate = ({title = "Key questions to get you started", icon = "sparkles", cta = "Powered by Agent Studio", linkHref = "https://lucidworks.com/demo/?utm_source=docs&utm_medium=referral&utm_campaign=docs_cta_ai"}) => {
  const [isLoaded, setIsLoaded] = useState(false);
  useEffect(() => {
    const timer = setTimeout(() => {
      setIsLoaded(true);
    }, 500);
    return () => clearTimeout(timer);
  }, []);
  return <div className="lw-template-container">
      <Card title={title} icon={icon}>
        {isLoaded && <span dangerouslySetInnerHTML={{
    __html: `<lw-template id="a029c1a9-28be-427e-b0e1-5d918920246a"></lw-template
            >`
  }} />}
        <Link href={linkHref} className="agent-studio-link text-left text-gray-600 gap-2 dark:text-gray-400 text-sm font-medium flex flex-row items-center hover:text-primary dark:hover:text-primary-light group-hover:text-primary group-hover:dark:text-primary-light">Powered by Lucidworks Agent Studio</Link>
      </Card>
    </div>;
};

[localhost link]: http://localhost:3000/docs/fusion-connectors/connectors/fileupload-v2

[mintlify link]: https://doc.lucidworks.com/docs/fusion-connectors/connectors/fileupload-v2

[old doc.lw link]: https://doc.lucidworks.com/fusion-connectors/r9iys5

<Callout icon="plug" color="#A4C6F7" iconType="solid">
  * **Latest version:** v1.1.0
  * **Compatible with Fusion version:** 5.9.0 and later
</Callout>

A common use for the File Upload connector is parsing and indexing information stored locally in a CSV file. It also works with JSON files, PDF files, and more. Files ingested by the File Upload connector are uploaded to the [blob store](/docs/5/fusion/getting-data-in/blob-storage), where a file ID is created while the contents of the file are indexed into Fusion.

<LwTemplate />

## Remote connectors

V2 connectors support [running remotely](/docs/fusion-connectors/developers/remote-v2-connectors) in Fusion versions 5.7.1 and later.

<Accordion title="Configure Remote V2 Connectors">
  If you need to index data from behind a firewall, you can configure a V2 connector to run remotely on-premises using TLS-enabled gRPC.

  ## Prerequisites

  Before you can set up an on-prem V2 connector, you must configure the egress from your network to allow HTTP/2 communication into the Fusion cloud. You can use a [forward proxy server](#egress-and-proxy-server-configuration) to act as an intermediary between the connector and Fusion.

  The following is required to run V2 connectors remotely:

  * The [plugin zip file and the connector-plugin-standalone JAR](https://plugins.lucidworks.com/).
  * A configured connector backend gRPC endpoint.
  * Username and password of a user with a `remote-connectors` or `admin` role.
  * If the host where the remote connector is running is not configured to trust the server’s TLS certificate, you must configure the file path of the trust certificate collection.

  <Note>If your version of Fusion doesn’t have the `remote-connectors` role by default, you can create one. No API or UI permissions are required for the role.</Note>

  ## Connector compatibility

  Only V2 connectors are able to run remotely on-premises.
  You also need the remote connector client JAR file that matches your Fusion version.
  You can download the latest files at [V2 Connectors Downloads](/docs/fusion-connectors/downloads/v2-connectors-downloads).

  <Note>Whenever you upgrade Fusion, you must also update your remote connectors to match the new version of Fusion.</Note>

  The gRPC connector backend is not supported in Fusion environments deployed on AWS.

  ## System requirements

  The following is required for the on-prem host of the remote connector:

  * (Fusion 5.9.0-5.9.10) JVM version 11
  * (Fusion 5.9.11) JVM version 17
  * Minimum of 2 CPUs
  * 4GB Memory

  Note that memory requirements depend on the number and size of ingested documents.

  ## Enable backend ingress

  In your `values.yaml` file, configure this section as needed:

  ```yaml theme={"dark"}
  ingress:
    enabled: false
    pathtype: "Prefix"
    path: "/"
    #host: "ingress.example.com"
    ingressClassName: "nginx"   # Fusion 5.9.6 only
    tls:
      enabled: false
      certificateArn: ""
      # Enable the annotations field to override the default annotations
      #annotations: ""
  ```

  * Set `enabled` to `true` to enable the backend ingress.
  * Set `pathtype` to `Prefix` or `Exact`.
  * Set `path` to the path where the backend will be available.
  * Set `host` to the host where the backend will be available.
  * In Fusion 5.9.6 *only*, you can set `ingressClassName` to one of the following:
    * `nginx` for Nginx Ingress Controller
    * `alb` for AWS Application Load Balancer (ALB)
  * Configure TLS and certificates according to your CA’s procedures and policies.

    <Note>  TLS must be enabled in order to use AWS ALB for ingress.</Note>

  ## Connector configuration example

  ```yaml theme={"dark"}
  kafka-bridge:
    target: mynamespace-connectors-backend.lucidworkstest.com:443 # mandatory
    plain-text: false # optional, false by default.  
      proxy-server: # optional - needed when a forward proxy server is used to provide outbound access to the standalone connector
      host: host
      port: some-port
      user: user # optional
      password: password # optional
    trust: # optional - needed when the client's system doesn't trust the server's certificate
      cert-collection-filepath: path1

  proxy: # mandatory fusion-proxy
    user: admin
    password: password123
    url: https://fusiontest.com/ # needed only when the connector plugin requires blob store access

  plugin: # mandatory
    path: ./fs.zip
    type: #optional - the suffix is added to the connector id
      suffix: remote
  ```

  ### Minimal example

  ```yaml theme={"dark"}
  kafka-bridge:
    target: mynamespace-connectors-backend.lucidworkstest.com:443

  proxy:
    user: admin
    password: "password123"

  plugin:
    path: ./testplugin.zip
  ```

  ### Logback XML configuration file example

  ```xml theme={"dark"}
  <configuration>
      <appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
          <encoder class="com.lucidworks.logging.logback.classic.LucidworksPatternLayoutEncoder">
              <pattern>%d - %-5p [%t:%C{3.}@%L] - %m{nolookups}%n</pattern>
              <charset>utf8</charset>
          </encoder>
      </appender>

      <appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
          <file>${LOGDIR:-.}/connector.log</file>
          <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
              <!-- rollover daily -->
              <fileNamePattern>${LOGDIR:-.}/connector-%d{yyyy-MM-dd}.%i.log.gz</fileNamePattern>
              <maxFileSize>50MB</maxFileSize>
              <totalSizeCap>10GB</totalSizeCap>
          </rollingPolicy>
          <encoder class="com.lucidworks.logging.logback.classic.LucidworksPatternLayoutEncoder">
              <pattern>%d - %-5p [%t:%C{3.}@%L] - %m{nolookups}%n</pattern>
              <charset>utf8</charset>
          </encoder>
      </appender>

      <root level="INFO">
          <appender-ref ref="CONSOLE"/>
          <appender-ref ref="FILE"/>
      </root>
  </configuration>
  ```

  ## Run the remote connector

  ```java theme={"dark"}
  java [-Dlogging.config=[LOGBACK_XML_FILE]] \
    -jar connector-plugin-client-standalone.jar [YAML_CONFIG_FILE]
  ```

  The `logging.config` property is optional. If not set, logging messages are sent to the console.

  ## Test communication

  You can run the connector in communication testing mode. This mode tests the communication with the backend without running the plugin, reports the result, and exits.

  ```java theme={"dark"}
  java -Dstandalone.connector.connectivity.test=true -jar connector-plugin-client-standalone.jar [YAML_CONFIG_FILE]
  ```

  ## Encryption

  In a deployment, communication to the connector’s backend server is encrypted using TLS. You should only run this configuration without TLS in a testing scenario. To disable TLS, set `plain-text` to `true`.

  ## Egress and proxy server configuration

  One of the methods you can use to allow outbound communication from behind a firewall is a proxy server. You can configure a proxy server to allow certain communication traffic while blocking unauthorized communication. If you use a proxy server at the site where the connector is running, you must configure the following properties:

  * **Host.** The hosts where the proxy server is running.
  * **Port.** The port the proxy server is listening to for communication requests.
  * **Credentials.** Optional proxy server user and password.

  When you configure egress, it is important to disable any connection or activity timeouts because the connector uses long running gRPC calls.

  ## Password encryption

  If you use a login name and password in your configuration, run the following utility to encrypt the password:

  1. Enter a user name and password in the [connector configuration YAML](#configuration).

  2. Run the standalone JAR with this property:

     ```java theme={"dark"}
     -Dstandalone.connector.encrypt.password=true
     ```

  3. Retrieve the encrypted passwords from the log that is created.

  4. Replace the clear password in the configuration YAML with the encrypted password.

  ## Connector restart (5.7 and earlier)

  The connector will shut down automatically whenever the connection to the server is disrupted, to prevent it from getting into a bad state. Communication disruption can happen, for example, when the server running in the `connectors-backend` pod shuts down and is replaced by a new pod. Once the connector shuts down, connector configuration and job execution are disabled. To prevent that from happening, you should restart the connector as soon as possible.

  You can use Linux scripts and utilities to restart the connector automatically, such as [Monit](https://mmonit.com/monit/).

  ## Recoverable bridge (5.8 and later)

  If communication to the remote connector is disrupted, the connector will try to recover communication and gRPC calls. By default, six attempts will be made to recover each gRPC call. The number of attempts can be configured with the `max-grpc-retries` bridge parameters.

  ## Job expiration duration (5.9.5 only)

  The timeout value for irresponsive backend jobs can be configured with the `job-expiration-duration-seconds` parameter. The default value is `120` seconds.

  ## Use the remote connector

  Once the connector is running, it is available in the Datasources dropdown. If the standalone connector terminates, it disappears from the list of available connectors. Once it is re-run, it is available again and configured connector instances will not get lost.

  ## Enable asynchronous parsing (5.9 and later)

  To separate document crawling from document parsing, enable Tika Asynchronous Parsing on remote V2 connectors.
</Accordion>

Below is an example configuration showing how to specify the file system to index under the `connector-plugins` entry in your `values.yaml` file:

```yaml wrap  theme={"dark"}
additionalVolumes:
- name: fusion-data1-pvc
    persistentVolumeClaim:
    claimName: fusion-data1-pvc
- name: fusion-data2-pvc
    persistentVolumeClaim:
    claimName: fusion-data2-pvc
additionalVolumeMounts:
- name: fusion-data1-pvc
    mountPath: "/connector/data1"
- name: fusion-data2-pvc
    mountPath: "/connector/data2"
```

You may also need to specify the user that is authorized to access the file system, as in this example:

```yaml wrap  theme={"dark"}
securityContext:
    fsGroup: 1002100000
    runAsUser: 1002100000
```

## Configuration

<Tip>
  When entering configuration values in the UI, use *unescaped* characters, such as `\t` for the tab character. When entering configuration values in the API, use *escaped* characters, such as `\\t` for the tab character.
</Tip>

<SchemaParamFields schema={schema} />
