  • Latest version: v1.0.0
  • Compatible with Fusion version: 5.9.0 and later
The FTP Pro connector retrieves documents using the File Transfer Protocol (FTP), FTP Secure (FTPS), and SSH File Transfer Protocol (SFTP). This modern connector replaces the legacy FTP connector with enhanced security, intelligent retry mechanisms, and advanced filtering capabilities. Download this connector from the V2 and Pro connectors download page.

What are Pro connectors?

Pro connectors are built on the same framework as V2 connectors but meet higher internal standards for stability, reliability, and production readiness. If you’re currently using other V2 connectors, the process for installing and upgrading a Pro connector remains the same.

Protocol support

The FTP Pro connector supports FTP, FTPS, and SFTP connections.

FTP (File Transfer Protocol)

Standard unencrypted FTP connections. Use only for non-sensitive data or when connecting within a trusted network.

FTPS (FTP Secure)

FTP secured with explicit SSL/TLS. The connector supports explicit FTPS, in which the session starts as a standard FTP connection and is then upgraded to use encryption.

SFTP (SSH File Transfer Protocol)

SSH-based file transfer protocol with built-in encryption. SFTP is the recommended protocol for secure file transfers and supports both password and public key authentication.

Business use cases

If you rely on FTP due to third-party integrations, regulatory or compliance requirements, restricted environments, or legacy systems, the FTP Pro connector lets you index data from an FTP server to Fusion for a modern search experience.

Customer self-service portals

Index product manuals, installation guides, troubleshooting documentation, and FAQs stored on FTP servers to power customer-facing search experiences. Customers can find answers quickly without contacting support, reducing ticket volume and improving satisfaction.

Product documentation and downloads

Make product specifications, datasheets, software installers, and firmware updates discoverable through customer portals. Filter by file type (PDF for manuals, ZIP for downloads) and use path patterns to organize content by product line or model number.

Media and resource libraries

Surface marketing materials, product images, videos, and tutorials for customers. Use file size filters to handle large media files and ensure customers can discover the right assets for their needs.

Prerequisites

Be sure you’ve satisfied these prerequisites so the FTP Pro connector can reliably access, crawl, and index your data.

FTP server access requirements

  • FTP or FTPS: Valid username and password with read permissions to the directories you want to crawl.
  • SFTP: Username with either password authentication or SSH private key authentication.
  • Network access: Ensure Fusion can reach the FTP server on the configured port, which is typically 21 for FTP and FTPS, or 22 for SFTP.
  • Firewall rules: For Fusion cloud deployments, configure egress rules to allow connections to your FTP server.
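
A quick way to check the network access requirement is to attempt a plain TCP connection to the server's port from a host that shares Fusion's network path. The following is a minimal sketch using only the Python standard library. The helper name can_reach and the host ftp.example.com are illustrative assumptions, not part of the connector.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    This checks reachability only. It does not log in or verify
    that the remote service actually speaks FTP, FTPS, or SFTP.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


# Example calls (hypothetical host name):
#   can_reach("ftp.example.com", 21)   # FTP or explicit FTPS
#   can_reach("ftp.example.com", 22)   # SFTP
```

A False result usually points to a firewall rule, a wrong port, or a DNS problem rather than to a credentials issue, because no authentication is attempted.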

Additional requirements for remote connectors

You may run this connector remotely. To run the connector remotely behind a firewall, you need:
  • A Fusion user with the remote-connectors or admin role for gRPC authentication
  • The connector-plugin-standalone.jar alongside the FTP Pro connector’s ZIP file on the remote host
  • A configured connector backend gRPC endpoint in your YAML configuration
  • A truststore file path if the remote host doesn’t trust Fusion’s TLS certificate
See Remote V2 connectors for information about configuring remote V2 and Pro connectors.

Authentication

The FTP Pro connector supports multiple authentication methods depending on your protocol and security requirements. Select the connection type and enter the relevant fields.

FTP and FTPS authentication

For FTP and FTPS connections, provide the following:
  • Username: Valid FTP account username
  • Password: Password for the FTP account
These credentials are used with Apache Commons Net for both standard FTP and explicit SSL/TLS (FTPS) connections.

SFTP password authentication

For SFTP connections using password authentication, provide the following:
  • Username: Valid SFTP account username
  • Password: Password for the SFTP account

SFTP public key authentication

For enhanced security, SFTP supports SSH public key authentication. Provide the following:
  • Username: Valid SFTP account username
  • SSH Private Key: The private key content (PEM format)
  • SSH Private Key Passphrase: (Optional) Passphrase if the private key is encrypted
This authentication method is recommended for production environments and automated indexing jobs where password rotation policies may disrupt scheduled crawls.

Host key verification for SFTP

You can configure host key verification for SFTP connections to verify your server’s identity and protect against man-in-the-middle attacks. Enter the SHA-256 or MD5 SSH host key fingerprint in the Trusted SSH Host Key Fingerprint field. You can obtain the host key fingerprint by connecting to the server with an SSH client and recording the fingerprint displayed during the first connection.
Host key verification is strongly recommended for production SFTP deployments to confirm that you’re connecting to the intended server.
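
If you have the server's public host key rather than a fingerprint, the fingerprint can be derived from it. The following is a minimal sketch, assuming the key is available as a single line in OpenSSH public key format (key type, base64 data, optional comment). The function name is illustrative, and the exact fingerprint format that the Trusted SSH Host Key Fingerprint field accepts (for example, with or without the SHA256: prefix) is not specified here, so compare the result with what your SSH client displays.

```python
import base64
import hashlib


def host_key_fingerprints(public_key_line: str) -> dict:
    """Compute OpenSSH-style fingerprints from one public key line.

    Expected input: "<key-type> <base64-key-data> [comment]".
    """
    fields = public_key_line.split()
    if len(fields) < 2:
        raise ValueError("expected '<key-type> <base64-key-data> [comment]'")
    blob = base64.b64decode(fields[1])

    # SHA-256 fingerprint: base64 of the digest with '=' padding removed.
    sha256_b64 = base64.b64encode(hashlib.sha256(blob).digest()).decode("ascii")
    sha256_fp = "SHA256:" + sha256_b64.rstrip("=")

    # MD5 fingerprint: colon-separated pairs of hex digits.
    md5_hex = hashlib.md5(blob).hexdigest()
    md5_fp = "MD5:" + ":".join(
        md5_hex[i:i + 2] for i in range(0, len(md5_hex), 2)
    )

    return {"sha256": sha256_fp, "md5": md5_fp}
```

Obtain the public key through a trusted channel, such as directly from the server administrator. A key fetched over the same untrusted network offers no protection against the attack that verification is meant to prevent.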

Crawl options

Maximum directory depth

The Maximum Depth Level setting limits how deep the connector descends into subdirectories. For example, with an initial path of /documents and a max depth of 2, the connector will crawl the contents of /documents/folder1/folder2/ but not the contents of /documents/folder1/folder2/folder3/. To crawl only top-level directories, set the field’s value to 1. To perform a full recursive crawl, set the value to 0.
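
The depth rule above can be illustrated with a small traversal over an in-memory directory tree. This is a sketch of the semantics as described in this section, not the connector's implementation. The tree, the function name, and the file names are hypothetical.

```python
from typing import List

# Hypothetical directory tree: a dict is a directory, None is a file.
EXAMPLE_TREE = {
    "a.txt": None,
    "folder1": {
        "b.txt": None,
        "folder2": {
            "c.txt": None,
            "folder3": {"d.txt": None},
        },
    },
}


def list_files(tree: dict, start: str, max_depth: int) -> List[str]:
    """List files reachable from start, honoring max_depth.

    The start directory is depth 0. A subdirectory is entered only
    if its depth does not exceed max_depth; 0 means no limit.
    """
    found = []
    stack = [(start, tree, 0)]
    while stack:
        path, directory, depth = stack.pop()
        for name, child in directory.items():
            child_path = f"{path}/{name}"
            if child is None:
                found.append(child_path)
            elif max_depth == 0 or depth + 1 <= max_depth:
                stack.append((child_path, child, depth + 1))
    return sorted(found)
```

With this tree, list_files(EXAMPLE_TREE, "/documents", 2) includes /documents/folder1/folder2/c.txt but not /documents/folder1/folder2/folder3/d.txt, which matches the example in the paragraph above.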

Index folder metadata

When the Index Folder Metadata field is enabled, the FTP Pro connector indexes each directory as a document containing:
  • Directory name
  • Full path
  • Modification date
  • Permissions metadata
This setting is useful for creating a complete file system hierarchy in your search index or for finding directories by name.

Index failed documents

When a file fails to download or process, enabling the Index Failed Documents field creates a document with available metadata but without file content. This helps track which files may need attention. The indexed metadata includes the following values:
  • File name and path
  • File size
  • Modification date
  • Error information

Intelligent retry mechanism

The FTP Pro connector includes a retry mechanism that helps indexing succeed despite transient network failures, server timeouts, or temporary unavailability. You can set the number of retry attempts in the Retry Count field.
The connector uses exponential backoff between retries, starting with a short delay and increasing the delay with each subsequent retry. This avoids overwhelming a struggling server while still giving each operation multiple opportunities to succeed.
The connector determines which errors should be retried and which indicate permanent failures. For FTP or FTPS errors:
  • 4xx response codes: Retry (temporary server issues)
  • 5xx response codes: Don’t retry (permanent errors)
For SFTP errors:
  • Network errors: Retry (temporary connectivity issues)
  • Permission errors: Don’t retry (permanent access denial)
General errors:
  • Timeouts: Retry (temporary network congestion)
  • DNS failures: Don’t retry (configuration error)
  • Authentication failures: Don’t retry (invalid credentials)
During datasource validation, the retry mechanism is automatically disabled to provide fast feedback. This ensures you quickly see any configuration or connection issues without waiting through retry delays.
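
The retry rules above can be sketched as a small retry loop with exponential backoff. This is an illustration under stated assumptions, not the connector's implementation: the exception class, the function names, and the delay values (1 second base, doubling, capped at 30 seconds) are all hypothetical, because this page does not document the actual delays.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class FtpReplyError(Exception):
    """Hypothetical exception carrying an FTP reply code."""

    def __init__(self, code: int, message: str = "") -> None:
        super().__init__(f"{code} {message}".strip())
        self.code = code


def is_retryable(exc: Exception) -> bool:
    """Classify an error as transient (retry) or permanent (fail)."""
    if isinstance(exc, FtpReplyError):
        return 400 <= exc.code < 500  # 4xx: temporary, 5xx: permanent
    if isinstance(exc, TimeoutError):
        return True  # timeouts are treated as transient
    return False  # DNS, authentication, and permission errors


def backoff_delays(retry_count: int, base_delay: float = 1.0,
                   factor: float = 2.0, max_delay: float = 30.0) -> List[float]:
    """Delays before each retry. All default values are assumptions."""
    return [min(base_delay * factor ** n, max_delay) for n in range(retry_count)]


def call_with_retries(operation: Callable[[], T], retry_count: int,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Run operation, retrying transient errors up to retry_count times."""
    if retry_count < 0:
        raise ValueError("retry_count must be >= 0")
    delays = backoff_delays(retry_count)
    for attempt in range(retry_count + 1):
        try:
            return operation()
        except Exception as exc:
            # Give up on permanent errors or when retries are exhausted.
            if attempt == retry_count or not is_retryable(exc):
                raise
            sleep(delays[attempt])
    raise RuntimeError("unreachable")
```

Passing retry_count=0 makes a single attempt with no delay, which corresponds to the fast-feedback behavior described for datasource validation.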

Re-crawl indexing

The FTP Pro connector supports full re-crawls to keep your index synchronized with the FTP server. For new and modified content, the connector:
  • Indexes new files added since the last crawl
  • Re-indexes files modified since the last crawl based on the modification timestamp
  • Updates metadata for existing documents
The connector relies on Fusion’s stray deletion support to remove documents that no longer exist on the FTP server. Deleted content is handled as follows:
  • The connector tracks all crawled documents in CrawlDB
  • Documents not encountered during a re-crawl are marked as stray
  • Stray documents are automatically deleted from the index
The re-crawl process ensures your search index accurately reflects the current state of your FTP server without manual cleanup.
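
The re-crawl behavior described above amounts to comparing the previous crawl state with the current server listing. The sketch below models that comparison with plain dictionaries. It is a simplified illustration, not how CrawlDB works internally: the function name, the data shapes, and the choice to compare stored and current modification timestamps are assumptions.

```python
from typing import Dict, List


def plan_recrawl(crawl_db: Dict[str, float],
                 server_files: Dict[str, float]) -> Dict[str, List[str]]:
    """Decide what a re-crawl should do with each path.

    crawl_db:     path -> modification timestamp recorded at the last crawl
    server_files: path -> modification timestamp currently on the server
    """
    new = sorted(p for p in server_files if p not in crawl_db)
    modified = sorted(
        p for p, mtime in server_files.items()
        if p in crawl_db and mtime > crawl_db[p]
    )
    # Paths tracked earlier but not encountered now are stray.
    stray = sorted(p for p in crawl_db if p not in server_files)
    return {"index": new, "reindex": modified, "delete": stray}
```

For example, a file that was renamed on the server appears once under "delete" (old path) and once under "index" (new path).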

Configuration

When entering configuration values in the UI, use unescaped characters, such as \t for the tab character. When entering configuration values in the API, use escaped characters, such as \\t for the tab character.
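
The difference comes from string escaping in the request body. The following sketch shows it with Python's json module, assuming the API request is JSON. The property name delimiter is hypothetical and used only for illustration.

```python
import json

# The value as typed in the UI: two characters, a backslash and "t".
ui_value = "\\t"

# Serializing to JSON escapes the backslash, so the request body
# contains \\t where the UI field contains \t.
payload = json.dumps({"delimiter": ui_value})

# Parsing the body recovers the same two-character value.
round_trip = json.loads(payload)["delimiter"]
```

Here payload contains the text {"delimiter": "\\t"}, and round_trip is identical to ui_value. Most JSON libraries apply this escaping automatically, so it mainly matters when you write the request body by hand.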