- Connector architecture
- V1 and V2 platform versions
- Built-in connectors
- Connector logs
Connectors are the built-in mechanism for pulling your data into Fusion. Lucidworks provides a wide variety of connectors, each specialized for a particular data type. When you add a datasource to a collection, you specify the connector to use for ingesting data. See the complete list of connectors, with links to configuration reference information for each one.
As of Fusion Server 4.0, connector plugins can be hosted within Fusion or can run remotely. The messages exchanged between Fusion and a connector are identical whether the connector is remote or hosted; Fusion sees them as the same kind of connector. This means you can implement a plugin locally, connect to a remote Fusion for initial testing, and when done, upload the same artifact into Fusion so that Fusion can host it for you.
The connectors architecture was designed to be scalable. Depending on the connector, jobs can be scaled by adding new instances of the connector. For connectors built on this architecture, the fetching process also supports distributed fetching, so that many instances can contribute to the same job.
In the hosted case, connectors are cluster aware. This means that when a new instance of Fusion starts up, the connectors on other Fusion nodes become aware of the new connectors, and vice versa. This makes scaling the crawling process very natural and simple.
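The idea of several connector instances contributing to one job can be illustrated with a small simulation. This is only a conceptual sketch, not Fusion's implementation: the names `run_job` and `instance-N` are hypothetical, the instances are plain threads, and "fetching" an item just records it. It assumes a shared queue of pending items from which every instance pulls.

```python
import queue
import threading


def run_job(items, instance_count):
    """Process every item exactly once, sharing work across simulated instances.

    Returns a dict mapping each (hypothetical) instance name to the items it handled.
    """
    work = queue.Queue()
    for item in items:
        work.put(item)

    # One result list per simulated connector instance.
    fetched = {f"instance-{i}": [] for i in range(instance_count)}

    def fetch_loop(name):
        # Each instance keeps pulling the next pending item until none remain.
        while True:
            try:
                item = work.get_nowait()
            except queue.Empty:
                return
            fetched[name].append(item)  # stand-in for a real fetch

    threads = [threading.Thread(target=fetch_loop, args=(name,)) for name in fetched]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return fetched


if __name__ == "__main__":
    result = run_job([f"doc-{n}" for n in range(10)], instance_count=3)
    for name, docs in result.items():
        print(name, len(docs))
```

Raising `instance_count` adds workers to the same job without changing the set of items processed, which is the property the paragraph above describes. How the items are split between instances varies from run to run.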
SDK connectors can be hosted within Fusion Server or can run remotely. In the remote case, connectors become clients of Fusion. These clients run a lightweight process and communicate with Fusion using an efficient messaging format. This option makes it possible to put the connector wherever the data lives, which may be desirable for performance, security, or access reasons. See Remote Connectors for more details.
V1 and V2 platform versions
Initially, Fusion offered classic, or V1, connectors. V1 connectors were developed with a general-purpose crawler framework called Anda, created by Lucidworks. Anda simplifies and streamlines crawler development, reducing the work needed to develop a new crawler that accesses your data.
As of version 4.1.0, Fusion began offering V2 connectors, which use a Java SDK framework. The V2 platform version is included by default for every connector for which it is available.
Starting with version 5.2.0, V1 connectors are included in the Fusion image. Fusion knows where to find compatible V1 connectors locally for installation at any time through the UI (under Datasources) or via the Connector Plugins Repository API.
In addition to the features and benefits provided by V1 connectors, V2 connectors offer:
- Security access-control lists (ACLs) that are separate from content
- Support for SSL/TLS security
- Improved scalability, depending on the connector
  - Jobs can be scaled by adding instances of the connector
  - The fetching process supports distributed fetching, allowing many instances to contribute to the same job
- Connectors can be hosted within Fusion or can run remotely
  - Hosted connectors are cluster-aware, allowing connectors on separate nodes to become aware of new connectors
  - Remote connectors become clients of Fusion, run a lightweight process, and communicate with Fusion using an efficient messaging format
  - Remote connectors can be located wherever the data is located, which might be required for performance, security, or access reasons
- gRPC, Google's fast and efficient RPC framework, is used as the underlying client/server technology
  - Increased flexibility in the way services and their methods are defined
  - HTTP/2-based transport
  - Efficient serialization format for data handling (protocol buffers)
  - Support for bi-directional, multiplexed streams
Fusion comes with a standard set of built-in connectors:
Additional connectors are available for download at http://lucidworks.com/connectors/. You can look in fusion/4.0.x/apps/connectors/connectors-rpc/plugins/ to see which additional connectors are currently installed.
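As a convenience, the plugins directory mentioned above can also be inspected from a short script. The following is a minimal sketch: the helper name `list_installed_plugins` is hypothetical, and it assumes the directory path from the text is resolved relative to where the script runs, so adjust it for your Fusion version and install location. It lists whatever entries are present and makes no assumption about how plugin files are named.

```python
from pathlib import Path


def list_installed_plugins(plugins_dir):
    """Return the sorted names of the entries in a connector plugins directory.

    Returns an empty list if the directory does not exist.
    """
    path = Path(plugins_dir)
    if not path.is_dir():
        return []
    return sorted(entry.name for entry in path.iterdir())


if __name__ == "__main__":
    # Path taken from the documentation text; adjust for your installation.
    plugins = list_installed_plugins(
        "fusion/4.0.x/apps/connectors/connectors-rpc/plugins"
    )
    for name in plugins:
        print(name)
```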
You can find connector logs in
SDK connectors support Diagnostic Mode, which enables Fusion to print more detailed information to the logs about each request, including the ID of every document inserted, updated, or deleted in the oplog. More information on Diagnostic Mode can be found in the Configuration section of the connectors which offer it: