  • Latest version: v1.0.0
  • Compatible with Fusion version: 5.4.0 and later
Due to rate limits in the Slack Web API, this connector has the following restrictions:
  • Only one connector-plugin instance should be started.
  • Only one fetching thread should be set when creating the datasource configuration.

Authentication

Before using the Slack V2 connector, you must install a Slack application in the workspace that will be indexed. See Slack’s Basic app setup for more information about installing a Slack application. Once the app is correctly installed in the workspace, Slack provides an access token, which you then use in the Slack V2 connector configuration.
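Before configuring the connector, you can verify the token with Slack’s auth.test method. The following is a minimal sketch; the token value is a placeholder:

# Verify that the access token is valid and see which workspace and user it is bound to.
curl -s -H "Authorization: Bearer xoxp-your-user-token" \
  https://slack.com/api/auth.test

A successful response includes "ok": true along with the workspace URL, team, and user associated with the token.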

Required application scopes

The Slack application must provide access to the required User Token Scopes in order to work with the Slack V2 connector. See Slack User Token Scopes.

Initial actions

The Slack V2 connector will perform the following actions at the beginning of each job:
  • Load all users into the users cache. This information is used later to resolve user names when processing mentions in messages, and it avoids extra requests to retrieve user metadata (see the sketch after this list).
  • Load the channels to be processed, using the channel filtering properties from the configuration. This list serves as the source of truth when determining whether a channel should be indexed or deleted on each job.
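For illustration, the user-cache load corresponds to paging through Slack’s users.list method. This is a hedged sketch of that step, not the connector’s actual code; the token value is a placeholder and jq is assumed to be available:

# Page through all workspace users, following the pagination cursor.
TOKEN="xoxp-your-user-token"
CURSOR=""
while true; do
  RESP=$(curl -s -H "Authorization: Bearer $TOKEN" \
    "https://slack.com/api/users.list?limit=200&cursor=$CURSOR")
  echo "$RESP" | jq -r '.members[].id'                      # user IDs to cache
  CURSOR=$(echo "$RESP" | jq -r '.response_metadata.next_cursor // ""')
  if [ -z "$CURSOR" ]; then break; fi
done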

Document filtering

Profiles indexing
  • Index profiles: Determines whether user profiles are indexed.
  • Index guest profiles: Determines whether profiles from guest users are indexed.
  • Index inactive profiles: Determines whether profiles of deactivated users are indexed.
Channels indexing
  • Index channels: Determines whether a document per channel is indexed.
  • Index archived channels: Determines whether archived channels are indexed to the content collection.
  • Index public channels: Determines whether public channels are indexed.
  • Index private channels: Determines whether private channels are indexed.
  • Index direct messages: Determines whether direct messages and multi-party direct messages are indexed.
  • Channels to include: A list of the channel names to be included in the crawling process.
Messages indexing
  • Index messages: Determines whether messages from the channels are indexed.
  • Index bot messages: Determines whether messages from bots in channels are indexed.
  • Index message replies (threads): Determines whether the replies (if applicable) to messages are indexed.

Crawling process

Due to limitations in the Slack Web API, edited and deleted messages cannot be retrieved. You must clear the datasource and start a new connector job to index edited messages or remove deleted messages.

Crawl process diagram


Endpoints used in the crawling process

  • https://slack.com/api/users.list: Lists all users in the workspace. This is not controlled by the properties in Document filtering; all users are retrieved at the beginning of each job.
  • https://slack.com/api/conversations.list: Lists the channels that should be indexed. This fetching is guided by the properties listed in Document filtering.
  • https://slack.com/api/conversations.members: Lists the members of the specific channels being indexed.
  • https://slack.com/api/conversations.history: Retrieves a page of messages from a specific channel (see the example after this list).
  • https://slack.com/api/users.info: Retrieves the information of a single user if it is not present in the users cache (message document generation).
  • https://slack.com/api/conversations.info: Retrieves the information of a single channel if it is not present in the channels cache (message document generation).
  • https://slack.com/api/team.info: Retrieves the information of a team (message document generation).
  • https://slack.com/api/conversations.replies: Retrieves a page of replies for a specific message in a channel (see the example after this list).
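To illustrate how the history and replies endpoints page through data, here is a hedged curl sketch; the channel ID, message timestamp, and token are placeholders:

TOKEN="xoxp-your-user-token"
CHANNEL="C0123456789"   # placeholder channel ID

# Fetch one page of messages from the channel (up to 200 per page).
curl -s -H "Authorization: Bearer $TOKEN" \
  "https://slack.com/api/conversations.history?channel=$CHANNEL&limit=200"

# Fetch the replies of a thread, identified by the parent message's ts value.
curl -s -H "Authorization: Bearer $TOKEN" \
  "https://slack.com/api/conversations.replies?channel=$CHANNEL&ts=1618033988.075000"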

Message mentions processing

The Slack V2 connector retrieves messages from the Slack API in the following format:
 "type": "message",
 "text": "this is a mention in a message: <@U01U580HSDN>",
In this example, U01U580HSDN is the user ID; the <@…> syntax marks a mention. The connector resolves the user ID to the user’s display name, if one exists. Otherwise, it uses the full name from the user’s profile.
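This resolution can be approximated with Slack’s users.info method. A hedged sketch, assuming jq is available; the token is a placeholder:

# Resolve a mentioned user ID to a display name, falling back to the profile's real name.
curl -s -H "Authorization: Bearer xoxp-your-user-token" \
  "https://slack.com/api/users.info?user=U01U580HSDN" |
  jq -r '.user.profile | if .display_name != "" then .display_name else .real_name end'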

Incremental Crawling processors

Each incremental job re-indexes the User ACL, to update all users in the Access Control Collection, and the Channel ACL, to update the members of each channel.
  • ProfilesCheckpointProcessor: Performs the following actions:
    ● Triggers the indexing of all user profiles and the profiles’ ACLs
    ● Verifies that profiles should still be indexed and, if not, deletes them
    ● Verifies guest and deleted profiles
  • ChannelsCheckpointProcessor: Detects new and deleted channels by comparison, then triggers the re-indexing of the channel ACL documents.
  • ChannelCheckpointProcessor: Verifies that the current channel should still be indexed. If so, it triggers the indexing of new messages, providing the timestamp (ts) of the last message seen in the previous job.

Security filtering

Security filtering processors

  • ProfileAclProcessor: Builds a document representing a Slack user, which is then sent to the Access Control Collection. Regular users can access public channels even if they are not members of the channel. Guest users can access only the channels they are members of.
  • ChannelAclProcessor: Retrieves the current members of the given channel, builds a document with those values, and sends the document to the Access Control Collection. For public channels, an extra document is sent to the Access Control Collection to provide access to regular users who are not members of the channel.
Access Control Document links

Security trimming rules

Security trimming uses the following rules:
  • All users (regular and guest) can see the profile of any user in the Slack workspace.
  • All regular users can access public channels and messages, even if they are not members of the channel.
  • Guest users can only access the public or private channels they are members of.
  • Only the members of private channels can access the channel and messages.
  • Only the users involved can access direct messages and multi-party direct messages.

Remote connectors

V2 connectors support running remotely in Fusion versions 5.7.1 and later.
If you need to index data from behind a firewall, you can configure a V2 connector to run remotely on-premises using TLS-enabled gRPC.

Prerequisites

Before you can set up an on-prem V2 connector, you must configure the egress from your network to allow HTTP/2 communication into the Fusion cloud. You can use a forward proxy server to act as an intermediary between the connector and Fusion.

The following is required to run V2 connectors remotely:
  • The plugin zip file and the connector-plugin-standalone JAR.
  • A configured connector backend gRPC endpoint.
  • Username and password of a user with a remote-connectors or admin role.
  • If the host where the remote connector is running is not configured to trust the server’s TLS certificate, you must configure the file path of the trust certificate collection.
If your version of Fusion doesn’t have the remote-connectors role by default, you can create one. No API or UI permissions are required for the role.

Connector compatibility

Only V2 connectors are able to run remotely on-premises. You also need the remote connector client JAR file that matches your Fusion version. You can download the latest files at V2 Connectors Downloads.
Whenever you upgrade Fusion, you must also update your remote connectors to match the new version of Fusion.
The gRPC connector backend is not supported in Fusion environments deployed on AWS.

System requirements

The following is required for the on-prem host of the remote connector:
  • (Fusion 5.9.0-5.9.10) JVM version 11
  • (Fusion 5.9.11) JVM version 17
  • Minimum of 2 CPUs
  • 4GB Memory
Note that memory requirements depend on the number and size of ingested documents.

Enable backend ingress

In your values.yaml file, configure this section as needed:
ingress:
  enabled: false
  pathtype: "Prefix"
  path: "/"
  #host: "ingress.example.com"
  ingressClassName: "nginx"   # Fusion 5.9.6 only
  tls:
    enabled: false
    certificateArn: ""
    # Enable the annotations field to override the default annotations
    #annotations: ""
  • Set enabled to true to enable the backend ingress.
  • Set pathtype to Prefix or Exact.
  • Set path to the path where the backend will be available.
  • Set host to the host where the backend will be available.
  • In Fusion 5.9.6 only, you can set ingressClassName to one of the following:
    • nginx for Nginx Ingress Controller
    • alb for AWS Application Load Balancer (ALB)
  • Configure TLS and certificates according to your CA’s procedures and policies.
    TLS must be enabled in order to use AWS ALB for ingress.
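After editing values.yaml, apply the change to your Fusion deployment. The following is a hedged sketch that assumes a Helm-based installation; the release name, namespace, and chart reference are placeholders:

# Apply the updated values to the existing Fusion release.
helm upgrade my-fusion lucidworks/fusion \
  --namespace mynamespace \
  --values values.yaml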

Connector configuration example

kafka-bridge:
  target: mynamespace-connectors-backend.lucidworkstest.com:443 # mandatory
  plain-text: false # optional, false by default
  proxy-server: # optional - needed when a forward proxy server is used to provide outbound access to the standalone connector
    host: host
    port: some-port
    user: user # optional
    password: password # optional
  trust: # optional - needed when the client's system doesn't trust the server's certificate
    cert-collection-filepath: path1

proxy: # mandatory fusion-proxy
  user: admin
  password: password123
  url: https://fusiontest.com/ # needed only when the connector plugin requires blob store access

plugin: # mandatory
  path: ./fs.zip
  type: #optional - the suffix is added to the connector id
    suffix: remote

Minimal example

kafka-bridge:
  target: mynamespace-connectors-backend.lucidworkstest.com:443

proxy:
  user: admin
  password: "password123"

plugin:
  path: ./testplugin.zip

Logback XML configuration file example

<configuration>
    <appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
        <encoder class="com.lucidworks.logging.logback.classic.LucidworksPatternLayoutEncoder">
            <pattern>%d - %-5p [%t:%C{3.}@%L] - %m{nolookups}%n</pattern>
            <charset>utf8</charset>
        </encoder>
    </appender>

    <appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
        <file>${LOGDIR:-.}/connector.log</file>
        <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
            <!-- rollover daily -->
            <fileNamePattern>${LOGDIR:-.}/connector-%d{yyyy-MM-dd}.%i.log.gz</fileNamePattern>
            <maxFileSize>50MB</maxFileSize>
            <totalSizeCap>10GB</totalSizeCap>
        </rollingPolicy>
        <encoder class="com.lucidworks.logging.logback.classic.LucidworksPatternLayoutEncoder">
            <pattern>%d - %-5p [%t:%C{3.}@%L] - %m{nolookups}%n</pattern>
            <charset>utf8</charset>
        </encoder>
    </appender>

    <root level="INFO">
        <appender-ref ref="CONSOLE"/>
        <appender-ref ref="FILE"/>
    </root>
</configuration>

Run the remote connector

java [-Dlogging.config=[LOGBACK_XML_FILE]] \
  -jar connector-plugin-client-standalone.jar [YAML_CONFIG_FILE]
The logging.config property is optional. If not set, logging messages are sent to the console.

Test communication

You can run the connector in communication testing mode. This mode tests the communication with the backend without running the plugin, reports the result, and exits.
java -Dstandalone.connector.connectivity.test=true -jar connector-plugin-client-standalone.jar [YAML_CONFIG_FILE]

Encryption

In a deployment, communication to the connector’s backend server is encrypted using TLS. You should only run this configuration without TLS in a testing scenario. To disable TLS, set plain-text to true.

Egress and proxy server configuration

One of the methods you can use to allow outbound communication from behind a firewall is a proxy server. You can configure a proxy server to allow certain communication traffic while blocking unauthorized communication. If you use a proxy server at the site where the connector is running, you must configure the following properties:
  • Host. The host where the proxy server is running.
  • Port. The port the proxy server is listening to for communication requests.
  • Credentials. Optional proxy server user and password.
When you configure egress, it is important to disable any connection or activity timeouts, because the connector uses long-running gRPC calls.
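To confirm that outbound communication works through the proxy, you can run the communication test described above with a configuration whose kafka-bridge section includes the proxy-server settings; the YAML file name is a placeholder:

# Test connectivity to the backend through the configured proxy server.
java -Dstandalone.connector.connectivity.test=true \
  -jar connector-plugin-client-standalone.jar remote-connector.yaml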

Password encryption

If you use a login name and password in your configuration, run the following utility to encrypt the password:
  1. Enter a user name and password in the connector configuration YAML.
  2. Run the standalone JAR with this property (a complete command example follows these steps):
    -Dstandalone.connector.encrypt.password=true
    
  3. Retrieve the encrypted passwords from the log that is created.
  4. Replace the clear password in the configuration YAML with the encrypted password.
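Putting the steps together, the full invocation might look like the following; the YAML file name is a placeholder:

# Logs the encrypted form of the password from the configuration YAML.
java -Dstandalone.connector.encrypt.password=true \
  -jar connector-plugin-client-standalone.jar remote-connector.yaml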

Connector restart (5.7 and earlier)

The connector shuts down automatically whenever the connection to the server is disrupted, to prevent it from getting into a bad state. Communication disruption can happen, for example, when the server running in the connectors-backend pod shuts down and is replaced by a new pod. Once the connector shuts down, connector configuration and job execution are disabled, so you should restart the connector as soon as possible. You can use Linux scripts and utilities, such as Monit, to restart the connector automatically.
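As a minimal alternative to a supervision utility, a shell wrapper can restart the connector whenever it exits. This is a hedged sketch; the file names are placeholders:

#!/bin/sh
# Restart the standalone connector whenever it exits.
while true; do
  java -Dlogging.config=logback.xml \
    -jar connector-plugin-client-standalone.jar remote-connector.yaml
  echo "Connector exited; restarting in 5 seconds..." >&2
  sleep 5
done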

Recoverable bridge (5.8 and later)

If communication to the remote connector is disrupted, the connector tries to recover the communication and gRPC calls. By default, six attempts are made to recover each gRPC call. The number of attempts can be configured with the max-grpc-retries bridge parameter.

Job expiration duration (5.9.5 only)

The timeout value for unresponsive backend jobs can be configured with the job-expiration-duration-seconds parameter. The default value is 120 seconds.

Use the remote connector

Once the connector is running, it is available in the Datasources dropdown. If the standalone connector terminates, it disappears from the list of available connectors. Once it is re-run, it becomes available again, and previously configured connector instances are not lost.

Enable asynchronous parsing (5.9 and later)

To separate document crawling from document parsing, enable Tika Asynchronous Parsing on remote V2 connectors.
Below is an example configuration showing how to specify the file system to index under the connector-plugins entry in your values.yaml file:
additionalVolumes:
  - name: fusion-data1-pvc
    persistentVolumeClaim:
      claimName: fusion-data1-pvc
  - name: fusion-data2-pvc
    persistentVolumeClaim:
      claimName: fusion-data2-pvc
additionalVolumeMounts:
  - name: fusion-data1-pvc
    mountPath: "/connector/data1"
  - name: fusion-data2-pvc
    mountPath: "/connector/data2"
You may also need to specify the user that is authorized to access the file system, as in this example:
securityContext:
    fsGroup: 1002100000
    runAsUser: 1002100000

Configuration
