Compatible with Fusion version: 4.0.0 through 5.12.0
The connector retrieves data from the Atlassian Confluence Wiki CMS by connecting to Confluence Cloud or Confluence Server/Data Center. You can configure this datasource to crawl pages, spaces, blog posts, comments, and attachments.
Important: The Atlassian v1 API used for this connector will be removed by Atlassian on December 2, 2024. At that time, this connector will no longer function. Instead, use the Confluence recipe with the REST V2 connector, which works with the Atlassian v2 API.
The Fusion Confluence V1 connector supports Confluence Server versions 5.5 and later and Confluence Cloud.
[Figure: Connector flow]

Prerequisites

Complete these prerequisites so the connector can reliably access, crawl, and index your data. Proper setup helps avoid configuration and permission errors and keeps your content available for discovery and search in Fusion.
  • The user account in Confluence must be set up.
    • Grant read access to the user account for any spaces and pages being crawled.
    • If you want to crawl attachments, then grant read access to the user account for attachments.
    • If you are indexing ACLs for security trimming, the user account must be able to query the Users and Groups APIs.

Authentication

Setting up the correct authentication according to your organization’s data governance policies helps keep sensitive data secure while allowing authorized indexing. The available methods are basic authentication, NTLM authentication for Windows-based enterprise networks with Active Directory, and request authentication for OAuth or a personal access token.

Basic authentication

The authentication options for the Confluence V1 connector in Lucidworks Fusion depend on whether you’re using Confluence Cloud or Confluence Server/Data Center. For Confluence Server/Data Center, you can use a username and password, unless it’s disabled by your organization’s policies. Confluence Cloud does not support password-based login. Instead, use the request authentication method with an API token.

NTLM authentication for Windows/Active Directory

Gather credentials with read access to the Confluence pages and any attachments or APIs you want the connector to crawl. Enter the following in Fusion:
  • Your AD account username as Confluence Username.
  • Your AD account password as Confluence Password or API Token.
  • Your Windows domain as Domain (NTLM auth only).

Request authentication

Request authentication is a flexible method that can use a Bearer token, API key, or OAuth token, depending on your Confluence setup. For Confluence Cloud, go to Atlassian API tokens and generate a new token. After entering your credentials in Fusion, save and test the connection. Fusion returns “Success” or a detailed error, such as a 401 for an invalid token.
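Before entering a token in Fusion, it can help to confirm that the token works against Confluence directly. The following is a minimal sketch, not part of the connector. It assumes that a Confluence Cloud API token is sent as HTTP Basic credentials (account email plus token) and that a Server/Data Center personal access token is sent as a Bearer token. The function names, the example URL, and the choice of the `/rest/api/space` endpoint as a lightweight check are illustrative assumptions.

```python
import base64
import urllib.error
import urllib.request
from typing import Dict, Optional


def build_auth_header(token: str, email: Optional[str] = None) -> Dict[str, str]:
    """Build request headers for a token-authenticated Confluence call.

    Assumption: with an email, the token is a Cloud API token sent as
    Basic credentials (email:token); without one, it is a Bearer token.
    """
    if email is not None:
        raw = f"{email}:{token}".encode("utf-8")
        value = "Basic " + base64.b64encode(raw).decode("ascii")
    else:
        value = "Bearer " + token
    return {"Authorization": value, "Accept": "application/json"}


def check_connection(base_url: str, headers: Dict[str, str]) -> int:
    """Return the HTTP status of a small read request (200 on success, 401 on a bad token)."""
    url = base_url.rstrip("/") + "/rest/api/space?limit=1"
    request = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 401 and 403 responses arrive here as exceptions; report the code.
        return error.code


# Hypothetical usage (placeholder site, not executed here):
# headers = build_auth_header("my-api-token", email="me@example.com")
# print(check_connection("https://example.atlassian.net/wiki", headers))
```

For Confluence Cloud, the base URL typically ends in `/wiki`. For Server/Data Center, it is the site root or context path.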

Common Issues

If you encounter any of the following problems, take the suggested actions to resolve them:
  • 401 Unauthorized: Check your token/credentials and ensure your user account has proper access.
  • Token works in a browser but not in Fusion: Verify that HTTPS is used and that no firewall blocks Fusion from reaching Confluence.
  • “User does not have permission” error: Ensure the user account has read access to the spaces, pages, and attachments.

Confluence Connector’s security trimming

Why do some field names have different numbers?

After you crawl Confluence content, the Solr index contains ACL fields such as acl_users_0_s and acl_groups_0_ss, but the number in the field name varies. For example, some documents have acl_users_1_s or acl_users_6_s. The numbers come from the way Confluence handles user and group viewing permissions: each numbered field represents the permissions at one ancestor level of the item. If a user does not match every level of permissions, the user cannot see the document and it is filtered out. Three fields are used during security trimming:
  • ancestorCount_i stores the number of ancestors this item has
  • acl_users_i_s stores the users allowed to see this item at ancestor number i
  • acl_groups_i_s stores the groups allowed to see this item at ancestor number i
Users and groups are checked against a document’s permissions ancestor by ancestor, in order. During security trimming, you pass a queryUser to the filter, and the filter returns the Confluence documents that user can access. The Confluence security trimming algorithm does the following:
  1. Calculate the maximum ancestorCount_i of all documents in the index (max(ancestorCount_i)).
  2. Query Confluence for the Confluence Security Groups that queryUser is part of.
  3. Then for i from [0 to max(ancestorCount_i)], append an AND clause for the security filter to match against each ancestor level for the acl_users_i_s and acl_groups_i_s fields:
    (acl_users_i_s:_lw_confluence_anonymous_ OR acl_users_i_s:queryUser OR acl_group_i_s:group1 OR acl_group_i_s:group2 ... )
For example:
queryUser = ndipiazza
groupsUserIsIn = EngGroup, NorthAmericaGroup
max(ancestorCount_i) = 3
Then the filter would be:
(acl_users_0_s:lw_confluence_anonymous OR acl_users_0_s:ndipiazza OR acl_group_0_s:EngGroup OR acl_group_0_s:NorthAmericaGroup) AND
(acl_users_1_s:lw_confluence_anonymous OR acl_users_1_s:ndipiazza OR acl_group_1_s:EngGroup OR acl_group_1_s:NorthAmericaGroup) AND
(acl_users_2_s:lw_confluence_anonymous OR acl_users_2_s:ndipiazza OR acl_group_2_s:EngGroup OR acl_group_2_s:NorthAmericaGroup)
Because these clauses are AND’d together, a user who does not match every level of permissions cannot see the document, and the document is filtered out.
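The filter construction described above can be sketched in a few lines. This is an illustration of the described logic, not the connector’s actual implementation. The function name is hypothetical. The field names and the anonymous marker are copied from the example filter (the template above spells the marker with surrounding underscores, so it is left as a parameter). The sketch also assumes levels run from 0 up to, but not including, the maximum ancestor count, because the example with a maximum of 3 shows levels 0 through 2. Escaping of special query characters in user and group names is omitted.

```python
from typing import Iterable

# Marker for anonymous access, spelled as in the example filter (assumption).
ANONYMOUS_MARKER = "lw_confluence_anonymous"


def build_security_filter(
    query_user: str,
    groups: Iterable[str],
    max_ancestor_count: int,
    anonymous: str = ANONYMOUS_MARKER,
) -> str:
    """Build one OR clause per ancestor level and AND the clauses together."""
    group_list = list(groups)
    clauses = []
    for i in range(max_ancestor_count):
        # A level matches if it allows anonymous access, the user, or any of the user's groups.
        terms = [
            f"acl_users_{i}_s:{anonymous}",
            f"acl_users_{i}_s:{query_user}",
        ]
        terms.extend(f"acl_group_{i}_s:{group}" for group in group_list)
        clauses.append("(" + " OR ".join(terms) + ")")
    return " AND ".join(clauses)


security_filter = build_security_filter(
    "ndipiazza", ["EngGroup", "NorthAmericaGroup"], 3
)
print(security_filter)
```

With the example values, this produces three parenthesized clauses joined by AND, one for each of levels 0, 1, and 2.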

Configuration

When entering configuration values in the UI, use unescaped characters, such as \t for the tab character. When entering configuration values in the API, use escaped characters, such as \\t for the tab character.
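The difference follows from how JSON strings escape the backslash. As a minimal sketch, assuming the API accepts a JSON body and using a hypothetical property name `delimiter`, serializing the value you would type in the UI shows the escaped form the API expects:

```python
import json

# The two-character sequence backslash + t, exactly as typed in the UI.
ui_value = r"\t"

# Serializing to JSON doubles the backslash, giving the form used in API requests.
# "delimiter" is a hypothetical property name used only for illustration.
payload = json.dumps({"delimiter": ui_value})
print(payload)  # {"delimiter": "\\t"}

# Parsing the payload recovers the original UI value.
assert json.loads(payload)["delimiter"] == ui_value
```

Building the request body with a JSON library rather than by hand avoids having to add the extra backslashes yourself.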