Update your Confluence recipe
If you are using the confluence-v1.json recipe, you must migrate to the confluence.json recipe.
You have two options for updating to the confluence.json recipe:
- Clear an existing datasource and replace the recipe. This is the recommended method.
- Create a new datasource with the new recipe.
Before you begin, make sure you have the confluence.json recipe file and have edited the following values in the file:
- serviceURL: Your Confluence URL
- Confluence username and password or API token
- collection: Your Fusion collection name
- pipeline: Your Fusion pipeline name
Clear an existing datasource
To use the same REST V2 datasource for your updated Confluence recipe:
- Navigate to Indexing > Datasources and select your existing REST V2 datasource.
- Click Clear Datasource.
- Use the Connector Datasources API to replace the datasource configuration. Replace AUTHORIZATION_CREDENTIALS with your Lucidworks login information in Base64:
- Return to your datasource in Fusion. Make any additional edits to your configuration and click Save.
- Click Run, and then click Start to run your datasource.
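The AUTHORIZATION_CREDENTIALS value mentioned in the steps above can be produced with a few lines of Python. This is a minimal sketch that assumes the API expects the standard HTTP Basic form, that is, `username:password` encoded in Base64. The function name and the login values are placeholders for illustration only.

```python
import base64


def basic_auth_credentials(username: str, password: str) -> str:
    """Return 'username:password' encoded in Base64 (standard HTTP Basic form)."""
    raw = f"{username}:{password}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


# Placeholder login values; substitute your own Lucidworks credentials.
credentials = basic_auth_credentials("admin", "password123")

# Assumption: the value is sent in an Authorization header of this shape.
authorization_header = f"Basic {credentials}"
```

Avoid pasting real credentials into shared scripts or shell history; reading them from an environment variable is usually safer.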
Create a new datasource
To use a new datasource for your updated Confluence recipe:
- Navigate to Indexing > Datasources.
- Click Add + and select REST (v2) to create a new datasource.
- Fill out the following required fields:
  - Enter a name for the connector in the Configuration ID field.
  - Enter your Fusion pipeline in the Pipeline ID field.
  - Enter your Confluence URL in the Service URL field.
- Upload the recipe to Fusion with the Connector Datasources API. Replace AUTHORIZATION_CREDENTIALS with your Lucidworks login information in Base64:
- Return to your datasource in Fusion. Make any additional edits to your configuration and click Save.
- Click Run, and then click Start to run your datasource.
Confluence REST configuration
This documentation describes the Confluence REST confluence.json file configuration, including the authentication methods, data crawled and retrieved, pagination information, variables used, and endpoints used. Terminology is also provided as a reference.
The REST connector crawls the following data using the Confluence REST configuration:
- Spaces
- Pages such as Wiki pages
- Blog posts
- Page and blog post comments
Authentication methods
The Confluence REST configuration supports:
- Basic Authentication using the username and password from an Atlassian account. For more information, see Basic auth for REST APIs.
- API Token. For information about how to create a new API token, see API Tokens.
Supported crawl options
The Confluence REST configuration supports the following crawl options:
- Full crawl
- Recrawl that relies on the strayContentDeletion parameter
Pagination information
This recipe paginates each request. Confluence returns the next page’s URL in the response under the _links.next path, and the connector requests that URL to fetch the next page. When _links.next is not provided in the response, there are no more pages and pagination stops.
The following code sample shows an example response snippet with a link to a next page:
Configure the pagination by next page URL property
Next Page URL Key: _links.next, where _links.next is the key of the response that contains the next page URL
Configure query parameters
limit=50, where 50 is the number of items per page
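The pagination behavior described above can be sketched as a small loop that keeps following `_links.next` until the key is absent. This is only an illustration, not the connector's implementation: the `fetch` callable, the `results` key, the `cursor` parameter, and the two fake pages are assumptions that stand in for real Confluence responses.

```python
from typing import Callable, Iterator, Optional


def iter_results(first_url: str, fetch: Callable[[str], dict]) -> Iterator[dict]:
    """Yield items from every page, following _links.next until it is absent."""
    url: Optional[str] = first_url
    while url:
        page = fetch(url)
        # Assumption: the items of a page live under the "results" key.
        yield from page.get("results", [])
        # A missing _links.next means there are no more pages.
        url = page.get("_links", {}).get("next")


# Hypothetical two-page response set standing in for a Confluence instance.
FAKE_PAGES = {
    "/wiki/api/v2/spaces?limit=50": {
        "results": [{"id": "1"}, {"id": "2"}],
        "_links": {"next": "/wiki/api/v2/spaces?limit=50&cursor=abc"},
    },
    "/wiki/api/v2/spaces?limit=50&cursor=abc": {
        "results": [{"id": "3"}],
        "_links": {},
    },
}

space_ids = [
    item["id"]
    for item in iter_results("/wiki/api/v2/spaces?limit=50", FAKE_PAGES.__getitem__)
]
```

Passing the fetch function as a parameter keeps the loop independent of any HTTP client, so the same logic can be exercised against canned responses.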
Variables used
The Confluence REST configuration uses one variable. ${LW_PARENT_DATA_KEY} gives an additional request access to a value from the response of the main request. In the Confluence recipe, this variable is placed in the URL of a GET request, for example to retrieve the comments for blog posts and pages.
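As a rough illustration of the substitution, the following sketch replaces the variable in an endpoint template with a value taken from a parent object. The function name, the default key `id`, and the sample page are hypothetical; the connector performs this step internally.

```python
PLACEHOLDER = "${LW_PARENT_DATA_KEY}"


def resolve_endpoint(template: str, parent: dict, parent_data_key: str = "id") -> str:
    """Replace the placeholder with the parent's value for parent_data_key."""
    value = str(parent[parent_data_key])
    return template.replace(PLACEHOLDER, value)


# Hypothetical parent object returned by the main request.
page = {"id": "12345", "title": "TestPage"}

url = resolve_endpoint(
    "/wiki/api/v2/pages/${LW_PARENT_DATA_KEY}/footer-comments", page
)
```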
Endpoints to configure with the Confluence REST connector
The following table describes the Confluence REST connector endpoints. Each request is configured under the property List of Requests Configuration, or under requestConfigurations in the Confluence recipe.
| Request type | ObjectType | Parent ObjectType | Endpoint | Query parameters | Description |
|---|---|---|---|---|---|
| Root Request | SPACE | | GET /wiki/api/v2/spaces | limit=50&description-format=plain&status=current | Returns the Spaces with status=current from the Atlassian Confluence instance. |
| Child Request | PAGE | SPACE | GET /wiki/api/v2/spaces/${LW_PARENT_DATA_KEY}/pages | limit=50&body-format=storage | Returns the pages (children) of each space retrieved by the SPACE request. Internally, the variable ${LW_PARENT_DATA_KEY} is replaced with the ‘id’ of the parent ‘space’, which is extracted by setting the property Response Handling -> parentDataKey=id. |
| Child Request | BLOG | SPACE | GET /wiki/api/v2/spaces/${LW_PARENT_DATA_KEY}/blogposts | limit=50&body-format=storage | Returns the blog posts (children) of each space retrieved by the SPACE request. Internally, the variable ${LW_PARENT_DATA_KEY} is replaced with the ‘id’ of the parent ‘space’, which is extracted by setting the property Response Handling -> parentDataKey=id. |
| Child Request | COMMENT_FOOTER_PAGE | PAGE | GET /wiki/api/v2/pages/${LW_PARENT_DATA_KEY}/footer-comments | limit=50&body-format=storage | Returns the footer comments of each page retrieved by the PAGE request. Internally, the variable ${LW_PARENT_DATA_KEY} is replaced with the ‘id’ of the parent ‘page’, which is extracted by setting the property Response Handling -> parentDataKey=id. |
| Child Request | COMMENT_REPLY_FOOTER_PAGE | COMMENT_FOOTER_PAGE | GET /wiki/api/v2/footer-comments/${LW_PARENT_DATA_KEY}/children | limit=50&body-format=storage | Returns the replies to each footer comment retrieved by the COMMENT_FOOTER_PAGE request. Internally, the variable ${LW_PARENT_DATA_KEY} is replaced with the ‘id’ of the parent ‘footer-comment’, which is extracted by setting the property Response Handling -> parentDataKey=id. This request enables the property ‘Recursive Request’. |
| Child Request | COMMENT_INLINE_PAGE | PAGE | GET /wiki/api/v2/pages/${LW_PARENT_DATA_KEY}/inline-comments | limit=50&body-format=storage | Returns the inline comments of each page retrieved by the PAGE request. Internally, the variable ${LW_PARENT_DATA_KEY} is replaced with the ‘id’ of the parent ‘page’, which is extracted by setting the property Response Handling -> parentDataKey=id. |
| Child Request | COMMENT_REPLY_INLINE_PAGE | COMMENT_INLINE_PAGE | GET /wiki/api/v2/inline-comments/${LW_PARENT_DATA_KEY}/children | limit=50&body-format=storage | Returns the replies to each inline comment retrieved by the COMMENT_INLINE_PAGE request. Internally, the variable ${LW_PARENT_DATA_KEY} is replaced with the ‘id’ of the parent ‘inline-comment’, which is extracted by setting the property Response Handling -> parentDataKey=id. This request does not need to enable the ‘Recursive Request’ property. |
| Child Request | COMMENT_FOOTER_BLOG | BLOG | GET /wiki/api/v2/blogposts/${LW_PARENT_DATA_KEY}/footer-comments | limit=50&body-format=storage | Returns the footer comments of each blog post retrieved by the BLOG request. Internally, the variable ${LW_PARENT_DATA_KEY} is replaced with the ‘id’ of the parent ‘blog’, which is extracted by setting the property Response Handling -> parentDataKey=id. |
| Child Request | COMMENT_REPLY_FOOTER_BLOG | COMMENT_FOOTER_BLOG | GET /wiki/api/v2/footer-comments/${LW_PARENT_DATA_KEY}/children | limit=50&body-format=storage | Returns the replies to each footer comment retrieved by the COMMENT_FOOTER_BLOG request. Internally, the variable ${LW_PARENT_DATA_KEY} is replaced with the ‘id’ of the parent ‘footer-comment’, which is extracted by setting the property Response Handling -> parentDataKey=id. This request enables the property ‘Recursive Request’. |
| Child Request | COMMENT_INLINE_BLOG | BLOG | GET /wiki/api/v2/blogposts/${LW_PARENT_DATA_KEY}/inline-comments | limit=50&body-format=storage | Returns the inline comments of each blog post retrieved by the BLOG request. Internally, the variable ${LW_PARENT_DATA_KEY} is replaced with the ‘id’ of the parent ‘blog’, which is extracted by setting the property Response Handling -> parentDataKey=id. |
| Child Request | COMMENT_REPLY_INLINE_BLOG | COMMENT_INLINE_BLOG | GET /wiki/api/v2/inline-comments/${LW_PARENT_DATA_KEY}/children | limit=50&body-format=storage | Returns the replies to each inline comment retrieved by the COMMENT_INLINE_BLOG request. Internally, the variable ${LW_PARENT_DATA_KEY} is replaced with the ‘id’ of the parent ‘inline-comment’, which is extracted by setting the property Response Handling -> parentDataKey=id. This request does not need to enable the ‘Recursive Request’ property. |
Notes
- The requests are linked hierarchically using the properties ObjectType and ParentObjectType.
  - This hierarchy maintains the parent-child relationships between different levels of objects. For instance, a Page is a Space-Child, a Comment is a Page-Child, and a Comment-Reply is a Comment-Child.
  - When objects are indexed, the field _lw_rest_parent_object_ss keeps the list of parents related to an object. For example, a page indexes _lw_rest_parent_object_ss: ["/spaces/TestSpaceName", "/spaces/TestSpace/pages/<pageId>/TestPageName"], where <pageId> is a numeric value.
- With Confluence API v2 endpoints, different requests are needed to retrieve the footer comments and inline comments from pages and blog posts. To maintain the relationship between the comments, replies, and their parents (pages, blog posts, and spaces), there are eight different request configurations.
  - To retrieve page comments: COMMENT_FOOTER_PAGE and COMMENT_INLINE_PAGE.
  - To retrieve replies to page comments: COMMENT_REPLY_FOOTER_PAGE and COMMENT_REPLY_INLINE_PAGE.
  - To retrieve blog comments: COMMENT_FOOTER_BLOG and COMMENT_INLINE_BLOG.
  - To retrieve replies to blog comments: COMMENT_REPLY_FOOTER_BLOG and COMMENT_REPLY_INLINE_BLOG.
  - When comments are indexed, the field contains: _lw_rest_parent_object_ss: ["/spaces/TestSpaceName", "/spaces/TestSpace/pages/<pageId>/TestPageName", "<commentId>"].
  - When replies are indexed, the field contains: _lw_rest_parent_object_ss: ["/spaces/TestSpaceName", "/spaces/TestSpace/pages/<pageId>/TestPageName", "<commentId>", "<commentReplyId>"], where <pageId>, <commentId>, and <commentReplyId> are numeric values.
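The hierarchy in these notes can be modeled as a lookup from each ObjectType to its ParentObjectType. The sketch below is an illustration under that assumption, not the connector's code: `request_chain` walks up to the root request, and `parent_object_list` shows how a list such as _lw_rest_parent_object_ss could grow by one entry per level. Both function names are hypothetical, and the `<pageId>` and `<commentId>` placeholders are kept as literal strings.

```python
from typing import Dict, List, Optional

# ObjectType -> ParentObjectType, as listed in the endpoints table.
# None marks the root request.
PARENT_OF: Dict[str, Optional[str]] = {
    "SPACE": None,
    "PAGE": "SPACE",
    "BLOG": "SPACE",
    "COMMENT_FOOTER_PAGE": "PAGE",
    "COMMENT_REPLY_FOOTER_PAGE": "COMMENT_FOOTER_PAGE",
    "COMMENT_INLINE_PAGE": "PAGE",
    "COMMENT_REPLY_INLINE_PAGE": "COMMENT_INLINE_PAGE",
    "COMMENT_FOOTER_BLOG": "BLOG",
    "COMMENT_REPLY_FOOTER_BLOG": "COMMENT_FOOTER_BLOG",
    "COMMENT_INLINE_BLOG": "BLOG",
    "COMMENT_REPLY_INLINE_BLOG": "COMMENT_INLINE_BLOG",
}


def request_chain(object_type: str) -> List[str]:
    """Return the request types from the root request down to object_type."""
    chain: List[str] = []
    current: Optional[str] = object_type
    while current is not None:
        chain.append(current)
        current = PARENT_OF[current]
    return list(reversed(chain))


def parent_object_list(parent_list: List[str], object_id: str) -> List[str]:
    """Extend the parent's list with the object's own ID."""
    return parent_list + [object_id]


space = parent_object_list([], "/spaces/TestSpace")
page = parent_object_list(space, "/spaces/TestSpace/pages/<pageId>/TestPage")
comment = parent_object_list(page, "<commentId>")
```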
Response parsing configuration
Per request, configure the Response Handling property to specify how to parse the response. This field is responseConfiguration in the JSON recipe.
The Confluence recipe does not use binary parsing.
Plugin parsing
- This parsing happens by default. The responses are parsed as a JSON object structure using JsonPath.
- Plugin parsing applies to all the requests listed in the Endpoints configuration table.
- The properties Response Handling -> Data ID, Data Path are configured to extract certain values from the objects parsed.
- The property Response Handling -> Parent Data Key is configured to extract the ‘id’ of the parent object.
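To make the roles of Data Path, Data ID, and Parent Data Key concrete, here is a minimal sketch that applies them to a parsed response. It resolves simple dotted paths only and is not a JsonPath engine; the helper names, the output keys, and the sample response body are assumptions made for illustration.

```python
from typing import Any, List


def get_path(obj: Any, dotted_path: str) -> Any:
    """Resolve a simple dotted path such as '_links.webui' (not full JsonPath)."""
    for key in dotted_path.split("."):
        obj = obj[key]
    return obj


def parse_response(
    body: dict, data_path: str, data_id: str, parent_data_key: str
) -> List[dict]:
    """Extract each data object with its document ID and parent key value."""
    parsed = []
    for item in get_path(body, data_path):
        parsed.append(
            {
                "doc_id": get_path(item, data_id),
                "parent_key_value": get_path(item, parent_data_key),
                "object": item,
            }
        )
    return parsed


# Hypothetical response body for a pages request.
body = {
    "results": [
        {
            "id": "98765",
            "title": "TestPage",
            "_links": {"webui": "/spaces/TestSpace/pages/98765/TestPage"},
        }
    ]
}

docs = parse_response(
    body, data_path="results", data_id="_links.webui", parent_data_key="id"
)
```

In this sketch the value extracted with the parent data key is what a child request would substitute for ${LW_PARENT_DATA_KEY}.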
Terminology
The following terms are provided as a reference.

| Term | Description |
|---|---|
| List of Requests Configuration | Configure List of Requests to extract data from the REST source. Requests are linked hierarchically using the properties Parent-Child Request Link -> ObjectType and ParentObjectType. |
| Object Type | The unique name to identify the request. |
| Parent Object Type | References an existing Object Type and creates a parent-child hierarchy in which the current request becomes the child of the specified Parent Object Type. If blank, the current request is considered a Root-Request. |
| Root Request | The type of request-configuration to retrieve the initial parent objects. |
| Child Request | The type of request-configuration to retrieve child objects for each parent object. A child-request can be a parent of another child-request. As an example, a Footer-Comment is a child of a Page. |
| Recursive Request | When enabled, recursively performs the same ObjectType request configuration to retrieve all the nested objects under an object. This is particularly useful when the nesting depth is unknown. For example, the request ObjectType=COMMENT_REPLY_FOOTER_BLOG first retrieves only the direct replies from a comment (parent). When recursive requests are enabled, COMMENT_REPLY_FOOTER_BLOG executes recursively until no more replies are found. |
| Response Handling | The responseConfiguration defines the mapping between the response and data objects to be indexed. |
| Data Path | The path to access a specific data object within a response. For example, to access a list of elements named with key objects, the data path is objects. If not provided, the entire response body is indexed. This property accepts JsonPath expressions such as objects, objects[*], or results to extract the list of Confluence objects. |
| Data ID | The identifier key for the data objects extracted with ‘Data Path’. This value is used to build the Solr document’s ID. If not provided, a random UUID is used. This property accepts JsonPath expressions such as _links.webui to extract the unique path of a Page. |
| Parent Data Key | Required for all Child Requests. Map to a key from the parent object, whose value is used to replace the ${LW_PARENT_DATA_KEY} variable in the child request configuration (endpoint, query parameters or body). For example, /wiki/api/v2/spaces/${LW_PARENT_DATA_KEY}/blogposts |
| _lw_rest_object_type_s | All objects index this field, whose value is the ‘ObjectType’ of the request that retrieved the object. |
| _lw_rest_object_s | All objects index this field, whose value contains the objectId extracted with the property ‘Data ID’. For example, for a space, indexes _lw_rest_object_s: "/spaces/TestSpace". For a page, indexes _lw_rest_object_s: "/spaces/TestSpace/pages/<pageId>/TestPage", where <pageId> is a numeric value. |
| _lw_rest_parent_object_ss | All objects index this field, whose value is a list of the objectIds inherited from all their parents, and the objectId from the object itself. For example, for a space, indexes _lw_rest_parent_object_ss: ["/spaces/TestSpace"]. For a comment, indexes _lw_rest_parent_object_ss: ["/spaces/TestSpace", "/spaces/TestSpace/pages/<pageId>/TestPage", "<commentId>"], where <commentId> is a numeric value. |
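The Recursive Request entry above can be pictured as a function that re-issues the same child request for every reply it finds. The following sketch assumes a hypothetical `fetch_children` callable and a small in-memory reply tree; it illustrates the idea and does not reflect how the connector is implemented.

```python
from typing import Callable, Dict, List


def collect_replies(
    comment_id: str,
    fetch_children: Callable[[str], List[dict]],
    recursive: bool = True,
) -> List[dict]:
    """Return direct replies, plus nested replies when recursive is True."""
    collected: List[dict] = []
    for reply in fetch_children(comment_id):
        collected.append(reply)
        if recursive:
            # Run the same request again with the reply as the new parent.
            collected.extend(collect_replies(reply["id"], fetch_children, recursive))
    return collected


# Hypothetical reply tree: comment 1 -> reply 2 -> reply 3.
TREE: Dict[str, List[dict]] = {"1": [{"id": "2"}], "2": [{"id": "3"}], "3": []}


def fetch(comment_id: str) -> List[dict]:
    return TREE.get(comment_id, [])


all_ids = [reply["id"] for reply in collect_replies("1", fetch)]
direct_ids = [reply["id"] for reply in collect_replies("1", fetch, recursive=False)]
```

With recursion disabled only the first level of replies is returned, which matches the description of retrieving only the direct replies from a comment.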
Recipe
Replace the following values in the recipe:
- pipeline with your Fusion pipeline
- collection with your Fusion collection
- id with the name of a Fusion datasource if you want to use a different name than the one provided
- serviceURL with your Confluence URL
- password with your Confluence password or API token
- user with your Confluence email address
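If you prefer to script these replacements, one possible approach is to walk the recipe JSON and set the named keys wherever they hold a string value. The sketch below is an assumption-heavy illustration: the sample recipe structure and all values are hypothetical and do not reflect the real layout of confluence.json. The `id` key is set at the top level only, because a blanket replacement could overwrite unrelated `id` keys nested deeper in a real recipe.

```python
import json
from typing import Any, Dict


def set_keys(node: Any, replacements: Dict[str, str]) -> Any:
    """Return a copy of node with matching string-valued keys replaced."""
    if isinstance(node, dict):
        updated = {}
        for key, value in node.items():
            if key in replacements and isinstance(value, str):
                updated[key] = replacements[key]
            else:
                updated[key] = set_keys(value, replacements)
        return updated
    if isinstance(node, list):
        return [set_keys(item, replacements) for item in node]
    return node


# Hypothetical recipe fragment; the real confluence.json layout may differ.
recipe = {
    "id": "confluence",
    "pipeline": "PIPELINE",
    "properties": {
        "collection": "COLLECTION",
        "serviceURL": "SERVICE_URL",
        "auth": {"user": "USER", "password": "PASSWORD"},
    },
}

updated = set_keys(
    recipe,
    {
        "pipeline": "my-pipeline",
        "collection": "my-collection",
        "serviceURL": "https://example.atlassian.net",
        "user": "me@example.com",
        "password": "API_TOKEN",
    },
)
updated["id"] = "my-confluence-datasource"  # top-level datasource name only

recipe_json = json.dumps(updated, indent=2)
```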