REST V2 connector recipes
Recipes are preconfigured datasource configurations that can be loaded into Fusion to speed up testing and connector setup. Each recipe already contains parameter values, so the required fields are populated for you. Recipes are written in JSON and contain all of the parameters required to get a connector up and running. To configure a connector from a recipe, send a POST request with the JSON as the body. You can make any minor adjustments directly in the JSON before sending it to Fusion, where the datasource then appears in the UI. After the recipe has been sent to Fusion, you can also make changes in the Fusion UI.
These pages list and describe a selection of datasources that have recipes in the public REST V2 connector GitHub repository.
You can view the details directly in the repository or read them on this site. Each page includes an overview of configuring the connector for a specific recipe, the example JSON recipe, and any additional information that can be helpful when using the REST connector.
This guide lists and describes some of the parameters you might see in recipes contained in the REST V2 public GitHub repository.
JSON guide
REST V2 recipes are JSON files sent as the body of cURL calls to Fusion, where their contents show up in the UI. Fusion then uses the APIs provided by the external software companies to crawl the content stored in those datasources. Because of the way the JSON is structured, the REST V2 connector can work with many different products. This article breaks down the parts of a recipe's JSON file and how each is used when creating a Fusion datasource. The structure and parameters vary from recipe to recipe, so use this guide as a reference for parameters that appear in some of them.
A parent request, also referred to as a root request, is set using the path for an endpoint. Child requests drill further down into an endpoint and can be looped through to add more information to the documents in the index. This is useful when the content being crawled has additional objects that are not picked up by the parent request alone, for example, comments on pages.
A cURL call that creates a REST V2 datasource includes headers for the content type and for authorization to Fusion. Calls for the REST V2 connector go through the Connector Datasources API.
Here is an example call. The headers in this call are used to define the content type as JSON and to enter your Lucidworks login details.
Replace:
- FUSION_HOST:FUSION_PORT with your Fusion address.
- APP_NAME with the name of the app you are using in Fusion.
- AUTHORIZATION_CREDENTIALS with your Lucidworks login information (username:password) encoded in Base64.
- JSON_RECIPE with the preconfigured recipe obtained from GitHub, making sure to update any placeholders found in that recipe.
curl --location --request POST 'https://FUSION_HOST:FUSION_PORT/api/apps/APP_NAME/connectors/datasources' \
--header 'Content-Type: application/json' \
--header 'Authorization: Basic AUTHORIZATION_CREDENTIALS' \
--data-raw '{JSON_RECIPE}'
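If you prefer to script the call instead of using cURL, the same request can be built with Python's standard library. This is a minimal sketch, not an official client: build_datasource_request is a hypothetical helper, and the host, app name, and credentials in the usage comment are placeholders. It only builds the request; sending it is left to urllib.request.urlopen.

```python
import base64
import json
import urllib.request


def build_datasource_request(host_port, app_name, username, password, recipe):
    """Build (but do not send) the POST that creates a datasource from a recipe."""
    # Basic auth: Base64-encode "username:password", which is what
    # AUTHORIZATION_CREDENTIALS stands for in the cURL example.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    url = f"https://{host_port}/api/apps/{app_name}/connectors/datasources"
    return urllib.request.Request(
        url,
        data=json.dumps(recipe).encode("utf-8"),  # the recipe is the request body
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


# Usage (hypothetical host, app, and credentials):
# req = build_datasource_request("fusion.example.com:6764", "my-app", "admin", "secret", recipe)
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```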
Calls to the REST V2 connector use the JSON-formatted recipe as the body definition. The JSON request for a REST V2 recipe is generally divided into sections that include general parameters, properties, and app settings.
After Fusion receives the recipe, the new datasource appears in the list of datasources under Indexing > Datasources.
General parameters
The JSON recipe contains connector information with the general parameters used to handle the service. This table contains a selection of parameters that you might see.
Parameter | Description |
---|---|
parserId | The parser ID is the name of the parser as set up in the Fusion UI under Indexing > Parsers. |
id | This populates the Configuration ID for the connector in the Fusion UI. You can use any name that does not already exist as another Configuration ID in your Fusion instance. |
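Because id must not collide with an existing Configuration ID, it can help to check it before sending the recipe. The sketch below is a hypothetical helper, not part of Fusion: it assumes you already have the list of existing Configuration IDs (for example, copied from Indexing > Datasources) and appends a numeric suffix when the requested id is taken.

```python
def unique_datasource_id(requested_id, existing_ids):
    """Return requested_id, or requested_id plus a numeric suffix if it is already taken."""
    taken = set(existing_ids)
    if requested_id not in taken:
        return requested_id
    # Try -2, -3, ... until a free Configuration ID is found.
    n = 2
    while f"{requested_id}-{n}" in taken:
        n += 1
    return f"{requested_id}-{n}"


# Usage, assuming id sits at the top level of the recipe:
# recipe["id"] = unique_datasource_id(recipe["id"], existing_ids)
```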
Properties
The properties section of the JSON file contains the bulk of the information being sent and includes the API base URL, authentication mode, service endpoints, HTTP method, query parameters, pagination settings, and any loop configurations.
API base URL
Parameter | Description |
---|---|
serviceURL | REST API base URL of the external service from which the data is extracted. Endpoints for the API call are added in the service endpoints section described below. The connector indexes all of the content it can reach through this URL, so add additional levels of security for any content you do not want indexed. |
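Each endpoint path is appended to serviceURL to form the URL that is actually requested. The sketch below shows that composition under the assumption that a single slash should separate the two parts; endpoint_url is a hypothetical helper, and the Atlassian-style URL in the comment is only an illustration.

```python
def endpoint_url(service_url, endpoint):
    """Append an endpoint path to the base serviceURL with exactly one slash between them."""
    return service_url.rstrip("/") + "/" + endpoint.lstrip("/")


# endpoint_url("https://example.atlassian.net/wiki/rest/api/", "/content")
# gives "https://example.atlassian.net/wiki/rest/api/content"
```

Plain string joining is used on purpose: urllib.parse.urljoin treats an endpoint that starts with "/" as an absolute path and would drop the base path ("/wiki/rest/api").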
Authentication mode
The authentication mode section of the JSON request body specifies how the connector authenticates with the external service's API, using your login information for that service. For example, if indexing content from Confluence, you would use this section to include your Confluence login details.
Parameter | Description |
---|---|
|
Uses the user and password properties for authentication. Some external APIs require an API key instead of a password. In that case, enter the username and use the API key as the password. |
|
Allows for fetching an authentication token used to authorize the request for the service endpoints crawl. See how to authenticate using OAuth. |
Service endpoints and list of requests configuration
The service endpoints section (serviceEndpoints), known as the list of requests configuration in some recipes (requestConfigurations), specifies the API endpoint paths appended to the base URL (serviceURL) to crawl a datasource. Query parameters do not work if added directly to the endpoint paths, so add any query parameters with the queryKey and queryValue fields: the query field is mapped to Solr through queryKey and populated with the results from queryValue.
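Because query parameters must not stay in the endpoint path, an endpoint copied from API documentation (such as /content?type=page) has to be split first. This sketch assumes that each query parameter maps to one entry with a queryKey and a queryValue; split_query_params is a hypothetical helper, and the exact shape of the entries in a given recipe may differ.

```python
from urllib.parse import parse_qsl, urlsplit


def split_query_params(endpoint):
    """Separate '/path?a=1&b=2' into a bare path and queryKey/queryValue entries."""
    parts = urlsplit(endpoint)
    queries = [
        {"queryKey": key, "queryValue": value}
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    # Only the path goes into the endpoint field; the pairs go into the query fields.
    return parts.path, queries
```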
Parent (root) requests
Parent requests target an API endpoint to crawl content. These endpoints are higher in the structure than child requests (described below), which are used to crawl objects embedded at a deeper level to add more content to a document being indexed.
Parameter | Description |
---|---|
|
The endpoint to append to the cURL location base URL path, for example |
|
HTTP method to use for the request, for example |
queryKey | The name of the field as it will appear in the Solr documents in the index. |
queryValue | The name of the field being queried in the datasource. Used in Solr documents to populate the value of the field entered in queryKey. |
Pagination
Pagination has two options: pagination by next page URL and pagination by batch size. For pagination by next page URL, each response includes the URL of the next page. For pagination by batch size, you configure pagination in the query parameters by setting the start index and the batch size.
Parameter | Description |
---|---|
|
Key that contains the |
|
Number of objects to retrieve per page, for example |
|
Index from where to start pagination. The default is |
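The two pagination options can be pictured as two loops. This is a sketch of the idea, not the connector's implementation: fetch stands in for an HTTP call, next_key is a hypothetical name for the response key that holds the next-page URL, and stopping when a batch comes back shorter than batch_size is an assumption about how the external API signals the last page.

```python
def paginate_by_next_url(fetch, first_url, next_key):
    """Follow next-page URLs until a response no longer contains one."""
    url = first_url
    while url:
        page = fetch(url)
        yield page
        url = page.get(next_key)  # None or missing ends the loop


def paginate_by_batch(fetch, start=0, batch_size=25):
    """Request fixed-size batches by start index until a short or empty batch arrives."""
    while True:
        items = fetch(start, batch_size)
        if items:
            yield items
        if len(items) < batch_size:
            return  # assumed last page
        start += batch_size
```

For example, with seven objects and batch_size=3, the batch loop requests start indexes 0, 3, and 6 and yields batches of 3, 3, and 1 objects.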
Root response mapping
Root response mapping is used to separate the parent objects being crawled into individual Solr documents by assigning each document a unique ID. The data obtained from child requests is added to the same document by association with this parent ID.
This is also where you can choose to index content other than text by enabling binaryResponse. For attachments, ensure Send as Binary Response is enabled; otherwise, no attachments are received and indexed. When it is enabled, the connector looks for attachments with a MIME type other than JSON to index. For a JSON response, ensure Send as Binary Response is not enabled.
Parameter | Description |
---|---|
|
Name of the field in the data objects extracted with |
|
The name of a specific data object from a datasource that is returned within a response. For example, in order to extract a list of elements named |
|
Set to |
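Root response mapping can be illustrated as splitting one parent response into documents. The sketch assumes the response is a JSON object whose dataPath key holds a list of objects, each carrying its unique ID in the dataId field; split_root_response is a hypothetical name. IDs are converted to strings because document IDs are typically stored as strings.

```python
def split_root_response(response, data_path, data_id):
    """Split one parent response into documents keyed by their unique ID."""
    # dataPath: the key whose value is the list of parent objects.
    objects = response[data_path]
    documents = {}
    for obj in objects:
        # dataId: the field that holds each object's unique ID.
        documents[str(obj[data_id])] = obj
    return documents


# With "dataPath": "results" and "dataId": "id":
# split_root_response({"results": [{"id": 1}, {"id": 2}]}, "results", "id")
# gives {"1": {"id": 1}, "2": {"id": 2}}
```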
Loops using child requests
Loops, also known as child requests, contain an array of queries to extract more information from a datasource for the documents being indexed. The loop will iterate over the data request for each parent ID and associate the response with the parent. This is useful in cases where the parent endpoint has additional endpoints that can be appended for data contained further down within the endpoint path. Loops perform a separate request for each data object.
The REST V2 connector supports hierarchical discovery, meaning when content is located, that content is recursively checked to see if it has additional information associated with it for the child request and will continue collecting information for each request until no more content is located. For example, if the connector is crawling for comments and attachments, it will check each of those items for any comments and attachments connected to them. If any are found it will check for comments and attachments associated with those, and continue until all relevant content is collected. This is also useful in cases where the connector is searching through folders with multiple levels of subfolders.
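Hierarchical discovery works like a traversal that keeps requesting the children of newly found items until nothing new turns up. This sketch uses an explicit stack instead of recursion. The result is the same, but the stack avoids Python's recursion limit on deep folder trees, and a seen set stops the traversal on cycles. fetch_children is a hypothetical stand-in for a child request, not a connector API.

```python
def collect_hierarchy(root_id, fetch_children):
    """Return (parent, child) pairs for every item reachable from root_id."""
    collected = []
    seen = {root_id}
    stack = [root_id]
    while stack:
        current = stack.pop()
        for child in fetch_children(current):
            if child not in seen:  # skip items already found (guards against cycles)
                seen.add(child)
                collected.append((current, child))
                stack.append(child)  # check this item for its own children later
    return collected
```

For a page with a comment that has a reply, and an attachment with no children, the traversal finds all three items, then stops once no new content appears.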
Parameter | Description |
---|---|
|
The API endpoint to append to the cURL location base URL path, for example |
|
HTTP method to use for the request. GET and POST are supported. |
|
Contains the array of queries to use within the request definition, each with a |
|
The name of the field as it will appear in the Solr documents in the index. |
|
The name of the field being queried in the datasource. Used in Solr documents to populate the value of the field entered in |
Child response mapping
The child responses are mapped to the parent through a parentIdKey, which links them to the dataId and dataPath in the root response mapping described earlier. The parentIdKey should match the dataId in the root response mapping.
For example, with root response mapping:
"rootResponseMapping" : {
"dataId" : "id",
"dataPath" : "results"}
The child response mapping would be:
"childResponseMapping" : {
"parentIdKey" : "id"}
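The matching rule and the per-parent loop can be sketched as follows. The mapping dictionaries reproduce the example above; attach_child_responses, fetch_children, and children_field are hypothetical names, and the sketch assumes each child request is keyed by the parent's ID and that its results are stored on the parent document.

```python
def mapping_is_consistent(root_mapping, child_mapping):
    """Check that parentIdKey matches dataId in the root response mapping."""
    return child_mapping.get("parentIdKey") == root_mapping.get("dataId")


def attach_child_responses(parents, parent_id_key, fetch_children, children_field):
    """Run one child request per parent and store the results on that parent."""
    for parent in parents:
        # The parent's ID, read through parentIdKey, selects the child data to request.
        parent[children_field] = fetch_children(parent[parent_id_key])
    return parents


# The mappings from the example above.
root_mapping = {"dataId": "id", "dataPath": "results"}
child_mapping = {"parentIdKey": "id"}
```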
Other mappings
Additional mapping parameters configure the data objects. Recipes do not necessarily include all of the parameters described here.
Parameter | Description |
---|---|
|
This creates the ID Key in the Data Object Mapping section of the UI as the Solr document ID. Fill this property when Destination Key is empty. If neither |
|
The key from the data object entry. This value is used to perform the additional requests. It is mapped to the variable |
|
The key to access the data objects in the response. If not set, the response is assumed to be the whole response body. |
|
The key used to store the additional data objects in the main data objects. If not set, the additional data objects will be indexed as individual Solr documents. |
App settings
The rest of the JSON can include settings for the collection name, pipeline name, and connector type.
Parameter | Description |
---|---|
collection | The name of the app used in the Fusion UI. This must match the name of the app, or the datasource will not appear in the list of datasources. |
|
The name for the Pipeline ID used. For example, |
|
The value for the connector in the Datasources API. For the REST V2 connector, this will be |
Advanced settings in the UI
In Fusion, open the datasource and turn on the Advanced toggle to display optional settings. Under Core Properties > Fetch Settings, you can adjust settings that control how fast the connector crawls the source. For example, increasing Fetch Threads might increase the crawl speed. Timeout limits can be useful to end a crawl that has stalled.
How to get a recipe into Fusion
This section shows how to get a recipe from GitHub into Fusion. Recipes are JSON files used as a quick method to create a Fusion datasource.
- Open the REST V2 connector public GitHub repository.
- Locate the recipe you want and open the file.
- Copy the JSON.
- Add the JSON as the body of a call to the Connector Datasources API. An example cURL call looks like this:
curl --location --request POST 'https://FUSION_HOST:FUSION_PORT/api/apps/APP_NAME/connectors/datasources' \
--header 'Content-Type: application/json' \
--header 'Authorization: Basic AUTHORIZATION_CREDENTIALS' \
--data-raw '{JSON_RECIPE}'
  In the above, replace:
  - FUSION_HOST:FUSION_PORT with the URL of your Fusion instance.
  - APP_NAME with the name of the app you are using in Fusion.
  - AUTHORIZATION_CREDENTIALS with your Lucidworks login information (username:password) encoded in Base64.
  - JSON_RECIPE with the recipe you copied from GitHub.
- Change the following in the JSON:
  - id: This populates the Configuration ID for the connector in the Fusion UI and sets the name of the datasource. You can keep the default or choose a different name to fit your needs.
  - serviceURL: API base URL from which the data is extracted. Change this to match your own datasource URL.
  - password and user: Change these values to your login information for the external service.
  - collection: This must match the name of the Fusion app that you are using.
  - Any other values marked for replacement, such as parserId or pipeline.
- Once the values are changed, send the API request.
- Log into Fusion in a web browser and open the app associated with the request.
- Go to Indexing > Datasources and select the REST V2 datasource whose name is the id from the recipe.
- Make any additional changes within the UI.
  - If indexing content other than text, for example images and attachments, select Send as Binary Response.
- Save the datasource.
- After it saves, click Run > Start to begin indexing.
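Before sending the request, you can fill in a recipe's placeholders and catch common mistakes programmatically. The sketch below is hypothetical throughout. The helper names are invented, placeholder tokens such as API_KEY depend on the recipe you copied, and because the position of collection in the JSON can vary, the check searches the whole recipe for it. The sketch only checks two rules from the steps above: collection must match the app name, and no placeholder may remain.

```python
import json


def fill_placeholders(node, replacements):
    """Replace placeholder tokens inside every string value of a parsed recipe."""
    if isinstance(node, dict):
        return {k: fill_placeholders(v, replacements) for k, v in node.items()}
    if isinstance(node, list):
        return [fill_placeholders(v, replacements) for v in node]
    if isinstance(node, str):
        for placeholder, value in replacements.items():
            node = node.replace(placeholder, value)
    return node


def find_values(node, key):
    """Yield every value stored under `key`, wherever it appears in the recipe."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            yield from find_values(v, key)
    elif isinstance(node, list):
        for v in node:
            yield from find_values(v, key)


def check_recipe(recipe, app_name, placeholders):
    """Return a list of problems to fix before POSTing the recipe."""
    problems = []
    if not recipe.get("id"):
        problems.append("missing id")
    for value in find_values(recipe, "collection"):
        if value != app_name:  # collection must match the Fusion app name
            problems.append(f"collection {value!r} does not match app {app_name!r}")
    text = json.dumps(recipe)
    for token in placeholders:
        if token in text:
            problems.append(f"placeholder still present: {token}")
    return problems
```

Working on the parsed JSON rather than the raw text keeps a replacement value that contains quotes from breaking the recipe's syntax.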