The Index Workbench combines key aspects of the data indexing configuration process into one user-friendly panel. It guides the user through the workflow for configuring datasources, parsers, and index pipelines.
To set up a new datasource, you have several options. You can paste in a website URL for Fusion to crawl, or you can navigate to a locally saved file through the File Finder dialog. This shortcut for the commonly used Web and Local Filesystem datasources removes steps from the process and helps newer users get started. Alternatively, the Datasource selection dropdown lets you select and navigate to any datasource type that Fusion can configure. Datasources that you have already configured and saved in the Collection can also be opened from this pane.
In Fusion 3.0, parsers have been introduced as their own configurable component of the indexing workflow. They allow greater flexibility and specificity when parsing inbound data.
A parser consists of an ordered list of parser stages that is completely customizable. The same parser stage can be added to a given configuration multiple times, each with different settings, if that best suits the data being parsed. There is no limit to the number of parser stages that a parser can include, and you can arrange them in any order. After all of the doctype-specific parser stages have run, the Tika and Fallback stages act as catch-all stages that attempt to parse anything that has not yet been matched. Tika parses many types of unstructured documents, such as PDF and DOCX files. If all of the other stages in the parser fail to completely parse the data, the Fallback stage can copy the raw bytes directly to Solr.
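The ordering behavior described above can be illustrated with a small Python sketch. This is not Fusion's API or configuration format; it is a conceptual model only. The stage functions, the `PARSER` list, and `run_parser` are hypothetical names, and the sketch assumes that the first stage able to handle the input wins, with a fallback stage that keeps the raw bytes.

```python
import json
from typing import Callable, List, Optional, Tuple

# A stage takes (filename, raw bytes) and returns parsed fields,
# or None when it cannot handle the input. (Hypothetical model, not Fusion's API.)
StageFn = Callable[[str, bytes], Optional[dict]]


def json_stage(filename: str, data: bytes) -> Optional[dict]:
    """Doctype-specific stage: handles only well-formed JSON objects."""
    if not filename.endswith(".json"):
        return None
    try:
        parsed = json.loads(data)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return None
    return parsed if isinstance(parsed, dict) else None


def text_stage(filename: str, data: bytes) -> Optional[dict]:
    """Doctype-specific stage: handles UTF-8 plain-text files."""
    if not filename.endswith(".txt"):
        return None
    try:
        return {"body": data.decode("utf-8")}
    except UnicodeDecodeError:
        return None


def fallback_stage(filename: str, data: bytes) -> Optional[dict]:
    """Catch-all stage: keeps the raw bytes when nothing else matched."""
    return {"raw_bytes": data}


# Order matters: specific stages first, the catch-all last.
PARSER: List[Tuple[str, StageFn]] = [
    ("json", json_stage),
    ("text", text_stage),
    ("fallback", fallback_stage),
]


def run_parser(stages: List[Tuple[str, StageFn]], filename: str, data: bytes):
    """Try each stage in order and return (stage name, result) for the first match."""
    for name, stage in stages:
        result = stage(filename, data)
        if result is not None:
            return name, result
    raise ValueError("no parser stage matched the input")
```

In this model, a malformed `.json` file falls through the JSON and text stages and is picked up by the fallback stage, which mirrors the catch-all role described for the Fallback stage.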
An Index Pipeline transforms incoming data into a document suitable for indexing by Solr via a series of modularized operations called stages. Fusion provides a variety of specialized index stages to index data effectively. Stages can be selected, configured, and enabled or disabled in the Index Pipeline section of the Index Workbench.
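As a rough illustration of the idea of a pipeline built from stages that can be enabled or disabled, the following Python sketch chains simple document transformations. It is a conceptual model under stated assumptions, not Fusion's implementation: `IndexStage`, `run_pipeline`, and the two example stage functions are hypothetical names invented for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Document = Dict[str, object]


@dataclass
class IndexStage:
    """One modular operation in the pipeline (hypothetical model, not Fusion's API)."""
    name: str
    transform: Callable[[Document], Document]
    enabled: bool = True


def lowercase_title(doc: Document) -> Document:
    """Example stage: normalize the title field to lowercase."""
    out = dict(doc)
    if isinstance(out.get("title"), str):
        out["title"] = out["title"].lower()
    return out


def add_body_length(doc: Document) -> Document:
    """Example stage: add a derived field with the length of the body text."""
    out = dict(doc)
    out["body_length"] = len(str(out.get("body", "")))
    return out


def run_pipeline(stages: List[IndexStage], doc: Document) -> Document:
    """Apply each enabled stage in order; disabled stages are skipped."""
    for stage in stages:
        if stage.enabled:
            doc = stage.transform(doc)
    return doc
```

For example, running a document through `[IndexStage("lowercase_title", lowercase_title), IndexStage("add_body_length", add_body_length, enabled=False)]` lowercases the title but leaves out the `body_length` field, because the second stage is disabled.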
Once you finish configuring a datasource using the Index Workbench, you can move on to setting up queries using the Query Workbench, which provides a similar workflow for configuring and previewing search results.