Apache Tika is a versatile parser that supports many unstructured document formats, including HTML, PDF, Microsoft Office documents, OpenOffice, RTF, audio, video, and images. A complete list of supported formats is available at http://tika.apache.org/. To extract text from images when Include images is enabled, Tesseract should be installed on the server that hosts Fusion.
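Before enabling Include images, it can help to confirm that Tesseract is reachable on the Fusion host. The following is a minimal sketch, assuming the Tesseract executable is named `tesseract` and is on the `PATH` of the account that runs Fusion. The helper name `tesseract_available` is hypothetical and is not part of Fusion or Tika.

```python
import shutil


def tesseract_available(binary: str = "tesseract") -> bool:
    """Return True if the named executable can be found on PATH.

    Assumption: Tesseract is installed under its default executable name
    and is visible to the user account that runs Fusion.
    """
    return shutil.which(binary) is not None


if __name__ == "__main__":
    if tesseract_available():
        print("Tesseract found on PATH.")
    else:
        print("Tesseract not found on PATH; image text extraction may not work.")
```

This only checks that the executable is discoverable. It does not verify the Tesseract version or installed language data.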
When entering configuration values in the UI, use unescaped characters, such as \t for the tab character. When sending configuration values through the API, use escaped characters, such as \\t for the tab character.
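The difference between the two forms follows from JSON string escaping, where a literal backslash is written as two backslashes. The sketch below illustrates this with Python's standard `json` module. The property name `delimiter` is a hypothetical placeholder, not a documented parser setting.

```python
import json

# What you would type in the UI: a backslash followed by the letter t.
ui_value = r"\t"

# Hypothetical property name, used only for illustration.
payload = json.dumps({"delimiter": ui_value})

# The serialized request body contains the escaped form: {"delimiter": "\\t"}
print(payload)

# Decoding the body yields the same two-character value entered in the UI.
assert json.loads(payload)["delimiter"] == ui_value
```

Building the request body with a JSON serializer, rather than by hand, applies this escaping automatically.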