Deprecation and removal notice
This parser is deprecated as of Fusion 5.8.0 and is expected to be removed in a later version. Use the asynchronous Tika parsing method instead. For more information, see Asynchronous Tika Parsing.
Apache Tika is a versatile parser that supports many unstructured document formats, such as HTML, PDF, Microsoft Office, OpenOffice, RTF, audio, video, images, and more. A complete list of supported formats is available at Apache Tika. To extract text from images when Include images is enabled, install Tesseract on the server hosting Fusion. This stage is not compatible with asynchronous Tika parsing.
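Before enabling Include images, it can help to confirm that Tesseract is actually reachable on the Fusion host. The following is a minimal sketch. It assumes Tika's default behavior of locating the `tesseract` executable on the system PATH of the Fusion process. If your Tika setup points to an explicit Tesseract path instead, this check does not apply. The function name `tesseract_available` is hypothetical and is not part of Fusion or Tika.

```python
import shutil


def tesseract_available() -> bool:
    """Return True if a `tesseract` executable is found on this process's PATH.

    Hypothetical helper: assumes Tika locates Tesseract via PATH (its default).
    """
    return shutil.which("tesseract") is not None


if __name__ == "__main__":
    location = shutil.which("tesseract")
    if location is None:
        print("tesseract not found on PATH; image text extraction will not run")
    else:
        print(f"tesseract found at {location}")
```

Run the script as the same operating-system user that runs Fusion, because that user's PATH is the one that matters.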
When entering configuration values in the UI, use unescaped characters, such as \t for the tab character. When entering configuration values in the API, use escaped characters, such as \\t for the tab character.
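The difference comes from JSON string escaping. In the UI, you type two characters, a backslash followed by `t`. In an API request body, that same backslash must itself be escaped inside the JSON string, so it appears as `\\t`. The sketch below uses Python's standard `json` module to show this. The property name `someDelimiterField` is hypothetical and stands in for whichever stage setting you are configuring.

```python
import json

# What a user types into the UI field: a backslash followed by "t" (2 characters).
ui_value = "\\t"  # Python literal for the two-character string \t

# Serializing that value into a JSON request body escapes the backslash.
# "someDelimiterField" is a hypothetical property name used only for illustration.
payload = json.dumps({"someDelimiterField": ui_value})
print(payload)  # {"someDelimiterField": "\\t"}

# Decoding the payload recovers exactly what the UI user typed.
decoded = json.loads(payload)["someDelimiterField"]

# By contrast, an unescaped "\t" inside a JSON string decodes to a real tab character.
raw_tab = json.loads('"\\t"')
```

If you write `"\t"` in an API payload instead of `"\\t"`, the JSON parser turns it into a single tab character before Fusion ever sees the backslash.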