
# Asynchronous Tika Parsing

export const LwTemplate = ({title = "Key questions to get you started", icon = "sparkles", cta = "Powered by Agent Studio", linkHref = "https://lucidworks.com/demo/?utm_source=docs&utm_medium=referral&utm_campaign=docs_cta_ai"}) => {
  const [isLoaded, setIsLoaded] = useState(false);
  useEffect(() => {
    const timer = setTimeout(() => {
      setIsLoaded(true);
    }, 500);
    return () => clearTimeout(timer);
  }, []);
  return <div className="lw-template-container">
      <Card title={title} icon={icon}>
        {isLoaded && <span dangerouslySetInnerHTML={{
    __html: `<lw-template id="a029c1a9-28be-427e-b0e1-5d918920246a"></lw-template
            >`
  }} />}
        <Link href={linkHref} className="agent-studio-link text-left text-gray-600 gap-2 dark:text-gray-400 text-sm font-medium flex flex-row items-center hover:text-primary dark:hover:text-primary-light group-hover:text-primary group-hover:dark:text-primary-light">Powered by Lucidworks Agent Studio</Link>
      </Card>
    </div>;
};

In synchronous Tika parsing, parsing and indexing are performed concurrently. This can result in slow indexing for a large number of documents, as the parser and indexer must share resources.

Asynchronous Tika parsing, on the other hand, performs parsing in the background. This allows Lucidworks Search to continue indexing documents while the parser is processing others, resulting in improved indexing performance for large numbers of documents.

<Tip>
  **FAQ**

  How does the configuration differ from synchronous Tika parsing?

  * In Fusion 5.9.10 and earlier, asynchronous parsing does not use a Fusion parser stage. Instead, the configuration is made in the datasource and index pipeline.
  * In Fusion 5.9.11 and later, asynchronous parsing uses the [Apache Tika Container parser stage](/docs/lucidworks-search/09-developer-documentation/config-specs/parsers/apache-tika-container) in addition to configuration in the datasource and index pipeline. The parser ID set in the connector datasource configuration is used, allowing you to use any parser stage configuration except deprecated Tika stages.
</Tip>

By default, the asynchronous parsing service is deployed as a single instance, or pod. Depending on the number and size of your documents, consider scaling the service up or down. You can also configure the following settings:

* **Cron expression:** The cron expression that controls when the task is executed.
* **Number of datasources:** The number of datasources to select per task.
* **Number of documents:** The number of documents to select per datasource per task.
* **Task execution timeout:** The maximum amount of time that a task is allowed to run before it is terminated.
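
As a rough illustration of how the two selection settings interact, the sketch below computes the upper bound on the number of documents one task run can pick up. The class name, field names, and values are hypothetical and are not actual configuration keys of the asynchronous parsing service. The cron string is only a placeholder, because this page does not specify which cron dialect the service expects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AsyncParsingTaskSettings:
    """Hypothetical container for the four tunable settings listed above."""

    cron_expression: str           # controls when the task is executed
    datasources_per_task: int      # datasources selected per task
    documents_per_datasource: int  # documents selected per datasource per task
    task_timeout_seconds: int      # maximum run time before the task is terminated

    def max_documents_per_task(self) -> int:
        # Upper bound: every selected datasource contributes its full quota.
        return self.datasources_per_task * self.documents_per_datasource


# Example values only; they are not defaults of the service.
settings = AsyncParsingTaskSettings(
    cron_expression="*/5 * * * *",
    datasources_per_task=5,
    documents_per_datasource=100,
    task_timeout_seconds=600,
)

print(settings.max_documents_per_task())
```

With these example values, one task run selects at most 5 × 100 = 500 documents, which is a useful number to keep in mind when choosing the task execution timeout.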

<Note>
  **Important**

  * In Fusion 5.9.10 and earlier, *your parser configuration is ignored* when using asynchronous parsing.\
    The asynchronous parsing service performs Tika parsing using Apache Tika Server. Other parsers, such as HTML and JSON, are not supported by the asynchronous parsing service.\
    Although your datasource is linked to a parser configuration, this link is ignored when asynchronous parsing is used.
  * In Fusion 5.9.11 and later, *your parser configuration is used* when using asynchronous parsing.\
    The asynchronous parsing service performs Tika parsing using [Apache Tika Container](/docs/lucidworks-search/09-developer-documentation/config-specs/parsers/apache-tika-container). Other parsers, such as HTML and JSON, are now supported by the asynchronous parsing service.\
    Your datasource is linked to a parser configuration. This link is used when asynchronous parsing is used.
</Note>

<LwTemplate />

## Requirements

Asynchronous parsing works for V2 connectors only, which use the Java SDK framework. It is enabled in the datasource configuration by toggling the **Advanced** view and turning on the **Async Parsing** option. Learn more about [V1 and V2 connectors](/docs/fusion-connectors/concepts/v1-v2-connectors).

Additionally, the index pipeline must include a [Solr Partial Update Indexer](/docs/lucidworks-search/09-developer-documentation/config-specs/index-pipeline-stages/solr-partial-update-indexer) stage. This stage replaces the Solr Indexer stage, which should be removed or turned off. The replacement is required because the connector plugin and the asynchronous parsing service each generate one document: the connector plugin during fetching and the parsing service during parsing. The two documents must be merged into a single document. The Solr Partial Update Indexer stage merges them, whereas the Solr Indexer stage overwrites existing documents.
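
The difference between overwriting and merging can be sketched with plain dictionaries. This is a simplified model, not the actual behavior of Solr or of either indexer stage, and the field names (`url_s`, `parser_body_t`) are hypothetical examples.

```python
def overwrite(index: dict, doc: dict) -> None:
    """Full-document indexing: the incoming document replaces the stored one."""
    index[doc["id"]] = dict(doc)


def partial_update(index: dict, doc: dict) -> None:
    """Partial update: incoming fields are merged into the stored document."""
    stored = index.setdefault(doc["id"], {"id": doc["id"]})
    stored.update(doc)


# One document from the fetching process, one from the parsing process.
fetched = {"id": "doc-1", "url_s": "https://example.com/a.pdf"}
parsed = {"id": "doc-1", "parser_body_t": "extracted text"}

overwritten: dict = {}
overwrite(overwritten, fetched)
overwrite(overwritten, parsed)   # the fetched fields are lost

merged: dict = {}
partial_update(merged, fetched)
partial_update(merged, parsed)   # both sets of fields survive

print(overwritten["doc-1"])
print(merged["doc-1"])
```

In this model, the overwritten index keeps only the fields of the last document written, while the merged index holds the fields from both the fetched and the parsed document.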

For more information on asynchronous parsing setup, see **Use Tika asynchronous parsing**.

<Accordion title="Use Tika asynchronous parsing">
  This document describes how to set up your application to use Tika asynchronous parsing.

  Unlike synchronous Tika parsing, which is configured only in a parser stage, asynchronous Tika parsing is configured in the datasource and index pipeline. In Lucidworks Search 5.9.11 and later, it also uses the Apache Tika Container parser stage. For more information, see [Asynchronous Tika Parsing](/docs/lucidworks-search/04-move-data-in/parsers/asynchronous-tika-parsing).

  <Check>
    **Field names change with asynchronous Tika parsing.**

    {/* // The code sample `\_lw_*` uses a backslash to escape the underscore character to prevent italics. */} In contrast to synchronous parsing, asynchronous Tika parsing prepends `parser_` to fields added to a document. System fields, which start with `\_lw_`, are not prepended with `parser_`. If you are migrating to asynchronous Tika parsing, and your search application configuration relies on specific field names, update your search application to use the new fields.
  </Check>
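
When migrating, it can help to list the field names your search application should expect. The following minimal sketch applies the renaming rule described above to names of parser-added fields. The function name and the sample field names are hypothetical; check the fields in your own indexed documents before updating your configuration.

```python
SYSTEM_PREFIX = "_lw_"    # system fields keep their names
ASYNC_PREFIX = "parser_"  # prefix added by asynchronous Tika parsing


def async_field_name(sync_name: str) -> str:
    """Return the field name expected after switching to asynchronous parsing."""
    if sync_name.startswith(SYSTEM_PREFIX):
        return sync_name
    return ASYNC_PREFIX + sync_name


# Hypothetical parser-added fields used by a search application.
for name in ["title", "content_type", "_lw_example_s"]:
    print(name, "->", async_field_name(name))
```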

  ## Configure the connectors datasource

  1. Navigate to your datasource.
  2. Enable the **Advanced** view.
  3. Enable the **Async Parsing** option.

       <img src="https://mintcdn.com/lucidworks/VKnUHJXP6sWH55ak/assets/images/5.8/tika-parser-migration-7.png?fit=max&auto=format&n=VKnUHJXP6sWH55ak&q=85&s=9cfa30dbec1b533642f531001c611859" alt="Enable async option" width="1965" height="1001" data-path="assets/images/5.8/tika-parser-migration-7.png" />

       <Check>
         **Lucidworks Search 5.9.11 and later uses your parser configuration when using asynchronous parsing.**

         The asynchronous parsing service performs Tika parsing using Apache Tika Server.

         * In Lucidworks Search 5.8 through 5.9.10, other parsers, such as HTML and JSON, are not supported by the asynchronous parsing service. When asynchronous parsing is enabled, the parser configuration linked to your datasource is ignored.
         * In Lucidworks Search 5.9.11 and later, other parsers, such as HTML and JSON, are supported by the asynchronous parsing service. When asynchronous parsing is enabled, the parser configuration linked to your datasource is used.
       </Check>
  4. Save the datasource configuration.

  ## Configure the parser stage

  <Check>You must do this step in Lucidworks Search 5.9.11 and later.</Check>

  1. Navigate to **Parsers**.
  2. Select the parser, or create a new parser.
  3. From the **Add a parser stage** menu, select **Apache Tika Container Parser**.
  4. (Optional) Enter a label for this stage. The label replaces the default stage name, Apache Tika Container Parser, with the value you enter in this field.
  5. If the Apache Tika Container Parser stage is not already the first stage, drag and drop the stage to the top of the stage list so it is the first stage that runs.

  ## Configure the index pipeline

  1. Go to the **Index Pipeline** screen.
  2. Add the **Solr Partial Update Indexer** stage.
  3. Turn off the **Reject Update if Solr Document is not Present** option and turn on the **Process All Pipeline Doc Fields** option:

       <img src="https://mintcdn.com/lucidworks/VKnUHJXP6sWH55ak/assets/images/5.8/tika-parser-migration-2.png?fit=max&auto=format&n=VKnUHJXP6sWH55ak&q=85&s=19da81f65d2eec57f0f7283e210eb487" alt="Tika config setup" width="1936" height="981" data-path="assets/images/5.8/tika-parser-migration-2.png" />
  4. Include an extra update field in the stage configuration using any update type and field name. In this example, an incremental field `docs_counter_i` with an increment value of `1` is added:

       <img src="https://mintcdn.com/lucidworks/VKnUHJXP6sWH55ak/assets/images/5.8/tika-parser-migration-5.png?fit=max&auto=format&n=VKnUHJXP6sWH55ak&q=85&s=2caeca79dd016fe540d1b7388c2f85f0" alt="Tika config setup" width="1936" height="988" data-path="assets/images/5.8/tika-parser-migration-5.png" />
  5. Enable the **Allow reserved fields** option:

       <img src="https://mintcdn.com/lucidworks/VKnUHJXP6sWH55ak/assets/images/5.8/tika-parser-migration-4.png?fit=max&auto=format&n=VKnUHJXP6sWH55ak&q=85&s=cd9d61870b1d603b5880894f67d3ed48" alt="Tika config setup" width="1941" height="979" data-path="assets/images/5.8/tika-parser-migration-4.png" />
  6. Click **Save**.
  7. Turn off or remove the **Solr Indexer stage**, and move the **Solr Partial Update Indexer stage** to be the last stage in the pipeline.

       <img src="https://mintcdn.com/lucidworks/VKnUHJXP6sWH55ak/assets/images/5.8/tika-parser-migration-6.png?fit=max&auto=format&n=VKnUHJXP6sWH55ak&q=85&s=d69738f76b005b608d1ac7b948a99675" alt="Tika config setup" width="1941" height="987" data-path="assets/images/5.8/tika-parser-migration-6.png" />

  Asynchronous Tika parsing setup is now complete. Run the datasource indexing job and monitor the results.
</Accordion>
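
The index pipeline settings from the procedure above can be summarized as a checklist. The sketch below is only an illustration: it uses the option labels as they appear in the UI, not the keys of the stage's stored configuration, and the function and its inputs are hypothetical.

```python
# Option labels from the Solr Partial Update Indexer stage and their required state.
EXPECTED_OPTIONS = {
    "Reject Update if Solr Document is not Present": False,
    "Process All Pipeline Doc Fields": True,
    "Allow reserved fields": True,
}


def check_index_pipeline(options: dict, update_fields: list, enabled_stages: list) -> list:
    """Return a list of problems; an empty list means the checklist passes."""
    problems = []
    for label, expected in EXPECTED_OPTIONS.items():
        if options.get(label) is not expected:
            problems.append(f"'{label}' should be {'on' if expected else 'off'}")
    if not update_fields:
        problems.append("add at least one update field")
    if "Solr Indexer" in enabled_stages:
        problems.append("turn off or remove the Solr Indexer stage")
    if not enabled_stages or enabled_stages[-1] != "Solr Partial Update Indexer":
        problems.append("make Solr Partial Update Indexer the last stage")
    return problems


problems = check_index_pipeline(
    options={
        "Reject Update if Solr Document is not Present": False,
        "Process All Pipeline Doc Fields": True,
        "Allow reserved fields": True,
    },
    # The extra update field from the example: increment docs_counter_i by 1.
    update_fields=[{"field": "docs_counter_i", "update_type": "increment", "value": 1}],
    enabled_stages=["Field Mapping", "Solr Partial Update Indexer"],
)
print(problems)
```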
