# Enable Tika Asynchronous Parsing

When you use the Tika Asynchronous Parser, the connector fetches and indexes a document that contains only metadata. It does not index the Tika-parsed metadata or the Tika-parsed body.

A separate Tika Asynchronous Parsing job then fetches the raw file content, parses it with Tika, and performs a partial update on the document to add both the Tika-parsed metadata and the Tika-parsed body.

<Note>
  See [Fusion 4.x V1 Connector Downloads](/docs/fusion-connectors/downloads/fusion-4-x-connector-downloads) to access the latest versions of the connectors.
</Note>

## Configure the index pipelines

Asynchronous parsing requires two index pipelines: a pre-parse pipeline and a post-parse pipeline.

## Pre-parse pipeline

The pre-parse pipeline fetches and indexes only metadata, because it does not have the Tika-parsed body or the Tika-parsed metadata. In this index pipeline, make sure that requests to Solr are buffered:

<img src="https://mintcdn.com/lucidworks/de_1M1m_4TTyJqw0/assets/images/4.2/tika-async-configure-solr-index-pipeline.png?fit=max&auto=format&n=de_1M1m_4TTyJqw0&q=85&s=3ef1dd9fd00c1a652ac38b7e4e2a1f5d" alt="Configure Solr index pipeline" width="927" height="606" data-path="assets/images/4.2/tika-async-configure-solr-index-pipeline.png" />

<Note>
  Buffering is enabled by default. If the option is not selected, select it to increase indexing speed.
</Note>

## Post-parse pipeline

The Tika Asynchronous Parsing job downloads and parses each document, then sends the parsed document to a second Fusion index pipeline. This pipeline adds the Tika body and metadata to the document created by the pre-parse pipeline.

This index pipeline **MUST** contain a Solr Partial Update Indexer stage with the following parameters selected:

<img src="https://mintcdn.com/lucidworks/de_1M1m_4TTyJqw0/assets/images/4.2/tika-async-post-parse-pipeline.png?fit=max&auto=format&n=de_1M1m_4TTyJqw0&q=85&s=80e79cbc97ca022e7c1aa9740b213bf1" alt="Tika Async post parse pipeline" width="901" height="741" data-path="assets/images/4.2/tika-async-post-parse-pipeline.png" />

The Solr Partial Update Indexer stage adds the Tika fields to the existing document that the connector indexed.

<Note>
  If you do not select the `Process All Pipeline Doc Fields` option, you will receive results similar to the following:

  `"body_t":"[{\"field\":\"I’m the content from JSON!\"}]"`
</Note>
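To make the partial update concrete, the sketch below builds a payload in Solr's general atomic-update syntax, where each field carries a `set` modifier so that only the listed fields change on the existing document. It is an illustration only: the exact request that the Solr Partial Update Indexer stage sends is not shown in this document, and the document ID, field names, and values here are hypothetical.

```python
import json


def build_partial_update(doc_id, tika_fields):
    """Build a Solr atomic-update document for an existing document ID."""
    update = {"id": doc_id}
    for name, value in tika_fields.items():
        # {"set": value} replaces this field and leaves other fields untouched.
        update[name] = {"set": value}
    return update


# Hypothetical ID and field names, for illustration only.
payload = [
    build_partial_update(
        "file:/data/report.pdf",
        {
            "body_t": "Parsed text extracted by Tika",
            "content_type_s": "application/pdf",
        },
    )
]
print(json.dumps(payload, indent=2))
```

Because the update names only the Tika fields, the metadata that the connector indexed in the pre-parse pipeline stays in place.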

## Configure the Asynchronous Parser

Next, create the new parser. You can leave all settings at their defaults.

<img src="https://mintcdn.com/lucidworks/de_1M1m_4TTyJqw0/assets/images/4.2/tika-async-apache-parser.png?fit=max&auto=format&n=de_1M1m_4TTyJqw0&q=85&s=21d1924719a0aa35496f69a3daf97ccd" alt="Apache Tika Async Parser" width="921" height="714" data-path="assets/images/4.2/tika-async-apache-parser.png" />

## Configure the datasource

Configure the datasource to use the asynchronous parser.

<img src="https://mintcdn.com/lucidworks/de_1M1m_4TTyJqw0/assets/images/4.2/tika-async-configure-datasource.png?fit=max&auto=format&n=de_1M1m_4TTyJqw0&q=85&s=430e62db3e6105ba4b9bde8ef2abaa60" alt="Configure Tika Async datasource" width="908" height="386" data-path="assets/images/4.2/tika-async-configure-datasource.png" />

When you run the datasource job, it displays the indexed documents. Each document carries special async parsing metadata fields:

* `_lw_async_parsing_id_i`
* `_lw_async_parsing_fail_count_i`

Connectors that are configured with an async parser do not download or parse documents during the connector job. The Tika Asynchronous Parsing job performs that work asynchronously.
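One way to inspect these fields is to query Solr for documents that carry them. The sketch below only builds the query parameters. It assumes, without confirmation from this document, that `_lw_async_parsing_fail_count_i` counts failed parse attempts; the threshold and the `rows` value are hypothetical. Send the resulting query string to your collection's Solr select handler.

```python
from urllib.parse import urlencode


def async_parsing_params(max_fail_count, rows=100):
    """Build Solr select parameters for documents with async parsing metadata."""
    return [
        ("q", "*:*"),
        # Keep only documents that have an async parsing ID.
        ("fq", "_lw_async_parsing_id_i:[* TO *]"),
        # Keep only documents whose fail count is at or below the threshold.
        # Documents without this field do not match a range query.
        ("fq", f"_lw_async_parsing_fail_count_i:[* TO {max_fail_count}]"),
        ("fl", "id,_lw_async_parsing_id_i,_lw_async_parsing_fail_count_i"),
        ("rows", str(rows)),
    ]


query_string = urlencode(async_parsing_params(max_fail_count=3))
print(query_string)
```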

## Configure the Tika Async Parsing job

1. Select **Jobs** under the Collections header.

   <img src="https://mintcdn.com/lucidworks/de_1M1m_4TTyJqw0/assets/images/4.2/tika-async-configure-parsing-job.png?fit=max&auto=format&n=de_1M1m_4TTyJqw0&q=85&s=5a275eb0ee915d28c061fd76c165e55d" alt="Locate jobs" width="347" height="280" data-path="assets/images/4.2/tika-async-configure-parsing-job.png" />
2. Click **Jobs**, then select **Add +**. Search for `Tika` and select **Tika Async Job**.

   <img src="https://mintcdn.com/lucidworks/de_1M1m_4TTyJqw0/assets/images/4.2/tika-async-add-job.png?fit=max&auto=format&n=de_1M1m_4TTyJqw0&q=85&s=24cb24e60e16f91dc3c96f9d620cb983" alt="Add Tika Async job" width="324" height="187" data-path="assets/images/4.2/tika-async-add-job.png" />
3. Configure the fields to suit your requirements.

   <img src="https://mintcdn.com/lucidworks/de_1M1m_4TTyJqw0/assets/images/4.2/tika-async-configure-job.png?fit=max&auto=format&n=de_1M1m_4TTyJqw0&q=85&s=bcf54c82da40fee50f2c712dc2fc6dec" alt="Tika Async configure job" width="869" height="758" data-path="assets/images/4.2/tika-async-configure-job.png" />

Field options for the Tika Async parsing job include:

| Field                      | Notes                                                                                                                          |
| -------------------------- | ------------------------------------------------------------------------------------------------------------------------------ |
| **Tika Endpoints**         | List of Tika server endpoints in the cluster. The endpoints are updated automatically, so you do **not** need to specify them. |
| **Pipeline ID**            | The ID of the partial-update (post-parse) index pipeline you created earlier.                                                  |
| **Find Filter Queries**    | Optional filter queries that limit the documents fetched by the Tika Async parser.                                             |
| **Finder batch size**      | The Solr `rows` batch size of the requests.                                                                                    |
| **Max bytes returned**     | The maximum number of characters indexed.                                                                                      |
| **Max embedded documents** | The number of embedded documents to parse before additional embedded documents are ignored. Use -1 for unlimited documents.    |
| **Parse timeout**          | The length of time allowed for parsing, in milliseconds.                                                                       |
| **Max Parse Failure**      | The maximum number of times to reattempt parsing a document that fails to parse.                                               |
| **Insert Batch Max Docs**  | The batch size, in number of documents, to accumulate before submitting to the index pipeline.                                 |
| **Insert Batch Max Bytes** | The batch size, in bytes contained in the documents, to accumulate before submitting to the index pipeline.                    |

4. Finally, configure a schedule for the job.

   <img src="https://mintcdn.com/lucidworks/de_1M1m_4TTyJqw0/assets/images/4.2/tika-async-job-schedule.png?fit=max&auto=format&n=de_1M1m_4TTyJqw0&q=85&s=8df721dbb91be1569170672c4dda11e0" alt="Configure a schedule for the Tika Async job to run" width="662" height="396" data-path="assets/images/4.2/tika-async-job-schedule.png" />
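The **Insert Batch Max Docs** and **Insert Batch Max Bytes** fields in the table above both limit how much is collected before a batch goes to the index pipeline. The sketch below shows one plausible way two such limits can interact, flushing a batch as soon as either limit would be exceeded. It is not the job's actual implementation: the function name, the use of UTF-8 length as the byte measure, and the sample limits are all assumptions.

```python
def batch_documents(docs, max_docs, max_bytes):
    """Yield lists of documents, flushing when either limit would be exceeded."""
    batch, batch_bytes = [], 0
    for doc in docs:
        size = len(doc.encode("utf-8"))
        # Flush first if adding this document would push the batch past a limit.
        if batch and (len(batch) >= max_docs or batch_bytes + size > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(doc)
        batch_bytes += size
    # Emit whatever remains after the last document.
    if batch:
        yield batch


# Hypothetical documents of 40, 40, 40, 10, and 10 bytes.
docs = ["a" * 40, "b" * 40, "c" * 40, "d" * 10, "e" * 10]
batches = list(batch_documents(docs, max_docs=3, max_bytes=100))
print([len(b) for b in batches])  # prints [2, 3]
```

In this run the byte limit closes the first batch after two documents, and the remaining three documents fit in the second batch. A single document larger than `max_bytes` is still sent, in a batch of its own.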

<Note>
  The job logs are written to `$FUSION_HOME/var/log/tika-server/tika-async.log`.
</Note>
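To check the most recent job activity, you can read the end of that log file. The helper below is a small convenience sketch, not part of Fusion: it assumes the `FUSION_HOME` environment variable is set on the machine where it runs, and the function names are hypothetical.

```python
import os
from collections import deque
from pathlib import Path


def tika_async_log_path(fusion_home=None):
    """Return the Tika async log path under the given or configured Fusion home."""
    home = fusion_home or os.environ.get("FUSION_HOME", "")
    return Path(home) / "var" / "log" / "tika-server" / "tika-async.log"


def tail(path, lines=20):
    """Return the last `lines` lines of a text file."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        # A bounded deque keeps only the most recent lines in memory.
        return list(deque(handle, maxlen=lines))


if __name__ == "__main__":
    log_path = tika_async_log_path()
    if log_path.exists():
        print("".join(tail(log_path)), end="")
    else:
        print(f"Log file not found: {log_path}")
```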
