Lucidworks

    Index Pipeline Stages

    • Document transformation
    • Document filtering and enrichment
    • Field transformation
    • Natural language processing
    • Indexing
    • Troubleshooting
    • Advanced

    Index pipeline stages create and modify PipelineDocument objects. Use the Index Workbench to configure the stages in a pipeline and preview the results.
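    As a rough illustration of the idea, the sketch below models a pipeline as an ordered list of stages, each of which receives a document and returns either a modified document or nothing (dropping it). This is not Fusion's API: the `Document` type, the three stage functions, and `run_pipeline` are all hypothetical names invented for this example, and a plain dictionary stands in for a PipelineDocument object.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for a pipeline document: a plain dict of field -> value.
Document = Dict[str, object]
# A stage returns the (possibly modified) document, or None to drop it.
Stage = Callable[[Document], Optional[Document]]


def rename_body_field(doc: Document) -> Optional[Document]:
    """Field transformation (illustrative): rename 'body' to 'body_t'."""
    doc = dict(doc)  # copy so the caller's document is not mutated
    if "body" in doc:
        doc["body_t"] = doc.pop("body")
    return doc


def drop_short_values(doc: Document) -> Optional[Document]:
    """Field transformation (illustrative): remove string values shorter than 3 characters."""
    return {
        name: value
        for name, value in doc.items()
        if not (isinstance(value, str) and len(value) < 3)
    }


def exclude_drafts(doc: Document) -> Optional[Document]:
    """Document filtering (illustrative): drop documents whose status is 'draft'."""
    return None if doc.get("status") == "draft" else doc


def run_pipeline(docs: List[Document], stages: List[Stage]) -> List[Document]:
    """Pass each document through the stages in order; skip documents a stage drops."""
    results: List[Document] = []
    for doc in docs:
        current: Optional[Document] = doc
        for stage in stages:
            current = stage(current)
            if current is None:
                break  # a stage dropped the document, so later stages never see it
        if current is not None:
            results.append(current)
    return results


docs = [
    {"id": "doc-1", "body": "hello world", "status": "published", "tag": "a"},
    {"id": "doc-2", "body": "x", "status": "draft"},
]
indexed = run_pipeline(docs, [rename_body_field, drop_short_values, exclude_drafts])
```

    Stage order matters in this sketch: a stage only sees what the earlier stages left behind, which is why previewing intermediate results is useful when arranging stages.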

    See these reference topics for details about each index pipeline stage:

    Document transformation

    • Apache Tika Parser
    • CSV Parsing
    • HTML Transformation
    • JSON Parsing
    • XML Transformation

    Document filtering and enrichment

    • Detect Language
    • Exclude Documents
    • Format Signals
    • Include Documents
    • JDBC Lookup
    • REST Query

    Field transformation

    • Date Parsing
    • Field Mapping
    • Filter Short Fields
    • Find and Replace
    • Regex Field Extraction
    • Regex Field Filter
    • Regex Field Replacement
    • Resolve Multivalued Fields
    • Solr Dynamic Field Name Mapping

    Natural language processing

    • Detect Sentences
    • Gazetteer Lookup Extraction
    • OpenNLP NER Extraction
    • Tag Part-of-Speech

    Indexing

    • Solr Indexer
    • Solr Partial Update Indexer

    Troubleshooting

    • Logging
    • Send PagerDuty Message
    • Send SMTP Email
    • Send Slack Message
    • Write Log Message

    Advanced

    • Call Pipeline
    • Exclusion Filter
    • JavaScript
    • Machine Learning
    • Set Property
    • Update Experiment

    © 2019 Lucidworks
