atomicUpdates
Send as Atomic Updates?
|
Send documents to Solr as atomic updates; only applies when sending directly to Solr, not through an index pipeline.
type: boolean
default value: 'false'
|
cacheAfterRead
Cache After Read
|
Cache input data in memory (and on disk as needed) after reading. The default is false; setting this to true may improve job stability by reading all data from the input source before transforming or writing to Solr, at the cost of a slower job due to the added intermediate write.
type: boolean
default value: 'false'
|
clearDatasource
Clear Existing Documents
|
If true, delete any documents indexed in Solr by previous runs of this job. Default is false.
type: boolean
default value: 'false'
|
continueAfterFailure
Continue after index failure
|
If true, when a document fails while being sent through an index pipeline, the job continues with the next document instead of failing.
type: boolean
default value: 'false'
|
defineFieldsUsingInputSchema
Define Fields in Solr?
|
If true, define fields in Solr using the input schema; if a SQL transform is defined, the fields to define are based on the transformed DataFrame schema instead of the input.
type: boolean
default value: 'true'
|
format
Format
required
|
Specifies the input data source format; common examples include parquet, json, and textinputformat.
type: string
|
id
Spark Job ID
required
|
The ID for this Spark job. Used in the API to reference this job. Allowed characters: a-z, A-Z, dash (-) and underscore (_). Maximum length: 63 characters.
type: string
maxLength: 63
pattern: [a-zA-Z][_\-a-zA-Z0-9]*[a-zA-Z0-9]?
|
mlModelId
Spark ML PipelineModel ID
|
The ID of the Spark ML PipelineModel stored in the Fusion blob store.
type: string
blobType: model:ml-model
reference: blob
|
optimizeOutput
Optimize
|
Optimize the Solr collection down to the specified number of segments after writing to Solr.
type: integer
|
outputCollection
Output Collection
|
Solr collection that receives the documents loaded from the input data source.
type: string
|
outputIndexPipeline
Send to Index Pipeline
|
Send the documents loaded from the input data source to an index pipeline instead of writing them directly to Solr.
type: string
|
outputParser
Send to Parser
|
Parser to send the documents to when using an index pipeline. Defaults to the same name as the index pipeline.
type: string
|
outputPartitions
Output Partitions
|
Number of partitions to split the input DataFrame into before writing to Solr or Fusion.
type: integer
|
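For illustration only, the sketch below shows how a partition count applies to a DataFrame via Spark's standard repartition call; the helper name and parameter are hypothetical and not part of this job's configuration.

    import org.apache.spark.sql.{Dataset, Row}

    // Sketch: repartition the input before it is written out, which is
    // what the Output Partitions setting controls. The helper name and
    // `numPartitions` value are hypothetical.
    def partitionForOutput(inputDF: Dataset[Row], numPartitions: Int): Dataset[Row] =
      inputDF.repartition(numPartitions)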
path
Path
|
Path to load; for data sources that support multiple paths, separate them with commas.
type: string
|
readOptions
Read Options
|
Options passed to the data source to configure the read operation; available options differ by data source, so refer to that data source's documentation for details.
type: array of object
object attributes: {
key
(required)
: {
display name: Parameter Name
type: string
}
value
: {
display name: Parameter Value
type: string
}
}
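As a rough sketch of how the Format, Path, and Read Options settings map onto Spark's standard DataFrameReader API (an assumption for illustration, not this job's actual implementation); the format, paths, and option values are placeholders:

    import org.apache.spark.sql.{DataFrame, SparkSession}

    // Sketch only: read the input data source using a format, paths, and
    // read options. All concrete values are illustrative placeholders.
    val spark: SparkSession = SparkSession.builder()
      .appName("bulk-loader-read-sketch")
      .getOrCreate()

    val readOptions = Map("mergeSchema" -> "true")   // a hypothetical Parquet read option

    val inputDF: DataFrame = spark.read
      .format("parquet")                             // the Format setting
      .options(readOptions)                          // the Read Options key/value pairs
      .load("/data/part1", "/data/part2")            // the Path setting (comma-separated paths)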
|
shellOptions
Spark Shell Options
|
Additional options to pass to the Spark shell when running this job.
type: array of object
object attributes: {
key
(required)
: {
display name: Parameter Name
type: string
}
value
: {
display name: Parameter Value
type: string
}
}
|
sparkConfig
Spark Settings
|
Spark configuration settings.
type: array of object
object attributes: {
key
(required)
: {
display name: Parameter Name
type: string
}
value
: {
display name: Parameter Value
type: string
}
}
|
streaming
Streaming
|
type: object
object attributes: {
enableStreaming
: {
display name: Enable Streaming
type: boolean
description : Stream data from input source to output Solr collection
}
outputMode
: {
display name: Output mode
type: string
default value: 'append'
description : Specifies the output mode for streaming: append (default), complete, or update.
enum: {
append
complete
update
}
}
}
|
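For context, the output modes listed above (append, complete, update) are the standard Spark Structured Streaming output modes. The sketch below shows them in plain Spark, with a hypothetical file source, sink, schema, and paths that are not specific to this job:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.streaming.OutputMode

    // Sketch of output modes in plain Spark Structured Streaming.
    // The JSON source, parquet sink, schema, and paths are placeholders.
    val spark = SparkSession.builder().appName("streaming-sketch").getOrCreate()

    val streamDF = spark.readStream
      .format("json")
      .schema("id STRING, title STRING")    // streaming file sources need an explicit schema
      .load("/tmp/incoming")

    val query = streamDF.writeStream
      .outputMode(OutputMode.Append())      // append (default); complete and update are the alternatives
      .format("parquet")
      .option("checkpointLocation", "/tmp/checkpoints")
      .option("path", "/tmp/out")
      .start()

    query.awaitTermination()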
templateParams
Interpreter Params
|
Key/value pairs to bind to the script interpreter.
type: array of object
object attributes: {
key
(required)
: {
display name: Parameter Name
type: string
}
value
: {
display name: Parameter Value
type: string
}
}
|
timestampFieldName
Timestamp Field Name
|
Name of the field that holds a timestamp for each document; only required if using timestamps to filter new rows from the input source.
type: string
|
transformScala
Transform Scala
|
Optional Scala script used to transform the results returned by the data source before indexing. You must define your transform script in a method with signature: def transform(inputDF: Dataset[Row]) : Dataset[Row]
type: string
|
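A minimal transform method matching the required signature might look like the sketch below; the column names are hypothetical placeholders:

    import org.apache.spark.sql.{Dataset, Row}
    import org.apache.spark.sql.functions._

    // Minimal sketch with the required signature. The `title` and
    // `title_lc` column names are hypothetical.
    def transform(inputDF: Dataset[Row]): Dataset[Row] = {
      inputDF
        .filter(col("title").isNotNull)               // drop rows without a title
        .withColumn("title_lc", lower(col("title")))  // add a lowercased copy for indexing
    }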
transformSql
Transform SQL
|
Optional SQL used to transform the results returned by the data source before indexing. The input DataFrame returned from the data source will be registered as a temp table named '_input'. The Scala transform is applied before the SQL transform if both are provided, which allows you to define custom UDFs in the Scala script for use in your transformation SQL.
type: string
|
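A sketch of the kind of statement this setting holds, run against the '_input' temp table described above; the column names and filter are hypothetical, and the spark.sql call is shown only to illustrate the equivalent effect in plain Spark:

    import org.apache.spark.sql.{DataFrame, SparkSession}

    // Sketch: SQL of the kind this setting holds, plus its equivalent
    // effect in plain Spark. Columns and the WHERE clause are placeholders.
    val spark = SparkSession.builder().appName("transform-sql-sketch").getOrCreate()

    val transformSql =
      """SELECT id,
        |       lower(title) AS title_lc
        |FROM _input
        |WHERE title IS NOT NULL""".stripMargin

    // Assuming the input DataFrame has been registered as a temp view named `_input`:
    val transformedDF: DataFrame = spark.sql(transformSql)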
type
Spark Job Type
required
|
type: string
default value: 'parallel-bulk-loader'
enum: {
parallel-bulk-loader
}
|
writeOptions
Write Options
|
Options used when writing output. For output formats other than solr or index-pipeline, the format and path options can be specified here.
type: array of object
object attributes: {
key
(required)
: {
display name: Parameter Name
type: string
}
value
: {
display name: Parameter Value
type: string
}
}
|
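As an illustration of write options for a non-solr output format, the sketch below maps such options onto Spark's standard DataFrameWriter API; the format, compression option, and path are hypothetical placeholders:

    import org.apache.spark.sql.{DataFrame, SaveMode}

    // Sketch: write options applied through Spark's DataFrameWriter.
    // The format, option, mode, and path are illustrative placeholders.
    def writeOutput(outputDF: DataFrame): Unit =
      outputDF.write
        .format("parquet")                   // a non-solr output format
        .option("compression", "snappy")     // an example write option
        .mode(SaveMode.Overwrite)
        .save("/tmp/bulk-loader-output")     // the path write option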