Job configuration specifications
id
The id field in the configuration file. Required field.

trainingCollection
The trainingCollection field in the configuration file. Required field.

outputCollection
The outputCollection field in the configuration file. Required field.

Input data format
Supported formats are solr, parquet, and orc. Required field.

fieldToVectorize
The fieldToVectorize field in the configuration file. Required field.

clusterIdField
The clusterIdField field in the configuration file. Required field.

freqTermField
The freqTermField field in the configuration file. Optional field.

clusterLabelField
The clusterLabelField field in the configuration file. Optional field.

maxDF
Values <1.0 indicate a percentage, 1.0 is 100 percent, and values >1.0 indicate an exact number. This is the maxDF field in the configuration file. Optional field.

minDF
Values <1.0 indicate a percentage, 1.0 is 100 percent, and values >1.0 indicate an exact number. This is the minDF field in the configuration file. Optional field.

numKeywordsPerLabel
The numKeywordsPerLabel field in the configuration file. Optional field.

analyzerConfig
The analyzerConfig field in the configuration file. Optional field.

sparkSQL
The default SELECT * from spark_input registers the input data as spark_input. This is the sparkSQL field in the configuration file.

dataOutputFormat
Supported output formats are solr and parquet. This is the dataOutputFormat field in the configuration file.

partitionCols
The partitionCols field in the configuration file.

readOptions
The parameter name:parameter value options to use when reading input from Solr or other sources. This is the readOptions field in the configuration file.
writeOptions
The parameter name:parameter value options to use when writing output to Solr or other sources. This is the writeOptions field in the configuration file.
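As an illustration, readOptions and writeOptions are given as parameter name:parameter value pairs. The sketch below assumes a simple key/value map shape, and the option names shown (splits_per_shard, commit_within) are hypothetical examples rather than documented defaults; check your platform's connector documentation for the exact schema and supported options.

```json
{
  "readOptions": { "splits_per_shard": "4" },
  "writeOptions": { "commit_within": "10000" }
}
```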
trainingDataFrameConfigOptions
The trainingDataFrameConfigOptions field in the configuration file.

trainingDataSamplingFraction
The trainingDataSamplingFraction field in the configuration file.

randomSeed
The randomSeed field in the configuration file.

sourceFields
The sourceFields field in the configuration file.

modelId
If no value is provided, the Spark Job ID is used. This is the modelId field in the configuration file.
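Putting the fields above together, a minimal job configuration might look like the sketch below. The collection names and field names are hypothetical, and only a subset of the optional fields is shown. Note the document-frequency semantics: maxDF here is a fraction (75 percent of documents), while minDF is an exact count (5 documents), since values below 1.0 indicate a percentage and values above 1.0 indicate an exact number.

```json
{
  "id": "cluster-labeling-job",
  "trainingCollection": "docs_clustered",
  "outputCollection": "docs_labeled",
  "fieldToVectorize": "body_t",
  "clusterIdField": "cluster_id",
  "freqTermField": "freq_terms",
  "clusterLabelField": "cluster_label",
  "maxDF": 0.75,
  "minDF": 5,
  "numKeywordsPerLabel": 5,
  "sparkSQL": "SELECT * from spark_input",
  "dataOutputFormat": "solr"
}
```

Because id, trainingCollection, outputCollection, the input data format, fieldToVectorize, and clusterIdField are required, a configuration omitting any of them should be rejected; the remaining fields fall back to their defaults when absent.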