Name | Type | Description |
---|---|---|
projectId | String | The project name |
name | String | The asset name |
assetType | DataAssetType | One of: project, table, relation, field, udf, metric
description | String | A string describing this asset |
sourceUri | String | A URI to the data source |
owner | String | The user who owns the asset
ownerEmail | String | The owner’s email address |
tags | Set<String> | A set of arbitrary category strings |
format | String | The format of the underlying data source |
options | List<String> | A list of options for the underlying data source. See Configuration options below for valid options. |
filters | List<String> | A list of Solr query parameters used to filter the request
sql | String | A SQL statement to execute |
cacheOnLoad | boolean | Set to true to cache the dataset in Spark when the catalog project is initialized
dependsOn | List<String> | A list of other assets to load before initializing this data asset |
createdOn | Date | The asset’s creation date, in ISO-8601 format; if omitted, the current timestamp is used
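A minimal asset definition might look like the sketch below. The project, collection, and field names are placeholders, and the key -> value form of the options entries is an assumption based on typical Catalog API examples; check the reference for your Fusion version for the exact syntax.

```json
{
  "projectId": "demo",
  "name": "products",
  "assetType": "table",
  "description": "Product documents stored in Solr",
  "format": "solr",
  "owner": "admin",
  "tags": ["demo", "solr"],
  "options": ["collection -> products", "query -> *:*"],
  "cacheOnLoad": true
}
```

The table below lists the valid configuration options for Solr data sources.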
Name | Description | Default |
---|---|---|
collection | The Solr collection name. | None |
zkhost | The ZooKeeper connect string, that is, the list of all servers and ports for the ZooKeeper cluster that manages the Solr collection. For example, if you are running a single-node Fusion developer deployment with embedded ZooKeeper, the connect string is fusion-host:9983/lwfusion/3.1.0/solr. If you have an external 3-node ZooKeeper cluster running on servers zk1.acme.com, zk2.acme.com, and zk3.acme.com, all listening on port 2181, then the connect string is zk1.acme.com:2181,zk2.acme.com:2181,zk3.acme.com:2181. | The connectString of the default search cluster.
query | A Solr query that limits the rows loaded into Spark, for example a query that matches only documents mentioning “solr”. | *:*
fields | A comma-delimited subset of fields to retrieve for each document in the results, such as id,title,author. If you request Solr function queries, then the library must use the /select Solr handler to make the request, because exporting function queries through /export is not supported by Solr. | By default, all stored fields for each document are pulled back from Solr.
rows | The number of rows to retrieve from Solr per request; do not confuse this with max_rows, which caps the total number of rows to read. | 1000
max_rows | The maximum number of rows to read; only applies when using the default /select request handler. | None
request_handler | Set the Solr request handler for queries. This option can be used to export results from Solr via the /export handler. | /select
splits | Enable shard splitting on the default split field, `_version_`. Example: splits=true | false
split_field | An alternative field to split on when shard splitting is enabled; see the splits option. Example: split_field=id | `_version_`
splits_per_shard | The number of evenly-sized splits to create per shard, using filter queries. You can also split on a string-based keyword field, but it should have sufficient variance in its values to allow for creating enough splits to be useful. In other words, if your Spark cluster can handle 10 splits per shard but there are only 3 unique values in a keyword field, then you will only get 3 splits. Keep in mind that this is only a hint to the split calculator, and you may end up with a slightly different number of splits than what was requested. Example: splits_per_shard=10 | 20
flatten_multivalued | Flatten multi-valued fields from Solr. Example: flatten_multivalued=false | true
dv | Fetch docValues fields that are indexed but not stored, by using function queries. Use this for Solr versions earlier than 5.5.0. Example: dv=true | false
sample_seed | Read a random sample of documents from Solr using the specified seed. This option can be useful if you just need to explore the data before performing operations on the full result set. If this option is provided, a 10% sample is read from Solr by default, but you can use the sample_pct option to change the sample size. | None
sample_pct | The size of the random sample of documents to read from Solr, as a fraction between 0 and 1. Example: sample_pct=0.05 | 0.1
skip_non_dv | Skip all fields that are not docValues. Example: skip_non_dv=true | false
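As a worked illustration, here is a sketch of a table asset that combines several of these options: a restricted query, a subset of fields, and shard splitting for more read parallelism. The collection, ZooKeeper hosts, and field names are hypothetical, and as above the key -> value option syntax should be verified against your Fusion version’s Catalog API reference.

```json
{
  "projectId": "demo",
  "name": "products_mentioning_solr",
  "assetType": "table",
  "description": "Products mentioning solr, split 10 ways per shard for Spark",
  "format": "solr",
  "options": [
    "collection -> products",
    "zkhost -> zk1.acme.com:2181,zk2.acme.com:2181,zk3.acme.com:2181",
    "query -> title_t:solr",
    "fields -> id,title_t,price_d",
    "rows -> 1000",
    "splits -> true",
    "splits_per_shard -> 10"
  ]
}
```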