Catalog API

The Fusion Catalog is a collection of one or more analytics projects, and each project is a collection of data assets, such as tables or relations. Fusion comes with a built-in project called "fusion".

The Fusion Catalog API gives data analysis applications that can run SQL or Solr queries access to these assets. It includes endpoints for finding, retrieving, and manipulating projects and assets using basic keyword and metadata-driven search.
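As a rough illustration of how these endpoints are addressed, the sketch below composes the three URL shapes that appear in the examples on this page: the catalog root, a project's assets, and a project's query endpoint. The helper name `catalog_url` and the host value are assumptions made for illustration. Other endpoints may exist but are not covered here.

```python
from urllib.parse import quote

# Base path taken from the curl examples on this page.
CATALOG_BASE = "/api/v1/catalog"


def catalog_url(host, project=None, resource=None):
    """Build a Catalog API URL (hypothetical helper, not part of Fusion).

    `resource` may be "assets" or "query", the two sub-paths used on this page.
    """
    url = f"http://{host}{CATALOG_BASE}"
    if project is not None:
        # Percent-encode the project name so it is safe as a path segment.
        url += "/" + quote(project, safe="")
        if resource is not None:
            if resource not in ("assets", "query"):
                raise ValueError(f"unexpected resource: {resource!r}")
            url += "/" + resource
    return url


print(catalog_url("localhost:8765"))
print(catalog_url("localhost:8765", "movielens", "assets"))
print(catalog_url("localhost:8765", "movielens", "query"))
```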

By default, non-admin Fusion users do not have access to Catalog objects. However, the Catalog API itself does not enforce any permissions, so a user who bypasses the auth proxy has full access to all projects and assets. An admin can grant permissions to Catalog endpoints for users; see Access Control.

For PUT and POST requests, these are valid JSON body attributes:

| Name | Type | Description |
|------|------|-------------|
| projectId | String | The project name |
| name | String | The asset name |
| assetType | DataAssetType | One of: project, table, relation, field, udf, metric |
| description | String | A string describing this asset |
| sourceUri | String | A URI to the data source |
| owner | String | The user that owns the asset |
| ownerEmail | String | The owner’s email address |
| tags | Set<String> | A set of arbitrary category strings |
| format | String | The format of the underlying data source |
| options | List<String> | A list of options for the underlying data source |
| filters | List<String> | A list of Solr query parameters to filter the request |
| sql | String | A SQL statement to execute |
| cacheOnLoad | boolean | true to cache the dataset in Spark on catalog project initialization |
| dependsOn | List<String> | A list of other assets to load before initializing this data asset |
| createdOn | Date | The asset’s creation date, in ISO-8601 format; if omitted, the current timestamp is used |
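The attributes listed above can be assembled into a request body programmatically. The following is a minimal sketch, assuming a hypothetical helper `build_asset_body`. The local checks only mirror the attribute names and asset types listed on this page and say nothing about how the server validates a request.

```python
import json

# Asset types listed in the attribute reference on this page.
ASSET_TYPES = {"project", "table", "relation", "field", "udf", "metric"}

# Attribute names listed in the attribute reference on this page.
KNOWN_ATTRIBUTES = {
    "projectId", "name", "assetType", "description", "sourceUri", "owner",
    "ownerEmail", "tags", "format", "options", "filters", "sql",
    "cacheOnLoad", "dependsOn", "createdOn",
}


def build_asset_body(**attrs):
    """Return a JSON string for a PUT/POST body after basic local checks."""
    unknown = set(attrs) - KNOWN_ATTRIBUTES
    if unknown:
        raise ValueError(f"unknown attributes: {sorted(unknown)}")
    if "assetType" in attrs and attrs["assetType"] not in ASSET_TYPES:
        raise ValueError(f"invalid assetType: {attrs['assetType']!r}")
    # tags is a set of strings; de-duplicate and sort for stable JSON output.
    if "tags" in attrs:
        attrs["tags"] = sorted(set(attrs["tags"]))
    return json.dumps(attrs, indent=2)


print(build_asset_body(
    name="movielens",
    assetType="project",
    description="tables and views for the movielens project",
    tags=["movies", "users"],
    cacheOnLoad=False,
))
```

The printed JSON can be passed to curl with `-d` or `--data-binary`.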

Examples

Define a "movielens" project:
FUSION=localhost:8765
curl -XPOST -H "Content-type:application/json" \
 -d '{
  "name": "movielens",
  "assetType": "project",
  "description": "tables and views for the movielens project",
  "tags": ["movies","users"],
  "cacheOnLoad": false
}' "http://$FUSION/api/v1/catalog"
Add a "ratings" table to the "movielens" project:
curl -XPOST -H "Content-type:application/json" -d '{
  "name": "ratings",
  "assetType": "table",
  "projectId": "movielens",
  "description": "movie ratings data",
  "tags": ["movies"],
  "format": "solr",
  "cacheOnLoad": true,
  "options": ["collection -> movielens_ratings", "fields -> user_id,movie_id,rating,rating_timestamp"]
}' "http://$FUSION/api/v1/catalog/movielens/assets"
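The `options` entries in these examples are plain strings of the form `key -> value`. A small sketch of generating them from a dictionary follows. The helper name `to_options` is an assumption for illustration, and the format is inferred only from the examples on this page.

```python
def to_options(mapping):
    """Convert a dict into the "key -> value" strings used in `options`."""
    return [f"{key} -> {value}" for key, value in mapping.items()]


options = to_options({
    "collection": "movielens_ratings",
    "fields": "user_id,movie_id,rating,rating_timestamp",
})
print(options)
```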
Issue a SQL statement against the "ratings" table:
curl -XPOST -H "Content-type:application/json" -d '{
  "sql": "SELECT user_id, movie_id, rating FROM ratings LIMIT 10"
}' "http://$FUSION/api/v1/catalog/movielens/query"
Issue a SQL query against the "movielens" project:
curl -XPOST -H "Content-Type:application/json" -d '{
"sql":"SELECT m.title as title, solr.aggCount as aggCount FROM movies m INNER JOIN (SELECT movie_id, COUNT(*) as aggCount FROM ratings WHERE rating >= 4 GROUP BY movie_id ORDER BY aggCount desc LIMIT 10) as solr ON solr.movie_id = m.movie_id ORDER BY aggCount DESC"
}' http://localhost:8765/api/v1/catalog/movielens/query
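The same kind of SQL request can be prepared from Python with the standard library. This sketch only builds the request object and does not send it. The function name `sql_query_request` is hypothetical, the SQL is the inner aggregation from the example above, and authentication is not shown because it depends on the deployment.

```python
import json
import urllib.request


def sql_query_request(host, project, sql):
    """Build (but do not send) a POST request for a project's query endpoint."""
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}/api/v1/catalog/{project}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = sql_query_request(
    "localhost:8765",
    "movielens",
    "SELECT movie_id, COUNT(*) AS aggCount FROM ratings "
    "WHERE rating >= 4 GROUP BY movie_id ORDER BY aggCount DESC LIMIT 10",
)
print(req.get_method(), req.full_url)

# To send it against a running Fusion instance (authentication not shown):
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```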
Load a catalog table from a Postgres database:
curl -XPOST -H "Content-type:application/json" -d '{
 "projectId": "nyc_taxi",
 "assetType": "table",
 "name": "trips",
 "sourceUri": "http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml",
 "owner": "Joe Example",
 "ownerEmail": "examplejoe@gmail.com",
 "description": "The NYC taxi trip data stored in Postgres using tools provided by https://github.com/toddwschneider/nyc-taxi-data",
 "tags": ["nyc", "taxi", "postgres", "trips"],
 "format": "jdbc",
 "cacheOnLoad": true,
 "options": ["url -> ${nyc_taxi_jdbc_url}","dbtable -> trips","partitionColumn -> id","numPartitions -> 4","lowerBound -> 0", "upperBound -> $MAX(id)", "fetchSize -> 1000"],
 "filters": ["pickup_latitude >= -90 AND pickup_latitude <= 90 AND pickup_longitude >= -180 AND pickup_longitude <= 180", "dropoff_latitude >= -90 AND dropoff_latitude <= 90 AND dropoff_longitude >= -180 AND dropoff_longitude <= 180"],
 "sql": "SELECT id,cab_type_id,vendor_id,pickup_datetime,dropoff_datetime,store_and_fwd_flag,rate_code_id,passenger_count,trip_distance,fare_amount,extra,mta_tax,tip_amount,tolls_amount,ehail_fee,improvement_surcharge,total_amount,payment_type,trip_type, concat_ws(',',pickup_latitude,pickup_longitude) as pickup, concat_ws(',',dropoff_latitude,dropoff_longitude) as dropoff FROM trips"
}' "http://$FUSION/api/v1/catalog/nyc_taxi/assets"
Create a data asset using a streaming expression:
curl -XPOST -H "Content-type:application/json" -d '{
  "name": "movie_ratings",
  "assetType": "table",
  "projectId": "movielens",
  "description": "movie ratings data",
  "tags": ["movies"],
  "format": "solr",
  "cacheOnLoad": true,
  "options": ["collection -> movielens_ratings", "expr -> hashJoin(search(movielens_ratings,q=\"*:*\",fl=\"movie_id,user_id,rating\",sort=\"movie_id asc\",qt=\"\/export\",partitionKeys=\"movie_id\"),hashed=search(movielens_movies,q=\"*:*\",fl=\"movie_id,title\",sort=\"movie_id asc\",qt=\"\/export\",partitionKeys=\"movie_id\")"]
}' "http://$FUSION/api/v1/catalog/movielens/assets"
Send a Solr query:
curl -XPOST -H "Content-Type:application/json" -d '{
  "solr":"*:*",
  "requestHandler":"/select",
  "collection":"movielens_movies",
  "params":{
    "facet":"on",
    "facet.field":"genre",
    "rows":0
  }
}' http://localhost:8765/api/v1/catalog/movielens/query
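A Solr query body like the one above has four parts: the query string, the request handler, the collection, and optional extra parameters. The sketch below assembles them. The helper name `solr_query_body` and its default handler are assumptions for illustration, and the field names are taken from the example above.

```python
import json


def solr_query_body(query, collection, request_handler="/select", params=None):
    """Return the JSON body for a Solr query sent to the query endpoint."""
    body = {
        "solr": query,
        "requestHandler": request_handler,
        "collection": collection,
    }
    if params:
        # Copy so the caller's dict is not shared with the body.
        body["params"] = dict(params)
    return json.dumps(body, indent=2)


facet_body = solr_query_body(
    "*:*",
    "movielens_movies",
    params={"facet": "on", "facet.field": "genre", "rows": 0},
)
print(facet_body)
```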
Send a Solr query using a streaming expression:
curl -XPOST -H "Content-Type:application/json" --data-binary @streaming_join.json http://localhost:8765/api/v1/catalog/movielens/query
Contents of streaming_join.json:
{
  "solr":"hashJoin(search(movielens_ratings, q=*:*, qt=\"/export\", fl=\"user_id,movie_id,rating\", sort=\"movie_id asc\", partitionKeys=\"movie_id\"), hashed=search(movielens_movies, q=*:*, fl=\"movie_id,title\", qt=\"/export\", sort=\"movie_id asc\",partitionKeys=\"movie_id\"),on=\"movie_id\")",
  "collection":"movielens_ratings",
  "requestHandler":"/stream"
}
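Escaping the quotes inside a streaming expression by hand is error-prone, so one option is to let a JSON serializer do it. The sketch below rebuilds the expression from the example above and prints a body that could be saved as streaming_join.json. The helper `search_clause` is hypothetical and exists only to avoid repeating the two `search(...)` calls.

```python
import json


def search_clause(collection, fields):
    """Return a search(...) clause sorted and partitioned on movie_id."""
    return (
        f'search({collection}, q=*:*, qt="/export", fl="{fields}", '
        'sort="movie_id asc", partitionKeys="movie_id")'
    )


expr = (
    "hashJoin("
    + search_clause("movielens_ratings", "user_id,movie_id,rating")
    + ", hashed="
    + search_clause("movielens_movies", "movie_id,title")
    + ', on="movie_id")'
)

streaming_body = json.dumps(
    {
        "solr": expr,
        "collection": "movielens_ratings",
        "requestHandler": "/stream",
    },
    indent=2,
)
# json.dumps escapes the embedded double quotes for us.
print(streaming_body)
```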