Correlation matrices can be computed using the corr_matrix function, which takes two parameters:
A string, enclosed in single quotes, containing a comma-separated list of numeric fields for which to calculate the matrix.
The sample size to compute the correlation matrix from.
select corr_matrix('petal_length_d, petal_width_d, sepal_length_d, sepal_width_d', 150) as corr, matrix_x, matrix_y from iris
The result set for the corr_matrix function contains one row for each two-field combination listed in the first parameter. The corr_matrix function returns the correlation for that combination (aliased as corr in the query above). Two additional fields, matrix_x and matrix_y, contain the field combination for the row.
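To make the shape of that result set concrete, the following Python sketch builds the same kind of long-format output by hand: one row per field pair, with a correlation value and the two field names. It is only an illustration, not how corr_matrix is implemented. It assumes Pearson correlation and assumes every ordered pair (including a field paired with itself) gets a row, neither of which is stated above. The helper names pearson and corr_matrix_rows are hypothetical, and the sample values are made up rather than taken from the iris data.

```python
from itertools import product
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def corr_matrix_rows(columns):
    """Return one dict per field pair, mimicking a long-format result set.

    Assumption: every ordered pair, including a field with itself, gets a row.
    """
    return [
        {"corr": pearson(columns[x], columns[y]), "matrix_x": x, "matrix_y": y}
        for x, y in product(columns, repeat=2)
    ]


# Made-up sample values for illustration only (not the iris data set).
sample = {
    "petal_length_d": [1.4, 4.7, 5.1, 1.3, 6.0],
    "petal_width_d": [0.2, 1.4, 1.9, 0.2, 2.5],
    "sepal_width_d": [3.5, 3.2, 2.7, 3.0, 3.3],
}

rows = corr_matrix_rows(sample)
for row in rows:
    print(f"{row['matrix_x']:>15} {row['matrix_y']:>15} {row['corr']: .3f}")
```

A long format like this is convenient for heat maps, because matrix_x and matrix_y can be mapped directly to the two axes and corr to the cell color.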
The example below shows the corr_matrix result visualized in Apache Zeppelin with a heat map.