
      Regression Diagnostics (regress)

      The regress function computes diagnostics for bivariate linear regression. It takes three parameters:

      1. The numeric field of the independent variable (x).

      2. The numeric field of the dependent variable (y).

      3. The sample size to use for the regression.

      Sample syntax

      select regress(petal_length_d, sepal_length_d, 150) as regress_sig,
             regress_rsquared,
             regress_r,
             regress_slope
      from iris

      Result set

      The result set for the regress function contains a single record that holds the selected regression diagnostics. The regress function itself returns the statistical significance of the regression analysis. The following regression diagnostics can also be selected:

      • regress_slope (slope)

      • regress_intercept (y-intercept)

      • regress_rsquared (R-squared)

      • regress_r (correlation coefficient)

      • regress_mse (mean squared error)

      • regress_sse (sum of squared errors)

      • regress_ssr (sum of squares due to regression)

      • regress_ssto (total sum of squares)
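
      To make the relationship between these diagnostics concrete, the following is a minimal Python sketch of an ordinary least-squares fit on paired values. It is not the implementation behind regress. The function name simple_regression is hypothetical, the dictionary keys reuse the names listed above only for readability, and the formulas are the usual textbook ones, including the assumption that the mean squared error is the sum of squared errors divided by n - 2. The significance value is left out because it requires a t or F distribution.

```python
import math
from statistics import fmean


def simple_regression(xs, ys):
    """Fit y = intercept + slope * x by least squares and return diagnostics.

    Hypothetical helper for illustration only. The keys mirror the names in
    the list above; the textbook definitions used here are an assumption.
    """
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least three paired observations")

    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ssto = sum((y - y_bar) ** 2 for y in ys)  # total sum of squares
    if sxx == 0 or ssto == 0:
        raise ValueError("x and y must each take more than one distinct value")

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Sum of squared residuals around the fitted line.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ssr = ssto - sse  # sum of squares explained by the regression
    r_squared = ssr / ssto
    # The correlation coefficient carries the sign of the slope.
    r = math.copysign(math.sqrt(max(r_squared, 0.0)), slope)

    return {
        "regress_slope": slope,
        "regress_intercept": intercept,
        "regress_rsquared": r_squared,
        "regress_r": r,
        "regress_mse": sse / (n - 2),  # assumed definition: SSE / (n - 2)
        "regress_sse": sse,
        "regress_ssr": ssr,
        "regress_ssto": ssto,
    }


# Example with five made-up (x, y) pairs.
diagnostics = simple_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
for name, value in diagnostics.items():
    print(f"{name}: {value:.4f}")
```

      In this sketch the total sum of squares always splits into the regression and error parts (ssto = ssr + sse), and R-squared is the share explained by the regression (ssr / ssto).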

      Sample regress result in Apache Zeppelin

      [Image: sample result]

      Visualization

      Sample visualization of the regress function using Apache Zeppelin’s Number visualization.

      [Image: sample visualization]