Applied Statistics Part 1 - Maths behind Simple Linear Regression

Shikha Saxena
4 min read · Mar 19, 2019

First, let’s understand where we can use regression. In machine learning, supervised learning means we have labelled data and we need a model that trains on the labelled training data and then predicts the response for unseen data. The model used for supervised learning can be a regression model.

Training/Observed Data - {(Y_i, x_{i,1}, …, x_{i,p})} for i = 1, …, n

Objective: find a function f, or an approximation of it (Generative vs. Predictive models), such that when we feed a new observation X into f, we can predict the output Y.

In the additive error model we know that Observed Data = True Value + Noise, so Y_i = f(x_{i,1}, …, x_{i,p}) + ϵ_i for i = 1, …, n, where the errors ϵ_i are iid with mean 0 and are independent of the X’s.
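To make the additive error model concrete, here is a minimal simulation sketch in Python; the choice of f, the noise level and the sample size are made-up assumptions for illustration only.

```python
import numpy as np

# Sketch: simulate data from the additive error model Y_i = f(x_i) + eps_i,
# with iid mean-0 noise. f, the noise sd and n are hypothetical choices.
rng = np.random.default_rng(0)

n = 100
x = rng.uniform(0, 10, size=n)                # predictor values

def f(x):                                     # assumed true (here linear) function
    return 2.0 + 0.5 * x

eps = rng.normal(loc=0.0, scale=1.5, size=n)  # iid noise with mean 0
y = f(x) + eps                                # observed = true value + noise
```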

Now, let’s start with understanding types of regression -

Simple Regression — When we have one predictor/explanatory/independent variable (X) and one response/target variable (Y).

Multiple Regression — When we have many predictor variables (X) and one target variable (Y).

Multivariate Regression — When we have many predictor variables (X) as well as many target variables (Y).

Simple linear regression equation: Y_i = β0 + β1·x_i + ϵ_i, for i = 1, …, n

Assumptions -

  1. Linear regression assumes that E(Y | x) = β0 + β1·x, which is linear in x and in the parameters β0 and β1.
  2. Linear regression also assumes that the errors ϵ_i in the data {(x_i, y_i)} are iid, with E(ϵ_i) = 0 and var(ϵ_i) = σ², and that the ϵ_i are independent of the x_i’s.

Our simple linear regression equation, used to predict Y with the estimated values of β0 and β1, is:

Equation 1: Ŷ = β̂0 + β̂1·x

In linear regression we have two kinds of error,

  1. Model error, which is the deviation of the observed target value from the unknown true regression line: ϵ_i = Y_i − E(Y_i).
  2. Residual error, which is the deviation of the observed value from the estimated regression line: e_i = Y_i − Ŷ_i. When we sum up the squared residual errors over all training points i, we get the Residual Sum of Squares (RSS), also called the Sum of Squared Errors (SSE).

RSS = SSE = Σ (Y_i − Ŷ_i)² = Σ e_i², summing over i = 1, …, n
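As a quick sketch, assuming the observed values y and the fitted values y_hat are NumPy arrays of equal length, the residuals and the RSS can be computed like this; the helper name rss is mine.

```python
import numpy as np

def rss(y, y_hat):
    """Residual Sum of Squares (= SSE): sum of squared residuals e_i = y_i - yhat_i."""
    residuals = y - y_hat
    return np.sum(residuals ** 2)
```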

Least Squares Estimation and Point Estimation

Now, to find Ŷ_i from the linear regression equation, we need the estimated values of β0 and β1. We choose β̂0 and β̂1 so that they minimize the RSS; these are called the (ordinary) least squares (OLS) estimators. We use the OLS estimators because, when the errors ϵ_i are iid N(0, σ²), they are also the Maximum Likelihood Estimates (MLE) of β0 and β1. Moreover, they are the Best Linear Unbiased Estimators (BLUE), where best means having minimum variance and unbiased means E(parameter estimate) = parameter.

OLS estimates of β1 and β0:

β̂1 = Σ (x_i − x̄)(Y_i − Ȳ) / Σ (x_i − x̄)², and β̂0 = Ȳ − β̂1·x̄, where x̄ and Ȳ are the sample means over i = 1, …, n.
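A small Python sketch of these closed-form OLS estimates; the function name ols_fit is mine, and x and y are assumed to be 1-D NumPy arrays.

```python
import numpy as np

def ols_fit(x, y):
    """Return (beta0_hat, beta1_hat), the OLS estimates that minimize the RSS."""
    x_bar, y_bar = x.mean(), y.mean()
    beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    beta0_hat = y_bar - beta1_hat * x_bar
    return beta0_hat, beta1_hat
```

The result can be cross-checked against np.polyfit(x, y, 1), which returns the same two coefficients in the order (β̂1, β̂0).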

The estimate of the variance σ² is the Mean Squared Error (MSE) = SSE / degrees of freedom. Here the degrees of freedom are n − 2, since 2 are already used up in estimating β0 and β1:

σ̂² = MSE = SSE / (n − 2) = Σ e_i² / (n − 2), summing over i = 1, …, n
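Continuing the sketch, and reusing the hypothetical ols_fit and rss helpers from above, the variance estimate looks like this.

```python
def sigma2_hat(x, y):
    """Estimate of the error variance: MSE = SSE / (n - 2)."""
    beta0_hat, beta1_hat = ols_fit(x, y)
    y_hat = beta0_hat + beta1_hat * x     # fitted values
    return rss(y, y_hat) / (len(y) - 2)   # n - 2 degrees of freedom
```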

The point estimate of Y at a new observation x_new is Ŷ = β̂0 + β̂1·x_new, using Equation 1.
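For example, with the same hypothetical helpers, a point prediction at a new x value is just a plug-in of the estimates.

```python
beta0_hat, beta1_hat = ols_fit(x, y)
x_new = 4.2                                 # hypothetical new observation
y_hat_new = beta0_hat + beta1_hat * x_new   # point estimate from Equation 1
```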

Inference on β1 and β0

When the errors are normal, the estimators β̂0 and β̂1 are normally distributed, and so is (β̂1 − β1)/σ(β̂1).

When we replace σ(β̂1) by its estimate s.e.(β̂1), we call (β̂1 − β1)/s.e.(β̂1) a studentized statistic.

Both studentized statistics, (β̂0 − β0)/s.e.(β̂0) and (β̂1 − β1)/s.e.(β̂1), follow a t-distribution with n − 2 degrees of freedom.
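As a rough sketch, again reusing the hypothetical helpers above, the studentized statistic for β1 can be computed as follows, using s.e.(β̂1) = sqrt(σ̂² / Σ (x_i − x̄)²) for simple linear regression.

```python
import numpy as np

def t_stat_beta1(x, y, beta1_null=0.0):
    """Studentized statistic (beta1_hat - beta1) / s.e.(beta1_hat); t with n - 2 df."""
    _, beta1_hat = ols_fit(x, y)
    se_beta1 = np.sqrt(sigma2_hat(x, y) / np.sum((x - x.mean()) ** 2))
    return (beta1_hat - beta1_null) / se_beta1
```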

Above are the initial basics of applied statistics. I will be sharing more concepts on similar topics in Part 2 (confidence intervals, tests, and prediction intervals).
