Introduction to Linear Regression: A Machine Learning Approach

October 7, 2014

Supervised Learning is a form of learning in which we use known data with actual outputs from past experience to model a relationship, and then use this model to predict future outcomes. The known data used to build the model is called 'training data'.

To build a supervised learning model we need:
  1. Training Data
  2. Hypothesis
  3. Cost Function
  4. Optimization method for Minimization
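
As a rough sketch of how these four pieces fit together in code (a minimal Python illustration of my own; none of these names or numbers come from a library or a real data set):

    import numpy as np

    # 1. Training data: m known (x, y) pairs (hypothetical sample values)
    x_train = np.array([1.0, 2.0, 3.0, 4.0])
    y_train = np.array([2.1, 3.9, 6.2, 8.1])

    # 2. Hypothesis: a parameterized function mapping inputs to predictions
    def hypothesis(theta, x):
        return theta[0] + theta[1] * x

    # 3. Cost function: mean squared error between predictions and targets
    def cost(theta, x, y):
        m = len(y)
        return np.sum((hypothesis(theta, x) - y) ** 2) / (2 * m)

    # 4. Optimization: adjust theta to minimize the cost, e.g. by gradient
    #    descent (a sketch of which appears later in this post)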


Linear Regression is a supervised learning algorithm used to predict continuous-valued outputs, for instance the prediction of prices. To get started, I'll first introduce the notation used.

\(\ x \) - input/feature/independent variables
\(\ X \) - matrix containing input/feature data set
\(\ y \) - output/target/dependent variable
\(\ (x^{(i)},y^{(i)}) \) - ith training example
\(\ m \) - number of training samples
\(\ \theta \) - vector containing parameters/weights
\(\ h(x) \) - Hypothesis function
\(\ J(\theta) \) - Cost function


Linear Regression with One Variable

This is the simplest form of Linear Regression, where we have only one feature variable. The general form of this model can be written as

\(\ h(x) = \theta_0+ \theta_1x \)

And the ith data point can be written as

\(\ y^{(i)} = \theta_0+ \theta_1x^{(i)}+ \epsilon^{(i)}\)
where \(\ \epsilon^{(i)}\) is the error term for the ith data point. (Note that the hypothesis \(\ h(x) \) itself does not include the error term; it is the observed output \(\ y^{(i)}\) that does.) Our goal is to find \(\ \theta_0 \) and \(\ \theta_1\) such that the total error is minimized.
The graphs below visualize a sample of training data and the line learned after running the learning algorithm to find \(\ \theta_0 \) and \(\ \theta_1\) corresponding to the minimum total cost or error.
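
To make this concrete, here is a minimal sketch of my own (with made-up numbers, assuming batch gradient descent as the optimization method) of fitting \(\ \theta_0 \) and \(\ \theta_1\) in Python:

    import numpy as np

    # Hypothetical training data: m = 5 samples, roughly y = 2x + 1
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([3.1, 4.9, 7.2, 8.8, 11.1])
    m = len(y)

    theta0, theta1 = 0.0, 0.0  # initial parameters
    alpha = 0.01               # learning rate (an assumed value)

    for _ in range(5000):
        error = (theta0 + theta1 * x) - y        # h(x) - y for every sample
        theta0 -= alpha * error.sum() / m        # both updates use the error
        theta1 -= alpha * (error * x).sum() / m  # from the old parameters

    print(theta0, theta1)  # converges near intercept ~1 and slope ~2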

Linear Regression with Multiple Variables

Linear regression with multiple variables is essentially no different from linear regression with one variable, other than the fact that there are two or more feature variables to deal with. But from here on we're going to change the representation of the hypothesis in a way that will be important throughout the study of machine learning: the use of vectorization.

Typical form: \(\ h(x) = \theta_0+ \theta_1x_1+ \theta_2x_2+ \cdots + \theta_nx_n \)

Vectorized form: \(\ h(x) = \theta^Tx \)

Here \(\ x \) and \(\ \theta \) are represented as vectors or column matrices. (For convenience, \(\ x_0 \), which is equal to 1, is added. Note that lowercase \(\ x \) denotes a single example's feature vector, reserving \(\ X \) for the matrix containing the whole data set.)
\(\ x=\begin{bmatrix}x_0 \\x_1 \\x_2 \\ \vdots \\x_n \end{bmatrix} \) and \(\ \theta = \begin{bmatrix}\theta_0 \\\theta_1 \\\theta_2 \\ \vdots \\\theta_n \end{bmatrix} \)
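
As a quick sketch in NumPy (my own illustration with made-up numbers), the vectorized hypothesis for a single example reduces to a dot product once \(\ x_0 = 1 \) is prepended:

    import numpy as np

    # Hypothetical example with n = 2 features (e.g. size and rooms)
    x = np.array([1.0, 2104.0, 3.0])     # x_0 = 1 prepended to the features
    theta = np.array([50.0, 0.1, 20.0])  # theta_0, theta_1, theta_2

    h = theta @ x  # theta^T x: a single predicted value
    print(h)       # 50 + 0.1*2104 + 20*3 = 320.4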

Same as before, our goal is to find \(\ \theta \), or in this case the \(\ \theta \) vector. In terms of visualization, given the multi-dimensional nature of the data, we obviously can't plot graphs that include all of the feature variables except in cases where the dimension is \(\ \leq 3 \). But as an alternative we can plot the feature variables against each other and examine the correlation, i.e. the linear relationship between variables, to understand the data set, as sketched below.
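For instance, a pairwise correlation check might look like this (a sketch of my own with made-up housing-style numbers; np.corrcoef computes the Pearson correlation matrix):

    import numpy as np

    # Hypothetical data set: two features and the target
    size  = np.array([2104, 1600, 2400, 1416, 3000])
    rooms = np.array([3, 3, 4, 2, 4])
    price = np.array([400, 330, 369, 232, 540])

    # Rows are variables; entries near +/-1 indicate a strong linear
    # relationship between the corresponding pair of variables
    print(np.corrcoef([size, rooms, price]))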

NEXT: Polynomial Regression

