This tutorial covers the basic concepts of linear regression. I will explain the process of creating a model, right from the hypothesis function to the gradient descent algorithm. We will also use plots to better visualize the inner workings of the model. At the end, we will test our model using single-variable training data.
Linear regression is one of the most basic machine learning models; it's like the 'hello world' program of machine learning. Linear regression is used when there is a linear relationship between the input variables and the output variable. That means we can calculate the output variable as a linear combination of the input variables. For example, house prices are directly proportional to house size, number of bedrooms, location, etc. There is a linear relationship between house prices and the factors affecting them.
If there is only one input variable, we call it 'Single Variable Linear Regression' or 'Univariate Linear Regression'. In case of more than one input variable, we call it 'Multi Variable Linear Regression' or 'Multivariate Linear Regression'. In this tutorial we will work on univariate linear regression only. Linear regression is a supervised learning algorithm and is mainly used to predict real-valued outputs like house prices.
Every machine learning model actually generalizes the relationship between the input variables and the output variables. In the case of linear regression, since the relationship is linear, this generalization can be represented by a simple line function. Consider the example below, where input values are plotted on the X axis and output values on the Y axis.
Since there are only a few data points, we can easily eyeball it and draw the best fit line, which generalizes the relationship between the input and output variables for us.
Because this line generalizes that relationship, to make a prediction for any given input value we can simply locate the corresponding point on the line; its Y coordinate gives us the predicted value.
So our objective is to find the best fit line that will generalize the given training data for future predictions.
The linear model's hypothesis function is nothing but the line function.
The equation of a line is
y = mx + b
where m is the slope and b is the y-intercept.
We are just going to use different notation to write it. I am using the same notation and example data used in Andrew Ng's Machine Learning course:
h(θ, x) = θ_0 + θ_1 * x_1
where θ_0 is the intercept term (like b), θ_1 is the slope (like m), and x_1 is the input variable.
We already have the input variables (x) with us; if we can find θ_1 (i.e. m) and θ_0 (i.e. b), then we will get the best fit line.
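To make the hypothesis concrete, here is a minimal sketch in Python; the θ values used below are made up purely for illustration:

```python
# Hypothesis function h(θ, x) = θ_0 + θ_1 * x_1 for univariate linear regression.
def hypothesis(theta_0, theta_1, x_1):
    """Return the predicted output for input x_1 given the line parameters."""
    return theta_0 + theta_1 * x_1

# With θ_0 = 1 and θ_1 = 2 the hypothesis is the line y = 1 + 2x.
print(hypothesis(1.0, 2.0, 3.0))  # → 7.0
```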
Since we have our hypothesis function, we are now one step closer to our objective of finding the best fit line. But how do we find the optimum values of the theta parameters?
Clearly, to find the optimum values of the theta parameters we have to try multiple values and then choose the best possible ones based on how well the resulting line fits the data. To do this we will create a cost function (J). The inner working of the cost function is as follows: for every training example we take the difference between the predicted value and the target value, and then combine these errors into a single number.
Instead of a plain subtraction (predicted value - target value), we will use the more sophisticated form below, also called the 'Squared Error Function':

J(θ_0, θ_1) = (1 / (2m)) * Σ_{i=1..m} (h(x^(i)) - y^(i))^2

where m is the number of training examples.
The cost function is a function of the theta parameters. For simplicity, if we plot the cost against values of θ_1 (keeping θ_0 fixed), we get a convex function.
At the bottom of the curve we get the minimum value of the cost for the corresponding value of θ_1. This process of trying different theta values to reach the minimum cost is called 'Minimizing the Cost'.
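We can see this convexity numerically with a tiny sketch; the toy data below is assumed, and θ_0 is fixed at 0 for simplicity:

```python
import numpy as np

# Toy data lying exactly on the line y = 2x, so the cost is minimized at θ_1 = 2.
x = np.array([1.0, 2.0, 3.0])
y = 2.0 * x
m = len(y)

def cost(theta_1):
    """Squared error cost J(θ_1), with θ_0 fixed at 0."""
    errors = theta_1 * x - y
    return np.sum(errors ** 2) / (2 * m)

# The cost falls to 0 at θ_1 = 2 and rises symmetrically on both sides (convex).
for t in [0.0, 1.0, 2.0, 3.0, 4.0]:
    print(t, cost(t))
```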
Now we are one more step closer to our objective. Only the last piece of the puzzle remains: which theta values should we try, and how should we change them?
This is the heart of our model; the gradient descent algorithm will help us find the optimum values of the theta parameters. The inner working of gradient descent is as follows.
In order to change the value of theta, it's important to know whether to increase or decrease it, and by what margin. Remember that our cost function is convex and our objective is to reach its bottom. The partial derivative of the cost function gives us the slope at the current point.
Consider the examples of positive and negative slopes below.
In the case of a positive slope, we have to decrease the value of θ_1 to reach the minimum cost.
In the case of a negative slope, we have to increase the value of θ_1 to reach the minimum cost.
So with the help of the slope we can decide whether to increase or decrease the theta value, and to control the magnitude of the change we use the learning rate alpha (α).
So the final formulas to update the theta values are as below:

θ_0 := θ_0 - α * ∂J/∂θ_0
θ_1 := θ_1 - α * ∂J/∂θ_1

After substituting the partial derivatives of the cost function, the update formulas look like:

θ_0 := θ_0 - (α / m) * Σ_{i=1..m} (h(x^(i)) - y^(i))
θ_1 := θ_1 - (α / m) * Σ_{i=1..m} (h(x^(i)) - y^(i)) * x^(i)

Both parameters are updated simultaneously, using the predictions from the previous step.
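The update formulas above can be sketched for a single step on toy data; the data and the value of α below are assumed purely for illustration:

```python
import numpy as np

# One batch gradient descent step on toy data from the line y = 2x, starting at θ = (0, 0).
x = np.array([1.0, 2.0, 3.0])
y = 2.0 * x
m = len(y)
theta_0, theta_1 = 0.0, 0.0
alpha = 0.1  # learning rate, picked arbitrarily for this sketch

predictions = theta_0 + theta_1 * x   # h(x) for every training example
errors = predictions - y              # the (h(x^(i)) - y^(i)) terms
theta_0 = theta_0 - (alpha / m) * np.sum(errors)       # update for the intercept
theta_1 = theta_1 - (alpha / m) * np.sum(errors * x)   # update for the slope
print(theta_0, theta_1)  # both move from 0 toward the true line y = 2x
```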
Since at every step of gradient descent we calculate the cost using all of the training examples, this is also called the 'Batch Gradient Descent' algorithm.
Enough theory; now let's implement the gradient descent algorithm using Python and create our linear model. We will use the NumPy, Pandas and Matplotlib libraries.
In case you don't have any experience with these libraries, don't worry; I will explain every bit of code for better understanding.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('https://raw.githubusercontent.com/satishgunjal/datasets/master/univariate_profits_and_populations_from_the_cities.csv')
df.head() # To get first n rows from the dataset default value of n is 5
   population   profit
0      6.1101  17.5920
1      5.5277   9.1302
2      8.5186  13.6620
3      7.0032  11.8540
4      5.8598   6.8233
X = df.values[:, 0] # get input values from first column
y = df.values[:, 1] # get output values from second column
m = len(y) # Number of training examples
print('X = ', X[: 5]) # Show only first 5 records
print('y = ', y[: 5])
print('m = ', m)
X = [6.1101 5.5277 8.5186 7.0032 5.8598]
y = [17.592 9.1302 13.662 11.854 6.8233]
m = 97
plt.scatter(X,y, color='red',marker= '+')
plt.grid()
plt.rcParams["figure.figsize"] = (10,6)
plt.xlabel('Population of City in 10,000s')
plt.ylabel('Profit in $10,000s')
plt.title('Scatter plot of training data')
Let's create the X (feature) matrix and the θ matrix using the available values.
The dimension of the θ matrix is (2 x 1): one row for θ_0 and one for θ_1.
If we want to compute Xθ, the number of columns of matrix X must match the number of rows of matrix θ. So let's add a column of ones to matrix X to accommodate the θ_0 intercept term.
Since the dimension of the X matrix is now (97 x 2), we can perform the multiplication.
The product Xθ will be a vector (1D array) of 97 predictions, one per training example.
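As a quick shape check before building the real matrices, here is a tiny sketch with two made-up rows and made-up θ values:

```python
import numpy as np

# Two example rows of the feature matrix: the first column is the added column of ones.
X_demo = np.array([[1.0, 6.1],
                   [1.0, 5.5]])      # shape (2, 2): 2 examples, 2 features
theta_demo = np.array([0.5, 2.0])    # made-up θ_0 and θ_1 values for illustration

# X_demo has 2 columns and theta_demo has 2 rows, so the product Xθ is defined.
print(X_demo.dot(theta_demo))  # → [12.7 11.5], one prediction per row
```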
#Lets create a matrix with single column of ones
X_0 = np.ones((m, 1))
X_0[:5]
array([[1.],
[1.],
[1.],
[1.],
[1.]])
# Using reshape function convert X 1D array to 2D array of dimension 97x1
X_1 = X.reshape(m, 1)
X_1[:5]
array([[6.1101],
[5.5277],
[8.5186],
[7.0032],
[5.8598]])
# Lets use hstack() function from numpy to stack X_0 and X_1 horizontally (i.e. column wise) to make a single 2D array.
# This will be our final X matrix (feature matrix)
X = np.hstack((X_0, X_1))
X[:5]
array([[1. , 6.1101],
[1. , 5.5277],
[1. , 8.5186],
[1. , 7.0032],
[1. , 5.8598]])
Remember that before starting gradient descent we need to initialize the theta parameters; they can be initialized with random values, but here we will simply initialize them with zeros.
theta = np.zeros(2)
theta
array([0., 0.])
def compute_cost(X, y, theta):
    """
    Compute the cost for linear regression.

    Input Parameters
    ----------------
    X : 2D array where each row represents a training example and each column a feature. Dimension (m x n)
        m = number of training examples
        n = number of features (including the X_0 column of ones)
    y : 1D array of labels/target values, one per training example. Dimension (m, )
    theta : 1D array of fitting parameters or weights. Dimension (n, )

    Output Parameters
    -----------------
    J : Scalar value. The cost for the given theta.
    """
    m = len(y)                            # number of training examples
    predictions = X.dot(theta)            # h(θ, x) for every example
    errors = np.subtract(predictions, y)  # predicted value - target value
    sqrErrors = np.square(errors)
    J = 1 / (2 * m) * np.sum(sqrErrors)
    return J
# Let's compute the cost for the initial theta values (all zeros)
cost = compute_cost(X, y, theta)
print('The cost for given values of theta_0 and theta_1 =', cost)
The cost for given values of theta_0 and theta_1 = 32.072733877455676
def gradient_descent(X, y, theta, alpha, iterations):
    """
    Find the optimum theta values for linear regression using batch gradient descent.

    Input Parameters
    ----------------
    X : 2D array where each row represents a training example and each column a feature. Dimension (m x n)
        m = number of training examples
        n = number of features (including the X_0 column of ones)
    y : 1D array of labels/target values, one per training example. Dimension (m, )
    theta : 1D array of initial fitting parameters or weights. Dimension (n, )
    alpha : Learning rate. Scalar value
    iterations : Number of iterations. Scalar value

    Output Parameters
    -----------------
    theta : Final values. 1D array of fitting parameters or weights. Dimension (n, )
    cost_history : Contains the value of the cost for each iteration. 1D array. Dimension (iterations, )
    """
    m = len(y)  # number of training examples
    cost_history = np.zeros(iterations)
    for i in range(iterations):
        predictions = X.dot(theta)            # h(θ, x) for every example
        errors = np.subtract(predictions, y)  # predicted value - target value
        sum_delta = (alpha / m) * X.transpose().dot(errors)  # update terms for all thetas at once
        theta = theta - sum_delta             # simultaneous update of θ_0 and θ_1
        cost_history[i] = compute_cost(X, y, theta)
    return theta, cost_history
Let's set the gradient descent hyperparameters: the learning rate alpha and the number of iterations.
theta = np.zeros(2)
iterations = 1500
alpha = 0.01
theta, cost_history = gradient_descent(X, y, theta, alpha, iterations)
print('Final value of theta =', theta)
print('cost_history =', cost_history)
Final value of theta = [-3.63029144 1.16636235]
cost_history = [6.73719046 5.93159357 5.90115471 ... 4.48343473 4.48341145 4.48338826]
# X is the 2D feature matrix, so take only the values of column index 1 (the population) for plotting
plt.scatter(X[:,1], y, color='red', marker= '+', label= 'Training Data')
plt.plot(X[:,1],X.dot(theta), color='green', label='Linear Regression')
plt.rcParams["figure.figsize"] = (10,6)
plt.grid()
plt.xlabel('Population of City in 10,000s')
plt.ylabel('Profit in $10,000s')
plt.title('Linear Regression Fit')
plt.legend()
plt.plot(range(1, iterations + 1),cost_history, color='blue')
plt.rcParams["figure.figsize"] = (10,6)
plt.grid()
plt.xlabel('Number of iterations')
plt.ylabel('Cost (J)')
plt.title('Convergence of gradient descent')
We can predict results using our model as below. Note that the inputs are in units of 10,000s, so a population of 35,000 corresponds to x = 3.5, and we must include the leading 1 for the intercept term.
predict1 = np.array([1, 3.5]).dot(theta)
print("For population = 35,000, our prediction of profit is", predict1 * 10000)
predict2 = np.array([1, 7]).dot(theta)
print("For population = 70,000, our prediction of profit is", predict2 * 10000)
For population = 35,000, our prediction of profit is 4519.7678677017675
For population = 70,000, our prediction of profit is 45342.45012944714
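As a final sanity check (not part of the original tutorial), we can compare gradient descent against NumPy's closed-form least-squares fit; the synthetic line and hyperparameters below are assumptions for this sketch, chosen to mimic the scale of the city dataset:

```python
import numpy as np

# Synthetic data around an assumed line y = -3.6 + 1.17x, with a little noise.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=97)
y = -3.6 + 1.17 * x + rng.normal(0.0, 0.5, size=97)

m = len(y)
X = np.hstack((np.ones((m, 1)), x.reshape(m, 1)))  # add the column of ones
theta = np.zeros(2)
for _ in range(10000):                             # batch gradient descent, as in the tutorial
    theta = theta - (0.01 / m) * X.T.dot(X.dot(theta) - y)

slope, intercept = np.polyfit(x, y, deg=1)         # closed-form reference fit
print(theta, (intercept, slope))                   # the two parameter pairs should nearly match
```

Both approaches minimize the same squared error cost, so they should converge to nearly identical θ values; polyfit just solves for them directly instead of iterating.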
This concludes our univariate linear regression. But in real life, the profit of a food truck also depends on many other factors. We can use the same algorithm implemented above to perform linear regression when there are multiple factors affecting the output value. In the next tutorial I will explain multivariate linear regression.