Learning Path for DP-900 Microsoft Azure Data Fundamentals Certification
Learning path to gain necessary skills and to clear the Azure Data Fundamentals Certification. This certification is intended for candidates beginning to wor...
In this tutorial we are going to cover linear regression with multiple input variables. We are going to use same model that we have created in Univariate Linear Regression tutorial. I would recommend to read Univariate Linear Regression tutorial first. We will define the hypothesis function with multiple variables and use gradient descent algorithm. We will also use plots for better visualization of inner workings of the model. At the end we will test our model using training data.
In case of multivariate linear regression output value is dependent on multiple input values. The relationship between input values, format of different input values and range of input values plays important role in linear model creation and prediction. I am using same notation and example data used in Andrew Ng’s Machine Learning course
Our hypothesis function for univariate linear regression was
h(x) = theta_0 + theta_1*x_1
where x_1 is only input value
For multiple input value, hypothesis function will look like,
h(x) = theta_0 + theta_1 * x_1 + theta_2 * x_2 .....theat_n * x_n
where x_1, x_2...x_n are multiple input values
If we consider the house price example then the factors affecting its price like house size, no of bedrooms, location etc are nothing but input variables of above hypothesis function.
Our cost function remains same as used in Univariate linear regression
For more details about cost function please refer ‘Create Cost Function’ section of Univariate Linear Regression
Gradient descent algorithm function format remains same as used in Univariate linear regression. But here we have to do it for all the theta values(no of theta values = no of features + 1).
For more details about gradient descent algorithm please refer ‘Gradient Descent Algorithm’ section of Univariate Linear Regression
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('https://raw.githubusercontent.com/satishgunjal/datasets/master/multivariate_housing_prices_in_portlans_oregon.csv')
df.head() # To get first n rows from the dataset default value of n is 5
size(in square feet) | number of bedrooms | price | |
---|---|---|---|
0 | 2104 | 3 | 399900 |
1 | 1600 | 3 | 329900 |
2 | 2400 | 3 | 369000 |
3 | 1416 | 2 | 232000 |
4 | 3000 | 4 | 539900 |
X = df.values[:, 0:2] # get input values from first two columns
y = df.values[:, 2] # get output values from last coulmn
m = len(y) # Number of training examples
print('Total no of training examples (m) = %s \n' %(m))
# Show only first 5 records
for i in range(5):
print('x =', X[i, ], ', y =', y[i])
Total no of training examples (m) = 47
x = [2104 3] , y = 399900
x = [1600 3] , y = 329900
x = [2400 3] , y = 369000
x = [1416 2] , y = 232000
x = [3000 4] , y = 539900
Feature Scaling: In feature scaling we divide the input value by range(max - min) of input variable. By this technique we get new range of just 1.
x1 = x1 / s1
where,
x1 = input variable
s1 = range
Mean Normalization: In mean normalization we subtract the average value from the input variable and then divide it by range(max - min) or by standard deviation of input variable.
x1 = (x1 - mu1)/s1
where,
x1 = input variable
mu1 = average value
s1 = range or standard deviation
Lets create function to normalize the value of input variable
def feature_normalize(X):
"""
Normalizes the features(input variables) in X.
Parameters
----------
X : n dimensional array (matrix), shape (n_samples, n_features)
Features(input varibale) to be normalized.
Returns
-------
X_norm : n dimensional array (matrix), shape (n_samples, n_features)
A normalized version of X.
mu : n dimensional array (matrix), shape (n_features,)
The mean value.
sigma : n dimensional array (matrix), shape (n_features,)
The standard deviation.
"""
#Note here we need mean of indivdual column here, hence axis = 0
mu = np.mean(X, axis = 0)
# Notice the parameter ddof (Delta Degrees of Freedom) value is 1
sigma = np.std(X, axis= 0, ddof = 1) # Standard deviation (can also use range)
X_norm = (X - mu)/sigma
return X_norm, mu, sigma
X, mu, sigma = feature_normalize(X)
print('mu= ', mu)
print('sigma= ', sigma)
print('X_norm= ', X[:5])
mu= [2000.68085106 3.17021277]
sigma= [7.94702354e+02 7.60981887e-01]
X_norm= [[ 0.13000987 -0.22367519]
[-0.50418984 -0.22367519]
[ 0.50247636 -0.22367519]
[-0.73572306 -1.53776691]
[ 1.25747602 1.09041654]]
Note: New mean or avearage value of normalized X feature is 0
mu_testing = np.mean(X, axis = 0) # mean
mu_testing
array([3.77948264e-17, 2.74603035e-16])
Note: New range or standard deviation of normalized X feature is 1
sigma_testing = np.std(X, axis = 0, ddof = 1) # mean
sigma_testing
array([1., 1.])
# Lets use hstack() function from numpy to add column of ones to X feature
# This will be our final X matrix (feature matrix)
X = np.hstack((np.ones((m,1)), X))
X[:5]
array([[ 1. , 0.13000987, -0.22367519],
[ 1. , -0.50418984, -0.22367519],
[ 1. , 0.50247636, -0.22367519],
[ 1. , -0.73572306, -1.53776691],
[ 1. , 1.25747602, 1.09041654]])
Note above, we have added column of ones to X so final dimension of X is mxn i.e 97x3
def compute_cost(X, y, theta):
"""
Compute the cost of a particular choice of theta for linear regression.
Input Parameters
----------------
X : 2D array where each row represent the training example and each column represent the feature ndarray. Dimension(m x n)
m= number of training examples
n= number of features (including X_0 column of ones)
y : 1D array of labels/target value for each traing example. dimension(1 x m)
theta : 1D array of fitting parameters or weights. Dimension (1 x n)
Output Parameters
-----------------
J : Scalar value.
"""
predictions = X.dot(theta)
#print('predictions= ', predictions[:5])
errors = np.subtract(predictions, y)
#print('errors= ', errors[:5])
sqrErrors = np.square(errors)
#print('sqrErrors= ', sqrErrors[:5])
#J = 1 / (2 * m) * np.sum(sqrErrors)
# OR
# We can merge 'square' and 'sum' into one by taking the transpose of matrix 'errors' and taking dot product with itself
# If your confuse about this try to do this with few values for better understanding
J = 1/(2 * m) * errors.T.dot(errors)
return J
def gradient_descent(X, y, theta, alpha, iterations):
"""
Compute cost for linear regression.
Input Parameters
----------------
X : 2D array where each row represent the training example and each column represent the feature ndarray. Dimension(m x n)
m= number of training examples
n= number of features (including X_0 column of ones)
y : 1D array of labels/target value for each traing example. dimension(m x 1)
theta : 1D array of fitting parameters or weights. Dimension (1 x n)
alpha : Learning rate. Scalar value
iterations: No of iterations. Scalar value.
Output Parameters
-----------------
theta : Final Value. 1D array of fitting parameters or weights. Dimension (1 x n)
cost_history: Conatins value of cost for each iteration. 1D array. Dimansion(m x 1)
"""
cost_history = np.zeros(iterations)
for i in range(iterations):
predictions = X.dot(theta)
#print('predictions= ', predictions[:5])
errors = np.subtract(predictions, y)
#print('errors= ', errors[:5])
sum_delta = (alpha / m) * X.transpose().dot(errors);
#print('sum_delta= ', sum_delta[:5])
theta = theta - sum_delta;
cost_history[i] = compute_cost(X, y, theta)
return theta, cost_history
Lets update the gradient descent learning parameters alpha and no of iterations
# We need theta parameter for every input variable. since we have three input variable including X_0 (column of ones)
theta = np.zeros(3)
iterations = 400;
alpha = 0.15;
theta, cost_history = gradient_descent(X, y, theta, alpha, iterations)
print('Final value of theta =', theta)
print('First 5 values from cost_history =', cost_history[:5])
print('Last 5 values from cost_history =', cost_history[-5 :])
Final value of theta = [340412.65957447 110631.0502787 -6649.47427067]
First 5 values from cost_history = [4.76541088e+10 3.48804679e+10 2.57542477e+10 1.92146908e+10
1.45159772e+10]
Last 5 values from cost_history = [2.04328005e+09 2.04328005e+09 2.04328005e+09 2.04328005e+09
2.04328005e+09]
Note: ‘cost_history’ contains the values of cost for every step of gradient descent, and its value should decrease for every step of gradient descent
import matplotlib.pyplot as plt
plt.plot(range(1, iterations +1), cost_history, color ='blue')
plt.rcParams["figure.figsize"] = (10,6)
plt.grid()
plt.xlabel("Number of iterations")
plt.ylabel("cost (J)")
plt.title("Convergence of gradient descent")
Note that curve flattened out at around 20th iteration and ramains same affter that. This is the indication of convergence of gradient descent
iterations = 400;
theta = np.zeros(3)
alpha = 0.005;
theta_1, cost_history_1 = gradient_descent(X, y, theta, alpha, iterations)
alpha = 0.01;
theta_2, cost_history_2 = gradient_descent(X, y, theta, alpha, iterations)
alpha = 0.02;
theta_3, cost_history_3 = gradient_descent(X, y, theta, alpha, iterations)
alpha = 0.03;
theta_4, cost_history_4 = gradient_descent(X, y, theta, alpha, iterations)
alpha = 0.15;
theta_5, cost_history_5 = gradient_descent(X, y, theta, alpha, iterations)
plt.plot(range(1, iterations +1), cost_history_1, color ='purple', label = 'alpha = 0.005')
plt.plot(range(1, iterations +1), cost_history_2, color ='red', label = 'alpha = 0.01')
plt.plot(range(1, iterations +1), cost_history_3, color ='green', label = 'alpha = 0.02')
plt.plot(range(1, iterations +1), cost_history_4, color ='yellow', label = 'alpha = 0.03')
plt.plot(range(1, iterations +1), cost_history_5, color ='blue', label = 'alpha = 0.15')
plt.rcParams["figure.figsize"] = (10,6)
plt.grid()
plt.xlabel("Number of iterations")
plt.ylabel("cost (J)")
plt.title("Effect of Learning Rate On Convergence of Gradient Descent")
plt.legend()
iterations = 100;
theta = np.zeros(3)
alpha = 1.32;
theta_6, cost_history_6 = gradient_descent(X, y, theta, alpha, iterations)
plt.plot(range(1, iterations +1), cost_history_6, color ='brown')
plt.rcParams["figure.figsize"] = (10,6)
plt.grid()
plt.xlabel("Number of iterations")
plt.ylabel("cost (J)")
plt.title("Effect of Large Learning Rate On Convergence of Gradient Descent")
Question: Estimate the price of a 1650 sq-ft, 3-bedroom house
normalize_test_data = ((np.array([1650, 3]) - mu) / sigma)
normalize_test_data = np.hstack((np.ones(1), normalize_test_data))
price = normalize_test_data.dot(theta)
print('Predicted price of a 1650 sq-ft, 3 br house:', price)
Predicted price of a 1650 sq-ft, 3 br house: 293081.46433492796
This concludes our multivariate linear regression. Now we know how to perform the feature normalization and linear regression when there are multiple input variables. In next tutorial we will use scikit-learn linear model to perform the linear regression.
Learning path to gain necessary skills and to clear the Azure Data Fundamentals Certification. This certification is intended for candidates beginning to wor...
Learning path to gain necessary skills and to clear the Azure AI Fundamentals Certification. This certification is intended for candidates with both technica...
In this guide we are going to create and train the neural network model to classify the clothing images. We will use TensorFlow deep learning framework along...
In short NLP is an AI technique used to do text analysis. Whenever we have lots of text data to analyze we can use NLP. Apart from text analysis, NLP also us...
There are multiple ways to split the data for model training and testing, in this article we are going to cover K Fold and Stratified K Fold cross validation...
K-Means clustering is most commonly used unsupervised learning algorithm to find groups in unlabeled data. Here K represents the number of groups or clusters...
Any data recorded with some fixed interval of time is called as time series data. This fixed interval can be hourly, daily, monthly or yearly. Objective of t...
Support vector machines is one of the most powerful ‘Black Box’ machine learning algorithm. It belongs to the family of supervised learning algorithm. Used t...
Random forest is supervised learning algorithm and can be used to solve classification and regression problems. Unlike decision tree random forest fits multi...
Decision tree explained using classification and regression example. The objective of decision tree is to split the data in such a way that at the end we hav...
This tutorial covers basic Agile principles and use of Scrum framework in software development projects.
Main objective of any machine learning model is to generalize the learning based on training data, so that it will be able to do predictions accurately on un...
In this study we are going to use the Linear Model from Sklearn library to perform Multi class Logistic Regression. We are going to use handwritten digit’s d...
In this tutorial we are going to use the Logistic Model from Sklearn library. We are also going to use the same test data used in Logistic Regression From Sc...
This tutorial covers basic concepts of logistic regression. I will explain the process of creating a model right from hypothesis function to algorithm. We wi...
In this tutorial we are going to study about train, test data split. We will use sklearn library to do the data split.
In this tutorial we are going to study about One Hot Encoding. We will also use pandas and sklearn libraries to convert categorical data into numeric data.
In this tutorial we are going to use the Linear Models from Sklearn library. Scikit-learn is one of the most popular open source machine learning library for...
In this tutorial we are going to use the Linear Models from Sklearn library. Scikit-learn is one of the most popular open source machine learning library for...
In this tutorial we are going to cover linear regression with multiple input variables. We are going to use same model that we have created in Univariate Lin...
This tutorial covers basic concepts of linear regression. I will explain the process of creating a model right from hypothesis function to gradient descent a...
In this tutorial we will see the brief introduction of Machine Learning and preferred learning plan for beginners