This tutorial covers the basic concepts of logistic regression. I will explain the process of creating a model right from the hypothesis function to the algorithm. We will also use plots for better visualization of the inner workings of the model. At the end, we will test our model on a binary classification problem.
Unlike Linear Regression, Logistic Regression is used to solve classification problems, like classifying an email as spam or not spam. Don’t get confused by the name ‘Regression’: Logistic Regression is a ‘Classification Algorithm’. If the value we are trying to classify takes on only two values, 0 (negative/false) or 1 (positive/true), we call it ‘Binary Classification’; if there are more than two classes, we call it ‘Multi-Class Classification’.
We will create our own Logistic Regression algorithm and build a classification model that estimates an applicant’s probability of admission based on Exam 1 and Exam 2 scores.

Logistic Regression is a Supervised Learning algorithm.

I am using the same notation and example data used in Andrew Ng’s Machine Learning course.
So in order to get a discrete output from Linear Regression, we can use a threshold value (0.5) to classify the output: classify anything above 0.5 as the positive class (1) and anything below 0.5 as the negative class (0).
But this adjustment does not work if there are outliers in the dataset. Notice the slope of the line in the figure below and the resulting incorrect predictions.
The Sigmoid function does exactly that: it maps the whole real number range into values between 0 and 1. It is also called the Logistic function:

g(z) = 1 / (1 + e^(-z))

The term ‘Sigmoid’ means ‘S-shaped’, and when plotted this function gives an S-shaped curve. In the figure below, for the given range of X values, the Y values range only from 0 to 1.
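To see this behavior numerically, here is a minimal sketch of the sigmoid mapping using NumPy (the full implementation appears later in the tutorial):

```python
import numpy as np

def sigmoid(z):
    # Maps any real number into the open interval (0, 1)
    return 1 / (1 + np.exp(-z))

print(sigmoid(-10))  # very close to 0
print(sigmoid(0))    # exactly 0.5
print(sigmoid(10))   # very close to 1
```

No matter how extreme the input, the output never leaves the (0, 1) range, which is exactly what we need for a probability.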
h(x) = θ_0 + (θ_1*x_1) + ... + (θ_n*x_n)

We are going to use this function as the input to our Sigmoid function.

z = θ_0 + (θ_1*x_1) + ... + (θ_n*x_n)

h(x) = g(z) = g(θ_0 + (θ_1*x_1) + ... + (θ_n*x_n))

Basically, we are using the line function as input to the Sigmoid function in order to map its output into the range 0 to 1. The way our Sigmoid function g(z) behaves is that when its input is greater than or equal to zero, its output is greater than or equal to 0.5.
Since a positive input results in the positive class and a negative input results in the negative class, we can separate the two classes by setting the weighted sum of inputs to 0, i.e.

z = θ_0 + (θ_1*x_1) + ... + (θ_n*x_n) = 0
Let’s create a formula to find the decision boundary for a two-feature (x_1 and x_2) dataset:

z = θ_0 + (θ_1*x_1) + (θ_2*x_2) = 0

x_2 = -(θ_0 + (θ_1*x_1)) / θ_2
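To make the boundary formula concrete, here is a quick check with hypothetical theta values (chosen purely for illustration, not learned from data): a point computed from the formula satisfies z = 0 exactly, i.e. it lies on the decision boundary.

```python
import numpy as np

# Hypothetical parameter values, for illustration only
theta = np.array([-25.0, 0.2, 0.2])

def boundary_x2(x1, theta):
    # Solve theta_0 + theta_1*x_1 + theta_2*x_2 = 0 for x_2
    return -(theta[0] + theta[1] * x1) / theta[2]

x1 = 50.0
x2 = boundary_x2(x1, theta)
z = theta[0] + theta[1] * x1 + theta[2] * x2
print(x2, z)  # the point (x1, x2) lies on the boundary, so z is 0
```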
Now we have our hypothesis function and the decision boundary formula to classify the given data. Just like linear regression, let’s define a cost function to find the optimum values of the theta parameters.

To find the optimum values of the theta parameters, we have to try multiple values and then choose the best possible values based on how well the predicted classes match the given data. To do this we will create a cost function (J). The inner workings of the cost function are as below.
The Logistic Regression cost function is as below:

J(θ) = -(1/m) * Σ_{i=1..m} [ y^(i) * log(h(x^(i))) + (1 - y^(i)) * log(1 - h(x^(i))) ]
The vectorized implementation of the Logistic Regression cost function is as below:

J(θ) = (1/m) * ( -y^T * log(g(Xθ)) - (1 - y)^T * log(1 - g(Xθ)) )
Now we are going to use an ‘Advanced Optimization Algorithm’ to find the theta values.
The gradient of the cost is nothing but the partial derivative of the cost function with respect to each parameter θ_j:

∂J/∂θ_j = (1/m) * Σ_{i=1..m} (h(x^(i)) - y^(i)) * x_j^(i)

In vectorized form: ∇J = (1/m) * X^T * (g(Xθ) - y)
Enough theory; now let’s implement the Logistic Regression algorithm in Python and build a classification model that estimates an applicant’s probability of admission based on Exam 1 and Exam 2 scores.
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize as opt

df = pd.read_csv('https://raw.githubusercontent.com/satishgunjal/datasets/master/admission_basedon_exam_scores.csv')
m, n = df.shape
print('Number of training examples m = ', m)
print('Number of features n = ', n - 1)  # Not counting the 'Label: Admission status'
df.sample(5)  # Show 5 random training examples
```
```
Number of training examples m =  100
Number of features n =  2
```
| | Exam 1 marks | Exam 2 marks | Admission status |
|---|---|---|---|
| 38 | 74.789253 | 41.573415 | 0 |
| 45 | 62.222676 | 52.060992 | 0 |
| 19 | 76.978784 | 47.575964 | 1 |
| 3 | 60.182599 | 86.308552 | 1 |
| 66 | 40.457551 | 97.535185 | 1 |
To plot the data of admitted and not-admitted applicants, we first need to create a separate dataframe for each class (admitted/not admitted).
```python
df_admitted = df[df['Admission status'] == 1]
print('Dimension of df_admitted= ', df_admitted.shape)
df_admitted.sample(5)
```

```
Dimension of df_admitted=  (60, 3)
```
| | Exam 1 marks | Exam 2 marks | Admission status |
|---|---|---|---|
| 81 | 94.834507 | 45.694307 | 1 |
| 46 | 77.193035 | 70.458200 | 1 |
| 88 | 78.635424 | 96.647427 | 1 |
| 6 | 61.106665 | 96.511426 | 1 |
| 4 | 79.032736 | 75.344376 | 1 |
```python
df_notadmitted = df[df['Admission status'] == 0]
print('Dimension of df_notadmitted= ', df_notadmitted.shape)
df_notadmitted.sample(5)
```

```
Dimension of df_notadmitted=  (40, 3)
```
| | Exam 1 marks | Exam 2 marks | Admission status |
|---|---|---|---|
| 22 | 50.534788 | 48.855812 | 0 |
| 65 | 66.560894 | 41.092098 | 0 |
| 63 | 30.058822 | 49.592974 | 0 |
| 43 | 82.368754 | 40.618255 | 0 |
| 39 | 34.183640 | 75.237720 | 0 |
Now let’s draw the scatter plot of admitted and not-admitted students.
```python
plt.figure(figsize=(10, 6))
plt.scatter(df_admitted['Exam 1 marks'], df_admitted['Exam 2 marks'], color='green', label='Admitted Applicants')
plt.scatter(df_notadmitted['Exam 1 marks'], df_notadmitted['Exam 2 marks'], color='red', label='Not Admitted Applicants')
plt.xlabel('Exam 1 Marks')
plt.ylabel('Exam 2 Marks')
plt.legend()
plt.title('Admitted Vs Not Admitted Applicants')
```
If you want to know the reason behind adding a column of ones to the feature matrix, please refer to ‘Vector Representation Of Hypothesis Function’ in Univariate Linear Regression.
```python
# Get feature columns from the dataframe
X = df.iloc[:, 0:2]
# Add a column of ones (intercept term)
X = np.hstack((np.ones((m, 1)), X))
# Now X is a 2-dimensional numpy array
print("Dimension of feature matrix X = ", X.shape, '\n')
y = df.iloc[:, -1]
# First 5 training examples with labels
for i in range(5):
    print('x =', X[i, ], ', y =', y[i])
```
```
Dimension of feature matrix X =  (100, 3)

x = [ 1.         34.62365962 78.02469282] , y = 0
x = [ 1.         30.28671077 43.89499752] , y = 0
x = [ 1.         35.84740877 72.90219803] , y = 0
x = [ 1.         60.18259939 86.3085521 ] , y = 1
x = [ 1.         79.03273605 75.34437644] , y = 1
```
Let’s also initialize the theta values to 0.
```python
theta = np.zeros(n)
theta
```

```
array([0., 0., 0.])
```
```python
def sigmoid(z):
    """
    Convert a continuous value into the range 0 to 1.

    I/P
    ----------
    z : Continuous value

    O/P
    -------
    Value in the range 0 to 1.
    """
    g = 1 / (1 + np.exp(-z))
    return g
```
We are using the vectorized implementation of the cost and gradient formulas for better performance.
```python
def cost_function(theta, X, y):
    """
    Compute cost for logistic regression.

    I/P
    ----------
    theta : 1D array of fitting parameters or weights. Dimension (1 x n)
    X : 2D array where each row represents a training example and each column represents a feature. Dimension (m x n)
        m = number of training examples
        n = number of features (including the X_0 column of ones)
    y : 1D array of labels/target values for each training example. Dimension (1 x m)

    O/P
    -------
    J : The cost of using theta as the parameter for logistic regression to fit the data points in X and y.
    """
    m, n = X.shape
    x_dot_theta = X.dot(theta)
    J = 1.0 / m * (-y.T.dot(np.log(sigmoid(x_dot_theta))) - (1 - y).T.dot(np.log(1 - sigmoid(x_dot_theta))))
    return J
```
```python
def gradient(theta, X, y):
    """
    Compute gradient for logistic regression.

    I/P
    ----------
    theta : 1D array of fitting parameters or weights. Dimension (1 x n)
    X : 2D array where each row represents a training example and each column represents a feature. Dimension (m x n)
        m = number of training examples
        n = number of features (including the X_0 column of ones)
    y : 1D array of labels/target values for each training example. Dimension (1 x m)

    O/P
    -------
    grad : (numpy array) The gradient of the cost with respect to the parameters theta
    """
    m, n = X.shape
    x_dot_theta = X.dot(theta)
    grad = 1.0 / m * (sigmoid(x_dot_theta) - y).T.dot(X)
    return grad
```
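A useful sanity check (not part of the original course exercise) is to compare the analytical gradient against a numerical finite-difference approximation. The sketch below is self-contained, redefining the same sigmoid, cost, and gradient functions and using small made-up data:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost_function(theta, X, y):
    m = X.shape[0]
    h = sigmoid(X.dot(theta))
    return (-y.dot(np.log(h)) - (1 - y).dot(np.log(1 - h))) / m

def gradient(theta, X, y):
    m = X.shape[0]
    return (sigmoid(X.dot(theta)) - y).dot(X) / m

# Small synthetic data purely for the check
rng = np.random.default_rng(0)
X = np.hstack((np.ones((5, 1)), rng.normal(size=(5, 2))))
y = np.array([0, 1, 0, 1, 1])
theta = rng.normal(size=3) * 0.1

# Central finite differences: perturb each theta_j by +/- eps
eps = 1e-6
num_grad = np.zeros_like(theta)
for j in range(len(theta)):
    e = np.zeros_like(theta)
    e[j] = eps
    num_grad[j] = (cost_function(theta + e, X, y) - cost_function(theta - e, X, y)) / (2 * eps)

print(np.max(np.abs(num_grad - gradient(theta, X, y))))  # should be tiny
```

If the two gradients disagree, the analytical formula (or its implementation) has a bug.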
Testing cost_function() and gradient() using the initial theta values:
```python
cost = cost_function(theta, X, y)
print('Cost at initial theta (zeros):', cost)
grad = gradient(theta, X, y)
print('Gradient at initial theta (zeros):', grad)
```

```
Cost at initial theta (zeros): 0.6931471805599453
Gradient at initial theta (zeros): [ -0.1        -12.00921659 -11.26284221]
```
```python
theta, nfeval, rc = opt.fmin_tnc(func=cost_function, fprime=gradient, x0=theta, args=(X, y))
cost = cost_function(theta, X, y)
print('Cost at theta found by fmin_tnc:', cost)
print('theta:', theta)
```

```
Cost at theta found by fmin_tnc: 0.20349770158947436
theta: [-25.16131865   0.20623159   0.20147149]
```
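Note that `fmin_tnc` is considered legacy in recent SciPy releases; the same TNC solver is available through the unified `scipy.optimize.minimize` interface. Here is a self-contained sketch of that interface on a tiny synthetic dataset (the data below is made up for illustration, not the admissions data):

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost_function(theta, X, y):
    m = X.shape[0]
    h = sigmoid(X.dot(theta))
    return (-y.dot(np.log(h)) - (1 - y).dot(np.log(1 - h))) / m

def gradient(theta, X, y):
    m = X.shape[0]
    return (sigmoid(X.dot(theta)) - y).dot(X) / m

# Tiny synthetic dataset: intercept column plus one feature
X = np.array([[1.0, 1], [1, 2], [1, 3], [1, 4], [1, 5], [1, 6]])
y = np.array([0, 0, 1, 0, 1, 1])  # deliberately not perfectly separable

result = minimize(fun=cost_function, x0=np.zeros(2), args=(X, y),
                  jac=gradient, method='TNC')
theta = result.x
print('optimized cost:', cost_function(theta, X, y))
```

With the admissions data, `minimize(..., method='TNC')` should reach essentially the same theta and cost as `fmin_tnc`, with `result.x` playing the role of the returned theta.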
Let’s plot the decision boundary to cross-check the accuracy of our model.
```python
# Calculate the x and y values of the decision boundary line
# For plotting a line we just need 2 points; here I take the 'min' and 'max' of the Exam 1 feature
x_values = [np.min(X[:, 1]), np.max(X[:, 1])]
y_values = -(theta[0] + np.dot(theta[1], x_values)) / theta[2]

plt.figure(figsize=(10, 6))
plt.scatter(df_admitted['Exam 1 marks'], df_admitted['Exam 2 marks'], color='green', label='Admitted Applicants')
plt.scatter(df_notadmitted['Exam 1 marks'], df_notadmitted['Exam 2 marks'], color='red', label='Not Admitted Applicants')
plt.xlabel('Exam 1 Marks')
plt.ylabel('Exam 2 Marks')
plt.plot(x_values, y_values, color='blue', label='Decision Boundary')
plt.legend()
plt.title('Decision Boundary')
```
Let’s use the trained model to estimate the admission probability for an applicant who scored 45 in Exam 1 and 85 in Exam 2.

```python
input_data = np.array([1, 45, 85])  # Note the intercept term '1' in the array
prob = sigmoid(np.dot(input_data, theta))
print('Admission probability for applicant with scores 45 in Exam 1 and 85 in Exam 2 is =', prob)
```

```
Admission probability for applicant with scores 45 in Exam 1 and 85 in Exam 2 is = 0.7762906229081791
```
Let’s create a prediction function for our logistic model. Instead of returning a probability between 0 and 1, this function will use a threshold value of 0.5 to predict the discrete class: 1 when the probability is >= 0.5, else 0.
```python
def predict(theta, X):
    """
    Predict the class (0 or 1) using the learned logistic regression parameters theta.
    Uses a threshold value of 0.5 to convert the probability value to a class value.

    I/P
    ----------
    theta : 1D array of fitting parameters or weights. Dimension (1 x n)
    X : 2D array where each row represents a training example and each column represents a feature. Dimension (m x n)
        m = number of training examples
        n = number of features (including the X_0 column of ones)

    O/P
    -------
    Class value (0 or 1) based on the threshold
    """
    p = sigmoid(X.dot(theta)) >= 0.5
    return p.astype(int)
```
```python
predictedValue = pd.DataFrame(predict(theta, X), columns=['Predicted Admission status'])  # New dataframe with column 'Predicted Admission status'
actualAdmissionStatus = pd.DataFrame(y, columns=['Admission status'])
df_actual_vs_predicted = pd.concat([actualAdmissionStatus, predictedValue], axis=1)
df_actual_vs_predicted.T
```
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 99 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Admission status | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | ... | 1 |
| Predicted Admission status | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | ... | 1 |

2 rows × 100 columns
```python
p = predict(theta, X)
print('Accuracy:', np.mean(p == y) * 100)
```

```
Accuracy: 89.0
```
This concludes our logistic regression tutorial. We have covered the hypothesis function, the cost function, and cost function optimization using an advanced optimization technique. We have also tested our model for binary classification using the exam scores data. I will also create one more study using the Sklearn logistic regression model.