In this tutorial we are going to use the Linear Models from the Sklearn library. We are also going to use the same test data used in the Univariate Linear Regression From Scratch With Python tutorial.
Scikit-learn is one of the most popular open source machine learning libraries for Python. It provides a range of machine learning models; here we are going to use the linear models. Sklearn linear models are used when the target value is expected to be a linear combination of the input values. The Sklearn library has multiple types of linear models to choose from. Just as we implemented the ‘Batch Gradient Descent’ algorithm in the Univariate Linear Regression From Scratch With Python tutorial, every Sklearn linear model also uses a specific mathematical model to find the best fit line.
The hypothesis function used by the Linear Models of the Sklearn library is as below:
y(w, x) = w_0 + w_1 * x_1
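Where y(w, x) is the predicted target value, w_0 is the intercept (stored by sklearn in intercept_), w_1 is the coefficient (stored in coef_), and x_1 is the input feature.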
You may have noticed that the above hypothesis function does not match the hypothesis function used in the Univariate Linear Regression From Scratch With Python tutorial. Actually both are the same; they just use different notation:
h(θ, x) = θ_0 + θ_1 * x_1
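Where h(θ, x) is the predicted target value, θ_0 corresponds to w_0 (the intercept), θ_1 corresponds to w_1 (the coefficient), and x_1 is the input feature.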
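To make the notation concrete, here is a minimal sketch that fits a model to a few synthetic points lying exactly on a known line. The data values and the variable names w0 and w1 are assumptions chosen for illustration; they are not part of the tutorial's dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data lying exactly on y = 2 + 3 * x (illustrative values only)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 + 3.0 * x

model = LinearRegression()
model.fit(x.reshape(-1, 1), y)  # sklearn expects a 2D feature matrix

w0 = model.intercept_  # w_0 in sklearn's notation, theta_0 in ours
w1 = model.coef_[0]    # w_1 in sklearn's notation, theta_1 in ours
print("w_0 (intercept_):", round(w0, 6))
print("w_1 (coef_[0])  :", round(w1, 6))
```

The fitted intercept and coefficient should recover the 2 and 3 used to generate the data, which shows which sklearn attribute plays the role of each symbol.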
Yes, we are jumping to coding right after the hypothesis function, because we are going to use the Sklearn library, which has multiple algorithms to choose from.
In case you don’t have any experience using these libraries, don’t worry. I will explain every bit of code for better understanding.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import linear_model
df = pd.read_csv('https://raw.githubusercontent.com/satishgunjal/datasets/master/univariate_profits_and_populations_from_the_cities.csv')
df.head(5) # Show first 5 rows from dataset
 | population | profit |
---|---|---|
0 | 6.1101 | 17.5920 |
1 | 5.5277 | 9.1302 |
2 | 8.5186 | 13.6620 |
3 | 7.0032 | 11.8540 |
4 | 5.8598 | 6.8233 |
X = df.values[:,0] # Get input values from first column
y = df.values[:,1] # Get output values from second column
m = len(X) # Total number of training examples
print('X = ', X[: 5]) # Show first 5 records
print('y = ', y[: 5]) # Show first 5 records
print('m = ', m)
X = [6.1101 5.5277 8.5186 7.0032 5.8598]
y = [17.592 9.1302 13.662 11.854 6.8233]
m = 97
We assigned the feature (independent variable) values to the variable X and the target (dependent variable) values to the variable y. For this dataset, we can use a scatter plot to visualize the data, since it has only two properties to plot (profit and population). Many other problems that you will encounter in real life are multi-dimensional and can’t be plotted on a 2D plot.
plt.rcParams["figure.figsize"] = (10,6) # Set the figure size before the figure is created
plt.scatter(X, y, color='red', marker='+')
plt.grid()
plt.xlabel('Population of City in 10,000s')
plt.ylabel('Profit in $10,000s')
plt.title('Scatter Plot Of Training Data')
The flow chart below gives a brief idea of how to choose the right algorithm.
The mathematical formula used by the ordinary least squares algorithm is as below:
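For reference, the scikit-learn documentation states the ordinary least squares objective as minimizing the residual sum of squares between the observed targets and the predictions. Here X is the feature matrix, y the target vector, and w the coefficients:

```latex
\min_{w} \; \lVert X w - y \rVert_2^2
```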
model_ols = linear_model.LinearRegression()
model_ols.fit(X.reshape(m, 1),y)
# fit() trains the model
# Note: the first parameter (the features) must be a 2D array (feature matrix), so reshape() converts the 1D array 'X' into a 2D array of dimension 97x1
# Remember we don't have to add a column of ones to the X matrix; sklearn handles the intercept for us
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
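The reshape step used in the fit() call above can be illustrated in isolation. This is a small sketch using the first three population values shown earlier; the names X_1d and X_2d are hypothetical.

```python
import numpy as np

X_1d = np.array([6.1101, 5.5277, 8.5186])  # 1D array of feature values
X_2d = X_1d.reshape(-1, 1)                  # -1 lets NumPy infer the number of rows

print(X_1d.shape)  # (3,)
print(X_2d.shape)  # (3, 1)
```

Using reshape(-1, 1) instead of reshape(m, 1) avoids having to pass the number of rows explicitly; both produce the same column-shaped feature matrix.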
coef = model_ols.coef_
intercept = model_ols.intercept_
print('coef= ', coef)
print('intercept= ', intercept)
coef= [1.19303364]
intercept= -3.89578087831185
You can compare the above values with the values from the Univariate Linear Regression From Scratch With Python tutorial. Remember the notation difference.
The values from our earlier model and the Ordinary Least Squares model do not match, which is fine: the two models use different algorithms. Remember that you have to choose the algorithm based on your data and problem type. Besides, this is just a simple example with only 97 rows of data.
Let’s visualize the results.
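One way to see what "different algorithm" means here: LinearRegression solves the least squares problem directly, whereas gradient descent approaches the same minimum iteratively and may stop a little short of it. The sketch below, on synthetic data (the generating values -4.0 and 1.2 and the random seed are assumptions for illustration), solves the problem with NumPy using an explicit column of ones and compares the result with sklearn.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data, roughly linear with noise (illustrative values only)
rng = np.random.default_rng(0)
x = rng.uniform(5, 25, size=50)
y = -4.0 + 1.2 * x + rng.normal(0, 1.0, size=50)

# Direct least squares solve with an explicit column of ones for the intercept
A = np.column_stack([np.ones_like(x), x])
theta, *_ = np.linalg.lstsq(A, y, rcond=None)

# sklearn adds the intercept itself, so no column of ones is needed
model = LinearRegression().fit(x.reshape(-1, 1), y)

print("lstsq   theta_0, theta_1:", theta)
print("sklearn intercept, coef :", model.intercept_, model.coef_[0])
```

Both approaches should agree to within floating point precision, since they minimize the same objective.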
plt.scatter(X, y, color='red', marker= '+', label= 'Training Data')
plt.plot(X, model_ols.predict(X.reshape(m, 1)), color='green', label='Linear Regression')
plt.rcParams["figure.figsize"] = (10,6)
plt.grid()
plt.xlabel('Population of City in 10,000s')
plt.ylabel('Profit in $10,000s')
plt.title('Linear Regression Fit')
plt.legend()
We can predict the result using our model as below:
predict1 = model_ols.predict([[3.5]])
print("For population = 35,000, our prediction of profit is", predict1 * 10000)
For population = 35,000, our prediction of profit is [2798.36876352]
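As a sanity check, the same prediction can be reproduced by hand from the hypothesis function, using the coefficient and intercept printed earlier. This is only a sketch of the arithmetic behind predict() for this one-feature model; the variable names are hypothetical.

```python
import numpy as np

# Values printed by the fitted ordinary least squares model above
intercept = -3.89578087831185
coef = np.array([1.19303364])

population = np.array([[3.5]])          # 35,000 people, in units of 10,000
profit = intercept + population @ coef  # w_0 + w_1 * x_1
print(profit * 10000)
```

The result agrees with the model's prediction up to the rounding of the printed coefficient.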
So using the sklearn library, we can train our model and predict the results with only a few lines of code. Let’s test our data with a few other algorithms.
The mathematical formula used by the Ridge Regression algorithm is as below:
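For reference, the scikit-learn documentation states the Ridge objective as least squares plus an L2 penalty on the coefficients, where α ≥ 0 controls the amount of shrinkage:

```latex
\min_{w} \; \lVert X w - y \rVert_2^2 + \alpha \lVert w \rVert_2^2
```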
model_r = linear_model.Ridge(alpha=35)
model_r.fit(X.reshape(m, 1),y)
Ridge(alpha=35, copy_X=True, fit_intercept=True, max_iter=None, normalize=False,
random_state=None, solver='auto', tol=0.001)
coef = model_r.coef_
intercept = model_r.intercept_
print('coef= ' , coef)
print('intercept= ' , intercept)
coef= [1.16468008]
intercept= -3.6644214596467215
predict1 = model_r.predict([[3.5]])
print("For population = 35,000, our prediction of profit is", predict1 * 10000)
For population = 35,000, our prediction of profit is [4119.58817955]
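To get a feel for what alpha does in Ridge, the following sketch fits the model with increasing alpha values on a small synthetic line. The data and the list of alpha values are assumptions chosen for illustration, not the tutorial's dataset.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data lying exactly on y = 2 + 3 * x (illustrative values only)
x = np.arange(10, dtype=float).reshape(-1, 1)
y = 2.0 + 3.0 * x.ravel()

ridge_coefs = []
for alpha in [0.1, 10.0, 100.0, 1000.0]:
    model = Ridge(alpha=alpha).fit(x, y)
    ridge_coefs.append(model.coef_[0])
    print("alpha =", alpha, "coef =", model.coef_[0], "intercept =", model.intercept_)
```

As alpha grows, the coefficient shrinks toward zero but does not reach it exactly, which is the typical behaviour of the L2 penalty.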
The mathematical formula used by the LASSO Regression algorithm is as below:
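For reference, the scikit-learn documentation states the Lasso objective as least squares (scaled by the number of samples) plus an L1 penalty on the coefficients:

```latex
\min_{w} \; \frac{1}{2 n_{\text{samples}}} \lVert X w - y \rVert_2^2 + \alpha \lVert w \rVert_1
```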
model_l = linear_model.Lasso(alpha=0.55)
model_l.fit(X.reshape(m, 1),y)
Lasso(alpha=0.55, copy_X=True, fit_intercept=True, max_iter=1000,
normalize=False, positive=False, precompute=False, random_state=None,
selection='cyclic', tol=0.0001, warm_start=False)
coef = model_l.coef_
intercept = model_l.intercept_
print('coef= ' , coef)
print('intercept= ' , intercept)
coef= [1.15592566]
intercept= -3.5929871214681945
predict1 = model_l.predict([[3.5]])
print("For population = 35,000, our prediction of profit is", predict1 * 10000)
For population = 35,000, our prediction of profit is [4527.52676756]
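The effect of alpha in LASSO can be sketched the same way on a small synthetic line. The data and alpha values below are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data lying exactly on y = 2 + 3 * x (illustrative values only)
x = np.arange(10, dtype=float).reshape(-1, 1)
y = 2.0 + 3.0 * x.ravel()

lasso_coefs = []
for alpha in [0.1, 1.0, 100.0]:
    model = Lasso(alpha=alpha).fit(x, y)
    lasso_coefs.append(model.coef_[0])
    print("alpha =", alpha, "coef =", model.coef_[0], "intercept =", model.intercept_)
```

Unlike the L2 penalty, a large enough L1 penalty drives the coefficient to exactly zero; at that point the model just predicts the mean of y through the intercept.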
As you can see, with the Sklearn library we have much less work to do, and most of it is handled by the library. We don’t have to add a column of ones, and there is no need to write our own cost function or gradient descent algorithm. We can use the library directly and tune the hyperparameters (like changing the value of alpha) until we get satisfactory results. If you have been following my machine learning tutorials from the beginning, implementing our own gradient descent algorithm and then using prebuilt models like Ridge or LASSO gives a very good perspective on the inner workings of these libraries, and hopefully it will help you understand them better.