Time Series Analysis and Forecasting


Any data recorded with some fixed interval of time is called as time series data. This fixed interval can be hourly, daily, monthly or yearly. e.g. hourly temp reading, daily changing fuel prices, monthly electricity bill, annul company profit report etc. In time series data, time will always be independent variable and there can be one or many dependent variable.

Sales forecasting time series with shampoo sales for every month will look like this,


In above example since there is only one variable dependent on time so its called as univariate time series. If there are multiple dependent variables, then its called as multivariate time series.

Objective of time series analysis is to understand how change in time affect the dependent variables and accordingly predict values for future time intervals.

Time Series Characteristics

Mean, standard deviation and seasonality defines different characteristics of the time series.


Important characteristics of the time series are as below


Trend represent the change in dependent variables with respect to time from start to end. In case of increasing trend dependent variable will increase with time and vice versa. It’s not necessary to have definite trend in time series, we can have a single time series with increasing and decreasing trend. In short trend represent the varying mean of time series data.



If observations repeats after fixed time interval then they are referred as seasonal observations. These seasonal changes in data can occur because of natural events or man-made events. For example every year warm cloths sales increases just before winter season. So seasonality represent the data variations at fixed intervals.



This is also called as noise. Strange dips and jump in the data are called as irregularities. These fluctuations are caused by uncontrollable events like earthquakes, wars, flood, pandemic etc. For example because of COVID-19 pandemic there is huge demand for hand sanitizers and masks.



Cyclicity occurs when observations in the series repeats in random pattern. Note that if there is any fixed pattern then it becomes seasonality, in case of cyclicity observations may repeat after a week, months or may be after a year. These kinds of patterns are much harder to predict.


Time series data which has above characteristics is called as ‘Non-Stationary Data’. For any analysis on time series data we must convert it to ‘Stationary Data’

The general guideline is to estimate the trend and seasonality in the time series, and then make the time series stationary for data modeling. In data modeling step statistical techniques are used for time series analysis and forecasting. Once we have the predictions, in the final step forecasted values converted into the original scale by applying trend and seasonality constraints back.

Time Series Analysis

As name suggest its analysis of the time series data to identify the patterns in it. I will briefly explain the different techniques and test for time series data analysis.

Decomposition of Time Series

Time series decomposition helps to deconstruct the time series into several component like trend and seasonality for better visualization of its characteristics. Using time-series decomposition makes it easier to quickly identify a changing mean or variation in the data


Stationary Data

For accurate analysis and forecasting trend and seasonality is removed from the time series and converted it into stationary series. Time series data is said to be stationary when statistical properties like mean, standard deviation are constant and there is no seasonality. In other words statistical properties of the time series data should not be a function of time.


Test for Stationarity

Easy way is to look at the plot and look for any obvious trend or seasonality. While working on real world data we can also use more sophisticated methods like rolling statistic and Augmented Dickey Fuller test to check stationarity of the data.

Rolling Statistics

In rolling statistics technique we define a size of window to calculate the mean and standard deviation throughout the series. For stationary series mean and standard deviation shouldn’t change with time.

Augmented Dickey Fuller (ADF) Test

I won’t go into the details of how this test works. I will concentrate more on how to interpret the result of this test to determine the stationarity of the series. ADF test will return ‘p-value’ and ‘Test Statistics’ output values.

  • p-value > 0.05: non-stationary.
  • p-value <= 0.05: stationary.
  • Test statistics: More negative this value more likely we have stationary series. Also, this value should be smaller than critical values(1%, 5%, 10%). For e.g. If test statistic is smaller than the 5% critical values, then we can say with 95% confidence that this is a stationary series

Convert Non-Stationary Data to Stationary Data

Accounting for the time series data characteristics like trend and seasonality is called as making data stationary. So by making the mean and variance of the time series constant, we will get the stationary data. Below are the few technique used for the same…


Differencing technique helps to remove the trend and seasonality from time series data. Differencing is performed by subtracting the previous observation from the current observation. The differenced data will contain one less data point than original data. So differencing actually reduces the number of observations and stabilize the mean of a time series.

difference = previous observation - current observation

After performing the differencing it’s recommended to plot the data and visualize the change. In case there is not sufficient improvement you can perform second order or even third order differencing.


A simple but often effective way to stabilize the variance across time is to apply a power transformation to the time series. Log, square root, cube root are most commonly used transformation techniques. Most of the time you can pick the type of growth of the time series and accordingly choose the transformation method. For. e.g. A time series that has a quadratic growth trend can be made linear by taking the square root. In case differencing don’t work, you may first want to use one of above transformation technique to remove the variation from the series.


Moving Average

In moving averages technique, a new series is created by taking the averages of data points from original series. In this technique we can use two or more raw data points to calculate the average. This is also called as ‘window width (w)’. Once window width is decided, averages are calculated from start to the end for each set of w consecutive values, hence the name moving averages. It can also be used for time series forecasting.


Weighted Moving Averages(WMA)

WMA is a technical indicator that assigns a greater weighting to the most recent data points, and less weighting to data points in the distant past. The WMA is obtained by multiplying each number in the data set by a predetermined weight and summing up the resulting values. There can be many techniques for assigning weights. A popular one is exponentially weighted moving average where weights are assigned to all the previous values with a decay factor.

Centered Moving Averages(CMS)

In a centered moving average, the value of the moving average at time t is computed by centering the window around time t and averaging across the w values within the window. For example, a center moving average with a window of 3 would be calculated as

  CMA(t) = mean(t-1, t, t+1)

CMA is very useful for visualizing the time series data

Trailing Moving Averages(TMA)

In trailing moving average, instead of averaging over a window that is centered around a time period of interest, it simply takes the average of the last w values. For example, a trailing moving average with a window of 3 would be calculated as:

 TMA(t) = mean(t-2, t-1, t)

TMA are useful for forecasting.


  • Most important point about values in time series is its dependence on the previous values.
  • We can calculate the correlation for time series observations with previous time steps, called as lags.
  • Because the correlation of the time series observations is calculated with values of the same series at previous times, this is called an autocorrelation or serial correlation.
  • To understand it better lets consider the example of fish prices. We will use below notation to represent the fish prices.
    • P(t)= Fish price of today
    • P(t-1) = Fish price of last month
    • P(t-2) =Fish price of last to last month
  • Time series of fish prices can be represented as P(t-n),….. P(t-3), P(t-2),P(t-1), P(t)
  • So if we have fish prices for last few months then it will be easy for us to predict the fish price for today (Here we are ignoring all other external factors that may affect the fish prices

All the past and future data points are related in time series and ACF and PACF functions help us to determine correlation in it.

Auto Correlation Function (ACF)

  • ACF tells you how correlated points are with each other, based on how many time steps they are separated by.
  • Now to understand it better lets consider above example of fish prices. Let’s try to find the correlation between fish price for current month P(t) and two months ago P(t-2). Important thing to note that, fish price of two months ago can directly affect the today’s fish price or it can indirectly affect the fish price through last months price P(t-1)
  • So ACF consider the direct as well indirect effect between the points while determining the correlation

Partial Auto Correlation Function (PACF)

  • Unlike ACF, PACF only consider the direct effect between the points while determining the correlation
  • In case of above fish price example PACF will determine the correlation between fish price for current month P(t) and two months ago P(t-2) by considering only P(t) and P(t-2) and ignoring P(t-1)

Time Series Forecasting

Forecasting refers to the future predictions based on the time series data analysis. Below are the steps performed during time series forecasting

  • Step 1: Understand the time series characteristics like trend, seasonality etc
  • Step 2: Do the analysis and identify the best method to make the time series stationary
  • Step 3: Note down the transformation steps performed to make the time series stationary and make sure that the reverse transformation of data is possible to get the original scale back
  • Step 4: Based on data analysis choose the appropriate model for time series forecasting
  • Step 5: We can assess the performance of a model by applying simple metrics such as residual sum of squares(RSS). Make sure to use whole data for prediction.
  • Step 6: Now we will have an array of predictions which are in transformed scale. We just need to apply the reverse transformation to get the prediction values in original scale.
  • Step 7: At the end we can do the future forecasting and get the future forecasted values in original scale.

Models Used For Time Series Forecasting

  • Autoregression (AR)
  • Moving Average (MA)
  • Autoregressive Moving Average (ARMA)
  • Autoregressive Integrated Moving Average (ARIMA)
  • Seasonal Autoregressive Integrated Moving-Average (SARIMA)
  • Seasonal Autoregressive Integrated Moving-Average with Exogenous Regressors (SARIMAX)
  • Vector Autoregression (VAR)
  • Vector Autoregression Moving-Average (VARMA)
  • Vector Autoregression Moving-Average with Exogenous Regressors (VARMAX)
  • Simple Exponential Smoothing (SES)
  • Holt Winter’s Exponential Smoothing (HWES)

Next part of this article we are going to analyze and forecast air passengers time series data using ARIMA model. Brief introduction of ARIMA model is as below


  • ARIMA stands for Auto-Regressive Integrated Moving Averages. It is actually a combination of AR and MA model.
  • ARIMA has three parameters ‘p’ for the order of Auto-Regressive (AR) part, ‘q’ for the order of Moving Average (MA) part and ‘d’ for the order of integrated part.

Auto-Regressive (AR) Model:

  • As the name indicates, its the regression of the variables against itself. In this model linear combination of the past values are used to forecast the future values.
  • To figure out the order of AR model we will use PACF function


  • Uses differencing of observations (subtracting an observation from observation at the previous time step) in order to make the time series stationary. Differencing involves the subtraction of the current values of a series with its previous values d number of times.
  • Most of the time value of d = 1, means first order of difference.

Moving Average (MA) Model:

  • Rather than using past values of the forecast variable in a regression, a moving average model uses linear combination of past forecast errors
  • To figure out the order of MA model we will use ACF function

Python Example

We have a monthly time series data of the air passengers from 1 Jan 1949 to 1 Dec 1960. Each row contains the air passenger number for a month of that particular year. Objective is to build a model to forecast the air passenger traffic for future months.

For source code please refer my kaggle kernel


ANN Model to Classify Images

12 minute read

In this guide we are going to create and train the neural network model to classify the clothing images. We will use TensorFlow deep learning framework along...

Introduction to NLP

8 minute read

In short NLP is an AI technique used to do text analysis. Whenever we have lots of text data to analyze we can use NLP. Apart from text analysis, NLP also us...

K Fold Cross Validation

14 minute read

There are multiple ways to split the data for model training and testing, in this article we are going to cover K Fold and Stratified K Fold cross validation...

K-Means Clustering

13 minute read

K-Means clustering is most commonly used unsupervised learning algorithm to find groups in unlabeled data. Here K represents the number of groups or clusters...

Time Series Analysis and Forecasting

10 minute read

Any data recorded with some fixed interval of time is called as time series data. This fixed interval can be hourly, daily, monthly or yearly. Objective of t...

Support Vector Machines

9 minute read

Support vector machines is one of the most powerful ‘Black Box’ machine learning algorithm. It belongs to the family of supervised learning algorithm. Used t...

Random Forest

11 minute read

Random forest is supervised learning algorithm and can be used to solve classification and regression problems. Unlike decision tree random forest fits multi...

Decision Tree

14 minute read

Decision tree explained using classification and regression example. The objective of decision tree is to split the data in such a way that at the end we hav...

Agile Scrum Framework

7 minute read

This tutorial covers basic Agile principles and use of Scrum framework in software development projects.

Underfitting & Overfitting

2 minute read

Main objective of any machine learning model is to generalize the learning based on training data, so that it will be able to do predictions accurately on un...

Binary Logistic Regression Using Sklearn

5 minute read

In this tutorial we are going to use the Logistic Model from Sklearn library. We are also going to use the same test data used in Logistic Regression From Sc...

Train Test Split

3 minute read

In this tutorial we are going to study about train, test data split. We will use sklearn library to do the data split.

One Hot Encoding

11 minute read

In this tutorial we are going to study about One Hot Encoding. We will also use pandas and sklearn libraries to convert categorical data into numeric data.

Back to top ↑