Time Series Forecasting is not very popular among beginners because of its seemingly complex structure. However, it is one of the key approaches to a fundamental data problem: predicting future outcomes when no future records are available. So, what is a Time Series in simple terms? Let’s find out in this blog.
What is Time Series Forecasting?
To understand Time Series Forecasting, we must first understand what a Time Series is.
A Time Series, as the name suggests, is a series of information collected over time. For instance, the intensity of rain on each day in the month of August can be considered a time series. August has 31 days. If the rain intensity is recorded for each day and the records are arranged in ascending order of date, the series will look something like this:
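In code, such a series is commonly stored as a pandas `Series` indexed by a `DatetimeIndex`, with one value per day. The sketch below shows this layout. It assumes the year 2021, and the rain values are synthetic placeholders drawn from a seeded random generator, not real measurements.

```python
import numpy as np
import pandas as pd

# One entry per day in August (the year 2021 is an assumption for illustration)
dates = pd.date_range(start="2021-08-01", end="2021-08-31", freq="D")

# Synthetic placeholder rain intensities in mm, NOT real observations
rng = np.random.default_rng(seed=0)
rain = pd.Series(
    rng.gamma(shape=2.0, scale=3.0, size=len(dates)).round(1),
    index=dates,
    name="rain_intensity_mm",
)

# date_range is already ascending; sorting matters when records arrive out of order
rain = rain.sort_index()
print(rain.head())
```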
Now we can understand Time Series Forecasting. Using only data like the above, that is, values recorded against dates, we build a Time Series Forecasting model. Suppose we need to predict the rain intensity on the 1st of September. In a typical machine learning problem, we would have features such as the number of pedestrians on the street, traffic intensity, pollution levels and other such details to support our prediction of rain intensity.
However, as you may have already figured out, such data will not be available for future dates. The only thing we can be certain of is the future date itself: the 1st of September. This is what makes it a Time Series Forecasting problem, which means predicting future outcomes from past time-based data. Note, however, that Time Series Forecasting differs considerably from Time Series Analysis.
Time Series Analysis tries to describe the data at hand and explain the patterns behind the values recorded over time. Time Series Forecasting, on the other hand, takes a predictive approach. It still goes through some common stages of Time Series Analysis to understand the data in depth. Therefore, Time Series Forecasting can be said to be both descriptive and predictive in nature.
Elements of Time Series
As we saw in the previous section, the defining factor of a time series problem is its sole dependence on time-based features. Let us go through a few time-based elements that are key to time series problems:
- Level – Level can be considered the average or baseline value of the series.
- Seasonality – This is a very important factor that captures repetitive patterns in time. For instance, a customer may be likely to order more on Fridays, which is a pattern that repeats every 7 days.
- Trend – Trend is the long-term increase or decrease in the values of the target variable over time. It is often roughly linear, meaning a steady, progressive increase or a continuous decrease over time.
- Cycles – Irregular cyclic patterns can appear in the data that are not bound to seasonal boundaries or to particular trends. Unlike seasonality, they do not repeat at a fixed interval.
- Noise – Noise is the random variation in data over time that cannot be explained by trend, seasonality or any other pattern.
In simple terms, the nature of the elements of time series can be represented through the images below:
In real data, the elements of time series would look something like this:
Key Concepts of Forecasting
To jump into a time series problem, having a fundamental understanding of the following concepts is vital. Let us look into some commonly used techniques in time series.
1. Rolling features – Rolling features capture the average or another summary statistic of recent past data. For example, a rolling mean with a window of 3 days calculates the mean of the last three days and places it on the fourth day. This helps capture increasing or decreasing trends in the data.
If you are looking to build a time series forecasting model in Python, the code snippet below can be used to create a rolling feature with a window of 3 days:
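# shift(1) excludes the current day, so the mean of the previous 3 days lands on the day after them
data['rolling_mean'] = data['net_amount'].shift(1).rolling(window=3).mean()
Here, ‘net_amount’ is the column on which the rolling mean is calculated.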
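The placement of the window is easy to get wrong, so here is a small sketch on hypothetical values. By default, pandas' `rolling` includes the current row, which puts the mean of days 1 to 3 on day 3. Shifting by one row first puts that mean on day 4 instead, so each row only uses values that would already be known before that day.

```python
import pandas as pd

# Hypothetical daily net amounts
data = pd.DataFrame(
    {"net_amount": [10.0, 20.0, 30.0, 40.0, 50.0]},
    index=pd.date_range(start="2021-08-01", periods=5, freq="D"),
)

# pandas default: window ends at and includes the current day (mean of days 1-3 on day 3)
data["rolling_mean_incl"] = data["net_amount"].rolling(window=3).mean()

# Shift first: mean of days 1-3 is placed on day 4, using only earlier values
data["rolling_mean_prev3"] = data["net_amount"].shift(1).rolling(window=3).mean()

print(data)
```

Which variant fits depends on whether the current day's value is available at prediction time. When forecasting a day whose value is still unknown, the shifted version avoids leaking that value into its own feature.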
2. Lagging features – Lagging features are used to capture seasonality in the data. A lag feature of 7 takes the value from 7 days earlier and places it on the current date. This is easiest to understand with the Friday orders example from earlier.
Suppose today is Friday. A lag feature of 7 takes the number of orders from last Friday and fills it into today’s record. If the customer does tend to order more on Fridays, the lag feature will easily capture that. This is why lagging features are primarily used to capture seasonal patterns.
In Python, a lagging feature of 7 days can be constructed as follows:
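# shift(7) moves values down by 7 rows, so it assumes exactly one row per day with no missing dates
data['lag'] = data['number_of_orders'].shift(7)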
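Continuing the Friday example, this sketch uses two weeks of hypothetical order counts starting on Friday 2021-08-06. It also shows `shift(freq="7D")`, which shifts by calendar time rather than by row position. That version would still line up correctly if some dates were missing from the data.

```python
import pandas as pd

# Two weeks of hypothetical daily order counts, starting on a Friday
dates = pd.date_range(start="2021-08-06", periods=14, freq="D")
data = pd.DataFrame(
    {"number_of_orders": [30, 12, 10, 11, 9, 13, 14,
                          32, 11, 12, 10, 10, 14, 15]},
    index=dates,
)

# Row-based lag: the value from 7 rows (here, 7 days) earlier
data["lag"] = data["number_of_orders"].shift(7)

# Calendar-based lag: move each value forward by 7 days, then align to the original dates
data["lag_7d"] = data["number_of_orders"].shift(freq="7D").reindex(data.index)

# Second Friday: the lag column holds the previous Friday's order count
print(data.loc[pd.Timestamp("2021-08-13")])
```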
3. ARIMA Model – ARIMA models are among the most popular models for solving time series problems. ARIMA stands for AutoRegressive Integrated Moving Average. As the name suggests, it combines three parts:
- the AutoRegressive (AR) part;
- the Integrated (I) part, which differences the series to make it stationary;
- the Moving Average (MA) part.
The AutoRegressive part linearly combines past values of the target variable. The Moving Average part, on the other hand, linearly combines the errors of past predictions. Combining the two lets the model draw on both the history of the series and its own past forecast errors during training.
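For reference, a common textbook way to write an ARIMA(p, d, q) model is shown below. Here $y'_t$ is the series after differencing $d$ times, $\phi_i$ are the autoregressive coefficients, $\theta_j$ are the moving-average coefficients, $c$ is a constant and $\varepsilon_t$ is a white-noise error. Sign conventions for the $\theta_j$ terms vary between references.

```latex
\begin{aligned}
\Delta y_t &= y_t - y_{t-1}, \qquad y'_t = \Delta^{d} y_t \\
y'_t &= c + \sum_{i=1}^{p} \phi_i \, y'_{t-i} + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j} + \varepsilon_t
\end{aligned}
```

With `order=(3,1,2)` in statsmodels, this corresponds to p = 3, d = 1 and q = 2.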
Here is a code snippet that fits an ARIMA model using statsmodels:
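import pandas as pd
from datetime import datetime
from statsmodels.tsa.arima.model import ARIMA  # the older statsmodels.tsa.arima_model module has been removed

# convert dates such as '1-01' into 1901-01 (assumes the first column of data.csv holds such dates)
def parser(x):
    return datetime.strptime('190' + x, '%Y-%m')

# load the data as a Series indexed by date
series = pd.read_csv('data.csv', header=0, index_col=0).squeeze('columns')
series.index = series.index.map(parser)
# fit model: p=3 autoregressive lags, d=1 differencing to make the series stationary, q=2 moving-average terms
model = ARIMA(series, order=(3,1,2))
model_fit = model.fit()
print(model_fit.summary())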
Calling model_fit.summary() reports the fitted coefficients of the model along with goodness-of-fit statistics.
Use Cases of Time Series
- Banking Sector: The banking sector is very sensitive since it deals with finances, and it is always expected to be prepared and foolproof. In banking, time series forecasting can be used to predict the loan amounts that will be requested in a future month, which helps the bank reorganise its funds in advance. It can also be used to predict total withdrawals and deposits on future dates.
- Manufacturing Sector: The manufacturing sector deals with large batch production. Time series forecasting can help predict the batch quantities for each day in the upcoming months. This can, in turn, help the manufacturer estimate overall profits and invest accordingly.
- Agriculture: Agriculture is a key sector in our country and is known to be burdened with heavy exports and high demand in the domestic market. Technology can do wonders for this sector. Time series forecasting can estimate production during harvests, predict weather details such as rain and sunlight intensity, and attempt to predict demand and supply curves in the market. This can help farmers prepare in advance and sometimes even avoid heavy losses.
- COVID: Time series models have been used extensively to forecast future COVID-19 cases across the globe. These forecasts have helped many experts identify the supplies and arrangements required to fight the global crisis. To learn more about how time series supports COVID initiatives, you can refer to this blog on Interactive Time Series Maps for COVID-19 Spread with GeoPandas and Ipywidgets.
You too can build your own time series solution by going through a list of supporting Data Science tools.
Like all Machine Learning models, Time Series Forecasting also has a set of challenges or concerns.
- Staleness of model: Over time, the trend, seasonality and other features of the data tend to change. This makes the model stale and calls for retraining on data recorded on more recent dates.
- Determination of Forecasting Frequency: Forecasting frequency refers to how often predictions should be made. For instance, in a weather forecasting problem it is clear that forecasts need to be made every day. The actual observations then need to be compared against the forecasts at the end of each day. However, in many use cases such a daily comparison is not feasible, and a suitable frequency needs to be decided upon.
- Data Quality and Data Collection: As the previous point hints, collecting data is often challenging. For instance, population data is hard to find, and its quality cannot always be vouched for. This is owing to factors such as manual mistakes, missing relevant data, restricted information and many others!
- Long prediction range: Continuing the same example, population data may only be collected once every one to two years, so predictions have to cover a long time range. Unless backed by other supporting information, time series predictions can turn out to be largely inaccurate over such long ranges.
Through the course of this blog, we saw that time series forecasting has a very wide range of applications and is, in fact, a highly in-demand skill. It has created huge benefits for market leaders across various sectors and promises to do much more in the future. Learning about time series in depth can not only increase a candidate’s market value but also give a refined perspective on data and predictive problems!
Kick-start your AI-ML learning today with concepts such as time series through Springboard’s six-month online machine learning career track program. In addition to a world-class curriculum, the program comes with 1:1 mentorship from industry experts, career coaching and guidance, as well as a job guarantee. Apply to the project-led online program now!