This workshop is from WinderResearch.com.
statsmodels is a comprehensive Python library for statistical modelling, including time series analysis. It has a really neat set of functions for decomposing and detrending data, so if your features show any time-dependent trends, give it a try.
The seasonal decomposition used below essentially fits the multiplicative model:
$y(t) = \text{Level} \times \text{Trend} \times \text{Seasonality} \times \text{Noise}$
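To make the multiplicative model concrete, here is a minimal sketch of a classical decomposition done by hand with pandas. The data is synthetic and every name in it (`trend_est`, `seasonal_est`, the 12-month window and so on) is an assumption for illustration; it is not statsmodels' own implementation, only the general idea behind it.

```python
import numpy as np
import pandas as pd

# Synthetic monthly data (hypothetical, not the airline dataset):
# eight years built as level * trend * seasonality * noise.
rng = np.random.default_rng(0)
index = pd.date_range("2000-01-01", periods=96, freq="MS")
t = np.arange(96)
level = 100.0
trend = 1.0 + 0.01 * t                                # slow upward growth
seasonality = 1.0 + 0.2 * np.sin(2 * np.pi * t / 12)  # 12-month cycle
noise = rng.normal(1.0, 0.01, size=96)                # centred on 1, not 0
y = pd.Series(level * trend * seasonality * noise, index=index)

# Step 1: estimate level * trend with a moving average over one full year.
# An even window needs a 2x12 average, shifted back, to stay centred.
trend_est = y.rolling(12).mean().rolling(2).mean().shift(-6)

# Step 2: dividing out the trend leaves seasonality * noise.
detrended = y / trend_est

# Step 3: average the ratios per calendar month to get seasonal factors,
# then normalise them so they average to 1.
seasonal_factors = detrended.groupby(detrended.index.month).mean()
seasonal_factors /= seasonal_factors.mean()
seasonal_est = pd.Series(
    seasonal_factors.loc[y.index.month].to_numpy(), index=y.index
)

# Step 4: whatever is left over is the noise term.
noise_est = y / (trend_est * seasonal_est)
```

Because the components multiply, each one is removed by division rather than subtraction. The moving average cannot be centred on the first and last six months, so `trend_est` and `noise_est` are `NaN` there.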
Below we have some data from the 1950s showing the monthly number of international airline passengers, in thousands. Plotting it shows clear seasonal variation.
    import pandas as pd
    import matplotlib.pyplot as plt

    url = 'https://s3.eu-west-2.amazonaws.com/assets.winderresearch.com/data/international-airline-passengers.csv'
    # read_csv replaces the deprecated Series.from_csv; the first column becomes a date index.
    series = pd.read_csv(url, header=0, index_col=0, parse_dates=True).squeeze('columns')
    series.plot()
    plt.show()
    from statsmodels.tsa.seasonal import seasonal_decompose

    # Split the series into trend, seasonal and residual components.
    result = seasonal_decompose(series, model='multiplicative')
    result.plot()
    plt.show()
Note how well the decomposition de-seasonalises the data. Once the seasonal variation is removed, the trend is quite consistent.