Data is an essential asset of modern business. It empowers companies by surfacing unique insights about their customers and by informing actionable products. The more relevant data you hold, the better placed you are to meet and exceed your customers’ expectations.
Time-series (TS) filters are often used in digital signal processing for distributed acoustic sensing (DAS). The goal is to remove a subset of frequencies from a digitised TS signal. To filter a signal you must touch all of the data and perform a convolution. This is a slow process when you have a large amount of data. The purpose of this post is to investigate which filters are fastest in Python.
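To make the convolution step concrete, here is a minimal sketch of a low-pass filter built from a moving-average kernel and `np.convolve`. It is an illustration only, not one of the filters benchmarked in this post: the sampling rate, the two test frequencies, and the kernel length are all assumed values chosen so the result is easy to check.

```python
import numpy as np

# Assumed sampling rate and a synthetic one-second test signal.
fs = 1000.0                                # samples per second (assumption)
t = np.arange(0, 1.0, 1.0 / fs)
slow = np.sin(2 * np.pi * 5 * t)           # 5 Hz component we want to keep
fast = 0.5 * np.sin(2 * np.pi * 200 * t)   # 200 Hz component we want to remove
signal = slow + fast

# A 5-sample moving average is a very simple FIR low-pass filter.
# At fs = 1000 Hz its frequency response has nulls at 200 Hz and 400 Hz.
kernel_len = 5
kernel = np.ones(kernel_len) / kernel_len

# Convolution visits every sample of the signal once per kernel tap,
# which is why filtering cost grows with the amount of data.
# mode="same" keeps the output the same length as the input.
filtered = np.convolve(signal, kernel, mode="same")

# Ignore a few samples at each end, where the kernel runs off the signal.
interior = slice(kernel_len, -kernel_len)
max_error = np.max(np.abs(filtered[interior] - slow[interior]))
print(f"max deviation from the 5 Hz component: {max_error:.4f}")
```

Away from the edges, the filtered output follows the 5 Hz component closely because the 200 Hz component falls on a null of this particular kernel. Real DAS filters would normally use a designed kernel rather than a plain moving average.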
Nearly everyone using Python for Data Science has used or is using the Pandas data analysis and preprocessing library. It is as much of a mainstay as Scikit-Learn. Despite this, one continuing bugbear is the different core data types used by each: pandas.DataFrame and numpy.ndarray. Wouldn’t it be great if we didn’t have to worry about converting DataFrames to numpy types and back again? Yes, it would. Step forward Sklearn Pandas. Sklearn Pandas, part of the scikit-learn-contrib project, adds some syntactic sugar for using DataFrames in sklearn pipelines and getting them back out again.
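To illustrate the conversion friction described above, here is a small sketch that uses only pandas and numpy. It does not use Sklearn Pandas itself. The column names and values are made up for the example, and the manual standardisation stands in for any array-based transformer.

```python
import numpy as np
import pandas as pd

# A small, made-up DataFrame with named columns and a labelled index.
df = pd.DataFrame(
    {"height_cm": [150.0, 160.0, 170.0], "weight_kg": [50.0, 60.0, 70.0]},
    index=["a", "b", "c"],
)

# Array-based code works on the raw values, so the labels are dropped here.
values = df.to_numpy()
scaled = (values - values.mean(axis=0)) / values.std(axis=0)

# `scaled` is a plain ndarray: it carries no column names and no index.
print(type(scaled).__name__)

# To get back to a DataFrame, the labels have to be reattached by hand.
restored = pd.DataFrame(scaled, columns=df.columns, index=df.index)
print(restored)
```

That manual step of reattaching the labels is the bookkeeping a DataFrame-aware wrapper aims to take off your hands.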