functime-org / functime

Time-series machine learning at scale. Built with Polars for embarrassingly parallel feature extraction and forecasts on panel data.

Home Page: https://docs.functime.ai

Enhanced Feature Selection for (Economic) Data in Models

jhug12 opened this issue

Background and Problem Statement
Incorporating economic data into models is often challenging, particularly in deciding which features to include. This decision-making process can benefit from a more standardized and automated approach. Standardizing feature selection can improve model consistency, reliability, and ease of use in various applications.

Proposed Solution
The addition of a feature_selection/filters sub-package is proposed to introduce automated feature selection methods tailored specifically for economic data. This package would initially include one or two specific filters, providing a foundation for further development.

Suggested Workflow Integration:
The integration of this feature would ideally follow this workflow:

  1. Preprocessing: Initial data cleaning and preparation.
  2. Feature Extraction: Deriving new features from the processed data.
  3. Feature Selection/Filtering: Using the new feature_selection/filters sub-package to select the most relevant features.
  4. Forecasting: Building the model to forecast or predict outcomes using the selected features.

Key Features
P-Value Filter: This filter would exclude features whose p-value from a simple regression with the target variable exceeds a specified threshold, thereby focusing on statistically significant features.
Additional Filter (optional): Another potential filter, for example one based on correlation, variance inflation factor (VIF), or information gain.
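A minimal sketch of what the p-value filter could look like, assuming SciPy is available and features are passed as a plain dict of columns. The function name `p_value_filter` and its signature are hypothetical, not an existing functime API; `scipy.stats.linregress` does the simple regression and reports a two-sided p-value for the slope.

```python
from scipy.stats import linregress


def p_value_filter(features, target, threshold=0.05):
    """Return the names of features whose simple-regression p-value <= threshold.

    `features` maps a feature name to a sequence of values with the same
    length as `target`. (Hypothetical helper, for illustration only.)
    """
    kept = []
    for name, values in features.items():
        # Fits target = a + b * values and tests H0: b == 0 (two-sided).
        result = linregress(values, target)
        if result.pvalue <= threshold:
            kept.append(name)
    return kept


# Toy data: the target follows `trend` closely, and `unrelated` is
# constructed to be uncorrelated with it.
x_trend = [float(i) for i in range(20)]
noise = [0.5 if i % 2 == 0 else -0.5 for i in range(20)]
target = [2.0 * x + e for x, e in zip(x_trend, noise)]
x_unrelated = [1.0, -1.0, -1.0, 1.0] * 5

selected = p_value_filter({"trend": x_trend, "unrelated": x_unrelated}, target)
print(selected)
```

Each feature is tested on its own here, so the filter ignores interactions between features and makes no correction for multiple testing. Both would need to be decided on for a real implementation.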

Expected Benefits
A systematic approach to feature selection within functime would build on the developers' expertise and would therefore be consistent and performant, without breaking any of the existing design decisions. As an outsider, it is not clear to me how to approach this.

Ciao Jhug, sorry it took so long to respond. This is definitely a great feature to add, but also a broad one that should be discussed alongside the project's roadmap. But we can get started with something.

There are a bunch of methods that can be used, from stepwise feature selection to other p-value-based methods like the one you describe. I would love to integrate them (I also come from an econometric background).

The other "blocker" slowing down this feature is that Polars does not currently offer many "econometric" features, such as computing the p-values of coefficients after a regression. On the functime side, I think we would need to sit down and agree on a Polars plugin that does OLS (such as @abstractqqq's) and see how to integrate it.
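For reference, this is roughly the computation such a plugin would need to expose. It is sketched here with NumPy and SciPy rather than Polars, under classical homoskedastic OLS assumptions; the helper name `ols_p_values` is hypothetical and does not come from any existing plugin.

```python
import numpy as np
from scipy import stats


def ols_p_values(X, y):
    """Two-sided p-values for OLS coefficients, intercept first.

    X has shape (n_samples, n_features); an intercept column is added here.
    """
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)        # unbiased residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)   # classical coefficient covariance
    t_stats = beta / np.sqrt(np.diag(cov))
    return 2 * stats.t.sf(np.abs(t_stats), df=n - k)


# Toy data: y depends on the first column only.
x1 = np.arange(20, dtype=float)
x2 = np.tile([1.0, -1.0, -1.0, 1.0], 5)
noise = np.where(np.arange(20) % 2 == 0, 0.5, -0.5)
y = 2.0 * x1 + noise

p_values = ols_p_values(np.column_stack([x1, x2]), y)
print(p_values)  # order: intercept, x1, x2
```

A p-value filter would then keep the columns whose entry falls at or below the chosen threshold. Whether this should run inside a Polars expression or as a separate step is the integration question raised above.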