The Wayback Machine - https://web.archive.org/web/20220709005818/https://github.com/awslabs/autogluon/issues/1459
Feature Tools integration #1459

Open
willsmithorg opened this issue Dec 26, 2021 · 1 comment
Labels
enhancement good first issue help wanted module: tabular

Comments

Contributor

willsmithorg commented Dec 26, 2021

Could FeatureTools be implemented as an automated preprocessor for AutoGluon, adding the ability to handle multi-entity problems (i.e., data split across multiple normalised database tables)? If you supplied AutoGluon with a list of DataFrames instead of a single DataFrame, it would first invoke FeatureTools.
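To make the multi-entity case concrete: with a parent `customers` table and a child `orders` table, a preprocessor has to collapse each customer's orders into fixed-width columns before a single-table learner can use them. The plain-Python sketch below does that by hand for three aggregations. It only illustrates the kind of work that Featuretools' Deep Feature Synthesis automates and is not Featuretools code; the table names, column names, and the `flatten_child` helper are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical normalised data: one parent table and one child table.
customers = [
    {"customer_id": 1, "region": "EU"},
    {"customer_id": 2, "region": "US"},
    {"customer_id": 3, "region": "EU"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 20.0},
    {"order_id": 11, "customer_id": 1, "amount": 30.0},
    {"order_id": 12, "customer_id": 2, "amount": 5.0},
]


def flatten_child(parents, children, key, value):
    """Return one row per parent with COUNT/SUM/MEAN of the children's `value`."""
    # Group the child values by the foreign key they share with the parent.
    grouped = defaultdict(list)
    for child in children:
        grouped[child[key]].append(child[value])

    rows = []
    for parent in parents:
        values = grouped.get(parent[key], [])
        rows.append({
            **parent,
            f"COUNT({value})": len(values),
            f"SUM({value})": sum(values),
            # A parent with no children has no mean, so leave it missing.
            f"MEAN({value})": mean(values) if values else None,
        })
    return rows


feature_rows = flatten_child(customers, orders, key="customer_id", value="amount")
for row in feature_rows:
    print(row)
```

The result is one row per customer, which is the single-table shape AutoGluon already accepts.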

The FeatureTools license is BSD-3-Clause, not Apache 2.0. I don't suppose this is a problem. Alteryx is a $5bn company, so they should be around for a while. The only amber flag I see is that their list of feature primitives (https://primitives.featurelabs.com/) has both open-source and "premium" primitives. The premium ones aren't that impressive. Companies that segment a single product into open and proprietary parts can cause trouble down the line: they have a conflict of interest when external contributors want to improve the open part and the improvement overlaps with the proprietary part.

Their "woodwork" typing system (https://woodwork.alteryx.com/en/stable/index.html), which adds or infers richer logical and semantic meaning for a set of features, is also interesting. It might become a standard if FeatureTools does well.

Would be grateful for your thoughts. If implemented well, this would allow AutoGluon to do SOTA automated feature engineering as well as model selection, stacking, and ensembling.

Innixma added the enhancement, help wanted, and module: tabular labels Dec 29, 2021
Innixma added this to the Feature Backlog milestone Dec 29, 2021
Innixma added the good first issue label Dec 29, 2021
Collaborator

Innixma commented Dec 30, 2021

Having FeatureTools as an optional dependency in AutoGluon is certainly a possibility. It could be used to enhance feature preprocessing and to expand the set of inputs AutoGluon can handle, such as multiple DataFrames. As the tool is quite extensive, I don't have time at present to dive into it personally, so I'm opening this issue up to the community to create a POC that adds value for AutoGluon's users.

Things that need to be addressed in the POC:

  • API
  • New functionality that wasn't previously possible (multi-table input?)
  • Enhanced functionality that integrates with existing APIs (such as new FeatureGenerators that use FeatureTools). These additions will need to be tested against real datasets to ensure they are helpful.
  • Resolving redundant functionality that exists in both AutoGluon and FeatureTools (feature primitives / type inference / feature processing pipelines, etc.)
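As a possible starting point for the API item, multi-table input could be described by a small, explicit specification that is validated before any feature synthesis runs. The sketch below is one hypothetical shape for it: `Relationship`, `MultiTableSpec`, and `synthesize_features` are illustrative names, not existing AutoGluon APIs. The Featuretools calls (`EntitySet`, `add_dataframe`, `add_relationship`, `dfs`) assume the Featuretools 1.x interface and are imported lazily so that Featuretools stays an optional dependency.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relationship:
    """One-to-many link parent.parent_key -> child.child_key (hypothetical type)."""
    parent: str
    parent_key: str
    child: str
    child_key: str


@dataclass
class MultiTableSpec:
    """Describes a multi-table problem (hypothetical type, not an AutoGluon API)."""
    target: str   # table holding one row per prediction
    index: dict   # table name -> primary-key column
    relationships: list = field(default_factory=list)

    def validate(self, columns):
        """Return a list of problems; `columns` maps table name -> column names."""
        problems = []
        if self.target not in columns:
            problems.append(f"unknown target table {self.target!r}")
        for table in columns:
            if table not in self.index:
                problems.append(f"no index declared for table {table!r}")
        for table, key in self.index.items():
            if key not in columns.get(table, ()):
                problems.append(f"index {table}.{key} not found")
        for rel in self.relationships:
            for table, key in ((rel.parent, rel.parent_key), (rel.child, rel.child_key)):
                if key not in columns.get(table, ()):
                    problems.append(f"relationship column {table}.{key} not found")
        return problems


def synthesize_features(spec, frames, max_depth=2):
    """Flatten `frames` (table name -> pandas DataFrame) into one feature matrix."""
    import featuretools as ft  # optional dependency, imported only when needed

    problems = spec.validate({name: list(df.columns) for name, df in frames.items()})
    if problems:
        raise ValueError("; ".join(problems))

    es = ft.EntitySet(id="autogluon_input")
    for name, df in frames.items():
        es.add_dataframe(dataframe_name=name, dataframe=df, index=spec.index[name])
    for rel in spec.relationships:
        es.add_relationship(
            parent_dataframe_name=rel.parent,
            parent_column_name=rel.parent_key,
            child_dataframe_name=rel.child,
            child_column_name=rel.child_key,
        )
    feature_matrix, _feature_defs = ft.dfs(
        entityset=es, target_dataframe_name=spec.target, max_depth=max_depth
    )
    return feature_matrix


# Validation needs only column names, so it runs without pandas or Featuretools.
spec = MultiTableSpec(
    target="customers",
    index={"customers": "customer_id", "orders": "order_id"},
    relationships=[Relationship("customers", "customer_id", "orders", "customer_id")],
)
schema = {"customers": ["customer_id", "region"], "orders": ["order_id", "amount"]}
print(spec.validate(schema))  # reports the missing foreign key orders.customer_id
```

Under these assumptions, the matrix returned by `synthesize_features` has one row per entry of the target table and could be passed to `TabularPredictor(label=...).fit(...)`, provided the label column lives in the target table. Whether the generated features actually help would still need to be checked on real datasets.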

If interested in contributing this functionality, please respond to this issue.

2 participants