[RFC] Feature Extractor #6

Open
mkaliberda opened this issue Jul 3, 2024 · 0 comments

Please identify RFC type

Feature

Description

The objective is to build feature extraction functionality that improves feature development while creating a pipeline.
We propose adding a feature extractor to the WebUI Pipeline Builder.

Motivation

Building a feature extractor for the WebUI unlocks powerful benefits for model development:

  • Simplified experimentation: Users can easily extract and analyze specific features from data directly within the WebUI, without running a whole pipeline.

  • Improved model training: Feature extraction allows users to focus models on relevant data aspects, potentially leading to faster training times and better model performance.

  • Enhanced user experience: Integrating feature extraction into the WebUI gives users a more flexible development process and a smoother workflow.

Design Proposal

UI

We propose to build the feature extractor on top of the pipeline builder, and to give the pipeline builder three modes: Feature Extractor, Train Model, and AutoML.
The left navigation menu would change to group these screens.

They all have the same home screen with a pipeline table and templates.

Existing pipelines can be opened in any of these modes, which gives users the flexibility to build models on a properly selected set of transforms.

Feature Extractor detailed screen:

To run, it dispatches the submitOptimizationRequest action, which sends a POST request to project/${projectUuid}/sandbox-async/${pipelineUuid}/ with the parameter
execution_type = pipeline
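
As a rough illustration of the run request described above, the following sketch builds the endpoint path and the POST payload as plain data. Only the path template and the execution_type value come from this RFC. The helper names, the request shape, and the choice to send execution_type in the request body are assumptions made for the example.

```typescript
// Hypothetical types and helpers; only the path and execution_type value are from the RFC.
type ExecutionType = "pipeline" | "automl";

interface SandboxAsyncRequest {
  method: "POST";
  url: string;
  // Assumption: execution_type travels in the JSON body.
  body: { execution_type: ExecutionType };
}

// Builds the relative endpoint path quoted in the RFC.
function sandboxAsyncPath(projectUuid: string, pipelineUuid: string): string {
  const project = encodeURIComponent(projectUuid);
  const pipeline = encodeURIComponent(pipelineUuid);
  return `project/${project}/sandbox-async/${pipeline}/`;
}

// Describes the POST request a Feature Extractor run would send.
function buildFeatureExtractorRun(
  projectUuid: string,
  pipelineUuid: string
): SandboxAsyncRequest {
  return {
    method: "POST",
    url: sandboxAsyncPath(projectUuid, pipelineUuid),
    body: { execution_type: "pipeline" },
  };
}
```

Keeping the request description separate from the network call makes it easy to unit test without a running back-end.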

To get data for the Feature Visualisation and Feature Graph, it dispatches an action that sends a GET request to project/${projectUuid}/sandbox-async/${pipelineUuid}/

Train Model and AutoML use the pipeline builder with different values of the custom_training parameter and execution_type = automl.
Both dispatch the submitOptimizationRequest action, which sends a POST request to project/${projectUuid}/sandbox-async/${pipelineUuid}/
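
One possible way to express the three modes is a single mapping from mode to request parameters. The RFC states only that Train Model and AutoML differ in custom_training and share execution_type = automl. The mode identifiers below, the assumption that custom_training is a boolean, and which mode gets which value are all hypothetical.

```typescript
// Hypothetical mode identifiers for the three pipeline builder modes.
type BuilderMode = "feature-extractor" | "train-model" | "automl";

interface RunParams {
  execution_type: "pipeline" | "automl";
  // Assumption: custom_training is a boolean flag. The RFC only says it differs per mode.
  custom_training?: boolean;
}

// Maps a builder mode to the parameters sent with submitOptimizationRequest.
function runParamsForMode(mode: BuilderMode): RunParams {
  switch (mode) {
    case "feature-extractor":
      return { execution_type: "pipeline" };
    case "train-model":
      // Assumed value, chosen only for illustration.
      return { execution_type: "automl", custom_training: true };
    case "automl":
      // Assumed value, chosen only for illustration.
      return { execution_type: "automl", custom_training: false };
  }
}
```

A single mapping like this keeps the three screens on one code path, so they differ only in the parameters they send.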

Back-end
The endpoint project/${projectUuid}/sandbox-async/${pipelineUuid}/ should also return feature_summary.
To support this, we need to calculate feature_summary from feature_table, similarly to the model's feature_summary.
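
The RFC does not define the shape of feature_table or feature_summary, so the sketch below is only an illustration of the idea. It assumes feature_table maps each feature name to a numeric column and that the summary holds basic per-feature statistics. The real calculation would follow whatever the model's feature_summary already contains and would live in the back-end, so TypeScript is used here purely for readability.

```typescript
// Assumption: a feature table maps each feature name to a column of numbers.
type FeatureTable = Record<string, number[]>;

// Assumption: a summary holds basic statistics per feature.
interface FeatureStats {
  count: number;
  min: number;
  max: number;
  mean: number;
}

// Computes per-feature statistics; empty columns are skipped.
function summarizeFeatures(table: FeatureTable): Record<string, FeatureStats> {
  const summary: Record<string, FeatureStats> = {};
  for (const [name, values] of Object.entries(table)) {
    if (values.length === 0) continue;
    let min = Infinity;
    let max = -Infinity;
    let sum = 0;
    for (const v of values) {
      if (v < min) min = v;
      if (v > max) max = v;
      sum += v;
    }
    summary[name] = { count: values.length, min, max, mean: sum / values.length };
  }
  return summary;
}
```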

The endpoint project/${projectUuid}/sandbox-async/${pipelineUuid}/ should also return a result_type field.
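
On the WebUI side, the extended response could be validated before the Feature Visualisation and Feature Graph read from it. The sketch below assumes only the two field names mentioned in this section, result_type and feature_summary. The interface name, the guard functions, and the assumption that result_type is a string are hypothetical, and the contents of feature_summary are left untyped because the RFC does not specify them.

```typescript
// Hypothetical view of the extended GET response; other fields are omitted.
interface SandboxAsyncResult {
  // Assumption: result_type is a string. The RFC does not list its values.
  result_type: string;
  // Left as unknown because the RFC does not define its shape.
  feature_summary?: unknown;
}

// Narrows an untyped payload to SandboxAsyncResult.
function isSandboxAsyncResult(value: unknown): value is SandboxAsyncResult {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { result_type?: unknown }).result_type === "string"
  );
}

// Reports whether the response carries a feature summary to render.
function hasFeatureSummary(result: SandboxAsyncResult): boolean {
  return result.feature_summary !== undefined && result.feature_summary !== null;
}
```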

Performance Implications

The GET request to project/${projectUuid}/sandbox-async/${pipelineUuid}/ performs an extra feature_summary calculation for pipelines with execution_type = pipeline.

Dependencies

No third-party dependencies will be added.

User Impact

Benefit: This feature simplifies experimentation, improves model training, and enhances user experience by enabling feature extraction directly within the UI.

  • Faster workflows: Extract and analyze features without running entire pipelines.

  • Optimized models: Focus models on relevant data aspects for potentially faster training and better performance.

  • Smoother development: Integrates seamlessly into the UI, fostering a streamlined workflow.

@mkaliberda mkaliberda added the RFC label Jul 3, 2024
@mkaliberda mkaliberda moved this to Backlog in PiccoloAI Jul 3, 2024
@mkaliberda mkaliberda moved this from Backlog to Ready in PiccoloAI Jul 3, 2024
@mkaliberda mkaliberda moved this from Ready to In progress in PiccoloAI Jul 3, 2024