For data scientists, ML engineers, and AI researchers who want to simplify feature engineering, manage complex dependencies, and boost productivity.
Feature Fabrica is an open-source Python library designed to improve engineering practices and transparency in feature engineering. It allows users to define features declaratively using YAML, manage dependencies between features, and apply complex transformations in a scalable and convenient manner.
By giving feature engineering a structured, declarative form, Feature Fabrica aims to save time, reduce errors, and make machine learning workflows more transparent and reproducible, for small projects and large-scale pipelines alike.
- 📝 Declarative Feature Definitions: Define features, data types, and dependencies using a simple YAML configuration.
- 🔄 Transformations: Apply custom transformations to raw features to derive new features.
- 🔗 Dependency Management: Automatically handle dependencies between features.
- ✔️ Pydantic Validation: Ensure data types and values conform to expected formats.
- 🛡️ Fail-Fast with Beartype: Catch type-related errors instantly during development, ensuring your transformations are robust.
- 🚀 Scalability: Designed to scale from small projects to large machine learning pipelines.
- 🔧 Hydra Integration: Leverage Hydra for configuration management, enabling flexible and dynamic configuration of transformations.
To install Feature Fabrica, simply run:
pip install feature-fabrica
Features are defined in a YAML file; more examples are in the examples/ folder. Here’s an example:
feature_a:
  description: "Raw feature A"
  data_type: "int32"
  group: "training"

feature_b:
  description: "Raw feature B"
  data_type: "float32"
  group: "training"
  transformation:
    scale_feature:
      _target_: ().scale(factor=2)

feature_c:
  description: "Derived feature C"
  data_type: "float32"
  group: "training_experiment"
  dependencies: ["feature_a", "feature_b"]
  transformation:
    solve:
      _target_: (feature_a + feature_b) / 2

feature_e:
  description: "Raw feature E"
  data_type: "int32"
  group: "draft"
  transformation:
    _target_: ().upper().lower().one_hot(categories=['apple', 'orange'])
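The `dependencies` key above implies that derived features must be evaluated after the features they depend on. As a rough illustration of that idea (not Feature Fabrica's actual implementation), the sketch below orders features with the standard library's `graphlib.TopologicalSorter`. The `dependencies` dictionary and the `evaluation_order` helper are hypothetical names, hand-copied from the YAML rather than parsed from it.

```python
from graphlib import TopologicalSorter

# Hypothetical, simplified view of the YAML above:
# feature name -> names of the features it depends on.
dependencies = {
    "feature_a": [],
    "feature_b": [],
    "feature_c": ["feature_a", "feature_b"],
    "feature_e": [],
}


def evaluation_order(deps: dict[str, list[str]]) -> list[str]:
    """Return feature names so that each feature follows its dependencies."""
    # TopologicalSorter accepts {node: predecessors} and raises
    # graphlib.CycleError if the dependencies form a cycle.
    return list(TopologicalSorter(deps).static_order())


order = evaluation_order(dependencies)
print(order)
```

Whatever order is printed, `feature_c` appears after both `feature_a` and `feature_b`; the relative order of independent features is not significant.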
You can define custom transformations by subclassing the Transformation class:
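from feature_fabrica.transform import Transformation


class MyCustomTransform(Transformation):
    # Name used to reference this transformation from the YAML config
    _name_ = "my_custom_transform"

    def execute(self, data):
        # Double every value of the incoming feature
        return data * 2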
The transformation can then be referenced in the YAML configuration by its _name_:
feature_a:
  description: "Raw feature A"
  data_type: "int32"
  group: "training"
  transformation:
    _target_: ().my_custom_transform()
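To make the link between `_name_` and the `().my_custom_transform()` expression more concrete, here is one plausible way a name-based lookup could work, using `__init_subclass__` to register subclasses. This is only a sketch under assumptions: the `Transformation` class below is a hypothetical stand-in defined locally, and the `registry` attribute is invented for illustration; Feature Fabrica's own resolution mechanism may differ.

```python
class Transformation:
    """Hypothetical stand-in for the library's base class (not the real one)."""

    _name_ = None
    registry = {}  # maps _name_ -> subclass (illustrative attribute)

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Register every subclass that declares a _name_.
        if cls._name_ is not None:
            Transformation.registry[cls._name_] = cls

    def execute(self, data):
        raise NotImplementedError


class MyCustomTransform(Transformation):
    _name_ = "my_custom_transform"

    def execute(self, data):
        return data * 2


# Look the class up by the name used in the YAML expression and run it.
transform = Transformation.registry["my_custom_transform"]()
doubled = transform.execute(21)
print(doubled)
```

Registering by name keeps the YAML free of import paths, at the cost of requiring unique `_name_` values across all transformations.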
To compile and execute features:
import numpy as np
from feature_fabrica.core import FeatureManager

# Raw input values, keyed by feature name
data = {
    "feature_a": np.array([10.0], dtype=np.float32),
    "feature_b": np.array([20.0], dtype=np.float32),
}

# Load the feature definitions from the config directory
feature_manager = FeatureManager(
    config_path="../examples", config_name="basic_features"
)
results = feature_manager.compute_features(data)

# Results can be read by key or by attribute
print(results["feature_c"])  # 0.5 * (10 + 20) = 15.0
print(results.feature_c)  # 0.5 * (10 + 20) = 15.0
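The two `print` calls above read the same result by key and by attribute. For readers curious how a container can support both styles, the following is a minimal sketch; `AttrDict` is a hypothetical class written for this example and is not the type that `compute_features` returns. Plain floats stand in for the NumPy arrays.

```python
class AttrDict(dict):
    """Dict whose keys can also be read as attributes (illustrative only)."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


# Plain floats stand in for the NumPy arrays used above.
results = AttrDict(feature_a=10.0, feature_b=20.0)
results["feature_c"] = (results["feature_a"] + results["feature_b"]) / 2

print(results["feature_c"])
print(results.feature_c)
```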
Track & trace Transformation Chains
import numpy as np
from feature_fabrica.core import FeatureManager

# Raw input values, keyed by feature name
data = {
    "feature_a": np.array([10.0], dtype=np.float32),
    "feature_b": np.array([20.0], dtype=np.float32),
}

feature_manager = FeatureManager(
    config_path="../examples", config_name="basic_features"
)
results = feature_manager.compute_features(data)

# Inspect each transformation applied to feature_c, with its value and timing
print(feature_manager.features.feature_c.get_transformation_chain())
# Transformation Chain: (Transformation: sum_fn, Value: 30.0 Time taken: 9.5367431640625e-07 seconds) -> (Transformation: scale_feature, Value: 15.0, Time taken: 9.5367431640625e-07 seconds)
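The printed chain records, for each step, the transformation name, the resulting value, and the time taken. The sketch below shows one way such a record could be collected in plain Python. It is an illustration only: `run_chain` is a hypothetical helper, the step names mirror the printed chain above, and the lambda bodies are assumptions about what those steps compute.

```python
import time


def run_chain(value, steps):
    """Apply (name, function) steps in order, recording value and duration."""
    chain = []
    for name, fn in steps:
        start = time.perf_counter()
        value = fn(value)
        elapsed = time.perf_counter() - start
        chain.append({"transformation": name, "value": value, "seconds": elapsed})
    return value, chain


# Step names mirror the printed chain above; the functions are illustrative.
steps = [
    ("sum_fn", lambda pair: pair[0] + pair[1]),
    ("scale_feature", lambda total: total * 0.5),
]
final, chain = run_chain((10.0, 20.0), steps)

for step in chain:
    print(step["transformation"], step["value"])
```

Keeping the intermediate values alongside the names makes it possible to see which step introduced an unexpected result.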
Visualize Dependencies
from feature_fabrica.core import FeatureManager

feature_manager = FeatureManager(
    config_path="../examples", config_name="basic_features"
)
# Render the dependency graph of the configured features
feature_manager.get_visual_dependency_graph()
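To give a sense of what a dependency graph for these features contains, the sketch below renders a hand-written dependency mapping as Graphviz DOT text. It does not reproduce the output of `get_visual_dependency_graph()`; `to_dot` is a hypothetical helper, and the mapping is copied by hand from the YAML example.

```python
def to_dot(deps: dict[str, list[str]]) -> str:
    """Render {feature: [dependencies]} as Graphviz DOT text."""
    lines = ["digraph features {"]
    for feature, parents in deps.items():
        lines.append(f'    "{feature}";')
        for parent in parents:
            # Edge points from a dependency to the feature derived from it.
            lines.append(f'    "{parent}" -> "{feature}";')
    lines.append("}")
    return "\n".join(lines)


dot = to_dot(
    {
        "feature_a": [],
        "feature_b": [],
        "feature_c": ["feature_a", "feature_b"],
    }
)
print(dot)
```

The resulting text can be pasted into any Graphviz-compatible viewer to draw the graph.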
First, thank you for taking the time to contribute! 🎉 Contributions are essential to making Feature Fabrica a better library, and we truly appreciate your involvement.
The following is a set of guidelines for contributing to Feature Fabrica, including reporting bugs, adding new features, and improving documentation.
- NLP support
- Embeddings support
- Simplify UI
- Better visualizations/reports
- Fork the repository to your own GitHub account by clicking the "Fork" button at the top of the page.
- Clone your fork locally:
      git clone https://github.com/your-username/feature-fabrica.git
      cd feature-fabrica
- Set the original repository as a remote:
      git remote add upstream https://github.com/cowana-ai/feature-fabrica.git
- Before creating a new branch, ensure your main branch is up to date:
      git checkout main
      git pull upstream main
- Create a new branch for your feature or bug fix:
      git checkout -b feature/my-new-feature
- Make your changes in this new branch.
If you discover a bug in Feature Fabrica, please open an issue on GitHub. Before submitting your report, please check if an issue already exists to avoid duplicates. Include the following details in your report:
- A clear and concise description of the bug.
- Steps to reproduce the issue.
- Expected behavior vs. actual behavior.
- If applicable, screenshots or code snippets.
We welcome suggestions to improve Feature Fabrica. Feel free to open an issue describing the enhancement. Please be as detailed as possible in describing:
- The feature you'd like to see.
- The reason it would be beneficial.
- Any potential drawbacks.