Federatedml implements many common machine learning algorithms along with the utility tools they need. All modules are developed in a decoupled, modular fashion to enhance scalability. Specifically, we provide:
- FML Algorithms: Federated machine learning algorithms for DataIO, data preprocessing, feature engineering, and modeling. More details are listed below.
- Utilities: Tools that enable federated learning, such as encryption tools, statistics modules, parameter definitions, and a transfer variable autogenerator.
- Framework: Kits and base models for developing new algorithm modules. The framework provides reusable functions to standardize modules and keep them compact.
- Secure Protocol: Multiple security protocols for more secure multi-party interactive computation.
1. DataIO
This component is typically the first component of a modeling task. It transforms user-uploaded data into Instance objects, which the following components can use.
- Corresponding module name: DataIO
- Data Input: DTable, values are raw data.
- Data Output: Transformed DTable, values are data instances defined in federatedml/feature/instance.py
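The transformation described above can be pictured with a small sketch. Everything in it is an assumption made for illustration: the `Instance` dataclass is a hypothetical stand-in (not the class in federatedml/feature/instance.py), the raw values are assumed to be comma-delimited strings with the label in the first field, and a plain dict stands in for a DTable.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instance:
    """Hypothetical stand-in for a data instance; not the real class."""
    features: List[float]
    label: Optional[int] = None


def to_instance(raw_value: str, with_label: bool = True, delimiter: str = ",") -> Instance:
    """Parse one raw string value into an Instance (assumed format: label first)."""
    fields = raw_value.split(delimiter)
    if with_label:
        return Instance(features=[float(x) for x in fields[1:]], label=int(fields[0]))
    return Instance(features=[float(x) for x in fields])


# A dict of key -> raw string stands in for the input table.
raw_table = {"id1": "1,0.5,2.0", "id2": "0,1.5,3.0"}
instances = {key: to_instance(value) for key, value in raw_table.items()}
```

The keys are left untouched; only the values change from raw strings to structured instances, which is the shape the later components expect.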
2. Intersect
Compute the intersection of two parties' data sets without leaking information about the records outside the intersection. Mainly used in hetero scenario tasks.
- Corresponding module name: Intersection
- Data Input: DTable
- Data Output: DTable whose keys occur in both parties.
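To make the idea concrete, here is a deliberately naive sketch in which both parties compare salted hashes of their IDs instead of the IDs themselves. This is an assumption chosen for brevity, not necessarily the protocol the Intersection module implements, and a shared salted hash offers much weaker protection than a real private set intersection protocol. The function names and the salt are hypothetical.

```python
import hashlib


def hash_ids(ids, salt):
    """Map each salted SHA-256 digest back to the local ID it came from."""
    return {hashlib.sha256((salt + str(i)).encode()).hexdigest(): i for i in ids}


def intersect(guest_ids, host_ids, salt="shared-secret"):
    """Return the IDs present on both sides, comparing digests only."""
    guest_hashed = hash_ids(guest_ids, salt)
    host_hashed = hash_ids(host_ids, salt)
    common = guest_hashed.keys() & host_hashed.keys()
    return sorted(guest_hashed[h] for h in common)


guest = ["u1", "u2", "u3", "u5"]
host = ["u2", "u3", "u4"]
common_ids = intersect(guest, host)
```

In a real deployment the two hashed sets would live on different machines and only digests would be exchanged; here both are computed locally to keep the example runnable.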
3. Federated Sampling
Sample data so that its distribution becomes balanced in each party. This module supports both federated and standalone versions.
- Corresponding module name: FederatedSample
- Data Input: DTable
- Data Output: The sampled data. Both random and stratified sampling are supported.
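The difference between the two sampling modes can be sketched locally as follows. This is a standalone illustration using only the standard library; the function names, the `(key, label)` row layout, and the per-label fractions are assumptions, not the FederatedSample interface.

```python
import random
from collections import defaultdict


def random_sample(rows, fraction, seed=0):
    """Draw a uniform sample of the given fraction, ignoring labels."""
    rng = random.Random(seed)
    return rng.sample(rows, round(len(rows) * fraction))


def stratified_sample(rows, fractions, seed=0):
    """Sample each label group separately; rows are (key, label) pairs."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for key, label in rows:
        by_label[label].append((key, label))
    sampled = []
    for label, group in by_label.items():
        sampled.extend(rng.sample(group, round(len(group) * fractions[label])))
    return sampled


# 20 positive and 80 negative rows: an imbalanced data set.
rows = [(f"id{i}", 1 if i < 20 else 0) for i in range(100)]
half = random_sample(rows, 0.5)
# Keep every positive and a quarter of the negatives to balance the classes.
balanced = stratified_sample(rows, {1: 1.0, 0: 0.25})
```

Random sampling preserves the original class ratio on average, while stratified sampling lets each label be down-sampled at its own rate.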
4. Feature Scale
Module for feature scaling and standardization.
- Corresponding module name: FeatureScale
- Data Input: DTable, whose values are instances.
- Data Output: Transformed DTable.
- Model Output: Transform factors such as min/max and mean/std.
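The relationship between the transformed data and the saved transform factors can be sketched for a single column. The function names are hypothetical, and the use of the population standard deviation is an assumption made for this example.

```python
from statistics import mean, pstdev


def fit_min_max(column):
    """Compute the factors needed for min-max scaling."""
    return {"min": min(column), "max": max(column)}


def apply_min_max(column, factors):
    """Rescale values to [0, 1] using previously fitted factors."""
    span = factors["max"] - factors["min"]
    if span == 0:
        return [0.0 for _ in column]
    return [(x - factors["min"]) / span for x in column]


def fit_standard(column):
    """Compute the factors needed for standardization."""
    return {"mean": mean(column), "std": pstdev(column)}


def apply_standard(column, factors):
    """Shift to zero mean and scale to unit standard deviation."""
    if factors["std"] == 0:
        return [0.0 for _ in column]
    return [(x - factors["mean"]) / factors["std"] for x in column]


ages = [20, 30, 40, 50]
min_max_factors = fit_min_max(ages)
scaled = apply_min_max(ages, min_max_factors)
standard_factors = fit_standard(ages)
standardized = apply_standard(ages, standard_factors)
```

Keeping the fitted factors separate from the transform step is what allows the same scaling to be reapplied to new data at prediction time.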
5. Hetero Feature Binning
Bins the input data, calculates each column's IV and WOE, and transforms the data according to the binning information.
- Corresponding module name: HeteroFeatureBinning
- Data Input: DTable with y in guest and without y in host.
- Data Output: Transformed DTable.
- Model Output: IV/WOE, split points, event counts, non-event counts, etc. of each column.
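As a rough local illustration of the quantities listed above, the sketch below computes event counts, non-event counts, WOE, and IV for one column given fixed split points. It assumes one common convention, WOE = ln(event rate / non-event rate), although some references use the inverse ratio, which flips the sign of WOE but leaves IV unchanged. The `eps` substitution for empty cells is an illustrative choice, and none of this reflects how the federated module exchanges information between guest and host.

```python
import math
from bisect import bisect_left


def woe_iv(values, labels, split_points, eps=0.5):
    """Bin one column and compute per-bin WOE plus the column's IV.

    Bin i holds values <= split_points[i]; the last bin holds the rest.
    """
    n_bins = len(split_points) + 1
    event = [0] * n_bins
    non_event = [0] * n_bins
    for v, y in zip(values, labels):
        b = bisect_left(split_points, v)
        if y == 1:
            event[b] += 1
        else:
            non_event[b] += 1
    total_event, total_non_event = sum(event), sum(non_event)
    woe, iv = [], 0.0
    for e, n in zip(event, non_event):
        # Replace empty cells with eps so the logarithm stays defined.
        event_rate = (e or eps) / total_event
        non_event_rate = (n or eps) / total_non_event
        w = math.log(event_rate / non_event_rate)
        woe.append(w)
        iv += (event_rate - non_event_rate) * w
    return {"event": event, "non_event": non_event, "woe": woe, "iv": iv}


values = [1, 2, 3, 4, 5, 6, 7, 8]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
result = woe_iv(values, labels, split_points=[4])
```

With a single split point at 4, the low bin contains one event and three non-events and the high bin the reverse, so the two WOE values are equal in magnitude and opposite in sign.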
6. OneHot Encoder
Transform a column into one-hot format.
- Corresponding module name: OneHotEncoder
- Data Input: Input DTable.
- Data Output: Transformed DTable with new headers.
- Model Output: A map from the original header and feature values to the new header.
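The header mapping and the transformed table can be sketched as follows. The `column_value` naming scheme for new headers and the function names are assumptions made for this example; the module may name its output columns differently.

```python
def fit_one_hot(header, rows, columns):
    """Build a map: original column -> {feature value -> new header name}."""
    mapping = {}
    for col in columns:
        idx = header.index(col)
        distinct = sorted({row[idx] for row in rows})
        mapping[col] = {v: f"{col}_{v}" for v in distinct}
    return mapping


def transform_one_hot(header, rows, mapping):
    """Expand each mapped column into one 0/1 column per known value."""
    new_header = []
    for col in header:
        new_header.extend(mapping[col].values() if col in mapping else [col])
    new_rows = []
    for row in rows:
        out = []
        for col, value in zip(header, row):
            if col in mapping:
                out.extend(1 if value == v else 0 for v in mapping[col])
            else:
                out.append(value)
        new_rows.append(out)
    return new_header, new_rows


header = ["age", "city"]
rows = [[25, "bj"], [31, "sz"], [40, "bj"]]
mapping = fit_one_hot(header, rows, ["city"])
new_header, new_rows = transform_one_hot(header, rows, mapping)
```

A value that was not seen during fitting produces all zeros for that column group in this sketch.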
7. Hetero Feature Selection
Provides 5 types of filters. Each filter can select columns according to the user config.
- Corresponding module name: HeteroFeatureSelection
- Data Input: Input DTable.
- Model Input: If IV filters are used, a hetero_binning model is needed.
- Data Output: Transformed DTable with new headers and filtered data instances.
- Model Output: Whether each column is kept (left) or filtered out.
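One of the filter types mentioned above depends on IV values from a binning model. A minimal sketch of such a filter is shown here, assuming a simple rule that keeps columns whose IV reaches a threshold; the function names, the threshold semantics, and the example IV values are all hypothetical.

```python
def iv_value_filter(iv_by_column, threshold):
    """Return column -> True if the column is left (kept), else False."""
    return {col: iv >= threshold for col, iv in iv_by_column.items()}


def apply_filter(header, rows, left):
    """Drop filtered-out columns; columns absent from `left` are kept."""
    keep = [i for i, col in enumerate(header) if left.get(col, True)]
    return [header[i] for i in keep], [[row[i] for i in keep] for row in rows]


# Made-up IV values standing in for the output of a binning model.
iv_by_column = {"x0": 0.35, "x1": 0.01, "x2": 0.12}
left = iv_value_filter(iv_by_column, threshold=0.1)

header = ["x0", "x1", "x2"]
rows = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
new_header, new_rows = apply_filter(header, rows, left)
```

The `left` dictionary corresponds to the per-column kept-or-not result, and the filtered header and rows correspond to the transformed table.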
8. Hetero-LR
Build a hetero logistic regression model across multiple parties.
- Corresponding module name: HeteroLR
- Data Input: Input DTable.
- Model Output: Logistic Regression model.
9. Hetero-LinR
Build a hetero linear regression model across multiple parties.
- Corresponding module name: HeteroLinR
- Data Input: Input DTable.
- Model Output: Linear Regression model.
10. Hetero-Poisson
Build a hetero Poisson regression model across multiple parties.
- Corresponding module name: HeteroPoisson
- Data Input: Input DTable.
- Model Output: Poisson Regression model.
11. Homo-LR
Build a homo logistic regression model across multiple parties.
- Corresponding module name: HomoLR
- Data Input: Input DTable.
- Model Output: Logistic Regression model.
12. Homo-NN
Build a homo neural network model across multiple parties.
- Corresponding module name: HomoNN
- Data Input: Input DTable.
- Model Output: Neural Network model.
13. Hetero Secure Boosting
Build a hetero secure boosting model across multiple parties.
- Corresponding module name: HeteroSecureBoost
- Data Input: DTable, values are instances.
- Model Output: SecureBoost model, consisting of model-meta and model-param.
14. Evaluation
Outputs the model evaluation metrics for the user.
- Corresponding module name: Evaluation
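As an illustration of the kind of metrics an evaluation step might report for a binary classifier, the sketch below computes accuracy and AUC in plain Python. The choice of these two metrics and the function names are assumptions; the text above does not specify which metrics the Evaluation module produces.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of rows whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def auc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Pairwise O(n^2) version, fine for small examples; ties count as half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [1, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.2]
acc = accuracy(labels, scores)
area = auc(labels, scores)
```

Accuracy depends on the chosen threshold, whereas AUC only depends on how the scores rank positives relative to negatives.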
More available algorithms are coming soon.