Documentation misleading about search with no parameters #27

Open
micahjsmith opened this issue Oct 15, 2020 · 0 comments · May be fixed by #30

Comments

micahjsmith commented Oct 15, 2020

  • AutoBazaar version: 560730b
  • Python version: 3.7.7
  • Operating System (python -c 'import platform;print(platform.platform())'): Darwin-19.6.0-x86_64-i386-64bit

Description

Documentation has this claim:

For example if you want to search for the best

$ abz search -i /path/to/your/datasets/folder name_of_your_dataset

This will evaluate the default pipeline without performing additional tuning iteration on it.

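This seems misleading: running the search with no extra arguments did not stop after evaluating the default pipeline. It kept tuning for about an hour and had evaluated more than 1,000 iterations (1,693 according to the results table in the output below) before I killed it.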
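
Until the documentation or the default is changed, one way to keep such a run bounded is to wrap the command and send it an interrupt after a fixed time, which is what was done by hand with ^C in the log below. The following is a minimal sketch, not part of AutoBazaar: the helper name `run_with_time_limit` is hypothetical, it assumes a POSIX system, and it assumes `abz` reacts to a SIGINT sent to its process the same way it reacted to ^C in the terminal.

```python
import signal
import subprocess


def run_with_time_limit(cmd, limit_seconds):
    """Run cmd and send SIGINT (like Ctrl-C) if it exceeds limit_seconds.

    Hypothetical helper, not part of AutoBazaar.
    Returns a tuple (returncode, interrupted).
    """
    proc = subprocess.Popen(cmd)
    try:
        # Normal case: the command finishes within the time limit.
        return proc.wait(timeout=limit_seconds), False
    except subprocess.TimeoutExpired:
        # Time limit reached: interrupt the process and wait for it to exit.
        proc.send_signal(signal.SIGINT)
        return proc.wait(), True


# Example call, using the command from this report and an arbitrary
# 10 minute limit (the limit is an assumption, not a recommended value):
# run_with_time_limit(["abz", "search", "196_autoMpg"], 600)
```

The wrapper only limits wall-clock time. It does not change how many tuning iterations the search attempts within that time.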

What I Did

$ time abz search 196_autoMpg
Using TensorFlow backend.
20201015192335979857 - Processing Datasets: ['196_autoMpg']
###############################
#### Searching 196_autoMpg ####
###############################
[15:23:37] WARNING: src/objective/regression_obj.cu:152: reg:linear is now deprecated in favor of reg:squarederror.
<repeated 8000 times>
^C
###############################
#### Executing 196_autoMpg ####
###############################
[16:23:50] WARNING: src/objective/regression_obj.cu:152: reg:linear is now deprecated in favor of reg:squarederror.
Executing best pipeline ABPipeline({
    "primitives": [
        "mlprimitives.custom.feature_extraction.CategoricalEncoder",
        "sklearn.impute.SimpleImputer",
        "sklearn.preprocessing.RobustScaler",
        "xgboost.XGBRegressor"
    ],
    "init_params": {},
    "input_names": {},
    "output_names": {},
    "hyperparameters": {
        "mlprimitives.custom.feature_extraction.CategoricalEncoder#1": {
            "keep": false,
            "copy": true,
            "features": "auto",
            "max_unique_ratio": 0,
            "max_labels": 25
        },
        "sklearn.impute.SimpleImputer#1": {
            "missing_values": NaN,
            "fill_value": null,
            "verbose": false,
            "copy": true,
            "strategy": "median"
        },
        "sklearn.preprocessing.RobustScaler#1": {
            "quantile_range": [
                25.0,
                75.0
            ],
            "copy": true,
            "with_centering": true,
            "with_scaling": true
        },
        "xgboost.XGBRegressor#1": {
            "n_jobs": -1,
            "n_estimators": 617,
            "max_depth": 9,
            "learning_rate": 0.03240539972838852,
            "gamma": 0.27690923264683187,
            "min_child_weight": 5
        }
    },
    "tunable_hyperparameters": {
        "mlprimitives.custom.feature_extraction.CategoricalEncoder#1": {
            "max_labels": {
                "type": "int",
                "default": 0,
                "range": [
                    0,
                    100
                ]
            }
        },
        "sklearn.impute.SimpleImputer#1": {
            "strategy": {
                "type": "str",
                "default": "mean",
                "values": [
                    "mean",
                    "median",
                    "most_frequent",
                    "constant"
                ]
            }
        },
        "sklearn.preprocessing.RobustScaler#1": {
            "with_centering": {
                "description": "If True, center the data before scaling. This will cause transform to raise an exception when attempted on sparse matrices, because centering them entails building a dense matrix which in common use cases is likely to be too large to fit in memory",
                "type": "bool",
                "default": true
            },
            "with_scaling": {
                "description": "If True, scale the data to interquartile range",
                "type": "bool",
                "default": true
            }
        },
        "xgboost.XGBRegressor#1": {
            "n_estimators": {
                "type": "int",
                "default": 100,
                "range": [
                    10,
                    1000
                ]
            },
            "max_depth": {
                "type": "int",
                "default": 3,
                "range": [
                    3,
                    10
                ]
            },
            "learning_rate": {
                "type": "float",
                "default": 0.1,
                "range": [
                    0,
                    1
                ]
            },
            "gamma": {
                "type": "float",
                "default": 0.1,
                "range": [
                    0,
                    1
                ]
            },
            "min_child_weight": {
                "type": "int",
                "default": 1,
                "range": [
                    1,
                    10
                ]
            }
        }
    },
    "outputs": {
        "default": [
            {
                "name": "y",
                "type": "array",
                "variable": "xgboost.XGBRegressor#1.y"
            }
        ]
    },
    "id": "e168ec26-31f0-4e78-a3a7-3ef18bf432c8",
    "name": "single_table/regression/default",
    "template": null,
    "loader": {
        "data_modality": "single_table",
        "task_type": "regression"
    },
    "score": 8.4004691556447,
    "rank": 8.400469155645126,
    "metric": "meanSquaredError"
})
#############################
#### Scoring 196_autoMpg ####
#############################
Score: 7.041906911649814
       predictions     targets
count   100.000000  100.000000
mean     23.589642   23.478000
std       7.581228    7.573446
min      10.351545   10.000000
25%      17.002141   17.375000
50%      24.067155   23.250000
75%      29.522121   28.000000
max      38.241291   44.000000
                                         pipeline     score      rank  cv_score            metric data_modality   task_type task_subtype     elapsed  iterations  load_time  trivial_time      cv_time error  step
dataset
196_autoMpg  e168ec26-31f0-4e78-a3a7-3ef18bf432c8  7.041907  8.400469  8.400469  meanSquaredError  single_table  regression   univariate  3613.11274      1693.0   0.059046      1.091654  3307.688052  None  None

real    60m17.985s
user    61m12.325s
sys     50m16.661s
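
The output above can be cross-checked to confirm that tuning took place. The sketch below copies the `xgboost.XGBRegressor#1` values from the `hyperparameters` and `tunable_hyperparameters` sections of the pipeline dump, plus `elapsed` and `iterations` from the results table, and compares them. The variable names are illustrative only.

```python
# Selected values for xgboost.XGBRegressor#1, copied from the dump above.
tuned = {
    "n_estimators": 617,
    "max_depth": 9,
    "learning_rate": 0.03240539972838852,
    "gamma": 0.27690923264683187,
    "min_child_weight": 5,
}

# Defaults listed under "tunable_hyperparameters" for the same primitive.
defaults = {
    "n_estimators": 100,
    "max_depth": 3,
    "learning_rate": 0.1,
    "gamma": 0.1,
    "min_child_weight": 1,
}

# Hyperparameters whose selected value differs from the default,
# mapped to (default, selected).
changed = {
    name: (defaults[name], value)
    for name, value in tuned.items()
    if value != defaults[name]
}

# Figures from the results table: total elapsed seconds and iteration count.
elapsed_seconds = 3613.11274
iterations = 1693
seconds_per_iteration = elapsed_seconds / iterations

for name, (default, value) in changed.items():
    print(f"{name}: default={default} selected={value}")
print(f"average seconds per iteration: {seconds_per_iteration:.2f}")
```

All five XGBRegressor hyperparameters differ from their defaults, and the run averaged roughly 2.13 seconds per iteration. Both results fit a long tuning loop, not a single evaluation of the default pipeline.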
micahjsmith linked a pull request Jun 25, 2021 that will close this issue
micahjsmith added a commit to micahjsmith/AutoBazaar that referenced this issue Jun 25, 2021