drctrl

drctrl is a tool for automatically configuring DataRobot. It can manage features provided by DataRobot, such as building projects, training, freezing, and prediction.

Python support: 3.6.x and later

1. Installation

$ pip install drctrl

2. Get started

Set up credentials

$ cat << _EOF_ > ~/.config/datarobot/drconfig.yaml
token: <datarobot-user-token>
endpoint: <datarobot-api-endpoint>
_EOF_
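
The heredoc above assumes that `~/.config/datarobot/` already exists. As a minimal sketch, the same file could be written from Python, creating the directory first. The helper name `write_credentials` and its `config_dir` parameter are hypothetical and not part of drctrl; only the file name and the `token` / `endpoint` keys come from the example above.

```python
from pathlib import Path


def write_credentials(token, endpoint, config_dir=None):
    """Write drconfig.yaml with the two keys shown above and return its path.

    Hypothetical helper for illustration; drctrl does not ship this function.
    """
    # Default to the location used in the shell example.
    if config_dir is None:
        config_dir = Path.home() / ".config" / "datarobot"
    config_dir = Path(config_dir)
    # The heredoc fails if the directory is missing, so create it first.
    config_dir.mkdir(parents=True, exist_ok=True)
    path = config_dir / "drconfig.yaml"
    path.write_text(f"token: {token}\nendpoint: {endpoint}\n")
    # The file holds a secret, so restrict it to the current user.
    path.chmod(0o600)
    return path
```

Values containing YAML special characters would need quoting, which this sketch does not attempt.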

# get all projects
$ drctrl get_projects

# get project detail
$ drctrl get_project <project_id>

# dump the configuration of an existing project in drctrl format
$ drctrl get_project_setting <project_id>

Build a new project

Download the Boston Housing dataset from its UCI URL.

# download the dataset into `./data/raw/`
$ drctrl create_dataset

Set up configure.yml

environment:
   project_id: # if null, build a new project
   project_name: 'sample_project'
   target_feature: target
   metric: 'RMSE' 
   cv_method: 'random'
   validation_type: 'CV' 
   validation_params:
      holdout_pct: 20
      #validation_pct: 10
      reps: 3    # number of cross validation folds to use
      seed: 2017 # a seed to use for randomization
   dataset:
       type: file
       path: './data/raw'
       filename: 'boston.csv'
   autopilot: 'manual' # fullauto, quick, manual
   convert_features:
      - {name: RAD, rename_to: RAD_categoricalInt, variable_type: categoricalInt}

fit:
   model_id:  # if None, run autopilot
   autopilot: 'fullauto'
   featurelist_name: 'without_feature'  # if it already exists, a current time string will be used
   source_featurelist: 'Raw Features'
   except_features:
      - 'NOX'
      
predict:
   model_id:  # if None, a model will be automatically selected 
   input:     # prediction target dataset
       type: 'file'
       path: './data/raw/'
       filename: 'boston.csv'
   reasoncode: True
   merge_origin: True 
   feature_impact: True
   output:   # output format
       type: 'file'
       path: './'
       filename: 'prediction.csv'
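
The `fit` section above names a `source_featurelist` and a list of `except_features`. One plausible reading, which is an assumption here and not confirmed by the source, is that the new feature list is the source list with the excepted features removed. The sketch below illustrates that reading; `build_featurelist` is a hypothetical helper, and raising on unknown names is a design choice of this sketch only.

```python
def build_featurelist(source_features, except_features):
    """Return source_features minus except_features, preserving order.

    Assumed semantics for illustration; not drctrl's actual implementation.
    """
    excluded = set(except_features)
    # Fail early on names that are not in the source list (sketch-only choice).
    unknown = excluded - set(source_features)
    if unknown:
        raise ValueError(f"unknown features in except_features: {sorted(unknown)}")
    return [name for name in source_features if name not in excluded]


# A few Boston Housing column names, with NOX excluded as in the config above.
columns = ["CRIM", "NOX", "RM", "RAD"]
print(build_featurelist(columns, ["NOX"]))  # prints ['CRIM', 'RM', 'RAD']
```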

Run drctrl with the configuration

$ drctrl apply configure.yml

Details of the commands and options are here

3. Commands

Usage: drctrl [OPTIONS] COMMAND [ARGS]...

Options:
  --credential PATH
  --help             Show this message and exit.

Commands:
  apply                  Apply all commands in configuration file
  build                  building project on the basis of a configuration file
  create_dataset         download and install boston housing and iris dataset
  fit                    training model on the basis of a configuration file
  frozen                 freezing model on the basis of configuration file
  get_project            fetch the project detail
  get_project_setting    dump the project parameter as yaml file
  get_projects           fetch project details
  predict                predicting on the basis of a configuration file
  validate               validate configuration file
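
For scripting, the usage line above implies that `--credential` is a global option and therefore goes before the command name. The sketch below builds such a command line and chains `validate` and `apply`. It assumes that `validate` takes the configuration path the same way `apply` does and that it exits with a non-zero status on failure; neither is confirmed by the help text. Both function names are hypothetical.

```python
import subprocess


def drctrl_command(command, *args, credential=None):
    """Build an argv list following `drctrl [OPTIONS] COMMAND [ARGS]...`."""
    argv = ["drctrl"]
    if credential is not None:
        # Global option, so it precedes COMMAND.
        argv += ["--credential", str(credential)]
    argv.append(command)
    argv.extend(str(arg) for arg in args)
    return argv


def validate_then_apply(config_path, credential=None):
    """Run `validate`, then `apply` only if validation exits with status 0.

    Assumes `validate` accepts the config path as its argument.
    """
    for command in ("validate", "apply"):
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(
            drctrl_command(command, config_path, credential=credential),
            check=True,
        )
```

For example, `drctrl_command("apply", "configure.yml")` returns `["drctrl", "apply", "configure.yml"]`.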

4. I/O format

There are several options for the I/O format. Currently, the redshift, file, and url formats can be specified for the dataset parameter in environment and for the input / output parameters in predict.

Details are here

file format

environment:
   dataset:
      type: file
      path: /path/to/dataset
      filename: dataset.csv

or

predict:
   input:
      type: redshift
      aws_key_id: <aws_access_key_id>
      aws_secret_key: <aws_secret_access_key>
      bucket: <s3_bucket>
      key_path: <s3_key>
      dbname: <redshift_dbname>
      host: <redshift_host>
      port: <redshift_port>
      user: <redshift_user>
      password: <redshift_password>
      schema: <target_table_schema>
      table:  <target_table_name>
   output:
      type: redshift
      aws_key_id: <aws_access_key_id>
      aws_secret_key: <aws_secret_access_key>
      bucket: <s3_bucket>
      key_path: <s3_key>
      dbname: <redshift_dbname>
      host: <redshift_host>
      port: <redshift_port>
      user: <redshift_user>
      password: <redshift_password>
      schema: <target_table_schema>
      table:  <target_table_name>

and so on.
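
The two specs above can be checked before running drctrl. The sketch below lists the keys each `type` uses in the examples and reports which ones a given spec lacks. Treating every listed key as required is an assumption, the `url` type is left out because its keys are not shown here, and joining `path` and `filename` into one location is a guess at how the file format is resolved. `missing_keys` and `file_location` are hypothetical helpers.

```python
import os

# Keys taken from the examples above; whether all are mandatory is assumed.
FILE_KEYS = {"path", "filename"}
REDSHIFT_KEYS = {
    "aws_key_id", "aws_secret_key", "bucket", "key_path", "dbname",
    "host", "port", "user", "password", "schema", "table",
}
KEYS_BY_TYPE = {"file": FILE_KEYS, "redshift": REDSHIFT_KEYS}


def missing_keys(spec):
    """Return the sorted keys that a `file` or `redshift` spec lacks."""
    required = KEYS_BY_TYPE.get(spec.get("type"))
    if required is None:
        raise ValueError(f"type not covered by this sketch: {spec.get('type')!r}")
    return sorted(required - set(spec))


def file_location(spec):
    """Join `path` and `filename` of a file spec (assumed resolution rule)."""
    return os.path.join(spec["path"], spec["filename"])


spec = {"type": "file", "path": "/path/to/dataset", "filename": "dataset.csv"}
print(missing_keys(spec))  # prints []
```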

5. Template

drctrl supports the Jinja2 template format. A template configuration file must use the file extension .yml.tmple .

In a template file, env['FILE_PATH'] is replaced by the value of the environment variable FILE_PATH. The following is an example.

environment:
    project_id: {{ env.PROJECT_ID }}

predict:
    model_id: {{ env.MODEL_ID }}
    dataset:
      type: file
      path: {{ env['DATASET_PATH'] }}
      filename: {{ env['DATASET_FILE'] }}
    feature_impact: false
    reasoncode: false
    merge_origin: true
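
To see what the substitution produces, the sketch below approximates it with a standard-library regular expression. It is not Jinja2 and handles only the two reference forms used above, `{{ env.NAME }}` and `{{ env['NAME'] }}`. Unlike Jinja2's default behavior, it raises `KeyError` for an unset variable. `render_env` is a hypothetical name, and the values are made up for the example.

```python
import re

# Matches {{ env.NAME }} and {{ env['NAME'] }} (single or double quotes).
_ENV_REF = re.compile(r"\{\{\s*env(?:\.(\w+)|\[\s*['\"](\w+)['\"]\s*\])\s*\}\}")


def render_env(text, env):
    """Replace env references in text with values from the env mapping."""
    def lookup(match):
        # Exactly one of the two groups is set, depending on the form used.
        name = match.group(1) or match.group(2)
        return str(env[name])  # KeyError if the variable is not defined
    return _ENV_REF.sub(lookup, text)


template = "project_id: {{ env.PROJECT_ID }}\npath: {{ env['DATASET_PATH'] }}\n"
values = {"PROJECT_ID": "abc123", "DATASET_PATH": "./data/raw"}
print(render_env(template, values))
```

Passing `os.environ` as `env` would read the values from the real environment instead of a dictionary.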
