feat(cross-validation): Split time-related results into their own plots #986

Open: augustebaum wants to merge 13 commits into main

Contributor

augustebaum commented Dec 18, 2024

Closes #902

Screen recording:

2024-12-18T15_00_29_screen_record.mp4
  • fix(cross-validation): Remove "estimator" key even if "indices" is not present
  • feat(cross-validation): Remove times from plot
  • test: Test for CrossValidationItem's plots attribute
  • feat: Group CrossValidationReporter plots into a single attribute
  • feat: Add timing results plot function
  • feat: Integrate timing results plot function to CrossValidationReporter
  • feat: Add normalized timing results plot function
  • feat: Integrate normalized timing results plot function to CrossValidationReporter
  • refactor: Rename plot_cross_validation function
  • refactor: Move cross-validation plots to their own module

Contributor

github-actions bot commented Dec 18, 2024

Coverage

pytest coverage report
File                                                 Stmts  Miss      Cover  Missing
src/skore
   __init__.py                                          18     0       100%
   __main__.py                                           8     1   1    80%
   exceptions.py                                         4     0       100%
src/skore/cli
   __init__.py                                           8     0       100%
   cli.py                                               32     0       100%
   launch_dashboard.py                                  22    12   0    42%
   quickstart_command.py                                12     2   0    83%
src/skore/item
   __init__.py                                          21     0       100%
   cross_validation_item.py                             63     2   2    95%
   item.py                                              22     1   0    95%
   item_repository.py                                   42     2   1    93%
   media_item.py                                        60     4   1    93%
   numpy_array_item.py                                  21     1   1    92%
   pandas_dataframe_item.py                             30     1   1    94%
   pandas_series_item.py                                30     1   1    94%
   polars_dataframe_item.py                             28     1   1    94%
   polars_series_item.py                                23     1   1    93%
   primitive_item.py                                    23     2   1    91%
   sklearn_base_estimator_item.py                       29     1   1    94%
   skrub_table_report_item.py                           10     1   1    86%
src/skore/persistence
   __init__.py                                           0     0       100%
   abstract_storage.py                                  22     1   0    95%
   disk_cache_storage.py                                33     1   1    95%
   in_memory_storage.py                                 20     0       100%
src/skore/project
   __init__.py                                           4     0       100%
   create.py                                            50     8   0    88%
   load.py                                              23     3   0    89%
   project.py                                           62     4   4    91%
src/skore/sklearn
   __init__.py                                           3     0       100%
   find_ml_task.py                                      19     2   3    85%
   types.py                                              2     0       100%
src/skore/sklearn/cross_validation
   __init__.py                                           2     0       100%
   cross_validation_helpers.py                          40     4   1    89%
   cross_validation_reporter.py                         37     1   1    95%
src/skore/sklearn/cross_validation/plots
   __init__.py                                           0     0       100%
   compare_scores_plot.py                               31     1   2    92%
   timing_normalized_plot.py                            31     1   1    95%
   timing_plot.py                                       31     1   1    95%
src/skore/sklearn/train_test_split
   __init__.py                                           0     0       100%
   train_test_split.py                                  34     2   1    94%
src/skore/sklearn/train_test_split/warning
   __init__.py                                           8     0       100%
   high_class_imbalance_too_few_examples_warning.py     17     3   2    78%
   high_class_imbalance_warning.py                      18     2   1    88%
   random_state_unset_warning.py                        11     1   1    87%
   shuffle_true_warning.py                               9     0   1    91%
   stratify_is_set_warning.py                           11     1   1    87%
   time_based_column_warning.py                         22     1   2    89%
   train_test_split_warning.py                           5     1   0    80%
src/skore/ui
   __init__.py                                           0     0       100%
   app.py                                               25     5   2    71%
   dependencies.py                                       7     1   0    86%
   project_routes.py                                   125     5   3    95%
src/skore/utils
   __init__.py                                           0     0       100%
   _show_versions.py                                    29     0       100%
src/skore/view
   __init__.py                                           0     0       100%
   view.py                                               5     0       100%
   view_repository.py                                   16     2   1    83%
TOTAL                                                 1258    84        91%

Tests Skipped Failures Errors Time
171 0 💤 0 ❌ 0 🔥 36.268s ⏱️

Contributor

rouk1 left a comment

Nice work, just a few pieces of feedback.

skore/src/skore/item/cross_validation_item.py (resolved)
skore/src/skore/item/cross_validation_item.py (outdated, resolved)
Comment on lines +46 to +48
def linspace(lo, hi, num):
    interval = (hi - lo) / (num - 1)
    return [lo + k * interval for k in range(0, num)]
Contributor

Maybe you can use numpy.linspace here?

Contributor Author

I did originally, but re-implementing it avoids bringing in numpy as a direct dependency for just one function.

Contributor Author

...although I see that we import numpy in cross_validation_item.py...

Contributor

And numpy is a direct dependency of sklearn, which is our direct dependency. ♻️
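
For reference, a minimal sketch of the trade-off discussed in this thread, not code from the PR. It reproduces the hand-rolled helper quoted above and adds a guard for num == 1. The guard is an addition, not part of the PR: the quoted version divides by zero in that case, whereas numpy.linspace returns a single point. The comparison against numpy.linspace only runs when numpy is importable.

```python
def linspace(lo, hi, num):
    """Hand-rolled helper from the review, plus a guard for num == 1.

    The guard is an assumption added for this sketch; it is not in the PR.
    """
    if num == 1:
        # numpy.linspace(lo, hi, 1) also returns just the start point.
        return [float(lo)]
    interval = (hi - lo) / (num - 1)
    return [lo + k * interval for k in range(num)]


points = linspace(0.0, 1.0, 5)
print(points)

# Optional cross-check: only runs if numpy happens to be installed.
try:
    import numpy as np
except ImportError:
    np = None

if np is not None:
    assert np.allclose(points, np.linspace(0.0, 1.0, 5))
    assert np.allclose(linspace(2, 3, 1), np.linspace(2, 3, 1))
```

If numpy is already available through scikit-learn, as noted in this thread, calling numpy.linspace directly would remove the need for the guard.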

Comment on lines +44 to +46
def linspace(lo, hi, num):
    interval = (hi - lo) / (num - 1)
    return [lo + k * interval for k in range(0, num)]
Contributor

Same as above: could you use numpy.linspace?

# FIXME: Maybe this logic belongs in CrossValidationPlots
plots_bytes = {
    plot_name: plotly.io.to_json(plot, engine="json").encode("utf-8")
    for plot_name, plot in dataclasses.asdict(plots).items()
Contributor

Could the plot name be "humanized"?
timing_normalized => Normalized timings?

Contributor Author

You mean just in the front-end, right? In their Python code, users would still have to write

reporter.plots.timing_normalized

Contributor Author

How should the other plots be renamed? Maybe something like

  • compare_scores -> "Scores"
  • timing -> "Timings"
  • timing_normalized -> "Normalized timings"

@MarieS-WiMLDS Does this make sense?

Contributor

Yes, only on the front-end side!

Contributor

MarieS-WiMLDS commented Dec 19, 2024

It completely makes sense! I like that the human-readable name makes it easy to find how to call the plot in Python. With that logic, maybe we could even switch from compare_scores to just scores?
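
A possible shape for the front-end naming discussed above, as a sketch only. The CrossValidationPlots stand-in, the DISPLAY_NAMES mapping, and the humanize helper are all hypothetical and not taken from the PR. The labels are the ones proposed in this thread, and the attribute names stay unchanged on the Python side.

```python
import dataclasses


@dataclasses.dataclass
class CrossValidationPlots:
    # Hypothetical stand-in: the real fields would hold plotly figures.
    compare_scores: object
    timing: object
    timing_normalized: object


# Labels proposed in the thread; shown only in the front-end.
DISPLAY_NAMES = {
    "compare_scores": "Scores",
    "timing": "Timings",
    "timing_normalized": "Normalized timings",
}


def humanize(plot_name: str) -> str:
    """Return the display label, falling back to a prettified attribute name."""
    fallback = plot_name.replace("_", " ").capitalize()
    return DISPLAY_NAMES.get(plot_name, fallback)


plots = CrossValidationPlots(
    compare_scores="<figure>", timing="<figure>", timing_normalized="<figure>"
)
# Iterate over field names without copying the figures themselves.
labels = {field.name: humanize(field.name) for field in dataclasses.fields(plots)}
print(labels)
```

Keeping the mapping next to the dataclass would let users still write reporter.plots.timing_normalized while the UI shows "Normalized timings".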

Development

Successfully merging this pull request may close these issues.

Split the CrossValidationReporter plot to have scores in one side, and time on the other side
3 participants