probability classification report - does not show the entire size of data #240

Open
Guidosalimbeni opened this issue May 29, 2022 · 3 comments
Labels: bug (Something isn't working)

Comments

@Guidosalimbeni

Hello,
I have been investigating your great library. However, after updating the tool and running a classification report, I noticed that the row count shown in the report is inconsistent with the data used for the calculation.
I am not in a position to share the error, but I hope it is a quick thing to check on your side.
I have tried every change and debugging step I could think of. I know for sure that the reference data has 2500 rows, but the report only shows 480 records. I am really not sure what else to check, and any help would be much appreciated.

@emeli-dral added the bug label on May 30, 2022
@emeli-dral
Contributor

Hi @Guidosalimbeni ,
thanks for sharing, we will try to figure it out.

I have a couple of quick questions:

  • Do I understand correctly that this bug appeared in the latest version and that everything worked correctly in the older one? Or did you build the dashboard in the latest version only? Knowing this would help us understand a little faster where the problem might come from.
  • For some reports we filter out rows with NaN values, which might be the cause of the problem here. Could you please check the number of rows with at least one NaN value: df.isna().any(axis=1).sum() ? Could it be that there are 2020 rows with NaN values (2500 - 480 = 2020)?
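
The check suggested above can be sketched roughly as follows, assuming the reference data is a pandas DataFrame. The toy frame and its column names are hypothetical and only illustrate how the two counts relate; this is not the library's internal filtering logic.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the reference dataset.
df = pd.DataFrame({
    "target": [0, 1, 1, 0, 1],
    "prediction": [0.1, 0.8, np.nan, 0.3, 0.9],
    "feature_a": [1.0, np.nan, 3.0, 4.0, 5.0],
})

# Number of rows with at least one NaN in any column.
rows_with_nan = int(df.isna().any(axis=1).sum())

# Number of rows left if every row containing a NaN is dropped.
rows_remaining = len(df.dropna())

print(f"rows with NaN: {rows_with_nan}, rows remaining: {rows_remaining}")
```

If the report drops such rows, rows_remaining would be expected to match the record count shown in the report, and rows_with_nan the difference from the full dataset size.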

@danieljmv01

Hi @Guidosalimbeni,

In case it helps, these might be related: #241 and #242

Does the data include columns that contain NaN or np.inf values (even if they are not the target/prediction columns)?
Does the count change if you use a dataset with only the target and/or prediction columns?
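
A minimal sketch of both checks, assuming a pandas DataFrame. The column names and the count_clean_rows helper are hypothetical and written only for illustration. Note that isna() does not flag np.inf, so infinities are converted to NaN first.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only "feature_a" contains NaN / inf values.
df = pd.DataFrame({
    "target": [0, 1, 1, 0, 1, 0],
    "prediction": [0.1, 0.8, 0.7, 0.3, 0.9, 0.2],
    "feature_a": [1.0, np.nan, 3.0, np.inf, 5.0, 6.0],
})


def count_clean_rows(frame: pd.DataFrame) -> int:
    """Count rows with neither NaN nor +/-inf in any column (illustrative helper)."""
    cleaned = frame.replace([np.inf, -np.inf], np.nan)
    return len(cleaned.dropna())


# Clean rows when every column is considered.
all_columns = count_clean_rows(df)

# Clean rows when only the target and prediction columns are kept.
only_target_prediction = count_clean_rows(df[["target", "prediction"]])

print(all_columns, only_target_prediction)
```

If the two counts differ, the missing rows come from columns other than target and prediction, which is the situation the questions above are trying to identify.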

@Guidosalimbeni
Author

Thanks, @emeli-dral and @danieljmv01

  1. I have noticed it only in the new version, but it may be that I simply did not notice the error in the previous version. Apologies, I am not much help here.
  2. I suspect the null values are likely the issue. Let me test on Monday and I will let you know.
