I've been exploring data drift detection and want to test how well Evidently measures how much a given dataset has drifted. My main concern right now is how to generate drifted data in the first place, and how much to skew it, so that I can check whether Evidently detects the amount of drift I applied.
So let's say I have a tabular dataframe like this, where I want to drift just the Age feature.
What ways are there to artificially create a drifted dataset from a given dataset?
What I've been doing is splitting the data into two extreme ranges (e.g. one set with Age < 50 and one set with Age >= 50), and then mixing the two sets more and more to create "less" drift. But for tabular data, would something simpler do the trick, such as adding a constant offset to all the Ages of one dataset? Or adding random noise to all the Ages, with the noise drawn from a normal distribution? What other standard techniques could be used to apply drift in this manner, to a degree that can be varied?
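To make the three options above concrete, here is a minimal sketch of what I mean, using only NumPy. The function names, the parameters (`delta`, `sigma`, `frac_high`) and the randomly generated `reference` ages are all made up for illustration; nothing here uses Evidently's API.

```python
import numpy as np

rng = np.random.default_rng(0)

def shift_drift(ages, delta):
    """Add the same offset to every value (moves the mean, keeps the shape)."""
    return ages + delta

def noise_drift(ages, sigma):
    """Add zero-mean Gaussian noise to every value (mostly widens the spread)."""
    return ages + rng.normal(0.0, sigma, size=len(ages))

def mixture_drift(ages, threshold, frac_high, size):
    """Resample so that a fraction `frac_high` of the rows has Age >= threshold."""
    low = ages[ages < threshold]
    high = ages[ages >= threshold]
    n_high = int(round(frac_high * size))
    return np.concatenate([
        rng.choice(low, size=size - n_high, replace=True),
        rng.choice(high, size=n_high, replace=True),
    ])

# Hypothetical reference data standing in for the Age column.
reference = rng.integers(18, 90, size=1000).astype(float)

# Each knob (delta, sigma, frac_high) controls the degree of drift.
shifted = shift_drift(reference, delta=5.0)
noisy = noise_drift(reference, sigma=10.0)
mixed = mixture_drift(reference, threshold=50, frac_high=0.9, size=1000)
```

With a dataframe, each result would be written back into a copy, e.g. `drifted = df.copy(); drifted["Age"] = shift_drift(df["Age"].to_numpy(), 5.0)`, and the knob swept over a range of values to get a series of datasets with increasing drift.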
Thank you!