A small library for creating pandas DataFrame fixtures.
This library generates pandas DataFrames with prescribed columns and types, filled with rows of generated data. It can therefore be used to produce arbitrary data for fixtures in unit tests.
Dataframer lets you generate a fixture DataFrame by passing a dictionary that maps column names to data types and data.
Data types are to be passed as strings:
- 'timestamp': np.datetime64 with minute precision.
- 'date': np.datetime64 with day precision.
- 'int': np.int64.
- 'float': np.float64.
- 'str': strings.
- 'constant_str': a column of a single repeated constant string.
- 'constant_int': a column of a single repeated constant integer.
- 'enum': a column of values ranging from 0 to a small integer.
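To make the numeric and datetime entries above concrete, here is a minimal numpy-only sketch of the dtypes they name. It does not use Dataframer; the sample values are made up for illustration, and how the library builds its columns internally is not shown here.

```python
import numpy as np

# Hand-written sample values (illustrative only, not Dataframer output).
timestamps = np.array(["2021-03-01T09:30", "2021-03-01T09:31"], dtype="datetime64[m]")
dates = np.array(["2021-03-01", "2021-03-02"], dtype="datetime64[D]")
ints = np.array([1, 2], dtype=np.int64)
floats = np.array([0.5, 1.5], dtype=np.float64)

# Print the dtype that corresponds to each type string.
for name, values in [
    ("timestamp", timestamps),
    ("date", dates),
    ("int", ints),
    ("float", floats),
]:
    print(name, values.dtype)

# The precision is the unit of the smallest representable step.
minute_step = timestamps[1] - timestamps[0]  # one minute
day_step = dates[1] - dates[0]               # one day
```

The unit in brackets (`m` for minutes, `D` for days) is what "minute precision" and "day precision" refer to.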
Once the numpy random seed is fixed, the generated fixture is deterministic and can be used for testing purposes.
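The effect of fixing the seed can be sketched with numpy alone. The `draw` helper below is hypothetical and only demonstrates the general numpy behaviour that seeding relies on; it is not a description of Dataframer's internals.

```python
import numpy as np


def draw(seed, n=3):
    """Hypothetical helper: seed numpy's global generator, then draw n floats."""
    np.random.seed(seed)
    return np.random.randn(n)


first = draw(seed=1)
second = draw(seed=1)
other = draw(seed=2)

# The same seed reproduces the same values; a different seed does not.
print(np.array_equal(first, second))  # True
print(np.array_equal(first, other))   # False
```

This is why a test can compare a seeded fixture against hard-coded expected values.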
If no column dictionary is passed, a DataFrame with a single integer column named 'id' is created.
from dataframer import DataFrameMaker
maker = DataFrameMaker(seed=1) # seed fixes the numpy random seed.
df = maker.make_df(nrows=5)
yields
index | id |
---|---|
0 | 98539 |
1 | 77708 |
2 | 5192 |
3 | 98047 |
4 | 50057 |
To use any of the supported types, pass a dictionary mapping column names to type strings, as follows.
from dataframer import DataFrameMaker
columns = {
    'a': 'str',
    'b': 'float',
    'c': 'int',
}
maker = DataFrameMaker(seed=1)
df = maker.make_df(nrows=3, cols=columns)
yields
a | b | c |
---|---|---|
LRmijlfpaqbmhT | 1.624345 | 98539 |
8gzYuLsul8QCDo | -0.611756 | 77708 |
YexxPX3EGwnPjh | -0.528172 | 5192 |
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.