
A handy class to deal with a project's subdirectories and its data and results files, especially CSVs.


biocodices/project


Project

Project is a handy Python class that will deal with your project's data and results subdirectories. Typical usage:

from project import Project

pj = Project('path/to/your/project/basedir')
# => results and data subdirectories will be created if they don't exist
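The "create if missing" behaviour can be pictured with a short standard-library sketch. This is not Project's actual source: `ensure_subdirs` is a hypothetical helper, and the subdirectory names `data` and `results` are taken from the description above.

```python
from pathlib import Path
import tempfile


def ensure_subdirs(base_dir, names=("data", "results")):
    """Create each named subdirectory under base_dir if missing; return their paths."""
    base = Path(base_dir).expanduser().resolve()
    paths = {}
    for name in names:
        subdir = base / name
        # exist_ok=True makes the call idempotent, so re-running it on an
        # existing project leaves the directories untouched.
        subdir.mkdir(parents=True, exist_ok=True)
        paths[name] = subdir
    return paths


# Try it in a throwaway directory (hypothetical usage).
with tempfile.TemporaryDirectory() as tmp:
    dirs = ensure_subdirs(tmp)
    print(sorted(path.name for path in dirs.values()))
```

Calling the helper a second time on the same base directory does nothing, which is the property that makes it safe to instantiate a project object repeatedly.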

If you move some data files to the data dir, you can list all, some or one:

pj.data_files()
# => All files under /data

pj.data_files(pattern='*.vcf')
# => Glob pattern for VCF files under /data

pj.data_files(regex=r'(vcf|bam)$')
# => Regex for VCF and BAM files under /data

pj.data_file('sample1.vcf')
# => Returns the complete path to that data file
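To make the difference between the glob and regex filters concrete, here is a minimal sketch of how such filtering could be written with `pathlib` and `re`. The function `list_files` is hypothetical and is not Project's implementation; in particular, it assumes a non-recursive listing and matches the regex against the file name only.

```python
import re
import tempfile
from pathlib import Path


def list_files(directory, pattern=None, regex=None):
    """Return sorted file paths in directory, optionally filtered by glob or regex."""
    directory = Path(directory)
    # A glob pattern narrows the candidates; without one, every entry is considered.
    candidates = directory.glob(pattern) if pattern else directory.iterdir()
    files = [path for path in candidates if path.is_file()]
    if regex:
        # re.search matches anywhere in the name, so a trailing '$'
        # is what anchors the check to the extension.
        files = [path for path in files if re.search(regex, path.name)]
    return sorted(str(path) for path in files)


# Hypothetical usage with three empty files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("sample1.vcf", "sample1.bam", "notes.txt"):
        (Path(tmp) / name).touch()
    print(len(list_files(tmp)))
    print(len(list_files(tmp, pattern="*.vcf")))
    print(len(list_files(tmp, regex=r"(vcf|bam)$")))
```

A glob is the simpler choice for a single extension, while a regex covers alternatives such as "VCF or BAM" in one call.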

Whenever you need to write some data to results, you can use Project to easily get full filepaths:

path = pj.results_file('my_new_results.txt')
# => /home/juan/myproject/results/my_new_results.txt

with open(path, 'w') as f:
    f.write(some_new_results)

Project is especially handy when you need to dump a pandas DataFrame to a CSV or load a CSV file as a pandas DataFrame:

pj.dump_df(my_dataframe, 'results')
# => Will write a 'results.csv' under /results

pj.dump_df(other_dataframe, 'new_data', subdir='data')
# => Will write a 'new_data.csv' under /data

pj.dump_df(my_dataframe, 'results.tsv')
# => Specify '.tsv' to get a TSV file written instead of a CSV

pj.dump_df(my_dataframe, 'results', header=None, index=None)
# => Any extra keyword arguments will be passed to pandas.DataFrame.to_csv()
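The extension handling described in the comments above (no extension means CSV, `.tsv` means tab-separated) could be sketched as follows. `resolve_table_path` and the `SEPARATORS` mapping are hypothetical names used for illustration only; how Project actually picks the separator is not shown in this README.

```python
from pathlib import Path

# Assumed mapping from file extension to field separator.
SEPARATORS = {".csv": ",", ".tsv": "\t"}


def resolve_table_path(name, default_ext=".csv"):
    """Return (filename, separator) for a table name, defaulting to CSV."""
    suffix = Path(name).suffix.lower()
    if suffix not in SEPARATORS:
        # No recognised extension: treat the whole name as a stem
        # and append the default one.
        name, suffix = name + default_ext, default_ext
    return name, SEPARATORS[suffix]


print(resolve_table_path("results"))
print(resolve_table_path("results.tsv"))
```

The resulting separator would then be handed to `pandas.DataFrame.to_csv()` through its `sep` argument, alongside any extra keyword arguments supplied by the caller.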

Project will try to JSONify a column when all of its non-null values belong to the same Python type, e.g. lists or dicts.

import pandas as pd

df = pd.DataFrame({'a': [[1, 2], [3, 4]], 'b': [[5, 6], [7, 8]]})
pj.dump_df(df, 'with_lists')
# => Will write "with_lists.csv" and jsonify the lists [1, 2], [3, 4], etc.

Next time you read that same CSV, Project will load the JSON fields and convert them back to Python objects.

df = pj.read_csv('with_lists')
# => Get a DataFrame with the JSON fields loaded back to Python objects.

df = pj.read_csv('some_data.csv', subdir='data', dtype={'colname': int})
# => Read CSV from another subdir. The extra keyword arguments (here, dtype)
#    are passed to pandas.read_csv()
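The JSON round trip described above can be illustrated without pandas, using only the standard `csv` and `json` modules. This is a sketch of the general idea rather than Project's code: `dump_rows` and `load_rows` are hypothetical helpers, and the rule "only lists and dicts are encoded and restored" is an assumption made for the example.

```python
import csv
import io
import json


def dump_rows(rows, fieldnames):
    """Write dict rows to CSV text, JSON-encoding any list or dict value."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        writer.writerow({
            key: json.dumps(value) if isinstance(value, (list, dict)) else value
            for key, value in row.items()
        })
    return buffer.getvalue()


def load_rows(text):
    """Read CSV text back, decoding cells that hold a JSON list or object."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        decoded = {}
        for key, cell in row.items():
            try:
                value = json.loads(cell)
            except (TypeError, ValueError):
                value = cell
            # Only containers are restored; plain numbers and strings stay as text.
            decoded[key] = value if isinstance(value, (list, dict)) else cell
        rows.append(decoded)
    return rows


rows = [{"a": [1, 2], "b": [5, 6]}, {"a": [3, 4], "b": [7, 8]}]
text = dump_rows(rows, ["a", "b"])
print(text)
print(load_rows(text) == rows)
```

Encoding the cells as JSON matters because a plain CSV cell is just text: without it, a list such as `[1, 2]` would come back as the string `'[1, 2]'` rather than as a Python list.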

Project also has dump and read utilities to convert a DataFrame to JSON and to read that JSON back into a DataFrame later:

pj.dump_df_as_json(my_dataframe, 'info')
# => Will write an 'info.json' under /results
pj.read_json_df('info')
# => Will read the 'info.json' previously saved in /results

Installation

git clone https://github.com/biocodices/project.git
cd project
python setup.py install
