data_prism.py can be used to build discrete multivariate frequency or probability distributions from large input data files (for example, alignment files of next-generation sequence data) that contain mixed continuous and discrete multivariate data. Continuous variables are binned.
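The idea of binning continuous variables and counting joint occurrences can be illustrated with the minimal sketch below. It is not data_prism.py's actual interface. It assumes each record has already been parsed into a dict, and the function name `build_distribution`, the field names and the cut points are all hypothetical. Discrete fields are used as-is, each continuous field is mapped to a bin index with `bisect`, and the resulting tuples are counted and normalised into probabilities.

```python
import bisect
from collections import Counter
from typing import Dict, Iterable, Mapping, Sequence, Tuple


# Hypothetical helper for illustration only; not part of data_prism.py.
def build_distribution(
    records: Iterable[Mapping[str, str]],
    discrete_fields: Sequence[str],
    continuous_bins: Mapping[str, Sequence[float]],
) -> Tuple[Counter, Dict[tuple, float]]:
    """Count joint occurrences of discrete values and binned continuous values.

    continuous_bins maps a field name to sorted cut points; a value x falls in
    bin bisect_right(cuts, x), so n cut points give n + 1 bins.
    """
    counts: Counter = Counter()
    for rec in records:
        # Discrete values form the first part of the cell key...
        key = tuple(rec[f] for f in discrete_fields)
        # ...followed by the bin index of each continuous value.
        key += tuple(
            bisect.bisect_right(cuts, float(rec[f]))
            for f, cuts in continuous_bins.items()
        )
        counts[key] += 1
    total = sum(counts.values())
    # Normalise counts to a joint probability distribution (empty if no records).
    probs = {k: n / total for k, n in counts.items()} if total else {}
    return counts, probs


if __name__ == "__main__":
    # Hypothetical alignment-like records: strand is discrete, mapq is binned.
    reads = [
        {"strand": "+", "mapq": "60"},
        {"strand": "+", "mapq": "3"},
        {"strand": "-", "mapq": "42"},
        {"strand": "+", "mapq": "55"},
    ]
    counts, probs = build_distribution(reads, ["strand"], {"mapq": [10, 30]})
    print(probs)  # {('+', 2): 0.5, ('+', 0): 0.25, ('-', 2): 0.25}
```

Storing only the combinations actually observed (a sparse `Counter`) keeps memory proportional to the number of occupied cells rather than the full grid of all possible combinations, which matters when the input files are large and the number of variables grows.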
data_prism.py contains a small kernel of methods. A number of examples are included showing how to use it with different types of input, such as tab-delimited text, FASTA and FASTQ, BAM, and VCF files.
Examples
taxonomy_prism.py: summarises tabular BLAST data by taxonomy (kingdom and family) across multiple input files
kmer_prism.py: summarises k-mer distributions across multiple input files
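A k-mer summary across several input files can be sketched as follows. This is an illustration under stated assumptions, not kmer_prism.py's code: the names `read_fasta`, `count_kmers` and `kmer_table` are hypothetical, the FASTA parser is minimal (no validation), and k-mers containing `N` are skipped. The result is a table of k-mer counts per input file, with 0 where a k-mer does not occur in a file.

```python
import io
from collections import Counter
from typing import Dict, Iterable, Iterator, TextIO


# Hypothetical helpers for illustration only; not part of kmer_prism.py.
def read_fasta(handle: TextIO) -> Iterator[str]:
    """Yield upper-cased sequences from a FASTA stream (minimal parser)."""
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header ends the previous record, if any.
            if chunks:
                yield "".join(chunks)
                chunks = []
        else:
            chunks.append(line.upper())
    if chunks:
        yield "".join(chunks)


def count_kmers(seqs: Iterable[str], k: int) -> Counter:
    """Count overlapping k-mers, skipping any that contain an ambiguous N."""
    counts: Counter = Counter()
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:
                counts[kmer] += 1
    return counts


def kmer_table(sources: Dict[str, TextIO], k: int) -> Dict[str, Dict[str, int]]:
    """Return {kmer: {file_name: count}} across all inputs, with 0 for absent k-mers."""
    per_file = {name: count_kmers(read_fasta(h), k) for name, h in sources.items()}
    all_kmers = sorted(set().union(*per_file.values()))
    # Counter returns 0 for missing keys, which fills the gaps in the table.
    return {kmer: {name: per_file[name][kmer] for name in sources} for kmer in all_kmers}


if __name__ == "__main__":
    # In-memory stand-ins for two FASTA files.
    inputs = {
        "a": io.StringIO(">r1\nACGT\n>r2\nAC\nGT\n"),
        "b": io.StringIO(">s1\nacgn\n"),
    }
    for kmer, row in kmer_table(inputs, 2).items():
        print(kmer, row)
```

Building the union of k-mers before filling the table gives every file the same set of rows, so the per-file distributions can be compared directly.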