Why different handling between GFF and mzml/genbank in polars. #93

tshauck · 2024-02-05T19:22:37Z

All three have map columns, but only GFF converts cleanly into a polars DataFrame; mzML and GenBank do not.

Genbank: list of structs... overall quite complex :/ https://github.com/wheretrue/exon/blob/b2726a591294ff9b0b45e6632e28727eaa9bc1ce/exon/exon-genbank/src/config.rs#L22 .. maybe provide an example unnest if possible
mzml: map of structs... https://github.com/wheretrue/exon/blob/b2726a591294ff9b0b45e6632e28727eaa9bc1ce/exon/exon-mzml/src/config.rs#L92-L184
GFF: map of string lists... https://github.com/wheretrue/exon/blob/b2726a591294ff9b0b45e6632e28727eaa9bc1ce/exon/exon-gff/src/config.rs#L67-L94

abearab · 2024-03-09T08:10:34Z

@tshauck have you seen https://github.com/BiocPy?

tshauck · 2024-03-11T17:10:44Z

@abearab I've seen it, but not used it. I am a fan of those packages in R (e.g. granges). I also know there are some other analogues in Python for some of those Bioconductor packages (e.g. AnnData).

abearab · 2024-03-11T22:14:17Z

> I've seen it, but not used it. I am a fan of those packages in R (e.g. granges). I also know there are some other analogues in Python for some of those Bioconductor packages (e.g. AnnData).

Yeah, I like AnnData and the scverse ecosystem a lot. However, a good implementation of granges for Python has been missing for a long time! I thought granges could be very relevant to your implementation of annotation file formats (e.g. GTF, GFF, BED, BAM). This is just another suggestion, feel free to ignore it :)

tshauck · 2024-03-12T04:08:14Z

100%. You can actually do a bit of the granges-style work via SQL joins, though it's not as intuitive or as specialized for genomic intervals. E.g. say you ran bakta and wanted the CDSs that have a spacer annotation within 100 bp:

WITH cds AS (
  SELECT *
  FROM gff_scan('bakta.gff')
  WHERE type = 'cds'
), spacers AS (
  SELECT *
  FROM gff_scan('bakta.gff')
  WHERE type = 'crispr-spacer'
)

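-- Keep spacers on the same sequence that overlap the CDS padded by 100 bp.
-- Assumes the GFF table exposes a seqname column.
SELECT *
FROM cds
  JOIN spacers
    ON cds.seqname = spacers.seqname
   AND spacers.start <= (cds.end + 100)
   AND spacers.end >= (cds.start - 100)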

I would certainly like to make it easier to do more complex granges stuff.
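To make the "within 100 bp" condition concrete outside of SQL, here is a small pure-Python sketch of the interval test. The `Feature` type and the `within` and `nearby_pairs` helpers are hypothetical names for illustration, not part of exon or any granges library, and the nested loop is only meant for small inputs.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    # Hypothetical minimal record: sequence name plus an inclusive interval.
    seqname: str
    start: int
    end: int


def within(a: Feature, b: Feature, max_gap: int = 100) -> bool:
    """True if b overlaps a, or lies at most max_gap bases away from it."""
    if a.seqname != b.seqname:
        return False
    # Pad a by max_gap on both sides, then apply the usual overlap test.
    return b.start <= a.end + max_gap and b.end >= a.start - max_gap


def nearby_pairs(cds, spacers, max_gap: int = 100):
    """Naive O(n * m) pairing of each CDS with every spacer close to it."""
    return [(c, s) for c in cds for s in spacers if within(c, s, max_gap)]


cds = [Feature("contig_1", 1000, 2000)]
spacers = [
    Feature("contig_1", 2050, 2080),  # 50 bp downstream of the CDS
    Feature("contig_1", 2500, 2530),  # 500 bp downstream of the CDS
    Feature("contig_2", 2050, 2080),  # same coordinates, different contig
]
pairs = nearby_pairs(cds, spacers)
print(pairs)
```

Only the first spacer is paired with the CDS here; the other two fail on distance and on sequence name respectively.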
