forked from chhh/MSFTBX

MS File ToolBox - tools for parsing some mass-spectrometry related file formats (mzML, mzXML, pep.xml, prot.xml, etc.)

KaiLiCn/MSFTBX

MSFTBX

MSFTBX stands for Mass Spectrometry File Toolbox. It is a Java library for reading common mass-spectrometry/proteomics data formats:

  • mzML
  • mzXML
  • pepXML/pep.xml
  • protXML/prot.xml
  • mzIdentML
  • cef (Agilent)
  • GPMdb XML

This library is what drives BatMass.

Maven dependency

<dependency>
    <groupId>com.github.chhh</groupId>
    <artifactId>msftbx</artifactId>
    <version>1.3.1</version>
</dependency>

How to use

To get started quickly, follow the tutorial: http://www.batmass.org/tutorial/data-access-layer/#parsing-lc-ms-data-mzml-mzxml-files

Features

  • Parsers for mzML/mzXML with unified API
    • Very fast, multi-threaded
    • Rich standardized API for contents of those files (scan and run meta-info, not just spectra).
    • msNumpress compression support for mzML
    • Automated LC/MS run structure determination:
      • Data structures for parent-child relationship between spectra
      • Indexes for scans based on scan numbers, retention times both globally and for each MS level separately
      • Convenient methods to get next-previous scans at the same MS level
    • Tolerant to malformed data
      • Can handle MS2 scan tags nested inside MS1 scans
      • Tolerant to missing or broken file index
      • Reindexing on the fly
    • Memory management
      • Automated spectra parsing on demand
        • You can parse just the structure of an LC/MS run without the spectral data; the memory footprint in this case is very small. Spectra are parsed only when they are requested.
        • Soft referencing of spectral data for GC
      • Tracking of which loaded data is no longer used by any component, with automated unloading of that data.
  • Upcoming support for Thermo RAW files on Windows
  • pepXML parser and writer
  • protXML parser and writer
  • mzIdentML parser
  • GPMdb XML files parser
  • Agilent .cef files parser
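
To make the run-structure and memory-management ideas above more concrete, here is a minimal, self-contained sketch in plain Java. It is not MSFTBX code and does not use the MSFTBX API: `RunIndexSketch`, `Scan`, and every method name are hypothetical and exist only for illustration. The parent lookup assumes that the parent of a scan is the closest preceding scan one MS level lower, which is a simplification. Spectral data sits behind a `SoftReference` and is parsed only when requested.

```java
import java.lang.ref.SoftReference;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.IntFunction;

/** Illustrative sketch only: hypothetical types, not MSFTBX classes. */
final class RunIndexSketch {

    /** Scan meta-info stays in memory; peak data is parsed on demand and softly cached. */
    static final class Scan {
        final int num;
        final int msLevel;
        final double rtMinutes;
        private final IntFunction<double[]> parser; // stands in for reading the file
        private SoftReference<double[]> mzCache = new SoftReference<>(null);

        Scan(int num, int msLevel, double rtMinutes, IntFunction<double[]> parser) {
            this.num = num;
            this.msLevel = msLevel;
            this.rtMinutes = rtMinutes;
            this.parser = parser;
        }

        /** Returns the m/z array, parsing again if the GC cleared the soft reference. */
        synchronized double[] mz() {
            double[] mz = mzCache.get();
            if (mz == null) {
                mz = parser.apply(num);
                mzCache = new SoftReference<>(mz);
            }
            return mz;
        }
    }

    // Global index by retention time, plus one scan-number index per MS level.
    private final TreeMap<Double, Scan> byRt = new TreeMap<>();
    private final Map<Integer, TreeMap<Integer, Scan>> byLevel = new TreeMap<>();

    void add(Scan s) {
        byRt.put(s.rtMinutes, s);
        byLevel.computeIfAbsent(s.msLevel, k -> new TreeMap<>()).put(s.num, s);
    }

    private TreeMap<Integer, Scan> level(int msLevel) {
        return byLevel.getOrDefault(msLevel, new TreeMap<>());
    }

    private static Scan valueOrNull(Map.Entry<?, Scan> e) {
        return e == null ? null : e.getValue();
    }

    /** Next scan at the same MS level, or null if there is none. */
    Scan nextAtSameLevel(Scan s) {
        return valueOrNull(level(s.msLevel).higherEntry(s.num));
    }

    /** Previous scan at the same MS level, or null if there is none. */
    Scan prevAtSameLevel(Scan s) {
        return valueOrNull(level(s.msLevel).lowerEntry(s.num));
    }

    /** Assumed parent rule: closest preceding scan one MS level lower. */
    Scan parentOf(Scan s) {
        return valueOrNull(level(s.msLevel - 1).lowerEntry(s.num));
    }

    /** Latest scan acquired at or before the given retention time. */
    Scan atOrBeforeRt(double rtMinutes) {
        return valueOrNull(byRt.floorEntry(rtMinutes));
    }

    public static void main(String[] args) {
        IntFunction<double[]> fakeParser = n -> new double[] {100.0 + n};
        RunIndexSketch idx = new RunIndexSketch();
        Scan ms1 = new Scan(1, 1, 0.10, fakeParser);
        Scan ms2 = new Scan(2, 2, 0.11, fakeParser);
        idx.add(ms1);
        idx.add(ms2);
        System.out.println("parent of scan 2 is scan " + idx.parentOf(ms2).num);
        System.out.println("first m/z of scan 2: " + ms2.mz()[0]);
    }
}
```

In this sketch only scan numbers, MS levels, and retention times are held strongly, so the index stays small even for a large run, and the garbage collector is free to reclaim peak arrays under memory pressure.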

Binary distribution

Get jars from Maven Central.
Some older pre-compiled binaries can be found here.

Building with Maven (preferred)

cd ./MSFileToolbox && mvn clean package
This produces the library jar msftbx-X.X.X.jar as well as one large jar, msftbx-X.X.X-jar-with-dependencies.jar. The latter can be used as is; it includes all required dependencies.

Building a NetBeans Platform module

NetBeans Module: Open the root directory in NetBeans as a project. You will see the MSFTBX module suite, which consists of 3 modules: MSFileToolbox Module (the main library), MSFileToolbox Libx (the dependencies), and Auto Update (MSFTBX) (the update center for NetBeans Platform projects, which you almost certainly don't need).

Dependencies

  • SLF4J
  • Google Guava
  • Apache Commons Pool 2
  • OboParser from Biojava's submodule Ontology
  • Javolution Core (slightly modified, sources are here, this modified dependency is published on Maven Central)
