MSFTBX stands for Mass Spectrometry File Toolbox. It is a Java library for reading common mass-spectrometry and proteomics data formats:
- mzML
- mzXML
- pepXML/pep.xml
- protXML/prot.xml
- mzIdentML
- cef (Agilent)
- GPMdb XML
This library is what drives BatMass.
To use the library in a Maven project, add this dependency:

    <dependency>
        <groupId>com.github.chhh</groupId>
        <artifactId>msftbx</artifactId>
        <version>1.3.1</version>
    </dependency>
To get started quickly, follow the tutorial: http://www.batmass.org/tutorial/data-access-layer/#parsing-lc-ms-data-mzml-mzxml-files
- Parsers for mzML/mzXML with a unified API
  - Very fast, multi-threaded parsing
  - Rich, standardized API for the contents of those files (scan and run meta-info, not just spectra)
  - msNumpress compression support for mzML
  - Automated LC/MS run structure determination:
    - Data structures for parent-child relationships between spectra
    - Indexes for scans based on scan numbers and retention times, both globally and for each MS level separately
    - Convenient methods to get the next or previous scan at the same MS level
  - Tolerant to malformed data
    - Can handle MS2 scan tags nested inside MS1 scans
    - Tolerant to a missing or broken file index
      - Reindexing on the fly
  - Memory management
    - Automated spectra parsing on demand
      - You can parse just the structure of an LC/MS run without the spectral data, in which case the memory footprint is very small. Spectra are parsed only when they are requested.
    - Soft referencing of spectral data, so the garbage collector can reclaim it
    - Tracking of which loaded data is no longer used by any component, with automated unloading
- Upcoming support for Thermo RAW files on Windows
- pepXML parser and writer
- protXML parser and writer
- mzIdentML parser
- GPMdb XML files parser
- Agilent .cef files parser
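The run structure features listed above (parent-child links, per-level indexes, next/previous navigation) can be pictured with a small sketch. This is not the MSFTBX API: `ScanIndexSketch`, `Scan`, and every method name here are hypothetical, and the rule "parent is the most recent scan one MS level up" is a simplifying assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only: hypothetical types, NOT the MSFTBX API.
class ScanIndexSketch {

    /** Minimal scan header: scan number, MS level, retention time, parent/children links. */
    static final class Scan {
        final int num;
        final int msLevel;
        final double rtMinutes;
        Scan parent;                                  // stays null for MS1 scans
        final List<Scan> children = new ArrayList<>();

        Scan(int num, int msLevel, double rtMinutes) {
            this.num = num;
            this.msLevel = msLevel;
            this.rtMinutes = rtMinutes;
        }
    }

    // One global index by scan number, plus number and RT indexes per MS level.
    private final TreeMap<Integer, Scan> allByNum = new TreeMap<>();
    private final Map<Integer, TreeMap<Integer, Scan>> levelByNum = new HashMap<>();
    private final Map<Integer, TreeMap<Double, Scan>> levelByRt = new HashMap<>();

    /** Adds a scan to all indexes and links it to its assumed parent. */
    void add(Scan scan) {
        allByNum.put(scan.num, scan);
        levelByNum.computeIfAbsent(scan.msLevel, k -> new TreeMap<>()).put(scan.num, scan);
        levelByRt.computeIfAbsent(scan.msLevel, k -> new TreeMap<>()).put(scan.rtMinutes, scan);
        if (scan.msLevel > 1) {
            // Assumption: the parent is the closest preceding scan one level up.
            TreeMap<Integer, Scan> upper = levelByNum.get(scan.msLevel - 1);
            Map.Entry<Integer, Scan> e = (upper == null) ? null : upper.lowerEntry(scan.num);
            if (e != null) {
                scan.parent = e.getValue();
                e.getValue().children.add(scan);
            }
        }
    }

    /** Looks a scan up in the global index. */
    Scan get(int num) {
        return allByNum.get(num);
    }

    /** Next scan at the same MS level, or null; the scan must have been added. */
    Scan nextAtSameLevel(Scan scan) {
        Map.Entry<Integer, Scan> e = levelByNum.get(scan.msLevel).higherEntry(scan.num);
        return (e == null) ? null : e.getValue();
    }

    /** Previous scan at the same MS level, or null; the scan must have been added. */
    Scan prevAtSameLevel(Scan scan) {
        Map.Entry<Integer, Scan> e = levelByNum.get(scan.msLevel).lowerEntry(scan.num);
        return (e == null) ? null : e.getValue();
    }

    /** Scan at the given MS level whose retention time is closest to rt, or null. */
    Scan closestByRt(int msLevel, double rt) {
        TreeMap<Double, Scan> idx = levelByRt.get(msLevel);
        if (idx == null) return null;
        Map.Entry<Double, Scan> lo = idx.floorEntry(rt);
        Map.Entry<Double, Scan> hi = idx.ceilingEntry(rt);
        if (lo == null) return (hi == null) ? null : hi.getValue();
        if (hi == null) return lo.getValue();
        return (rt - lo.getKey() <= hi.getKey() - rt) ? lo.getValue() : hi.getValue();
    }

    public static void main(String[] args) {
        ScanIndexSketch idx = new ScanIndexSketch();
        idx.add(new Scan(1, 1, 0.10));
        idx.add(new Scan(2, 2, 0.11));
        idx.add(new Scan(3, 1, 0.20));
        Scan next = idx.nextAtSameLevel(idx.get(1));
        System.out.println("Next MS1 scan after scan 1: " + next.num);
    }
}
```

Keeping a separate sorted map per MS level makes "next scan at the same level" a single `higherEntry` lookup instead of a linear walk over interleaved MS1 and MS2 scans.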
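The memory management items above (parsing spectra on demand, soft referencing, unloading) follow a common Java pattern that can be sketched with `java.lang.ref.SoftReference`. The class and method names below are hypothetical and are not taken from MSFTBX; the `parser` function simply stands in for whatever reads a spectrum from the file.

```java
import java.lang.ref.SoftReference;
import java.util.function.IntFunction;

// Illustrative sketch only: hypothetical types, NOT the MSFTBX API.
class LazySpectrumSketch {

    /** Peak data for one scan. */
    static final class Spectrum {
        final double[] mz;
        final double[] intensity;

        Spectrum(double[] mz, double[] intensity) {
            this.mz = mz;
            this.intensity = intensity;
        }
    }

    /** Scan whose spectrum is parsed on first access and held through a SoftReference. */
    static final class LazyScan {
        final int num;
        private final IntFunction<Spectrum> parser;   // stand-in for reading from the file
        private SoftReference<Spectrum> ref = new SoftReference<>(null);
        int parseCount = 0;                           // counts how often the parser ran

        LazyScan(int num, IntFunction<Spectrum> parser) {
            this.num = num;
            this.parser = parser;
        }

        /** Returns the spectrum, parsing it again if it was never loaded or was reclaimed. */
        synchronized Spectrum getSpectrum() {
            Spectrum s = ref.get();
            if (s == null) {
                s = parser.apply(num);
                parseCount++;
                ref = new SoftReference<>(s);
            }
            return s;
        }

        /** True while the spectrum is still reachable through the soft reference. */
        synchronized boolean isLoaded() {
            return ref.get() != null;
        }

        /** Drops the cached spectrum explicitly, e.g. once no component uses it. */
        synchronized void unload() {
            ref.clear();
        }
    }

    public static void main(String[] args) {
        LazyScan scan = new LazyScan(7,
                n -> new Spectrum(new double[] {100.0 + n}, new double[] {1.0}));
        System.out.println("Loaded before access: " + scan.isLoaded());
        scan.getSpectrum();
        System.out.println("Loaded after access: " + scan.isLoaded());
    }
}
```

A softly referenced spectrum stays cached while memory is plentiful, and the garbage collector may reclaim it under memory pressure; the next `getSpectrum()` call then re-parses it transparently.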
Get jars from Maven Central.
Some older pre-compiled binaries are also available.
To build from source with Maven:

    cd ./MSFileToolbox && mvn clean package

This produces a jar with just the library, msftbx-X.X.X.jar, as well as one large jar, msftbx-X.X.X-jar-with-dependencies.jar. The latter can be used as is; it includes all the needed dependencies.
NetBeans module: open the root directory in NetBeans as a project. You will see the MSFTBX module suite, which consists of three modules:
- MSFileToolbox Module: the main library
- MSFileToolbox Libx: the dependencies
- Auto Update (MSFTBX): the update center for NetBeans Platform projects (you definitely don't need this)
The library depends on:
- SLF4J
- Google Guava
- Apache Commons Pool 2
- OboParser from BioJava's Ontology submodule
- Javolution Core (slightly modified; the modified dependency is published on Maven Central)