Merge pull request #2 from Noble-Lab/updates-v0.0.2
Minor Updates to results class and documentation
donnyyy777 authored Dec 16, 2020
2 parents 1491f3a + f7c3d5c commit c83c5fc
Showing 9 changed files with 8,645 additions and 47 deletions.
4 changes: 2 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
<img src="https://github.com/Noble-Lab/crema/blob/master/docs/_static/crema_logo_caramel_light.png" width=300>

<img src="https://raw.githubusercontent.com/Noble-Lab/crema/master/docs/static/crema_logo.svg" width=300>
---

Confidence Estimation for Mass Spectrometry Proteomics
Binary file modified crema/__pycache__/crema.cpython-37.pyc
Binary file not shown.
Binary file modified crema/__pycache__/parsers.cpython-37.pyc
Binary file not shown.
Binary file modified crema/__pycache__/result.cpython-37.pyc
Binary file not shown.
6 changes: 2 additions & 4 deletions crema/crema.py
@@ -19,11 +19,9 @@ def main():
    params = Params().parser
    args = params.parse_args()

    # Set up output and logging files
    out_file = "crema.psm_results.txt"
    # Set up logging files
    log_file = "crema.logfile.log"
    if args.file_root is not None:
        out_file = args.file_root + out_file
        log_file = args.file_root + log_file
    if args.output_dir is None:
        args.output_dir = os.getcwd()
@@ -56,7 +54,7 @@ def main():

    # Write result to file
    logging.info("Writing to file...")
    result.write_csv(os.path.join(args.output_dir, out_file))
    result.write_file(output_dir=args.output_dir, file_root=args.file_root)

    # Calculate how long the confidence estimation took
    end_time = time.time()
14 changes: 7 additions & 7 deletions crema/parsers.py
@@ -19,13 +19,13 @@ def read_file(
    Parameters
    ----------
    input_files : str or tuple of str
        one or more tab-delimited file(s) to read
    spectrum_col : str or tuple of str
        one or more column names that identify the psm
    score_col : str
        name of the column that defines the scores (p-values) of the psms
    target_col : str
        name of the column that indicates if a psm is a target/decoy
        One or more tab-delimited file(s) to read
    spectrum_col : str or tuple of str, optional
        One or more column names that identify the psm. Defaults to "scan".
    score_col : str, optional
        Name of the column that defines the scores (p-values) of the psms. Defaults to "combined p-value".
    target_col : str, optional
        Name of the column that indicates if a psm is a target/decoy. Defaults to "target/decoy".
    Returns
    -------
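As a rough illustration of the column defaults documented in this docstring, the sketch below checks whether a tab-delimited input carries the three default column names before it is handed to a reader. The helper `missing_columns` and the constant `DEFAULT_COLUMNS` are hypothetical names introduced for this example only; they are not part of crema, and crema's own validation may differ.

```python
import csv
import io

# Default column names taken from the docstring above.
DEFAULT_COLUMNS = ("scan", "combined p-value", "target/decoy")


def missing_columns(handle, required=DEFAULT_COLUMNS, delimiter="\t"):
    """Return the required column names absent from the header row.

    Hypothetical helper for illustration; not a crema function.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader, [])  # an empty file yields an empty header
    return [name for name in required if name not in header]


# A tiny in-memory file with all three default columns present.
example = io.StringIO(
    "scan\tcombined p-value\ttarget/decoy\n"
    "1\t0.001\ttarget\n"
)
print(missing_columns(example))  # prints []
```

If the list is non-empty, the corresponding `spectrum_col`, `score_col`, or `target_col` argument would need to name the column actually present in the file.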
29 changes: 26 additions & 3 deletions crema/result.py
@@ -3,6 +3,8 @@
False Discovery Rates and Q-Values.
"""

import os


class Result:
    """
@@ -46,6 +48,27 @@ def get_col(self, col_name):
        """The column specified by the col_name as a :py:class:`pandas.DataFrame`."""
        return self._data.loc[:, col_name]

    def write_csv(self, filepath):
        """Exports the data as a CSV file to the specified filepath"""
        return self.data.to_csv(filepath)
    def write_file(self, output_dir=None, file_root=None):
        """
        Exports the data as a .txt file with the suffix "crema.psm_results.txt".

        Parameters
        ----------
        output_dir : str, optional
            The directory in which to save the files. Defaults to the current working directory if not specified.
        file_root : str, optional
            A prefix concatenated to the output result file. Defaults to None.

        Returns
        -------
        str
            The file path to the exported results file
        """
        out_file = "crema.psm_results.txt"
        if output_dir is None:
            output_dir = os.getcwd()
        if file_root is not None:
            out_file = file_root + out_file
        file_path = os.path.join(output_dir, out_file)
        self.data.to_csv(file_path)
        return file_path
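The path handling in `write_file` can be tried in isolation. The sketch below copies only the file-name logic from the method above into a standalone function; `build_result_path` is a hypothetical name used for illustration and does not exist in crema, and the example arguments are made up.

```python
import os


def build_result_path(output_dir=None, file_root=None):
    """Mirror the path logic of Result.write_file without writing anything.

    Hypothetical helper for illustration; not part of crema.
    """
    out_file = "crema.psm_results.txt"
    if output_dir is None:
        output_dir = os.getcwd()
    if file_root is not None:
        # Plain string concatenation: no separator is inserted.
        out_file = file_root + out_file
    return os.path.join(output_dir, out_file)


# The prefix is joined directly onto the fixed suffix.
print(build_result_path(output_dir="results", file_root="run1."))

# A single positional argument is treated as the output directory,
# not as the name of the output file.
print(build_result_path("save_to_here.txt"))
```

Two consequences follow from this logic: a `file_root` without a trailing separator such as `"run1"` produces `run1crema.psm_results.txt`, and a lone positional string is interpreted as `output_dir`.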
63 changes: 32 additions & 31 deletions docs/index.rst
@@ -19,22 +19,22 @@
Getting Started
---------------
**crema** produces confidence estimates for peptide detection in mass spectrometry proteomics experiments.
It takes files holding data regarding peptide-spectrum matches (PSMs) as input, executes the
desired estimation method, and produces confidence estimates of the PSMs as output.
It takes as input files holding data regarding peptide-spectrum matches (PSMs), executes the
desired estimation method, and produces as output confidence estimates of the PSMs.

Introduction
------------
One of the fundamental tasks in proteomics is detecting peptides from mass spectra that we identify through
Mass Spectrometry. We have tools that assign peptides to spectra, but unfortunately this matching is not 100% accurate -
meaning there is uncertainty about whether Peptide Spectrum Matches are real or false positives. We want to be able to
quantify this uncertainty so that we can be confident in our data and what it represents because this will further
ensure that expensive proteomics experiments use relevant and accurate data.

**crema** is a Python package that implements various methods to estimate false discovery rates (FDR) of peptide
detection in mass spectrometry proteomics experiments. Although there are many ways to estimate FDR, crema focuses on
methods that rely on the concept of target decoy competition. The sole purposes of crema is to do this, and to do this
well. As a result, we developed crema to be lightweight and flexible. It has very minimal dependencies and supports a
wide range of input and output formats. On top of that, it is extremely simple to use.
One of the fundamental tasks in mass spectrometry proteomics is detecting peptides on the basis of the observed mass
spectra. Many tools exist to assign peptides to spectra, but unfortunately this matching is never 100% accurate,
meaning that there is uncertainty about whether a given PSM is correct or a false positive. We want to be able to
quantify this uncertainty so that we can be confident in our conclusions and ensure that expensive
downstream validation experiments use relevant and accurate data.

crema is a Python package that implements various methods to estimate false discovery rates (FDR)
in mass spectrometry proteomics experiments. crema focuses on
methods that rely on the concept of "target-decoy competition." The sole purpose of crema is to do decoy-based FDR
estimation, and to do it well. As a result, crema is lightweight and flexible. It has minimal dependencies and
supports a wide range of input and output formats. On top of that, it is extremely simple to use.

Ready to try crema for your analyses? See below for details on how to install and use crema.

@@ -76,21 +76,21 @@ Simple crema analyses can be performed from the command line:
$ crema data/single_basic.csv
That's it. Giving crema nothing but the input file will force it to search for
three specific column names: "combined p-value", "scan", and "target/decoy". It will then run the
Target-Decoy Competition FDR method using the information from these columns
three specific column names: "combined p-value," "scan," and "target/decoy". The software will then run the
target-decoy competition FDR estimation method using the information from these columns
to calculate confidence estimates for the given data.

Your results will be saved in your working directory as a
csv file named `crema.psm_results.txt`. This file will contain two additional columns
(False Discovery Rate and Q-Value) that are
csv file named "crema.psm_results.txt". This file will contain two additional columns
("false discovery rate" and "q-value") that are
appended to the initial few columns specified from the input file.

For a full list of parameters, see the :doc:`Command Line Interface <cli>`.
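To make the idea of target-decoy competition concrete, here is a minimal, self-contained sketch of one common formulation. It is not crema's implementation. It assumes that lower p-values are better, that each scan keeps only its best-scoring PSM, and that FDR is estimated as (decoys + 1) / targets; whether crema applies the +1 correction or breaks ties the same way is not stated in this document. The `Psm` record and the `tdc_qvalues` function are hypothetical names used only here.

```python
from collections import namedtuple

# Hypothetical record type for illustration; not a crema class.
Psm = namedtuple("Psm", ["scan", "p_value", "is_target"])


def tdc_qvalues(psms):
    """Return (psm, fdr, q_value) tuples, best score first."""
    # Competition: keep only the lowest p-value PSM for each scan.
    best = {}
    for psm in psms:
        current = best.get(psm.scan)
        if current is None or psm.p_value < current.p_value:
            best[psm.scan] = psm
    ranked = sorted(best.values(), key=lambda p: p.p_value)

    # FDR at each rank: (decoys so far + 1) / targets so far, capped at 1.
    fdrs = []
    targets = decoys = 0
    for psm in ranked:
        if psm.is_target:
            targets += 1
        else:
            decoys += 1
        fdrs.append(min(1.0, (decoys + 1) / max(targets, 1)))

    # q-value: the smallest FDR at this rank or any worse rank.
    qvalues = []
    running_min = 1.0
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qvalues.append(running_min)
    qvalues.reverse()
    return list(zip(ranked, fdrs, qvalues))


example = [
    Psm(scan=1, p_value=0.001, is_target=True),
    Psm(scan=1, p_value=0.5, is_target=False),  # loses the competition for scan 1
    Psm(scan=2, p_value=0.002, is_target=True),
    Psm(scan=3, p_value=0.01, is_target=False),
    Psm(scan=4, p_value=0.02, is_target=True),
]
for psm, fdr, q in tdc_qvalues(example):
    print(psm.scan, psm.is_target, round(fdr, 3), round(q, 3))
```

The q-value step makes the estimates monotone: a PSM never receives a worse confidence value than one ranked below it.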

Use **crema** as a Python package
###################################

Here's a simple demonstration of how to use crema as an API:
Here is a simple demonstration of how to use crema as an API:

.. code-block:: Python
@@ -99,7 +99,7 @@ Here's a simple demonstration of how to use crema as an API:
>>> results = crema.calculate_tdc(psms)
>>> results.write_csv("save_to_here.txt")
Let's break this down and see what's really happening!
Let's break this down and see what's really happening.


First, start up the Python interpreter:
@@ -114,33 +114,34 @@ Next, import crema as a package:
>>> import crema
Call the read_file method and pass in the desired input files. In this example,
the files "data/multi_target.csv" and "data/multi_decoy.csv" are already in crux
format. Thus we do not need to specify non-default column names.
This will return a dataset object that we will save as "psms" in this example:
Call the :doc:`read_file() <api/functions>` method and pass in the desired input files. In this example,
the files "data/multi_target.csv" and "data/multi_decoy.csv" are already in the required
format. Thus we do not need to specify column names.
The :doc:`read_file() <api/functions>` method will return a :doc:`dataset <api/dataset>` object that we will save as
"psms" in this example:

.. code-block:: Python
>>> psms = crema.read_file(["data/multi_target.csv", "data/multi_decoy.csv"])
Execute the desired FDR estimation method by calling the "calculate_[algorithm]" method and
passing in the dataset object that we created above. This will return a result object that
we will save as "results" in this example:
Execute the desired FDR estimation method by calling the :doc:`calculate_[algorithm] <api/functions>` method and
passing in the dataset object that we created above. This operation will return a :doc:`result <api/result>` object that
we will save as "results":

.. code-block:: Python
>>> results = crema.calculate_tdc(psms)
Result objects contain a "write_csv" method that allows you to write your result to a csv file.
Result objects contain a :doc:`write_file() <api/result>` method that allows you to write your result to a csv file.
Your results will be saved in your working directory (unless ``output_dir`` is specified) as a
csv file named "crema.psm_results.txt", prefixed by ``file_root`` if you pass one.
This file will contain two additional columns
(False Discovery Rate and Q-Value) that are
("false discovery rate" and "q-value") that are
appended to the initial few columns specified from the input file.

.. code-block:: Python
>>> results.write_csv("save_to_here.txt")
>>> results.write_file(file_root="save_to_here.")
That's all there is to it! You've successfully used crema as an API to
calculate confidence estimates for your data!
That's all there is to it! You have successfully used crema as an API to
calculate confidence estimates for your data.