Merge pull request #2 from Noble-Lab/updates-v0.0.2

Minor Updates to results class and documentation
Noble-Lab · Dec 16, 2020 · c83c5fc
2 parents 1491f3a + f7c3d5c
commit c83c5fc

Showing 9 changed files with 8,645 additions and 47 deletions.
diff --git a/README.md b/README.md
@@ -1,5 +1,5 @@
-<img src="https://github.com/Noble-Lab/crema/blob/master/docs/_static/crema_logo_caramel_light.png" width=300>  
-
+<img src="https://raw.githubusercontent.com/Noble-Lab/crema/master/docs/static/crema_logo.svg" width=300>
+ 
 ---
 
 Confidence Estimation for Mass Spectrometry Proteomics

diff --git a/crema/__pycache__/crema.cpython-37.pyc b/crema/__pycache__/crema.cpython-37.pyc
diff --git a/crema/__pycache__/parsers.cpython-37.pyc b/crema/__pycache__/parsers.cpython-37.pyc
diff --git a/crema/__pycache__/result.cpython-37.pyc b/crema/__pycache__/result.cpython-37.pyc
diff --git a/crema/crema.py b/crema/crema.py
@@ -19,11 +19,9 @@ def main():
     params = Params().parser
     args = params.parse_args()
 
-    # Set up output and logging files
-    out_file = "crema.psm_results.txt"
+    # Set up logging files
     log_file = "crema.logfile.log"
     if args.file_root is not None:
-        out_file = args.file_root + out_file
         log_file = args.file_root + log_file
     if args.output_dir is None:
         args.output_dir = os.getcwd()
@@ -56,7 +54,7 @@ def main():
 
     # Write result to file
     logging.info("Writing to file...")
-    result.write_csv(os.path.join(args.output_dir, out_file))
+    result.write_file(output_dir=args.output_dir, file_root=args.file_root)
 
     # Calculate how long the confidence estimation took
     end_time = time.time()

diff --git a/crema/parsers.py b/crema/parsers.py
@@ -19,13 +19,13 @@ def read_file(
     Parameters
     ----------
     input_files : str or tuple of str
-        one or more tab-delimited file(s) to read
-    spectrum_col : str or tuple of str
-        one or more column names that identify the psm
-    score_col : str
-        name of the column that defines the scores (p-values) of the psms
-    target_col : str
-        name of the column that indicates if a psm is a target/decoy
+        One or more tab-delimited file(s) to read
+    spectrum_col : str or tuple of str, optional
+        One or more column names that identify the psm. Defaults to "scan".
+    score_col : str, optional
+        Name of the column that defines the scores (p-values) of the psms. Defaults to "combined p-value".
+    target_col : str, optional
+        Name of the column that indicates if a psm is a target/decoy. Defaults to "target/decoy".
 
     Returns
     -------
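
The docstring above names three default columns. As a minimal sketch (not part of crema), the check below tests whether a tab-delimited header contains those defaults before `read_file` is called without explicit column names. The helper `missing_default_columns` is hypothetical and uses only the standard library.

```python
import csv
import io

# Default column names as documented in the read_file docstring above.
DEFAULT_COLUMNS = ("scan", "combined p-value", "target/decoy")


def missing_default_columns(handle, delimiter="\t"):
    """Return the default column names absent from the header row.

    Hypothetical helper for illustration only; crema does not provide it.
    """
    header = next(csv.reader(handle, delimiter=delimiter), [])
    return [name for name in DEFAULT_COLUMNS if name not in header]


# An in-memory stand-in for a tab-delimited PSM file.
sample = io.StringIO("scan\tcombined p-value\ttarget/decoy\n1\t0.01\t1\n")
print(missing_default_columns(sample))
```

If the returned list is non-empty, the corresponding `spectrum_col`, `score_col`, or `target_col` argument would need to be passed explicitly.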

diff --git a/crema/result.py b/crema/result.py
@@ -3,6 +3,8 @@
 False Discovery Rates and Q-Values.
 """
 
+import os
+
 
 class Result:
     """
@@ -46,6 +48,27 @@ def get_col(self, col_name):
         """The column specified by the col_name as a  :py:class:`pandas.DataFrame`."""
         return self._data.loc[:, col_name]
 
-    def write_csv(self, filepath):
-        """Exports the data as a CSV file to the specified filepath"""
-        return self.data.to_csv(filepath)
+    def write_file(self, output_dir=None, file_root=None):
+        """
+        Exports the data as a .txt file with the suffix "crema.psm_results.txt".
+
+        Parameters
+        ----------
+        output_dir : str, optional
+            The directory in which to save the files. Defaults to the current working directory if not specified.
+        file_root : str, optional
+            A prefix concatenated to the output result file. Defaults to none.
+
+        Returns
+        -------
+        str
+            The file path to the exported results file
+        """
+        out_file = "crema.psm_results.txt"
+        if output_dir is None:
+            output_dir = os.getcwd()
+        if file_root is not None:
+            out_file = file_root + out_file
+        file_path = os.path.join(output_dir, out_file)
+        self.data.to_csv(file_path)
+        return file_path
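
To make the naming behavior of the new `write_file` easier to see, the sketch below reproduces only its path logic as a standalone function. The name `results_path` is hypothetical and nothing is written to disk.

```python
import os


def results_path(output_dir=None, file_root=None):
    """Mirror the path construction in Result.write_file (illustration only)."""
    out_file = "crema.psm_results.txt"
    if output_dir is None:
        output_dir = os.getcwd()
    if file_root is not None:
        # The prefix is concatenated directly, with no separator added.
        out_file = file_root + out_file
    return os.path.join(output_dir, out_file)


print(results_path())
print(results_path(output_dir="out", file_root="run1."))
```

Two consequences follow from the signature in the diff. Because `output_dir` is the first parameter, a single positional argument is treated as the output directory, not as a file name. Because `file_root` is concatenated as-is, any separator such as a trailing "." has to be part of the prefix.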
diff --git a/docs/index.rst b/docs/index.rst
@@ -19,22 +19,22 @@
 Getting Started
 ---------------
 **crema** produces confidence estimates for peptide detection in mass spectrometry proteomics experiments.
-It takes files holding data regarding peptide-spectrum matches (PSMs) as input, executes the
-desired estimation method, and produces confidence estimates of the PSMs as output.
+It takes as input files holding data regarding peptide-spectrum matches (PSMs), executes the
+desired estimation method, and produces as output confidence estimates of the PSMs.
 
 Introduction
 ------------
-One of the fundamental tasks in proteomics is detecting peptides from mass spectra that we identify through
-Mass Spectrometry. We have tools that assign peptides to spectra, but unfortunately this matching is not 100% accurate -
-meaning there is uncertainty about whether Peptide Spectrum Matches are real or false positives. We want to be able to
-quantify this uncertainty so that we can be confident in our data and what it represents because this will further
-ensure that expensive proteomics experiments use relevant and accurate data.
-
-**crema** is a Python package that implements various methods to estimate false discovery rates (FDR) of peptide
-detection in mass spectrometry proteomics experiments. Although there are many ways to estimate FDR, crema focuses on
-methods that rely on the concept of target decoy competition. The sole purposes of crema is to do this, and to do this
-well. As a result, we developed crema to be lightweight and flexible. It has very minimal dependencies and supports a
-wide range of input and output formats. On top of that, it is extremely simple to use.
+One of the fundamental tasks in mass spectrometry proteomics is detecting peptides on the basis of the observed mass
+spectra. Many tools exist to assign peptides to spectra, but unfortunately this matching is never 100% accurate,
+meaning that there is uncertainty about whether a given PSM is correct or a false positive. We want to be able to
+quantify this uncertainty so that we can be confident in our conclusions and ensure that expensive
+downstream validation experiments use relevant and accurate data.
+
+crema is a Python package that implements various methods to estimate false discovery rates (FDR)
+in mass spectrometry proteomics experiments. crema focuses on
+methods that rely on the concept of "target-decoy competition." The sole purpose of crema is to do decoy-based FDR
+estimation, and to do it well. As a result, crema is lightweight and flexible. It has minimal dependencies and
+supports a wide range of input and output formats. On top of that, it is extremely simple to use.
 
 Ready to try crema for your analyses? See below for details on how to install and use crema.
 
@@ -76,21 +76,21 @@ Simple crema analyses can be performed from the command line:
    $ crema data/single_basic.csv
 
 That's it. Giving crema nothing but the input file will force it to search for
-three specific column names: "combined p-value", "scan", and "target/decoy". It will then run the
-Target-Decoy Competition FDR method using the information from these columns
+three specific column names: "combined p-value", "scan", and "target/decoy". The software will then run the
+target-decoy competition FDR estimation method using the information from these columns
 to calculate confidence estimates for the given data.
 
 Your results will be saved in your working directory as a
-csv file named `crema.psm_results.txt`. This file will contain two additional columns
-(False Discovery Rate and Q-Value) that are
+csv file named "crema.psm_results.txt". This file will contain two additional columns
+("false discovery rate" and "q-value") that are
 appended to the initial few columns specified from the input file.
 
 For a full list of parameters, see the :doc:`Command Line Interface <cli>`.
 
 Use **crema** as a Python package
 ###################################
 
-Here's a simple demonstration of how to use crema as an API:
+Here is a simple demonstration of how to use crema as an API:
 
 .. code-block:: Python
 
@@ -99,7 +99,7 @@ Here's a simple demonstration of how to use crema as an API:
    >>> results = crema.calculate_tdc(psms)
    >>> results.write_csv("save_to_here.txt")
 
-Let's break this down and see what's really happening!
+Let's break this down and see what's really happening.
 
 
 First, start up the Python interpreter:
@@ -114,33 +114,34 @@ Next, import crema as a package:
 
    >>> import crema
 
-Call the read_file method and pass in the desired input files. In this example,
-the files "data/multi_target.csv" and "data/multi_decoy.csv" are already in crux
-format. Thus we do not need to specify non-default column names.
-This will return a dataset object that we will save as "psms" in this example:
+Call the :doc:`read_file() <api/functions>` method and pass in the desired input files. In this example,
+the files "data/multi_target.csv" and "data/multi_decoy.csv" are already in the required
+format. Thus we do not need to specify column names.
+The :doc:`read_file() <api/functions>` method will return a :doc:`dataset <api/dataset>` object that we will save as
+"psms" in this example:
 
 .. code-block:: Python
 
    >>> psms = crema.read_file(["data/multi_target.csv", "data/multi_decoy.csv"])
 
-Execute the desired FDR estimation method by calling the "calculate_[algorithm]" method and
-passing in the dataset object that we created above. This will return a result object that
-we will save as "results" in this example:
+Execute the desired FDR estimation method by calling the :doc:`calculate_[algorithm] <api/functions>` method and
+passing in the dataset object that we created above. This operation will return a :doc:`result <api/result>` object that
+we will save as "results":
 
 .. code-block:: Python
 
    >>> results = crema.calculate_tdc(psms)
 
-Result objects contain a "write_csv" method that allows you to write your result to a csv file.
+Result objects contain a :doc:`write_file() <api/result>` method that allows you to write your result to a csv file.
 Your results will be saved in your working directory (unless otherwise specified) as a
 csv file named by the parameter you pass when calling the method.
 This file will contain two additional columns
-(False Discovery Rate and Q-Value) that are
+("false discovery rate" and "q-value") that are
 appended to the initial few columns specified from the input file.
 
 .. code-block:: Python
 
-   >>> results.write_csv("save_to_here.txt")
+   >>> results.write_file("save_to_here.txt")
 
-That's all there is to it! You've successfully used crema as an API to
-calculate confidence estimates for your data!
+That's all there is to it! You have successfully used crema as an API to
+calculate confidence estimates for your data.
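
The documentation above refers to target-decoy competition and to the "false discovery rate" and "q-value" columns without showing how such values can be derived. The following is a generic sketch of that idea, not crema's implementation. It assumes lower scores are better (as with p-values), estimates FDR as (decoys + 1) / targets, and uses the hypothetical function name `tdc_qvalues`.

```python
def tdc_qvalues(psms):
    """Estimate q-values by target-decoy competition.

    psms: iterable of (spectrum_id, score, is_target); lower score is better.
    Generic illustration only; the +1 correction is an assumption.
    """
    # Competition: keep only the best-scoring PSM for each spectrum.
    best = {}
    for spectrum, score, is_target in psms:
        if spectrum not in best or score < best[spectrum][0]:
            best[spectrum] = (score, is_target)
    ranked = sorted(best.values(), key=lambda item: item[0])

    # FDR at each score threshold: (decoys + 1) / targets, capped at 1.
    fdrs = []
    targets = decoys = 0
    for _, is_target in ranked:
        if is_target:
            targets += 1
        else:
            decoys += 1
        fdrs.append(min(1.0, (decoys + 1) / max(targets, 1)))

    # q-value: the smallest FDR at this threshold or any more permissive one.
    qvalues = []
    running_min = 1.0
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qvalues.append(running_min)
    qvalues.reverse()

    return [(score, is_target, q) for (score, is_target), q in zip(ranked, qvalues)]


example = [
    (1, 0.001, True), (1, 0.5, False),  # spectrum 1: the target wins
    (2, 0.002, True), (3, 0.003, True),
    (4, 0.004, False), (5, 0.005, True),
]
for row in tdc_qvalues(example):
    print(row)
```

The reverse running minimum makes the q-values non-decreasing as the score threshold loosens, which is what separates a q-value from the raw FDR estimate at a single threshold.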