deploy: 296f3f4

bigbio · Aug 26, 2024 · c683e2b
1 parent 7909223
Showing 1 changed file with 3 additions and 3 deletions.
diff --git a/index.html b/index.html
@@ -95,7 +95,7 @@ <h2 id="_abstract">2. Abstract</h2>
 <h2 id="_introduction">3. Introduction</h2>
 <div class="sectionbody">
 <div class="paragraph">
-<p>Many resources have emerged that provide raw or integrated proteomics data in the public domain. If these are valuable individually, their integration through re-analysis represents a huge asset for the community [1]. Unfortunately, proteomics experimental design and sample related information are often missing in public repositories or stored in very diverse ways and formats. For example, the CPTAC consortium (<a href="https://cptac-data-portal.georgetown.edu/" class="bare">https://cptac-data-portal.georgetown.edu/</a>) provides for every dataset a set of excel files with the information on each sample (e.g. <a href="https://cptac-data-portal.georgetown.edu/study-summary/S048" class="bare">https://cptac-data-portal.georgetown.edu/study-summary/S048</a>) including tumor size, origin, but also how every sample is related to a specific raw file (e.g. instrument configuration parameters). As a resource routinely re-analysing public datasets, ProteomicsDB, captures for each sample in the database a minimum number of properties to describe the sample and the related experimental protocol such as tissue, digestion method and instrument (e.g. <a href="https://www.proteomicsdb.org/#projects/4267/6228" class="bare">https://www.proteomicsdb.org/#projects/4267/6228</a>). Such heterogeneity often prevents data interpretation, reproducibility, and integration of data from different resources. This is why we propose a homogenous standard for proteomics metadata annotation. For every proteomics dataset we propose to capture at least three levels of metadata: (i) dataset description, (ii) the sample and data files related information; and (iii) the technical/proteomics specific information in standard data file formats (e.g. the PSI formats mzIdentML, mzML, or mzTab, among others).</p>
+<p>Many resources have emerged that provide raw or integrated proteomics data in the public domain. If these are valuable individually, their integration through re-analysis represents a huge asset for the community [1]. Unfortunately, proteomics experimental design and sample related information are often missing in public repositories or stored in very diverse ways and formats. For example, the CPTAC consortium (<a href="https://cptac-data-portal.georgetown.edu/" class="bare">https://cptac-data-portal.georgetown.edu/</a>) provides for every dataset a set of Excel files with the information on each sample (e.g. <a href="https://cptac-data-portal.georgetown.edu/study-summary/S048" class="bare">https://cptac-data-portal.georgetown.edu/study-summary/S048</a>) including tumor size, origin, but also how every sample is related to a specific raw file (e.g. instrument configuration parameters). As a resource routinely re-analysing public datasets, ProteomicsDB, captures for each sample in the database a minimum number of properties to describe the sample and the related experimental protocol such as tissue, digestion method and instrument (e.g. <a href="https://www.proteomicsdb.org/#projects/4267/6228" class="bare">https://www.proteomicsdb.org/#projects/4267/6228</a>). Such heterogeneity often prevents data interpretation, reproducibility, and integration of data from different resources. This is why we propose a homogenous standard for proteomics metadata annotation. For every proteomics dataset we propose to capture at least three levels of metadata: (i) dataset description, (ii) the sample and data files related information; and (iii) the technical/proteomics specific information in standard data file formats (e.g. the PSI formats mzIdentML, mzML, or mzTab, among others).</p>
 </div>
 <div class="paragraph">
 <p>The general description includes minimum information to describe the study overall: title, description, date of publication, type of experiment (e.g. <a href="http://proteomecentral.proteomexchange.org/cgi/GetDataset?ID=PXD016060.0-1&amp;outputMode=XML" class="bare">http://proteomecentral.proteomexchange.org/cgi/GetDataset?ID=PXD016060.0-1&amp;outputMode=XML</a>). The standard data files contain mostly the technical metadata associated with the dataset including search engine settings, scores, workflows, configuration files, but do not include information about the sample metadata and/or the experimental design. Currently, all ProteomeXchange partners mandate this information for each dataset. However, the information regarding the sample and its relation to the data files (<strong>Figure 1</strong>) is mostly missing [1].</p>
@@ -377,7 +377,7 @@ <h3 id="sdrf-file-standarization">8.2. SDRF-Proteomics values</h3>
 <p>Key=value representation (Human and Computer readable): The current representation aims to provide a mechanism to represent the complete information of the ontology/CV term including Accession, Name and other additional properties. In the key=value pair representation, the Value of the property is represented as an Object with multiple properties, where the key is one of the properties of the object and the value is the corresponding value for the particular key. An example of key value pairs is post-translational modification <a href="#ptms">Section 10.2.1</a></p>
 <div class="literalblock">
 <div class="content">
-<pre>NT=Glu-&gt;pyro-Glu; MT=fixed; PP=Anywhere;AC=Unimod:27; TA=E</pre>
+<pre>NT=Glu-&gt;pyro-Glu;MT=fixed;PP=Anywhere;AC=Unimod:27;TA=E</pre>
 </div>
 </div>
 </li>
@@ -1489,7 +1489,7 @@ <h2 id="_references">19. References</h2>
 </div>
 <div id="footer">
 <div id="footer-text">
-Last updated 2024-08-26 09:05:13 UTC
+Last updated 2024-08-26 19:03:27 UTC
 </div>
 </div>
 </body>
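
Note on the second hunk (8.2. SDRF-Proteomics values): the change removes the spaces after the semicolons in the key=value example. A reader of such a cell may still want to accept both spellings. The following is a minimal sketch of that, assuming pairs are separated by `;` and each key is separated from its value by the first `=`. The function name `parse_key_value` is hypothetical and is not part of the specification or of any SDRF tooling.

```python
from typing import Dict


def parse_key_value(cell: str) -> Dict[str, str]:
    """Split a 'K1=V1;K2=V2' cell into a dict (illustrative helper only)."""
    pairs: Dict[str, str] = {}
    for part in cell.split(";"):
        part = part.strip()  # tolerate 'A=1; B=2' as well as 'A=1;B=2'
        if not part:
            continue  # ignore empty segments, e.g. from a trailing ';'
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"not a key=value pair: {part!r}")
        pairs[key.strip()] = value.strip()
    return pairs


if __name__ == "__main__":
    # Rendered form of the example from the hunk ('&gt;' is '>' once decoded).
    print(parse_key_value("NT=Glu->pyro-Glu;MT=fixed;PP=Anywhere;AC=Unimod:27;TA=E"))
```

Under these assumptions, the old spelling (`NT=Glu->pyro-Glu; MT=fixed; ...`) and the new one parse to the same mapping, so the change in this commit is purely cosmetic for a whitespace-tolerant reader.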