Skip to content

Commit

Permalink
migrate v20131226 from Google Code
Browse files Browse the repository at this point in the history
  • Loading branch information
Yutaka Saito committed Dec 12, 2014
1 parent e5ebbaa commit 19426c0
Show file tree
Hide file tree
Showing 32 changed files with 2,391 additions and 1,067 deletions.
101 changes: 74 additions & 27 deletions ComMet/README
Original file line number Diff line number Diff line change
@@ -1,35 +1,40 @@
= ComMet: an HMM-based approach to detection of differentially methylated regions
== ComMet: an HMM-based approach to detection of differentially methylated regions

== Required libraries

Boost C++ libraries
http://www.boost.org/
==== Required libraries

Boost C++ libraries
http://www.boost.org/

== Build

tar xzvf ComMet-xxx.tar.gz
cd ComMet-xxx
make
==== Build

$ tar xzvf ComMet-xxx.tar.gz
$ cd ComMet-xxx
$ make

You will find the executable in src directory.


== Usage
==== Usage

For help,
$ ComMet --help

For help
ComMet --help
For test run,
$ ComMet example/example.in example/example.out1 example/example.out2

For test run
ComMet example/example.in example/example.out1 example/example.out2
For more output DMRs,
$ ComMet --threshold -5 example/example.in example/example.out1 example/example.out2

For more accurate, but slower DMR detection
ComMet --dual example/example.in example/example.out1 example/example.out2
ComMet --threshold -5 example/example.in example/example.out1 example/example.out2
ComMet --dual --threshold -5 example/example.in example/example.out1 example/example.out2
For more accurate DMRs,
$ ComMet --dual example/example.in example/example.out1 example/example.out2

Combining them,
$ ComMet --dual --threshold -5 example/example.in example/example.out1 example/example.out2

== Input format

==== Input format

See example/example.in

Expand All @@ -45,8 +50,16 @@ Col.| Description
reads supporting mC = C-C matches
reads not supporting mC = otherwise

Make sure chromosome names and genomic positions are sorted by "sort -k1,1 -k2,2n".

Note that input files do not contain strand information.
Normally, you should integrate both strands by summing the read counts at two neighbor CpGs,
i.e. the 5'-CpG-3' in the plus strand, and the neighboring 3'-GpC-5' in the minus strand.
Alternatively, if you are interested in strand-specific DMRs, you can prepare two input files
for plus and minus strands, and apply them to ComMet separately.

== Output format

==== Output format

output1 contains information of differential methylation at individual cytosines.
See example/example.out1_
Expand All @@ -59,7 +72,7 @@ Col.| Description
4 | mC ratio in sample2
5 | prob. for hypermethylation (UP) in sample1 against sample2
6 | prob. for hypomethylation (DOWN) in sample1 against sample2
7 | prob. for no methylation change (NC) between sample1 and sample2
7 | prob. for no methylation change (NoCh) between sample1 and sample2

output2 contains information of detected DMRs.
See example/example.out2_
Expand All @@ -73,29 +86,63 @@ Col.| Description
5 | log-likelihood ratio score


== Bisulfighter
==== FAQ

Q. What is the meaning of the error "distance between neighbor CpGs must not be less than 2"?
A.
Your input file contains invalid genomic positions.
By definition of CpG, the base next to C must be G, and therefore two neighbor CpGs should be
separated by at least two bases. Your input file may violate this rule for several reasons.
First, the input file may contain two neighbor CpGs from different strands,
i.e. the 5'-CpG-3' in the plus strand, and the neighboring 3'-GpC-5' in the minus strand.
See the "Input format" section above for this issue.
Second, the input file may contain cytosines in non-CpG context; just remove them.

Q. Does ComMet support DMRs in non-CpG context (CHG or CHH)?
A.
No. But we are planning to address this issue in the next version of ComMet.

Q. The read counts in the example input file are decimals rather than integers. Why?
A.
Either decimals or integers can be used for read counts in input files.
The reason that the example input file contains decimals is that some alignment tools produce
probability-weighted read counts. Of course, you can use your favorite aligners for preparing
input files that may contain integers only.


==== History

* version 1.0
- tuned the default parameters
- added the FAQ in README

* version 0.1
- implemented the algorithms described in [Saito et al., Nucleic Acids Res, 2013]


==== Bisulfighter

ComMet was first developed as the DMR detection module in Bisulfighter,
a software package for mC calling and DMR detection.
Compatibility between ComMet and the mC calling module in Bisulfighter is
kept by input-output transformation scripts in util directory.


== License
==== License

Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
http://creativecommons.org/licenses/by-nc-sa/3.0/


== References
==== References

Y. Saito, J. Tsuji, T. Mituyama,
Yutaka Saito, Junko Tsuji, and Toutai Mituyama,
Bisulfighter: accurate detection of methylated cytosines and differentially methylated regions,
submitted.
Nucleic Acids Research, accepted for publication, 2013.


== Contact
==== Contact

Yutaka SAITO
Yutaka Saito
yutaka.saito AT aist.go.jp

8 changes: 5 additions & 3 deletions ComMet/src/ComMet.cc
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@
#include <boost/program_options.hpp>
#include <boost/bind.hpp>

#include "Utility.h"
#include "FrameworkComMet.h"
#include "NaiveModel.h"
#include "CGIModel.h"
Expand All @@ -23,6 +24,8 @@ namespace po = boost::program_options;
int
main(int argc, char** argv)
{
progress(whoami());

Options opts;

// parse command line options
Expand All @@ -44,10 +47,9 @@ main(int argc, char** argv)
po::notify(vm);

if (vm.count("help") || extra_args.size()<3) {
cout << "ComMet v0.1 - identification of differentially methylated regions (DMRs)" << endl
<< endl
cout << endl
<< "Usage:" << endl
<< " " << argv[0] << " [options] input output1 output2" << endl
<< argv[0] << " [options] input output1 output2" << endl
<< endl
<< desc << endl;
return 1;
Expand Down
17 changes: 4 additions & 13 deletions ComMet/src/FrameworkComMet.cc
Original file line number Diff line number Diff line change
Expand Up @@ -23,29 +23,20 @@ add_options(po::options_description& opts)
po::value<uint>(&nitr)->default_value(100),
"set the number of training iterations")
("separate",
po::value<uint>(&dsep)->default_value(100),
po::value<uint>(&dsep)->default_value(5),
"divide procedures when CpGs are separated by this distance (Kb)")
("threshold",
po::value<float>(&thsh)->default_value(0.0),
"set the threshold for log likelihood ratio scores")
("alpha",
po::value<float>(&alpha)->default_value(8.0),
"set pseudocounts for the emission function")
//("gcpg",
// po::value<float>(&gcpg)->default_value(1.0),
// "set gamma for the CpG gain function")
//("ggap",
// po::value<float>(&ggap)->default_value(1.0),
// "set gamma for the GAP gain function")
("dual",
po::value<bool>(&dual)->zero_tokens()->default_value(false),
"use the dual model instead of the naive model")
//("viterbi",
// po::value<bool>(&dovtb)->zero_tokens()->default_value(false),
// "use Viterbi decoding instead of posterior decoding")
("noslim",
po::value<bool>(&noslim)->zero_tokens()->default_value(false),
"use straightforward implementation instead of slim one")
//("noslim",
// po::value<bool>(&noslim)->zero_tokens()->default_value(false),
// "use straightforward implementation instead of slim one")
("verbose",
po::value<bool>(&verbose)->zero_tokens()->default_value(false),
"make verbose reports to stdout")
Expand Down
17 changes: 2 additions & 15 deletions ComMet/src/FrameworkComMet.h
Original file line number Diff line number Diff line change
Expand Up @@ -30,11 +30,8 @@ struct Options
uint dsep;
float thsh;
float alpha;
static const float gcpg = 1.0;
static const float ggap = 1.0;
bool dual;
static const bool dovtb = false;
bool noslim;
static const bool noslim = false;
bool verbose;

Options() : imc_file(), omc_file(), dmr_file() {}
Expand All @@ -59,14 +56,6 @@ class App
std::cout << "alpha must be larger than 2" << std::endl;
return false;
}
else if (! (opts_.gcpg > 0.0)) {
std::cout << "gcpg must be larger than 0" << std::endl;
return false;
}
else if (! (opts_.ggap > 0.0)) {
std::cout << "ggap must be larger than 0" << std::endl;
return false;
}

return identify();
}
Expand All @@ -76,10 +65,8 @@ class App
{
bool res = false;

progress("ComMet v0.1 - identification of differentially methylated regions (DMRs)");
std::vector<Data> data;

progress("load methylated bases");
std::vector<Data> data;
res = load_data(data);
if (!res) return false;

Expand Down
1 change: 1 addition & 0 deletions ComMet/src/Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -43,6 +43,7 @@ all: $(PROGRAM1) Makefile

clean:
\rm -f *.o *~ $(PROGRAM1)
chmod 644 *.cc *.h *.hpp

$(PROGRAM1): $(OBJ1)
$(CXX) $(CPPFLAGS) -o $(PROGRAM1) $(OBJ1) $(LDFLAGS) $(LDLIBS)
Expand Down
9 changes: 9 additions & 0 deletions ComMet/src/ProbabilityModel.cc
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,15 @@ reset(const MethylList& met, ValueType alpha)
return false;
}

// dist
for (uint i=0; i!=met.pos_.size()-1; ++i) {
uint dist = met.pos_[i+1] - met.pos_[i];
if (dist < 2) {
cout << "error: distance between neighbor CpGs must not be less than 2 " << met.name_ << endl;
return false;
}
}

// For the reuse of a class instance, clear() is necessary before resize()
// because resize() a vector to the same size as the previous use DOES NOT
// overwrite its values.
Expand Down
2 changes: 1 addition & 1 deletion ComMet/src/SlimProbabilityModel.cc
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,7 @@ reset(const MethylList& met, ValueType alpha)
for (uint i=0; i!=dpsize_-1; ++i) {
Dist[i] = met.pos_[i+1] - met.pos_[i];
if (Dist[i] < 2) {
cout << "error: distance between neighbor CpGs must not be less than 2" << met.name_ << endl;
cout << "error: distance between neighbor CpGs must not be less than 2 " << met.name_ << endl;
return false;
}
}
Expand Down
11 changes: 11 additions & 0 deletions ComMet/src/Utility.cc
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,17 @@ progress(const char* message) {
cout << message << endl << flush;
}

void
progress(const string& message) {
cout << message << endl << flush;
}

string
whoami() {
string iam = "ComMet v1.0 - identification of differentially methylated regions (DMRs)";
return iam;
}

uint
round(float val) {
return (uint) (val + 0.5);
Expand Down
8 changes: 7 additions & 1 deletion ComMet/src/Utility.h
Original file line number Diff line number Diff line change
Expand Up @@ -15,12 +15,18 @@
#define PI 3.141592653589793238462643383279502884197169399375105820974944

#ifndef uint
#define uint unsigned int
#define uint unsigned int
#endif

void
progress(const char* message);

void
progress(const std::string& message);

std::string
whoami();

uint
round(float val);

Expand Down
1 change: 1 addition & 0 deletions README.txt
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,7 @@ II. PACKAGE COMPONENTS

3) bsf-diff
An optional component for DMR identification using statistical test and signal smoothing.
This program is not officially released. You can use this one but we are not going to provide details about this.

4) script
Scripts to generate simulated reads for differentially methylated
Expand Down
Loading

0 comments on commit 19426c0

Please sign in to comment.