find-knot speedup:
- too many redundant rounds of partitioning?
memory improvement for counting hash
sequence loading into both counting & hashbits
graphsize filtering with stop-big-traversal
why does stop-big-traversal still connect?
Cython integration!
pull reads into partitions?
look more assiduously for memory leaks
----
generalize counting hash to n < 8 bits per counter => memory efficiency
load-counting/bigcount loading is slooooow
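The n < 8 bits item could look roughly like the following. This is a minimal Python sketch under stated assumptions, not khmer's counting hash: the class name PackedCounts and its methods are hypothetical, counters simply saturate at 2**n - 1 (no bigcount-style overflow table), and the caller is assumed to map each k-mer to a slot index itself (e.g. hash modulo table size).

```python
class PackedCounts:
    """Hypothetical sketch: an array of n-bit saturating counters packed into bytes.

    Assumption: counters stop at 2**bits - 1; there is no overflow (bigcount) table.
    """

    def __init__(self, size, bits):
        if not 1 <= bits <= 8:
            raise ValueError("bits must be between 1 and 8")
        self.size = size
        self.bits = bits
        self.max_count = (1 << bits) - 1
        # Total bits needed for all counters, rounded up to whole bytes.
        self._data = bytearray((size * bits + 7) // 8)

    def _get_raw(self, index):
        # Read the counter bit by bit; a counter may straddle a byte boundary.
        bit_pos = index * self.bits
        value = 0
        for i in range(self.bits):
            byte, offset = divmod(bit_pos + i, 8)
            if (self._data[byte] >> offset) & 1:
                value |= 1 << i
        return value

    def _set_raw(self, index, value):
        # Write the low `bits` bits of value into the counter's slot.
        bit_pos = index * self.bits
        for i in range(self.bits):
            byte, offset = divmod(bit_pos + i, 8)
            if (value >> i) & 1:
                self._data[byte] |= 1 << offset
            else:
                self._data[byte] &= ~(1 << offset) & 0xFF

    def get(self, index):
        return self._get_raw(index)

    def increment(self, index):
        current = self._get_raw(index)
        if current < self.max_count:  # saturate instead of wrapping around
            self._set_raw(index, current + 1)

    def nbytes(self):
        return len(self._data)
```

With bits=4, a table of 1000 counters takes 500 bytes instead of 1000 at 8 bits per counter, at the cost of counts saturating at 15 and of bit-level access on every update.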
----
screed slice
screed FASTA/FASTQ output
---
fix test cleanup
Pyrex/Cython stuff
docs!
---
the N dilemma
---
put in logging.
fix tests and test cases to properly isolate/remove temp files.
fix dir(ht)
###
Semi-obsolete comments, pre-partitioning:
also, can probably arrange parallelization split & hash table size so
that modulus exactly fills split portion of table.
mismatch hashing - compute min num - each base?
copyright, C++ file comments at top; license; etc.
tests for readmask updating, etc. in various circumstances!
tests for khmer saturation vs valid reads
profiling
reading on hash functions:
http://www.burtleburtle.net/bob/hash/doobs.html
http://www.burtleburtle.net/bob/c/lookup3.c
http://www.codinghorror.com/blog/2007/12/hashtables-pigeonholes-and-birthdays.html
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1131890/
things to reference or at least look at:
http://sourceforge.net/apps/mediawiki/kmer/index.php?title=Getting_Started_with_Meryl
http://www.hgsc.bcm.tmc.edu/cascade-tech-software_bang-ti.hgsc
http://www.hgsc.bcm.tmc.edu/cascade-tech-software_cbt-ti.hgsc
http://code.google.com/p/google-sparsehash/