Segmentation Fault for RefSeq Genes #21

Open
DarioS opened this issue May 13, 2014 · 4 comments

@DarioS

DarioS commented May 13, 2014

sailfish index runs through the first few steps and then crashes with a segmentation fault:

after relabling, there are 140259 eq classes
Building the k-mer equiv. class <=> transcript mappings
0% [> ] ETA > 1 week

***** FATAL TRIGGER RECEIVED *******
Received fatal signal: SIGSEGV(11)
PID: 31416
stack dump [1] /lib/x86_64-linux-gnu/libpthread.so.0+0xf030 [0x7fa38c39e030]
stack dump [2] /verona/nobackup/dario/Sailfish-0.6.3-Linux_x86-64/lib/libsailfish_core.so+0xfbf8e [0x7fa38da5df8e]
stack dump [3] /verona/nobackup/dario/Sailfish-0.6.3-Linux_x86-64/lib/libsailfish_core.so+0xfc6cd [0x7fa38da5e6cd]
stack dump [4] /verona/nobackup/dario/Sailfish-0.6.3-Linux_x86-64/lib/libtbb.so.2+0x2203a [0x7fa38bf6803a]
stack dump [5] /verona/nobackup/dario/Sailfish-0.6.3-Linux_x86-64/lib/libtbb.so.2+0x1dd96 [0x7fa38bf63d96]
stack dump [6] /verona/nobackup/dario/Sailfish-0.6.3-Linux_x86-64/lib/libtbb.so.2+0x1d45b [0x7fa38bf6345b]
stack dump [7] /verona/nobackup/dario/Sailfish-0.6.3-Linux_x86-64/lib/libtbb.so.2+0x1ae5f [0x7fa38bf60e5f]
stack dump [8] /verona/nobackup/dario/Sailfish-0.6.3-Linux_x86-64/lib/libtbb.so.2+0x1b059 [0x7fa38bf61059]
stack dump [9] /lib/x86_64-linux-gnu/libpthread.so.0+0x6b50 [0x7fa38c395b50]
stack dump [10] /lib/x86_64-linux-gnu/libc.so.6clone+0x6d [0x7fa38b2af0ed]

***** RETHROWING SIGNAL SIGSEGV(11)

g2log exiting after receiving fatal event

This problem does not happen with the sample data transcripts file.

My command was:

$ /verona/nobackup/dario/Sailfish-0.6.3-Linux_x86-64/bin/sailfish index -k 30 -t data/sequence/hg19/hg19genes.fasta -o hg19

The transcripts file, which I obtained from the UCSC Genome Browser, looks like this:

$ head data/sequence/hg19/hg19genes.fasta

>hg19_refGene_NM_032291 range=chr1:66999825-67210768 5'pad=0 3'pad=0 strand=+ repeatMasking=none
TTTCTCTCAGCATCTTCTTGGTAGCCTGCCTGTAGGTGAAGAAGCACCAG
CAGCATCCATGGCCTGTCTTTTGGCTTAACACTTATCTCCTTTGGCTTTG
ACAGCGGACGGAATAGACCTCAGCAGCGGCGTGGTGAGGACTTAGCTGGG
ACCTGGAATCGTATCCTCCTGTGTTTTTTCAGACTCCTTGGAAATTAAGG
AATGCAATTCTGCCACCATGATGGAAGGATTGAAAAAACGTACAAGGAAG
GCCTTTGGAATACGGAAGAAAGAAAAGGACACTGATTCTACAGGTTCACC
AGATAGAGATGGAATTCAGCCCAGCCCACACGAACCACCCTACAATAGCA
AAGCAGAGTGTGCGCGTGAAGGAGGAAAAAAAGTTTCGAAGAAAAGCAAT
GGGGCACCAAATGGATTTTATGCGGAAATTGATTGGGAAAGATATAACTC

If I rerun the same index command, Sailfish behaves as if the index had been built successfully:

Checking that jellyfish hash is up to date
All index files seem up-to-date.

I ran quant next, but it also crashes, which suggests that the index was not completely created:

Creating optimizer . . .done
optimizing using iterative optimization [1000] iterations
reading Kmer equivalence classes
updating Kmer group counts
updating transcript map

***** FATAL TRIGGER RECEIVED *****
Received fatal signal: SIGSEGV(11)
PID: 32239
stack dump [1] /lib/x86_64-linux-gnu/libpthread.so.0+0xf030 [0x7fcaf0aeb030]
stack dump [2] /verona/nobackup/dario/Sailfish-0.6.3-Linux_x86-64/bin/sailfish : CollapsedIterativeOptimizer::prepareCollapsedMaps_(std::string const&, bool)+0x223 [0x460393]
stack dump [3] /verona/nobackup/dario/Sailfish-0.6.3-Linux_x86-64/bin/sailfish : CollapsedIterativeOptimizer::initialize_(std::string const&, std::string const&, std::string const&, bool)+0xb2 [0x460932]
stack dump [4] /verona/nobackup/dario/Sailfish-0.6.3-Linux_x86-64/bin/sailfish : CollapsedIterativeOptimizer::optimize(std::string const&, std::string const&, std::string const&, unsigned long, double, double)+0x4a [0x46838a]
stack dump [5] /verona/nobackup/dario/Sailfish-0.6.3-Linux_x86-64/bin/sailfish : runIterativeOptimizer(int, char**)+0x17a1 [0x44ed21]
stack dump [6] /verona/nobackup/dario/Sailfish-0.6.3-Linux_x86-64/lib/libsailfish_core.so+0xdfe32 [0x7fcaf218ee32]
stack dump [7] /verona/nobackup/dario/Sailfish-0.6.3-Linux_x86-64/lib/libstdc++.so.6+0xb4210 [0x7fcaf01f7210]
stack dump [8] /lib/x86_64-linux-gnu/libpthread.so.0+0x6b50 [0x7fcaf0ae2b50]
stack dump [9] /lib/x86_64-linux-gnu/libc.so.6clone+0x6d [0x7fcaef9fc0ed]

***** RETHROWING SIGNAL SIGSEGV(11)
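
One way to avoid the stale-index situation described above is to delete the output directory whenever the index step fails, so that a rerun starts from scratch. The following is only a sketch in Python, not something Sailfish provides. The function name is hypothetical, and it assumes (judging only from the "All index files seem up-to-date" message) that the up-to-date check looks at files left in the output directory.

```python
import shutil
import subprocess


def run_and_clean_on_failure(cmd, out_dir):
    """Run cmd; if it exits abnormally, delete out_dir and its contents.

    On POSIX systems a process killed by a signal (for example SIGSEGV)
    is reported by subprocess with a negative return code, so any
    non-zero value is treated as a failure here.
    """
    result = subprocess.run(cmd)
    if result.returncode != 0:
        # Assumption: everything in out_dir belongs to the failed run.
        shutil.rmtree(out_dir, ignore_errors=True)
    return result.returncode


# Example use with the command from this report (not executed here):
#   run_and_clean_on_failure(
#       ["sailfish", "index", "-k", "30",
#        "-t", "data/sequence/hg19/hg19genes.fasta", "-o", "hg19"],
#       "hg19",
#   )
```

Only point this at a directory that holds nothing but the index, since the whole directory is removed on failure.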

@kingsfordgroup
Owner

Hi DarioS,

Thanks for reporting this. This error looks very similar to a previous (closed) issue. In that case the crash was caused by duplicate transcripts in the input file and was resolved by removing the duplicates. As I commented in that ticket, crashing is obviously not the desired behavior; the suggested behavior for future versions is to ignore all but one of the duplicates, report their existence to the user, and continue building the index.
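
The behavior suggested above (keep one copy of each duplicate, report the rest, and carry on) can also be applied to the FASTA file before indexing. The following is a minimal Python sketch, not part of Sailfish. The function names are made up for illustration, and it assumes that "duplicate" means a repeated transcript name (the first word of the header line), with an optional check for identical sequences.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def drop_duplicates(records, by_sequence=False):
    """Keep the first record per transcript name; return (kept, dropped).

    With by_sequence=True, records whose sequence was already seen
    (ignoring case) are dropped as well.
    """
    seen_names, seen_seqs = set(), set()
    kept, dropped = [], []
    for header, seq in records:
        parts = header.split()
        name = parts[0] if parts else ""
        key = seq.upper()
        if name in seen_names or (by_sequence and key in seen_seqs):
            dropped.append(header)
            continue
        seen_names.add(name)
        seen_seqs.add(key)
        kept.append((header, seq))
    return kept, dropped


def format_fasta(records, width=60):
    """Render records as FASTA text with fixed-width sequence lines."""
    out = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in out)


# Example use (file names are placeholders):
#   with open("transcripts.fa") as fh:
#       kept, dropped = drop_duplicates(parse_fasta(fh.read()))
#   for header in dropped:
#       print("dropping duplicate:", header)
#   with open("transcripts.dedup.fa", "w") as fh:
#       fh.write(format_fasta(kept))
```

Be careful with by_sequence=True: it also drops distinct transcripts that happen to share an identical sequence, which may or may not be what you want.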

@jpiper
Contributor

jpiper commented Nov 26, 2014

After encountering the same issue, I looked into this in depth and have worked out why this happens.

For example, take the Ensembl GTF (Homo_sapiens.GRCh38.77.gtf) and the reference genome (Homo_sapiens.GRCh38.77.dna.toplevel.fa), both downloadable from Ensembl.

If you extract your transcripts using gffread like this:

gffread -w Homo_sapiens.GRCh38.77.transcripts.fa -g Homo_sapiens.GRCh38.77.dna.toplevel.fa Homo_sapiens.GRCh38.77.gtf

you end up with duplicate transcripts because of extraneous annotations (all of the duplicated transcripts in the output appear to be ones with selenocysteine annotations). You can get around this by first stripping the GTF file down to just the CDS and exon records:

awk '($3 == "exon" || $3 == "CDS")' Homo_sapiens.GRCh38.77.gtf > Homo_sapiens.GRCh38.77.filtered.gtf

then extract your transcripts

gffread -w Homo_sapiens.GRCh38.77.transcripts.fa -g Homo_sapiens.GRCh38.77.dna.toplevel.fa Homo_sapiens.GRCh38.77.filtered.gtf

then run sailfish

sailfish index -m Homo_sapiens.GRCh38.77.filtered.gtf -t Homo_sapiens.GRCh38.77.transcripts.fa -o Sailfish_Index_Human_GRCh38.77 -k 20

tada 🎉 !

(p.s. this is also a handy tutorial for those wanting to build a sailfish index for Ensembl GRCh38 😄 )

I can't tell whether this is a problem with gffread or expected behavior. I'm not an expert on the GTF format, but hopefully this information is helpful to others.
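
For anyone who would rather do the filtering step in Python, here is a rough equivalent of the awk command above, plus a helper that tallies feature types so you can see which annotations the filter removes. This is a sketch with hypothetical function names. It assumes a tab-separated GTF with the feature type in the third column and treats lines starting with # as comments.

```python
from collections import Counter


def filter_gtf_lines(lines, keep=("exon", "CDS")):
    """Yield only the GTF lines whose feature type (column 3) is in keep."""
    for line in lines:
        if line.startswith("#"):
            continue  # header/comment lines carry no feature
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3 and fields[2] in keep:
            yield line


def count_feature_types(lines):
    """Count how often each feature type (column 3) occurs."""
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts


# Example use with the file names from this comment (not executed here):
#   with open("Homo_sapiens.GRCh38.77.gtf") as src, \
#           open("Homo_sapiens.GRCh38.77.filtered.gtf", "w") as dst:
#       dst.writelines(filter_gtf_lines(src))
```

Running count_feature_types on the unfiltered file first is a quick way to see which feature types other than exon and CDS are present.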

@rob-p
Collaborator

rob-p commented Nov 26, 2014

Hi @jpiper,

Thanks for the detailed response here! I'm going to see if the same issue affects Salmon. It also probably makes sense (as hinted above) to flag and remove duplicates and notify the user. However, for the time being, would you mind if we used your example in our documentation (with attribution, of course)?

@jpiper
Contributor

jpiper commented Nov 26, 2014

Hey Rob, no problem whatsoever - happy to help!
