TRAnscriptomic resource of Immune cells using Long-read Sequencing (TRAILS)

Full-length transcript annotation focusing on human immune cells

Alternative splicing events are a major causal mechanism for complex traits, but they have been understudied because of the limitations of short-read sequencing. Here, we generated TRAILS, a full-length isoform annotation of human immune cells, by long-read sequencing of 29 cell subsets. TRAILS contains many unannotated transcripts and describes functional characteristics of transcripts, including encoded domains, inserted repetitive elements, cell-type-specific expression, and translational efficiency. Further, we identified a number of disease-associated isoforms by isoform-switch analysis and by integrating several quantitative trait loci analyses with genome-wide association study data. These results are available on the web and in the genome browser.

Citation

Jun Inamo, Akari Suzuki, Mahoko Takahashi Ueda, Kensuke Yamaguchi, Hiroshi Nishida, Katsuya Suzuki, Yuko Kaneko, Tsutomu Takeuchi, Hiroaki Hatano, Kazuyoshi Ishigaki, Yasushi Ishihama, Kazuhiko Yamamoto & Yuta Kochi. Long-read sequencing for 29 immune cell subsets reveals disease-linked isoforms. Nature Communications, doi:https://doi.org/10.1038/s41467-024-48615-4

Figure 1. Study design

Sequenced cell-subsets

Subset name	Abbreviation
Naïve CD4 T cells	Naïve CD4
Memory CD4 T cells	Mem CD4
Fraction I naïve regulatory T cells	FraI nTreg
Fraction II effector regulatory T cells	FraII aTreg
Fraction III non-regulatory T cells	FraIII non-Treg
LAG3+ regulatory T cells	LAG3 Treg
T helper 1 cells	Th1
T helper 2 cells	Th2
T helper 17 cells	Th17
CXCR3+/−CCR6− T cells	X3lowR6negT
T follicular helper cells	Tfh
Naïve CD8 T cells	Naïve CD8
Effector CD8 T cells	Eff CD8
Central Memory CD8 T cells	CM CD8
Effector Memory CD8 T cells	EM CD8
Naïve B cells	Naïve B
Switched memory B cells	SM B
Unswitched memory B cells	USM B
Double Negative B cells	DN B
Plasmablasts	Plasmablast
Natural Killer cells	NK
Classical Monocytes	CL Mono
NonClassical Monocytes	NC Mono
Intermediate Monocytes	Int Mono
CD16 positive Monocytes	CD16p Mono
Myeloid Dendritic cells	mDC
Plasmacytoid Dendritic cells	pDC
Neutrophils	Neu
Peripheral blood mononuclear cells	PBMC

Highlights

A database of full-length isoforms for 29 immune cell types.
Transposable elements comprise a major fraction of isoform diversity.
Alternative 3’UTR usage contributes to cell-type-specific expression of isoforms.
Integrated analysis of genetic and transcriptomic data with TRAILS reveals unknown pathogenesis of diseases.

How can users utilize TRAILS?

A user-friendly web app is available.
Users can remap their own RNA-seq datasets to TRAILS (TRAILS.gtf.gz, GRCh38) and investigate isoform expression in phenotypes of interest.
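
Before remapping, it can help to check which transcripts the annotation holds for each gene. The following is a minimal sketch, not part of the TRAILS code base, that groups transcript IDs by gene. It assumes that TRAILS.gtf.gz follows the standard nine-column GTF layout with gene_id and transcript_id attributes; the function names transcripts_per_gene and load_gtf are hypothetical and chosen only for illustration.

```python
import gzip
import re
from collections import defaultdict

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def transcripts_per_gene(lines):
    """Group transcript IDs by gene ID from an iterable of GTF lines.

    Assumes the standard nine-column GTF layout with gene_id and
    transcript_id attributes (an assumption about TRAILS.gtf.gz).
    """
    genes = defaultdict(set)
    for line in lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        if "gene_id" in attrs and "transcript_id" in attrs:
            genes[attrs["gene_id"]].add(attrs["transcript_id"])
    return genes


def load_gtf(path):
    """Open a gzip-compressed GTF file and group its transcripts by gene."""
    with gzip.open(path, "rt") as handle:
        return transcripts_per_gene(handle)
```

With the file downloaded locally, `load_gtf("TRAILS.gtf.gz")` would return a mapping from each gene ID to the set of its transcript IDs. Quantification itself still requires a read-mapping or quantification tool of the user's choice.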

Files

TRAILS.gtf.gz: GTF file of TRAILS.
isoform_info.txt.gz: Detailed information for each isoform in TRAILS.
male_PBMC.gtf.gz: GTF file of an independent dataset generated by long-read RNA sequencing on the ONT platform (PromethION R10.4.1, V14 chemistry). We used this dataset to validate isoforms in TRAILS and to investigate sex differences.
Validated_TRAILS_by_PBMC.gtf.gz: GTF file of TRAILS isoforms validated by the independent male PBMC dataset.
IsoQuant.transcript_models.gtf.gz: GTF file generated with the IsoQuant pipeline (Prjibelski, A.D., et al. Accurate isoform discovery with IsoQuant using long reads. Nat Biotechnol (2023).).
IsoQuant.transcript_models.extended_annotation.gtf.gz: GTF file generated with the IsoQuant pipeline (extended annotation).
Espresso.gtf.gz: GTF file generated with the ESPRESSO pipeline, without junction correction (Gao Y, et al. ESPRESSO: Robust discovery and quantification of transcript isoforms from error-prone long-read RNA-seq data. Sci Adv. 2023 Jan 20;9(3):eabq5072.).
Espresso_SJ.gtf.gz: GTF file generated with the ESPRESSO pipeline, with junction correction.
code/: Source code for each analysis.
Figures/: Source code for generating the figures in our paper.
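
As an illustration of how isoform_info.txt.gz might be used, here is a minimal sketch that reads the table and lists the isoforms of one gene. It assumes the file is tab-delimited with a header row whose names match the column names documented in this README (isoform, associated_gene, and so on); that layout is an assumption, and the function names are hypothetical.

```python
import csv
import gzip


def read_isoform_info(handle):
    """Read a tab-delimited table with a header row into a list of dicts.

    Assumes the header uses the column names listed in this README.
    """
    return list(csv.DictReader(handle, delimiter="\t"))


def isoforms_for_gene(rows, gene):
    """Return the isoform IDs whose associated_gene equals `gene`."""
    return [row["isoform"] for row in rows if row["associated_gene"] == gene]


def load_isoform_info(path):
    """Open the gzip-compressed table in text mode and parse it."""
    with gzip.open(path, "rt", newline="") as handle:
        return read_isoform_info(handle)
```

All values are kept as strings, because the encoding of individual columns (for example, how flags are written) is not specified here and should be checked against the file itself.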

Columns of "isoform_info.txt"

isoform: isoform ID
associated_gene: gene symbol
chrom: chromosome
start: start position
end: end position
strand: strand
5'UTR_length: length of 5’UTR region
ORF_length: length of coding region
3'UTR_length: length of 3’UTR region
polyA_motif: motif of the poly(A) signal (“no-PAS” means no canonical motif)
kozak_score: Kozak score. The consensus is G c c A/G c c atg G. The most important nucleotides are at positions +4, -3 and -6; a match at one of these scores +3 and a match at any other position scores +1 (maximum score = 13).
avg_codon_freq: codon frequency averaged across the CDS (codon table retrieved on 11/20/2014)
AU_element_count: number of AU stretches
AU_element_frac: percentage of the UTR covered by AU-rich elements (AREs)
max_AU_length: length of the longest A/U stretch
5'UTRcap_MFE: minimum free energy (MFE) of folding at the 5' end (for the 5'UTR; specifically affects 43S loading). It is calculated with ViennaRNA from the first 50 nt after the 5' end, or from the whole 5’UTR sequence if the 5'UTR is shorter than 50 nt.
unique_TSS: transcription start site is specific to the isoform only
unique_ORF: coding sequence is specific to the isoform only
unique_FE: no overlap with the first exon of other isoforms
unique_LE: no overlap with the last exon of other isoforms
translational_efficiency_rank: rank by translational efficiency (top10, others, or bottom10; e.g., top10 means the top 10% of translational efficiency). Translational efficiency is calculated using samples from 52 Yoruba individuals (ribo-seq [GSE61742] and RNA-seq [GEUVADIS cohort, Nature 2013;501:506–511.])
immune_genes: immune genes annotated by InnateDB
TF: transcription factors
transmembrane: transmembrane proteins
signal_peptide: isoforms containing a signal peptide sequence
IDR: isoforms containing an intrinsically disordered protein region
ANCHOR2: isoforms containing an intrinsically disordered binding region
uORF: isoforms containing an upstream open reading frame predicted by ribo-TISH (ribo-seq datasets were downloaded from GSE39561, GSE56887, GSE61742, GSE74279, GSE75290, GSE81802, and GSE97140)
predicted_NMD: isoforms predicted to undergo nonsense-mediated decay
specificity_LR: isoforms specifically expressed in any of the 29 long-read sequenced cell subsets, based on both expression and transcript ratio, using the ROKU function in the TCC package
specific_cell_LR: cell subset, among the 29 long-read sequenced cell subsets, in which the isoform is specifically expressed
specificity_LRgroup: isoforms specifically expressed in any of the 8 long-read sequenced cell groups, based on expression and transcript ratio, using the ROKU function in the TCC package
specific_cell_LRgroup: cell group, among the 8 long-read sequenced cell groups, in which the isoform is specifically expressed
- CD4T: NaiveCD4, Th1, Th2, Th17, Tfh, FraI nTreg, FraII aTreg, FraIII non-Treg, LAG3 Treg, Mem CD4, X3lowR6negT
- CD8T: NaiveCD8, Eff CD8, CM CD8, EM CD8
- B: NaiveB, USMB, SMB, DNB, plasmablast
- DC: mDC, pDC
- NK: NK
- monocyte: CL Mono, NC Mono, Int Mono, CD16p Mono
- PBMC: PBMC
- Neutrophil: Neu
repeat_elements: repetitive elements contained in the isoform
coloc_eQTL: colocalization between the eQTL signal of the associated gene and any GWAS signal
cell_disease_eQTL: cell condition and phenotype of the colocalization
coloc_sQTL: colocalization between the sQTL signal of the isoform and any GWAS signal
cell_disease_sQTL: cell condition and phenotype of the colocalization
coloc_3'aQTL: colocalization between the 3'aQTL signal of the isoform and any GWAS signal
cell_disease_3'aQTL: cell condition and phenotype of the colocalization
male_PBMC: isoforms validated by a PBMC sample (PromethION R10.4.1, V14 chemistry) from a 40-year-old male
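
The kozak_score column describes a simple weighted match against a consensus, which can be sketched in a few lines. The code below is one reading of that description and not the authors' implementation. It assumes positions are numbered relative to the A of the start codon (A = +1, with no position 0) and that positions falling outside the sequence contribute nothing; the function name kozak_score and both of those choices are assumptions.

```python
# Consensus taken from the kozak_score description: G c c A/G c c atg G.
# Keys are positions relative to the A of the start codon (A = +1, no 0).
CONSENSUS = {-6: "G", -5: "C", -4: "C", -3: "AG", -2: "C", -1: "C", 4: "G"}
# Positions +4, -3 and -6 weigh 3; all others weigh 1 (maximum 13).
WEIGHTS = {-6: 3, -5: 1, -4: 1, -3: 3, -2: 1, -1: 1, 4: 3}


def kozak_score(seq, atg_index):
    """Score the Kozak context around the ATG starting at atg_index (0-based).

    Positions that fall outside the sequence contribute 0 (an assumption).
    """
    seq = seq.upper().replace("U", "T")
    score = 0
    for pos, allowed in CONSENSUS.items():
        # Convert the biological position to a 0-based string index.
        idx = atg_index + pos if pos < 0 else atg_index + pos - 1
        if 0 <= idx < len(seq) and seq[idx] in allowed:
            score += WEIGHTS[pos]
    return score
```

For example, `kozak_score("GCCACCATGG", 6)` matches the consensus at all seven scored positions and evaluates to 13, the maximum given in the column description (three positions at +3 and four at +1).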

Contact us

Please contact us (Jun Inamo: [email protected]) with any questions or comments.

The data presented here come from the laboratory of Yuta Kochi, in collaboration with the laboratory of Yasushi Ishihama, RIKEN, and Keio University.

Acknowledgments

This study was supported by the Japan Society for the Promotion of Science, the MEXT of Japan, and grants from Nanken-Kyoten, TMDU and Medical Research Center Initiative for High Depth Omics. Computations were partially performed on the NIG supercomputer at the ROIS National Institute of Genetics.
