Histone modifications play an important role in gene regulation, and predicting gene expression from histone modification signals is a widely studied research topic.
The dataset for this competition comes from the "E047" cell type (primary T CD8+ naive cells from peripheral blood) in the Roadmap Epigenomics Mapping Consortium (REMC) database. For each gene, it provides 100 bins for each of five core histone modification marks [1]. (We divide the 10,000 base pair (bp) DNA region (+/-5000 bp) around the transcription start site (TSS) of each gene into bins of length 100 bp [2], and then count the reads in each 100 bp bin. The resulting signal for each gene has a shape of 100x5.)
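The binning described above can be sketched as follows. This is a minimal illustration, not the preprocessing actually used to build the dataset: the function name `bin_reads`, the input format (one array of read positions per mark) and the example coordinates are all assumptions made for the sketch.

```python
import numpy as np

WINDOW = 5000                       # bp on each side of the TSS
BIN_SIZE = 100                      # bp per bin
N_BINS = 2 * WINDOW // BIN_SIZE     # 100 bins
N_MARKS = 5                         # five core histone marks


def bin_reads(read_positions_per_mark, tss):
    """Count reads per 100 bp bin around `tss`; returns a (100, 5) matrix.

    `read_positions_per_mark` is a hypothetical input: a list of five
    arrays, each holding genomic read positions for one histone mark.
    """
    signal = np.zeros((N_BINS, N_MARKS), dtype=np.int64)
    for mark_idx, positions in enumerate(read_positions_per_mark):
        # Shift coordinates so the window starts at offset 0.
        offsets = np.asarray(positions, dtype=np.int64) - (tss - WINDOW)
        in_window = (offsets >= 0) & (offsets < 2 * WINDOW)
        bins = offsets[in_window] // BIN_SIZE
        signal[:, mark_idx] = np.bincount(bins, minlength=N_BINS)
    return signal


# Made-up reads for a gene whose TSS sits at position 10,000.
reads = [
    [5000, 5099, 5100, 14999, 15000, 4999],  # mark 0: two reads fall outside
    [], [], [], [],                           # marks 1-4: no reads
]
signal = bin_reads(reads, tss=10000)
print(signal.shape)
```

Each row of `signal` is one bin and each column one histone mark, which matches the 100x5 shape described above.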
The goal of this competition is to develop algorithms that accurately predict gene expression level. A high gene expression level corresponds to target label = 1, and a low gene expression level corresponds to target label = 0.
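As a rough illustration of the task, the sketch below flattens each 100x5 signal into 500 features and fits a scikit-learn logistic regression to the binary labels. The data here is synthetic and generated only so the example runs on its own; the choice of model, the `log1p` transform and the accuracy metric are assumptions for the sketch, not the approach or the evaluation metric used in the competition.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
n_genes = 400

# Synthetic stand-in for the real data: read counts of shape (genes, 100, 5).
X = rng.poisson(lam=2.0, size=(n_genes, 100, 5)).astype(float)
y = rng.randint(0, 2, size=n_genes)

# Give label-1 genes a stronger signal in mark 0 so there is something to learn.
X[y == 1, :, 0] += rng.poisson(lam=2.0, size=(int(y.sum()), 100))

# Flatten each 100x5 matrix into a 500-dimensional feature vector and
# compress the count range with log(1 + x).
X_flat = np.log1p(X.reshape(n_genes, -1))

X_train, X_test, y_train, y_test = train_test_split(
    X_flat, y, test_size=0.25, random_state=0
)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

accuracy = accuracy_score(y_test, clf.predict(X_test))
print("held-out accuracy: %.3f" % accuracy)
```

Flattening discards the bin-by-mark layout, so this is only a simple baseline; models that use the position of each bin relative to the TSS may do better on the real data.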
Competition page on Kaggle: [gene expression prediction competition](https://inclass.kaggle.com/c/gene-expression-prediction). Read more about the data: DeepChrome.
- Trying other algorithms.
- Plotting learning curves.
- Finding a way to do data augmentation.
- Switching to Deep Learning algorithms.
- [Python 2.7 & 3.5](https://www.python.org/doc/)
- scikit-learn 0.18 - The machine learning framework used
- Pia Niemala - [email protected]
- Zeinab R.Yousefi - [email protected]
- Azarkhsh Hamedi - [email protected]
- Saboktakin Hayati - [email protected]
See also the list of contributors who participated in this project.
This project is licensed under the MIT License; see the LICENSE.md file for details.
- Tampere University of Technology
- Department of Signal Processing, the SGN-41007 course, and the lecturer Heikki Huttunen, who got us to participate in this competition :).
- Hat tip to anyone whose code was used
- Inspiration
- etc