Gene_Expression_Predict_Kaggle

Histone modifications play an important role in gene regulation. Predicting gene expression from histone modification signals is a widely studied research topic.

The dataset for this competition covers the "E047" cell type (primary T CD8+ naive cells from peripheral blood) from the Roadmap Epigenomics Mapping Consortium (REMC) database. For each gene, it has 100 bins with five core histone modification marks [1]. The 10,000 base pair (bp) DNA region (+/-5000 bp) around the transcription start site (TSS) of each gene is divided into bins of length 100 bp [2], and the reads falling in each bin are counted. The signal of each gene therefore has a shape of 100x5.
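The binning described above can be sketched with NumPy. This is only an illustration under stated assumptions: the input format (read positions given as offsets from the TSS), the helper names `bin_reads` and `gene_signal`, and the random reads are all hypothetical and are not taken from the competition data or the REMC pipeline.

```python
import numpy as np

N_MARKS = 5         # five core histone modification marks
BIN_SIZE = 100      # bp per bin
HALF_WINDOW = 5000  # bp on each side of the TSS


def bin_reads(read_offsets):
    """Count reads in 100 bp bins over [-5000, +5000] bp around the TSS.

    read_offsets: read positions relative to the TSS (hypothetical input format).
    """
    # 101 bin edges give 100 bins of 100 bp each.
    edges = np.arange(-HALF_WINDOW, HALF_WINDOW + 1, BIN_SIZE)
    counts, _ = np.histogram(read_offsets, bins=edges)
    return counts


def gene_signal(offsets_per_mark):
    """Stack one binned track per histone mark into a (100, 5) array."""
    return np.stack([bin_reads(o) for o in offsets_per_mark], axis=1)


# Made-up reads: 200 random offsets for each of the five marks.
rng = np.random.default_rng(0)
fake_reads = [rng.integers(-HALF_WINDOW, HALF_WINDOW, size=200) for _ in range(N_MARKS)]
signal = gene_signal(fake_reads)
print(signal.shape)
```

Each column of `signal` is the binned track of one mark, and each row is one 100 bp bin, which matches the 100x5 shape described above.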

The goal of this competition is to develop algorithms that accurately predict gene expression level. High gene expression corresponds to target label = 1, and low gene expression corresponds to target label = 0.
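As a rough illustration of the binary classification task, the sketch below flattens each 100x5 signal into 500 features and fits a scikit-learn logistic regression. It is a minimal baseline on synthetic data, not necessarily the method used in this repository: the toy data, the log transform, the validation split, and the use of ROC AUC as the score are all assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data: n_genes signals of shape (100, 5).
rng = np.random.default_rng(0)
n_genes = 400
y = rng.integers(0, 2, size=n_genes)  # 1 = high expression, 0 = low
X = rng.poisson(lam=2.0, size=(n_genes, 100, 5)).astype(float)

# Toy assumption: highly expressed genes get extra reads in the first
# mark in the ten bins around the TSS, so there is something to learn.
X[y == 1, 45:55, 0] += 3.0

# Flatten each 100x5 signal into a 500-dimensional feature vector.
X_flat = X.reshape(n_genes, -1)

X_tr, X_va, y_tr, y_va = train_test_split(
    X_flat, y, test_size=0.25, random_state=0, stratify=y
)

# log1p compresses the range of the read counts before the linear model.
clf = LogisticRegression(max_iter=1000)
clf.fit(np.log1p(X_tr), y_tr)

# Score the held-out genes by the predicted probability of label 1.
proba = clf.predict_proba(np.log1p(X_va))[:, 1]
auc = roc_auc_score(y_va, proba)
print("validation ROC AUC:", auc)
```

With the real competition files, `X` and `y` would be loaded from the provided CSVs instead of being generated, and the same flatten-then-fit pattern would apply to other classifiers.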

Link to the competition on the Kaggle website: [gene expression prediction competition](https://inclass.kaggle.com/c/gene-expression-prediction)

Read more about the data: DeepChrome

To Do

Trying other algorithms.
Plotting the learning curve.
Finding a way to do data augmentation.
Switching to deep learning algorithms.

Built With

[Python 2.7 & 3.5](https://www.python.org/doc/)
scikit-learn 0.18 - The machine learning framework used

Authors

Pia Niemala - [email protected]
Zeinab R.Yousefi - [email protected]
Azarkhsh Hamedi - [email protected]
Saboktakin Hayati - [email protected]

See also the list of contributors who participated in this project.

License

This project is licensed under the MIT License - see the LICENSE.md file for details

Acknowledgments

Tampere University of Technology
Department of Signal Processing, the SGN-41007 course, and the lecturer Heikki Huttunen, who had us participate in this competition :).
Hat tip to anyone whose code was used
