DeepSEA is a multi-task convolutional neural network (CNN) that was shown to accurately predict large-scale chromatin-profiling data, namely TF binding, DNase I sensitivity, and histone-mark profiles (Zhou and Troyanskaya 2015). We examine several aspects of the CNN to understand whether the network learns to detect binding-site motifs and, more generally, how it arrives at its predictions. In this repository, we provide code to
- extract position weight matrices (PWMs) from the first convolutional layer, following Alipanahi et al. (2015)
- find matches between these PWMs and the JASPAR database (Sandelin et al. 2004) using the Tomtom algorithm (Gupta et al. 2007)
- evaluate filter importance, that is, how much each first-layer filter contributes to each response, using a knockout approach inspired by Maslova et al. (2019)
- extract motifs from the learned network using TF-MoDISco (Shrikumar et al. 2018)
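
To make the first step concrete, here is a minimal sketch of activation-based PWM extraction in the spirit of Alipanahi et al. (2015). It is not the code in this repository. The function names (`one_hot`, `filter_activations`, `extract_pwm`), the A, C, G, T channel order, the ReLU activation, and the threshold of half the maximum activation are all assumptions chosen for illustration.

```python
import numpy as np

ALPHABET = "ACGT"  # assumed channel order; check it against the model's encoding


def one_hot(seq):
    """Encode a DNA string as a (len(seq), 4) float array."""
    out = np.zeros((len(seq), 4))
    for i, base in enumerate(seq):
        out[i, ALPHABET.index(base)] = 1.0
    return out


def filter_activations(onehot, weights, bias=0.0):
    """ReLU activations of one filter at every valid position.

    onehot:  (N, L, 4) batch of one-hot sequences
    weights: (W, 4) weights of a single first-layer filter
    returns: (N, L - W + 1) array of activations
    """
    width = weights.shape[0]
    # windows[n, l, c, w] == onehot[n, l + w, c]
    windows = np.lib.stride_tricks.sliding_window_view(onehot, width, axis=1)
    acts = np.einsum("nlcw,wc->nl", windows, weights) + bias
    return np.maximum(acts, 0.0)


def extract_pwm(onehot, weights, bias=0.0, frac=0.5, pseudocount=1e-6):
    """Build a PWM from the subsequences that strongly activate a filter.

    Every window whose activation reaches `frac` times the maximum
    activation over the batch is aligned and counted. Returns a (W, 4)
    matrix of per-position base frequencies, or None if the filter
    never fires.
    """
    acts = filter_activations(onehot, weights, bias)
    max_act = acts.max()
    if max_act <= 0:
        return None
    seq_idx, pos_idx = np.nonzero(acts >= frac * max_act)
    width = weights.shape[0]
    counts = np.zeros((width, 4))
    for n, p in zip(seq_idx, pos_idx):
        counts += onehot[n, p:p + width]
    counts += pseudocount  # avoid zero probabilities
    return counts / counts.sum(axis=1, keepdims=True)


# Toy usage: a hand-made filter that matches "ACGT" exactly.
seqs = ["TTACGTTT", "GGGACGTG", "ACGTCCCC"]
batch = np.stack([one_hot(s) for s in seqs])
pwm = extract_pwm(batch, one_hot("ACGT"))
```

In practice the weights would come from the trained network's first convolutional layer and the batch from held-out sequences. The resulting PWMs can then be written out in a motif file format for comparison against JASPAR.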
Source: http://deepsea.princeton.edu/help/
- DeepSEA training and test data bundle from here
- Names of 919 responses from here (File location: resources/predictor.names)
Part of our code is adapted from the following repositories: