#+STARTUP lognoteredeadline
Crazy idea: what about putting the plots into a single column of a DataFrame, with extra factors for grouping them? I think we talked about a similar (but different) idea with the plot templates in visnab. Then have a geom_plot that draws the plots (with the center X/Y coming from those grouping factors) so that they do not overlap. Faceting would need to be supported, but the faceting modes currently in ggplot might not work so well, because every plot needs to be the same size, which would end up wasting space. Might want a special class for this, like PlotFrame, with an autoplot method for it. I am not sure if it would always be possible/desirable to unify the legends.

Question: is this really a coordinate system or a stat? My understanding is that a coordinate system changes the data->pixel mapping but does not change the data itself. So coord_truncate does not change the coordinates (as labeled on the axis), it just squishes stuff together in the plot. In this case, though, we need the X axis to be the same across many different origins/tracks, so the coordinates need to be transformed through a ‘stat’. Right?

It needs to be more like Gviz. If the tick labels are over 1 Mb, use Mb as the unit; else use kb, unless they are less than 1 kb, then use bp. Those long numbers are tough to read.

Focus should be on multivariate (multiple sample) plots, like ExpressionSet. This would include parallel coordinate plots (pcp) and scatterplot matrices (splom). If those plots are by-row, i.e., the variables correspond to ranges, then the data-linked-to-ranges plots would work. If the variables are the samples, the pcp/splom could be a margin plot, where each track shows something for each sample in genomic context. Or, in the case of the splom, we could use one triangle for the traditional scatterplot, and the other triangle would be something else incorporating range information.

As a first step, we could make this method behave just like autoplot,ExpressionSet. Then come up with clever ways of incorporating the range information.
Grabs cytoband information automatically

There is a similarity, I think, between the ideogram and this idea. The ideogram is drawn over the entire chromosome but then somehow it knows to draw a red rectangle around the region being plotted below. That currently works for only a single range, but it could be extended for multiple ranges. Those ranges would be assumed to be directly adjacent in the bottom track, and lines would be drawn from the rectangle sides down to the breakpoints. I think visnab did this line drawing for the ideogram (single range only).

We might need a new geom, maybe called geom_splice, that delegates to another geom (geom_ideogram, geom_alignment, etc) and then draws lines from sub-regions of the global space down to adjacent, spliced regions. The bottom end points of those lines would somehow depend on the coordinate system, while the top end points somehow use the global coordinates. For the linear coordinate system, the lines simply go to the X axis limits. We would then have a coord_splice that does the necessary removal of gaps, with the structure stored in a GRanges. coord_truncate_gaps is really just a special case of coord_splice, where the exons have been (invisibly) extended a little. So maybe we could replace that with coord_splice and add a parameter for the buffer width.
For protein space though, it sort of no longer makes sense to speak in genomic coordinates. Instead, we have protein coordinates that start at 1, so that requires a ‘stat’ transformation similar to that in stat_views. So sometimes we want a coord_splice, other times a stat_splice, depending on whether we still want global, genomic coordinates on the X axis. They should share a lot of code.
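To make the coord_splice / stat_splice distinction concrete, here is a rough sketch in Python (not R, and not ggbio code). All names (`build_layout`, `to_spliced`, `coord_style`, `stat_style`) are made up for illustration, ranges are plain 1-based inclusive `(start, end)` tuples rather than a GRanges, and `buffer` stands in for the proposed buffer width parameter. The point is only that both variants can share the same gap-removal layout and differ in what ends up on the X axis.

```python
from bisect import bisect_right


def build_layout(ranges, buffer=0):
    """Extend each (start, end) range by `buffer`, merge overlapping or
    adjacent ranges, and record each block's offset in gap-free space."""
    extended = sorted((max(1, s - buffer), e + buffer) for s, e in ranges)
    merged = []
    for s, e in extended:
        if merged and s <= merged[-1][1] + 1:
            # Overlaps or touches the previous block: grow that block.
            merged[-1][1] = max(merged[-1][1], e)
        else:
            merged.append([s, e])
    layout, offset = [], 0
    for s, e in merged:
        layout.append((s, e, offset))  # offset = spliced width before this block
        offset += e - s + 1
    return layout


def to_spliced(pos, layout):
    """Map a genomic position to a 1-based spliced position, or None if the
    position falls in a removed gap."""
    starts = [s for s, _, _ in layout]
    i = bisect_right(starts, pos) - 1
    if i < 0 or pos > layout[i][1]:
        return None
    s, _, offset = layout[i]
    return offset + (pos - s) + 1


def coord_style(positions, layout):
    """coord-like: squeeze the plotting position, keep the genomic label."""
    pairs = ((to_spliced(p, layout), p) for p in positions)
    return [(x, label) for x, label in pairs if x is not None]


def stat_style(positions, layout):
    """stat-like: replace the data with local coordinates starting at 1."""
    mapped = (to_spliced(p, layout) for p in positions)
    return [x for x in mapped if x is not None]


exons = [(100, 199), (500, 549)]
layout = build_layout(exons)            # no buffer: gaps removed entirely
padded = build_layout(exons, buffer=10) # exons (invisibly) extended a little
```

With `buffer=0` the second exon starts right after the first in spliced space; with a non-zero buffer this behaves like the coord_truncate_gaps special case described above. Drawing the connector lines for a geom_splice would additionally need each block's start and end in spliced space, which can be read off the layout tuples.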
This sounds a bit involved, but I think it’s really important for biological plotting.
What sort of data would we parse? This is probably the domain of some other package.

When using autoplot,seqinfo with cytoband = TRUE and genome names provided, the cytoband information needs to be downloaded automatically.
to autoplot,TxDb, it shows me the region containing that tx_id, instead of just showing that exact transcript. I think that’s a little surprising. Any reason why you do this?
ignores the ‘which’ argument when method = “estimate”. Btw, I fixed a bug in the coverage estimation when there were no reads on a chromosome. It is also debatable as to whether we want to use method = “estimate” by default. People do not know that it is an estimate.
does not use the kb/Mb/etc labels. The X axis label is just “Genomic Position” when method = “raw”.
need to use special formatting, otherwise the trailing zeros are dropped off, like 120.768, 120.77 [missing zero], 120.772.
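One possible way to handle both the bp/kb/Mb unit choice and the trailing zeros is sketched below in Python (not R, and not what the package currently does). The function name `position_labels` is hypothetical, and two details are assumptions: the unit is picked from the largest break, and the number of decimals is the smallest count (capped at 6) that represents every break exactly, applied uniformly to all labels.

```python
def position_labels(breaks):
    """Format genomic tick positions (given in bp) with one shared unit and
    a fixed number of decimals, so trailing zeros are not dropped."""
    largest = max(abs(b) for b in breaks)
    if largest >= 1_000_000:
        unit, scale = "Mb", 1_000_000
    elif largest >= 1_000:
        unit, scale = "kb", 1_000
    else:
        unit, scale = "bp", 1
    scaled = [b / scale for b in breaks]
    # Find the smallest number of decimals that loses nothing for any break.
    decimals = 0
    while decimals < 6 and any(round(v, decimals) != round(v, 6) for v in scaled):
        decimals += 1
    # Using the same precision for every label keeps the trailing zeros.
    return [f"{v:.{decimals}f}" for v in scaled], unit


labels, unit = position_labels([120768000, 120770000, 120772000])
print(labels, unit)
```

The unit can then go into the axis title (for example "Genomic Position (Mb)") instead of being repeated on every tick. The function assumes a non-empty list of breaks.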
it is using old scales and labels. And it would be nice to get rid of the “seqs” label over the legend. And I’m not sure if we even need the legend when using the text geom. It’s kind of weird looking.