Effect of Irrelevant Bulk RNA-Seq Sample and Selection of Optimal Projects for Test Data #15

Open
cwarden45 opened this issue Dec 13, 2022 · 1 comment

cwarden45 commented Dec 13, 2022

Hi,

Thank you very much for putting together this code.

Following a journal club presentation where I learned more about the paper and method, I would like to better understand when Bulk2Space might help and where the limits of its applicability lie.

I apologize that I am not sure how best to ask my question precisely, so I have used a few examples to give a sense of what I am asking about.

Example 1 (Exact Code for Concrete Test):

In the spirit of a GitHub “issue,” I tried to start with concrete examples for discussion based upon issue #8 .

I have attached a summary of that analysis (PDAC_Test.pdf), and I have also attached any input files not already provided on this repository.

However, when I changed the bulk RNA-Seq gene symbols so that the PDAC example and the demo1 example used the same gene symbols, I lost the Ductal cells in the PDAC example, even though that run otherwise still used only files derived from the same samples as the PDAC example. I also have some more detailed notes in the uploaded PDF.
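To make the gene-symbol matching step concrete, here is a minimal sketch of the kind of check I have in mind. It is not the code in Code.zip and it does not call Bulk2Space. The function names, the toy expression values, and the marker lists are all hypothetical and only illustrate how restricting two bulk tables to their shared gene symbols could remove the markers for one cell type.

```python
def restrict_to_shared_genes(bulk_a, bulk_b):
    """Keep only the gene symbols present in both bulk tables.

    Each table is a dict mapping gene symbol -> expression value.
    """
    shared = sorted(set(bulk_a) & set(bulk_b))
    sub_a = {gene: bulk_a[gene] for gene in shared}
    sub_b = {gene: bulk_b[gene] for gene in shared}
    return sub_a, sub_b, shared


def dropped_markers(markers_by_cell_type, shared):
    """List, per cell type, the marker genes removed by the subsetting."""
    shared_set = set(shared)
    return {
        cell_type: sorted(g for g in markers if g not in shared_set)
        for cell_type, markers in markers_by_cell_type.items()
    }


# Toy values only: these are NOT taken from the PDAC or demo1 files.
pdac_bulk = {"KRT19": 120.0, "SOX9": 45.0, "PTPRC": 10.0, "ACTB": 900.0}
demo1_bulk = {"PTPRC": 30.0, "ACTB": 850.0, "GAPDH": 700.0}

# Hypothetical marker lists, chosen only for illustration.
markers = {"Ductal": ["KRT19", "SOX9"], "Immune": ["PTPRC"]}

pdac_sub, demo1_sub, shared = restrict_to_shared_genes(pdac_bulk, demo1_bulk)
lost = dropped_markers(markers, shared)

print("Shared genes:", shared)
print("Markers lost per cell type:", lost)
```

If a check like this showed that most markers for one cell type fall outside the shared gene set, that might be one possible explanation for a cell type disappearing after the subsetting, although I have not confirmed that this is what happened here.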

Nevertheless, if that might possibly help the discussion, I have provided those.

If there are any other relatively small files that it would help to upload to GitHub, then I would also be happy to add those. For example, I also ran the analysis with epoch_num=1000 instead of epoch_num=3500. I am currently not providing those results, but my impression is that they look qualitatively similar in terms of cancer cell and ductal cell assignments (for all of the provided PDAC files).

Example 2 (Theoretical Question):

Is it possible to run Bulk2Space as described below?

1) Use bulk RNA-Seq + scRNA-Seq + spatial data that all come from Patient A.

2) Export model from Patient A.

3) Provide only bulk RNA-Seq data from Patient B, and test how predictions from the model defined on Patient A compare to scRNA-Seq and spatial data generated for Patient B.

  • Additionally, if I understand correctly, an image of the tissue for Patient B cannot be provided. If so, I think the shape of the tissue section for Patient B can’t be known, and I would guess the spatial coordinates from Bulk2Space might not be directly applicable for interpreting Patient B. However, if I am misunderstanding anything, then please let me know.
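As a rough sketch of how the comparison in step 3 could be scored, assuming that cell-type labels predicted for Patient B and labels from Patient B's own scRNA-Seq are both available as plain lists: the code below does not call Bulk2Space, and the label lists and function names are hypothetical toy inputs.

```python
from collections import Counter
from math import sqrt


def proportions(labels, cell_types):
    """Fraction of cells assigned to each cell type, in a fixed order."""
    counts = Counter(labels)
    total = len(labels)
    return [counts[cell_type] / total for cell_type in cell_types]


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


cell_types = ["Cancer", "Ductal", "Immune"]

# Hypothetical labels: predictions for Patient B from a model built on
# Patient A, and labels observed in Patient B's own scRNA-Seq data.
predicted_b = ["Cancer"] * 6 + ["Ductal"] * 3 + ["Immune"] * 1
observed_b = ["Cancer"] * 5 + ["Ductal"] * 3 + ["Immune"] * 2

pred = proportions(predicted_b, cell_types)
obs = proportions(observed_b, cell_types)
r = pearson(pred, obs)

print("Predicted proportions:", pred)
print("Observed proportions:", obs)
print("Pearson r:", round(r, 3))
```

This only compares cell-type composition. It says nothing about whether the predicted spatial coordinates are meaningful for Patient B, which is the part I am most unsure about.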

Example 3 (Summary Questions):

Am I correctly understanding that consecutive slides are often used in the paper? For example, the 2 slices in Figure S17f already have different shapes, and it looks like a projection of the estimates onto the histology image for slide 2 was not (or could not be?) provided.

Data from different patients would be even more different. So, is it reasonable and/or correct to say that there is a preference to use all 3 data types generated from the same experiment? Even if the exact slice is not the same, the true composition of the multiple data types can hopefully be as close as possible?

For example, I am not sure if the difference is sufficiently extreme, but let’s say Patient A has histology like the “Inflammation” sample in Figure 6 and Patient B has histology like the “Cancer” sample in Figure 6. If you didn’t have a spatial transcriptomics (ST) dataset for Patient B, then I think use of the ST data from Patient A might not be of much benefit to Patient B. Do you think that is a fair conclusion?

Similarly, if your training sample had 90% tumor, then I would expect limitations in looking at the projection from a spatial transcriptomics project where the tissues had a very different percent tumor, such as closer to 20% tumor. I would also expect that it could often be a challenge even to know the general shape of an independent/unrelated tumor sample, and I believe that you should not be able to know the spatial information for the tumor cells within an independent tissue without a more direct measurement.
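To illustrate why I would expect a 90% versus 20% tumor difference to matter, here is a small worked sketch. It assumes a simple linear mixing model, in which bulk expression is a weighted average of a tumor profile and a non-tumor profile. That model, the gene names, and all values are my own assumptions for illustration and are not taken from the paper or the uploaded files.

```python
def expected_bulk(tumor_fraction, tumor_profile, other_profile):
    """Expected bulk expression under an assumed linear mixing model."""
    if not 0.0 <= tumor_fraction <= 1.0:
        raise ValueError("tumor_fraction must be between 0 and 1")
    return {
        gene: tumor_fraction * tumor_profile[gene]
        + (1.0 - tumor_fraction) * other_profile[gene]
        for gene in tumor_profile
    }


# Hypothetical per-cell-population profiles (arbitrary units).
tumor = {"GENE_T": 100.0, "GENE_N": 10.0}  # GENE_T: high in tumor cells
other = {"GENE_T": 5.0, "GENE_N": 80.0}    # GENE_N: high in non-tumor cells

high_purity = expected_bulk(0.9, tumor, other)
low_purity = expected_bulk(0.2, tumor, other)

for gene in tumor:
    print(gene, round(high_purity[gene], 1), round(low_purity[gene], 1))
```

Under these toy numbers the two bulk profiles differ several-fold for both genes, so a model fit to one composition would be asked to explain a quite different input at the other composition.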

I am not sure if the points above might also possibly relate to the shift in the frequency of cancer cells per spot with the reduced/matching gene symbols in the uploaded PDF for Example 1.

However, if I am understanding this correctly, then might that be at least somewhat contradictory to what I believe is a recommendation to use public data in issue #7? If I am misunderstanding anything, then please let me know.

Thank you very much for your help!

Sincerely,
Charles

Code.zip
demo1_bulk-FALSE_PDAC_LABEL.csv
demo1_bulk-FALSE_PDAC_LABEL-MATCHING_SUBSET.csv
pdac_bulk-MATCHING_SUBSET.csv

SC Cell_Type_Counts.pdf
SC Cell_Type_Correlation.pdf
ST Spot_Deconvolution.pdf
ST Cancer_Cells_per_Spot.pdf

PDAC_Test.pdf

cwarden45 changed the title from "Effect of Irrelevant Bulk RNA-Seq Sample and Applicability to Completely Independent Test Data" to "Effect of Irrelevant Bulk RNA-Seq Sample and Selection of Optimal Projects for Test Data" on Dec 13, 2022

SpaTrek (Collaborator) commented Jul 2, 2023

Thanks for your valuable questions. Your questions are so specific that we may not be able to answer them all. If you run into a problem that prevents the code from running, we will help you solve it. If your doubts concern purely scientific hypotheses, we are happy to discuss them, but we may not be able to respond quickly.
