
Model migration consultation #14

Open
yihp opened this issue Aug 27, 2024 · 13 comments

Comments

@yihp

yihp commented Aug 27, 2024

Hi! Thanks for your contribution. It is an excellent piece of work!

My task language is Chinese. I trained a Chinese tokenizer and trained the model from scratch, but I have the following questions:
Can I still use the CheXbert metrics? I am still using monitor: val_report_chexbert_f1_macro for training. Should I change to another monitor?

Thank you very much for your time and consideration. I eagerly look forward to your response.

@anicolson
Member

Hi @yihp,

Oof, unfortunately, I think you can only use CheXbert in English. Unless you can translate to English before evaluation? But you can certainly change monitor to something else.

@yihp
Author

yihp commented Aug 27, 2024

OK, which monitor do you recommend for my Chinese task?

@yihp
Author

yihp commented Aug 28, 2024

Hi @anicolson ,

I learned from your paper that CheXbert, RadGraph ER, and CXR-BERT were intended to capture the clinical semantic similarity between the generated and radiologist reports, but these models are for English tasks, so I can't reuse them. BERTScore seems able to evaluate Chinese tasks. I have the following questions:

  1. I could use BERTScore as the semantic-similarity reward, but its results in your paper are not as strong as CXR-BERT's.
  2. Because CheXbert is only applicable to English tasks, I have to change monitor: 'val_report_chexbert_f1_macro'. Do you have any suggestions for the choice of monitor? BERTScore, CIDEr, ROUGE-L, or BLEU-4?

@anicolson
Member

Hi @yihp,

I am not quite sure to be honest. Maybe you could use a Chinese BERT for BERTScore? You could modify here as such:

bert_scorer = BERTScorer(

Here are those options you mentioned for monitor:

val_report_bertscore_f1
val_report_nlg_bleu_4
val_report_nlg_cider
val_report_nlg_rouge

I pushed bertscore to the repo as well.
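To illustrate the suggestion above, here is a minimal sketch of pointing BERTScore at a Chinese encoder, assuming the bert-score package; the example reports are hypothetical, and constructing the scorer downloads the underlying model:

```python
# Sketch: configure BERTScore with a Chinese encoder instead of the
# default English model. lang="zh" selects bert-score's default Chinese BERT.
from bert_score import BERTScorer

bert_scorer = BERTScorer(
    lang="zh",                    # Chinese; alternatively pass model_type=...
    rescale_with_baseline=False,  # baselines may not exist for every model
)

# Toy candidate/reference reports (hypothetical Chinese text)
cands = ["双肺未见明显实变影。"]
refs = ["两肺未见实变。"]
P, R, F1 = bert_scorer.score(cands, refs)  # tensors of per-report scores
```

Whether bert-score's default Chinese model is appropriate for radiology text is untested; a domain-specific Chinese encoder may work better if one exists.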

@yihp
Author

yihp commented Aug 29, 2024

Hi @anicolson ,

Thank you very much for your reply.
You use val_report_chexbert_f1_macro as the monitor. I would like to ask about the specific process: do you use the trained cxrmate model to generate radiology reports, then have the CheXbert model predict the labels (14 categories), and then calculate the chexbert_f1 value against the actual labels?

Is this the process?

@anicolson
Member

Hi @yihp,

So during validation/testing, the model will generate a report. Then, the generated report and the radiologist report are passed through chexbert (giving the chexbert labels for each). Classification scores are then calculated between the chexbert labels of the generated and radiologist reports.
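The scoring step described above could be sketched as follows. This is a pure-Python illustration with toy binary labels, not the repo's code; the real pipeline obtains the 14 CheXbert observation labels by running both reports through the CheXbert model:

```python
# Illustrative sketch: given CheXbert-style binary labels for generated and
# radiologist reports, compute per-class and macro F1.

def f1(pred, true):
    """Binary F1 for one class across all reports."""
    tp = sum(p and t for p, t in zip(pred, true))
    fp = sum(p and not t for p, t in zip(pred, true))
    fn = sum(not p and t for p, t in zip(pred, true))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Rows: reports; columns: observation classes (toy example with 3 classes,
# not real CheXbert output, which has 14).
generated_labels = [[1, 0, 1], [0, 0, 1], [1, 1, 0]]
radiologist_labels = [[1, 0, 0], [0, 1, 1], [1, 1, 0]]

per_class = [
    f1([g[c] for g in generated_labels], [r[c] for r in radiologist_labels])
    for c in range(3)
]
macro_f1 = sum(per_class) / len(per_class)
```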

@yihp
Author

yihp commented Aug 29, 2024

Hi @anicolson ,

OK, I got it. I changed the tokenizer and retrained the model, and the results are as follows:

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Test metric                                               ┃ DataLoader 0          ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━┩
│ test_report_chexbert_accuracy_atelectasis                 │ 0.6504310369491577    │
│ test_report_chexbert_accuracy_cardiomegaly                │ 1.0                   │
│ test_report_chexbert_accuracy_consolidation               │ 1.0                   │
│ test_report_chexbert_accuracy_edema                       │ 1.0                   │
│ test_report_chexbert_accuracy_enlarged_cardiomediastinum  │ 0.9993842244148254    │
│ test_report_chexbert_accuracy_example                     │ 0.9673740863800049    │
│ test_report_chexbert_accuracy_fracture                    │ 1.0                   │
│ test_report_chexbert_accuracy_lung_lesion                 │ 1.0                   │
│ test_report_chexbert_accuracy_lung_opacity                │ 0.9978448152542114    │
│ test_report_chexbert_accuracy_macro                       │ 0.9673740863800049    │
│ test_report_chexbert_accuracy_micro                       │ 0.9673740863800049    │
│ test_report_chexbert_accuracy_no_finding                  │ 1.0                   │
│ test_report_chexbert_accuracy_pleural_effusion            │ 0.9910714030265808    │
│ test_report_chexbert_accuracy_pleural_other               │ 1.0                   │
│ test_report_chexbert_accuracy_pneumonia                   │ 1.0                   │
│ test_report_chexbert_accuracy_pneumothorax                │ 1.0                   │
│ test_report_chexbert_accuracy_support_devices             │ 0.9045053124427795    │
│ test_report_chexbert_f1_atelectasis                       │ 0.7660866379737854    │
│ test_report_chexbert_f1_cardiomegaly                      │ 0.0                   │
│ test_report_chexbert_f1_consolidation                     │ 0.0                   │
│ test_report_chexbert_f1_edema                             │ 0.0                   │
│ test_report_chexbert_f1_enlarged_cardiomediastinum        │ 0.0                   │
│ test_report_chexbert_f1_example                           │ 0.5966299176216125    │
│ test_report_chexbert_f1_fracture                          │ 0.0                   │
│ test_report_chexbert_f1_lung_lesion                       │ 0.0                   │
│ test_report_chexbert_f1_lung_opacity                      │ 0.0                   │
│ test_report_chexbert_f1_macro                             │ 0.08107323199510574   │
│ test_report_chexbert_f1_micro                             │ 0.7244199514389038    │
│ test_report_chexbert_f1_no_finding                        │ 0.0                   │
│ test_report_chexbert_f1_pleural_effusion                  │ 0.0                   │
│ test_report_chexbert_f1_pleural_other                     │ 0.0                   │
│ test_report_chexbert_f1_pneumonia                         │ 0.0                   │
│ test_report_chexbert_f1_pneumothorax                      │ 0.0                   │
│ test_report_chexbert_f1_support_devices                   │ 0.3689386248588562    │
│ test_report_chexbert_num_dicom_ids                        │ 2872.0                │
│ test_report_chexbert_num_study_ids                        │ 1624.0                │
│ test_report_chexbert_precision_atelectasis                │ 0.8176434636116028    │
│ test_report_chexbert_precision_cardiomegaly               │ 0.0                   │
│ test_report_chexbert_precision_consolidation              │ 0.0                   │
│ test_report_chexbert_precision_edema                      │ 0.0                   │
│ test_report_chexbert_precision_enlarged_cardiomediastinum │ 0.0                   │
│ test_report_chexbert_precision_example                    │ 0.6533148884773254    │
│ test_report_chexbert_precision_fracture                   │ 0.0                   │
│ test_report_chexbert_precision_lung_lesion                │ 0.0                   │
│ test_report_chexbert_precision_lung_opacity               │ 0.0                   │
│ test_report_chexbert_precision_macro                      │ 0.0843597799539566    │
│ test_report_chexbert_precision_micro                      │ 0.7660516500473022    │
│ test_report_chexbert_precision_no_finding                 │ 0.0                   │
│ test_report_chexbert_precision_pleural_effusion           │ 0.0                   │
│ test_report_chexbert_precision_pleural_other              │ 0.0                   │
│ test_report_chexbert_precision_pneumonia                  │ 0.0                   │
│ test_report_chexbert_precision_pneumothorax               │ 0.0                   │
│ test_report_chexbert_precision_support_devices            │ 0.3633934557437897    │
│ test_report_chexbert_recall_atelectasis                   │ 0.7206460237503052    │
│ test_report_chexbert_recall_cardiomegaly                  │ 0.0                   │
│ test_report_chexbert_recall_consolidation                 │ 0.0                   │
│ test_report_chexbert_recall_edema                         │ 0.0                   │
│ test_report_chexbert_recall_enlarged_cardiomediastinum    │ 0.0                   │
│ test_report_chexbert_recall_example                       │ 0.5752052664756775    │
│ test_report_chexbert_recall_fracture                      │ 0.0                   │
│ test_report_chexbert_recall_lung_lesion                   │ 0.0                   │
│ test_report_chexbert_recall_lung_opacity                  │ 0.0                   │
│ test_report_chexbert_recall_macro                         │ 0.07823583483695984   │
│ test_report_chexbert_recall_micro                         │ 0.6870800852775574    │
│ test_report_chexbert_recall_no_finding                    │ 0.0                   │
│ test_report_chexbert_recall_pleural_effusion              │ 0.0                   │
│ test_report_chexbert_recall_pleural_other                 │ 0.0                   │
│ test_report_chexbert_recall_pneumonia                     │ 0.0                   │
│ test_report_chexbert_recall_pneumothorax                  │ 0.0                   │
│ test_report_chexbert_recall_support_devices               │ 0.3746556341648102    │
│ test_report_cxr-bert                                      │ 0.7429220676422119    │
│ test_report_nlg_bleu_1                                    │ 0.3031856417655945    │
│ test_report_nlg_bleu_2                                    │ 0.03638414293527603   │
│ test_report_nlg_bleu_3                                    │ 0.016369516029953957  │
│ test_report_nlg_bleu_4                                    │ 0.0022414636332541704 │
│ test_report_nlg_cider                                     │ 0.04183460399508476   │
│ test_report_nlg_meteor                                    │ 0.1805824488401413    │
│ test_report_nlg_num_dicom_ids                             │ 2872.0                │
│ test_report_nlg_num_study_ids                             │ 1624.0                │
│ test_report_nlg_rouge                                     │ 0.34699246287345886   │
└───────────────────────────────────────────────────────────┴───────────────────────┘

The question is why test_report_cxr-bert is so high. Is it because CXR-BERT generalizes well to Chinese? I plan to test this.
Also, because I use val_report_chexbert_f1_macro as the monitor and my task is Chinese, the chexbert_f1 results are not meaningful as a reference. I will replace the monitor or fine-tune a Chinese CheXbert as you mentioned.

@anicolson
Member

anicolson commented Aug 29, 2024

How do the reports look? E.g., in experiments/.../trial_0/metric_outputs/reports/...

And I was suggesting a Chinese pre-trained Transformer encoder for BERTScore, not CheXbert or CXR-BERT (because I am not sure that they exist for the latter two).

@yihp
Author

yihp commented Aug 29, 2024

Another question: I don't see any code for calculating BERTScore. There is no BERTScore in the test results, only test_report_cxr-bert.

@anicolson
Member

Please pull the repo, it has been updated

@yihp
Author

yihp commented Sep 1, 2024

> How do the reports look? E.g., in experiments/.../trial_0/metric_outputs/reports/...
>
> And I was suggesting a Chinese pre-trained Transformer encoder for BERTScore, not CheXbert or CXR-BERT (because I am not sure that they exist for the latter two).

Hi @anicolson ,

Thank you very much for your reply.

The generated reports seem to be fine, but many reports generated for different dicom_ids are identical, which indicates that the model's report-generation ability is relatively poor.

I just tested the performance of CXR-BERT on Chinese, and it was very poor, which confirms that CXR-BERT is only suitable for English chest X-ray tasks. I am not sure whether a similar Chinese BERT model exists that can calculate similarity; I will run some tests.

In addition, because CheXbert is only applicable to English tasks, it is not realistic for me to retrain a CheXbert for Chinese. So do you have any suggestions for the choice of monitor for my Chinese task? Is a Chinese pre-trained Transformer encoder for BERTScore a good choice? Or other metrics?

Looking forward to your reply!

@anicolson
Member

Hi @yihp,

I guess your best starting point would be a non-model-based metric, such as a word-overlap metric that is language agnostic (I assume these fit into this category, but you will have to double-check: val_report_nlg_bleu_4, val_report_nlg_cider, val_report_nlg_rouge).

You could use this until you find a Chinese-based model that could be used as a metric perhaps.
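For reference, word-overlap metrics like ROUGE-L are language agnostic as long as the text is tokenized; for Chinese, tokens could be characters or the output of a segmenter. A minimal LCS-based sketch in the spirit of ROUGE-L (an illustration, not the repo's implementation, and the example reports are toy text):

```python
# Minimal LCS-based ROUGE-L F1 over token lists.

def lcs_len(a, b):
    """Length of the longest common subsequence of token lists a and b."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l_f1(candidate, reference):
    """ROUGE-L F-score: harmonic mean of LCS precision and recall."""
    lcs = lcs_len(candidate, reference)
    if lcs == 0:
        return 0.0
    p = lcs / len(candidate)
    r = lcs / len(reference)
    return 2 * p * r / (p + r)

# Toy example with character-level Chinese tokens
cand = list("双肺未见实变")
ref = list("两肺未见明显实变")
score = rouge_l_f1(cand, ref)  # LCS is "肺未见实变" (length 5)
```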

@yihp
Author

yihp commented Sep 2, 2024

Hi @anicolson ,

OK, I am doing experimental verification.

I have a question about eval_loss_step. In the TensorBoard training monitoring page, I only see train_loss_step, but no eval_loss_step. How should I add it?
