Model migration consultation #14
Hi @yihp, Oof, unfortunately, I think you can only use CheXbert in English. Unless you can translate to English before evaluation? But you can certainly change …
OK, which monitor do you recommend for my Chinese task?
Hi @anicolson, I learned from your paper that …
Hi @yihp, I am not quite sure, to be honest. Maybe you could use a Chinese BERT for BERTScore? You could modify it here: cxrmate/tools/metrics/bertscore.py, line 84 (commit 820607a).
Here are those options you mentioned for …

I pushed bertscore to the repo as well.
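To make the suggestion above concrete, here is a toy, self-contained sketch of the BERTScore idea: each candidate token is greedily matched to its most similar reference token by cosine similarity of embeddings. The embeddings below are made up for illustration; a real setup would take contextual embeddings from a (Chinese) BERT encoder, as this thread discusses.

```python
# Toy sketch of the BERTScore mechanism (not the repo's implementation).
# Real BERTScore uses contextual embeddings from a BERT-style encoder;
# here, 2-d vectors stand in for token embeddings.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def greedy_bertscore(cand_emb, ref_emb):
    """Precision: each candidate token matched to its best reference token;
    recall: each reference token matched to its best candidate token."""
    precision = sum(max(cosine(c, r) for r in ref_emb) for c in cand_emb) / len(cand_emb)
    recall = sum(max(cosine(c, r) for c in cand_emb) for r in ref_emb) / len(ref_emb)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Identical "embeddings" give a perfect score.
emb = [[1.0, 0.0], [0.0, 1.0]]
p, r, f = greedy_bertscore(emb, emb)
```

Swapping the English encoder for a Chinese one only changes where the embeddings come from; the matching step itself is language agnostic.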
Hi @anicolson, Thank you very much for your reply. Is this the process?
Hi @yihp, So during validation/testing, the model will generate a report. Then, the generated report and the radiologist report are passed through CheXbert (giving the CheXbert labels for each). Classification scores are then calculated between the CheXbert labels of the generated and radiologist reports.
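The classification step described above can be sketched in plain Python. This is a minimal illustration, not the repo's actual code: CheXbert in fact emits four-way labels per finding (positive/negative/uncertain/blank), simplified here to binary presence labels, and the finding columns are hypothetical.

```python
# Sketch: classification scores between CheXbert-style labels of generated
# and radiologist reports. One row per report, one 0/1 entry per finding.

def f1_from_counts(tp, fp, fn):
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

def chexbert_style_f1(gen_labels, ref_labels):
    """Return (macro_f1, micro_f1) over the finding columns."""
    n_classes = len(ref_labels[0])
    per_class = []
    tot_tp = tot_fp = tot_fn = 0
    for c in range(n_classes):
        tp = sum(1 for g, r in zip(gen_labels, ref_labels) if g[c] and r[c])
        fp = sum(1 for g, r in zip(gen_labels, ref_labels) if g[c] and not r[c])
        fn = sum(1 for g, r in zip(gen_labels, ref_labels) if not g[c] and r[c])
        per_class.append(f1_from_counts(tp, fp, fn))
        tot_tp += tp; tot_fp += fp; tot_fn += fn
    macro = sum(per_class) / n_classes   # average of per-finding F1s
    micro = f1_from_counts(tot_tp, tot_fp, tot_fn)  # pooled counts
    return macro, micro

# Two reports, two findings; the model over-predicts finding 0 once.
macro, micro = chexbert_style_f1([[1, 0], [1, 1]], [[1, 0], [0, 1]])
```

This also shows why macro F1 collapses when most per-finding F1s are zero (as in the table below) while micro F1 can stay high: macro averages per-class scores, micro pools the counts.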
Hi @anicolson, OK, I got it. I changed the tokenizer and retrained the model, and the results are as follows:
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Test metric ┃ DataLoader 0 ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ test_report_chexbert_accuracy_atelectasis │ 0.6504310369491577 │
│ test_report_chexbert_accuracy_cardiomegaly │ 1.0 │
│ test_report_chexbert_accuracy_consolidation │ 1.0 │
│ test_report_chexbert_accuracy_edema │ 1.0 │
│ test_report_chexbert_accuracy_enlarged_cardiomediastinum │ 0.9993842244148254 │
│ test_report_chexbert_accuracy_example │ 0.9673740863800049 │
│ test_report_chexbert_accuracy_fracture │ 1.0 │
│ test_report_chexbert_accuracy_lung_lesion │ 1.0 │
│ test_report_chexbert_accuracy_lung_opacity │ 0.9978448152542114 │
│ test_report_chexbert_accuracy_macro │ 0.9673740863800049 │
│ test_report_chexbert_accuracy_micro │ 0.9673740863800049 │
│ test_report_chexbert_accuracy_no_finding │ 1.0 │
│ test_report_chexbert_accuracy_pleural_effusion │ 0.9910714030265808 │
│ test_report_chexbert_accuracy_pleural_other │ 1.0 │
│ test_report_chexbert_accuracy_pneumonia │ 1.0 │
│ test_report_chexbert_accuracy_pneumothorax │ 1.0 │
│ test_report_chexbert_accuracy_support_devices │ 0.9045053124427795 │
│ test_report_chexbert_f1_atelectasis │ 0.7660866379737854 │
│ test_report_chexbert_f1_cardiomegaly │ 0.0 │
│ test_report_chexbert_f1_consolidation │ 0.0 │
│ test_report_chexbert_f1_edema │ 0.0 │
│ test_report_chexbert_f1_enlarged_cardiomediastinum │ 0.0 │
│ test_report_chexbert_f1_example │ 0.5966299176216125 │
│ test_report_chexbert_f1_fracture │ 0.0 │
│ test_report_chexbert_f1_lung_lesion │ 0.0 │
│ test_report_chexbert_f1_lung_opacity │ 0.0 │
│ test_report_chexbert_f1_macro │ 0.08107323199510574 │
│ test_report_chexbert_f1_micro │ 0.7244199514389038 │
│ test_report_chexbert_f1_no_finding │ 0.0 │
│ test_report_chexbert_f1_pleural_effusion │ 0.0 │
│ test_report_chexbert_f1_pleural_other │ 0.0 │
│ test_report_chexbert_f1_pneumonia │ 0.0 │
│ test_report_chexbert_f1_pneumothorax │ 0.0 │
│ test_report_chexbert_f1_support_devices │ 0.3689386248588562 │
│ test_report_chexbert_num_dicom_ids │ 2872.0 │
│ test_report_chexbert_num_study_ids │ 1624.0 │
│ test_report_chexbert_precision_atelectasis │ 0.8176434636116028 │
│ test_report_chexbert_precision_cardiomegaly │ 0.0 │
│ test_report_chexbert_precision_consolidation │ 0.0 │
│ test_report_chexbert_precision_edema │ 0.0 │
│ test_report_chexbert_precision_enlarged_cardiomediastinum │ 0.0 │
│ test_report_chexbert_precision_example │ 0.6533148884773254 │
│ test_report_chexbert_precision_fracture │ 0.0 │
│ test_report_chexbert_precision_lung_lesion │ 0.0 │
│ test_report_chexbert_precision_lung_opacity │ 0.0 │
│ test_report_chexbert_precision_macro │ 0.0843597799539566 │
│ test_report_chexbert_precision_micro │ 0.7660516500473022 │
│ test_report_chexbert_precision_no_finding │ 0.0 │
│ test_report_chexbert_precision_pleural_effusion │ 0.0 │
│ test_report_chexbert_precision_pleural_other │ 0.0 │
│ test_report_chexbert_precision_pneumonia │ 0.0 │
│ test_report_chexbert_precision_pneumothorax │ 0.0 │
│ test_report_chexbert_precision_support_devices │ 0.3633934557437897 │
│ test_report_chexbert_recall_atelectasis │ 0.7206460237503052 │
│ test_report_chexbert_recall_cardiomegaly │ 0.0 │
│ test_report_chexbert_recall_consolidation │ 0.0 │
│ test_report_chexbert_recall_edema │ 0.0 │
│ test_report_chexbert_recall_enlarged_cardiomediastinum │ 0.0 │
│ test_report_chexbert_recall_example │ 0.5752052664756775 │
│ test_report_chexbert_recall_fracture │ 0.0 │
│ test_report_chexbert_recall_lung_lesion │ 0.0 │
│ test_report_chexbert_recall_lung_opacity │ 0.0 │
│ test_report_chexbert_recall_macro │ 0.07823583483695984 │
│ test_report_chexbert_recall_micro │ 0.6870800852775574 │
│ test_report_chexbert_recall_no_finding │ 0.0 │
│ test_report_chexbert_recall_pleural_effusion │ 0.0 │
│ test_report_chexbert_recall_pleural_other │ 0.0 │
│ test_report_chexbert_recall_pneumonia │ 0.0 │
│ test_report_chexbert_recall_pneumothorax │ 0.0 │
│ test_report_chexbert_recall_support_devices │ 0.3746556341648102 │
│ test_report_cxr-bert │ 0.7429220676422119 │
│ test_report_nlg_bleu_1 │ 0.3031856417655945 │
│ test_report_nlg_bleu_2 │ 0.03638414293527603 │
│ test_report_nlg_bleu_3 │ 0.016369516029953957 │
│ test_report_nlg_bleu_4 │ 0.0022414636332541704 │
│ test_report_nlg_cider │ 0.04183460399508476 │
│ test_report_nlg_meteor │ 0.1805824488401413 │
│ test_report_nlg_num_dicom_ids │ 2872.0 │
│ test_report_nlg_num_study_ids │ 1624.0 │
│ test_report_nlg_rouge │ 0.34699246287345886 │

The question is why test_report_cxr-bert is so high. Is it because CXR-BERT has good Chinese generalization ability? I plan to test it.
How do the reports look? E.g., in experiments/.../trial_0/metric_outputs/reports/... Also, I was suggesting a Chinese pre-trained Transformer encoder for BERTScore, not CheXbert or CXR-BERT (because I am not sure that Chinese versions exist for the latter two).
Another question: I don't see any code for calculating BERTScore. There is no BERTScore in the test results, only test_report_cxr-bert.
Please pull the repo; it has been updated.
Hi @anicolson, Thank you very much for your reply. The generated reports seem to be fine, but many of the generated reports with different dicom_ids are identical, which indicates that the model's report-generation ability is relatively poor. Then I just tested the performance of … In addition, because … Looking forward to your reply!
Hi @yihp, I guess your best starting point would be a non-model-based metric, such as a word-overlap metric that is language agnostic (I assume these fit into this category, but you will have to double-check), e.g., val_report_nlg_bleu_4. You could use this until you find a Chinese-based model that could be used as a metric, perhaps.
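As a sanity check on why such metrics are language agnostic, here is a minimal word-overlap sketch in the spirit of BLEU-1: clipped unigram precision between a generated and a reference report. It works on any tokenisation, so Chinese reports can be scored per character without any English-specific model. The example sentences are made up.

```python
# Language-agnostic overlap metric: clipped unigram precision (BLEU-1 style).
from collections import Counter

def unigram_precision(candidate_tokens, reference_tokens):
    cand = Counter(candidate_tokens)
    ref = Counter(reference_tokens)
    # Clip each candidate count by its reference count (BLEU-style clipping),
    # so repeating a token cannot inflate the score.
    overlap = sum(min(n, ref[tok]) for tok, n in cand.items())
    return overlap / max(sum(cand.values()), 1)

# Chinese text can simply be tokenised per character:
gen = list("双肺未见异常")        # "no abnormality seen in both lungs"
ref = list("双肺纹理清晰未见异常")  # reference with extra detail
score = unigram_precision(gen, ref)
```

Full BLEU adds higher-order n-grams and a brevity penalty on top of this, but the overlap core is the same and never consults a language model.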
Hi @anicolson, OK, I am doing experimental verification. I have a question about …
Hi! Thanks for your contribution. It is an excellent piece of work!
My task language is Chinese. I have trained a Chinese tokenizer and retrained the model from scratch, but I have the following questions:
Can I still use the CheXbert metrics? I am still using the monitor
val_report_chexbert_f1_macro
for my training. Should I change to another monitor? Thank you very much for your time and consideration. I eagerly look forward to your response.