---
title: "Smart Vision-Language Reasoners"
collection: publications
category: conferences
permalink: /publication/2024-07-26-smart-vision-language-reasoners
excerpt: 'Images in math AI considered harmful? Not quite.
This paper demonstrates improved performance on question-answering problems in math
by customizing the neural network architecture to pool information from both the vision and text backbones.
The improvements come from a custom QF layer, which includes multihead self-attention layers
as well as a cross-attention layer (vision & text). We fine-tune the model using the SMART-101 dataset presented at CVPR 2023.'
date: 2024-07-26
venue: 'ICML 2024'
paperurl: 'https://smarter-vlm.github.io/smarter-vlm/'
citation: 'Roberts, Lucas. (2024). "Smart Vision-Language Reasoners." <i>ICML 2024</i>.'
---

The architecture
================

We freeze the vision and text backbones and add layers on top, both to promote
pooling of visual and textual features and to reduce the cost of fine-tuning.
The QF layer and the QF fusion layer contain multihead self-attention and
cross-attention. Fully connected layers sit on top, and on the decoder side
we use a GRU layer. We chose the GRU because it usually performs as well as, or
better than, alternatives like the LSTM, while having fewer parameters to update
on the backward pass.
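
The flow above can be sketched in PyTorch. This is a minimal illustration, not the paper's exact implementation: the layer names, feature dimensions, and number of heads are my own assumptions, and the frozen-backbone features are simulated with random tensors.

```python
import torch
import torch.nn as nn

class QFLayer(nn.Module):
    """Illustrative QF-style fusion layer (hypothetical shapes/names):
    multihead self-attention on each modality, then cross-attention in
    which question tokens attend to image-patch features."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.text_self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.img_self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_feats, img_feats):
        t, _ = self.text_self_attn(text_feats, text_feats, text_feats)
        v, _ = self.img_self_attn(img_feats, img_feats, img_feats)
        fused, _ = self.cross_attn(t, v, v)  # text queries attend to vision
        return self.norm(fused + t)          # residual connection

class FusionHead(nn.Module):
    """Frozen backbones (stand-ins here) -> QF fusion -> GRU decoder -> answer logits."""
    def __init__(self, dim=256, n_answers=5):
        super().__init__()
        self.qf = QFLayer(dim)
        self.decoder = nn.GRU(dim, dim, batch_first=True)
        self.classifier = nn.Linear(dim, n_answers)

    def forward(self, text_feats, img_feats):
        fused = self.qf(text_feats, img_feats)
        _, h = self.decoder(fused)           # final GRU hidden state
        return self.classifier(h[-1])

# Stand-in features: batch of 2, 12 text tokens and 49 image patches, dim 256.
model = FusionHead()
logits = model(torch.randn(2, 12, 256), torch.randn(2, 49, 256))
print(logits.shape)  # torch.Size([2, 5])
```

Only the head is trained; because the backbones stay frozen, the backward pass touches comparatively few parameters, which is what keeps fine-tuning affordable.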

The data
========

The SMART-101 dataset consists of a collection of questions with answers.
Each problem contains a base collection of text and 5 candidate
answers. Each problem is actually a class (or collection) of problems:
the actual data are generated by code, and images are generated with
OpenCV for each of the 101 problem classes. Human-level performance on
these data is quantified via the Math Kangaroo program.

For more details on SMART-101 see [smart-101](https://smartdataset.github.io/smart101/).
For more details on Math Kangaroo see the [math kangaroo site](https://mathkangaroo.org/mks).
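
To make the "class of problems" idea concrete, here is a toy sketch of one programmatically generated instance. The field names and the arithmetic template are entirely hypothetical; the real SMART-101 generators and schema differ, and the rendered image is represented only by a path.

```python
import random
from dataclasses import dataclass

@dataclass
class PuzzleInstance:
    """Illustrative (not official) schema for one generated puzzle item."""
    problem_class: int   # one of the 101 root puzzle classes
    question: str        # generated question text
    options: list        # exactly 5 candidate answers
    answer_index: int    # index of the correct option
    image_path: str      # path to the rendered puzzle image

def make_dummy_instance(class_id, rng):
    """Generate a toy arithmetic instance for one problem class."""
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    correct = a + b
    # Four distinct distractors offset from the correct answer.
    options = [correct] + [correct + d for d in rng.sample(range(1, 10), 4)]
    rng.shuffle(options)
    return PuzzleInstance(
        problem_class=class_id,
        question=f"What is {a} + {b}?",
        options=options,
        answer_index=options.index(correct),
        image_path=f"images/class_{class_id:03d}/{a}_{b}.png",
    )

rng = random.Random(0)
inst = make_dummy_instance(class_id=7, rng=rng)
print(inst.question, inst.options)
```

The point is that each of the 101 classes is a template from which arbitrarily many concrete question/answer/image triples can be sampled.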

The findings
============

We find improved performance over the baselines used in the SMART-101 paper.

Critiques
=========

Some common critiques I have heard, at the ICML workshop and since:

1. We didn't run enough epochs of fine-tuning.

We were limited by GPU compute time, and the vision and text backbones
are large models. Keep in mind this wasn't work sponsored by my employer, so we
did not have unlimited access to A100 GPUs to fine-tune and perform extensive
ablations. We followed a recipe outlined by [Andrej Karpathy](https://karpathy.github.io/2019/04/25/recipe/), which remains excellent advice despite the fast-moving nature of the space.

2. Images in math AI considered harmful.

Another common critique I heard is that many other math AI papers have
investigated the use of images and found them either unhelpful
or harmful ("images in math AI considered harmful"). If you read these other
papers you will find, in all the ones I've read, that they neither customize
the network architectures nor have cross-attention layers that pool information
from the textual and image backbones.
We sought to disprove the claim that images in math AI are harmful, and we did.

In fact, this was a panel discussion question during the workshop: "Is text
alone enough?" The consensus was that while images may not be necessary, they are sufficient.

That's math speak for: images will help, because there is a lot of
data/information contained inside them, but math problems can be solved without images.

3. Images in math AI found not to help.

This is a finding in the MathVerse paper, i.e. the model learns to shortcut the
vision features and rely primarily on the text features of the problem.
My commentary here is the same as for item 2 above: those architectures did not
pool the visual and textual information.

My Opinion on Images
====================

While I do not disagree with the premises of the panel, relying on text alone
seems to me a bit like bringing a knife to a gunfight, or, to use a less
violent metaphor, playing chess [blindfold or sans voir](https://en.wikipedia.org/wiki/Blindfold_chess) against your opponent. There are some quite talented chess players, and some can play blindfold extremely well; however, most will say their performance playing blindfold is hindered relative to playing sighted.

To be honest, though, neither I nor others in the community have an "answer" to
the question of images in math AI.

Conclusion
==========

While it remains to be seen whether purely text models can perform as well in
the math AI domain, our work suggests that there are several aspects of the
problem hitherto unconsidered.

If you are a researcher or institution who would like to work together in this
space, or fund our investigations, please get in touch!

My email contact is there on the left-hand side of the screen, only a click away.