add icml smarter paper to publications
Lucas Roberts authored and Lucas Roberts committed Oct 8, 2024
1 parent 50b4681 commit 81f01d9
Showing 3 changed files with 99 additions and 14 deletions.
13 changes: 0 additions & 13 deletions _publications/2024-02-17-paper-title-number-4.md

This file was deleted.

98 changes: 98 additions & 0 deletions _publications/2024-07-26-smart-vision-language-reasoners.md
@@ -0,0 +1,98 @@
---
title: "Smart Vision-Language Reasoners"
collection: publications
category: conferences
permalink: /publication/2024-07-26-smart-vision-language-reasoners
excerpt: 'Images in math AI considered harmful? Not quite.
This paper demonstrates improved performance on math question-answering problems
by customizing the neural network architecture to pool information from both vision and text backbones.
The improvements come from a custom QF layer, which includes multihead self attention layers
as well as a cross attention layer (vision & text). We fine-tune the model using the smart-101 dataset presented at CVPR 2023.'
date: 2024-07-26
venue: 'ICML 2024'
paperurl: 'https://smarter-vlm.github.io/smarter-vlm/'
citation: 'Your Name, You. (2024). &quot;Paper Title Number 3.&quot; <i>GitHub Journal of Bugs</i>. 1(3).'
---

The architecture
================

We freeze the vision and text backbones and add some layers on top, both to promote
pooling of visual and textual features and to reduce the cost of fine-tuning.
The QF layer and the QF Fusion layer contain multihead self attention and
cross attention. Typical fully connected layers sit on top, and on the decoder side
we use a GRU layer. We use the GRU because it usually performs as well as or better than other recurrent layers such as the LSTM while having fewer parameters to update on the backward pass.
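
To make the parameter argument concrete, here is a small counting sketch. It assumes the textbook formulation of both layers (three gate blocks for a GRU, four for an LSTM, one bias vector per block); the layer sizes are hypothetical and are not taken from the paper.

```python
def recurrent_params(input_size: int, hidden_size: int, num_gates: int) -> int:
    """Count weights and biases in one recurrent layer with `num_gates` gate blocks.

    Each gate block has an input-to-hidden matrix, a hidden-to-hidden matrix,
    and one bias vector. Some libraries keep two bias vectors per gate, which
    changes the totals slightly but not the 3:4 ratio.
    """
    per_gate = hidden_size * input_size + hidden_size * hidden_size + hidden_size
    return num_gates * per_gate


# Hypothetical sizes, chosen only for illustration.
input_size, hidden_size = 256, 256

# GRU: reset gate, update gate, candidate state.
gru_params = recurrent_params(input_size, hidden_size, num_gates=3)
# LSTM: input gate, forget gate, output gate, cell candidate.
lstm_params = recurrent_params(input_size, hidden_size, num_gates=4)

print(f"GRU:  {gru_params:,} parameters")
print(f"LSTM: {lstm_params:,} parameters")
print(f"GRU / LSTM = {gru_params / lstm_params:.2f}")
```

Under these assumptions a GRU layer carries three quarters of the parameters of an LSTM layer of the same width, whatever sizes are plugged in.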

The data
========

The smart-101 dataset consists of a collection of questions with answers.
Each problem contains a base text and 5 candidate answers.
Each problem is actually a class (or collection) of problems:
the data are generated by code, and images are also generated for each of
the 101 problem classes. The images are generated using OpenCV. Human-level
performance on these data is quantified via the Math Kangaroo program.
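
As a rough picture of the structure described above, the sketch below models one generated instance and a plain accuracy score. The class name, field names, and example values are all hypothetical placeholders; this is not the dataset's actual schema or loader.

```python
from dataclasses import dataclass

NUM_CLASSES = 101   # problem classes in the dataset
NUM_OPTIONS = 5     # candidate answers per problem


@dataclass
class PuzzleInstance:
    """One generated instance of a problem class (hypothetical field names)."""
    class_id: int        # 1..NUM_CLASSES
    question: str        # text of the problem
    image_path: str      # path to the generated image
    options: list        # the NUM_OPTIONS candidate answers
    answer_index: int    # index of the correct option, 0..NUM_OPTIONS-1

    def __post_init__(self):
        # Reject instances that do not match the shape described in the text.
        if not 1 <= self.class_id <= NUM_CLASSES:
            raise ValueError("class_id out of range")
        if len(self.options) != NUM_OPTIONS:
            raise ValueError("expected exactly 5 options")
        if not 0 <= self.answer_index < NUM_OPTIONS:
            raise ValueError("answer_index out of range")


def accuracy(instances, predicted_indices):
    """Fraction of instances whose predicted option index is correct."""
    correct = sum(
        pred == inst.answer_index
        for inst, pred in zip(instances, predicted_indices)
    )
    return correct / len(instances)


# Random guessing over five options is right one time in five.
CHANCE_ACCURACY = 1 / NUM_OPTIONS

# Placeholder instances, not real dataset entries.
examples = [
    PuzzleInstance(1, "placeholder question A", "a.png", ["1", "2", "3", "4", "5"], 2),
    PuzzleInstance(1, "placeholder question B", "b.png", ["6", "7", "8", "9", "10"], 0),
]
print(accuracy(examples, [2, 4]))  # the first prediction matches, the second does not
print(CHANCE_ACCURACY)
```

With five options per problem, any reported accuracy should be read against the 20% that random guessing would achieve.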

For more details on smart-101 see [smart-101](https://smartdataset.github.io/smart101/).
For more details on Math Kangaroo see the [Math Kangaroo site](https://mathkangaroo.org/mks).

The findings
============

We find improved performance over the baselines used in the smart-101 paper.

Critiques
=========

Some common critiques I heard at the ICML workshop and since then:

1. We didn't run enough epochs of fine-tuning.

We were limited by the amount of GPU compute time, and the vision and text backbones
are large models. Keep in mind that this work was not sponsored by my employer, so we
did not have unlimited access to A100 GPUs to fine-tune and perform extensive
ablations. We followed a recipe outlined by [Andrej Karpathy](https://karpathy.github.io/2019/04/25/recipe/), which remains excellent advice despite the fast-moving nature of the space.

2. Images in math AI considered harmful.

Another common critique I heard is that lots of other math AI papers
have investigated the use of images and found them to be either unhelpful
or harmful (images in math AI considered harmful). If you read these other papers
you will find that, in all the ones I've read, they neither customize the network architecture nor include cross attention layers that pool information from the text and image backbones.
We sought to disprove the claim that images in math AI are harmful, and we did.

In fact, this was a panel discussion question during the workshop: "Is text
alone enough?" The consensus was that while images may not be necessary, they are sufficient.

That's math speak for saying that images will help, because there is a lot of information
contained in the images, but that math problems can be solved without images.


3. Images in math AI found not to help.

This is a finding in the MathVerse paper: the model learns to shortcut the
vision features and rely primarily on the text features of the problem.
My commentary on this is the same as for item 2 above: the architectures did not pool
the visual and textual information.
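
To illustrate what "pooling" the two modalities means here, the sketch below runs a stripped-down cross attention in which each text token attends over the image patches. It is a single-head toy with no learned projections and made-up two-dimensional features, so it shows the mechanism only and is not the QF layer from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attention(text_tokens, image_patches):
    """Single-head cross attention without learned projections.

    Each text token acts as a query; the image patches act as keys and values.
    Returns one pooled vector per text token, plus the attention weights.
    """
    dim = len(text_tokens[0])
    pooled, all_weights = [], []
    for query in text_tokens:
        # Scaled dot-product score between this text token and every patch.
        scores = [dot(query, key) / math.sqrt(dim) for key in image_patches]
        weights = softmax(scores)
        # Weighted average of the patch features.
        mixed = [
            sum(w * patch[i] for w, patch in zip(weights, image_patches))
            for i in range(dim)
        ]
        pooled.append(mixed)
        all_weights.append(weights)
    return pooled, all_weights


# Made-up features: two text tokens and three image patches.
text_tokens = [[1.0, 0.0], [0.0, 1.0]]
image_patches = [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0]]

pooled, weights = cross_attention(text_tokens, image_patches)
for token_weights in weights:
    print([round(w, 3) for w in token_weights])
```

Each text token ends up with a vector built from the image patches it matches best, which is the kind of text-to-vision information flow that a text-only shortcut skips.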

My Opinion on Images
====================

While I do not disagree with the premises of the panel, solving math problems from text alone seems to me a bit like
bringing a knife to a gunfight or, to use a less violent metaphor, playing chess
[blindfold or sans voir](https://en.wikipedia.org/wiki/Blindfold_chess) against your opponent. There are some quite talented chess players, and some can play blindfold extremely well; however, most will say their performance is hindered when playing blindfold compared with playing sighted.

To be honest, though, neither I nor others in the community have an "answer" to
the question of images in math AI.

Conclusion
==========

While it remains to be seen whether text-only models can perform as well as vision-language models in the
math AI domain, our work suggests that several aspects of the problem
have hitherto gone unconsidered.

If you are a researcher or institution that would like to work together in this space, or fund our investigations, please get in touch!

My email contact is on the left-hand side of the screen, only a click away.
@@ -3,7 +3,7 @@ title: "Nearest-neighbor matchup effects: accounting for team matchups for p
collection: publications
category: manuscripts
permalink: /publication/2024-10-01-paper-title-number-1
excerpt: 'We introduce our novel nearest-neighbor matchup effects framework, which presents a flexible way to account for team characteristics above and beyond team strength that may influence game outcomes. '
excerpt: 'We introduce our novel nearest-neighbor matchup effects framework, which presents a flexible way to account for team characteristics above and beyond team strength that may influence game outcomes. The paper was covered by several media outlets, click into the page to see one example.'
date: 2015-02-23
venue: ' Journal of Quantitative Analysis in Sports'
slidesurl: ''
