ListOps brackets are not tokenized #3

simon-ging · 2021-08-03T11:18:27Z

Hi,

A ListOps input of "[MAX 4 3 [ MIN 2 3 ] 1 0 ]" is encoded as "MAX 4 3 MIN 2 3 1 0": all brackets are removed, which makes the task unsolvable.
This is also described in google-research/long-range-arena#20.
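
To illustrate why dropping the brackets matters, here is a minimal sketch. It is not the repository's actual preprocessing: `strip_brackets` and `evaluate` are hypothetical helpers written for this example, and only MAX and MIN are implemented. Two expressions with different answers collapse to the same token sequence once the brackets are gone.

```python
# Hypothetical helpers for illustration only; not the repository's code.
OPS = {"MAX": max, "MIN": min}  # only two of the four ListOps operators


def tokenize(expr):
    # Split brackets off so "[MAX" and "[ MAX" are treated the same way.
    return expr.replace("[", " [ ").replace("]", " ] ").split()


def strip_brackets(expr):
    # Mimics the behaviour described in this issue: brackets are discarded.
    return [tok for tok in tokenize(expr) if tok not in ("[", "]")]


def evaluate(expr):
    # Recursive-descent evaluation of a bracketed ListOps expression.
    tokens = tokenize(expr)
    pos = 0

    def parse():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != "[":
            return int(tok)
        op = tokens[pos]
        pos += 1
        args = []
        while tokens[pos] != "]":
            args.append(parse())
        pos += 1  # consume the closing bracket
        return OPS[op](args)

    return parse()


a = "[MAX 1 [MIN 2 0 ] ]"   # MAX(1, MIN(2, 0))
b = "[MAX 1 [MIN 2 ] 0 ]"   # MAX(1, MIN(2), 0)

print(strip_brackets(a) == strip_brackets(b))  # True: identical model input
print(evaluate(a), evaluate(b))                # 1 2: different labels
```

With identical inputs and different labels, no model can get both examples right.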

How I noticed this: in the paper, page 3 under ListOps, you write "models are fed 512 tokens of dimension 15".
However, there are 4 operations, 2 brackets and 10 numbers, which would require dimension 16.
Checking the dataset code, there is one unused UNK token, 10 numbers and 4 operations, which gives a vocabulary size of 15.
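
The counting above can be written out as a quick sanity check. The token spellings below (operator names, `<unk>`) are assumptions for illustration and may not match the dataset code; only the counts come from the observations above.

```python
# Token spellings are assumed for illustration; only the counts matter here.
digits = [str(d) for d in range(10)]      # 10 numbers
operators = ["MAX", "MIN", "MED", "SM"]   # 4 operations (names assumed)
brackets = ["[", "]"]                     # 2 brackets

# Vocabulary as found in the dataset code: UNK + numbers + operations.
current_vocab = ["<unk>"] + digits + operators
# Tokens the task needs: numbers + operations + brackets.
needed_tokens = digits + operators + brackets

print(len(current_vocab))   # 15
print(len(needed_tokens))   # 16
```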

Your code reproduces the ~38% accuracy of ListOps described in the paper correctly.

Best
