My guess is that a lab could train a model that achieves superhuman performance on all OpenAI Gym environments (this includes Montezuma's Revenge, so my sense is it would be hard for most individuals to do easily), but I don't think it would. What are the odds a lab would train its multi-game RL policy on every single environment, including https://www.gymlibrary.dev/environments/toy_text/? Maybe 20%?
So then the question is whether there will be a model that was trained on (or can generalize to) Montezuma's Revenge and that also generalizes to these sillier toy text environments, or whether someone finetunes an open-sourced Atari-trained model on these sillier environments. 25%?
Then an extra 10% because there are worlds I haven't accounted for, and that tends to push towards this event happening rather than not happening.
20% + (100% - 20%) × 25% + 10% = 50%
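(In case the arithmetic isn't obvious, here's the same back-of-the-envelope estimate written out as a tiny Python sketch; the variable names are mine and the numbers are just the subjective guesses above, not data.)

```python
# Combine the subjective guesses above (not data, just my estimates).
p_lab_trains_on_everything = 0.20  # a lab trains its multi-game policy on every Gym env, incl. toy_text
p_generalize_or_finetune = 0.25    # otherwise, some model generalizes to / is finetuned on the toy envs
p_unaccounted_worlds = 0.10        # extra mass for worlds I haven't accounted for

p_total = (
    p_lab_trains_on_everything
    + (1 - p_lab_trains_on_everything) * p_generalize_or_finetune
    + p_unaccounted_worlds
)
print(p_total)  # 0.5
```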
@NoaNabeshima I think that if they can get 80% of environments then they can also get 100% of environments, and training on the extra 20% is relatively low-cost (especially if there are transfer gains, which seems likely), and in return they get to say "we solved all the environments", which has historically been a pretty big motivator for RL research.
@vluzko Gato didn't train on toy text (I think), I imagine because it doesn't seem that helpful/important.
https://arxiv.org/pdf/2205.06175.pdf
@vluzko I mean they did train on control environments which seem similarly trivial. shrug
@NoaNabeshima I think Gato is a little weird as an example; more central RL papers often try to solve as many environments as they can. E.g. Agent57 cared a lot about beating the sub-benchmark of "all Atari environments", and I think there might be a similar push for a "GymX" that solves all Gym environments. Not guaranteed of course, but if I were writing a paper that solved most Gym environments I would definitely spend a weekend running it on all the tiny esoteric environments too.
@vluzko "all atari environments", "all control environments" seem like more natural kinds than "all gym environments" to me. Also, as a model learns to solve more environments, I think focusing on training on easy environments will be less of a big deal because it'll be more obvious that your model could solve them.