
I think the point you bring up about publishing negative results is priceless. That kind of sincerity is needed in general, but for RL it is a necessary condition for climbing the slope of enlightenment. So many efforts result in the agent learning nothing; not all, but many of those failures could be interpreted and reported, which would help all practitioners in the field.


In our case (self-driving engineering systems: https://balazskegl.medium.com/building-autopilots-for-engineering-systems-using-ai-86a4f312c1f2), there are three motivating constraints (a schematic loop is sketched after the list):

1. Iterated offline (no online access to the systems, but we can learn-and-deploy a small number of times, so it's not pure offline)

2. Micro-data: our systems are physical and don't get faster over time, so our main bottleneck is extremely small sample sizes.

3. Safety: we cannot "lose" while learning.
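
To make these constraints concrete, here is a minimal sketch of such an iterated-offline loop; every helper function and constant below is a hypothetical placeholder for illustration, not a real API.

```python
# Sketch of an iterated-offline learn-and-deploy loop on a physical system.
# All helpers and constants below are hypothetical placeholders.

N_ROUNDS = 3             # the deploy budget is tiny, not thousands of episodes
EPISODES_PER_ROUND = 5   # micro-data: each real deployment is expensive

def train_offline(dataset):
    """Fit a policy from the fixed batch only; no environment access."""
    ...

def passes_safety_check(policy, dataset):
    """Offline vetting, e.g. conservative value estimates on held-out logs."""
    ...

def deploy_on_system(policy, n_episodes):
    """The only place the real system is ever touched."""
    ...

dataset = []  # historical logs would seed this in practice
for _ in range(N_ROUNDS):
    policy = train_offline(dataset)
    if not passes_safety_check(policy, dataset):
        break  # safety: never run a policy we cannot vet offline
    dataset += deploy_on_system(policy, EPISODES_PER_ROUND)
```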

On the other hand, our systems are already nicely "feature-extracted" and the dimensionality is relatively low (a few hundred at most), so all the representation-learning research in RL (Atari, robots with cameras) is irrelevant for us. This also means we can concentrate on small benchmarks where we can run a lot of experiments, both to experimentally validate design choices and to report statistically valid results.
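
As an illustration of what "statistically valid results" can look like on a small benchmark, here is a minimal evaluation sketch. It assumes the gymnasium package; the random policy is a stand-in for whatever agent is actually being tested.

```python
# Sketch: on small benchmarks like CartPole, one can afford enough runs
# to report a confidence interval instead of a single best score.
import numpy as np
import gymnasium as gym

def episode_return(env, policy, seed):
    """Run one episode with a fixed seed and return the total reward."""
    obs, _ = env.reset(seed=seed)
    total, done = 0.0, False
    while not done:
        obs, reward, terminated, truncated, _ = env.step(policy(obs))
        total += reward
        done = terminated or truncated
    return total

env = gym.make("CartPole-v1")
my_policy = lambda obs: env.action_space.sample()  # placeholder agent
returns = np.array([episode_return(env, my_policy, seed=s) for s in range(100)])

# Bootstrap a 95% confidence interval over the 100 seeded episodes.
boot = np.random.default_rng(0).choice(returns, size=(10_000, len(returns)))
ci = np.percentile(boot.mean(axis=1), [2.5, 97.5])
print(f"mean return {returns.mean():.1f}, 95% CI [{ci[0]:.1f}, {ci[1]:.1f}]")
```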

So my only remark is that sample complexity is important; otherwise I agree with everything.

The main question to me: what would you propose to do? We have a lot of fun internally, learning a ton on CartPole and Acrobot (and holding the SOTA on those), but current reviewing practices are extremely hostile to our niche approach (small dimensions, small systems, small data, but rigorous comparison). Also, to run proper challenges, algorithms should be evaluated on a simulator, which requires computational power and probably a third party with no stake in winning (cf. ImageNet). Who would that be?


This is a great, thought-provoking post. I particularly like that you highlighted these two different views of RL research (RL-first vs. deployable RL). It seems to me that more explicit acknowledgment of these two views would help the RL community progress, particularly with respect to how papers are reviewed. Reviewers and authors implicitly take on one of these two views, and when they differ a paper may be judged unfairly. For instance, RL-first reviewers may prefer tabula rasa learning and look down on the use of domain knowledge, whereas deployable-RL reviewers may want to judge an assumption of domain knowledge by how reasonable or broadly applicable it is (e.g., assuming an imperfect domain simulator is a mild assumption for many applications). Explicitly acknowledging which view a paper is written for could make it easier to decide what criteria to apply to it.


Nice. We are working on introducing a novel challenge to the RL community, related to agroecology. It should hopefully be released this spring :)


A very good piece, thanks for sharing.

I disagree with "Criticize others' research", especially "...ask how a paper gets the field closer to real-world impact. If you are a senior reviewer or an area chair - consider instructing your reviewers to judge papers differently..." This is clearly not a good idea, as it makes the reviewing process even noisier: reviewers can simply demand unrealistic use cases, or use this criterion to kill a paper they don't like.


I disagree with this perspective. I think the claim

> with the current knowledge in the field, we believe there are concrete benefits to reap in deployable RL

is probably false and at minimum needs more justification. My expectation is that 95% of real-world tasks are best solved by paying humans to complete the task, recording their actions, and doing imitation learning on the resulting dataset. I would predict that almost every challenge proposed will be solved this way for the foreseeable future.
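
For concreteness, the recipe described here is essentially behavioral cloning: supervised learning on the logged (state, action) pairs. A minimal sketch, assuming scikit-learn and illustrative placeholder data:

```python
# Sketch of the imitation-learning recipe: pay humans to do the task,
# log (state, action) pairs, then fit a supervised model.
import numpy as np
from sklearn.neural_network import MLPClassifier

# Hypothetical human demonstration log (placeholder random data).
states = np.random.randn(10_000, 32)            # observed state features
actions = np.random.randint(0, 4, size=10_000)  # discrete actions humans took

# Behavioral cloning is just classification: predict the human's action.
bc_policy = MLPClassifier(hidden_layer_sizes=(128, 128), max_iter=50)
bc_policy.fit(states, actions)

def act(state):
    """Deploy-time policy: imitate whatever a human would have done."""
    return bc_policy.predict(state.reshape(1, -1))[0]
```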

The only reason to study or care about RL is to make progress towards the RL-first dream.


Two points come to mind:

1. Can we create a challenge where the evaluated model actually does "something" in real life: an API that allows your model to control some real-life entity? You could grant everyone access to the data from other models' runs, and maintain a queue that people can submit their models to for evaluation (a client-side sketch follows this list).

2. I think there should be emphasis on goals for RL other than "performance". For instance, a regression model's weights can carry more importance than the model's ability to predict; a model could have tremendous value without ever having been deployed. Can RL algorithms provide that kind of insight?
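
To picture what such a challenge service might look like from a participant's side, here is a minimal client sketch; every URL, endpoint, and field name is a hypothetical assumption, not an existing service.

```python
# Sketch of a client for a hypothetical real-world RL challenge API.
import requests

BASE = "https://example.org/rl-challenge"  # placeholder URL

# 1. Download logged runs from previously submitted models (shared data).
past_runs = requests.get(f"{BASE}/runs").json()

# 2. Submit a model to the evaluation queue; the server, not the author,
#    executes it against the real system.
with open("policy.onnx", "rb") as f:
    ticket = requests.post(f"{BASE}/submissions", files={"model": f}).json()

# 3. Poll for the result once the queued evaluation has run.
result = requests.get(f"{BASE}/submissions/{ticket['id']}").json()
print(result["score"])
```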
