Discussion about this post

Rohit Gupta

I think the point you bring up about publishing negative results is priceless. That kind of sincerity is needed in general, but for RL it is a necessary condition for climbing the slope of enlightenment. So many efforts result in the agent learning nothing; not all, but many of those failures could be interpreted and reported, and that would help all the practitioners in the field.

Balázs Kégl

In our case (self-driving engineering systems: https://balazskegl.medium.com/building-autopilots-for-engineering-systems-using-ai-86a4f312c1f2) there are three motivating constraints:

1. Iterated offline: no online access to the systems, but we can learn-and-deploy a small number of times, so it's not purely offline.

2. Micro-data: our systems are physical and don't get faster with time, so our main bottleneck is extremely small sample sizes.

3. Safety: we cannot "lose" while learning.

On the other hand, our systems are already nicely "feature-extracted": the dimensionality is relatively low (at most a few hundred), so all the representation-learning research in RL (Atari, robots with cameras) is meaningless for us. This also means that we can concentrate on small benchmarks where we can run a lot of experiments, both to experimentally validate some choices and to report statistically valid results.

So the only remark I have is that sample complexity is important; otherwise I agree with everything.

The main question to me: what would you propose to do? We have a lot of fun internally, learning a ton on CartPole and Acrobot (and holding the SOTA on those), but current reviewing practices are extremely hostile to our niche approach (small dimensions, small systems, small data, but rigorous comparisons). Also, to run proper challenges, algorithms should be evaluated on a simulator, which requires computational power and probably a third party that has no stake in winning (cf. ImageNet). Who would that be?
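
Neither comment includes code, but as a rough illustration of what "a lot of experiments ... statistically valid results" can look like on a benchmark as small as CartPole, here is a minimal sketch. It assumes the gymnasium API; the random policy is a hypothetical stand-in for whatever agent is under test.

```python
# A minimal sketch (not from the thread) of multi-seed evaluation on
# CartPole, assuming the gymnasium API; the random policy is a
# hypothetical stand-in for the agent under test.
import gymnasium as gym
import numpy as np

def episode_return(env, policy, seed):
    """Run one episode and return the undiscounted sum of rewards."""
    obs, _info = env.reset(seed=seed)
    total, done = 0.0, False
    while not done:
        obs, reward, terminated, truncated, _info = env.step(policy(obs))
        total += reward
        done = terminated or truncated
    return total

env = gym.make("CartPole-v1")
policy = lambda obs: env.action_space.sample()  # replace with the agent being compared
returns = [episode_return(env, policy, seed) for seed in range(100)]
print(f"mean return {np.mean(returns):.1f} +/- {np.std(returns):.1f} over {len(returns)} seeds")
```

On systems this cheap, a hundred seeded runs take seconds, which is exactly what makes the rigorous comparisons described above feasible.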
