RRO achieved a reward score of 62.91 on the WebShop benchmark using only 1.86 sampled trajectories.
New method from Amazon and UC San Diego improves AI training efficiency