Superhuman Stratego with RL and test-time search
arxiv.org
Only 2,000 GPU hours
Heavily customized network
95% win rate over a recent sample of human tournament games
Several training techniques for evaluation/learning rate