Adversarial policies beat professional-level Go AIs
goattack.alignmentfund.org
Agree with tasuki below. The second line of the abstract says: "[achieve] >50% win-rate when KataGo uses enough search to be near-superhuman." If KataGo searched even one move ahead it would never pass here, because the opponent passing in reply immediately loses it the game. The whole paper is bullshit.
This is less an adversarial policy and more just a bug or bad config in KataGo.
It should probably reject friendly-pass mode (which is a flag iirc) and refuse to pass under Tromp-Taylor rules except in extreme circumstances.
Ehh, I'm generally hesitant to call bullshit, but I'm afraid I've got to do that here.
The TL;DR is that they win on a rules technicality. (I think by using a ruleset which isn't used by KataGo.)
> However, KataGo plays a pass move before it has finished securing its territory, allowing the adversary to pass in turn and end the game. This results in a win for the adversary under the standard Tromp-Taylor ruleset for computer Go, as the adversary gets points for its corner territory (devoid of victim stones) whereas the victim does not receive points for its unsecured territory because of the presence of the adversary’s stones.
KataGo has finished securing its territory under virtually all rulesets but Tromp-Taylor. Tromp-Taylor is a fine ruleset (courtesy of our own https://news.ycombinator.com/user?id=tromp), but not one you'd use to, like, play against someone on the internet, or in real life.
When they induce KataGo to play bad moves, like Lee Sedol did in the only game he won against AlphaGo, they'll have my attention.
One of the authors here! I think from the perspective of humans playing Go this result is not very interesting since, as you say, Tromp-Taylor isn't the kind of ruleset humans like to play under. But KataGo was trained under (modified) Tromp-Taylor, which we also evaluate under, so we think that from an AI security and robustness standpoint it's interesting that it fails under the ruleset it was trained on.
> When they induce KataGo to play bad moves, like Lee Sedol did in the only game he won against AlphaGo, they'll have my attention.
That said, we are working to do exactly this! In particular, we found in the paper that we could patch the victim to not pass except in the end-game, which defeats this adversarial policy. But when we repeat the attack, we do eventually find an adversarial policy that can exploit the patched victim -- and this one doesn't depend on the ruleset. That part is still a work in progress; we've only got it working against a victim without search so far, but we hope to put out an updated revision including it in a month or two!
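For anyone trying to picture that patch: conceptually it just means masking the pass move in the policy output until some end-game condition holds. Here's a toy sketch in Python (the move encoding and the crude board-fullness test are invented for illustration; this is not the paper's or KataGo's actual code or condition):

    # Hypothetical sketch: suppress the pass move until the board is nearly
    # full, then renormalize the remaining policy priors.
    PASS = "pass"

    def mask_early_pass(priors, board, min_fill=0.9):
        """priors: dict mapping moves (or PASS) to policy probabilities.
        board: list of strings, with '.' marking empty points."""
        total = sum(len(row) for row in board)
        empties = sum(row.count(".") for row in board)
        board_fullness = 1.0 - empties / total
        if board_fullness < min_fill and PASS in priors and len(priors) > 1:
            priors = {m: p for m, p in priors.items() if m != PASS}
        norm = sum(priors.values())
        return {m: p / norm for m, p in priors.items()}

    # Early in the game the pass prior is dropped and the rest renormalized.
    priors = {(3, 3): 0.5, (4, 4): 0.3, PASS: 0.2}
    board = ["." * 9 for _ in range(9)]
    print(mask_early_pass(priors, board))  # pass removed, others renormalized

The real condition for "end-game" would obviously have to be smarter than board fullness (e.g. only allowing a pass once every remaining move fills the player's own territory), but the shape of the fix is just this kind of action masking.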
Any human referee would declare KataGo the winner in any of these games, just like they did in the Robert Jasiek vs Csaba Mero case[0].
> But KataGo was trained under (modified) Tromp-Taylor, which we also evaluate under, so we think that from an AI security and robustness standpoint it's interesting that it fails under the ruleset it was trained on.
Have you done anything in particular to induce this, or is it just the case that KataGo is completely incompetent at evaluating what constitutes a pass-alive territory?
> That part is still a work in progress, we've only got it working against a victim without search for example
That would still be amazing. KataGo without search is still a beast.
---
By the way, your website is weird:
1. "strength of a top-100 European professional" - Europe doesn't have that many professionals?
2. "Yet our adversary achieves a 99% win rate against this victim by playing a counterintuitive strategy." - Well, if your main goal is to leave a stone in your opponent's territory, this strategy isn't very counterintuitive?
> That part is still a work in progress
I hope to see the result soon.
Not exactly what I would call a rousing win for Black: https://i.imgur.com/9777fD3.png
> A player's score is the number of points of her color, plus the number of empty points that reach only her color.
No one ever plays games to completion like this.
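For anyone who hasn't seen it written out, that quoted rule is short enough to turn directly into code. The sketch below is only an illustration (board encoded as strings of 'b'/'w'/'.', no captures or actual play handled), but it shows why a single adversary stone left inside otherwise-surrounded territory is enough to stop those empty points from counting:

    # Illustrative Tromp-Taylor area scoring, straight from the quoted rule:
    # each player scores her stones plus the empty points that reach only
    # her color.
    def tromp_taylor_score(board):
        rows, cols = len(board), len(board[0])
        score = {"b": 0, "w": 0}
        seen = set()
        for r in range(rows):
            for c in range(cols):
                p = board[r][c]
                if p in score:
                    score[p] += 1  # stones count for their own color
                elif (r, c) not in seen:
                    # Flood-fill this empty region, noting which colors it reaches.
                    region, colors, stack = 0, set(), [(r, c)]
                    seen.add((r, c))
                    while stack:
                        y, x = stack.pop()
                        region += 1
                        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                            if 0 <= ny < rows and 0 <= nx < cols:
                                q = board[ny][nx]
                                if q == "." and (ny, nx) not in seen:
                                    seen.add((ny, nx))
                                    stack.append((ny, nx))
                                elif q in score:
                                    colors.add(q)
                    if len(colors) == 1:  # empty points reaching only one color
                        score[colors.pop()] += region
        return score

    # Black walls off the left side, White the right, but one White stone
    # sits inside Black's area, so the left empties now reach both colors.
    board = [
        "..bw.",
        "..bw.",
        ".wbw.",
        "..bw.",
        "..bw.",
    ]
    print(tromp_taylor_score(board))  # {'b': 5, 'w': 11}

In the toy position, the lone White stone means Black gets nothing for the left side, while White still scores the right: the same effect as the stray adversary stones in the paper's games.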
> No one ever plays games to completion like this.
Oh I wish.
How silly is this.