Settings

Theme

Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning

arxiv.org

131 points by aklein 3 years ago · 78 comments

Reader

hervature 3 years ago

I'm still trying to grok and implement the paper, but I studied AlphaGo/AlphaZero/MuZero during my PhD. The core contribution here is the Nash equilibrium component to imperfect information games using only self-play. Note, there is no MCTS being done in this paper. This differs from counter factual regret methods (like the most famous Poker AIs) because it does not need to compute for all possible "information sets" which makes it intractable for even sufficiently complicated poker variants. It should also be noted (as they do in the paper) that this is more incremental than methodologically innovative as AlphaGo. This is the AlphaZero step increment to NeuRD. As is my general critique with their previous papers, they generally omit many engineering details that prove to be very important. Here, they admit that fine-tuning is vitally important (one of the 3 core steps) but details are relegated to the supplementary materials. It also opens up the question of if this new "fine-tuned" policy still guarantees the Nash equilibrium which it obviously does not as some mixed strategies are going to have sufficiently small probability. I wish researchers would be more honest with "this is a hack to get things to work on a computer because neural networks have floating point inaccuracies". It doesn't ruin any of the theory and no one is going to hold it against you. But it causes all sorts of confusion when trying to reimplement.

  • TemplateRex 3 years ago

    What I don't understand is why they don't try to make inferences about the opponent's private state. I get that the full Bayesian update is intractable, but some sort of RNN or LSTM should be able to produce pretty accurate estimates for the opponent's private info. And with self-play, you can train the deduction head of a NN by adding a KL-divergence between inferred and ex-post observed pieces. That would both make you guess better and also try and "jam" your opponent's inference by randomizing your own piece distribution.

    • hervature 3 years ago

      This is an interesting avenue for future research. The reason why it is not as straightforward as you claim is because all inference is going to depend on your perception of their policy. That's why the Nash equilibrium is sought after first. Because you should assume your opponent is perfect until you start observing their suboptimal behavior that you can exploit. Additionally, you would also have to handle the meta part where the exploiting portion of the algorithm isn't itself being exploited by the opponent. Somehow, you should deviate slowly from the Nash equilibrium but revert quickly if the opponent is abusing your new strategy.

      • TemplateRex 3 years ago

        But their NN already outputs a policy conditional on public and private info! Why not have a separate intermediate branch in the NN that is fed with the current estimate of private info (for both players) and outputs the policies (again for both players) given those info estimates? Wouldn't it be possible to learn from that?

        • hervature 3 years ago

          First, the neural network is taking the history of observations into account. We don't know what the NN has learned, but the NN is probably making some inference on likelihood of opponent piece locations. They haven't explicitly coded it to do that but it is difficult to imagine a human-level AI not doing this.

          Second, what you are suggesting is probably best done as a secondary process outside of learning the Nash equilibrium. If you knew an opponent's policy, you would need to recalculate your optimal counterplay for that specific policy. This is completely orthogonal to the goal of this paper which is to learn the Nash equilibrium through self-play alone.

    • thomasahle 3 years ago

      Bayesian play is not necessarily optimal for imperfect information games. The reason is: You don't only need to play optimally with respect to the information you have observed, you also need to hide your own information and balance those two needs.

      See the Deep Mind "Player of Games" paper from last year for an agent that takes a more game theoretic approach, which is probably needed for "simpler" games like Poker, that we can play to higher levels of accuracy: https://arxiv.org/pdf/2112.03178.pdf

  • algo_trader 3 years ago

    > I'm still trying to grok and implement the paper, but I studied AlphaGo/AlphaZero/MuZero during my PhD

    What is the SOTA on solving non-adversarial (single player?) POMDPs? Are those considered to be much simpler problems?

    • hervature 3 years ago

      POMDPs is exactly how one formalizes imperfect information games. This is where the concept of information sets comes from. To answer your question, any two player algorithm is going to apply to single player games as it is trivial to transform. For games like 2048, the "adversary" is simply the opposite of your outcome. For games where you are trying to maximize your score, this is the standard RL setting and any of the Atari algorithms (including MuZero) can be used.

      In case you are wondering about cooperative multi-agent games, I would check this group's publications: https://www.cs.ox.ac.uk/people/publications/date/Shimon.Whit...

    • joe_the_user 3 years ago

      POMDPs? Are those considered to be much simpler problems?

      Well, solving a Partially Observed Markov Decision Processes in general isn't just NP-complete but actually undecidable. So I'm not sure how one measures SOTA (state of the art).

      • thomasahle 3 years ago

        > I'm not sure how one measures SOTA (state of the art).

        You measure it by practical performance on a test suite of important POMDPs and games.

    • igorkraw 3 years ago

      It strongly depends on what type of structure you can assume and how expensive sampling is. Dreamerv2, agent57 on Atari, dreamerv2 and the generalized agent model trained on 600 tasks by deepmind might be worth looking into for different approaches on pomdps, but you can do much better if you impose physics priors by e.g. using neural ODEs for the latent state modeling.

      POMDP just means "observations are not state" and that you need to use a stateful policy to infer the state somehow, but without further assumptions it's difficult to answer this question

miiiiiike 3 years ago

Got a copy of Stratego (one of the old-style ones with descending rank, as pleases the gods) so I could show my board game loving girlfriend one of my favorite games from when I was a kid. She hated it.

  • jrussino 3 years ago

    Did you still enjoy it? I loved playing this game with my Dad when I was a kid and I'm wondering if it still holds up.

    • jasone 3 years ago

      It held up for me. I played a lot of Stratego with family and friends as a kid, but there was one friend in particular who routinely wiped the battlefield with my pieces. I couldn't understand how he so consistently beat me, and as an adult I was able to 1) reason more deeply about the game, and 2) quickly learn deeper strategy from the Internet. As a result, the game is even richer now to me than it was in the 1980s.

    • TedDoesntTalk 3 years ago

      Played it recently with a 12 year old boy. He loves it.

    • miiiiiike 3 years ago

      Loved it. I still think it’s one of the best games out there. The best games tho: Neuroshima Hex! And 51st State from Portal games.

  • LionTamer 3 years ago

    Seeing the paper title gave me so much nostalgia for playing Stratego as a kid, that was always my favorite Board game. Glad to see I’m not the only one who used to love that game, few of my friends growing up played it.

  • kibwen 3 years ago

    Stratego was probably my favorite board game as a kid, but, to be fair, when people these days say that they love board games what they probably mean is that they love Eurogames, which Stratego decidedly is not.

    • miiiiiike 3 years ago

      Yeah, no. She plays everything.

      She’s one of the few people who can regularly beat me at war games. The Avalon Hill and GMT types. My favorite example is the time she carefully mislead me into believing that she hadn’t found the other end of a wormhole that opened near my home-world in a game of Space Empires 4x. Spent the whole game exploring and increasing ship speed and weapons just enough, waiting for me to commit my heavily armed, but slow-ish ships. Then, giggled, and sighed for relief, as she paid the overnight rate to send an armada to my doorstep. Oh, and time she slipped a WMD into the US in Labyrinth. Or in Starship Troopers..

      She likes everything from Codenames to those Rosenberg games that are so heavy they should come with an OSHA training poster.

      In Stratego she had bluffed me into believing her flag was in a corner.. But then she move a piece that I assumed was a bomb and her face gave away everything. But a momentary lapse of Stratego-face wasn’t the issue. Mostly, it was just too slow for her. Her last obsession was Tyrants of the Underdark. Deck building + area control.

thomasahle 3 years ago

I really like the section on initial piece deployment:

> The Flag is almost always put on the back row, and often protected by Bombs. Occasionally, however, DeepNash will not surround the Flag with Bombs. Experts (e.g. Vincent de Boer, 3-fold World Champion) believe that it is indeed good to occasionally not protect the Flag because this unpredictability makes it harder for the opponent in the end-game. Another pattern observed is that the highest pieces, the 10 and 9, are often deployed on different sides of the board. Additionally, the Spy is quite often located not too far away from the 9 (or 8), which protects it against the opponent’s 10. DeepNash does not often deploy Bombs on the front row, which complies with the behavior seen from strong human players. The 3’s (Miner), which can defuse Bombs, are often placed on the back row, which makes sense because their importance typically increases throughout a game as more opponent Bombs and potential Flag positions get revealed. The eight 2’s (Scout) are typically deployed both in the front and more in the back, allowing to scout opponent pieces initially but also in later phases of the game.

spywaregorilla 3 years ago

Stratego is an odd choice I feel. Evaluating it must be really hard. A significant chunk of the game is just trying to remember which unit is which of the things you've seen so far. Which humans generally can't do very well but machines can do easily. Beating hand crafted bots is good.

Can expert humans beat the hand crafted bots? I'm guessing no. Also, what's stratego like without the hidden units? Is that... hard?

  • boringg 3 years ago

    I don't think you are that familiar with how good Stratego is played. It isn't strictly a memorize where your opponents units are. There's a significant degree of bluffing and posturing.

    • spywaregorilla 3 years ago

      It isn't strictly about memorizing where your opponents are, but having a perfect memory would be an enormous advantage, the point of being an entirely different game imo.

      • EthanHeilman 3 years ago

        Most good players of stratego have perfect memory of all revealed information during a game. That's basic table stakes for playing, like memorizing the dictionary is for scrabble or memorizing openings for chess. It isn't very hard to do, since most moves don't reveal the value of piece and those moves to do reveal a piece tend to led to the destruction of that piece within a move or two. A computer isn't going to have an advantage there.

      • treis 3 years ago

        Seems like it would be easy to test. See how the AI does when revealed pieces stay revealed compared to a standard game.

        Also, given this is online competitive play likely that a significant portion of top human players are cheating and have aides to keep track of pieces.

        • TemplateRex 3 years ago

          The pace in online games is too fast (4s per move usually, with a 12m buffer) to use pen and paper aides. Short of having some sort of screen-scrabbing AI-tool that auto-labels pieces as they are revealed (doable by a decent programmer, but Stratego is no poker, there's no money in it), I think it's safe to assume top-level players don't cheat. Top-level play in live tournaments has a high correlation with top-level online play as well.

        • spywaregorilla 3 years ago

          > Seems like it would be easy to test. See how the AI does when revealed pieces stay revealed compared to a standard game.

          It would effectively need to be a different AI to use that information. And what does it play? Humans? Other AI?

          > Also, given this is online competitive play likely that a significant portion of top human players are cheating and have aides to keep track of pieces.

          That's a great point. It may even be condoned in some circles.

      • ghaff 3 years ago

        Yes, playing it as a kid I remember one pretty common thing to do was to make movements that made it easy for your opponent to get confused about whether a piece was that known piece or something else. Not the whole strategy but take that piece out and you definitely change the game I would think relative to typical human play. Especially in the context of it normally being more of a kids game.

    • thekiptxt 3 years ago

      To the extent that machines can’t replicate. If I place my hand on a piece that’s illegal to move, feign that I’m reconsidering, and then move a different piece, then my opponent may suspect that the originally touched piece may be moved.

      How can I use these 200 IQ moves against a bot?

      • elcomet 3 years ago

        This does not apply when playing machines. Like facial expressions or gestures do not apply when playing poker with machines.

  • TemplateRex 3 years ago

    The best available bot that is also mentioned in the paper, is Probe. Expert humans will score the same as DeepNash against Probe. The best humans have no trouble recalling every piece that moved, and the square it originated from. Top-level play usually has very few moved pieces (since they are vulnerable to your opponent's general once your own marshal gets revealed), so memory is important but typically not the main bottleneck.

    • spywaregorilla 3 years ago

      > Top-level play usually has very few moved pieces (since they are vulnerable to your opponent's general once your own marshal gets revealed), so memory is important but typically not the main bottleneck.

      Interesting. Is top level play... boring? Stratego doesn't have a lot of nuance to positional advantaging aside from moving forward or back, and while I'd imagine there's stalemate rules, there's probably a lot of nothing moves to dance around getting super minor and uninteresting advantages. Is that a correct statement?

      • TemplateRex 3 years ago

        It depends on player attitudes. The maneuvering is simpler than in chess or even checkers, although you can have very intricate multi-piece pincers around the lakes. Most top players however tend to be quite aggressive, sacking majors or even colonels to expose the opponent’s top two pieces, while at the same time having their counter attacks ready. Typically there are 3 simultaneous battles for each of the 3 alleys. But, yeah, as in chess, there can be bloodless draws or fortress positions where neither wants or can make headway.

  • Imnimo 3 years ago

    From the paper:

    > Successes in the game have been limited, with artificial agents only able to play at a level comparable to a human amateur, see e.g. (14–20).

    • riku_iki 3 years ago

      It could be because no one seriously tried to build competitive AI player.

      • thomasahle 3 years ago

        Computer Stratego World Championship has been held since at least 2007.

        • riku_iki 3 years ago

          This doesn't bring much information. It could be some students competition.

          Also, per my brief google search they stopped doing it after 3 competitions in 2009.

  • dr_orpheus 3 years ago

    > A significant chunk of the game is just trying to remember which unit is which of the things you've seen so far

    While this does get hard the further you go in the game, a think a more significant portion of the strategy is trying to predict newly moving units based on what is happening in the game. i.e. I just took a 7 unit with my 8. And now my opponent is coming straight towards me with a unit from elsewhere. Is it a 9 or 10, is it a bluff to drive me off, to drive me in to a spot where the actual 9 or 10 is waiting?

    • aidenn0 3 years ago

      Did they renumber the pieces? In the set I played as a kid a scout was 9 and a Marshall was 1...

      • wakamoleguy 3 years ago

        Yes, around 2000 they swapped the numbers. Most recent versions treat the Spy as 1 and Marshal as 10. Classic versions ranked the Marshal as 1 and Scouts as 9, with the Spy designated as S.

        • andruby 3 years ago

          I bought a set to play with my kids and felt like something was off but couldn’t really recall. Thanks!

    • spywaregorilla 3 years ago

      Computers are better at such problems.

  • janosett 3 years ago

    You might consider reading the linked abstract: “… Stratego has been a grand challenge for the field of AI for decades, and existing AI methods barely reach an amateur level of play.”

    • spywaregorilla 3 years ago

      Sounds like bullshit to me. I'm not convinced. Here's a paper from 2021 that suggests as much. It's hard, sure, but it's also not really seriously explored.

      > Compared to other games like Chess and Go, not much work has been done on creating an AI agent for Stratego. As such, the available literature is far and few between, and mostly consists of bachelor’s and master’s theses. In fact, most agents created for Stratego are largely undocumented or closed-source, making it difficult to effectively ascertain exactly which particular methods and techniques have been applied and how effective they were.

      edit: the claim about bots not beating humans appears to hold, but I'm not convinced its just shoddy bot quality.

      • JoshCole 3 years ago

        Just to give a sense of hardness, Stratego's game tree is infinitely larger than Go's game tree, because in imperfect information you select a continuous strategy vector over action space whereas in Go you select a discrete action. Meanwhile Stratego is also infinitely more complex than Go because in Go there are less moves with every move, but in Stratego moves don't monotonically decrease the remaining game length.

        Beyond those two infinities of greater complexity, Stratego is imperfect information, so it is played relative to the infostate, not the state. In Stratego there are thirty three pieces with unknown information on the first move. We can definitely get a lower upper bound by applying abstraction via domain knowledge, but just so I don't have to deal with the complexity I'll state that there are at most 8683317618811886495518194401280000000 different states associated with the infostate of your first move. Meanwhile, on your first move in Go, you are in at most 1 state.

        In practice though the average length of a game of Go is ~200ish, the average length of Stratego ~400ish.

        Much like chess, I would expect optionality to be an important strategic consideration in Stratego. So branching factors are likely selected for such that they get higher by good agents. In contrast to something like Go, the branching factor would tend to diminish over time.

        Mostly sharing this because I think most people are bad at reasoning about complexity - our mind is really good at making complex things seem simple and simple things seem complex because we can actually deal with the complexity of simple things in their full complexity but dealing with the full complexity of the complicated is intractable. I imagine a lot of people's mind latch onto the things like "we get information so playing well means memorization" and don't pay as much focus to the dizzying complexity; but playing well isn't memorizing. Playing well is playing perfectly according to your uncertainty and it just so happens that as part of doing that you sometimes reduce your uncertainty by scouting.

        • spywaregorilla 3 years ago

          This is all technically true, but it's also misleading I feel. Chess and Go have a lot of states, and you need all of them. The entropy of the games are enormous. Move that bishop and the significance of every piece on the board could change. Stratego has verrrryy low entropy. You can only move one piece one space at a time aside from scouts. You can also enter scenarios where the game just simplifies on some dimensions immensely to which some fairly dumb algorithms can win the game. If the enemy loses their top dude and their spy, your top guy is invincible against anything that has ever moved, which isn't victory assured but its probably relatively easy to write an ai to win from that point on.

          Maybe this is just me but when I thought about stratego I'm pretty sure I only thought about a few units at a time. Moving all of them is a bad idea because of bombs anyway. Your mental model of the game does not consider the state of units in the back row of either side. It probably doesn't even consider all of the units in the front.

          I also suspect a lot of optimal play is just doing dumb shit back and forth to force your opponent into making the more aggressive move in a lot of cases without technically forfeiting which most humans likely wouldn't do but ai would like to do if not constrained otherwise.

          But I agree a simple tree search on possible moves is unlikely to work very well until the end game.

          • JoshCole 3 years ago

            If I'm understanding you correctly, you're saying that the rules of Stratego let you apply an abstraction rule which drastically simplifies the game. I agree you can do this. I think this point is remarkably similar to the technique of blueprint abstraction.

            Of course, we ought to be focused on the situations which lead to these nice situations - which leads us back to the imperfect information situations which lead to those positions we can reason about.

            Interestingly, you can invert your point and get a similar rule more generally. The trick is to backward induction on the abstracted category transitions. So your point is actually so valid that is valid even in situations where your examples don't apply. I hope you can see I'm strong manning here. I agree with you that this should be done and is critical to making things tractable.

            There is an issue though, at least in the generalized case, which is that perfect information is much easier than imperfect information.

            See, in perfect information games when you do the backward induction step you actually just flat out solve the game. Meanwhile, in imperfect information games, this backward induction step merely makes solving the game more tractable. Chess endgames are the backward induction version of your forward induction from the rules abstraction, but the abstraction rule is so hard to determine we wouldn't usually think of it that way. Notice that chess is hard enough that we don't have endgame tables that go all the way back to the start of the game? Yet the average game length in Stratego is hundreds of moves longer and there are more then double the number of pieces.

            I still think your point is right and someone who pushes hard enough in this direction would manage to tackle the complexity. So I basically agree with you. But if you take standard approaches like counterfactual regret minimization and throw it at the game without adjusting them for the fact that the problem is "hard" then they just wont terminate.

            • spywaregorilla 3 years ago

              I guess I have two points. Perfect information stratego is not a hard game at all. There's still a lot of moves one can make but perfect play is easy to calculate. Near perfect play is probably easy enough even for an amateur player. My gut says many games will converge on this state (early) if you have an unbeatable piece.

              Without perfect information, the number of states is still mostly a red herring, because the differences between the states are immaterial. The moves aren't super important. If you decide to move frontline unit A against the frontline unit on the opposite side of the field, there may be several moves between that but you've only made one noteworthy decision. Could be bad intuition. I think a more abstract model here would do better and be far simpler with some minor tactical move prioritization or whatever.

              • JoshCole 3 years ago

                > Could be bad intuition.

                This actually does seem to be bad intuition to me, because your intuitive explanation isn't being expressed in terms of strategies. You wouldn't want to "attack this frontline unit" but would want to have a probability distribution over your potential options. This is similar how you wouldn't want to "play rock" in RPS, but would instead want to play {R: 1/3, P: 1/3, S: 1/3}.

                In a certain sense the thing that is "different" about imperfect information is that you shouldn't play as if you are making only one decision. You should play as if you are in multiple different game states at the same time.

                FWIW, I don't think detracts from your point about it being possible to simplify the game, just your reasoning explaining the "why" we ought to seems off to me.

                • spywaregorilla 3 years ago

                  > This actually does seem to be bad intuition to me, because your intuitive explanation isn't being expressed in terms of strategies. You wouldn't want to "attack this frontline unit" but would want to have a probability distribution over your potential options. This is similar how you wouldn't want to "play rock" in RPS, but would instead want to play {R: 1/3, P: 1/3, S: 1/3}.

                  My point is that your decision to attack is one that takes 3-5 turns with no meaningful possible positional maneuvering. So a game may have 200 turns but many fewer "decisions". There's not much "area control" like in many other strategic boardgames. It's more bluffing about your original setup choices. Every battle is reduced to a bet of whether your unit is higher or not and if it is, well you don't have a lot of agency on that. Especially if attacking.

                  • TemplateRex 3 years ago

                    It's true that many move sequences can be collapsed to a single maneuver. However, the area control remark is inaccurate: controlling or occupying the 3 alleys is very important and requires careful positional play. Getting the right "lane parity" to attack pieces frontally is crucial here, as is pincer movements around the lake with multiple power pieces. You seem to view Stratego as a superficial and repetitive game, but high level play goes considerable beyond a few bluffs and mechanical exchanges.

  • Buttons840 3 years ago

    I think the best human players can beat the best computer players in Stratego. Thus, Stratego is an excellent choice.

hirundo 3 years ago

When I was a kid I "won" a Stratego game with a non-move that my friend claimed was against the rules. So he claimed the win. Could I get an umpire's call here?

The issue is that when considering my next move, I picked up the bomb piece, thought for a few moments, put it back down, and moved another piece. My friend then, assuming that I had just given away that it was not a bomb, attempted to capture it, and lost the attacker.

He claimed that it was illegal to pick up that piece and put it down again, although he had no objection until he learned that I'd tricked him. We had never previously announced or enforced a touch-it-move-it rule.

So did I win that game or did he? That's not a question machine learning could answer.

  • aidenn0 3 years ago

    Official ISF tournament rules[1] say:

    > Touching one of his own pieces does not oblige a player to move it.

    It also says:

    > Psychology, bluff and misleading manoeuvres are considered important aspects of Stratego. Bluffing consists of all verbal communication (talking) or non-verbal communication (acting, mimic or feign) which is intended to mislead your opponent. All forms of bluff are allowed at any time during the game, unless prohibited by any other rule. Abuse will be considered unsporting behaviour and can be penalized accordingly.

    So I think you won.

    1: https://isfstratego.kleier.net/docs/rulreg/isfgamerules.pdf

    • toast0 3 years ago

      I agree

      > 5.2 Moving

      > Flag and Bombs are never moved (For the definition of „move‟: see chapter 6).

      Chapter 6 shows the sequences of moves, and then Chapter 7 says:

      > A move is made when:

      > a piece is released on another square than the starting one, or

      > a player touches an opponent‟s piece with one of his own pieces or with the hand in which his own piece is held.

      So if the piece was picked up and replaced on the starting square, it was not moved, and that's fine.

      On the other hand, these rules incorporate some other rules by reference which includes:

      > The bombs and the flag may never be moved and therefore remain in the same place throughout the duration of the game

      A bomb hasn't exactly remained in the same place if it's been picked up, has it?

      • TemplateRex 3 years ago

        In actual live game play, none of the top players use such kids' ploys as touching bombs pretending to be moveable pieces. Actual game play wouldn't change if the one touch move rule from chess or checkers were to be introduced.

        The main thing the rules guard against is accidentally tripping over your opponent's or your own pieces. Especially during time trouble, it sometimes happens that you drop a piece. If your opponent reveals < 4 of your pieces, you can exchange all of them with any other of your pieces. With >= 4 "accidents", it's an automatic forfeit. If you tip over your opponent's flag, he is entitled to 3 swaps (may or may not be bombs).

        • aidenn0 3 years ago

          I always wondered if the pieces could be redesigned to have a strong bias towards landing "face down" when bumped; I've definitely accidentally revealed pieces before when making a quick move.

  • HWR_14 3 years ago

    You won because "He claimed that it was illegal to pick up that piece and put it down again, although he had no objection until he learned that I'd tricked him."

    I think "touch-move" is perfectly valid to enforce, but you waive your right to do so once you move after that. If someone touches a piece in chess, then moves another piece, you don't get to go all the way until you're mated and then say there was an error you should have won on.

    And if you touch an illegal to move piece, I would say there the penalty would probably be revealing the bomb (or that it is immobile), not forfeiture of the game.

  • boringg 3 years ago

    Hmm that you picked up the piece might have been an infraction. I constantly touch the bomb pieces but I don't lift them off the ground which implies that they can move.

    Tough call - glad that you are still carrying this from your childhood though. I would wager that you won but through borderline cheating. Still not sure TBH

  • Swenrekcah 3 years ago

    You won. Deception is obviously a critical part of all warfare.

    • yborg 3 years ago

      By that argument, the opponent rightfully won, because negotiation is also a critical part of warfare. He got his opponent to give away a tactical battlefield victory on the basis of mutually agreed ground rules.

      Of course, dissatisfaction with such outcomes has often resulted in further wars.

  • milesskorpen 3 years ago

    If you're playing seriously, I think if you touch a piece you need to move it.

  • Imnimo 3 years ago

    Wow! I did exactly the same thing as a kid. I though I was sooooo clever.

evouga 3 years ago

I mean. Stratego is a great game; I had a lot of fun playing it at summer camps when I was a young boy. It's cool there's a good AI for it.

But this result feels a bit anticlimactic in a world where AIs can already beat expert humans at go, six-player poker, Starcraft, ...

  • goodside 3 years ago

    It’s explained right in the abstract why Stratego is a more difficult game for AI than go or poker.

    • TemplateRex 3 years ago

      You can view Stratego as the "Cartesian product" of a public information board game and an imperfection information "card" game. The board game has much simpler local tactics than e.g. chess or checkers, although whole board tactics where 2+ high pieces are trapping 1 lower piece defended by 1+ high piece are extremely complicated to reason about.

      The "card" game can be viewed as a form of Limit Poker. The bidding in poker is done with secret cards and public bets. In Stratego, you bet with your secret pieces, so it's more like a closed bid auction. But since there are only 10 moveable ranks, the range of bluffs you can pull off is rather limited compared to e.g. No Limit Poker.

      Each of the "subgames" in itself are quite tractable for computers. But the numerical product of all public game states times the number of secret information states is humongous. Combine this with the fact that imperfect information game trees don't decompose as nicely as e.g. chess game trees, and computers will also not be able to divide-and-conquer their way out of the numerical complexity with brute force. Whereas humans can come a long way with heuristics.

  • 60secz 3 years ago

    These are interesting not because you're solving for a game, but because you're potentially partially solving a category of problem.

  • kzrdude 3 years ago

    How far did they get with Starcraft? Stratego should be a stepping stone to get there - it introduces imperfect information.

    • anonymoushn 3 years ago

      spywaregorilla left a good reply about Starcraft. In the games I watched, despite the AI being handicapped to use human-like levels of APM and human-like viewport management, it primarily fought using *clearly superhuman* blink stalker micromanagement. Stalkers with barely any health would typically survive engagements and get to re-engage when their shields were fully recovered. On the other hand, a human player managed to confuse the bot by repeatedly airdropping units in its base, picking them up, and dropping them elsewhere. The bot uselessly moved its mass of stalkers back and forth while losing many probes and buildings.

      Edit: After looking at some games again, the AI also benefits from precise target selection for the phoenix's graviton beam. A human player might take 3 phoenixes and a bunch of stalkers into a fight against an army containing 3 immortals, some sentries, some stalkers, and some zealots, and use graviton beam to pick up some mix of units including units other than immortals. The bot can pick up only the immortals.

    • spywaregorilla 3 years ago

      It's very difficult to say. There are many cookie cutter strategies for RTS games and it's difficult to handicap the ai to note use its machine precision and quickness of thinking to just win everything on a tactical level. actions per minute handicaps are not nearly nuanced enough to capture this. Generally it seems that the absolute top humans are better than bots strategically but the execution to do so is really, really tough. And then of course there are dumb exploits because you find some dumb weakpoint than any sane human would quickly adjust their behavior for.

    • confuseshrink 3 years ago

      Starcraft in the form of Alphastar worked in the sense that it could beat humans, at least in the short term. The problem with the whole technique is that they had to tether it to the human examples they had gathered in the form of a divergence loss.

      I haven't checked out the linked paper yet but if they managed to do something from first principles that would still be an interesting development.

Someone 3 years ago

My gut feeling is that optimum play in Stratego is not to play.

It feels better to let your opponent try and take your piece because, if they take it, you can make sure there will be at least one neighboring piece that can strike back.

If so, every game should end in a draw because of inactivity of both players.

My limited experience confirms that. Playing defensively, only offering my scouts to get intel tends to win games for me.

But then, I’ve never found any strategy guides, and wouldn’t know how good players play.

  • thomasahle 3 years ago

    I you read the article you will see a lot of discussion on different strategies the bot has (re)discovered. You can also try playing against others on the internet and see how well your method works.

voidfunc 3 years ago

I haven't played Stratego since I lost my board when I was in third grade and brought it to play during recess...

Is there a good online version these days?

mensetmanusman 3 years ago

Would it be interesting if the posted the approximate kWhr energy required to train?

bezoz 3 years ago

So we have gone from DQN to Alpha Go to Alpha Zero to Mu Zero to Deep Nash? Every time I thought I have figured out their naming scheme, they come out with something even more unpredictable.

warrenm 3 years ago

I haven't played Stratego in decades!

Loved it as a kid, though

Keyboard Shortcuts

j
Next item
k
Previous item
o / Enter
Open selected item
?
Show this help
Esc
Close modal / clear selection