Research Article

Heads-up limit hold’em poker is solved

Science
9 Jan 2015
Vol 347, Issue 6218
pp. 145-149

I'll see your program and raise you mine

One of the fundamental differences between playing chess and two-handed poker is that the chessboard and the pieces on it are visible throughout the entire game, but an opponent's cards in poker are private. This informational deficit increases the complexity and the uncertainty in calculating the best course of action—to raise, to fold, or to call. Bowling et al. now report that they have developed a computer program that can do just that for the heads-up variant of poker known as Limit Texas Hold 'em (see the Perspective by Sandholm).
Science, this issue p. 145; see also p. 122

Abstract

Poker is a family of games that exhibit imperfect information, where players do not have full knowledge of past events. Whereas many perfect-information games have been solved (e.g., Connect Four and checkers), no nontrivial imperfect-information game played competitively by humans has previously been solved. Here, we announce that heads-up limit Texas hold’em is now essentially weakly solved. Furthermore, this computation formally proves the common wisdom that the dealer in the game holds a substantial advantage. This result was enabled by a new algorithm, CFR+, which is capable of solving extensive-form games orders of magnitude larger than previously possible.
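The abstract names CFR+ but does not describe how it works. As a rough illustration only, the sketch below runs regret matching+ in self-play on a small zero-sum matrix game, using the alternating updates and linearly weighted averaging usually associated with CFR+. It is not the authors' code: the 2x2 payoff matrix, the function names, and the iteration count are all hypothetical choices made for this example, and the full algorithm applies the same update at every information set of an extensive-form game rather than to a single matrix. For this particular matrix, the indifference conditions give a unique equilibrium in which each player picks the first action with probability 0.4, so the averaged strategies should drift toward (0.4, 0.6).

```python
def regret_matching_plus(regrets):
    """Play each action in proportion to its non-negative cumulative regret."""
    total = sum(regrets)
    if total > 0:
        return [r / total for r in regrets]
    return [1.0 / len(regrets)] * len(regrets)  # no regret yet: play uniformly


def solve_matrix_game(payoff, iterations=10000):
    """Approximate an equilibrium of a zero-sum matrix game by self-play.

    payoff[i][j] is the row player's payoff; the column player receives
    its negative. Returns linearly weighted average strategies (row, column).
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_regret, col_regret = [0.0] * n_rows, [0.0] * n_cols
    row_sum, col_sum = [0.0] * n_rows, [0.0] * n_cols

    for t in range(1, iterations + 1):
        row = regret_matching_plus(row_regret)
        col = regret_matching_plus(col_regret)

        # Row player's update against the column player's current strategy.
        row_values = [sum(payoff[i][j] * col[j] for j in range(n_cols))
                      for i in range(n_rows)]
        row_ev = sum(p * v for p, v in zip(row, row_values))
        # The "+" in regret matching+: cumulative regret is clipped at zero.
        row_regret = [max(0.0, r + v - row_ev)
                      for r, v in zip(row_regret, row_values)]

        # Alternating update: the column player responds to the row
        # player's freshly updated strategy.
        row = regret_matching_plus(row_regret)
        col_values = [-sum(payoff[i][j] * row[i] for i in range(n_rows))
                      for j in range(n_cols)]
        col_ev = sum(p * v for p, v in zip(col, col_values))
        col_regret = [max(0.0, r + v - col_ev)
                      for r, v in zip(col_regret, col_values)]
        col = regret_matching_plus(col_regret)

        # Linear averaging: iteration t contributes with weight t.
        row_sum = [s + t * p for s, p in zip(row_sum, row)]
        col_sum = [s + t * p for s, p in zip(col_sum, col)]

    weight = iterations * (iterations + 1) / 2
    return [s / weight for s in row_sum], [s / weight for s in col_sum]


# Hypothetical payoff matrix, chosen only because its equilibrium is not uniform.
PAYOFF = [[2.0, -1.0], [-1.0, 1.0]]
row_avg, col_avg = solve_matrix_game(PAYOFF)
print("row average strategy:   ", row_avg)
print("column average strategy:", col_avg)
```

Clipping the cumulative regrets at zero lets an action that has looked bad for a long time be picked up again as soon as it starts doing well, instead of first having to pay off a large negative balance.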

Supplementary Material

Summary

Source code used to compute the solution strategy
Supplementary Text
Fig. S1
References (54–62)

Resources

File (1259433-bowling-source-code.zip)
File (bowling.sm.pdf)

References and Notes

1
C. Babbage, Passages from the Life of a Philosopher (Longman, Green, Longman, Roberts, and Green, London, 1864), chap. 34.
2
A. Turing, in Faster Than Thought, B. V. Bowden, Ed. (Pitman, London, 1976), chap. 25.
3
C. E. Shannon, XXII. Programming a computer for playing chess. Philos. Mag. Series 7 41, 256–275 (1950).
4
J. Schaeffer, R. Lake, P. Lu, M. Bryant, CHINOOK the world man-machine checkers champion. AI Mag. 17, 21 (1996).
5
M. Campbell, A. J. Hoane, F. Hsu, Deep Blue. Artif. Intell. 134, 57–83 (2002).
6
D. Ferrucci, IBM J. Res. Dev. 56, 1 (2012).
7
V. Allis, thesis, Vrije Universiteit Amsterdam (1988).
8
J. Schaeffer, N. Burch, Y. Björnsson, A. Kishimoto, M. Müller, R. Lake, P. Lu, S. Sutphen, Checkers is solved. Science 317, 1518–1522 (2007).
9
We use the word “trivial” to describe a game that can be solved without the use of a machine. The one near-exception to this claim is oshi-zumo, but it is not played competitively by humans and is a simultaneous-move game that otherwise has perfect information (49). Furthermore, almost all nontrivial games played by humans that have been solved to date also have no chance elements. The one notable exception is hypergammon, a three-checker variant of backgammon invented by Sconyers in 1993, which he then strongly solved (i.e., the game-theoretic value is known for all board positions). It has seen play in human competitions (see www.bkgm.com/variants/HyperBackgammon.html).
10
For example, Zermelo proved the solvability of finite, two-player, zero-sum, perfect-information games in 1913 (50), whereas von Neumann’s more general minimax theorem appeared in 1928 (13). Minimax and alpha-beta pruning, the fundamental computational algorithms for perfect-information games, were developed in the 1950s; the first polynomial-time technique for imperfect-information games was introduced in the 1960s but was not well known until the 1990s (29).
11
J. Bronowski, The Ascent of Man [documentary] (1973), episode 13.
12
É. Borel, J. Ville, Applications de la théorie des probabilités aux jeux de hasard (Gauthier-Villars, Paris, 1938).
13
J. von Neumann, Zur Theorie der Gesellschaftsspiele. Math. Annal. 100, 295–320 (1928).
14
J. von Neumann, O. Morgenstern, Theory of Games and Economic Behavior (Princeton Univ. Press, Princeton, NJ, ed. 2, 1947).
15
We use the word synthetic to describe a game that was invented for the purpose of being studied or solved rather than played by humans. A synthetic game may be trivial, such as Kuhn poker (16), or nontrivial, such as Rhode Island hold’em (32).
16
H. Kuhn, in Contributions to the Theory of Games, H. Kuhn, A. Tucker, Eds. (Princeton Univ. Press, Princeton, NJ, 1950), pp. 97–103.
17
J. F. Nash, L. S. Shapley, in Contributions to the Theory of Games, H. Kuhn, A. Tucker, Eds. (Princeton Univ. Press, Princeton, NJ, 1950), pp. 105–116.
18
“Poker: A big deal.” Economist (22 December 2007), p. 31.
19
See supplementary materials on Science Online.
20
M. Craig, The Professor, the Banker, and the Suicide King: Inside the Richest Poker Game of All Time (Grand Central, New York, 2006).
21
J. Rehmeyer, N. Fox, R. Rico, Ante up, human: The adventures of Polaris the poker-playing robot. Wired 16.12, 186–191 (2008).
22
D. Billings, A. Davidson, J. Schaeffer, D. Szafron, The challenge of poker. Artif. Intell. 134, 201–240 (2002).
23
D. Koller, A. Pfeffer, Representations and solutions for game-theoretic problems. Artif. Intell. 94, 167–215 (1997).
24
D. Billings et al., in Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence (2003), pp. 661–668.
25
M. Zinkevich, M. Littman, The AAAI Computer Poker Competition. J. Int. Comput. Games Assoc. 29, 166 (2006).
26
V. L. Allis, thesis, University of Limburg (1994).
27
F. Southey et al., in Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence (2005), pp. 550–558.
28
I. V. Romanovskii, Reduction of a game with complete memory to a matrix game. Sov. Math. 3, 678–681 (1962).
29
D. Koller, N. Megiddo, The complexity of two-person zero-sum games in extensive form. Games Econ. Behav. 4, 528–552 (1992).
30
D. Koller, N. Megiddo, B. von Stengel, Efficient computation of equilibria for extensive two-person games. Games Econ. Behav. 14, 247–259 (1996).
31
A. Gilpin, T. Sandholm, Lossless abstraction of imperfect information games. J. ACM 54, 25 (2007).
32
J. Shi, M. L. Littman, in Revised Papers from the Second International Conference on Computers and Games (2000), pp. 333–345.
33
T. Sandholm, The state of solving large incomplete-information games, and application to poker. AI Mag. 31, 13–32 (2010).
34
J. Rubin, I. Watson, Computer poker: A review. Artif. Intell. 175, 958–987 (2011).
35
Another notable algorithm to emerge from the Annual Computer Poker Competition is an application of Nesterov’s excessive gap technique (51) to solving extensive-form games (52). The technique has some desirable properties, including better asymptotic time complexity than what is known for CFR. However, it has not seen widespread use among competition participants because of its lack of flexibility in incorporating sampling schemes and its inability to be used with powerful (but unsound) abstractions that make use of imperfect recall. Recently, Waugh and Bagnell (53) have shown that CFR and the excessive gap technique are more alike than different, which suggests that the individual advantages of each approach may be attainable in the other.
36
M. Zinkevich, M. Johanson, M. Bowling, C. Piccione, in Advances in Neural Information Processing Systems 20 (2008), pp. 905–912.
37
N. Karmarkar, in Proceedings of the 16th Annual ACM Symposium on Theory of Computing (1984), pp. 302–311.
38
E. Jackson, in Proceedings of the 2012 Computer Poker Symposium (2012); www.ualberta.ca/~archibal/papers/jackson.pdf. Jackson reports a higher number of information sets, which counts terminal information sets rather than only those where a player is to act.
39
O. Tammelin, http://arxiv.org/abs/1407.5042 (2014).
40
M. Johanson, N. Bard, M. Lanctot, R. Gibson, M. Bowling, in Proceedings of the 11th International Conference on Autonomous Agents and Multi-Agent Systems (2012), pp. 837–846.
41
M. Johanson, K. Waugh, M. Bowling, M. Zinkevich, in Proceedings of the 22nd International Joint Conference on Artificial Intelligence (2011), pp. 258–265.
42
M. Bowling, M. Johanson, N. Burch, D. Szafron, in Proceedings of the 25th International Conference on Machine Learning (2008), pp. 72–79.
43
The total time and number of core-years is larger than was strictly necessary, as it includes computation of an average strategy that was later measured to be more exploitable than the current strategy and so was discarded. The total space noted, on the other hand, is without storing the average strategy.
44
These insights were the result of discussions with Bryce Paradis, previously a professional poker player who specialized in HULHE.
45
O. Morgenstern, New York Times Magazine (5 February 1961), pp. 21–22.
46
M. Tambe, Security and Game Theory: Algorithms, Deployed Systems, Lessons Learned (Cambridge Univ. Press, Cambridge, 2011).
47
K. Chen, M. Bowling, in Advances in Neural Information Processing Systems 25 (2012), pp. 2078–2086.
48
P. Mirowski, in Toward a History of Game Theory, E. R. Weintraub, Ed. (Duke Univ. Press, Durham, NC, 1992), pp. 113–147. Mirowski cites Turing as author of the paragraph containing this remark. The paragraph appeared in (2), in a chapter with Turing listed as one of three contributors. Which parts of the chapter are the work of which contributor, particularly the introductory material containing this quote, is not made explicit.
49
M. Buro, Solving the Oshi-Zumo game. Adv. Comput. Games 135, 361–366 (2004).
50
E. Zermelo, in Proceedings of the Fifth International Congress of Mathematics (Cambridge Univ. Press, Cambridge, 1913), pp. 501–504.
51
Y. Nesterov, Excessive gap technique in nonsmooth convex minimization. SIAM J. Optim. 16, 235–249 (2005).
52
A. Gilpin, S. Hoda, J. Peña, T. Sandholm, in Proceedings of the Third International Workshop on Internet and Network Economics (2007), pp. 57–69.
53
K. Waugh, J. A. Bagnell, in AAAI Workshop on Computer Poker and Imperfect Information; www.cs.cmu.edu/~./waugh/publications/unify15.pdf.
54
M. Lanctot, K. Waugh, M. Zinkevich, M. Bowling, in Advances in Neural Information Processing Systems 22 (2009), pp. 1078–1086.
55
S. Hoda, A. Gilpin, J. Peña, T. Sandholm, Smoothing techniques for computing Nash equilibria of sequential games. Math. Oper. Res. 35, 494–512 (2010).
56
A. Gilpin, J. Peña, T. Sandholm, First-order algorithm with O(ln(1/ε)) convergence for ε-equilibrium in two-person zero-sum games. Math. Program. 133, 279–298 (2012).
57
M. Johanson, N. Bard, N. Burch, M. Bowling, in Proceedings of the 26th Conference on Artificial Intelligence (2012), pp. 1371–1379.
58
N. Burch, M. Johanson, M. Bowling, in Proceedings of the 28th Conference on Artificial Intelligence (2014), pp. 602–608.
59
K. Waugh, D. Schnizlein, M. Bowling, D. Szafron, in Proceedings of the Eighth International Conference on Autonomous Agents and Multi-Agent Systems (2009), pp. 781–788.
60
R. Gibson, thesis, University of Alberta (2013).
61
K. Waugh et al., in Proceedings of the Eighth Symposium on Abstraction, Reformulation and Approximation (2009), pp. 175–182.
62
M. Zinkevich, M. Bowling, N. Burch, in Proceedings of the 22nd Conference on Artificial Intelligence (2007), pp. 788–793.

Submission history

Received: 31 July 2014
Accepted: 1 December 2014
Published in print: 9 January 2015

Acknowledgments

The author order is alphabetical, reflecting equal contribution by the authors. The idea of CFR+ and of compressing the regrets and strategy originated with O.T. (39). This research was supported by the Natural Sciences and Engineering Research Council of Canada and by Alberta Innovates Technology Futures through the Alberta Innovates Centre for Machine Learning, and was made possible by the computing resources of Compute Canada and Calcul Québec. We thank all of the current and past members of the University of Alberta Computer Poker Research Group, where the idea to solve heads-up limit Texas hold’em was first discussed; J. Schaeffer, R. Holte, D. Szafron, and A. Brown for comments on early drafts of this article; and B. Paradis for insights into the conventional wisdom of top human poker players.

Authors

Affiliations

Michael Bowling*
Department of Computing Science, University of Alberta, Edmonton, Alberta T6G 2E8, Canada.
Neil Burch
Department of Computing Science, University of Alberta, Edmonton, Alberta T6G 2E8, Canada.
Michael Johanson
Department of Computing Science, University of Alberta, Edmonton, Alberta T6G 2E8, Canada.
Oskari Tammelin

Notes

* Corresponding author.
