Research Article

Heads-up limit hold’em poker is solved

Michael Bowling, Neil Burch, Michael Johanson, and Oskari Tammelin

Science, 9 Jan 2015, Vol 347, Issue 6218, pp. 145-149

DOI: 10.1126/science.1259433

I'll see your program and raise you mine

One of the fundamental differences between playing chess and two-handed poker is that the chessboard and the pieces on it are visible throughout the entire game, but an opponent's cards in poker are private. This informational deficit increases the complexity and the uncertainty in calculating the best course of action—to raise, to fold, or to call. Bowling et al. now report that they have developed a computer program that can do just that for the heads-up variant of poker known as Limit Texas Hold 'em (see the Perspective by Sandholm).

Science, this issue p. 145; see also p. 122

Abstract

Poker is a family of games that exhibit imperfect information, where players do not have full knowledge of past events. Whereas many perfect-information games have been solved (e.g., Connect Four and checkers), no nontrivial imperfect-information game played competitively by humans has previously been solved. Here, we announce that heads-up limit Texas hold’em is now essentially weakly solved. Furthermore, this computation formally proves the common wisdom that the dealer in the game holds a substantial advantage. This result was enabled by a new algorithm, CFR⁺, which is capable of solving extensive-form games orders of magnitude larger than previously possible.
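The abstract names CFR⁺ but this page does not describe it. As a rough illustration only, the sketch below applies regret-matching⁺, the update rule usually associated with CFR⁺ (cumulative regrets are clipped at zero after every step), to a tiny zero-sum matrix game. This is not the authors' implementation: the full algorithm works on extensive-form game trees and differs in other details. The function names, the 2×2 payoff matrix, the simultaneous updates, and the uniform averaging are all assumptions made for the example.

```python
def current_strategy(regrets):
    """Play each action in proportion to its non-negative cumulative regret."""
    total = sum(regrets)
    if total > 0:
        return [r / total for r in regrets]
    # No positive regret yet: fall back to the uniform strategy.
    return [1.0 / len(regrets)] * len(regrets)


def solve_matrix_game(payoff, iterations=20000):
    """Self-play with regret-matching+ on a zero-sum matrix game.

    payoff[i][j] is the row player's payoff; the column player receives
    its negative. Returns the time-averaged strategies of both players.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_regret, col_regret = [0.0] * n_rows, [0.0] * n_cols
    row_sum, col_sum = [0.0] * n_rows, [0.0] * n_cols

    for _ in range(iterations):
        row = current_strategy(row_regret)
        col = current_strategy(col_regret)

        # Expected payoff of each pure action against the opponent's mix.
        row_values = [sum(payoff[i][j] * col[j] for j in range(n_cols))
                      for i in range(n_rows)]
        col_values = [-sum(payoff[i][j] * row[i] for i in range(n_rows))
                      for j in range(n_cols)]
        row_ev = sum(row[i] * row_values[i] for i in range(n_rows))
        col_ev = sum(col[j] * col_values[j] for j in range(n_cols))

        # Regret-matching+ step: accumulate regret, then clip at zero.
        row_regret = [max(row_regret[i] + row_values[i] - row_ev, 0.0)
                      for i in range(n_rows)]
        col_regret = [max(col_regret[j] + col_values[j] - col_ev, 0.0)
                      for j in range(n_cols)]

        # Uniform averaging of the strategies played so far (an assumption).
        row_sum = [row_sum[i] + row[i] for i in range(n_rows)]
        col_sum = [col_sum[j] + col[j] for j in range(n_cols)]

    return ([s / iterations for s in row_sum],
            [s / iterations for s in col_sum])


def exploitability(payoff, row, col):
    """Total amount best-responding opponents gain against (row, col)."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    best_row = max(sum(payoff[i][j] * col[j] for j in range(n_cols))
                   for i in range(n_rows))
    best_col = max(-sum(payoff[i][j] * row[i] for i in range(n_rows))
                   for j in range(n_cols))
    return best_row + best_col


if __name__ == "__main__":
    game = [[2.0, -1.0], [-1.0, 1.0]]  # hypothetical payoffs, not poker
    row, col = solve_matrix_game(game)
    print(row, col, exploitability(game, row, col))
```

For this hypothetical game the exact equilibrium has each player choosing the first action with probability 2/5, so the averaged strategies should drift toward that point while the exploitability shrinks toward zero. Exploitability measures how far a strategy pair is from an exact solution.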

Supplementary Material

Summary

Source code used to compute the solution strategy

Supplementary Text

Fig. S1

References (54–62)

Resources

File (1259433-bowling-source-code.zip), 84.33 KB

File (bowling.sm.pdf), 299.46 KB

References and Notes

1. C. Babbage, Passages from the Life of a Philosopher (Longman, Green, Longman, Roberts, and Green, London, 1864), chap. 34.

2. A. Turing, in Faster Than Thought, B. V. Bowden, Ed. (Pitman, London, 1976), chap. 25.

3. C. E. Shannon, XXII. Programming a computer for playing chess. Philos. Mag. Series 7 41, 256–275 (1950).

4. J. Schaeffer, R. Lake, P. Lu, M. Bryant, CHINOOK the world man-machine checkers champion. AI Mag. 17, 21 (1996).

5. M. Campbell, A. J. Hoane, F. Hsu, Deep Blue. Artif. Intell. 134, 57–83 (2002).

6. D. Ferrucci, IBM J. Res. Dev. 56, 1 (2012).

7. V. Allis, thesis, Vrije Universiteit Brussel (1988).

8. J. Schaeffer, N. Burch, Y. Björnsson, A. Kishimoto, M. Müller, R. Lake, P. Lu, S. Sutphen, Checkers is solved. Science 317, 1518–1522 (2007).

9. We use the word “trivial” to describe a game that can be solved without the use of a machine. The one near-exception to this claim is oshi-zumo, but it is not played competitively by humans and is a simultaneous-move game that otherwise has perfect information (49). Furthermore, almost all nontrivial games played by humans that have been solved to date also have no chance elements. The one notable exception is hypergammon, a three-checker variant of backgammon invented by Sconyers in 1993, which he then strongly solved (i.e., the game-theoretic value is known for all board positions). It has seen play in human competitions (see www.bkgm.com/variants/HyperBackgammon.html).

10. For example, Zermelo proved the solvability of finite, two-player, zero-sum, perfect-information games in 1913 (50), whereas von Neumann’s more general minimax theorem appeared in 1928 (13). Minimax and alpha-beta pruning, the fundamental computational algorithms for perfect-information games, were developed in the 1950s; the first polynomial-time technique for imperfect-information games was introduced in the 1960s but was not well known until the 1990s (29).

11. J. Bronowski, The Ascent of Man [documentary] (1973), episode 13.

12. É. Borel, J. Ville, Applications de la théorie des probabilités aux jeux de hasard (Gauthier-Villars, Paris, 1938).

13. J. von Neumann, Zur Theorie der Gesellschaftsspiele. Math. Annal. 100, 295–320 (1928).

14. J. von Neumann, O. Morgenstern, Theory of Games and Economic Behavior (Princeton Univ. Press, Princeton, NJ, ed. 2, 1947).

15. We use the word synthetic to describe a game that was invented for the purpose of being studied or solved rather than played by humans. A synthetic game may be trivial, such as Kuhn poker (16), or nontrivial, such as Rhode Island hold’em (32).

16. H. Kuhn, in Contributions to the Theory of Games, H. Kuhn, A. Tucker, Eds. (Princeton Univ. Press, Princeton, NJ, 1950), pp. 97–103.

17. J. F. Nash, L. S. Shapley, in Contributions to the Theory of Games, H. Kuhn, A. Tucker, Eds. (Princeton Univ. Press, Princeton, NJ, 1950), pp. 105–116.

18. “Poker: A big deal.” Economist (22 December 2007), p. 31.

19. See supplementary materials on Science Online.

20. M. Craig, The Professor, the Banker, and the Suicide King: Inside the Richest Poker Game of All Time (Grand Central, New York, 2006).

21. J. Rehmeyer, N. Fox, R. Rico, Ante up, human: The adventures of Polaris the poker-playing robot. Wired 16.12, 186–191 (2008).

22. D. Billings, A. Davidson, J. Schaeffer, D. Szafron, The challenge of poker. Artif. Intell. 134, 201–240 (2002).

23. D. Koller, A. Pfeffer, Representations and solutions for game-theoretic problems. Artif. Intell. 94, 167–215 (1997).

24. D. Billings et al., in Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence (2003), pp. 661–668.

25. M. Zinkevich, M. Littman, The AAAI Computer Poker Competition. J. Int. Comput. Games Assoc. 29, 166 (2006).

26. V. L. Allis, thesis, University of Limburg (1994).

27. F. Southey et al., in Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence (2005), pp. 550–558.

28. I. V. Romanovskii, Reduction of a game with complete memory to a matrix game. Sov. Math. 3, 678–681 (1962).

29. D. Koller, N. Megiddo, The complexity of two-person zero-sum games in extensive form. Games Econ. Behav. 4, 528–552 (1992).

30. D. Koller, N. Megiddo, B. von Stengel, Efficient computation of equilibria for extensive two-person games. Games Econ. Behav. 14, 247–249 (1996).

31. A. Gilpin, T. Sandholm, Lossless abstraction of imperfect information games. J. ACM 54, 25 (2007).

32. J. Shi, M. L. Littman, in Revised Papers from the Second International Conference on Computers and Games (2000), pp. 333–345.

33. T. Sandholm, The state of solving large incomplete-information games, and application to poker. AI Mag. 31, 13–32 (2010).

34. J. Rubin, I. Watson, Computer poker: A review. Artif. Intell. 175, 958–987 (2011).

35. Another notable algorithm to emerge from the Annual Computer Poker Competition is an application of Nesterov’s excessive gap technique (51) to solving extensive-form games (52). The technique has some desirable properties, including better asymptotic time complexity than what is known for CFR. However, it has not seen widespread use among competition participants because of its lack of flexibility in incorporating sampling schemes and its inability to be used with powerful (but unsound) abstractions that make use of imperfect recall. Recently, Waugh and Bagnell (53) have shown that CFR and the excessive gap technique are more alike than different, which suggests that the individual advantages of each approach may be attainable in the other.

36. M. Zinkevich, M. Johanson, M. Bowling, C. Piccione, in Advances in Neural Information Processing Systems 20 (2008), pp. 905–912.

37. N. Karmarkar, in Proceedings of the 16th Annual ACM Symposium on Theory of Computing (1984), pp. 302–311.

38. E. Jackson, in Proceedings of the 2012 Computer Poker Symposium (2012); www.ualberta.ca/~archibal/papers/jackson.pdf. Jackson reports a higher number of information sets, which counts terminal information sets rather than only those where a player is to act.

39. O. Tammelin, http://arxiv.org/abs/1407.5042 (2014).

40. M. Johanson, N. Bard, M. Lanctot, R. Gibson, M. Bowling, in Proceedings of the 11th International Conference on Autonomous Agents and Multi-Agent Systems (2012), pp. 837–846.

41. M. Johanson, K. Waugh, M. Bowling, M. Zinkevich, in Proceedings of the 22nd International Joint Conference on Artificial Intelligence (2011), pp. 258–265.

42. M. Bowling, M. Johanson, N. Burch, D. Szafron, in Proceedings of the 25th International Conference on Machine Learning (2008), pp. 72–79.

43. The total time and number of core-years is larger than was strictly necessary, as it includes computation of an average strategy that was later measured to be more exploitable than the current strategy and so was discarded. The total space noted, on the other hand, is without storing the average strategy.

44. These insights were the result of discussions with Bryce Paradis, previously a professional poker player who specialized in HULHE.

45. O. Morgenstern, New York Times Magazine (5 February 1961), pp. 21–22.

46. M. Tambe, Security and Game Theory: Algorithms, Deployed Systems, Lessons Learned (Cambridge Univ. Press, Cambridge, 2011).

47. K. Chen, M. Bowling, in Advances in Neural Information Processing Systems 25 (2012), pp. 2078–2086.

48. P. Mirowski, in Toward a History of Game Theory, E. R. Weintraub, Ed. (Duke Univ. Press, Durham, NC, 1992), pp. 113–147. Mirowski cites Turing as author of the paragraph containing this remark. The paragraph appeared in (2), in a chapter with Turing listed as one of three contributors. Which parts of the chapter are the work of which contributor, particularly the introductory material containing this quote, is not made explicit.

49. M. Buro, Solving the Oshi-Zumo game. Adv. Comput. Games 135, 361–366 (2004).

50. E. Zermelo, in Proceedings of the Fifth International Congress of Mathematics (Cambridge Univ. Press, Cambridge, 1913), pp. 501–504.

51. Y. Nesterov, Excessive gap technique in nonsmooth convex minimization. SIAM J. Optim. 16, 235–249 (2005).

52. A. Gilpin, S. Hoda, J. Peña, T. Sandholm, in Proceedings of the Third International Workshop on Internet and Network Economics (2007), pp. 57–69.

53. K. Waugh, J. A. Bagnell, in AAAI Workshop on Computer Poker and Imperfect Information; www.cs.cmu.edu/~./waugh/publications/unify15.pdf.

54. M. Lanctot, K. Waugh, M. Zinkevich, M. Bowling, in Advances in Neural Information Processing Systems 22 (2009), pp. 1078–1086.

55. S. Hoda, A. Gilpin, J. Peña, T. Sandholm, Smoothing techniques for computing Nash equilibria of sequential games. Math. Oper. Res. 35, 494–512 (2010).

56. A. Gilpin, J. Peña, T. Sandholm, First-order algorithm with O(ln(1/ε)) convergence for ε-equilibrium in two-person zero-sum games. Math. Program. 133, 279–298 (2012).

57. M. Johanson, N. Bard, N. Burch, M. Bowling, in Proceedings of the 26th Conference on Artificial Intelligence (2012), pp. 1371–1379.

58. N. Burch, M. Johanson, M. Bowling, in Proceedings of the 28th Conference on Artificial Intelligence (2014), pp. 602–608.

59. K. Waugh, D. Schnizlein, M. Bowling, D. Szafron, in Proceedings of the Eighth International Conference on Autonomous Agents and Multi-Agent Systems (2009), pp. 781–788.

60. R. Gibson, thesis, University of Alberta (2013).

61. K. Waugh et al., in Proceedings of the Eighth Symposium on Abstraction, Reformulation and Approximation (2009), pp. 175–182.

62. M. Zinkevich, M. Bowling, N. Burch, in Proceedings of the 22nd Conference on Artificial Intelligence (2007), pp. 788–793.

Information & Authors

Information

Submission history

Received: 31 July 2014

Accepted: 1 December 2014

Published in print: 9 January 2015

Acknowledgments

The author order is alphabetical reflecting equal contribution by the authors. The idea of CFR⁺ and compressing the regrets and strategy originated with O.T. (39). This research was supported by Natural Sciences and Engineering Research Council of Canada and Alberta Innovates Technology Futures through the Alberta Innovates Centre for Machine Learning and was made possible by the computing resources of Compute Canada and Calcul Québec. We thank all of the current and past members of the University of Alberta Computer Poker Research Group, where the idea to solve heads-up limit Texas hold’em was first discussed; J. Schaeffer, R. Holte, D. Szafron, and A. Brown for comments on early drafts of this article; and B. Paradis for insights into the conventional wisdom of top human poker players.

Authors

Affiliations

Michael Bowling*

Department of Computing Science, University of Alberta, Edmonton, Alberta T6G 2E8, Canada.

Neil Burch

Department of Computing Science, University of Alberta, Edmonton, Alberta T6G 2E8, Canada.

Michael Johanson

Department of Computing Science, University of Alberta, Edmonton, Alberta T6G 2E8, Canada.

Oskari Tammelin

Unaffiliated; http://jeskola.net.

Notes

*Corresponding author: Michael Bowling.

Cite as

Michael Bowling et al., Heads-up limit hold’em poker is solved. Science 347, 145-149 (2015). DOI: 10.1126/science.1259433

Cited by

- Chaoqiong Fan, Li Yao, Jiacai Zhang, Zonglei Zhen, Xia Wu, Advanced Reinforcement Learning and Its Connections with Brain Neuroscience, Research, 6, (2023). https://doi.org/10.34133/research.0064
- Martin Schmid, Matej Moravčík, Neil Burch, Rudolf Kadlec, Josh Davidson, Kevin Waugh, Nolan Bard, Finbarr Timbers, Marc Lanctot, G. Zacharias Holland, Elnaz Davoodi, Alden Christianson, Michael Bowling, Student of Games: A unified learning algorithm for both perfect and imperfect information games, Science Advances, 9, 46, (2023). https://doi.org/10.1126/sciadv.adg3256
- Philip W. S. Newall, Niri Talberg, Elite professional online poker players: factors underlying success in a gambling game usually associated with financial loss and harm, Addiction Research & Theory, (1-12), (2023). https://doi.org/10.1080/16066359.2023.2179997
- Jerome Hergueux, Gabriel Smagghue, The dominance of skill in online poker, International Review of Law and Economics, 74, (106119), (2023). https://doi.org/10.1016/j.irle.2022.106119
- Licheng Wu, Qifei Wu, Hongming Zhong, Xiali Li, Mastering “Gongzhu” with Self-play Deep Reinforcement Learning, Cognitive Systems and Information Processing, (148-158), (2023). https://doi.org/10.1007/978-981-99-0617-8_11
- Swati Chakraborty, Human Rights and Artificial Intelligence, Dynamics of Dialogue, Cultural Development, and Peace in the Metaverse, (1-14), (2022). https://doi.org/10.4018/978-1-6684-5907-2.ch001
- Hong Ri, Xiaohan Kang, Mohd Nor Akmal Khalid, Hiroyuki Iida, The Dynamics of Minority versus Majority Behaviors: A Case Study of the Mafia Game, Information, 13, 3, (134), (2022). https://doi.org/10.3390/info13030134
- Daming Shi, Xudong Guo, Yi Liu, Wenhui Fan, Optimal Policy of Multiplayer Poker via Actor-Critic Reinforcement Learning, Entropy, 24, 6, (774), (2022). https://doi.org/10.3390/e24060774
- Yunlong Lu, Wenxin Li, Techniques and Paradigms in Modern Game AI Systems, Algorithms, 15, 8, (282), (2022). https://doi.org/10.3390/a15080282
- 蕾黄, 进朱, 福庆段, Extensive game decision based on the PPO-CFR algorithm under incomplete information, SCIENTIA SINICA Informationis, 52, 12, (2178), (2022). https://doi.org/10.1360/SSI-2022-0216
