
DecisionHoldem: Safe Depth-Limited Solving With Diverse Opponents for Imperfect-Information Games

Qibin Zhou¹, Dongdong Bai¹, Junge Zhang¹†, Fuqing Duan², Kaiqi Huang¹

¹Institute of Automation, Chinese Academy of Sciences, Beijing, China

²Beijing Normal University, Beijing, China

zqbagent@gmail.com, baidongdong@nudt.edu.cn, jgzhang@nlpr.ia.ac.cn, fqduan@bnu.edu.can, kqhuang@nlpr.ia.ac.cn

Abstract

An imperfect-information game is a type of game with asymmetric information. It is more common in real life than the perfect-information game. Artificial intelligence (AI) in imperfect-information games such as poker has made considerable progress in recent years. The success of superhuman poker AIs such as Libratus and DeepStack has drawn researchers' attention to poker research. However, the lack of open-source code limits the development of Texas hold'em AI to some extent. This article introduces DecisionHoldem, a high-level AI for heads-up no-limit Texas hold'em that uses safe depth-limited subgame solving, considering the possible ranges of the opponent's private hands to reduce the exploitability of its strategy. Experimental results show that DecisionHoldem defeats the strongest openly available agent in heads-up no-limit Texas hold'em poker, namely Slumbot, and a high-level reproduction of DeepStack, viz. OpenStack, by more than 730 mbb/h (thousandths of a big blind per hand) and 700 mbb/h, respectively. Moreover, we release the source code and tools of DecisionHoldem to promote AI development in imperfect-information games.

1 Introduction

The success of AlphaGo [Silver et al., 2016] has led to increasing attention to the study of game decision-making [Brown and Sandholm, 2018; Moravčík et al., 2017; Brown et al., 2018; Brown and Sandholm, 2019c]. Unlike perfect-information games such as Go, most real-world problems are imperfect-information games. The hidden knowledge in poker (i.e., the private cards) corresponds to the imperfect information of the real world. Research on poker artificial intelligence (AI) can therefore provide means to deal with real-life problems such as financial market tracking and stock forecasting. Research in imperfect-information games, particularly poker AI [Brown and Sandholm, 2018; Moravčík et al., 2017; Brown et al., 2018; Brown and Sandholm, 2019c], has made considerable progress in recent years.

Texas hold'em is one of the most popular poker games in the world. It is an excellent benchmark for studying game theory and techniques for imperfect-information games, for three reasons. First, Texas hold'em is a typical imperfect-information game. Before play begins, each player is dealt two private cards that are invisible to the opponent. Players must infer the opponent's private hand from the opponent's historical actions, which gives Texas hold'em its characteristic elements of deception and counter-deception. Second, the game is enormous: the decision space of heads-up no-limit Texas hold'em (HUNL) exceeds 10^160 [Johanson, 2013]. Third, Texas hold'em has simple rules and moderate difficulty, which considerably eases the verification of algorithms.

After decades of research, the poker AIs DeepStack, developed by Matej Moravčík et al. [Moravčík et al., 2017], and Libratus, developed by Noam Brown and Tuomas Sandholm [Brown and Sandholm, 2018], successively defeated human professional players in 2017, a breakthrough for HUNL. Subsequently, the poker AI Pluribus, also built by Noam Brown and Tuomas Sandholm [Brown and Sandholm, 2019c], defeated human professional players in six-player no-limit Texas hold'em. Although Science has published the poker AIs mentioned above [Brown and Sandholm, 2018; Moravčík et al., 2017], the relevant code and main technical details have not been made public.

Dongdong Bai and Qibin Zhou contributed equally.
† Corresponding author.

In addition, much recent progress in poker AI [Brown et al., 2017; Hartley, 2017; Brown and Sandholm, 2019a; Schmid et al., 2019; Farina et al., 2019c; Farina et al., 2019a; Farina et al., 2019b; Li et al., 2020a] has been tested only in games with small decision spaces, such as Leduc hold'em and Kuhn poker. These algorithms may not work well when applied to large-scale games such as Texas hold'em. In this paper, we propose a safe depth-limited subgame solving algorithm with diverse opponents. To evaluate the algorithm's performance, we build a high-performance, high-efficiency poker AI on top of it, named DecisionHoldem. Experiments show that DecisionHoldem defeats the strongest public poker AIs, such as Slumbot¹ (champion of the 2018 Annual Computer Poker Competition [ACPC]) and OpenStack (a reproduction of DeepStack built into OpenHoldem [Li et al., 2020b])², by a large margin. Meanwhile, we release DecisionHoldem's source code, together with tools for playing against Slumbot and OpenHoldem [Li et al., 2020b]. We also provide a platform for humans to play against DecisionHoldem (as in Figure 1)³. Our code is available at .

Figure 1: Demonstration of AI versus human play.

arXiv:2201.11580v1 [cs.AI] 27 Jan 2022

| Round    | Abstract hands | 1st-2nd actions          | 3rd-5th actions    | Remaining actions |
|----------|----------------|--------------------------|--------------------|-------------------|
| Pre-flop | 169            | F, C, 0.5P, P, 2P, 4P, A | F, C, P, 2P, 4P, A | F, C, A           |
| Flop     | 50,000         | F, C, 0.5P, P, 2P, 4P, A | F, C, P, 2P, 4P, A | F, C, A           |
| Turn     | 5,000          | F, C, 0.5P, P, 2P, 4P, A | F, C, P, 2P, 4P, A | F, C, A           |
| River    | 1,000          | F, C, 0.5P, P, 2P, 4P, A | F, C, P, 2P, 4P, A | F, C, A           |

Table 1: The number of abstract hands and actions available in each round (pre-flop, flop, turn, and river) of DecisionHoldem on HUNL. F, C, 0.5P, P, 2P, 4P, and A represent fold, call, 0.5× pot, 1.0× pot, 2.0× pot, 4.0× pot, and all-in, respectively.
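The 169 pre-flop abstract hands in Table 1 match the number of strategically distinct starting hands once suit symmetry is removed: 13 pocket pairs, 78 suited and 78 offsuit rank combinations. Assuming DecisionHoldem uses this standard lossless pre-flop grouping, the minimal Python sketch below reproduces the count. The card encoding ("Ah" = ace of hearts) and the function names are hypothetical; the lossy flop, turn and river buckets (50,000, 5,000 and 1,000) are not sketched because the paper does not describe how they are built.

```python
from itertools import combinations

# Hypothetical card encoding: rank character followed by suit character, e.g. "Ah".
RANKS = "23456789TJQKA"
SUITS = "cdhs"


def canonical_class(card1: str, card2: str) -> str:
    """Map a two-card hand such as ("Kh", "Ah") to its pre-flop class, e.g. "AKs"."""
    r1, s1 = card1[0], card1[1]
    r2, s2 = card2[0], card2[1]
    # Put the higher rank first so that K-A and A-K collapse into the same class.
    if RANKS.index(r1) < RANKS.index(r2):
        r1, r2 = r2, r1
    if r1 == r2:
        return r1 + r2  # pocket pair, e.g. "QQ"
    return r1 + r2 + ("s" if s1 == s2 else "o")  # suited or offsuit


def preflop_classes() -> dict:
    """Group all two-card combinations by their suit-isomorphic class."""
    deck = [r + s for r in RANKS for s in SUITS]
    classes = {}
    for c1, c2 in combinations(deck, 2):
        classes.setdefault(canonical_class(c1, c2), []).append((c1, c2))
    return classes


if __name__ == "__main__":
    classes = preflop_classes()
    print(len(classes))                           # number of abstract pre-flop hands
    print(sum(len(v) for v in classes.values()))  # number of concrete combinations
```

Each class covers 6 (pair), 4 (suited) or 12 (offsuit) concrete combinations, which together account for all 1,326 two-card hands.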

2 Methods

In this study, we use the counterfactual regret minimization (CFR) algorithm [Zinkevich et al., 2007], the main approach used in Texas hold'em AI, and combine it with safe depth-limited subgame solving to build a high-performance, high-efficiency poker AI: DecisionHoldem. DecisionHoldem consists of two main parts: the blueprint strategy and the real-time search.

For the blueprint strategy, we partially follow the approach of Libratus but adjust the abstraction parameters for the number of actions and hands. DecisionHoldem's hand and action abstraction parameters are shown in Table 1. DecisionHoldem first applies hand abstraction and action abstraction to obtain an abstracted game tree. We then run the linear CFR algorithm [Brown and Sandholm, 2019b] on the abstracted game tree to compute the blueprint strategy, using a workstation with 48 CPU cores for about 3-4 days and approximately 200 million iterations. The total computing cost is about 4,000 core hours.

For the real-time search, we propose a depth-limited subgame solving algorithm that is safer than Modicum's [Brown et al., 2018] because it considers diverse opponents at off-tree nodes. Since the opponent's private hand range reflects the opponent's play style and strategy, our safe depth-limited subgame solving method explicitly models diverse opponents with different ranges. This algorithm can refine the subgame strategy without worsening exploitability relative to the blueprint strategy. In other words, safe depth-limited solving with diverse opponents can substantially strengthen the AI's decision-making against changing challenges. Our subsequent articles will introduce the details of the algorithm.

¹ www.slumbot.com
² holdem.ia.ac.cn
³ https://github.com/ishikota/PyPokerGUI
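The blueprint is computed with linear CFR, which weights the regrets and average-strategy contributions of iteration t by t [Brown and Sandholm, 2019b]. As an illustration only, the sketch below runs full-traversal linear CFR on Kuhn poker, a three-card toy game, rather than on DecisionHoldem's abstracted HUNL tree. The game encoding, information-set keys and iteration count are assumptions, and none of this is DecisionHoldem's code; it only shows the weighting scheme and regret matching on a game small enough to check by hand.

```python
from itertools import permutations

ACTIONS = "pb"  # p = pass/check/fold, b = bet/call


class InfoSet:
    """Linearly weighted cumulative regrets and strategy sums for one information set."""

    def __init__(self):
        self.regret_sum = [0.0, 0.0]
        self.pending = [0.0, 0.0]  # regret updates gathered during the current iteration
        self.strategy_sum = [0.0, 0.0]

    def strategy(self):
        # Regret matching: play each action in proportion to its positive cumulative regret.
        positive = [max(r, 0.0) for r in self.regret_sum]
        total = sum(positive)
        return [p / total for p in positive] if total > 0 else [0.5, 0.5]

    def average_strategy(self):
        total = sum(self.strategy_sum)
        return [s / total for s in self.strategy_sum] if total > 0 else [0.5, 0.5]

    def apply_pending(self):
        for i in range(2):
            self.regret_sum[i] += self.pending[i]
            self.pending[i] = 0.0


def terminal_payoff(history, cards):
    """Payoff to the player about to act if `history` is terminal, else None."""
    if len(history) < 2:
        return None
    player = len(history) % 2
    higher = cards[player] > cards[1 - player]
    if history[-1] == "p":
        if history == "pp":
            return 1 if higher else -1  # check-check showdown
        return 1  # the opponent folded to a bet
    if history[-2:] == "bb":
        return 2 if higher else -2  # bet-call showdown
    return None


def cfr(infosets, cards, history, reach, t):
    """One linear-CFR traversal; returns the value for the player to act."""
    payoff = terminal_payoff(history, cards)
    if payoff is not None:
        return payoff
    player = len(history) % 2
    node = infosets.setdefault(str(cards[player]) + history, InfoSet())
    strat = node.strategy()
    utils = [0.0, 0.0]
    for i, action in enumerate(ACTIONS):
        child_reach = list(reach)
        child_reach[player] *= strat[i]
        utils[i] = -cfr(infosets, cards, history + action, child_reach, t)
    node_util = strat[0] * utils[0] + strat[1] * utils[1]
    # Linear CFR: iteration t's regret and average-strategy contributions are weighted by t.
    # The uniform chance probability (1/6 per deal) is omitted; it rescales every term equally.
    for i in range(2):
        node.pending[i] += t * reach[1 - player] * (utils[i] - node_util)
        node.strategy_sum[i] += t * reach[player] * strat[i]
    return node_util


def average_value(infosets, cards, history=""):
    """Value for the player to act when both players follow their average strategies."""
    payoff = terminal_payoff(history, cards)
    if payoff is not None:
        return payoff
    player = len(history) % 2
    node = infosets.get(str(cards[player]) + history)
    strat = node.average_strategy() if node else [0.5, 0.5]
    return sum(-strat[i] * average_value(infosets, cards, history + a)
               for i, a in enumerate(ACTIONS))


def train(iterations):
    infosets = {}
    deals = list(permutations((1, 2, 3), 2))  # (player 0 card, player 1 card)
    for t in range(1, iterations + 1):
        for cards in deals:
            cfr(infosets, cards, "", [1.0, 1.0], t)
        for node in infosets.values():
            node.apply_pending()  # strategies stay fixed within one iteration
    value = sum(average_value(infosets, cards) for cards in deals) / len(deals)
    return infosets, value


if __name__ == "__main__":
    infosets, value = train(3000)
    print(f"player 0 value: {value:.4f} (Kuhn poker's equilibrium value is -1/18)")
    for key in sorted(infosets):
        print(key, [round(p, 3) for p in infosets[key].average_strategy()])
```

Regret updates are buffered and applied once per iteration so that every deal in an iteration sees the same strategy, which makes this plain full-traversal CFR with linear weights rather than a sampled variant. A real HUNL blueprint would instead run a sampled, parallel variant over the abstracted tree described in Table 1.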

3 Experiments and Results

DecisionHoldem plays against Slumbot and OpenStack [Li et al., 2020b] to test its capability. Slumbot is the champion of the 2018 ACPC and the strongest openly available agent in HUNL. OpenStack, a replica of DeepStack, is a high-level poker AI integrated into OpenHoldem. The experimental configuration is as follows. In the first three rounds of the game (pre-flop, flop, and turn), DecisionHoldem uses the blueprint strategy by default; when it reaches an off-tree node, it starts a real-time search. The real-time search runs 6,000 iterations in the first two rounds (pre-flop and flop) and 10,000 iterations in the third round (turn). In the last round (river), DecisionHoldem always performs real-time search with the safe depth-limited subgame solving algorithm, using 10,000 iterations.

In approximately 20,000 games against Slumbot, DecisionHoldem's average profit is greater than 730 mbb/h (thousandths of a big blind per hand). It ranked first on the Slumbot leaderboard on November 26, 2021 (DecisionHoldem's name on the leaderboard is zqbAgent⁴), as shown in Figures 2 and 3. In approximately 2,000 games against OpenStack, DecisionHoldem's average profit is greater than 700 mbb/h; the competition records are available in the GitHub repository of DecisionHoldem.⁴

Figure 2: DecisionHoldem's ranking on the Slumbot leaderboard on November 26, 2021.

Figure 3: Statistics for DecisionHoldem vs. Slumbot.
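A win rate such as 730 mbb/h is the mean profit per hand measured in thousandths of a big blind, i.e. 0.73 big blinds per hand. Because poker results over 20,000 (or 2,000) hands are noisy, a confidence interval is a useful companion to the mean. The sketch below computes both from a list of per-hand chip results; the input format, the big-blind value and the normal-approximation interval are assumptions for illustration, not the evaluation procedure used in the paper.

```python
import math
from statistics import mean, stdev


def to_mbb(winnings_chips, big_blind):
    """Convert per-hand winnings in chips to milli-big-blinds (thousandths of a big blind)."""
    return [1000.0 * w / big_blind for w in winnings_chips]


def win_rate_mbb_per_hand(winnings_chips, big_blind, z=1.96):
    """Mean win rate in mbb/h with a normal-approximation confidence interval.

    z = 1.96 gives an approximate 95% interval; this assumes independent hands and a
    sample large enough for the central limit theorem to apply.
    """
    per_hand = to_mbb(winnings_chips, big_blind)
    rate = mean(per_hand)
    half_width = z * stdev(per_hand) / math.sqrt(len(per_hand))
    return rate, (rate - half_width, rate + half_width)


if __name__ == "__main__":
    # Hypothetical per-hand results in chips with a 100-chip big blind.
    results = [100, -50, 200, -100, 73, 0]
    rate, (low, high) = win_rate_mbb_per_hand(results, big_blind=100)
    print(f"{rate:.1f} mbb/h, 95% CI [{low:.1f}, {high:.1f}]")
```

For example, averaging 73 chips per hand at a 100-chip big blind corresponds to 730 mbb/h; the interval width shrinks with the square root of the number of hands played.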

4 Conclusions

This paper introduces a safe depth-limited subgame solving algorithm with an exploitability guarantee. By combining this algorithm for real-time search with suitable abstraction methods for the blueprint strategy, we obtain DecisionHoldem, a strong AI for HUNL. DecisionHoldem defeats current representative public high-level poker AIs, namely Slumbot and OpenStack. To the best of our knowledge, DecisionHoldem is the first open-source high-level AI for HUNL. We also provide toolkits for playing against Slumbot and OpenStack, and a platform for humans to play against DecisionHoldem, to assist researchers in conducting further research.

References

[Brown and Sandholm, 2018] Noam Brown and Tuomas Sandholm. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science, 359(6374):418-424, 2018.

[Brown and Sandholm, 2019a] Noam Brown and T. Sandholm. Solving imperfect-information games via discounted regret minimization. In AAAI, 2019.

[Brown and Sandholm, 2019b] Noam Brown and Tuomas Sandholm. Solving imperfect-information games via discounted regret minimization. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 1829-1836, 2019.

[Brown and Sandholm, 2019c] Noam Brown and Tuomas Sandholm. Superhuman AI for multiplayer poker. Science, 365(6456):885-890, 2019.

[Brown et al., 2017] Noam Brown, Christian Kroer, and T. Sandholm. Dynamic thresholding and pruning for regret minimization. In AAAI, 2017.

[Brown et al., 2018] Noam Brown, Tuomas Sandholm, and Brandon Amos. Depth-limited solving for imperfect-information games. arXiv preprint arXiv:1805.08195, 2018.

[Farina et al., 2019a] Gabriele Farina, Christian Kroer, Noam Brown, and T. Sandholm. Stable-predictive optimistic counterfactual regret minimization. ArXiv, abs/1902.04982, 2019.

[Farina et al., 2019b] Gabriele Farina, Christian Kroer, and T. Sandholm. Optimistic regret minimization for extensive-form games via dilated distance-generating functions. In NeurIPS, 2019.

[Farina et al., 2019c] Gabriele Farina, Christian Kroer, and T. Sandholm. Regret circuits: Composability of regret minimizers. In ICML, 2019.

[Hartley, 2017] M. Hartley. Multi-agent counterfactual regret minimization for partial-information collaborative games. 2017.

[Johanson, 2013] Michael Johanson. Measuring the size of large no-limit poker games. arXiv preprint arXiv:1302.7008, 2013.

[Li et al., 2020a] Hui Li, Kailiang Hu, Zhibang Ge, Tao Jiang, Yuan Qi, and L. Song. Double neural counterfactual regret minimization. ArXiv, abs/1812.10607, 2020.

[Li et al., 2020b] Kai Li, Hang Xu, Meng Zhang, Enmin Zhao, Zhe Wu, Junliang Xing, and Kaiqi Huang. OpenHoldem: An open toolkit for large-scale imperfect-information game research. arXiv preprint arXiv:2012.06168, 2020.

[Moravčík et al., 2017] Matej Moravčík, Martin Schmid, Neil Burch, Viliam Lisý, Dustin Morrill, Nolan Bard, Trevor Davis, Kevin Waugh, Michael Johanson, and Michael Bowling. DeepStack: Expert-level artificial intelligence in heads-up no-limit poker. Science, 356(6337):508-513, 2017.

[Schmid et al., 2019] Martin Schmid, Neil Burch, Marc Lanctot, Matej Moravcik, Rudolf Kadlec, and Michael H. Bowling. Variance reduction in Monte Carlo counterfactual regret minimization (VR-MCCFR) for extensive form games using baselines. ArXiv, abs/1809.03057, 2019.

[Silver et al., 2016] David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484-489, 2016.

[Zinkevich et al., 2007] Martin Zinkevich, Michael Johanson, Michael Bowling, and Carmelo Piccione. Regret minimization in games with incomplete information. In Advances in Neural Information Processing Systems 20, volume 20, pages 1729-1736, 2007.