![Reinforcement Learning](https://wfqqreader-1252317822.image.myqcloud.com/cover/245/34233245/b_34233245.jpg)
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P3_566.jpg?sign=1738857614-uVF6LEa9PUeYgdv2zgwRGNzdM0T8q6LB-0-da2dc85b40d1736f08baa342b659335b)
Figure 4-1 Treasure box
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P3_569.jpg?sign=1738857614-QjqGKIlFpj1l9u5FzL9HAPbMFnI0Iay0-0-17581beed24d57be8767654401c023de)
Figure 4-15 Comparison of the optimal value functions computed by the two methods
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P3_572.jpg?sign=1738857614-ihoZCJP7GK3pUw330gsIxVo4RsyWDFig-0-705f11feda5eca6b5649152912a36f19)
Figure 5-1 MC method
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P4_575.jpg?sign=1738857614-JWXRBG92x2TDOFQObVafnRbvKqRxjXUw-0-67158bedd99296a7bbca4915163a870a)
Figure 5-2 DP method
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P4_578.jpg?sign=1738857614-vfCGWwHk1lHIfiNVzxb00T2Op4NgcMvg-0-d3228b3907eead1d7bd632cd04acf57a)
Figure 5-3 TD method
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P4_581.jpg?sign=1738857614-ZhXrF3SiSGBmXHwMUxXoRxNdvdFdT1zZ-0-14e2323b7a7296ac7b4707ef771c22d1)
Figure 5-6 Maze environment
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P5_584.jpg?sign=1738857614-ynnLkDlWgiZXwh0maeP1fxzz4vUUiPVW-0-f6ee112282b8e6f0c9967418d7bb61dd)
Figure 5-7 Optimal policy obtained by the Sarsa method
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P5_587.jpg?sign=1738857614-eBKOK0Nh7KLeJT9AjtQMO7XunalE3yHx-0-69d8dc4a2c901a39983fc0bcf15aefd1)
Figure 6-12 Windy gridworld
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P5_591.jpg?sign=1738857614-xXKCca0Y65hqtUD6CY3jwUlgo6ehOSLO-0-8f4f38090e59d912134d51ceaea1fb22)
Figure 6-13 Optimal policy obtained by the backward-view Sarsa(λ) method
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P5_595.jpg?sign=1738857614-RkvSQ1fA86287h8YKpid1rXLG4cKsPpj-0-9d23a2ecdb7e75dfc972f5aa8f3c7aa8)
Figure 6-14 Optimal path obtained by the backward-view Sarsa(λ) method
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P5_598.jpg?sign=1738857614-cV3Rqt6DaUagb3ia1BfywK7J8mTln18m-0-a679a369d7e5773ce8797ae0b610b81b)
Figure 6-15 Optimal policy obtained by the backward-view Q(λ) method
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P5_601.jpg?sign=1738857614-9g1dOVaTpYDxqFmN87ekgbNwWaPcujak-0-15294b81adaec9538fe489a98de6c639)
Figure 6-16 Optimal path obtained by the backward-view Q(λ) method
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P6_604.jpg?sign=1738857614-bEe1KKOsjOcf38DBmP02tyMVgWFegyeb-0-2eb8d81b51f186c735e07ad3f2bf7fa1)
Figure 7-3 Neural network architecture of DQN
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P6_607.jpg?sign=1738857614-oShtszhNL22nn4tNkHLjPsc50YpJdspp-0-91af5ba2898b3db4d04a0b62250eb6a7)
Figure 7-7 Driving a car
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P6_610.jpg?sign=1738857614-PJHovhUiw5OMVf9JD5YmmnfiOpNvjOXI-0-9467f92f953df70ae7b66c5ad7f25ae9)
Figure 7-10 Flappy Bird
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P7_613.jpg?sign=1738857614-JVkQTMeohrMha9XfuiVWRfyXovGJ0dlU-0-c0fa106efd61e7daae68e2234d88d5fa)
Figure 7-11 Removing the game background
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P7_616.jpg?sign=1738857614-r8fyyLQ1r2Lcyjdnt7s885a2IVrbbFoy-0-065be10859e9b8d860bd3b7e0ed46420)
Figure 7-13 Grayscale conversion and binarization
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P7_619.jpg?sign=1738857614-S7CdvEJHGxccRh5BJFdSTwT0Dd3q7IaZ-0-5e7d35294ff27b27562c74f6014bf70f)
Figure 8-4
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P8_622.jpg?sign=1738857614-RcKPmM0ndk1IURz6Zv4vhWOtTD3p0X1K-0-f4eef976c6ff491bf5abe744abf1e75d)
Figure 9-1 Asynchronous methods
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P8_625.jpg?sign=1738857614-dckxQzf281SVV2WH7zEQdAZ62dMYEPav-0-7a3a0f43e7bbf66c14cb067927269d9f)
Figure 13-12 Schematic of the policy network architecture
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P8_628.jpg?sign=1738857614-jVzEWzsDSCS6rOCcxH3t1OHeXcjty4X6-0-db034d854824755defa2ce5206e1df5a)
Figure 13-13 Schematic of the value network architecture
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P9_631.jpg?sign=1738857614-rm7fHfyyOizQhli6Fw3r3KcCSMnKu3HF-0-7cc8b1e486149bddb0e22beb956639a1)
Figure 13-16 Overall architecture of AlphaGo
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P10_646.jpg?sign=1738857614-sKySRmlf1mlmgpq0LRNVYBJjd8iwyXNp-0-5b04e8ec80222fb67c1d4c83b03dacfb)
Figure 13-17 Online play process
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P10_649.jpg?sign=1738857614-QqFH9y6DHXQ77qKPSfkoD3ndjJPerjvl-0-281390b4de5539d3cbe7c2a0a48ccddf)
Figure 13-18 How AlphaGo Zero plays Go