Template:MDP and Q-learning
From IT위키
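| Item | MDP | Q-learning |
|---|---|---|
| Decision process | Compute the transition probabilities T(s', a, s) | Compute the future value Q |
| Policy | π(s) = argmax_a T(s', a, s) | π(s) = argmax_a Q(s, a) |
| Optimal value | Iterate V(s) until convergence | Update the Q table |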
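The contrast in the table can be made concrete with a minimal sketch. It assumes a hypothetical three-state chain (the states, rewards, 0.8 success probability, and the discount `GAMMA` are all illustrative choices, not taken from this page). The MDP side reads the transition probabilities directly and iterates V(s) until convergence; the Q-learning side only samples transitions and updates a Q table. Note that in the usual formulation the MDP policy maximizes the expected return computed from T, the reward, and V, which is what `q_from_v` does here.

```python
import random

GAMMA = 0.9
STATES = [0, 1, 2]   # state 2 is terminal (illustrative toy chain)
ACTIONS = [0, 1]     # 0 = move left, 1 = move right

def transitions(s, a):
    """Return [(prob, next_state, reward)]: the model T that the MDP solver may read."""
    if s == 2:
        return [(1.0, 2, 0.0)]
    target = min(s + 1, 2) if a == 1 else max(s - 1, 0)
    reward = 1.0 if target == 2 else 0.0
    # The move succeeds with probability 0.8; otherwise the agent stays put.
    return [(0.8, target, reward), (0.2, s, 0.0)]

def q_from_v(V, s, a):
    """Expected return of action a in state s under the known model."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in transitions(s, a))

def value_iteration(tol=1e-8):
    """MDP column: iterate V(s) until it stops changing."""
    V = {s: 0.0 for s in STATES}
    while True:
        new_V = {s: max(q_from_v(V, s, a) for a in ACTIONS) for s in STATES}
        if max(abs(new_V[s] - V[s]) for s in STATES) < tol:
            return new_V
        V = new_V

def step(s, a, rng):
    """Sample one transition; the learner sees only the outcome, not the probabilities."""
    outcomes = transitions(s, a)
    weights = [p for p, _, _ in outcomes]
    _, s2, r = rng.choices(outcomes, weights=weights)[0]
    return s2, r

def q_learning(episodes=5000, alpha=0.05, epsilon=0.2, seed=0):
    """Q-learning column: update the Q table from sampled experience."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(episodes):
        s = rng.choice([0, 1])
        while s != 2:
            if rng.random() < epsilon:          # explore
                a = rng.choice(ACTIONS)
            else:                               # exploit the current Q table
                a = max(ACTIONS, key=lambda act: Q[(s, act)])
            s2, r = step(s, a, rng)
            target = r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q

V = value_iteration()
pi_mdp = {s: max(ACTIONS, key=lambda a: q_from_v(V, s, a)) for s in (0, 1)}

Q = q_learning()
pi_q = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in (0, 1)}

print("MDP policy:", pi_mdp)
print("Q-learning policy:", pi_q)
```

On this toy problem both approaches should settle on the same greedy policy; the difference is that only `value_iteration` ever calls `transitions` to read probabilities, while `q_learning` learns from sampled steps alone.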