Talk:Bellman equation



General explanation


Shuroo (talk) 09:19, 29 May 2009 (UTC) This explanation is too specific. We need a general explanation of the Bellman equation.

BPO


I can't find the "principle of optimality" anywhere in Bellman's 1952 paper. Perhaps this should be his 1957 book. —Preceding unsigned comment added by 129.100.144.166 (talk) 17:01, 11 December 2007 (UTC)

Bellman's definition from Dynamic Programming, 1957 (Dover 2003 edition, p. 83):

Principle of Optimality. An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.

--Rinconsoleao (talk) 19:57, 25 May 2009 (UTC)
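
One common way to express the principle as an equation (using the payoff F, transition T and discount factor β notation that appears later on this page, rather than Bellman's own wording) is

V(x_0) = \max_{a_0}\big\{ F(x_0, a_0) + \beta\, V\big(T(x_0, a_0)\big) \big\},

i.e. the value of starting from x_0 is the best achievable sum of the first-period payoff and the discounted value of behaving optimally from the resulting state T(x_0, a_0) onward.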

Notation for dynamic problems


The equation given in the article is wrong, or at least not true in general. The right equation is

V(x) = \max_{y}\big\{ F(x,y) + \beta\, V\big(T(x,y)\big) \big\}

where T(x,y) is the transformation of the state given a state x and an action y.

Shuroo —Preceding unsigned comment added by Shuroo (talk · contribs) 20:27, 27 May 2009 (UTC)
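
To make the recursion concrete, here is a minimal value-iteration sketch in Python for an equation of the form V(x) = max_y { F(x,y) + β V(T(x,y)) }; the state and action sets and the particular F and T below are toy placeholders for illustration, not taken from the article.

# Minimal sketch: value iteration for V(x) = max_y { F(x,y) + beta*V(T(x,y)) }.
# The state/action sets, F and T are illustrative placeholders.
beta = 0.9
states = range(5)
actions = range(3)

def F(x, y):          # one-period payoff (illustrative)
    return float(-abs(x - y))

def T(x, y):          # deterministic transition to the next state (illustrative)
    return min(max(x + y - 1, 0), 4)

V = {x: 0.0 for x in states}
for _ in range(500):  # repeatedly apply the Bellman operator; converges since 0 < beta < 1
    V = {x: max(F(x, y) + beta * V[T(x, y)] for y in actions) for x in states}

The limit of this iteration is the V that satisfies the displayed equation at every state x.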

That depends on how we are defining the notation. The way the problem was set up (I didn't write it myself) defines transitions by stating that the feasible future states are x_{t+1} \in \Gamma(x_t). So the 'transformation of the state' you mention is captured by that notation. Of course, I don't mind moving to clearer notation if others prefer. --Rinconsoleao (talk) 11:04, 28 May 2009 (UTC)

By the way, the notation appears to come from Stokey, Lucas, Prescott (1989), Chapter 4. --Rinconsoleao (talk) 11:07, 28 May 2009 (UTC)

The notation I have suggested is easier to understand and much more practical for dynamic programming problems with a finite horizon. (It also better matches the Wikipedia article on dynamic programming.) —Preceding unsigned comment added by Shuroo (talk · contribs) 14:28, 28 May 2009 (UTC)

It seems as though the statement of the Bellman equation for general t would be more clearly derived from stating the recursive definition for general t rather than for t = 0 specifically. Then the idea of finding a fixed point V for the contractive mapping (i.e., the Bellman equation) is much more easily elucidated to the reader (or happened upon by the reader). Bashfuloctopus (talk) 06:26, 18 January 2017 (UTC)
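
For concreteness, the fixed-point formulation referred to here can be sketched, in the F, T, β notation used elsewhere on this page, by defining the Bellman operator

(B V)(x) = \max_{a}\big\{ F(x,a) + \beta\, V\big(T(x,a)\big) \big\}.

When 0 < β < 1 and F is bounded, B is a contraction of modulus β on the space of bounded functions (e.g. via Blackwell's sufficient conditions), so the value function is its unique fixed point, V = B V.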

Notation for stochastic problems


Yet another argument for my notation is that it can be easily extended to stochastic problems (e.g., inventory management). To obtain the stochastic form of the equation we simply add E (expectation) before the second term since the new state is a random variable.

V(x) = \max_{y}\big\{ F(x,y) + \beta\, \operatorname{E}\!\big[V\big(T(x,y)\big)\big] \big\}

where T(x,y) is the transformation of the state given a state x and an action y. Shuroo (talk) 09:21, 29 May 2009 (UTC)
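
As an illustration of what adding the expectation amounts to: if the transition depends on a random shock z with probability distribution p(z) (a shock variable introduced here only for concreteness, not notation from the article), the stochastic equation can be written out as

V(x) = \max_{y}\Big\{ F(x,y) + \beta \sum_{z} p(z)\, V\big(T(x,y,z)\big) \Big\},

so the expectation simply averages the continuation value over the possible next states.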

I agree that the form you are proposing is probably easier to understand. Go ahead and change the notation if you want. (Notice that you will need to change the notation on the original sequence problem too, for consistency with the form of the Bellman equation you are proposing.) I'll keep an eye open for errors. --Rinconsoleao (talk) 10:21, 29 May 2009 (UTC)

Done Shuroo (talk) 08:51, 30 May 2009 (UTC)

I'm sorry, I thought your notation for the stochastic case would show more explicitly how the next state x_{t+1} is determined. The notation T(x,y) strikes me as misleading, because it looks like a function, which suggests that it should take a unique value. A rigorous way to define it would be to write a c.d.f. of the form G(x_{t+1} | x_t, a_t) representing the probability distribution of possible states x_{t+1} conditional on the current state x_t and action a_t. Alternatively, we could write something like x_{t+1} = T(x_t, a_t) + ε_{t+1}, where ε_{t+1} is a mean-zero shock, but that is more restrictive. --Rinconsoleao (talk) 13:47, 1 June 2009 (UTC)
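
Written out, the Bellman equation under such a c.d.f. would take roughly the form

V(x) = \max_{a}\Big\{ F(x,a) + \beta \int V(y)\, \mathrm{d}G(y \mid x, a) \Big\},

with the integral taken over the possible next states y; this is the G(y|x,a) convention discussed further below.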

What I wrote was a simple but valid formula that could explain the example in the next section. (I used T for the needed random variable but indicated it was a random variable, not a function.) It seems to me that adding too many details would make the article much less attractive, because in order to introduce conditional distributions properly you need to add some heavy machinery.

Pity that you erased it. Shuroo (talk) 18:47, 1 June 2009 (UTC)

By the way, the current article uses the version that you erased, in the example, but in an implicit way. Surely it is better to write it explicitly. Shuroo (talk) 19:32, 1 June 2009 (UTC)

Now I've added a lot more discussion, and a stochastic example with (I think) rigorous notation. Let me know if you find errors.--Rinconsoleao (talk) 18:16, 4 June 2009 (UTC)

If you want to do the stochastic part properly, it would be better to write it in terms of a Markov decision process. Shuroo (talk) 10:13, 5 June 2009 (UTC)

I wrote it this way because the G(y|x,a) notation is actually very close to your y=T(x,a) notation, except that it makes explicit the fact that we are talking about a random variable. And it allowed me to write a very similar Bellman equation for the stochastic and deterministic cases (so the stochastic subsection ends up very short and simple without being incorrect). But we could certainly include a link to MDP too.--Rinconsoleao (talk) 11:03, 6 June 2009 (UTC)

In any case, if you want the article to be rigorous, at least to some extent, you certainly need to state the Markovian property of the system. Shuroo (talk) 18:57, 6 June 2009 (UTC)

Wording


"It breaks a dynamic optimization problem into simpler subproblems, writing the value of a decision problem at a certain point in time in terms of the payoff from some initial choices and the value of the remaining decision problem that results from those initial choices, as Bellman's Principle of Optimality prescribes." this sentance is VERY long and obfuscated ; it may deserve a reformulation dont you think —Preceding unsigned comment added by 124.180.239.182 (talk) 09:35, 20 July 2010 (UTC)Reply

Greedy?


As I understand it, the mathematical expression given under the Bellman's Principle of Optimality section is incorrect. The claim is made that it is the same as (the RHS of, presumably) the previous equation, but I don't think that's true -- as it stands, it is actually a description of a greedy solution, not a solution involving the principle of optimal substructure. The reason being that the first term greedily selects the action at time 0 giving maximum payoff, when in fact a suboptimal action at this time may maximise the value overall by bringing the system into a different state at time 1 whose value (maximum possible discounted sum of payoffs), when added to this (suboptimal) time-0 payoff, is greater. Not sure what the correct expression would be I'm afraid. 130.123.96.22 (talk) 03:18, 15 February 2011 (UTC)

No, the expression does not refer to a 'greedy' algorithm-- in other words, it is not maximizing the first-period payoff in isolation. The control a0 is chosen so that it maximizes the first-period payoff PLUS the continuation value of the rest of the problem (which need not imply that it maximizes the first-period payoff by itself, as a greedy algorithm would). I will add another set of parentheses to make this clearer. Rinconsoleao (talk) 09:06, 15 February 2011 (UTC)
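
A tiny numerical illustration of the distinction (the actions, payoffs and continuation values below are made-up placeholders, not taken from the article):

# Two actions: "A" pays more today but leads to a worthless state; "B" pays less
# today but leads to a valuable state. A greedy rule and the Bellman maximand disagree.
beta = 1.0
payoff0 = {"A": 5, "B": 3}           # first-period payoff of each action
continuation = {"A": 0, "B": 10}     # value of the state each action leads to

greedy = max(payoff0, key=payoff0.get)                                      # "A"
optimal = max(payoff0, key=lambda a: payoff0[a] + beta * continuation[a])   # "B"
print(greedy, optimal)

The greedy rule picks "A" (5 > 3), while maximizing payoff plus continuation value, as in the bracketed expression, picks "B" (3 + 10 > 5 + 0).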

Value function


I think the term "value function" deserves an article of its own! Luizabpr (talk) 19:07, 21 May 2011 (UTC)

@Luizabpr: belatedly, I gave it one. Cheers. --bender235 (talk) 01:57, 23 June 2020 (UTC)

Broken stochastic eqn


There is something wrong with the equation defining the conditional probability, but I'm not quite sure of the fix. Currently it says:

 

But this can't be right, since   is a state, and so in general, the idea of   is impossible to define. Perhaps the following was intended:

 

??? I dunno, please fix. linas (talk) 17:00, 28 June 2012 (UTC)

I don't understand what you mean when you say that x_{t+1} is impossible to define. Perhaps you are assuming that the state x_{t+1} is chosen at t, and is therefore deterministic at t. But the Bellman equation, as written, shows that the decision-maker chooses an action a at time t, which does not exactly determine x_{t+1}. Instead, it alters the probability distribution of x_{t+1}. That's what the conditional distribution G(x_{t+1} | x_t, a_t) shows.
So I see no problem with the equation. Removing the discussion flag. Rinconsoleao (talk) 18:56, 19 September 2012 (UTC)


I agree with linas. The presentation of the article in general is a bit hand-wavy. One should at least state the principle of optimality precisely, IMHO. That particular section is confused, both in terminology and conceptually. The stochastic shocks are usually assumed to follow a Markov process, which, loosely speaking, maps points of a measurable space to probability measures on that space. They are not "random variables". Neither is the state a "random variable", as the article currently says.
The Bellman equation is incorrect. The expectation should be taken with respect to the appropriate probability measure, the one that determines the next period shock. The Bellman equation depends on the timing of the model:
If control is being chosen after the current period shock is realized:
 
If control is chosen ex ante:
 
It is sloppy to say the state "next period" x_{t+1} is a "random variable". Once an optimal policy function is determined from the Bellman equation, then the state for a given period is a random variable, with the underlying probability space given by the Markov process. If one wants to condition on the current state x_t, then in the ex post case, the next period state x_{t+1} is not random at all. Mct mht (talk) 12:15, 16 March 2013 (UTC)
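
As a rough sketch of the two timing conventions being distinguished here (the notation below is illustrative, not necessarily what the original comment used): writing Q for the Markov transition kernel of the shock z, the ex post case, in which the control is chosen after the current shock is observed, would read something like

V(x, z) = \max_{a}\Big\{ F(x, a, z) + \beta \int V\big(T(x, a, z),\, z'\big)\, Q(z, \mathrm{d}z') \Big\},

while in the ex ante case, where the control must be chosen before z is observed, the expectation over the current shock appears inside the maximization instead.
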
External links modified

Hello fellow Wikipedians,

I have just modified one external link on Bellman equation. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FAQ for additional information. I made the following changes:

When you have finished reviewing my changes, please set the checked parameter below to true or failed to let others know (documentation at {{Sourcecheck}}).

This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}} (last update: 5 June 2024).

  • If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
  • If you found an error with any archives or the URLs themselves, you can fix them with this tool.

Cheers.—InternetArchiveBot (Report bug) 11:00, 30 October 2016 (UTC)

External links modified

Hello fellow Wikipedians,

I have just modified one external link on Bellman equation. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FAQ for additional information. I made the following changes:

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}} (last update: 5 June 2024).

  • If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
  • If you found an error with any archives or the URLs themselves, you can fix them with this tool.

Cheers.—InternetArchiveBot (Report bug) 15:19, 17 July 2017 (UTC)

Reinforcement learning isn't mentioned?


Isn't that basically the main application of the Bellman equation? 98.156.185.48 (talk) 02:06, 29 September 2023 (UTC)
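
For context, the connection is usually made through the Bellman optimality equation for the action-value function,

Q^*(s, a) = \operatorname{E}\big[\, r_{t+1} + \gamma \max_{a'} Q^*(s_{t+1}, a') \,\big|\, s_t = s,\ a_t = a \,\big],

which is the fixed-point condition that value iteration, Q-learning and related reinforcement-learning methods approximate.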