Wikipedia:Reference desk/Archives/Mathematics/2014 August 1

Mathematics desk


August 1


Rational Prime


I inquired at talk for the prime number page, but thought I might get a quicker answer here. Is there a generally agreed on definition for a prime rational number? — Preceding unsigned comment added by 173.79.197.184 (talk) 13:26, 1 August 2014 (UTC)[reply]

I'm pretty sure we'd all agree that all prime numbers are also rational numbers. Are you interested in notions of non-integer 'primes'? Most of the generalizations of primality I know of are listed at Prime_number#Generalizations. We could view the rational numbers as a ring, and then look for prime elements. But, since the rationals are in fact a field, every non-zero element is a unit, and so there are no prime elements in the ring of rationals. There are also many types of non-prime pseudoprimes, but they are all still integers. quasiprime gets a few Google hits, but they seem to all be closely related to various definitions of pseudoprimes. So, if I understand your question correctly, I think the answer is no. SemanticMantis (talk) 16:20, 1 August 2014 (UTC)[reply]
SemanticMantis, if I understand you correctly with reference to prime elements of the ring of rational numbers, the answer to the question is being confused by the triviality of the result. The general definition of a prime element (which you linked) can be applied to the ring of rational numbers. Like for any field, the set of prime elements of the field is empty. So I would say the answer would be: Yes, there is a generally agreed definition of primes that applies to rational numbers, but this definition results in there being none. —Quondum 18:13, 1 August 2014 (UTC)[reply]
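As a sketch of the standard one-line argument behind that emptiness (a textbook fact, not part of the original exchange):

\frac{a}{b} \neq 0 \;\Longrightarrow\; \frac{a}{b} \cdot \frac{b}{a} = 1,

so every nonzero rational is a unit; since a prime element is by definition a nonzero non-unit, the set of prime elements of \mathbb{Q} is empty.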
Good point, thanks for clarifying. It does put us in a weird position though, because then we're left with true statements like this: "all the prime elements of the ring of rationals wear pink hats" ;) SemanticMantis (talk) 18:27, 1 August 2014 (UTC)[reply]
Heh-heh. Yes, human intuition doesn't deal with triviality, and especially with vacuous truth, very well, which makes it feel weird. It took millennia for society to understand zero generally; perhaps language will eventually change to be more precise in everyday use so that these concepts are introduced at an early age? Without that, I do not see the weirdness of it diminishing. Either way, we're stuck with vacuous truth in mathematics. —Quondum 21:02, 1 August 2014 (UTC)[reply]

How many cards are in the bag?


In a bag there are an unknown number of similarly-sized small cards. Each card has a unique word written upon it - no two cards bear the same word. I draw out a card at random, note the word that it bears, and replace the card in the bag. After 4096 draws and replacements, 1500 different words have been identified, distributed as follows:

Times drawn        Different words   Number of cards drawn
Drawn once only         325                  325
Drawn twice             426                  852
Drawn 3 times           349                 1047
Drawn 4 times           228                  912
Drawn 5 times           103                  515
Drawn 6 times            46                  276
Drawn 7 times            16                  112
Drawn 8 times             6                   48
Drawn 9 times             1                    9
Total                  1500                 4096

Approximately how many draws (with replacement) do I need to make to determine the number of different words in the bag? --Redrose64 (talk) 17:15, 1 August 2014 (UTC)[reply]

Here is an equivalent problem, with a solution that appears to be correct [1]. Note however that you can never explicitly determine the number of different words in the bag by sampling with replacement, but you can estimate it with decent precision. SemanticMantis (talk) 18:31, 1 August 2014 (UTC)[reply]
Mark and recapture is also relevant. --Mark viking (talk) 18:33, 1 August 2014 (UTC)[reply]
Good point. The R package Rcapture [2] gives access to statistical models for this problem, as well as a whole range of similar models. It is far more sophisticated than anything anyone here is likely to type up in an afternoon. SemanticMantis (talk) 19:41, 1 August 2014 (UTC)[reply]
Er... thanks. The "estimate-the-number-of-elements-by-random-sampling-with-replacement" one is pretty much it, going by the text on that page; but I come to the formula, and I plug in the two values that I know, and I get P(N=n|1500=r,4096=m) - now what? --Redrose64 (talk) 20:26, 1 August 2014 (UTC)[reply]
Actually I'm not sure the reasoning in the math stack exchange page is entirely correct. The first thing to notice is that, as a function of N, the unknown number of words in the bag, the probability of getting a certain distribution of repeats depends, up to a constant multiple, only on the number of samples and the number of different samples. For example, if I choose 5 samples and get "two pair" (as a poker hand), the probability of that occurring is 15N(N-1)(N-2)/N^5, and the probability of "three of a kind" is 10N(N-1)(N-2)/N^5. The functions are the same up to a constant. (A brute-force check of these two counts appears after this post.) So in terms of relative probabilities as a function of N, which is what any estimate would be based on, only the number of different samples matters. In fact, if m samples are taken and k different values are found, then the probability is (some constant) × (N)_k / N^m. (Here (N)_k is N(N-1)...(N-k+1).) The next thing to notice is that you're asking for a statistical estimate, and this depends on what exactly you're looking for and how much computation you're willing to do to get it. For simplicity you can use the maximum likelihood estimator (MLE), but there are other choices; it's been a while since I took statistics, so I don't know them off the top of my head. I did an experiment to find the MLEs with a sample size (m) of 10. The results were:
k  MLE
1  1
2  2
3  3
4  4
5  5
6  8
7 12
8 19
9 45
For example, if you had two duplicates in a sample size of 10, then you would take N=19 as your estimate. I used a bit of trial and error here because I didn't see how to get a closed formula. But extrapolating from these results, if m>>k then just take N=k.--RDBury (talk) 04:21, 2 August 2014 (UTC)[reply]
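The two pattern counts quoted above are easy to verify by exhaustion. Here is a minimal, hypothetical Python sketch (not part of the original thread) that enumerates all ordered 5-draw sequences over N values and checks them against 15N(N-1)(N-2) and 10N(N-1)(N-2):

from itertools import product

def pattern(seq):
    """Multiset 'shape' of a sequence, e.g. (2, 2, 1) for two pair."""
    return tuple(sorted((seq.count(v) for v in set(seq)), reverse=True))

for N in range(3, 7):
    two_pair = sum(1 for s in product(range(N), repeat=5) if pattern(s) == (2, 2, 1))
    three_kind = sum(1 for s in product(range(N), repeat=5) if pattern(s) == (3, 1, 1))
    assert two_pair == 15 * N * (N - 1) * (N - 2)
    assert three_kind == 10 * N * (N - 1) * (N - 2)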

The number of ways to draw one card M times with replacement from a bag of N different cards is

N^M

The number of ways to draw K different cards when drawing one card M times with replacement from a bag of N different cards is

\binom{N}{K} \, S(M,K) \, K!

where the first factor is a binomial coefficient, the second factor is a Stirling number of the second kind, and the third factor is a factorial.

The probability of getting K different cards when drawing one card M times with replacement from a bag of N different cards is

P(K \mid N, M) = \frac{\binom{N}{K} \, S(M,K) \, K!}{N^M}

The credibility that the bag contains N different cards, given that drawing one card M times with replacement yielded K different cards, is

P(N \mid M, K) = \frac{\binom{N}{K} \, S(M,K) \, K! \, N^{-M}}{\sum_{n=K}^{\infty} \binom{n}{K} \, S(M,K) \, K! \, n^{-M}}

The common factors cancel:

P(N \mid M, K) = \frac{\binom{N}{K} \, N^{-M}}{\sum_{n=K}^{\infty} \binom{n}{K} \, n^{-M}}

The unknown number N is estimated by

N \approx \mu \pm \sigma

defined by

\mu = \sum_{N=K}^{\infty} N \, P(N \mid M, K)

and

\sigma^2 = \sum_{N=K}^{\infty} (N - \mu)^2 \, P(N \mid M, K)

Bo Jacoby (talk) 18:22, 2 August 2014 (UTC).[reply]
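A minimal numerical sketch of this credibility calculation (an illustration, not from the thread), assuming a uniform prior over N ≥ K, working in log space to avoid underflow, and truncating the infinite sum where the weights become negligible:

from math import lgamma, log, exp

M, K = 4096, 1500

def log_weight(n):
    # log of binom(n, K) / n^M, the unnormalized credibility of N = n
    return lgamma(n + 1) - lgamma(K + 1) - lgamma(n - K + 1) - M * log(n)

ns = range(K, 3000)              # truncation; the weights decay like n^(K-M)
lw = [log_weight(n) for n in ns]
peak = max(lw)
w = [exp(x - peak) for x in lw]  # rescale so the largest weight is 1
Z = sum(w)
mu = sum(n * wi for n, wi in zip(ns, w)) / Z
sigma = (sum((n - mu) ** 2 * wi for n, wi in zip(ns, w)) / Z) ** 0.5
print(mu, sigma)  # roughly 1633 and 14; compare RDBury's figures below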

The line starting "The number of ways to draw at most K different cards when..." is incorrect. The number of ways to draw at most 2 different cards when drawing one card 3 times with replacement from a bag of 3 different cards is 21, not 24 = 3×2³ as your expression would indicate. There are 6 ways of getting all three, the six permutations of {1, 2, 3}. So the ways of getting fewer than 3 are 27-6=21. I don't know how to do a credibility mass function though.--RDBury (talk) 19:06, 2 August 2014 (UTC)[reply]
OK... I never claimed to be mathematical. I got GCE Ordinary Level Maths grade A one year early, and scored test results ahead of everybody in our class (except for this guy's son (who got his grade A two years early) and two others who now work for the enemy) until the age of 16, then I plummeted. I think that \sum_{N=K}^{\infty} means "sum the following expression for all real values of N between K and infinity", but I have two knowns, which means that the equation must have three unknowns. Where do I feed in my 4096 and 1500, how do I determine two more unknowns, and where does the answer come out? --Redrose64 (talk) 20:20, 2 August 2014 (UTC)[reply]

RDBury is right and I am wrong. More thinking is called for. Bo Jacoby (talk) 21:28, 2 August 2014 (UTC). It is corrected now. Thank you for the error message! Bo Jacoby (talk) 15:34, 3 August 2014 (UTC).[reply]

To Redrose64: Apologies; it seems that the problem is too "interesting" (i.e. there is some question as to how to go about solving it). In your case M is 4096, K is 1500 and the problem is to find N. The relation between M, K and N is not fully understood at this point (at least by me).
To Bo Jacoby: Do you have a reference for the method using the credibility mass function? Nothing turned up when I Googled it.--RDBury (talk) 22:26, 2 August 2014 (UTC)[reply]
Ok, based on some of the analysis above and a Python program, I'm getting an MLE of 1632.
Python code here


from math import log

def MLE_distinct_with_replacement(m, k, upper):
    """Find n with maximum probability of k distinct values being found
    if m samples are taken from a population of n with replacement.
    The search is bounded above by upper to avoid an infinite search.
    """
    bestn, bestlp = k - 1, -1e+100  # start with ~negative infinity as the max
    for n in range(k, upper + 1):
        # log-likelihood up to an additive constant: log((n)_k / n**m)
        lp = 0.0
        for i in range(k):
            lp += log(n - i)
        lp -= m * log(n)
        if lp > bestlp:
            bestn, bestlp = n, lp
        else:
            break  # values continue to fall once the peak is reached
    return bestn

MLE_distinct_with_replacement(4096, 1500, 16000)  # -> 1632
--RDBury (talk) 02:02, 3 August 2014 (UTC)[reply]
PS. I tweaked the program above to follow Bo Jacoby's credibility method (as I understand it, at least) and got N=1633.2±13.7, which includes the MLE. --RDBury (talk) 02:43, 3 August 2014 (UTC)[reply]
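As a further sanity check (an illustrative sketch, not from the thread), one can invert the standard identity for the expected number of distinct values, E[K] = N(1 - (1 - 1/N)^M), which follows by summing, over the N words, the probability 1 - (1 - 1/N)^M that each is drawn at least once. Solving it for N by bisection:

def expected_distinct(n, m):
    """Expected number of distinct values seen in m draws with replacement
    from a population of n equally likely values."""
    return n * (1.0 - (1.0 - 1.0 / n) ** m)

def moment_estimate(m, k, lo, hi):
    """Solve expected_distinct(n, m) = k for n by bisection
    (expected_distinct is increasing in n)."""
    while hi - lo > 1e-6:
        mid = 0.5 * (lo + hi)
        if expected_distinct(mid, m) < k:
            lo = mid
        else:
            hi = mid
    return lo

moment_estimate(4096, 1500, 1500.0, 100000.0)  # about 1633, agreeing with the estimates above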

The number of ways to draw K different cards was found by brute-force counting; the formula above was then constructed to reproduce the counts. It is not a mathematical proof.

J code here
     N=.i.9
     M=.3
     K=.1 2 3

     f NB. brute force
 [: +/ [: (=/ ~.) [: #@~."1 $~ #: [: i. ^

    |:>f&M &.> N NB. counting
 0 1 2  3  4  5   6   7   8
 0 0 6 18 36 60  90 126 168
 0 0 0  6 24 60 120 210 336

    s2 NB. http://www.jsoftware.com/jwiki/Essays/Stirling%20Numbers
 1:`(i.@>: ((0 , ]) + [ * ] , 0:) $:@<:)@.*
 
   (K!/N)*(}.s2 M)*!K NB. computing
 0 1 2  3  4  5   6   7   8
 0 0 6 18 36 60  90 126 168
 0 0 0  6 24 60 120 210 336
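For readers who don't read J, here is an equivalent hypothetical Python check (an illustration, not from the thread) of the same identity, comparing brute-force counts against \binom{N}{K} S(M,K) K!, with the usual recurrence for Stirling numbers of the second kind:

from itertools import product
from math import comb, factorial

def stirling2(m, k):
    """Stirling number of the second kind S(m, k), via the recurrence
    S(m, k) = k*S(m-1, k) + S(m-1, k-1)."""
    if m == k:
        return 1
    if k == 0 or k > m:
        return 0
    return k * stirling2(m - 1, k) + stirling2(m - 1, k - 1)

M = 3
for N in range(9):
    counts = [0] * (N + 1)
    for seq in product(range(N), repeat=M):  # all N^M ordered draws
        counts[len(set(seq))] += 1           # tally by number of distinct cards
    formula = [comb(N, K) * stirling2(M, K) * factorial(K) for K in range(N + 1)]
    assert counts == formula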

Frequentists object to talking about the probability that some hypothesis is true: "It is either true or false, so you cannot talk about a probability". I use the word credibility in order not to provoke the frequentists. The "credibility mass function" is a Bayesian probability mass function. Bo Jacoby (talk) 15:34, 3 August 2014 (UTC).[reply]

There are some examples here: [3]. Bo Jacoby (talk) 08:23, 6 August 2014 (UTC).[reply]

Maximizing Expected Value For A Unique Event


Does the expected value of a probability distribution have a meaningful interpretation when there is just one draw from the distribution? In concrete terms, consider the following PMF:

P(X = \$1000) = 0.01, \qquad P(X = \$0) = 0.99

Suppose you can choose between one draw from this distribution and a $5 guaranteed payoff. The expected value E[X] = $10, so if you maximize expected value, you will choose the draw. I can understand this reasoning if there were a large number of draws, since the average payoff will approach $10. However, I don't understand the reasoning if there is just one draw. Is there a mathematical reason to maximize expected value in this case?
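For reference, the arithmetic behind the quoted figure (a restatement, not new information):

E[X] = 0.01 \times \$1000 + 0.99 \times \$0 = \$10 > \$5.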

Note: I know expected utility can differ from expected value, depending on the stakes, risk profile, and so on. I'm only interested in the mathematical interpretation, insofar as it makes sense to speak of one independent from someone's utility. OldTimeNESter (talk) 19:43, 1 August 2014 (UTC)[reply]

I suppose one could argue that even if you only play this game once, over the course of your life you will encounter many choices with different expected values. This is similar to playing one game multiple times. So you can make a similar argument in favor of maximizing expected value for all these choices over your lifetime.
But if you posit a hypothetical being that only lives long enough for a single play of this game, I don't think there is a meaningful argument in favor of maximizing expected value. Mathematical treatments tend to take as given that maximizing expected value is your goal, and proceed from there.--80.109.80.78 (talk) 22:43, 1 August 2014 (UTC)[reply]
Note that those who promote gambling (and hence call it "gaming") insist that the enjoyment people get from gambling itself has value, whether they win or lose. This is to defend themselves from charges that they are just ripping off stupid people. StuRat (talk) 22:52, 1 August 2014 (UTC)[reply]
Of course, such people would be the last to encourage you to maximize your expected value, since the best way to do that is not to gamble.--80.109.80.78 (talk) 23:31, 1 August 2014 (UTC)[reply]
Not if you add in a fudge factor for "intangible enjoyment value". StuRat (talk) 03:13, 3 August 2014 (UTC)[reply]

Risk aversion

See risk aversion. Most rational agents are risk averse to some extent. You can minimize the risk for a given expected payoff or maximize the payoff for a given risk: see Markowitz portfolio theory, which is actually just an application of constrained optimization from the calculus of several variables. Sławomir Biały (talk) 15:30, 2 August 2014 (UTC)[reply]

You may doubt that the stated PMF is correct. Who told you, and who told the guy who told you? Bo Jacoby (talk) 08:44, 2 August 2014 (UTC).[reply]

Expected value for frequentists and for Bayesians

The mistake you and several respondents are making is to apply the idea of expected value inappropriately. As you will see from our article, the concept of expectation is properly only applied when one is considering an infinite number of repetitions of a random experiment. "Expected value" has no meaning in a one-off game. Inappropriate application of expectation is a common error (including amongst working statisticians!). RomanSpa (talk) 12:35, 3 August 2014 (UTC)[reply]
Roman, that's false. "Expected value" is a property of a distribution. The OP gave us the distribution in question, so we can calculate its expected value and choose actions based on it, regardless of whether we draw from the distribution once or more.
OP - the expected value of choosing the draw is $10. Whether that is preferred to a certain $5 depends on your utility function. e.g., if you'll starve unless you get $5, you should choose the less risky option. -- Meni Rosenfeld (talk) 15:39, 3 August 2014 (UTC)[reply]
@Meni Rosenfeld: I'm highly confident that my remark was true. Whilst it is certainly the case that given most distributions it is straightforward to determine the expected value, this expected value is a property of the probability space - or, to put it in the form of a "naive" intuition, our average return per experiment over an infinite number of experiments. RomanSpa (talk) 18:17, 3 August 2014 (UTC)[reply]
It's certainly true that if you take a sequence of samples from a distribution, the average of increasingly many terms will go to the distribution's expected value with high probability. But that's not the only way to define the expected value, and it's not the only way to use it. The expected value of a random variable / distribution is static, it just "is".

Utility

In our case, the Von Neumann–Morgenstern utility theorem guarantees that under some reasonable assumptions, every agent's preferences can be encoded as a utility function over world states, where lotteries should be chosen based on maximizing the utility's expectation. So given a lottery, we calculate the probabilities and utilities of the various resulting world states, calculate the expected value, and base our decision on that. No need to assume repeating the same lottery multiple times. -- Meni Rosenfeld (talk) 19:40, 3 August 2014 (UTC)[reply]
The expected value involves both the distribution and the probability space. In this case, the probability space is a two-element set (win, lose) and the payoff is a random variable on this two-element set. The expected value of this random variable is computed in a straightforward manner (in this case, the integral is with respect to a two-point discrete probability measure with pmf P(win)=0.01 and P(lose)=0.99). There is nothing in the definition of expected value about the need to be able to repeat the experiment infinitely often. Sławomir Biały (talk) 19:44, 3 August 2014 (UTC)[reply]
@Meni Rosenfeld: I don't think we gain much by introducing utility to the discussion, as it rather blurs the main issue. RomanSpa (talk) 05:47, 4 August 2014 (UTC)[reply]
@Sławomir Biały: I agree that the expected value is easy to calculate, and I agree that in the formal definition there is no discussion of the need to repeat the experiment infinitely often. This is why I restricted my comment about an infinite number of experiments to the "naive intuition" phrase. Restricting ourselves to the formal, my point is that the expected value is a property of a system, and not of an event within the system. It's easier to see this if we consider another property: variance. For many systems calculating the variance is only a little more difficult than calculating the expected value. Just as with the expectation, we can use the variance to describe the system. However, when we consider a single experiment it is certainly the case that we can't calculate the variance of that experimental result. The variance is a property of the system, and doesn't have any meaning when applied to a single experimental trial. The same is true of the expected value. RomanSpa (talk) 05:47, 4 August 2014 (UTC)[reply]
Honestly, I have no idea what "the main issue" is. In the OP, the main issue is definitely utility, as he asks about what we should "choose", and asks about being able to justify the choice of action that maximizes expectation.
In the discussion that followed, it seems you agree the expectation is a property of the system, which can be calculated given a description of the probability space and variable. And we (and everyone) obviously agree that the expectation cannot be deduced from a single observation. You either need knowledge of the system, or several observations to base your estimate on. So what is left unresolved? What is this horrible error that "working statisticians" commit? Can you put it in more concrete terms? -- Meni Rosenfeld (talk) 07:29, 4 August 2014 (UTC)[reply]

The dichotomy between frequentist and Bayesian statistics

Actually, I think I understand the problem, and it is essentially the dichotomy between frequentist and Bayesian statistics. Bayesianism is the more correct and useful approach. Let's say a die was rolled, but we don't know the outcome. We'll denote by X the result. Since we don't know what X is equal to, it is a random variable. The mean of X is 3.5, and the variance is 35/12. These are both statements about our own knowledge about X. If when we observe X we get a result which differs from 3.5 by more than a few multiples of sqrt(35/12), we'll be surprised (never mind that we also know the distribution is uniform). Once we observe X, and now know it is equal to, say, 5, we have no reason to talk about the mean and variance of 5. -- Meni Rosenfeld (talk) 07:59, 4 August 2014 (UTC)[reply]
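For completeness, the quoted figures follow from the uniform distribution on {1, ..., 6}:

E[X] = \frac{1+2+3+4+5+6}{6} = \frac{21}{6} = 3.5, \qquad \operatorname{Var}(X) = E[X^2] - E[X]^2 = \frac{91}{6} - (3.5)^2 = \frac{35}{12}.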
I think the "main issue" does not require discussion of utility, because we are attempting to answer this question: "Does the expected value of a probability distribution have a meaningful interpretation when there is just one draw from the distribution?" The question does not include reference to utility, and the questioner specifically remarks that he knows about expected utility and "I'm only interested in the mathematical interpretation, insofar as it makes sense to speak of one independent from someone's utility". Any answer that includes discussion of utility has missed the point of the question.
As for the question itself, my response (see above) is that it is inappropriate to apply information about the "expected value" of the system to a single event. In the example provided, knowing the expected value does not tell us anything useful about a single experiment that is not already known from the description. It is certainly not the case that the experimenter is deciding between $10 (the expected value if he plays the game) and $5 (his fee if he doesn't play), because there is no way he can receive a $10 payout. Either his payout will be $1000 or it will be $0. Talking about the expected value does not describe the truth of his choice, which is better phrased as "you can receive $5 for sure, or you can receive either $0 with 99% probability or $1000 with 1% probability". If he starts thinking about the expected value of his payout, he is misled about the real outcomes of the experiment. RomanSpa (talk) 08:55, 4 August 2014 (UTC)[reply]

@Roman. In your original post, you linked to our article as if the definition given there illustrated your viewpoint. Instead, we find the standard definition of the expected value, with no need to complete an infinite number of trials. Now you have moved the goalposts and said to disregard the definition that appears there, and that instead the expected value "is a property of the system" in some unspecified way. But I think here we know what the system under consideration is: one where there are two possible outcomes (win, lose) and probabilities known a priori. So perhaps I am failing to understand what you mean by "system". What is a system, and what statement about the system does its expected value make? It would also be helpful if you could point out literature illustrating the correct and incorrect use of the notions of expected value. (Aside: I don't really see that your attitude on variance should be any less controversial than your attitude on the expected value, since variance is also an expected value, and also has a meaning that doesn't require more than one trial. So let's leave that out of the discussion for now.) Sławomir Biały (talk) 09:07, 4 August 2014 (UTC)[reply]

Oh dear, this seems to be getting rather more complicated than I had expected. I don't believe I've moved the goalposts (and I certainly hope I haven't!), so let me try again: in my original reply to Meni I mentioned a "naive" way of thinking about expectation which did indeed imagine an infinite number of trials, but I'd thought I'd made it clear that I was trying to avoid such thinking in my subsequent remarks. I'm happy to use the definition here, of course. By "system" I simply meant to refer to a probability space, together with any supporting logic (I was simply trying to avoid having to get distracted by tiresome discussions); if it makes it more helpful, for our purposes, we can simply say that the expected value is a property (I suppose, sensu stricto, a derived property) of the probability space (Ω, Σ, P). So to answer your question, "what statement about the system does its expected value make?", my answer would simply be "it gives us the result of a particular integral... " (which would show my background). I'd then go on to say "... and the useful thing about this integral is that it appears in the Law of Large Numbers". That is, if we can calculate the expected value, we learn something about what would almost always happen if we were to continually repeat the experiment. That is to say, what ties the result of our expected value calculation to what we can observe in the real world is the concept of a continually prolonged set of outcomes: the expected value tells us something about this set. It doesn't tell us something about the elements of this set.
Now one obvious reply to this is that a single experiment provides us with a set of results containing a single element, and this is of course true, but what it fails to note is that it is not just necessary for the set to exist, but it is also necessary for the set to be understood as continually extended by further repetitions of the experiment. When we think about expectation there is always a tacit repetition going on. But this is not what's going on in this case.
I'm uneasily aware that it might seem to some readers that I am skating very close to the "infinite number of trials" I disavowed earlier, but I hope you will see that I am not, but am seeking simply to discuss limits as sets are increased indefinitely.
As for useful references, most of the standard textbooks contain cautions about what may be inferred about small samples. At a slightly deeper level, I first thought about this as a result of reading "Probability and Hume's Inductive Scepticism" by David Stove, which led me directly to Keynes' and particularly Carnap's thoughts on the philosophy of probability. There's a very good paper by Carnap on this, but (inevitably) I can't remember the exact title. RomanSpa (talk) 11:59, 4 August 2014 (UTC)[reply]
"...most of the standard textbooks contain cautions about what may be inferred about small samples..." Who cares about inference? The distribution is already given to us. The expected value is obviously a relevant statistic, but it has apparently been dismissed as meaningless formalism. How would you explain the meaning of this statistic for a single event? A failure to answer this question is a serious flaw in your conceptual framework, and needs addressing. If you cannot assign meaning to the simplest statistic there is, then your conceptual framework is too rigid to accommodate any kind of statistical reasoning regarding the outcome of a single event. Sławomir Biały (talk) 13:07, 4 August 2014 (UTC)[reply]
I think you're missing my point, which is that the expected value is a statistic whose meaning arises from the consideration of continual repetitions of an experiment. I'm not saying that it's a meaningless formalism, but that it has meaning in a particular context, and blandly presenting it out of context is not helpful, and frequently leads to misunderstandings and errors. Which is where we came in. RomanSpa (talk) 13:20, 4 August 2014 (UTC)[reply]
It would be a more defensible position if you were to say that statistical reasoning could not be applied at all to a single event (although one that I would still disagree with). But you seem to want to have it both ways. On the one hand, you are asserting that statistics can meaningfully be applied to form one-off decisions, and yet on the other you are asserting that all statistics are meaningless for one event. If I'm wrong about this, perhaps you could identify a statistic that is meaningful, and we can work from there.
As for the OP, I have already given an answer that incorporates the idea of risk. If the agent is risk-neutral, then the expected value is the relevant statistic. You could argue that this is what the OP intended when he said that he did not want an approach using utility functions. In any case, the notion of utility function does give a way to incorporate risk-aversion into the decision if necessary. Expected value will certainly be a part of any such model, though. And arguing that it is meaningless for a single event is just wrong. Sławomir Biały (talk) 13:36, 4 August 2014 (UTC)[reply]

@Meni. It's worse than that. A strict frequentist might object that the probabilities themselves for a one-draw event are meaningless. (They would also be wrong.) But Roman has asserted that the probabilities are meaningful, yet apparently no kind of statistical reasoning based on those probabilities is. Yet here he offers no alternative other than a restatement of the problem, as if the problem were its own solution. Sławomir Biały (talk) 09:54, 4 August 2014 (UTC)[reply]

@Sławomir Biały: No, that's not what I'm saying. I certainly accept that statistical reasoning can be applied to this problem, but what I'm saying (though I'm beginning to get the sense that I'm not expressing myself sufficiently clearly) is that raw statements about the expected value of the experiment are not helpful in this case, because the meaning of the statistic is not compatible with the conditions under which the experiment is performed. RomanSpa (talk) 12:15, 4 August 2014 (UTC)[reply]
So, what statistic do you propose that we should use instead and how does it solve the problem? (Apparently both the expected value and variance are meaningless here, and I struggle to think what other statistic could be meaningful under these baffling circumstances.) Sławomir Biały (talk) 12:27, 4 August 2014 (UTC)[reply]
@Roman: As I mentioned, the expectation just "is". The expected value is a number characterizing the game played in some way. There are many other numbers characterizing it in other ways. What we do with these numbers is entirely up to us, and depends on what we are trying to accomplish. You can say that some given methodology wouldn't produce good results for a given purpose, but to suggest that we somehow are "not allowed" to use the expected value in our analysis is absurd. It's like saying that when I'm deciding whether to take with me an extra battery for my phone, I'm not allowed to use the battery's energy capacity in my calculation, because the battery's behavior is more complicated than just this number.
If the OP, or the player, just wants to know, for curiosity's sake, what the expectation is, then we can tell him it's clearly $10. But anything more relevant to his skin in the game will necessarily boil down to what he wants, that is, his utility function.
If we assume his utility function is linear, we will recommend that he maximize E[X]. If it's logarithmic, we'll recommend maximizing E[\log(W+X)] (with W his current wealth) or its approximation E[X] - \frac{\operatorname{Var}(X)}{2W}. The possibilities are endless, and we're free to use any summarizing numbers about the system that we wish. -- Meni Rosenfeld (talk) 10:52, 4 August 2014 (UTC)[reply]
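To make the dependence on the utility function concrete, here is a small hypothetical Python sketch (the wealth level W is an assumption, not from the thread) applying both recommendations to the game in question:

from math import log

# The game: $1000 with probability 1%, else $0; or a guaranteed $5.
p, prize, sure = 0.01, 1000.0, 5.0

def expected_utility(u):
    """Return (expected utility of the draw, utility of the sure $5)."""
    return p * u(prize) + (1 - p) * u(0.0), u(sure)

W = 100.0  # assumed current wealth for the logarithmic agent

print(expected_utility(lambda x: x))           # (10.0, 5.0): risk-neutral agent takes the draw
print(expected_utility(lambda x: log(W + x)))  # (~4.63, ~4.65): log agent takes the sure $5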
@Meni: It's quite weird to be having this conversation, and I'm beginning to suspect we've got our wires crossed at some deep level. I entirely agree with you when you say "[T]he expectation just "is". The expected value is a number characterizing the game played in some way. There are many other numbers characterizing it in other ways. What we do with these numbers is entirely up to us, and depends on what we are trying to accomplish." Where I think we're getting confused is when you assert that I'm seeking to say that the expected value is "not allowed"; I'm not saying this. Rather, what I'm trying to say is that on its own an expected value is logically incompatible with this particular experiment.
Think about it this way: imagine a computer that calculates statistics and prints out the answer. When we ask it to consider this game, it prints out the following text: "If this game is played repeatedly, then almost certainly the more often the game is played, the closer the actual realised average payoff will be to $10 per game". This absolutely doesn't tell us anything about what will happen if we only play the game a finite number of times. However, the machine then burps and prints some more: "If you only play the game N times, then the actual realised average payoff will be within $X of the expected value with probability p", and the machine prints off a useful table. So the first bit of text is useless to us, because it says something about a situation that doesn't hold in the real world. But the second paragraph (and accompanying table) give us useful information about what will hold true in the real world, where we are only carrying out the experiment a finite number of times. That is, although Sławomir worries that I don't accept any kind of statistical reasoning here, I do; I just think that mentioning the raw expectation on its own is inappropriate. I'm actually quite surprised that this seems to be a subject of such discussion. RomanSpa (talk) 12:39, 4 August 2014 (UTC)[reply]
Ok, now we have something to work with.
What you call "incompatible", I call "incomplete". Yes, if all we know about the game is the expected payoff, we don't know a whole lot. But we do know more than if we didn't know even that. And again, what we choose to do with this knowledge is up to us.
So we agree that "If you only play the game N times, then the actual realised average payoff will be within $X of the expected value with probability p" is useful information. Surely you'll agree that the similarly structured "with probability at least 90%, the payoff will be at most $100" is useful. But this statement follows from the knowledge that the mean is $10 (and the assumption that the payoff is nonnegative). So merely knowing the expectation gives us this information.
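The inequality being invoked here (named explicitly in the next reply) is Markov's: for a nonnegative random variable,

P(X \ge a) \le \frac{E[X]}{a}, \qquad\text{so}\qquad P(X \ge \$100) \le \frac{\$10}{\$100} = 10\%.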

Variance

If in addition to just the mean we also have the variance, we can get better bounds with Chebyshev's inequality. These are tiny examples - both Markov's inequality and Chebyshev's inequality are extremely weak. But they demonstrate that knowing the expectation tells you what you should expect from the single experiment. The fact that there is a theorem saying the expectation is equal to the average over infinitely many trials, doesn't mean it has no information content on a single trial. If I know the mean is $10 and the sd is $1, I'll know to expect that the result will be around $10, even if I don't know the complete distribution.
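For the example just given, Chebyshev's inequality reads

P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}, \qquad\text{e.g.}\qquad P(|X - \$10| \ge \$5) \le \left(\tfrac{1}{5}\right)^2 = 4\%

when the mean is $10 and the standard deviation is $1.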
In fact, I don't know why you would think that statements of the form "If you do X, the probability that Y is p" are good, while "If you do X, the expected payoff is Z" are not. A probability is nothing more than the expected value of a Bernoulli trial, and it is subject to uncertainty which diminishes as you repeat more times. If you're not comfortable discussing the expectation of a single experiment, you shouldn't be comfortable discussing the probability of a single experiment (or of a finite number of experiments).
And again I must go back to the point that what we choose to do with the information we have (even if all we know is just the expectation) is up to us. If I was presented with the game, I wouldn't need to know the exact payoff distribution to decide. All I'll need to know is the mean and variance. -- Meni Rosenfeld (talk) 13:43, 4 August 2014 (UTC)[reply]
@Meni: I think we're getting close to agreement here, but I need to clarify one point: when you use the expected value to make inferences about the possible outcomes of a finite number of experiments, you are certainly reaching useful conclusions, but these inferences talk about something different (the finite world) from what expected value talks about (a world where experiments are continually repeated). RomanSpa (talk) 15:02, 4 August 2014 (UTC)[reply]
I'm not going to agree that "expected value talks about a world where experiments are continually repeated". Expected value is a property of a distribution, defined e.g. as an integral, which has many features. One of these happens to be related to repeating an experiment continually, but that's not all that expectation is about. -- Meni Rosenfeld (talk) 15:07, 4 August 2014 (UTC)[reply]
(ec w/ Yohan) I am less charitable. Roman still seems to be conflating inductive inference with decision theory. The idea of a computer spitting out statistics after multiple trials is a red herring. We already know the distribution. And no one here is saying that the expectation value gives you the distribution. But Roman seems to be asserting not just that the expectation value gives us incomplete information, but that it is actually meaningless. Sławomir Biały (talk) 14:12, 4 August 2014 (UTC)[reply]
I hope I haven't said that the expected value is "actually meaningless", because that's not so. My point is that the expected value tells us something about what happens in a particular case (when we continually repeat the same experiment over and over). That is, it is meaningful to talk about the expected value in that case. Further, we can use this information to say things about other particular cases, including those cases where we only perform the experiment a finite number of times. But, it isn't (to use a suitable word!) "meaningful" to talk about the raw expected value with reference to a single experiment, because "expected value" is about something else. RomanSpa (talk) 14:47, 4 August 2014 (UTC)[reply]
I'm still baffled by your insistence that a single event can be subjected to statistical reasoning, and yet there are apparently no meaningful statistics that apply to a single event with a known probability distribution. This is surely the most important test-case for your philosophical position. Otherwise there is no point in doing any statistical analysis to begin with if we are unable to form opinions about the outcome of a single event with a known probability distribution. Sławomir Biały (talk) 15:58, 4 August 2014 (UTC)[reply]
One time is a finite number of times. YohanN7 (talk) 14:53, 4 August 2014 (UTC)[reply]
And we've gone full circle to the point where I say: No. The limiting case of continuous repetitions is just one feature of expectation. It's not the only place where it comes into play. -- Meni Rosenfeld (talk) 15:00, 4 August 2014 (UTC)[reply]
If you argue that knowing one thing (call it one draw) is exactly useless, then it follows by simple induction that knowing N things (call it N draws) is exactly useless. This is obviously wrong. Then, also, the OP doesn't ask us to analyze an unknown probability distribution. YohanN7 (talk) 13:53, 4 August 2014 (UTC)[reply]
@YohanN7: This isn't what I said. (Also, you haven't supplied the inductive step in your induction! :-) ) RomanSpa (talk) 14:49, 4 August 2014 (UTC)[reply]
I've caught up now with some previous inline discussion. I think the crux is this statement: "... and the useful thing about this integral is that it appears in the Law of Large Numbers". No, that's a useful thing about this integral. It's far from being the only useful thing. I've demonstrated above several ways to use it. -- Meni Rosenfeld (talk) 14:57, 4 August 2014 (UTC)[reply]
@Meni: That's a very fair point: "... a useful thing..." it is! RomanSpa (talk) 15:03, 4 August 2014 (UTC)[reply]
A tangent topic. Every good poker player knows that maximizing expectation is the most important factor in decision making. It doesn't matter whether this is the first, last or only hand you play. The topic of poker is inherently much more complicated than what has been discussed here. Little is known (Nash equilibria, for instance, exist, but are impossible to compute except in really simple cases), but you are often dealing with a "probability distribution" (quotation marks needed unless we are playing against certain kinds of robots) that is unknown to you. The one thing that you can often estimate is the expectation value. Higher moments (is that the term?), like the variance of certain plays, can sometimes be estimated as well. It is rare that a good poker player goes against what the expectation value dictates. He might make a marginally bad play to gain in future hands, or he may make a bad decision because he can't afford to go broke this hand. So, even in cases where the exact expectation value is unknown, the player's guess of it will be the most influential factor. YohanN7 (talk) 15:34, 4 August 2014 (UTC)[reply]
I think I'll have one last go at this before getting on with my life  :-) .
Think of it from the point of view of a computer programmer: inside a computer program there are variables, and in many programming languages these variables have a "type", such as "integer", "floating point", "Boolean", "character", and so on. When we write a function to perform some calculation, the inputs to this function may be of one or more types, and the output of the function will also be of a particular type. (I'm here ignoring overloading.)
We can think of "expected value" as having the type "statement about what happens when you continually repeat an experiment", while inferences about finite cases have the type "statement about finite cases". A statement about a single trial must have type "statement about finite cases" to have meaning. However, just as we can write a function whose output is of one type from inputs of other types, so we can infer a statement about a single trial - a "statement about finite cases" - using inputs of other types, including those of type "statement about what happens when you continually repeat an experiment". I'm merely remarking that there is a type incompatibility: expected value is a "statement about what happens when you continually repeat an experiment", while to say something meaningful about a single trial you must say something of type "statement about finite cases".
I'm quite surprised that this isn't clear, but it's certainly been an instructive discussion. Thank you. RomanSpa (talk) 15:26, 4 August 2014 (UTC)[reply]
The definition of "expected value" doesn't involve "experiments" at all. There is an interpretation of it, "statement about what happens when you continually repeat an experiment". The latter is probably too shaky for most mathematicians' tastes to serve as a definition. If you stick to the original definition, there are no "statements of incompatible types" to worry about. The expectation value (as opposed to the quality of a measurement of the expectation value) is independent of the "number of draws". YohanN7 (talk) 16:40, 4 August 2014 (UTC)[reply]
So, assuming this analogy is apt, I should be unable to compute the expected value of a single random variable with known probability distribution. But clearly this is wrong, and I think you already conceded that point. In fact, it is trivial to program a computer to do just that. Sławomir Biały (talk) 16:51, 4 August 2014 (UTC)[reply]
As we have repeated continually, we disagree with the premise that "expected value" has the type "statement about what happens when you continually repeat an experiment", so this analogy isn't really going to help. -- Meni Rosenfeld (talk) 20:14, 4 August 2014 (UTC)[reply]

Language


Perhaps it is a problem of language rather than of math. The term 'expected value' is confusing, because nobody expects that value. The term 'mean value' applies to the totality of outcomes, but not to a specific outcome. The OP has 100 envelopes; 99 are empty and 1 contains $1000. The mean value is $10 per envelope, and the standard deviation is $99.50. So the value is $10.00±99.50. If one envelope is picked randomly, the value of that envelope is either $1000 or $0; we don't know which. We can say that the value is $10.00±99.50, meaning that the order of magnitude is $10.00 and the statistical uncertainty is $99.50. Perhaps such words are more acceptable. Bo Jacoby (talk) 17:46, 4 August 2014 (UTC).[reply]
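The quoted standard deviation follows directly:

\sigma = \sqrt{0.01\,(1000-10)^2 + 0.99\,(0-10)^2} = \sqrt{9801 + 99} = \sqrt{9900} \approx \$99.50.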

I expect that value. That is, I expect the experimental result to be close to the expected value, where "close" is quantified by the variance. -- Meni Rosenfeld (talk) 20:14, 4 August 2014 (UTC)[reply]

Can we agree to answer the OP with "Yes, there is a mathematical reason to maximize expected value", or is there still disagreement? Recall that the OP isn't interested in "expected utility". (In my view, it would be rather strange if this otherwise excellent reference desk collectively would come up with something different after several days of discussion.) YohanN7 (talk) 21:04, 4 August 2014 (UTC) I introduced subsection headers above in order to structure the discussion. Feel free to improve it. Bo Jacoby (talk) 09:59, 7 August 2014 (UTC).[reply]