Wikipedia:Reference desk/Archives/Mathematics/2011 August 3

Mathematics desk
Welcome to the Wikipedia Mathematics Reference Desk Archives
The page you are currently viewing is an archive page. While you can leave answers for any questions shown below, please ask new questions on one of the current reference desk pages.


August 3

Statistical significance concept

Hi, if one can choose from a large number of experiments that look for patterns in random happenings, it is probably not very remarkable to find a result that looks unlikely to be the result of chance (in other words, looks statistically significant). Is there a name for this concept? I was reminded of this while glancing at the article Mars effect. There must be many celestial events that one could try to correlate against sporting performance, so probably sooner or later someone is going to find something that looks statistically significant. A similar scenario might involve looking for "unusual" patterns in a list of random numbers. Given that there are a large number of "unusual" patterns, it becomes that much more likely to find at least one of them, even if, once that one has been identified, the probability of it occurring can be shown to be low. So, what is this effect called? 86.161.61.206 (talk) 01:18, 3 August 2011 (UTC)

Statisticians call it the problem of multiple comparisons -- the accepted solution is to do a Bonferroni correction. Our article on testing hypotheses suggested by the data also discusses ways of dealing with the problem. Looie496 (talk) 02:34, 3 August 2011 (UTC)
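A minimal simulation (in Python; the significance level and test counts are invented for illustration) of why uncorrected multiple comparisons produce spurious "significant" results even when every null hypothesis is true, and how a Bonferroni correction restores the intended error rate:

    import random

    # Under a true null hypothesis, p-values are uniform on [0, 1).
    # Simulate M_TESTS independent tests per study and count how often
    # at least one looks significant, with and without Bonferroni.
    ALPHA = 0.05        # nominal per-test significance level (assumed)
    M_TESTS = 20        # comparisons per study (assumed)
    N_STUDIES = 100_000

    hits_raw = hits_bonf = 0
    for _ in range(N_STUDIES):
        p_values = [random.random() for _ in range(M_TESTS)]
        if min(p_values) < ALPHA:            # uncorrected threshold
            hits_raw += 1
        if min(p_values) < ALPHA / M_TESTS:  # Bonferroni threshold
            hits_bonf += 1

    print(f"uncorrected: {hits_raw / N_STUDIES:.3f}")   # about 1 - 0.95**20, i.e. 0.64
    print(f"Bonferroni:  {hits_bonf / N_STUDIES:.3f}")  # about 0.05, as intended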
Thank you! 86.181.200.241 (talk) 11:17, 3 August 2011 (UTC)
Also look at Type I and Type II errors. 86.155.185.195 (talk) 18:33, 3 August 2011 (UTC)
One occasionally sees the disparaging term "a posteriori probability" for that. Say, you accidentally run into an old school mate in a bar halfway around the world from where you grew up. You ask yourself: "Wow, how likely was that?". You find a very small probability, of course. The thing is that you would have reacted in the same way if you had been hit by a meteorite or if any other unlikely event had happened to you, in which case it would never have occurred to you at all that you could have met that school mate. Because there are so many possible events that are unlikely, taken by themselves, something unlikely happens to you all the time. Now, if today you'd calculate the probability of running into that school mate tomorrow, and tomorrow you actually did, then that would give you some legitimate reason to wonder. Finding an unusual pattern in random data may lead to the formulation of a hypothesis that can be tested in a new experiment, but one should be very skeptical if the data that lead to a hypothesis are also used to support it. --Wrongfilter (talk) 10:04, 4 August 2011 (UTC)
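To put rough numbers on the "something unlikely happens all the time" point (the figures here are invented for illustration): if you witness 10,000 independent events in a month, each with only a one-in-a-thousand chance of being "remarkable", then the chance that none of them is remarkable is

    0.999^10000 ≈ e^(−10) ≈ 4.5 × 10^(−5),

so the probability that at least one remarkable thing happens to you that month is about 0.99995.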
Pattern recognition is useful, for it helps with constructing rules (such as maths and laws), following those rules, solving problems, and recognizing novel and familiar phenomena such as fraud, security, danger, food and the tools we need to survive. All this problem-solving ability that we are naturally endowed with leads us to useful knowledge, apocryphal knowledge and apophenia (the experience of seeing "meaningful" patterns in random data). Knowing that some patterns are simply a result of randomness, as the questioner points out, is helpful; for instance, see the article infinite monkey theorem. Patterns are ubiquitous, thus I find the coincidences of small world examples to be both comforting and disconcerting; fortunately, the same pattern-recognition skills that sometimes lead us astray can also be used to ferret out mistakes. --Modocc (talk) 15:45, 4 August 2011 (UTC)

BZ2 question about entropy

I have two data sets. I compress them with BZ2, which uses removal of redundant sequences for compression, and both compress to 50% of the original size. Now, just out of curiosity, I make two new files by simply copying each original file twice. So, if the old file was "abcde", the new file is "abcdeabcde". I compress them the same way. One compresses to 50% of the original size just like before. The other compresses to 75% of the original size. So, I triple the files ("abcdeabcdeabcde") and compress them again. The 50% one remains at 50%. The one that was at 75% now compresses to 87.5% of the original size. I quadruple the files and one is still at 50%. The other is now at 93.75% of the original size. So, no matter how many times I repeat the original file, one remains at 50% compression. The other steadily approaches 100%. Why? Without looking at the original data, does this trait imply something about how much entropy there is in the data? -- kainaw 14:21, 3 August 2011 (UTC)

I think it actually implies something about the BZ2 algorithm. You might be more likely to find somebody with in-depth knowledge of it on the Computing desk. Looie496 (talk) 16:39, 3 August 2011 (UTC)
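One plausible contributing factor (a sketch, not a diagnosis of the original files): bzip2 compresses its input in independent blocks of at most 900 kB at the default setting, so a repeat that begins in a later block cannot be matched against the first copy. A quick experiment with Python's bz2 module, using random data of invented sizes, shows the block boundary at work:

    import bz2
    import os

    # bzip2 at the default compression level works on independent blocks
    # of 900,000 bytes, so redundancy is only exploited *within* a block.
    # The file sizes below are invented for the illustration.
    BLOCK = 900_000

    small = os.urandom(100_000)   # several copies still fit in one block
    large = os.urandom(BLOCK)     # each copy exactly fills its own block

    for label, data in (("small", small), ("large", large)):
        for copies in (1, 2, 3, 4):
            blob = data * copies
            ratio = len(bz2.compress(blob)) / len(blob)
            print(f"{label} file, {copies} copies: {ratio:.0%} of total size")
    # Expected: the small file's ratio keeps falling as copies are added
    # (the repeats are found inside a single block), while the large file
    # stays near 100% no matter how many copies are appended.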

Epsilon

Why do they use epsilon to represent a small positive number? --134.10.113.198 (talk) 16:20, 3 August 2011 (UTC)

I think it's arbitrary. A more natural choice might be ι (iota), but that would be hard to recognize. Looie496 (talk) 16:35, 3 August 2011 (UTC)
There's no way to answer your question with certainty. It's just the way it is. Why do we call an apple an apple? Why do we use x, y and z for the coordinates in 3-space? Why do we use letters like m and n for integers? One reason I would conjecture is that a school of mathematicians working together in the same area shared common notation and their work made a major impact. People tend to copy the people they learned from. Fly by Night (talk) 23:04, 3 August 2011 (UTC)
It's historical: δ was already in use for a long time to refer to an infinitesimal change in the independent variable, so the next letter of the Greek alphabet, ε, was used in the rigorous epsilon-delta definition of a limit. An apocryphal, but more satisfying, explanation is that ε stands for the absolute error of the output of a function subject to a small perturbation (δ) in its input (due, for instance, to inaccuracy of the outcome in an experiment). Sławomir Biały (talk) 00:16, 4 August 2011 (UTC)
wikt:apocryphal Sławomir Biały (talk) 01:56, 4 August 2011 (UTC)
Yes, once a notational precedent gets started, it's difficult (and often low-value) to change it. Related issues: this guy [1] wants us to stop using pi to denote the circle constant. Also, Paul Erdős was so influenced by the usage of epsilon for small things that he reportedly referred to all children as epsilons :) SemanticMantis (talk) 18:17, 4 August 2011 (UTC)

At limit of a function it says "Weierstrass first introduced the epsilon-delta definition of limit in the form it is usually written today" and has a bit of additional stuff about the history of epsilon-delta methods. Michael Hardy (talk) 00:26, 6 August 2011 (UTC)
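For reference, the definition under discussion, in the form it is usually written today: lim_{x→a} f(x) = L means that

    for every ε > 0 there exists a δ > 0 such that, for all x, 0 < |x − a| < δ implies |f(x) − L| < ε.

Here δ bounds the allowed perturbation of the input and ε the resulting error in the output, which matches the mnemonic mentioned above.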

a question of modal logic, validity and rules of inference

This question refers to a book, "Saul Kripke", edited by Alan Berger (Cambridge University Press, 1st edition, June 13, 2011, ISBN 0521674980), specifically to statements in Chapter 5, "Kripke Models", by John Burgess. As not everyone will have access to this volume, I will attempt to summarize the exposition leading up to the point of my question; I apologize for the length of this section, but would appreciate any insight.

As a prelude to discussing Kripke's model theory for modal logic, Burgess first discusses sentential (propositional) logic. He defines validity for a formula as truth in all models, and satisfiability as truth in some model. A proof procedure is a set of axioms or schemata describing types of formulae that are valid, together with rules of inference such as modus ponens (MP).

For sentential logic a model is just a valuation V that assigns T or F to the atoms. A model theory is obtained by extending the valuation for atoms to a valuation for more complex formulas via the clauses for the connectives: [table p120]

  • A is true in M iff V(A) is T
  • ~A is true in M iff A is not true in M
  • A&B is true in M iff A is true in M and B is true in M
  • [etc.]

Later, the author states that we cannot get a model theory for modal logic merely by extending these definitions of truth with clauses for modal operators as: [table p123]

  • Box(A) is true in M iff necessarily A is true in M
  • Dia(A) is true in M iff possibly A is true in M

since, he observes, all mathematical truths are necessary, so if A were true in M, Box(A) would also be true in M for any M, which "would make A -> Box(A) valid, which it ought not to be. (p123)". This is my first point of confusion; perhaps this last observation should be obvious, but I'm afraid it isn't to me.

But even accepting that, it was stated earlier that modal logics include in their proof procedures the rule of necessitation, which permits one to infer Box(A) from A; this would seem to contradict the above statement that A -> Box(A) should not be counted as valid.

Unless the problem is that I am conflating |- (turnstile) with -> (implication), that is:

A |- Box(A) (Rule of Necessitation)

is not the same thing as:

|- A->Box(A) [The (apparently erroneous) validity of A->Box(A)]

But isn't that just the Deduction Theorem? Is this theorem not valid for modal logics?

Thanks, BrideOfKripkenstein (talk) 18:22, 3 August 2011 (UTC)

So, I believe you're a bit confused about the Rule of Necessitation (or I am). The rule is: from a theorem A (in particular, any axiom) one can infer Box(A). So the rule requires a theorem, not an arbitrary true formula. In the Kripke semantics this makes sense; any theorem (by soundness of the system) will be true outright at every world of every model, so semantically it is valid to infer that it is necessary. For example, we have the law of the excluded middle as a classical axiom, A ∨ ¬A; RN in the modal system then gives ⊢ □(A ∨ ¬A). But, for instance, just the propositional variable P should not imply Box(P). Semantically this makes sense, since the Kripke countermodel is two worlds, one where P is true and one where it is not, with the complete accessibility relation: at the first world P holds but Box(P) fails. So, syntactically, one does not want to infer Box(P) from P, and RN does not give us that power, as P itself is not a theorem. This is also why the deduction theorem does not carry over unrestrictedly to modal logic: it fails for derivations that apply RN to a premise, so having A |- Box(A) as a rule does not yield |- A -> Box(A). Hopefully that is helpful (and correct). Wgunther (talk) 00:44, 4 August 2011 (UTC)
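To make the semantic side concrete, here is a minimal sketch in Python (the encoding of formulas as nested tuples and all names are ad hoc, invented for this illustration) that evaluates modal formulas in a Kripke model and checks the two-world countermodel described above:

    # A tiny Kripke-model evaluator for propositional modal logic,
    # just enough to check the countermodel to P -> Box(P).
    model = {
        "worlds": {"w1", "w2"},
        "access": {"w1": {"w1", "w2"}, "w2": {"w1", "w2"}},  # complete relation
        "val": {"w1": {"P"}, "w2": set()},                   # P true only at w1
    }

    def holds(model, world, f):
        """Evaluate formula f at a world of a Kripke model."""
        op = f[0]
        if op == "atom":
            return f[1] in model["val"][world]
        if op == "not":
            return not holds(model, world, f[1])
        if op == "and":
            return holds(model, world, f[1]) and holds(model, world, f[2])
        if op == "implies":
            return (not holds(model, world, f[1])) or holds(model, world, f[2])
        if op == "box":   # Box(A): A holds at every accessible world
            return all(holds(model, w, f[1]) for w in model["access"][world])
        if op == "dia":   # Dia(A): A holds at some accessible world
            return any(holds(model, w, f[1]) for w in model["access"][world])
        raise ValueError(f"unknown operator: {op}")

    P = ("atom", "P")
    print(holds(model, "w1", P))                           # True:  P holds at w1
    print(holds(model, "w1", ("box", P)))                  # False: P fails at accessible w2
    print(holds(model, "w1", ("implies", P, ("box", P))))  # False, so P -> Box(P) is not valid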