Wikipedia:Reference desk/Archives/Mathematics/2011 March 14

Mathematics desk


March 14

complex analysis

(stupid question removed)...I take it this is correct wikiquette? Robinh (talk) 09:24, 14 March 2011 (UTC)

I think striking out your question and adding a retraction is better. -- Meni Rosenfeld (talk) 10:08, 14 March 2011 (UTC)
OK, I'll do that next time I ask a stupid question. best, Robinh (talk) 18:39, 14 March 2011 (UTC)

Quick identification of outliers

I know that this can be done using the mean and standard deviation, but I would like to know if there is any quick method of identifying outliers that I can study. For example, assume you have a person's yearly weight as 178, 191, 184, 81, 180. It is clear that the 81 is an outlier (it was measured in kg, not lb). What is the computationally fastest method for identifying 81 as an outlier? -- kainaw 14:53, 14 March 2011 (UTC)

Quickly removing outliers with an algorithm, without thinking about them, is what led to the ozone hole being missed. What you said above is probably easiest: just use the average and some number of standard deviations that you choose. Here are some real systolic/diastolic readings for a few days; you can see that just chopping out figures that seem out of line can be quite wrong here: 155/89 94/58 83/53 130/84 Dmcq (talk) 16:16, 14 March 2011 (UTC)
I'm not looking to remove outliers. I'm looking to identify them for inspection. I have 10 years of weight/height information on over two million patients and I want to quickly mark which patients have data with outliers so the outliers can be verified as correct or as incorrect (and then corrected). I expect about 10% of the data to be outliers, so I don't want to go through all of it by hand. I am using standard deviation now, but it will take another hour or so to finish processing. That is why I'd like to study a different (faster) technique. -- kainaw 16:44, 14 March 2011 (UTC)
10% outliers! That's huge. The bit of statistics I do I normally just include all the data including outliers unless I'm very very certain about them. Are you counting data not supplied as outliers? I can't think of a better way of doing it than what you say. Dmcq (talk) 16:50, 14 March 2011 (UTC)
(edit conflict) You could look at the year-on-year delta and throw out any measurement that had a year-on-year change of more than, say, 50 pounds. But that 10% estimate is worrying. If I had a data set in which 10% of the observations were so far out of line that they were obviously incorrect, I wouldn't have much confidence in the quality of the remaining 90% either. Gandalf61 (talk) 16:55, 14 March 2011 (UTC)
10% is conservative. Most people in this position report around 50% error. I am assuming that I'm working with better data. The issue is units. Nurses take a weight measurement in pounds, but record it as kilograms. Height gets messed up as well because there are so many ways to write something like 5 feet 8 inches: 5'8", 5/8, 68, 1.7, etc... If you tell a computer that you just measured 5.8 meters, it will record the value and unit. What I'm doing is examining outliers to see if it is clearly an issue with units. -- kainaw 17:03, 14 March 2011 (UTC)
Instead of using the average and standard deviation, you might want to use the more robust median and interquartile range. In particular, if you have a large fraction of outliers (I doubt that with 50% it makes much sense to speak of outliers in a purely statistical sense), the estimate of standard deviation will be heavily influenced by the outliers, leading to an overestimation of standard deviation and a reduction of the efficiency of outlier detection. However, if I understand your problem correctly, you could maybe simply define a range of reasonable values (you wouldn't expect anybody to be 5.8 meters tall) and chop off everything outside this range. --Wrongfilter (talk) 19:42, 14 March 2011 (UTC)
That approach won't work in all cases. For example, some people weigh 100 pounds and others weigh 100 kilograms. However, if the same person changes from 100 lb to 100 kg between visits, that would be a red flag (unless they REALLY pigged out over the intervening holidays :-) ). StuRat (talk) 22:36, 14 March 2011 (UTC)
Correct. Height isn't much of an issue. Weight is. It is perfectly acceptable to have someone be 70lbs or 70kg. Similarly, 200lbs and 200kg is acceptable. The overlap between pounds and kilograms is huge. Similarly, the overlap between serum creatinine and urine creatinine is huge. The two labs are often mislabeled. So, if a person has a history of serum creatinine and suddenly the value jumps up beyond a standard deviation, I can flag it as a possible urine creatinine. There are many vitals/labs in which sloppy data entry is a major problem and I'm simply trying to flag possible errors as quickly as possible with a long-term goal of highlighting them during data entry to try to influence the humans to take a tiny bit of care in entering what should be accurate information. -- kainaw 05:23, 15 March 2011 (UTC)
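The median and interquartile range suggestion above can be tried on one patient's history at a time. The following is only a minimal sketch: the function name `flag_outliers`, the fence multiplier `k=1.5` (the usual Tukey convention) and the choice of the "inclusive" quartile method are assumptions made for illustration, not anything specified in the thread.

```python
from statistics import quantiles

def flag_outliers(values, k=1.5):
    """Return the values lying outside the fences [Q1 - k*IQR, Q3 + k*IQR].

    Hypothetical helper: k=1.5 and the 'inclusive' quartile method are
    illustrative assumptions. Needs at least two readings.
    """
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# The yearly weights from the original question.
weights = [178, 191, 184, 81, 180]
print(flag_outliers(weights))  # [81]
```

For these five readings the quartiles are 178, 180 and 184, so the fences are 169 and 193 and only the 81 is flagged. With very short histories the quartiles are coarse, so the flags are best treated as candidates for inspection rather than as verdicts.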
You should take a logarithm of your data before doing anything else. Height and weight cannot have negative values, and so they cannot have a normal distribution. The logarithm of height or weight, however, may have a normal distribution. Taking the natural logarithm of 178 191 184 81 180 you get 5.18 5.25 5.21 4.39 5.19. Note that the log of the conversion factor between kilogram and pound is 0.79, and 4.39 + 0.79 = 5.18. So you are not looking for outliers in general, but rather for outliers that cease to be outliers when increased by 0.79. Bo Jacoby (talk) 13:05, 16 March 2011 (UTC)
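The logarithm idea can be sketched as a check for readings whose log sits about 0.79 away from the rest of the patient's history. This is only an illustration under stated assumptions: the name `likely_unit_errors`, the tolerance `tol=0.1`, and the use of the per-patient median as the reference point are all hypothetical choices, and the sketch assumes most of a patient's readings are in the same unit.

```python
import math
from statistics import median

LB_PER_KG = 2.20462            # pounds per kilogram
SHIFT = math.log(LB_PER_KG)    # about 0.79, the log of the conversion factor

def likely_unit_errors(values, tol=0.1):
    """Return (value, guess) pairs whose log is about SHIFT away from the median log.

    Hypothetical helper: tol and the median reference are illustrative assumptions.
    """
    center = median(math.log(v) for v in values)
    flagged = []
    for v in values:
        diff = math.log(v) - center
        if abs(diff + SHIFT) <= tol:
            # Value is too small by roughly a factor of 2.2.
            flagged.append((v, "kg recorded in a lb series"))
        elif abs(diff - SHIFT) <= tol:
            # Value is too large by roughly a factor of 2.2.
            flagged.append((v, "lb recorded in a kg series"))
    return flagged

print(likely_unit_errors([178, 191, 184, 81, 180]))
# [(81, 'kg recorded in a lb series')]
```

Unlike a plain outlier test, this only flags values that would fall back in line after a unit conversion, which matches the goal of spotting unit mix-ups rather than every unusual reading.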

Power Series

It's been a couple years since I had complex analysis so I want to make sure I understand a certain aspect. So, my question is, have I put all this together correctly?

I'm looking at a certain function and I want to know a bit about the coefficients of its power series. I can prove that the function itself is bounded on any closed disc around any point in the complex plane. This means there are no poles or singularities for the function on the entire complex plane, i.e., it is holomorphic on the entire complex plane. Therefore, the function has a unique power series which converges to the function for all complex numbers, i.e., the radius of convergence is infinity. Therefore, we know $\limsup_{n\to\infty} |a_n|^{1/n} = 0$, where $a_n$ are the coefficients in the power series. Since the terms are all nonnegative, this actually means the limit itself exists and is 0.

Is that all correct? Thanks StatisticsMan (talk) 17:07, 14 March 2011 (UTC)

Hmm, okay and the function is continuous. I thought of a counterexample to the above where I start with the same function but change its value at a few points and then it is no longer holomorphic as it is not even continuous. StatisticsMan (talk) 18:31, 14 March 2011 (UTC)
I'm not sure. Being continuous and bounded isn't enough to prove holomorphicity, is it? Holomorphicity is a differentiability condition. For example, take ƒ(z) = Re(z). This function is continuous on all of C and bounded on every closed disc, but it's not holomorphic. Fly by Night (talk) 19:13, 14 March 2011 (UTC)
You are correct. Okay, so I'm forgetting obvious things. I know the function is a composition of two entire functions and therefore is entire. How about everything after that? Thanks for the reply! StatisticsMan (talk) 19:17, 14 March 2011 (UTC)
It seems fine to me. I'm not sure I like the use of the phrase unique power series; but that's just a matter of taste I guess. You have a well defined and convergent power series expansion about each point of the complex plane. Take a look at function germ if you feel like more reading. These are equivalence classes of functions. A function and its power series look very different, but provided the function is holomorphic at the point of interest, they are essentially the same in a local way. That's what germs try to make concrete. Fly by Night (talk) 19:35, 14 March 2011 (UTC)
Well, the coefficients of a power series centered at 0 for an entire function are unique. That's all I care about. StatisticsMan (talk) 19:42, 14 March 2011 (UTC)
Oh, brilliant. Well you're fine then. Fly by Night (talk) 19:48, 14 March 2011 (UTC)
Okay, thanks for the help. StatisticsMan (talk) 19:51, 14 March 2011 (UTC)
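The step from "radius of convergence is infinity" to a statement about the coefficients can be written out with the Cauchy–Hadamard formula. The following is a short sketch, assuming the series is centered at 0 with coefficients $a_n$ and radius of convergence $R$ (symbols chosen here for illustration):

```latex
\[
\frac{1}{R} = \limsup_{n\to\infty} |a_n|^{1/n},
\qquad
R = \infty \;\Longrightarrow\; \limsup_{n\to\infty} |a_n|^{1/n} = 0 .
\]
\[
0 \;\le\; \liminf_{n\to\infty} |a_n|^{1/n}
\;\le\; \limsup_{n\to\infty} |a_n|^{1/n} = 0
\;\Longrightarrow\;
\lim_{n\to\infty} |a_n|^{1/n} = 0 .
\]
```

As a sanity check, the exponential function has $a_n = 1/n!$, and $(1/n!)^{1/n} \to 0$, consistent with it being entire.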

Problem with completing the square

I'm helping a friend with putting equations into vertex form, which requires completing the square.

y= - 2x^2+6x+1

This one in particular is giving me problems. The -2 in front of the x^2 is throwing me off completely. If I divide through by 2, then I will eventually get 3x, which doesn't mesh easily into a form that can be factored into a square. I need some help. ScienceApe (talk) 23:16, 14 March 2011 (UTC)

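To start, you should divide both sides of y = 1 + 6x – 2x^2 by –2 to give
$$-\frac{y}{2} = x^2 - 3x - \frac{1}{2}.$$
Completing the square on the right hand side gives:
$$-\frac{y}{2} = \left(x - \frac{3}{2}\right)^2 - \frac{11}{4}.$$
Finally, multiply back through by –2 to give
$$y = -2\left(x - \frac{3}{2}\right)^2 + \frac{11}{2}.$$
Is that what you were looking for, or are you after something else? Fly by Night (talk) 00:05, 15 March 2011 (UTC)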
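The result can be checked against the general identity ax^2 + bx + c = a(x - h)^2 + k with h = -b/(2a) and k = c - b^2/(4a). Here is a minimal sketch using exact fractions; the function name `vertex_form` is a hypothetical helper introduced only for this illustration.

```python
from fractions import Fraction

def vertex_form(a, b, c):
    """Return (a, h, k) such that a*x^2 + b*x + c == a*(x - h)^2 + k."""
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    if a == 0:
        raise ValueError("not a quadratic: a must be nonzero")
    h = -b / (2 * a)          # x-coordinate of the vertex
    k = c - b * b / (4 * a)   # y-coordinate of the vertex
    return a, h, k

# The quadratic from the question: y = -2x^2 + 6x + 1
a, h, k = vertex_form(-2, 6, 1)
print(a, h, k)  # -2 3/2 11/2
```

So the vertex form is y = -2(x - 3/2)^2 + 11/2, with the vertex at (3/2, 11/2). The odd coefficient 3 is not an obstacle: completing the square simply uses half of it, 3/2.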


I would do it slightly differently:
 
Completing the square yields:
 
Finally, multiply back through by –2 to get:
-- Preceding unsigned comment added by StuRat (talk · contribs) 01:16, 15 March 2011 (UTC)