Talk:Gaussian process



Inadequate material for non-technical readers


The introduction is pretty much impenetrable for a lay reader. The attempt to define simply what a stochastic process is, for example, says "a collection of random variables indexed by time or space". But what if the reader is not familiar with random variables? Or does not instantly grasp what indexing by time or space means? A real-world example would go a long way here, especially one where the random variables and space/time-indexing can be concretely and intuitively linked to something in everyday experience.

This theme continues throughout the whole article, as the reader is assumed to have a strong mathematical or statistical background. There is a distinct lack of non-specialist, non-abstract examples. The introductory text in the "Applications" section fails to identify a single concrete example of a problem that Gaussian Processes might be applied to. "Given any set of N points in the desired domain of your functions..." OK, but what might those points and functions represent in the real world? "Gaussian processes are thus useful as a powerful non-linear multivariate interpolation tool." OK, but what kind of real-world problem might require non-linear multivariate interpolation?

Etc. — Preceding unsigned comment added by 2A02:6B6E:B8CD:0:7D5A:C359:DADB:4C38 (talk) 21:49, 30 December 2021 (UTC)

Article talks separately about kernel functions and covariance functions -- it would be good to explain their relationship


Gaussian Process vs. integral of Gaussian Process


Is the integral of a Gaussian process somehow also a Gaussian process? Or is this just a common abuse of terminology? I think it's the latter, and made some changes to reflect that... — Preceding unsigned comment added by 132.204.26.35 (talk) 21:33, 16 September 2014 (UTC)

An integral is a linear operator, and linear transformations of Gaussian distributions are Gaussian, so it is still a Gaussian process. Joanico (talk) 18:36, 23 May 2020 (UTC)
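The linearity argument above can be checked numerically on a grid. The following is only a sketch under stated assumptions: the squared-exponential kernel, the grid on [0, 1], the left Riemann sum standing in for the integral, and all names (`rbf_kernel`, `A`, `K_int`) are illustrative choices, not anything taken from the article.

```python
import numpy as np

def rbf_kernel(s, t, length_scale=0.5):
    """Squared-exponential covariance (an illustrative choice of kernel)."""
    d = s[:, None] - t[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

# Represent the process by its values on a grid over [0, 1].
t = np.linspace(0.0, 1.0, 50)
dt = t[1] - t[0]
K = rbf_kernel(t, t)

# Left Riemann sum: Y_j = dt * sum_{i<j} X_i, i.e. Y = A X with A strictly lower triangular.
A = dt * np.tril(np.ones((t.size, t.size)), k=-1)

# A linear map of a zero-mean Gaussian vector is Gaussian with covariance A K A^T.
K_int = A @ K @ A.T

# Monte Carlo check: integrate sampled paths and compare the empirical covariance.
rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(t.size), K, size=20000)
Y = X @ A.T
emp_cov = np.cov(Y, rowvar=False)
max_err = np.abs(emp_cov - K_int).max()
print("largest covariance mismatch:", max_err)
```

On a grid the integrated values are just another jointly Gaussian vector, which is the finite-dimensional version of the statement that the integrated process is again a Gaussian process.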

Untitled


Added cleanup tag: this article does not give someone in the field an adequate overview of what a Gaussian process is, and goes off on a tangent involving undefined math. —Preceding unsigned comment added by Ninjagecko (talk • contribs)

Perhaps it could be made accessible to a somewhat broader audience, but where does it go off on a tangent or get into "undefined math"? It gives the definition and a simple characterization, and then it lists examples, with links. Michael Hardy 21:45, 4 December 2006 (UTC)
Of course, any article can be improved in many ways, and surely this one can also. However, I have no idea what you mean by undefined math. In addition, I would have thought that for someone in the field, this article is rather banal and uninteresting, since surely its contents would be already familiar to such an individual. Do you mean someone not in the field? --CSTAR 03:54, 5 December 2006 (UTC)
Somehow my reply never went through. Michael-- Yes, you're right. Technically the indices were previously defined way at the top, thus I removed the cleanup tag. Nevertheless it wasn't very clear I thought, so I improved the article lots by categorizing all the glomped-up text, and making the definition a bit clearer. CSTAR-- No, I meant what I said: "someone in the field". Even as a reference, it was hard to follow. I've already fixed it though. —The preceding unsigned comment was added by Ninjagecko (talk • contribs) 09:21, 6 December 2006 (UTC).
Also CSTAR, I personally find it rather haughty, to imagine the only people who have any business reading this entry are people who've been working with this material for 4+ years. The point of a reference is to be a reference for someone who wants to learn or brush up on the material. No offense. Ninjagecko 09:24, 6 December 2006 (UTC)
I don't think your statement (the only people who have any business reading this entry are people who've been working with this material for 4+ years) paraphrases in any way what I said. In any case what I had intended to say was that the article was technically correct. --CSTAR 13:36, 6 December 2006 (UTC)

suggestions for clarification


I'm not in the field, and I have found some things I wish this article would clarify. Please feel free to say there is some other, introductory article to the topic that I should have read which would have explained the answers to my questions.

  • 1. What is an easy, mathematical example of a Gaussian process?
  • 2. Does the definition imply that a Gaussian process is normally distributed? (I think the answer is obviously yes, but I have no experience to justify changing this article.)
  • 3. How does the definition imply the parenthetical remark "any linear functional applied to the sample function Xt will give a normally distributed result"? An example? So integrating Xt yields a Gaussian process?
  • 4. What is a sample function? pdf? cdf? Other types?

141.214.17.5 (talk) 19:46, 10 December 2008 (UTC)

After looking around some more, I can't tell why this doesn't redirect to the article for multivariate normal distributions. Any explanation? 141.214.17.5 (talk) 16:11, 11 December 2008 (UTC)

Gaussian processes are distributions over infinite-dimensional objects (i.e., functions), whereas multivariate normal distributions are defined over finite-dimensional objects or variables. In other words, GPs can be thought of as an extension of multivariate normal distributions to infinite dimensionality. appoose (talk)
I do not know the proof, but for 3, integration of a GP results in a GP, as does any other linear operation (summing, differentiation, etc.). Aghez (talk) 20:52, 11 March 2012 (UTC)
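As a possible answer to questions 1 and 3, here is a small sketch using the standard Wiener process (Brownian motion), whose covariance is Cov(W_s, W_t) = min(s, t). Restricted to finitely many times it is just a multivariate normal vector, and a linear functional a·X of that vector is normal with variance aᵀKa. The chosen time points, the weight vector `a`, and all names are illustrative assumptions.

```python
import numpy as np

def brownian_cov(s, t):
    """Covariance of standard Brownian motion: Cov(W_s, W_t) = min(s, t)."""
    return np.minimum(s[:, None], t[None, :])

# Any finite set of index points gives a multivariate normal vector.
t = np.array([0.25, 0.5, 1.0])
K = brownian_cov(t, t)

# Draw a few joint samples of (W_0.25, W_0.5, W_1.0).
rng = np.random.default_rng(1)
samples = rng.multivariate_normal(np.zeros(t.size), K, size=5)

# A linear functional a . X of the vector is normal with variance a^T K a.
a = np.ones(t.size)
var_linear = a @ K @ a
print(K)
print("variance of W_0.25 + W_0.5 + W_1.0:", var_linear)
```

A "sample function" in this picture is one draw of the whole path; each row of `samples` is such a path observed at three times.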


I am in the field. The "definition" will be scrubbed and the "alternate definition" will take its place. Done. — Preceding unsigned comment added by Izmirlig (talk • contribs) 15:34, 17 August 2017 (UTC)


I have renamed the link to www.gaussianprocesses.com, to "The Gaussian Processes Research Group at the Australian Centre for Field Robotics". The web site has a very general sounding name, but the home page is currently recruiting students to a lab, rather than explaining the theory of Gaussian processes, as the link description previously claimed to do. I hope this avoids confusion. Mebden (talk) 08:26, 5 March 2009 (UTC)

Alternative definition


Is the i that appears in the second display formula of the section the imaginary unit? If it is an index, it is not bound to any summation sign. Maybe a real-valued variable? I do not have a reference with me for the formula so I cannot fix it, but I guess that something is missing. I would be grateful if someone could fix it. Junkie.dolphin (talk) 15:49, 3 July 2012 (UTC)

The fact that it is the imaginary unit is confirmed/implied by the equation being part of a sentence starting "Using characteristic functions ....". Melcombe (talk) 16:55, 3 July 2012 (UTC)
Thanks for the clarification, I had somehow failed to notice that detail. Junkie.dolphin (talk) 15:45, 24 July 2012 (UTC)

"Process" is a "distribution"?


The current article says: "A Gaussian process is a statistical distribution Xt, t ∈ T, for which any finite linear combination of samples has a joint Gaussian distribution." I think a "process" is an indexed collection of random variables, while a "distribution" is a function associated with a single random variable. The notation apparently intends to convey the idea of "an indexed collection of distributions", so it would be better to use those words than the singular "a statistical distribution".

Tashiro~enwiki (talk) 18:15, 30 October 2015 (UTC)

Yes, this must be wrong and it's confusing. It means you have to look somewhere else for the actual definition (outside of Wikipedia). 76.118.180.76 (talk) 03:11, 15 December 2015 (UTC)
Hello. It is a distribution, but over an infinite-dimensional space, which makes it rather different from more common distributions such as the Gaussian distribution. I think the term "distribution" is more misleading than helpful here, so I have replaced it with plain "statistical model", since the text does then go on to define a GP. I hope that helps. — Preceding unsigned comment added by Winterstein (talk • contribs) 09:09, 11 June 2016 (UTC)
Agreed. This is the first time that my opinion of Wikipedia as the definitive source for mathematics has taken a big hit. I can appreciate that, from the writer's perspective, the first sentence, referenced just above in this discussion, looks simpler than to say statement as written — Preceding unsigned comment added by 156.40.216.3 (talk) 15:27, 17 August 2017 (UTC)

Lazy learning and Optimization


Winterstein, I noticed the addition on the page relating GPs to lazy learning and them usually being fitted with optimization software. While I appreciate that your experience may have given you this practical insight, I am not sure that this is beneficial to someone trying to understand what is a GP.

Regarding lazy learning, I am not familiar enough with the concept to be able to tell if it applies here, but from the short Wikipedia article and your blog I can see how it would apply to a GP used for kriging.

Regarding optimization software, what is really necessary is some matrix algebra, which includes a matrix inversion, to get the posterior mean (if you want a single value estimate) and some more to get the posterior variance if you want that too. While in certain cases (large matrices, etc.) optimization software may be used to find these, it is not something fundamental to the process that one reading this article would need to know about.

Finally, it can only be viewed as a machine learning algorithm when used for prediction (kriging) as you mention, so overall I think your comments would be more at home in the Applications section. It might also be more appropriate to give actual sources than a blog entry, despite how impressive your background is. Thank you. Webdrone (talk) 17:38, 7 June 2016 (UTC)

Actually it would be a great help if you could help fix the very first sentence which reads "[...] a Gaussian process is a statistical distribution, [...]". Webdrone (talk) 17:42, 7 June 2016 (UTC)
Hello Webdrone. Thank you for your thoughtful comments.

I think it is appropriate that the overview section should include notes on the uses of a technique as well as the technical definition -- otherwise it isn't an overview. Also, we'd like the overview to be readable by a range of people. As it was, the overview was not accessible to anyone other than probability theorists. Making it a little more accessible to the machine learning community is a good thing. I think there is more work to be done making this article accessible, both within these communities and to more communities, but I do believe my addition helps.

I also think that the infinite-dimensional distribution-based phrasing is a challenging way to introduce new people to this model (especially for the majority of those who use statistical methods but have not studied e.g. Hilbert spaces). Giving people a couple of ways to get their head around these ideas can only help.

Regarding the mention of "using optimisation software" -- thank you for the observation about matrix algebra being enough. Optimisation software is needed if you use a parameterised kernel (which opens up a wider range of applications beyond "traditional" kriging). I will amend the text now to give both.
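To make the parameterised-kernel point concrete, here is a minimal sketch of choosing a kernel length scale by maximising the log marginal likelihood. Everything specific is an assumption for illustration: the squared-exponential kernel, the toy sine data, the fixed noise variance, and the names `log_marginal_likelihood` and `candidates`. A plain grid search stands in for real optimisation software, which would normally use a gradient-based optimiser instead.

```python
import numpy as np

def rbf_kernel(a, b, length_scale):
    """Squared-exponential kernel with a tunable length scale."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def log_marginal_likelihood(x, y, length_scale, noise_var=1e-2):
    """log N(y | 0, K + noise_var * I), computed via a Cholesky factor."""
    n = x.size
    K = rbf_kernel(x, x, length_scale) + noise_var * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * n * np.log(2 * np.pi)

# Toy data (hypothetical): noise-free samples of a sine curve.
x = np.linspace(0.0, 5.0, 30)
y = np.sin(x)

# Grid search over the length scale as a stand-in for an optimiser.
candidates = np.logspace(-2, 1, 60)
scores = np.array([log_marginal_likelihood(x, y, ls) for ls in candidates])
best_length_scale = candidates[np.argmax(scores)]
print("selected length scale:", best_length_scale)
```

With the kernel parameters held fixed, none of this search is needed and the matrix algebra alone gives the posterior, which matches the distinction drawn in the paragraph above.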

Regarding sources for a paragraph that is an aid towards understanding -- academic papers go straight to the technical definitions by their very nature, and I don't know of a GP textbook yet which has an introduction for non-probability-theorists. Blog posts are the "natural" source for this kind of material. If you know of a better source, please do put one in. I don't think it would be appropriate to fully expand this paragraph within this article, as the explanation-for-machine-learning-people would then somewhat swamp the important technical matter.

Thank you again for your comments. I believe we're improving the article considerably through this. --winterstein (talk) 08:49, 11 June 2016 (UTC)

Hello WebDrone. Re. the first sentence -- I agree it could use work, but I can't think of a good re-phrasing. I've replaced the "stats distribution" phrase -- which other people have also complained about (see above) -- with the less confusing (if also less meaningful) phrase "stats model". — Preceding unsigned comment added by Winterstein (talk • contribs) 09:05, 11 June 2016 (UTC)
Winterstein, I guess you are right, including your comments might make it more accessible to people from different backgrounds. I hope we're improving the article -- it annoys me that it's not well-written, but I'm not sure how to improve it.
As for the infinite-dimensionality explanation, I feel like alternative explanations are always missing something. I come from physics where Hilbert spaces are often used so maybe that's why. Do you think an explanation along the following lines might help a reader visualise the infinite dimensionality setting?
"The function (f(x)=y) can be thought to exist as a single point in a (infinite-dimensional) space where each point x in the function's domain is a separate dimension in this new space. Values of y associated with each x point are coordinates of the function in that x dimension; think of f(x)=y as a very long vector, with an element for each possible x value -- since x is continuous it has infinite possible values and so the vector is infinitely long. We define a covariance kernel which relates an x dimension to another, and use it along with a mean function (m(x) which is usually taken to be 0) to set a multi-variate Gaussian prior over the infinite-dimensional space. We can then consider a set of observations (x, y) to be jointly Gaussian with non-observed points (x*, y*) with mean and covariance given by our prior. Conditioning on the observations, we can create a posterior Gaussian for y*|y, with a new mean and covariance which takes into account given points. Sampling points from this multi-variate Gaussian posterior gives possible functions which satisfy our conditions. Alternatively, just the posterior mean can be used as the MAP estimate of the function, with the new covariance used to find the uncertainty for each dimension (x value). In case of zero noise assumed for observed values (y), the new mean will go through the y values with 0 posterior variance (uncertainty), for the associated x dimensions."
Webdrone (talk) 19:30, 18 June 2016 (UTC)
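The conditioning step described in the quoted explanation can be sketched in a few lines of matrix algebra. This is only an illustration under stated assumptions: a zero mean function, a squared-exponential kernel, made-up observations, and a tiny `jitter` term added purely for numerical stability; the function name `gp_posterior` is hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential covariance kernel (illustrative choice)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_new, noise_var=0.0, jitter=1e-10):
    """Posterior mean and covariance of y* | y under a zero-mean GP prior."""
    K = rbf_kernel(x_obs, x_obs) + (noise_var + jitter) * np.eye(x_obs.size)
    K_s = rbf_kernel(x_obs, x_new)    # covariance between observed and new points
    K_ss = rbf_kernel(x_new, x_new)   # prior covariance among the new points
    mean = K_s.T @ np.linalg.solve(K, y_obs)
    cov = K_ss - K_s.T @ np.linalg.solve(K, K_s)
    return mean, cov

# Hypothetical observations; the new points include the observed x values.
x_obs = np.array([-2.0, 0.0, 1.5])
y_obs = np.array([0.5, -1.0, 0.8])
x_new = np.array([-2.0, -1.0, 0.0, 1.5, 3.0])

mean, cov = gp_posterior(x_obs, y_obs, x_new)
print("posterior mean:", mean)
print("posterior variance:", np.diag(cov))
```

With zero observation noise the posterior mean passes through the observed y values and the posterior variance vanishes there, while it stays positive at the unobserved points.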

Covariance function/Correlation function


The listed examples of covariance functions are really correlation functions (with the exception of the white noise one), i.e. they should be multiplied by sigma^2. — Preceding unsigned comment added by 188.113.80.156 (talk) 20:55, 30 May 2017 (UTC)
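A short sketch of the distinction, using the squared-exponential form as an example. The function names, the value of `sigma`, and the sample points are assumptions made for illustration only.

```python
import numpy as np

def sq_exp_correlation(x1, x2, length_scale=1.0):
    """Correlation function: equals 1 whenever the two inputs coincide."""
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def sq_exp_covariance(x1, x2, sigma=2.0, length_scale=1.0):
    """Covariance function: the correlation function scaled by sigma^2."""
    return sigma ** 2 * sq_exp_correlation(x1, x2, length_scale)

x = np.array([0.0, 0.5, 2.0])
R = sq_exp_correlation(x, x)
C = sq_exp_covariance(x, x, sigma=2.0)

print(np.diag(R))  # unit variances
print(np.diag(C))  # variances equal to sigma^2
```

Dividing `C` by sigma^2 recovers `R`, so the two differ only by the marginal variance of the process.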

Merge with Kriging article?


Are they the same? — Preceding unsigned comment added by 143.159.115.78 (talk) 14:01, 6 March 2017 (UTC)

Gaussian process regression and Kriging are very similar (maybe the same except for formalism, but I don't know enough to say so). Gaussian processes have uses outside of regression, though. — Preceding unsigned comment added by 188.113.80.156 (talk) 21:02, 30 May 2017 (UTC)

Integral of a white noise


About the recent edit by User:Kri: "Dubious|reason=The expected magnitude of a finite difference of a Wiener process divided by the step size approaches infinity as the step size approaches 0, but the expected magnitude of Gaussian noise is finite, so obviously this can't be true as is. So what is it that this (incorrect) statement actually means?"

The expected magnitude of a (usual) Gaussian process is finite, but the white noise is a generalized process; its expected magnitude (at a point) is infinite; only after integration does it become finite. I'll add a link to the white noise article.

Generalized processes are mentioned in White noise § Continuous-time white noise: "Also the covariance $\mathrm{E}(w(t_1)\cdot w(t_2))$ becomes infinite when $t_1=t_2$; and the autocorrelation function $\mathrm{R}(t_1,t_2)$ must be defined as $N\delta(t_1-t_2)$, where $N$ is some real constant and $\delta$ is Dirac's "function"." See also Gaussian free field § The continuum field: "it does not exist as a random height function. Instead, it is a random generalized function". Boris Tsirelson (talk) 06:55, 17 March 2019 (UTC)

Okay, thank you for the clarification. Indeed, it makes more sense if you treat it as a generalized function. —Kri (talk) 15:10, 17 March 2019 (UTC)
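For reference, the divergence discussed in this thread can be written out for a standard Wiener process W with step size h > 0. The symbol D_h for the difference quotient is introduced here only for illustration.

```latex
% W is a standard Wiener process; h > 0 is the step size.
W_{t+h} - W_t \sim \mathcal{N}(0,\, h)
\quad\Longrightarrow\quad
D_h := \frac{W_{t+h} - W_t}{h} \sim \mathcal{N}\!\left(0,\, \tfrac{1}{h}\right),
\qquad
\mathbb{E}\,|D_h| = \sqrt{\frac{2}{\pi h}} \;\xrightarrow[h \to 0]{}\; \infty .
```

The increment itself has finite variance h, which is consistent with the point above that the white noise only becomes an ordinary random quantity after integration.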

Simple cos/sin example is bimodal?


The "simple example" given of

$X_t = \cos(at)\,\xi_1 + \sin(at)\,\xi_2$

suggests that each variable X_t can be the sum of two Gaussian-distributed variables. But this can't be a Gaussian process, can it, because the sum of two Gaussians is not a Gaussian in general? What am I missing? Fyedernoggersnodden (talk) 13:44, 7 May 2021 (UTC)

Sum of Gaussians is another Gaussian (even for dependent RVs). See this Abs xyz (talk) 04:13, 25 October 2022 (UTC)
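Assuming the example in question is X_t = cos(at) ξ_1 + sin(at) ξ_2 with ξ_1 and ξ_2 independent standard normal variables (an assumption about what the article showed), the Gaussian-process property can be checked directly. Any finite linear combination of the X_t is again a linear combination of the two independent normals, so it is normal, not bimodal; a bimodal shape would arise from a mixture of two densities, which is a different operation from adding two random variables.

```latex
\sum_{k=1}^{n} c_k X_{t_k}
  = \Big(\sum_{k=1}^{n} c_k \cos(a t_k)\Big)\xi_1
  + \Big(\sum_{k=1}^{n} c_k \sin(a t_k)\Big)\xi_2
  \sim \mathcal{N}\!\Big(0,\ \sum_{j,k} c_j c_k \cos\big(a(t_j - t_k)\big)\Big),
\qquad
\operatorname{Cov}(X_s, X_t) = \cos(as)\cos(at) + \sin(as)\sin(at) = \cos\big(a(t-s)\big).
```

In particular Var(X_t) = 1 for every t, so each X_t is standard normal.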

Collection?


A process is a family, not a set (mathematics)! Collection is too ambiguous. Sigma^2 (talk) 22:37, 28 July 2023 (UTC)

A word of caution regarding the "Wikibook"


The linked Wikibook has some mistakes, such as the claim "a stochastic process is a distribution". Another mistake in the Wikibook is, for example, in the section on operations on Gaussian variables. The user says: "For two correlated signals, the sum can be expressed by a scalar multiplication", which is false. The sum of two non-independent Gaussians is not necessarily Gaussian; it's only Gaussian if they are jointly normal. --Tensorproduct (talk) 13:27, 8 September 2023 (UTC)
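A standard counterexample illustrates the last point. Take X standard normal and Y = S·X, where S is an independent random sign. Then Y is also standard normal, but X and Y are not jointly normal, and X + Y is exactly zero about half the time, which no Gaussian variable can be. The simulation below is only a sketch; the sample size, seed, and variable names are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

x = rng.standard_normal(n)               # X ~ N(0, 1)
sign = rng.choice([-1.0, 1.0], size=n)   # independent random sign S
y = sign * x                             # Y = S * X is also N(0, 1) by symmetry

total = x + y                            # equals 2X when S = +1 and 0 when S = -1

frac_zero = np.mean(total == 0.0)
print("std of Y:", y.std())
print("fraction of exact zeros in X + Y:", frac_zero)
```

Both marginals are Gaussian, yet the sum has a point mass at zero, so marginal normality alone does not make the sum Gaussian.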