Talk:Nearest neighbor search
This article is rated C-class on Wikipedia's content assessment scale.
Misleading complexities
The nearest-neighbors-searching community has an unfortunate habit of stating that the "dimension is constant", probably because the asymptotic results then look much better than they actually are. For example, the all-nearest-neighbors algorithm by Vaidya (which is cited here) is actually an awfully slow algorithm for its task: it becomes unusable already in dimensions as low as 4 or 5, because there is a hidden factor exponential in the dimension d. The fastest way I know to do the all-nearest-neighbors search is simply to do repeated single queries against a spatial subdivision structure such as a BBD-tree or a sliding-midpoint kd-tree. I find the "dimension is constant" habit highly misleading and I'd rather not see it in Wikipedia articles. I therefore suggest taking the more explicit route of including those exponential factors. There are similar problems with the kd-tree article, but I'll comment there separately.
Kaba3 (talk) 22:11, 6 August 2010 (UTC)
This seems like a great point. I have seen similar issues in other algorithmic topics. The most informative solution would seem to be to express the asymptotic running times in terms of both the dimension and the set size. Can somebody go through the references and fill in the missing d's?
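As a concrete illustration of the repeated-single-query approach suggested above, here is a minimal sketch using SciPy's cKDTree as a stand-in for a BBD-tree or sliding-midpoint kd-tree (the library choice is my own assumption; the exponential dependence on d does not disappear, it just moves into the per-query cost):
<syntaxhighlight lang="python">
# Sketch: all-nearest-neighbors by repeated single queries against a k-d tree.
import numpy as np
from scipy.spatial import cKDTree

def all_nearest_neighbors(points):
    """For each point, return the index of and distance to its nearest other point."""
    tree = cKDTree(points)
    # k=2 because the closest hit for a point queried against its own set is itself.
    dists, idxs = tree.query(points, k=2)
    return idxs[:, 1], dists[:, 1]

rng = np.random.default_rng(0)
pts = rng.random((1000, 3))               # 1000 points in dimension d = 3
nn_index, nn_dist = all_nearest_neighbors(pts)
</syntaxhighlight>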
More on all-nearest neighbors
edit"As a simple example: when we find the distance from point X to point Y, that also tells us the distance from point Y to point X, so the same calculation can be reused in two different queries.". This is not true: the is-a-nearest-neighbor-of relation is not symmetric, as can be seen from a small example already in 1D: *--*-* (ascii art, stars are points).
Kaba3 (talk) 10:18, 7 August 2010 (UTC)
- The information can be reused -- you can use it as an upper bound in your next NN search. User A1 (talk) 16:37, 7 August 2010 (UTC)
- Rewording may be in order to prevent misunderstanding. It is true that point A's nearest neighbor, point B, might itself have a nearest neighbor that is distinct from point A. However, both point A and point B must consider (or rule out) the distance between them when finding their nearest neighbors; accordingly, it's possible to reduce the total number of distance computations by half by exploiting the metric's symmetry. If the distance computation is expensive compared to the other work (floating-point comparisons, iteration, etc.), which it usually is, this roughly halves the running time. That said, the algorithm mentioned in the article is considerably better than this. Rriegs (talk) 13:31, 14 March 2013 (UTC)
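For readers following along, a brute-force sketch of the symmetry point (each pairwise distance computed once and reused for both endpoints); this is unrelated to Vaidya's algorithm and is only meant to illustrate the halving of distance computations:
<syntaxhighlight lang="python">
import math

def all_nn_bruteforce(points):
    """Brute-force all-nearest-neighbors that computes each pairwise distance once,
    using d(i, j) == d(j, i) to update both points' current best."""
    n = len(points)
    best_dist = [math.inf] * n
    best_idx = [-1] * n
    for i in range(n):
        for j in range(i + 1, n):               # each unordered pair visited once
            d = math.dist(points[i], points[j])
            if d < best_dist[i]:
                best_dist[i], best_idx[i] = d, j
            if d < best_dist[j]:                # reuse the same distance for point j
                best_dist[j], best_idx[j] = d, i
    return best_idx, best_dist
</syntaxhighlight>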
Dynamic nearest-neighbor tracking
I'm interested in tracking nearest neighbors as points move. I don't know much about this topic, but it seems like there should be an O(N)·f(d) algorithm for tracking the nearest neighbors of all points, provided there is a lower bound on the separation between points and each point moves only a small amount relative to that bound at each time step: look at the nearest neighbors of the old nearest neighbors. (Memory would scale like O(N) for this method.) This also avoids the tree-backtracking issues of the spatial-partitioning methods. Can anybody supply a reference in the main body of the page?
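One possible reading of the neighbors-of-old-neighbors heuristic, written out as an unsourced sketch (the choice of candidate set and the dist callback are my own assumptions; whether this finds the true nearest neighbor depends on the separation and step-size conditions stated above):
<syntaxhighlight lang="python">
def update_all_nearest_neighbors(points, old_nn, dist):
    """Heuristic update after a small motion step: for each point i, re-examine only
    its old nearest neighbor and that neighbor's old nearest neighbor, instead of
    re-scanning all points.  `dist` is any distance function, e.g. math.dist."""
    new_nn = list(old_nn)
    for i in range(len(points)):
        candidates = {old_nn[i], old_nn[old_nn[i]]} - {i}
        new_nn[i] = min(candidates, key=lambda j: dist(points[i], points[j]))
    return new_nn
</syntaxhighlight>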
Please consider incorporating material from the above draft submission into this article. Drafts are eligible for deletion after 6 months of inactivity. ~Kvng (talk) 16:17, 19 March 2020 (UTC)
Specialisation..
Like most algorithms, NNS can often be optimised in real-world cases where specialisation is appropriate due to the restricted space or dimensionality of the samples.
For instance, distance measurement can be reduced to abs(a1 - a0) when potential neighbours vary only along a single axis. (This is not unusual when using integer coordinates in a Cartesian space where neighbours are always orthogonal.)
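A minimal sketch of that specialisation (valid only under the stated restriction that the two points differ along a single known axis):
<syntaxhighlight lang="python">
def axis_distance(a, b, axis=0):
    """Specialised distance for candidates known to differ only along one axis:
    a single absolute difference replaces the full Euclidean computation."""
    return abs(a[axis] - b[axis])

# e.g. grid points that share y and z and differ only in x:
assert axis_distance((3, 7, 2), (8, 7, 2)) == 5
</syntaxhighlight>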
Mentioning BallTree
Python's sklearn uses either k-d trees or ball trees for k-NN, with no mention of R-trees - [1]
And yet ball tree is buried as a link at the bottom of the page, while R-tree is mentioned in the lead - perhaps add a link to the ball tree article as well?
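For reference, a minimal example of the ball-tree structure in question via scikit-learn (the data here is made up):
<syntaxhighlight lang="python">
import numpy as np
from sklearn.neighbors import BallTree

rng = np.random.default_rng(0)
X = rng.random((500, 8))               # 500 sample points in 8 dimensions
tree = BallTree(X)                     # ball tree, as used by sklearn's k-NN
dist, ind = tree.query(X[:3], k=1)     # nearest stored point for the first 3 samples
</syntaxhighlight>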
Spell checking?
Spell checking is given as an application of nearest neighbor search, but that doesn't make sense:
- Spell checkers need to handle insertion and deletion of letters, not just in-place modification. I don't see how nearest neighbor would be useful for insertion and deletion.
- The usual distance metrics don't make sense for letters in a word. Just because e and f are adjacent in alphabetical order doesn't mean that they're commonly exchanged. I suppose you could encode letters in two dimensions by their position on the keyboard, but keyboard adjacency errors aren't that common. Similarly, you could encode letters by their sounds in multiple dimensions.
So it seems unlikely to me that spelling correction is a real application of nearest neighbor search. I'm deleting it. --Macrakis (talk) 22:30, 22 March 2022 (UTC)
- See the lead where it says "Most commonly M is a metric space and dissimilarity is expressed as a distance metric, ... However, the dissimilarity function can be arbitrary. ..." So if you define an appropriate dissimilarity function that accounts for insertions, deletions, transpositions, and substitutions, you can put spell checking into the form of NN search. Dicklyon (talk) 00:08, 23 March 2022 (UTC)
- Sure, dissimilarity metrics can in principle be arbitrary, and so can NN algorithms. But the article is largely about vector distance metrics and efficient algorithms (not O(n^2)). Is there any published literature (WP:RS) which uses NN algorithms for spelling correction? --Macrakis (talk) 13:38, 23 March 2022 (UTC)
- More generally, it would be nice to have at least one source for each application. --Macrakis (talk) 13:55, 23 March 2022 (UTC)
- I agree, sources for the applications would be nice. But spelling correction is certainly approached as an NN problem, just not with the Euclidean vector methods. Here are a bunch of sources on that. Dicklyon (talk) 03:57, 24 March 2022 (UTC)
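To make Dicklyon's point concrete, here is a toy sketch of spelling correction posed as nearest-neighbor search under edit distance. It is a brute-force scan, so it does not address the efficiency concern raised above, and the word list is made up:
<syntaxhighlight lang="python">
def levenshtein(a, b):
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def correct(word, dictionary):
    """Nearest-neighbor 'spell check': the dictionary word at minimum edit distance."""
    return min(dictionary, key=lambda w: levenshtein(word, w))

print(correct("spel", ["spell", "spill", "speak", "spiel"]))  # 'spell' (ties broken by list order)
</syntaxhighlight>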
Reverse nearest-neighbor search: finding the smallest shape encompassing all points for which a point is the nearest neighbor
This article talks about finding the nearest point to a given point in a set of points.
What I'm wondering is: given a set of points, how can we find or approximate, for each point in the set, the region containing all locations (not necessarily in the set) for which that point is the nearest neighbor?
For example, suppose I'm drawing a color-coded paper map to distribute to paramedics so they can find the closest hospital. They'd look up where they are on this map and then the color of the surrounding shape indicates the nearest hospital. Moonman239 (talk) 18:24, 4 January 2023 (UTC)
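For Euclidean distance, the map described above is exactly the Voronoi diagram of the hospital locations: each cell is the region whose nearest hospital is the cell's site. A minimal sketch using SciPy (the coordinates are made up):
<syntaxhighlight lang="python">
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial import Voronoi, voronoi_plot_2d

hospitals = np.array([[1.0, 1.0], [4.0, 2.0], [2.5, 4.5], [0.5, 3.0]])
vor = Voronoi(hospitals)        # one cell per hospital: all locations nearest to it
voronoi_plot_2d(vor)            # the color-codable "paper map" described above
plt.show()
</syntaxhighlight>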