Wikipedia:Featured article candidates/Binary search algorithm/archive1

The following is an archived discussion of a featured article nomination. Please do not modify it. Subsequent comments should be made on the article's talk page or in Wikipedia talk:Featured article candidates. No further edits should be made to this page.

The article was promoted by Laser brain via FACBot (talk) 12:13, 14 August 2018 [1].


Nominator(s): Esquivalience (talk) 20:53, 11 June 2018 (UTC)[reply]

This article is about one of the most ubiquitous algorithms in computing. Binary search is usually one of the first algorithms taught to computer science students. The premise is quite simple: given a sorted list of numbers, binary search eliminates halves of the list in which the number you are looking for cannot lie until it finds the number. However, binary search has lots of subtleties. Surprisingly, when a famous programmer asked his students to implement binary search, 90 percent could not provide a correct solution. I have wrote this article to not only cover those subtleties, but also compare binary search to other search algorithms, placing the algorithm in context. I believe that this article is FA quality as I have strived to place every relevant detail that I could find on binary search, which is surprisingly a lot of details and subtleties that the average textbook chapter on search algorithms neglects to cover. Esquivalience (talk) 20:53, 11 June 2018 (UTC)[reply]

  • Comment. Nice work! Interesting article, I especially like the diagrams & the comparison with non-sorted array datastructures. My own preference would be to use very tiny sample Python programs rather than the mathier description of what the program does, but I respect that if it was done that way, there might be a problem with random students replacing it with their own code.
There may be copyright issues if code is copied verbatim from a book (mostly for the CS students needing to plagiarize binary search code), and if I write binary search code based on the information in the article in Python for example, there may be verifiability concerns. In addition, the mathematical description is more abstract and eliminates language-specific implementation details. However, I will consider adding a pseudocode version like in Euclidean algorithm#Implementations Esquivalience (talk) 21:51, 13 June 2018 (UTC)[reply]
as binary search trees can effectively be structured in filesystems.

I think "efficiently" might be more clear here?

Good suggestion; replaced. Esquivalience (talk) 21:51, 13 June 2018 (UTC)[reply]
If the array must first be sorted, that cost must be amortized over any searches.

Kinda sorta? Phrasing is a little weird too, with "must" sounding like accountants are being ordered around. If we're talking about a hypothetical system, The opening sentence of the section already notes that constantly sorting the array is kinda inefficient; future binary searches presumably would still have to sort the array, because they aren't sure if a previous check happened and already sorted it, so it's not like the cost gets to be amortized through these. Could have an "isDirty" type function that would remember if the array has been modified since it was last sorted of course, but that's getting too much into the weeds I think.

I agree that it's kind of obvious that the cost of sorting needs to be considered, and given that the target audience really is new computer science students, communicating such an important aspect with concepts like amortized analysis is confusing, so I decided to only mention that the array needs to be sorted and leave it at that. Esquivalience (talk) 21:51, 13 June 2018 (UTC)[reply]
Exponential search

Can the diagram (File:Exponential search.svg) be modified to clearly be unbounded? Seems misleading at a glance... if nothing else, have the right edge with a "...".

I still have the TeX file, so I'll add the ..., however it can also be applied to bounded arrays. Esquivalience (talk) 21:51, 13 June 2018 (UTC)[reply]
Fibonacci search

This section is not real clear at the moment. There's something to be said for not taking too much space, and letting people who are curious about more click over to the article, but maybe give it a once-over if there's a way to still be precise but also accessible? It's great there's a diagram, but maybe call out the green highlighted section a little more in File:Fibonacci_search.png to show that section "won". If nothing else, I would recommend highlighting in the first or second sentence that rather than a specific answer (like in binary search), Fibonacci merely returns a range where the answer might be.

I think it's better to remove it, because it is only vaguely similar to binary search. It's more of an optimization algorithm rather than a "search" algorithm, and an obscure one at that. Esquivalience (talk) 21:51, 13 June 2018 (UTC)[reply]
The noisy binary search problem can be considered as a case of the Rényi-Ulam game,[49] which Alfréd Rényi introduced in 1961.[50]

Why is the date it was created relevant? I'd say a description would be more helpful. "a case of the Rényi-Ulam game, a variant of twenty questions where the answers can be wrong," say.

I think it is better to put it in a footnote, as it does not add much to the main description, and I have adopted your proposed wording. Esquivalience (talk) 21:51, 13 June 2018 (UTC)[reply]
In 1946, John Mauchly made the first mention of binary search as part of the Moore School Lectures, the first ever set of lectures regarding any computer-related topic.

There is no way this is possibly true as phrased - there are tons of computer-related topics, like math, that people have been giving lectures on since the ancient Babylonians. Not sure what TAOCP was saying here, but judging by the Wikipedia article, maybe something like "the first and foundational college course in computers?"

Reworded it a bit, but I decided to use "seminal" instead of "first" as there is no way of knowing whether it is the first course on computing (some can say that the study of logic and abstract machines counts as a computing course, and those fields were studied decades before). Esquivalience (talk) 22:19, 13 June 2018 (UTC)[reply]
ninety percent failed to provide a correct solution after several hours of working on it,[57] and another study published in 1988 shows that accurate code for it is only found in five out of twenty textbooks

I think the article is taking Bloch & Bentley at their word a bit in a way that will confuse casual readers. Binary search, while not quite fizz buzz, is a sample easy problem generally that takes 5 minutes to solve, and surely Bentley's complaints were on hyper-specific issues. I would add in "rare edge case" or the like in discussing the overflow error in Programming Pearls and Java; it's arguably not even really an "interesting" flaw, because arrays are already size-capped in most languages to the size of the index (unsigned int, say), so this error was merely not taking advantage of the maximum amount of space available. (And heck, even with BigInteger array indices, declaring too large an array will already cause an out of memory exception unless you are on a Turing Machine with an actually-infinite tape.) SnowFire (talk) 20:07, 13 June 2018 (UTC)[reply]

I added the wording about many of the errors being rare edge cases for Bentley's assignment, but textbooks should be expected to provide correct code that works on edge cases, so I left the textbook sentence as is. Esquivalience (talk) 21:51, 13 June 2018 (UTC)[reply]
@SnowFire: Thanks for the suggestions, and I hope my changes have addressed your concerns. Esquivalience (talk) 21:51, 13 June 2018 (UTC)[reply]

Images are appropriately licensed. Nikkimaria (talk) 19:24, 16 June 2018 (UTC)[reply]

Support Good to see a CompSci article at FAC. Some interesting tidbits in the article. Some comments:

  • Alexandrescu (2010) and Leiss (2007) are not referenced in the article. Suggest removing them.
  • Suggest removing the wiki-links to Childs, Landahl & Parrilo 2007 and Høyer, Neerbek & Shi 2002 (which just points to the reference) as we already have footnotes.
  • No reference on the second last paragraph of "Implementation issues"
Hawkeye7 (discuss) 01:51, 23 June 2018 (UTC)[reply]
Added the ref and removed the redundant citations, thanks for the suggestions. Esquivalience (talk) 20:14, 23 June 2018 (UTC)[reply]


Comments Edwininlondon

edit

Great to finally see a computer science article here. Some comments on the lead to start with:

  • "and the search continues on the remaining half until the target value is found" --> I think the first paragraph of the lead should explain the next step better. This is too vague. Perhaps something along the lines of "and the search continues on the remaining half, again taking the middle element for comparison, repeating this until the target value is found"
  • the O is Big O notation, and log is the logarithm --> An example here would be good. "For an array of 100 elements it takes at most .."
  • I don't think it would be accurate to say that for some number of elements, it takes some amount of time, because Big O notation is related to how fast the time complexity grows. For example, a O(log n) algorithm can take an eternity if, say, the algorithm must do a lot of computation before processing the data. The log n really states that the time complexity grows quite slowly. Esquivalience (talk) 19:34, 4 July 2018 (UTC)[reply]
I think that's exactly the kind of info the lay person needs. So if you feel an example is not appropriate, then mention this. Edwininlondon (talk) 08:01, 5 July 2018 (UTC)[reply]
  • this paragraph about complexity would also be the best place to contrast the algorithm against the simplest alternative, serial search, which I think is needed for the lay-person.
  • Although the idea is simple --> Bit of a random mini-paragraph. Conceptually this sentence fits better at the end of the first paragraph
  • Done.
  • The lead says nothing about unsorted arrays. I think it should. Briefly.

More later. Edwininlondon (talk) 09:31, 1 July 2018 (UTC)[reply]

@Edwininlondon: Thanks for the feedback. Esquivalience (talk) 19:34, 4 July 2018 (UTC)[reply]
  • Procedure for finding the rightmost element --> I feel this section is a bit overkill. It makes it look like a textbook. I'm not sure such drive for completion is necessary in an encyclopedia. I'd leave it out.
  • This problem is solved by binary search --> What exactly is "This problem" referring to? Maybe a little rewrite could help me?
  • The problem is already explained in the paragraph, so I just said that "By dividing the array in half, binary search ensures that the size of both subarrays are as similar as possible." Esquivalience (talk) 02:19, 11 July 2018 (UTC)[reply]
  • Fractional cascading can be used to --> perhaps add a short explanation of fractional cascading. Oh I now see that comes later. That's a bit unfortunate. Do we need it even here? Is it not sufficient to have it only under the Variations section?
  • There seems to be a weird character between ref 17 and 18
  • Enormous n --> would be nice to give some indication of size ... billions?
  • It depends on the time it takes to do arithmetic operations on the computer, which varies from system to system, so it is not really possible to give an exact or even approximate amount. A more detailed analysis is given in footnote (c), but it is still very specific and is only used to illustrate why it is that way. Esquivalience (talk) 02:19, 11 July 2018 (UTC)[reply]
  • In addition, all operations possible on a sorted array can be performed—such as finding the smallest and largest key --> I think this is describing binary search, but then the word "key" threw me off and now I'm not sure
  • key refers to an element, but I replaced "key" with element and clarified it a little.
  • This applies even to balanced binary search trees, binary search trees that balance their own nodes—as they rarely produce optimally-balanced trees—but to a lesser extent --> not so easy to read. Maybe see if a rewrite could help
  • although the array needs to be sorted beforehand --> I think it is necessary for this article to describe the cost of sorting. The reader will wonder what typically happens in cases where the array is unsorted.
  • Different sorting algorithms as well as different computers take different amounts of time to sort an array, so it is not possible to say that sorting has the same cost as n binary searches, but what can be given is that it takes a multiple of n log n comparisons, so I added that.
  • Binary search also enables efficient approximate matches and other operations --> we already knew this from earlier section, but what is missing here is something about linear search with respect to these matches and operations
  • Set membership --> it's a bit out of place. Maybe introduce this kind of problem earlier so that this problem can be discussed within the hashing, trees and linear search sections as well
  • I introduced set membership in the first paragraph of the section to put it in context.
  • A bit array is the simplest --> perhaps a short description of what this is would be useful
  • which may improve the algorithm's performance on some systems. --> why?
  • (L + R) / 2 --> this one and a few others probably need a math template to prevent wrapping
  • Done
  • any reason why multiplicative binary search is only mentioned in the See also section?
  • Multiplicative binary search seems to be quite niche and I don't think it's used often compared to the other variations. The Art of Computer Programming does not cover it even though it is generally considered to be a "comprehensive" resource. Looking at the article, it's basically a special method of storing a binary search tree, so if it belongs it belongs in that article instead. Esquivalience (talk) 02:19, 11 July 2018 (UTC)[reply]

Edwininlondon (talk) 18:22, 8 July 2018 (UTC)[reply]

The references need a bit of cleaning up as there are some inconsistencies:

  • ISBN numbers are sometimes ISBN 10 and sometimes 13. Also need hyphens. I'll do this once I have time.
  • Done
  • Some Articles Are in Camel Case, but some others are not
  • Quite a few have no page number. Knuth especially
  • For the Knuth sources, I only have the eBook version which doesn't contain page numbers for some reason. But that reason is probably because Knuth's The Art of Computer Programming has a famed error bounty program. If a reader finds an error in one of his books, including TAOCP, then Knuth sends a check to the reader who made an error, and which is usually framed as memorabilia for obvious reasons. Even though Volume 3 was published 20 years ago, there have already been at least 26 republished versions with error corrections, which messes up the page numbering. For the Sedgewick and Wayne versions, the authors have provided a condensed, summarized version of the book for free online, so chapter and section numbers are better for guiding readers to the right page. I'll try to add page numbers for the other sources shortly.Done. Ref 52 has no page number given by the publisher however.
  • the link to PDF for 55 seems wrong
  • Don't know how it got there, but I removed the link. It's an old article, but not in the public domain, so there is no preprint or copyright-free version.

Edwininlondon (talk) 21:08, 10 July 2018 (UTC)[reply]

Coord note

edit

@Edwininlondon and SnowFire: have you had a chance to review responses to your comments? Cheers, Ian Rose (talk) 10:29, 12 August 2018 (UTC)[reply]

  • Sorry, missed this. Having reviewed the responses just now, I am happy to support this. Edwininlondon (talk) 12:06, 12 August 2018 (UTC)[reply]
  • Sure, support. I had honestly forgotten all about this. My slant would still be slightly different on a few of the topics, but that's just personal preference; I think that the most important thing, being a comprehensive and accurate overview of the topic, is met and met strongly. SnowFire (talk) 04:03, 13 August 2018 (UTC)[reply]
The above discussion is preserved as an archive. Please do not modify it. No further edits should be made to this page.