Wikipedia:Reference desk/Archives/Computing/2022 June 20

Computing desk


June 20

Is there a find in page-like plugin or app for numbers?

Find in page is very useful, but sometimes I want to see all numbers between x and y, or over 9000, or under 59.409, that kind of thing. It'd be nice if it let you tell it what to show: numbers that'd pass your chosen inequality if commas are decimal points and periods aren't, numbers that'd pass if it's the other way around, and numbers that'd pass if "500 000" means half a million instead of 500 followed by zero. Click all that apply.

How hard would this be to write in Python or C? Would it be easier to copy and paste the webpage so the program can search plain text instead of HTML? I don't know of anything that can do this on pasted text without programming knowledge either. Maybe one of those things I never use, like Word or Emacs? Sagittarian Milky Way (talk) 22:51, 20 June 2022 (UTC)

Regular expression page-search browser extensions do exist. Here's "find+" for Firefox, although it says it's for Chrome. I guess the code is basically the same, and it looks like the same extension exists on Google's site for browser extensions. You then have the fun of figuring out how to write regexes for inequalities. I admit that regex is for strings, not for numbers, and the whole idea is "smelly".

Python can parse web pages, which might have potential: you could point a Python script at a URL, and it could probably find all the numbers on the page, understand them as numbers, and make a list of the ones that pass an inequality. (And if you're happy to just copy and paste the page, you can save it as a text file and don't even need the parser.) So far so good, but the problem is how it would then render the page back to you with the results highlighted. It could save a local copy of the page, I guess, edited to have highlighting, and send it to your browser to open. Dynamic web pages would probably screw this up. (Or it could do something similar with the text file, like using triple asterisks, ***59.408, for highlighting.)

JavaScript is probably the sensible way, since it can run in your browser and rewrite the page to highlight things. Yet the guts of that find+ browser extension do not look simple to me, and I wouldn't know how to adapt it for numbers. (Is "sendResponse" where the actual regex matching takes place? What even is that? And so on.) There's a bookmarklet which does a case-sensitive search-on-page, which is nice and short and comprehensible, except that it has a similar problem to the browser extension: it relies on a high-level method, in this case "indexOf", to match a string, and there's no corresponding "match numbers passing a test" method that I know of.

There's probably some syntax in JavaScript for a lambda function slotted into a loop ... and what would the loop loop through, the words in the text, separated by spaces? There goes the option to match "500 000". It might kinda work though (apart from unusual cases like that). I think the way to go would be: separate the text into words in an array, use findIndex() to find the words that are numeric and that pass the inequality test, then, um, keep track of their indexes and lengths in the text (vague) and use that information to do the highlighting. Card Zero (talk) 13:12, 21 June 2022 (UTC)
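
The copy-and-paste route described above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not how find+ or the bookmarklet works: the names `PLAIN`, `SPACED`, `parse_number` and `highlight` are made up for the example, the text is assumed to be already pasted into a string, and negative numbers, exponents and spelled-out numbers are ignored. It uses a regex only to locate candidate tokens, then converts each token to a real number under the chosen separator convention and applies the inequality to that number.

```python
import re

# Hypothetical names throughout; none of this comes from find+ or the bookmarklet.
# Runs of digits joined by "," or ".", e.g. "9,001" or "59.408".
PLAIN = re.compile(r"\d+(?:[.,]\d+)*")
# Same, but a single space followed by exactly three digits also continues
# the number, so "500 000" is read as one token.
SPACED = re.compile(r"\d+(?:[.,]\d+| \d{3}(?!\d))*")


def parse_number(token, decimal="."):
    """Turn a matched token into a float, or None if it is not parseable."""
    thousands = "," if decimal == "." else "."
    cleaned = token.replace(" ", "").replace(thousands, "").replace(decimal, ".")
    try:
        return float(cleaned)
    except ValueError:  # e.g. "1.2.3" under the decimal-point convention
        return None


def highlight(text, test, decimal=".", space_groups=False):
    """Prefix every number that passes `test` with triple asterisks."""
    pattern = SPACED if space_groups else PLAIN

    def mark(match):
        value = parse_number(match.group(0), decimal)
        if value is not None and test(value):
            return "***" + match.group(0)
        return match.group(0)

    return pattern.sub(mark, text)


sample = "Prices: 9,001 and 59.408, also 500 000 and 12."
print(highlight(sample, lambda x: x > 9000))
print(highlight(sample, lambda x: x > 9000, decimal=","))
print(highlight(sample, lambda x: x > 9000, space_groups=True))
```

With the default convention only 9,001 is marked; with `decimal=","` only 59.408 is marked, because it is then read as 59408; with `space_groups=True` both 9,001 and 500 000 are marked. Because matching works on positions in the text rather than on space-separated words, "500 000" stays matchable, at the cost of occasionally gluing together two unrelated numbers that happen to sit side by side.
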
So I could still view all the matching numbers quicker with regex, which is also a lot quicker to learn than a language. The pattern below should mark (at least some of the numerals of) numbers over 9000 without too many false positives or negatives, right? That is, as long as there aren't any exact 9,000s, large negative numbers, or 1e4, 1x10⁴, ten thousand, Zehntausend, 0x2710, MMMMMMMMMM, 10.000,0 or 10 000-style numbers on the page. In practice I'll often only need to catch one consistent format per webpage, so it wouldn't have to be perfect. ([1-9]\d,\d{3})|([1-9]\d{4})|(9\d{3})|(9,\d{3}) Sagittarian Milky Way (talk) 06:02, 22 June 2022 (UTC)
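
One way to check a pattern like this before relying on it is to run it over a few sample strings. The sketch below uses Python's `re` module with the pattern copied unchanged from the post; the helper name `marked` and the sample strings are made up for illustration.

```python
import re

# The pattern from the post above, unchanged.
OVER_9000 = re.compile(r"([1-9]\d,\d{3})|([1-9]\d{4})|(9\d{3})|(9,\d{3})")


def marked(text):
    """Return the substrings the pattern would highlight in `text`."""
    return [m.group(0) for m in OVER_9000.finditer(text)]


for sample in ["9001", "12,345", "123456", "8999", "0.9123"]:
    print(sample, "->", marked(sample))
```

On these samples the pattern marks 9001 and 12,345 in full, marks only the first five digits of 123456, and leaves 8999 alone, all as intended. It also marks the 9123 inside 0.9123, so digits after a decimal point are one more source of false positives beyond the cases already listed.
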