MediaWiki talk:Captcha-addurl-whitelist
This edit request has been answered. Set the |answered= or |ans= parameter to no to reactivate your request.
Hi, can we add the BBC, Guardian and Independent?
- ϢereSpielChequers 17:58, 21 March 2018 (UTC)
- At least 1 day hold - not sure if this should have wider review (perhaps at WP:RSN)? — xaosflux Talk 12:22, 26 March 2018 (UTC)
- On hold @WereSpielChequers: I think this needs a wider review as it has a broad impact. Please at least post a discussion at WP:RS for the specific additions you would like. If there is no objection after a week (or if consensus forms in support), please include that discussion link here and reactivate the edit request. — xaosflux Talk 14:13, 28 March 2018 (UTC)
- We have the New York Times, which is unfortunately paywalled, and it seems to me that dozens of other sources could safely be listed. In the UK, the BBC and Guardian are obviously of a similar calibre. The Independent is not impartial but I think it still qualifies along with the Times (of London), Financial Times, Telegraph and some of the tabloids. I expect most other countries with free speech could provide a similar list. Certes (talk) 14:39, 28 March 2018 (UTC)
- @Certes: this page has very few watchers, so I suggest you bring this up at WP:RSN or another large forum. Please include specific URL/domain names in discussions for review. This certainly CAN be expanded easily from a technical level. — xaosflux Talk 14:46, 28 March 2018 (UTC)
- Thanks. I certainly didn't know about this page until you kindly pointed me at a link to it. My reply was more aimed at WereSpielChequers or anyone else bringing the topic up at WP:RSN. It's all too easy for us to take easy editing for granted and to overlook the obstacles which (perhaps for good reasons) lie in the way of newcomers. Certes (talk) 15:35, 28 March 2018 (UTC)
- @Certes: if you brought this up at a larger venue like RSN and it was OK, feel free to reactivate the edit request and add the discussion link below. — xaosflux Talk 00:29, 19 April 2018 (UTC)
From the Wikipedia Library
This edit request has been answered. Set the |answered= or |ans= parameter to no to reactivate your request.
Hi,
Sam Walton provided this list of websites from the Wikipedia Library partners. Clayoquot (talk | contribs) 23:13, 29 March 2018 (UTC)
- @Clayoquot: I posted at Wikipedia:Reliable_sources/Noticeboard#White_listing_sites_from_WP:TWL for a review; if there are no issues in a week, please activate the edit request tag at the top of this section. Thanks, — xaosflux Talk 01:51, 30 March 2018 (UTC)
- {{on hold}} pending RSN or time. — xaosflux Talk 14:51, 30 March 2018 (UTC)
- Thanks. The relevant discussion is now archived and there were no objections. Cheers, Clayoquot (talk | contribs) 22:14, 18 April 2018 (UTC)
- Doing... — xaosflux Talk 23:09, 18 April 2018 (UTC)
- Done @Clayoquot: these have been added, let me know if you see any trouble. — xaosflux Talk 23:14, 18 April 2018 (UTC)
- Excellent! I'm glad I mentioned it, which I think is what led to all this activity. Thanks for getting some sensible updates through, all. :) Quiddity (WMF) (talk) 23:41, 18 April 2018 (UTC)
- Thanks to @Samwalton9 (WMF): as well. — xaosflux Talk 00:27, 19 April 2018 (UTC)
- No problem! There are definitely many more sites that could be added here, but that's a good start :) Samwalton9 (WMF) (talk) 09:50, 19 April 2018 (UTC)
Proposal to add major newspapers etc.
This edit request has been answered. Set the |answered= or |ans= parameter to no to reactivate your request.
A short RSN discussion showed some support for the principle of adding major newspapers to this list, and I think we can extend that to some other media such as the BBC. Should we produce a full list for approval?
Please can non-UK editors add respected journals from their own countries? The Washington Post, The Globe and Mail and The Hindu have been suggested. I've left off tabloids such as The Sun (United Kingdom) and the Daily Mirror to maximise the chance of approval. I hope we can leave the initial www off the URL pattern, to allow variants such as news.bbc.co.uk. The Times has a paywall; is it worth including such sources?
Someone recently posted a link to a useful article with a Venn diagram classifying news sources by political bias and level of detail, but I've lost it. Please can someone point us at that again? Thanks, Certes (talk) 10:44, 19 April 2018 (UTC)
- On hold: activated an edit request to see if any patrolling admins want to comment before processing. — xaosflux Talk 12:03, 19 April 2018 (UTC)
- Would it be better to start this discussion somewhere else, returning if and when it has enough detail and support to qualify as an edit request? If so, is WP:RSN the right forum? I don't think anyone doubts that these are reliable sources; the question is whether they should be added to this whitelist. Certes (talk) 12:19, 19 April 2018 (UTC)
- @Certes: RSN is the best forum I can think of for these, you can move it there, or just link in to this from there with a summary. Basically if domains are representative of reliable sources, are useful for new users, and not being abused (such as for spam, advertising, selling subscriptions, etc) they are OK to be on this list as far as I'm concerned. — xaosflux Talk 12:27, 19 April 2018 (UTC)
- A notice was posted at WP:RSN on 19 April asking that people come here to comment. EdJohnston (talk) 14:39, 22 April 2018 (UTC)
- FWIW, I fully support this. Ed [talk] [majestic titan] 19:57, 22 April 2018 (UTC)
- Doing... — xaosflux Talk 20:08, 22 April 2018 (UTC)
- Done — xaosflux Talk 20:12, 22 April 2018 (UTC)
- Thank you! I still hope editors from beyond the UK will contribute similar lists for their countries. Certes (talk) 22:48, 22 April 2018 (UTC)
What exactly is this?
I wonder what exactly this is. Is this just a list of URLs that don't require a CAPTCHA for unregistered users? If so, should we add all low-risk but popular URLs? --Emir of Wikipedia (talk) 20:49, 22 April 2018 (UTC) (please mention me on reply; thanks!)
- @Emir of Wikipedia: yes, normally unregistered and new editors have to solve a captcha to add links; these specific domains are exempt from that. There is some performance to consider, so keeping this to "popular" as in links that are actually being appropriately added to pages is a factor. In general this means the links should be for "reliable sources". It is important that the exemptions are not useful for disruptive use as well. We have only recently begun using this and this page is not well watched - I suggest discussing additions at WP:RSN first. — xaosflux Talk 21:39, 22 April 2018 (UTC)
- Thanks for the information. I have seen the discussions at RSN and came here for clarification. --Emir of Wikipedia (talk) 20:01, 23 April 2018 (UTC)
Please add IPCC and National Academies domains
This edit request has been answered. Set the |answered= or |ans= parameter to no to reactivate your request.
Could you please add:
- ipcc.ch (Intergovernmental Panel on Climate Change)
- nap.edu (National Academies of Sciences, Engineering, and Medicine)
? Clayoquot (talk | contribs) 22:52, 22 February 2020 (UTC)
- Not done (not yet) following the directions, please link to where this was "discuss additions publicly such as at the Wikipedia:Reliable sources/Noticeboard". — xaosflux Talk 14:02, 23 February 2020 (UTC)
- Xaosflux, it's pretty inconceivable that a discussion at RSN would yield a result other than "yes, those are reliable sources". Would you consider pulling an IAR to add these two without going through a community process? Best, Clayoquot (talk | contribs) 17:57, 23 February 2020 (UTC)
- @Clayoquot: I'll leave this open for at least a day in case anyone else wants to skip the discussion (which on these is usually more of a 'no objections, go ahead' type). I've never heard of ipcc.ch (it appears to only have 5 article usages). nap.edu only appears to have 4 article usages as well, so at the very least these don't seem to be popular sources. — xaosflux Talk 19:00, 23 February 2020 (UTC)
- Xaosflux, For www.nap.edu, I'm seeing usage in 957 pages,[1] and www.ipcc.ch appears to be referenced in 736 pages.[2] Clayoquot (talk | contribs) 17:42, 24 February 2020 (UTC)
- Looks like I had my wildcard wrong, more popular than my first count indeed :) — xaosflux Talk 18:14, 24 February 2020 (UTC)
- Xaosflux, We've all done that :) Clayoquot (talk | contribs) 02:54, 25 February 2020 (UTC)
- @Clayoquot: please post at WP:RSN; if you are ignored for a week, reactivate and I'll add here. — xaosflux Talk 15:26, 27 February 2020 (UTC)
- Posted there. Thanks. Clayoquot (talk | contribs) 18:20, 27 February 2020 (UTC)
- Done. There were no objections: https://en.wiki.x.io/wiki/Wikipedia:Reliable_sources/Noticeboard/Archive_286#CAPTCHA_exemption_for_reliable_domains Clayoquot (talk | contribs) 22:00, 7 March 2020 (UTC)
- Could someone make this change please? @Xaosflux:? Clayoquot (talk | contribs) 17:30, 11 March 2020 (UTC)
- Done @Clayoquot: as there were no objections, I've added. — xaosflux Talk 17:38, 11 March 2020 (UTC)
RfC on adding generally reliable sources to the CAPTCHA whitelist
This edit request has been answered. Set the |answered= or |ans= parameter to no to reactivate your request.
There is a request for comment on adding generally reliable sources from the perennial sources list to the CAPTCHA whitelist, which allows new and anonymous users to cite them in articles without needing to solve a CAPTCHA. If you are interested, please participate at WP:RSN § Adding generally reliable sources to the CAPTCHA whitelist. — Newslinger talk 19:42, 7 March 2020 (UTC)
- The discussion has passed with "near-unanimous" consensus in favour of the proposal and should be implemented. For future reference, it is now archived at Wikipedia:Reliable_sources/Noticeboard/Archive_291#Adding_generally_reliable_sources_to_the_CAPTCHA_whitelist. 107.190.33.254 (talk) 17:01, 7 May 2020 (UTC)
- Would someone please regex this up in to a ready to go addition, then activate the edit request here? — xaosflux Talk 00:57, 8 May 2020 (UTC)
@Newslinger and Xaosflux: Not sure why this discussion died out, but on WP:RSNP, this did the trick:
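// Run in the browser console on WP:RSNP: collects the link-search domains of the
// "generally reliable" rows and prints one escaped \b-prefixed fragment per line.
console.log(
  [...$('.perennial-sources .s-gr a[href*="Linksearch&target=https://"]')]
    .map(a => '\\b' + a.href.match(/\*\.(.*)/)[1].replaceAll(".", "\\."))
    .join("\n")
)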
RSNP list
\babcnews\.com \babcnews\.go\.com \btheage\.com\.au \bafp\.com \baljazeera\.com \baljazeera\.net \bamnesty\.org \badl\.org \baon\.com \barstechnica\.com \barstechnica\.co\.uk \bap\.org \bapnews\.com \btheatlantic\.com \btheaustralian\.com\.au \bavclub\.com \bavn\.com \baxios\.com \bbbc\.co\.uk \bbbc\.com \bbehindthevoiceactors\.com \bbellingcat\.com \bbloomberg\.com \bbusinessweek\.com \bburkespeerage\.com \bbuzzfeednews\.com \bbuzzfeed\.com \bcsmonitor\.com \bclimatefeedback\.org \bcnet\.com \bcnn\.com \bcodastory\.com \bcommonsensemedia\.org \btheconversation\.com \btelegraph\.co\.uk \bdeadline\.com \bdeadlinehollywooddaily\.com \bdebretts\.com \bdeseretnews\.com \bdw\.com/en \bdigitalspy\.co\.uk \bdigitalspy\.com \bthediplomat\.com \beconomist\.com \biranicaonline\.org \bengadget\.com \bew\.com \bft\.com \bforbes\.com \bfoxnews\.com \bfoxbusiness\.com \bgamedeveloper\.com \bgamasutra\.com \bgameinformer\.com \bwyborcza\.pl \bgeonames\.usgs\.gov \bgizmodo\.com \btheglobeandmail\.com \btheguardian\.com \bguardian\.co\.uk \btheguardian\.co\.uk \bhaaretz\.com \bhaaretz\.co\.il \bthehill\.com \bthehindu\.com \bhollywoodreporter\.com \bhuffpost\.com \bhuffingtonpost\.com \bhuffingtonpost\.co\.uk \bhuffingtonpost\.ca \bhuffingtonpost\.com\.au \bhuffpostbrasil\.com \bhuffingtonpost\.de \bhuffingtonpost\.es \bhuffingtonpost\.fr \bhuffingtonpost\.gr \bhuffingtonpost\.in \bhuffingtonpost\.it \bhuffingtonpost\.jp \bhuffingtonpost\.kr \bhuffpostmaghreb\.com \bhuffingtonpost\.com\.mx \bidolator\.com \bign\.com \bindependent\.co\.uk \bindianexpress\.com \binsider\.com \bthisisinsider\.com \bipsnews\.net \bipsnoticias\.net \bipscuba\.net \btheintercept\.com \bifcncodeofprinciples\.poynter\.org \bjacobinmag\.com \bcatalyst-journal\.com \bjamanetwork\.com \bthejc\.com \bkirkusreviews\.com \bkommersant\.ru \bkommersant\.com \bkommersant\.uk \blatimes\.com \bmg\.co\.za \bthemarysue\.com \bmetacritic\.com \bgamerankings\.com \bmonde-diplomatique\.fr \bmondediplo\.com 
\bmotherjones\.com \bmsnbc\.com \bthenation\.com \bnationalgeographic\.com \bnbcnews\.com \bnewrepublic\.com \bnymag\.com \bvulture\.com \bthecut\.com \bgrubstreet\.com \bnydailynews\.com \bnytimes\.com \bnewyorker\.com \bnzherald\.co\.nz \bnewslaundry\.com \bnewsweek\.com \bnpr\.org \bpeople\.com \bpewresearch\.org \bpeople-press\.org \bjournalism\.org \bpewsocialtrends\.org \bpewforum\.org \bpewinternet\.org \bpewhispanic\.org \bpewglobal\.org \bpinknews\.co\.uk \bplayboy\.com \bpolitico\.com \bpolitifact\.com \bpolygon\.com \bpropublica\.org \bqz\.com \brfa\.org \brappler\.com \breason\.com \btheregister\.co\.uk \breligionnews\.com \breuters\.com \brollingstone\.com \brottentomatoes\.com \bsciencebasedmedicine\.org \bscientificamerican\.com \bscotusblog\.com \bnews\.sky\.com \bsnopes\.com \bscmp\.com \bsplcenter\.org \bspace\.com \bspiegel\.de \bsmh\.com\.au \bthewrap\.com \btime\.com \bthetimes\.co\.uk \bthesundaytimes\.co\.uk \btimesonline\.co\.uk \btorrentfreak\.com \btvguide\.com \btvguidemagazine\.com \busnews\.com \busatoday\.com \bvanityfair\.com \bvariety\.com \bventurebeat\.com \btheverge\.com \bvogue\.com \bvoanews\.com \bvox\.com \bwsj\.com \bwashingtonpost\.com \bweeklystandard\.com \bthewire\.in \bthewirehindi\.com \bthewireurdu\.com \bwired\.com \bwired\.co\.uk \bnews\.yahoo\.com \bzdnet\.com
Duplicates to remove from the old list
\bbbc\.com \bbbc\.co\.uk \bft\.com \bindependent\.co\.uk \bjamanetwork\.com \btelegraph\.co\.uk \btheguardian\.com \bthetimes\.co\.uk
I participated in that discussion, but see no reason to think the consensus isn't still valid. Suffusion of Yellow (talk) 19:19, 20 May 2023 (UTC)
- @Suffusion of Yellow I was only here as an edit request patrolling admin, the ER wasn't ready - if it's ready now, please reactivate the request to enqueue this again. — xaosflux Talk 19:50, 20 May 2023 (UTC)
- Well, I don't see any problems, but can't hurt to ask Headbomb who probably has RSNP memorized. Does it look like I generated that list properly? Suffusion of Yellow (talk) 23:21, 20 May 2023 (UTC)
- Minor quibble: does the /en after bdw.com actually work? I'm not sure exactly how the check works with the whitelist, but I imagine it works only on the domain (not the path within the host), to prevent citations such as wikipedia.org.spamsite.tld/spamspamspam.html. Certes (talk) 11:10, 21 May 2023 (UTC)
- Oops, it doesn't: see #Protected edit request on 11 April 2021 (updated today) below. Certes (talk) 22:34, 21 May 2023 (UTC)
- I've reactivated the request, per lack of objection. Please:
- Add all lines from the "RSNP" list above
- Remove all lines from the "Duplicates" list
- Thanks. Suffusion of Yellow (talk) 20:54, 23 May 2023 (UTC)
- Done Izno (talk) 23:09, 24 May 2023 (UTC)
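The two requested steps (add the RSNP lines, remove the lines that would then be duplicated) amount to a set union. A minimal sketch of that bookkeeping, assuming the whitelist is simply one fragment per line; the arrays are short excerpts chosen for illustration, not the full lists, and mergeWhitelist is a made-up name:

```javascript
// Hypothetical sketch: merge new fragments into an existing list without duplicates.
function mergeWhitelist(oldLines, newLines) {
  // A Set keeps first-insertion order and ignores repeated entries.
  return [...new Set([...oldLines, ...newLines])];
}

// Illustrative excerpts only (String.raw keeps the backslashes literal).
const oldList = [String.raw`\bbbc\.com`, String.raw`\bft\.com`, String.raw`\btheguardian\.com`];
const rsnpList = [String.raw`\bapnews\.com`, String.raw`\bbbc\.com`, String.raw`\bft\.com`];

// Entries present in both lists, i.e. the "duplicates to remove from the old list".
const oldSet = new Set(oldList);
const duplicates = rsnpList.filter(line => oldSet.has(line));

const merged = mergeWhitelist(oldList, rsnpList);
console.log(duplicates.join('\n'));
console.log(merged.join('\n'));
```

Either route (remove the duplicates and then append, or take the union) should leave each fragment on the page exactly once.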
Adding NCBI to the list
Is undeniably a source of reliable peer-reviewed journal articles and is often used in citations (eg. WP:PUBMED) - i.e. same as jstor.org, which is already on the list. 107.190.33.254 (talk) 17:08, 7 May 2020 (UTC)
- The entire nih.gov domain is already on the list - is it not working? — xaosflux Talk 17:48, 7 May 2020 (UTC)
- My bad, then; I only searched for "ncbi" using ctrl+f and couldn't find it. Though I could have sworn it didn't always work; maybe it was some other website as a result of citation templates, or maybe I was adding multiple sources. Anyway, now it works without a doubt, case closed. Thanks, 107.190.33.254 (talk) 18:19, 7 May 2020 (UTC)
Protected edit request on 14 May 2020
This edit request has been answered. Set the |answered= or |ans= parameter to no to reactivate your request.
Remove "such as those used in {{cite doi}}." from the header and "and in Template:Cite doi" from the comment after doi.org, since Template:Cite doi was deprecated. * Pppery * it has begun... 19:35, 14 May 2020 (UTC)
- Done. Thanks for submitting this! — Newslinger talk 21:46, 14 May 2020 (UTC)
Protected edit request on 11 April 2021
This edit request has been answered. Set the |answered= or |ans= parameter to no to reactivate your request.
- Change every single regex entry to have $ at the end. Two example lines:
- - \bwikipedia\.org # All language versions of Wikipedia
- + \bwikipedia\.org$ # All language versions of Wikipedia
- (...)
- - \bbbc\.com
- + \bbbc\.com$
I've indicated with <del> and <ins> what the respective changes for these lines should be, but I think the changes should be self-explanatory.
The reason this change is necessary is because currently this whitelist also whitelists urls such as http://wiki.x.io.phishing.site.example.org/my_virus_url, just to give a blatant example of a bad url. Please do test this yourself, but from my testing on another wiki, those URLs were accepted as long as the regular expressions are not finished with a $. As the page states: "Every non-blank line is a regex fragment which will only match hosts inside URLs". This means that the end of the domain name can safely be finished with a $ marker, since the text that will be matched against will never contain anything after the last character in the domain name.
I'm not sure if this should be communicated to other international versions of wikipedia, but it seems relevant for you guys to change this since you are the first hit on Google when I search for the system message name ("MediaWiki:Captcha-addurl-whitelist"). Joeytje50 (talk) 17:43, 11 April 2021 (UTC)
- I'm pretty sure this would break it to only allow https://wiki.x.io, and not say https://wiki.x.io/any/page.php. If I'm right, what you actually want is to add a / to the end. Anomie⚔ 01:03, 12 April 2021 (UTC)
- If the trailing slash is optional then we need something like \bwikipedia\.org(/.*)?$, though I think this still allows not-wikipedia.org. Certes (talk) 10:14, 12 April 2021 (UTC)
- The \b boundaries aren't stopping that? — xaosflux Talk 17:58, 16 April 2021 (UTC)
- Not done this needs more review and testing before bulk changes are made. — xaosflux Talk 17:58, 16 April 2021 (UTC)
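The question about \b can be checked in isolation. A minimal sketch, assuming the fragments behave like ordinary regular expressions applied to a host name; JavaScript's regex engine is used here as a stand-in for the one MediaWiki actually uses, and the host names are made up for illustration:

```javascript
// Illustrative only: a plain JavaScript regex standing in for a whitelist fragment.
// \b matches between a word character and a non-word character, and "-" is a
// non-word character, so a hyphenated look-alike still satisfies \b.
const fragment = /\bwikipedia\.org/;

// Hypothetical host names, not taken from the whitelist.
const hosts = ['wikipedia.org', 'en.wikipedia.org', 'not-wikipedia.org', 'notwikipedia.org'];

// Map each host to whether the fragment matches it.
const results = Object.fromEntries(hosts.map(host => [host, fragment.test(host)]));
console.log(results);
```

In this sketch only notwikipedia.org is rejected. The hyphenated host passes, because the hyphen itself creates a word boundary.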
@Joeytje50, Anomie, Xaosflux, and Certes: Some tests at test2wiki (testwiki's link handling is broken). Anything not marked (captcha) didn't get a captcha:
\bacm\.org
\bacs\.org$
- https://acs.org
- https://acs.org/ (captcha)
- https://acs.org.spam.site (captcha)
\banb\.org/
- https://anb.org (captcha)
\bapa\.org(?:/|$)
(?<=[./])bbc\.com(?:/|$)
- https://bbc.com/
- https://foo.bbc.com
- https://foo-bbc.com/ (captcha)
- https://bbc.com.spam.site (captcha)
\bdw\.com/en
- https://dw.com/en
- https://dw.com/spam (captcha)
- https://dw.com (captcha)
So yes, the problem is real. It looks like the right format is (?<=[./])some\.good\.site(?:/|$). Not sure what to do here. Adding all those (?:/|$) seems cheap enough. But what about all those (?<=[./]) lookbehinds? Could that cause a performance hit? Suffusion of Yellow (talk) 21:54, 21 May 2023 (UTC)
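The results listed above can be reproduced offline with a rough model of the check. This is only a sketch: it assumes each fragment is appended to the prefix ^(?:https?:)?\/\/+[a-z0-9_\-.]* (the prefix quoted in the replies in this thread) and matched case-insensitively from the start of the URL. The name needsCaptcha is made up, and JavaScript's regex engine stands in for PHP's, so agreement with test2wiki is indicative rather than guaranteed.

```javascript
// Assumed model of the whitelist check: prefix + fragment, anchored at the URL start.
// PREFIX is the string quoted in this thread; needsCaptcha is a hypothetical helper.
const PREFIX = String.raw`^(?:https?:)?\/\/+[a-z0-9_\-.]*`;

function needsCaptcha(fragment, url) {
  const re = new RegExp(PREFIX + '(?:' + fragment + ')', 'i');
  return !re.test(url);
}

// [fragment, url] pairs copied from the test list above.
const cases = [
  [String.raw`\bacs\.org$`, 'https://acs.org'],
  [String.raw`\bacs\.org$`, 'https://acs.org/'],
  [String.raw`\bacs\.org$`, 'https://acs.org.spam.site'],
  [String.raw`\banb\.org/`, 'https://anb.org'],
  [String.raw`(?<=[./])bbc\.com(?:/|$)`, 'https://bbc.com/'],
  [String.raw`(?<=[./])bbc\.com(?:/|$)`, 'https://foo.bbc.com'],
  [String.raw`(?<=[./])bbc\.com(?:/|$)`, 'https://foo-bbc.com/'],
  [String.raw`(?<=[./])bbc\.com(?:/|$)`, 'https://bbc.com.spam.site'],
  [String.raw`\bdw\.com/en`, 'https://dw.com/en'],
  [String.raw`\bdw\.com/en`, 'https://dw.com/spam'],
  [String.raw`\bdw\.com/en`, 'https://dw.com'],
];

const results = cases.map(([fragment, url]) => needsCaptcha(fragment, url));

cases.forEach(([fragment, url], i) => {
  console.log(fragment, url, results[i] ? '(captcha)' : '');
});
```

Under these assumptions the model marks the same URLs as (captcha) as the list above, which at least suggests the prefix-plus-fragment reading is the right one.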
- Even that will match https://malicious.domain/pretending.to.be.some.good.site/virus.exe, though not https://some.good.site:80/innocent.html. Is the whole URL matched against the pattern? If so, we may need to parse the whole URL, starting the regexp with ^. There's at least one whole website devoted to how to do that properly, or see page 50 of https://www.ietf.org/rfc/rfc3986.txt. Certes (talk) 23:03, 21 May 2023 (UTC)
- No, see the https://spam.site/acm.org example above. Assuming this is the right place, the regexes are bundled together, then prefixed with ^(?:https?:)?\/\/+[a-z0-9_\-.]*. We could use the <noprotocol> option and supply the prefixes ourselves, but would that be even slower? Or we could do the bundling ourselves, but that would make this page as unreadable as some edit filters. Suffusion of Yellow (talk) 23:47, 21 May 2023 (UTC)
- @Suffusion of Yellow and Certes: If the prefix ^(?:https?:)?\/\/+[a-z0-9_\-.]* is added, then that would be an issue in MediaWiki itself, right? You would expect the prefix to require a period at the end, if there is any subdomain preceding the whitelisted domain. Otherwise I'm pretty sure almost every single wiki that has a whitelist is vulnerable to adding a link to http://fake-wikipedia.org (demo). A simple \b is not sufficient, due to the existence of the dash in domain names.
- So regardless of this protected edit request, I'd say MediaWiki should change the prefix to ^(?:https?:)?\/\/+([a-z0-9_\-.]*\.)* to enforce the period at the end. Let me know what you guys think about that.
- Regarding this edit request, I'd say the testing done by Suffusion of Yellow is pretty conclusive that some changes are needed. The lookbehind is required because of the aforementioned issue with hyphens (simple \b is insufficient), and the lookahead for the trailing slash or string terminator is required because otherwise wikipedia.org.spam.site would be whitelisted as well. I haven't re-enabled the edit request template at the top, but if anyone knows what the impact would be on performance, I think this request can be re-enabled. If performance is impacted significantly, I think the aforementioned change to MediaWiki software is even more important, and if lookbehinds are impacting performance, I'd assume changing the lookbehind to (/|$) as a regular capturing group would work as well.
- The updated edit request is now:
- At the start of every line: \b → (?<=[./])
- At the end of every line: (?:/|$)
- Joeytje50 (talk) 11:49, 29 January 2024 (UTC)
- Thanks, that looks good to me. It's hard to be sure without analysing the code which will apply the regexp, but I am hopeful that it will work without side effects. Certes (talk) 13:48, 29 January 2024 (UTC)
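The updated request is a mechanical rewrite of each whitelist line, so it can be sketched and sanity-checked before anyone edits the page. The sketch below assumes lines look like the examples quoted in this thread (a fragment starting with \b, optionally followed by a # comment) and that matching works as prefix plus fragment, as discussed above. hardenLine and isWhitelisted are hypothetical names, and JavaScript's regex engine is only a stand-in for the real one.

```javascript
// Hypothetical helper: rewrite one whitelist line as proposed in the updated request.
function hardenLine(line) {
  // Separate the fragment from an optional trailing "# comment".
  const m = line.match(/^(\S+)(\s+#.*)?$/);
  // Leave blank lines, and lines that do not start with \b, untouched.
  if (!m || !m[1].startsWith('\\b')) return line;
  return '(?<=[./])' + m[1].slice(2) + '(?:/|$)' + (m[2] || '');
}

// Assumed model of the check: the prefix quoted in this thread, then the fragment.
const PREFIX = String.raw`^(?:https?:)?\/\/+[a-z0-9_\-.]*`;
function isWhitelisted(line, url) {
  const fragment = line.split(/\s+#/)[0]; // drop the comment, keep the regex
  return new RegExp(PREFIX + '(?:' + fragment + ')', 'i').test(url);
}

const oldLine = String.raw`\bwikipedia\.org # All language versions of Wikipedia`;
const newLine = hardenLine(oldLine);
console.log(newLine);

// One legitimate URL and the two look-alike shapes raised in this thread.
const urls = [
  'https://en.wikipedia.org/wiki/Example',
  'http://fake-wikipedia.org',
  'https://wikipedia.org.spam.site',
];
const comparison = urls.map(url => ({
  url,
  before: isWhitelisted(oldLine, url),
  after: isWhitelisted(newLine, url),
}));
console.log(comparison);
```

In this model the old line accepts all three URLs, while the rewritten line accepts only the first. Lines that include a path, such as \bdw\.com/en, may still deserve a manual look, since the added suffix changes what they accept.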
Protected edit request on 20 May 2023
This edit request has been answered. Set the |answered= or |ans= parameter to no to reactivate your request.
Please add:
\btoolforge\.org
I assume this will be uncontroversial; wmflabs is already there. Suffusion of Yellow (talk) 00:22, 20 May 2023 (UTC)
- Done — xaosflux Talk 01:07, 20 May 2023 (UTC)
Protected edit request on 1 June 2023
This edit request has been answered. Set the |answered= or |ans= parameter to no to reactivate your request.
Please add the following URLs (except for books.google.com and cnbc.com, those are auto-generated by various CS1 templates when the required IDs are passed to them; see Template:Citation Style documentation/id2):
\bapi\.semanticscholar\.org \barxiv\.org \bbiorxiv\.org \bbooks\.google\.com \bciteseerx\.ist\.psu\.edu \bcnbc\.com \bhdl\.handle\.net \blccn\.loc\.gov \bmathscinet\.ams\.org \bopenlibrary\.org \bosti\.gov \bpapers\.ssrn\.com \btools\.ietf\.org \bui\.adsabs\.harvard\.edu \bzbmath\.org
93.72.49.123 (talk) 14:50, 1 June 2023 (UTC)
- Done — Martin (MSGJ · talk) 12:28, 13 June 2023 (UTC)
Protected edit request on 8 June 2024
This edit request has been answered. Set the |answered= or |ans= parameter to no to reactivate your request.
Please add Google and Bing to the list:
\bgoogle\.com \bbing\.com
Since {{AfC submission/pending}} template includes links to the search engines through the {{find sources}} invocation, unconfirmed users are forced to enter captchas when submitting drafts. Unconfirmed users have a rate limit of 8 edit attempts per minute which is not much. The counter is incremented every time an edit is interrupted due to a captcha requirement, and also every time a captcha entered is incorrect. According to the metrics collected from the submission wizard, 10% of all submits fail with a rate limit error. The issue has also been reported by users: Wikipedia talk:WikiProject Articles for creation#Rate limit issue, Wikipedia talk:WikiProject Articles for creation/Submission wizard#Need help draft.
Links to search results don't help with SEO or otherwise have much spam potential. – SD0001 (talk) 06:39, 8 June 2024 (UTC)
- This seems like a bad idea. General purpose commercial search engines like Google and Bing are certainly not reliable sources and shouldn't be getting linked to; change the template to fix problem with this one use case. — xaosflux Talk 09:26, 8 June 2024 (UTC)
- Are you saying that {{find sources}} should not link to Google or Bing? – SD0001 (talk) 11:05, 8 June 2024 (UTC)
- Or {{AfC submission/pending}} could not transclude {{find sources}}… jlwoodwa (talk) 04:26, 9 June 2024 (UTC)
- Yup, more along that. Improving that workflow seems like a better idea. — xaosflux Talk 12:39, 9 June 2024 (UTC)
- So the idea is to use the captcha system to generate friction for editors trying to add search engines? Sounds like it is too broad if it is also generating friction when using official templates such as {{AfC submission/pending}}. Not sure what the performance cost would be, but an edit filter could potentially warn against this with a better warning message and fewer false positives. The regex would be something like <ref[^>]*>[^<]+google\.com. Although I suppose this would only catch refs and not external links. Hmm. –Novem Linguae (talk) 08:48, 9 June 2024 (UTC)
- Not done these are certainly not reliable sources; additional discussion needed. Wikipedia:Reliable sources/Noticeboard is the standard venue for such discussion. — xaosflux Talk 12:41, 9 June 2024 (UTC)
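The limitation mentioned for the edit-filter idea (it would catch refs but not bare external links) can be seen by trying the suggested pattern on sample wikitext. A small sketch, using JavaScript's regex engine as a stand-in for the edit filter's own engine, with invented wikitext samples:

```javascript
// Pattern quoted from the comment above; the sample strings are made up.
const refWithSearchLink = /<ref[^>]*>[^<]+google\.com/;

const samples = {
  refCitation: '<ref>https://www.google.com/search?q=example</ref>',
  namedRef: '<ref name="g">See https://google.com/search?q=x</ref>',
  bareExternalLink: '[https://www.google.com/search?q=example search results]',
};

// Record, for each sample, whether the pattern would flag it.
const hits = Object.fromEntries(
  Object.entries(samples).map(([name, text]) => [name, refWithSearchLink.test(text)])
);
console.log(hits);
```

Both ref samples are flagged and the bare external link is not, which matches the caveat in the comment.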