Wikipedia:Bots/Noticeboard/Archive 17
This is an archive of past discussions on Wikipedia:Bots. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current main page.
MalnadachBot task 12 was recently speedily approved to correct lint errors on potentially hundreds of thousands of pages. As the bot started making some initial edits, my watchlist has started blowing up. The majority of the edits that I see in my watchlist are fixing deprecated <font> tags in one particular user's signature, in ancient AfD nominations that I made 10+ years ago. A very small sampling: [1][2][3][4] These edits do not change the way the page is rendered; they only fix the underlying wikitext to bring it into compliance with HTML5. Since no substantive changes are being made in any of these edits, I believe this bot task should not have been approved per our bot policy; specifically, WP:COSMETICBOT. I'd like to request that this task (and any other similar tasks) be reviewed in light of this. Pinging bot owner and bot task approver: @ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ: @Primefac: —ScottyWong— 15:59, 28 January 2022 (UTC)
- Scottywong, when should obsolete HTML tags be converted to modern syntax? Lint errors have been flagged by MediaWiki since 2018, so a small group of editors have already been fixing errors for over three years and there are still millions of errors. Given that we have fixed a lot of the easy errors, the remaining millions of errors will take multiple years to fix. – Jonesey95 (talk) 16:13, 28 January 2022 (UTC)
- It is properly tagging the edit as "bot" and "minor", so watchlist flooding can be alleviated by hiding bot edits. — xaosflux Talk 16:23, 28 January 2022 (UTC)
- I understand why the bot is making these edits, and how to hide them from my watchlist. However, if you're suggesting that WP:COSMETICBOT is no longer a valid part of bot policy, perhaps we should delete that section from WP:BOTPOL? Or can you explain how this bot is not making purely cosmetic edits to the wikitext of pages? —ScottyWong— 16:40, 28 January 2022 (UTC)
- I haven't gone through that part, was looking if there was any immediate tech issue that was causing flooding. — xaosflux Talk 16:47, 28 January 2022 (UTC)
- WP:COSMETICBOT explicitly mentions
[fixing] egregiously invalid HTML such as unclosed tags, even if it does not affect browsers' display or is fixed before output by RemexHtml (e.g. changing <sup>...</sub> to <sup>...</sup>)
as non-cosmetic. – SD0001 (talk) 16:47, 28 January 2022 (UTC)
- Scottywong, I quoted COSMETICBOT to you once today, but maybe you haven't seen that post yet. Here it is again:
Consensus for a bot to make any particular cosmetic change must be formalized in an approved request for approval.
That happened. The BRFA and the bot's edits are consistent with WP:COSMETICBOT. – Jonesey95 (talk) 16:49, 28 January 2022 (UTC)
- The BRFA was speedily approved in less than 3 hours, with no opportunity for community discussion. This discussion can act as a test for whether or not there is community consensus for this bot to operate in violation of WP:COSMETICBOT. The <sub></sup> example given above is substantive, because it would actually change the way the page is rendered. Changing <font> tags to <span> tags results in no change whatsoever, since every modern browser still understands and supports the <font> tag, despite it being deprecated. —ScottyWong— 16:54, 28 January 2022 (UTC)
- Well, unlike the hard work we did to clear out obsolete tags in Template space, we're not going to fix millions of font tags in talk space pages by hand, which leaves two options that I can see: an automated process, or leaving the font tags in place until it is confirmed that they will stop working. It sounds like what you want is an RFC at VPR or somewhere to ask if we should formally deprecate the use of font tags on the English Wikipedia. You might want to ask about other obsolete tags (<tt>...</tt>, <strike>...</strike>, and <center>...</center>) while you're at it. – Jonesey95 (talk) 17:02, 28 January 2022 (UTC)
- There's a difference between "from this point on, let's not use font tags anymore" and "let's go back to millions of dormant AfD pages (most of which will never be read or edited ever again, for the rest of eternity) and make millions of edits to change all of the font tags to span tags." Let's see how this discussion goes first, and then we can determine if a wider RFC is necessary. —ScottyWong— 17:15, 28 January 2022 (UTC)
- My bot edits are not in violation of WP:COSMETICBOT; Lint errors are exempt from the usual prohibition on cosmetic edits. See point 4: "fixed before output by RemexHtml" covers Lint errors. As for the speedy approval, the context for that is the prior BRFAs for MalnadachBot. They were to fix very specific types of Lint errors that were all done successfully after testing and discussion, fixing over 4.7 million Lint errors in the process. ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ (talk) 17:18, 28 January 2022 (UTC)
- Perhaps this is a pedantic point that misses the crux of what you're saying, but there are millions of errors, not millions of AfDs (per a Quarry there are only 484,194 AfD subpages, excluding daily logs). jp×g 20:39, 31 January 2022 (UTC)
- Regarding the speedy approval: the bot operator had 10 successful similar runs fixing these types of errors, so to say that there was "no opportunity for discussion" is a little silly - the first task was approved in May 2021, so in my mind that is 9 months worth of bot edits during which the task(s) could have been discussed. When a bot operator gets to a point where they have a bunch of similar tasks that are trialled and run with zero issues, I start speedy-approving them, not only because it saves the botop time, but it has been demonstrated that the type of tasks being performed by the bot are not an issue. Primefac (talk) 17:19, 28 January 2022 (UTC)
- To be clear, I'm not saying that the speedy approval was necessarily inappropriate. I was responding to Jonesey95's assertion that the BRFA represents explicit community consensus for this task. I'm also not suggesting that the bot is doing anything technically incorrect, or that it has any bugs. All I'm suggesting is that if fixing purely cosmetic wikitext syntax issues on ancient AfD pages doesn't qualify as WP:COSMETICBOT, then I'm not sure what would. If WP:COSMETICBOT no longer reflects the way that bots work on WP, then perhaps we should remove it. But until it's removed, I still believe that this type of task falls on the wrong side of bot policy, as currently written. —ScottyWong— 17:32, 28 January 2022 (UTC)
- I quoted the relevant portion of COSMETICBOT above. – Jonesey95 (talk) 17:48, 28 January 2022 (UTC)
- Speaking only for myself, I have a hard time finding a basis for revoking approval (or denying it). The bot is doing the legwork to future-proof our pages with deprecated syntax. That, to me, is a good thing. The bot op / bot page, however, could mention WP:HIDEBOTS as a way to reduce watchlist clutter for those annoyed by the task, but the task itself is IMO legit. Headbomb {t · c · p · b} 17:50, 28 January 2022 (UTC)
- I agree with Primefac's assessment here. A dozen BRFAs that are preventative measures to avoid pages falling into the state where we would find COSMETICBOT irrelevant, never mind the lines in COSMETICBOT that indicate that these changes can be executed today? Sounds reasonable to me. It also avoids (good faith) time spent elsewhere like at WP:VPT when we get questions about why someone can't read an archive. Izno (talk) 18:51, 28 January 2022 (UTC)
- It seems that I'm the lone voice on this one (apart from one other user that expressed concern on the bot owner's talk page), which is fine. If you wouldn't mind, I'd like to leave this discussion open for a while longer to give anyone else an opportunity to express an opinion. If, after a reasonable amount of time, there is clear consensus that making these cosmetic changes is a good thing for the project, I'm happy to respect that and crawl back into my hole. I paused the bot while this discussion was ongoing; I will unpause the bot now since it seems somewhat unlikely that approval for this task will be revoked, and allowing the bot to continue making these edits might draw more attention to this discussion. —ScottyWong— 18:56, 28 January 2022 (UTC)
- Just a quick note from my phone, re: COSMETICBOT: Something that would qualify as a cosmetic edit that would probably never gain approval would, for example, be aligning '=' signs in template calls (as you often see in infoboxes) or removing whitespace from the ends of lines. These things might clean up the wikitext, but they don't change the HTML output. AFAIK, that's what COSMETICBOT is good for. --rchard2scout (talk) 15:09, 30 January 2022 (UTC)
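For illustration, here is a minimal sketch (Python, with invented sample wikitext) of the kind of purely cosmetic change described above: the saved wikitext changes, but the parsed HTML output does not.

```python
import re

# Invented sample: trailing spaces after template parameters.
wikitext = "{{Infobox person\n| name   = Example  \n| birth_date = 1970\t\n}}"

# Strip whitespace from the ends of lines. Outside preformatted text,
# MediaWiki's parser ignores trailing whitespace, so this is cosmetic.
cleaned = re.sub(r"[ \t]+$", "", wikitext, flags=re.MULTILINE)

print(cleaned != wikitext)   # True: the stored wikitext did change...
# ...yet the page would render exactly the same, which is why
# COSMETICBOT bars bots from making such edits on their own.
```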
- I agree, COSMETICBOT should be applied with common sense; it's not a hard red line. The question is whether this bot is a good idea, enough to justify so many edits, and the answer is yes IMO. Furthermore, while the changes may technically be cosmetic today, presumably they won't be in the future, if/when some browsers stop supporting older syntax. I wish we had a way to make these types of edits hidden by default vs. opt-in. -- GreenC 15:24, 30 January 2022 (UTC)
... make these types of edits hidden by default...
- they already are, as the default email watchlist settings are to hide minor edits. Hell, if it's a minor bot edit, it will keep it off your watchlist even if you do want it to show up. Primefac (talk) 20:48, 30 January 2022 (UTC)
- Are you sure? At the top of my watchlist I have an array of checkboxes to hide:
- registered users
- unregistered users
- my edits
- bots
- minor edits
- page categorization (checked by default)
- Wikidata (checked by default)
- probably good edits
- I see all minor and bot edits to articles in my watchlist. Were it true that minor and bot edits are default hidden, Monkbot/task 18 might have run to completion.
- —Trappist the monk (talk) 21:04, 30 January 2022 (UTC)
- Apologies, should have specified I was referring to email notifications. I have updated my comment accordingly. Primefac (talk) 21:09, 30 January 2022 (UTC)
- I understand where everyone is coming from, and I don't intend to continue arguing about it when it's clear I'm in the minority, but perhaps I'm a little out of the loop. Here's my question: is there any real evidence that any major, modern browsers have plans to fully deprecate support for things like <font> tags and other HTML4 elements? Will there come a time that if a browser sees a font tag in the html source of a page, it literally will ignore that tag or won't know what to do with it? It seems like such an easy thing for a browser to continue supporting indefinitely with little to no impact on anything. I suppose I'm wondering if this bot is solving an actual problem, or if it's trying to solve a hypothetical problem that we think might exist at some point in the distant future. —ScottyWong— 21:28, 30 January 2022 (UTC)
- That is a great question to address to MediaWiki's developers, who have deliberately marked <font>...</font> and a handful of other tags as "obsolete" by inserting error counts into the "Page information" page for all pages containing those tags. In the software world, marking a specific usage as obsolete or deprecated is typically the first step toward removal of support, and the MW developers have removed support for many long-supported features over the years. The MediaWiki developers may have similar plans for obsolete tags, or they may have other plans. – Jonesey95 (talk) 23:30, 30 January 2022 (UTC)
- There are some good notes at Parsing/Notes/HTML5 Compliance. There's also some discussion buried in gerrit:334990 which mostly represents my current thoughts (though I certainly don't speak for the Parsing Team anymore), which is that it is probably unlikely browsers will stop supporting <font>, <big>, etc. in the near future. If they do, we could implement those tags ourselves to prevent breakage either at the wikitext->HTML layer or just in CSS. I don't think there are any plans or even considerations to remove these deprecated tags from wikitext before browsers start dropping them. I said this in basically the same discussion on Meta-Wiki in 2020.
- That said, I would much rather see bots that can properly parse font/etc. tags into their correct span/etc. replacements so it can all be done in one pass instead of creating regexes for every signature. Legoktm (talk) 18:49, 4 February 2022 (UTC)
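To make the "properly parse" approach concrete, here is a minimal, hypothetical sketch in Python (not MalnadachBot's actual code; the function names and the nesting loop are invented for illustration). It assumes reasonably well-formed tags and maps only the three common attributes, which is exactly where real signatures get troublesome.

```python
import re

# Standard CSS equivalents for the HTML <font size="1..7"> values.
SIZE_MAP = {"1": "x-small", "2": "small", "3": "medium", "4": "large",
            "5": "x-large", "6": "xx-large", "7": "xxx-large"}

def font_to_span(match):
    """Rewrite one innermost <font> element as an equivalent <span>."""
    attrs, inner = match.group(1), match.group(2)
    styles = []
    for name, quoted, bare in re.findall(r'(\w+)\s*=\s*(?:"([^"]*)"|(\S+))', attrs):
        value = quoted or bare
        if name.lower() == "color":
            styles.append("color:" + value)
        elif name.lower() == "face":
            styles.append("font-family:" + value)
        elif name.lower() == "size" and value in SIZE_MAP:
            styles.append("font-size:" + SIZE_MAP[value])
    if not styles:
        return inner  # a <font> with no recognized attributes does nothing
    return '<span style="%s">%s</span>' % (";".join(styles), inner)

# Matches only innermost pairs (no nested <font> inside the body).
FONT_RE = re.compile(r'<font\b([^>]*)>((?:(?!</?font\b).)*)</font>',
                     re.IGNORECASE | re.DOTALL)

def fix_font_tags(wikitext):
    prev = None
    while prev != wikitext:  # loop so nested tags resolve inside-out
        prev = wikitext
        wikitext = FONT_RE.sub(font_to_span, wikitext)
    return wikitext

print(fix_font_tags('<font color="green"><strong>Example</strong></font>'))
# -> <span style="color:green"><strong>Example</strong></span>
```

Even a parser like this breaks down on unbalanced or misnested tags, which is part of why the bot operator below prefers exact-match replacements.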
- This is not a hypothetical problem; we know for sure that at least one obsolete HTML tag marked by Linter does not work in mobile. <tt>...</tt> renders as plain text in Mobile Wikipedia. Compare this in mobile and desktop view. Based on this we can reasonably conclude that <font>...</font> will stop working at some point as well. Besides, not everything that is obsolete in HTML5 is counted as obsolete by Linter. For example, tags like <big>...</big> and table attributes like align, valign and bgcolor are not marked by Linter even though they too are obsolete in HTML5 like font tags. So it seems the developers have plans to continue support for these, but not for font tags. ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ (talk) 06:00, 31 January 2022 (UTC)
- Although for the record, <tt>...</tt> does not work in mobile because of specific CSS overriding it in the mobile skin (dating back to 2012). Whether any common mobile browsers didn't or don't support it is unclear. Anomie⚔ 13:57, 31 January 2022 (UTC)
- So, it still sounds like we're fixing hypothetical problems that we think will eventually become real problems. I agree that it'll probably eventually become a real problem one way or the other, but honestly I still don't see the point of correcting problems in someone's signature on a 12 year-old AfD. —ScottyWong— 07:17, 1 February 2022 (UTC)
- As much as I don't like my watchlist full of bot edits, it's not a COSMETICBOT violation as long as the changes clear a maintenance error. What I would say is a problem is that the bot doesn't actually fix all the issues at once. For example, this edit is fine, but what about these font tags? Is the bot really going to come back and edit the page again? Or even the same task, like this edit, which is fine, except there are two other font tag uses. Is the bot really replacing each signature one at a time? — HELLKNOWZ ∣ TALK 11:12, 31 January 2022 (UTC)
- This is, to be honest, one of the reasons why I gave a slightly-more-blanket approval for the task; instead of "here are another three signatures" I was hoping the botop would find a wide range of similar linter errors that would likely be on the same page(s), and hit them all at once. As the run progressed, and new errors were found, they could just be added to the run logic without the need for a subsequent BRFA. If this is not the case, then it sure should be. Primefac (talk) 11:34, 31 January 2022 (UTC)
- I have just been adding specific patterns to the replacement list as and when I find them. I don't want to use general-purpose regexes to do replacements since this is a fully automated task. They work fine most of the time, but edge cases are troublesome. My experience Linting Wikipedia has shown that people are... creative in using all sorts of things that would cause problems for general-purpose regexes. Considering the size of this task, even with 99.9% accuracy it would still leave thousands of false positives (0.1% of 16.6 million errors is over 16,000 bad edits). This is the kind of bot task where, when things go smoothly, most people wouldn't care, but if there are a few errors, lots of people will come to your talk with complaints. When the number of Lint errors is down to less than a hundred thousand instead of the 16.6 million today, then it would be possible to do a supervised run and try to clear out all errors in a page with a single edit. My current approach of using only specific replacements may not fix all errors in the page at a time, but it does the job by keeping false positives as close to zero as possible. This to me is the most important thing. That said, I will increase the number of find-and-replace patterns the bot considers at a time so that more can be replaced if they are present in a page. The bot will take more time to process a page and will have to use a generic edit summary, but that's a good tradeoff I guess. ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ (talk) 18:07, 31 January 2022 (UTC)
- That's not really a good reason to not consolidate all of these changes into one edit. If you have code that can correct <font> tags and you have another piece of code that can correct <tt> tags, then all you have to do is grab the wikitext, run it through the font tag code, then take the resulting corrected wikitext and run it through the tt tag code, then take the resulting wikitext and run it through any other blocks of delinting code that you have, and then finally write the changes to the article. Instead, you're grabbing the wikitext, running it through the font tag code, and then saving that edit. Then, sometime later, you're grabbing that wikitext again, running it through the tt tag code, and saving that edit. There's really no difference, except for the number of edits you're making. —ScottyWong— 07:14, 1 February 2022 (UTC)
- To clarify, the code I have to correct font tags (i.e general purpose regexes to correct font tags) and some other Lint errors works fine most of the time, but gives some false positives which makes it not suitable for use in a fully automated task like this. You can read the first BRFA and this discussion for why I do not use such code with my bot. Usually in a situation like this, we would run it as semi-automated task and approve every edit before saving so that false positives can be discarded. But that is not possible here due to the huge number of pages involved. So I am left to work with a set of definite replacements, like specific user signatures and substituted templates, that are checked in a page before saving an edit. I have increased the number of replacements it will check to try and get more in an edit. This would be an example of when more than one of the replacements checked by the bot were present in a page and fixed in the same edit. ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ (talk) 15:58, 1 February 2022 (UTC)
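As a rough sketch of the exact-match batch approach described above (the replacement strings here are invented stand-ins, not entries from the bot's real list), assuming plain literal matching in Python:

```python
# Invented examples of pre-verified literal replacements; the real list
# is a much larger curated table built from specific signatures.
REPLACEMENTS = [
    ('<font color="green"><b>ExampleUser</b></font>',
     '<span style="color:green;"><b>ExampleUser</b></span>'),
    ('<font face="Georgia">Example sig</font>',
     '<span style="font-family:Georgia;">Example sig</span>'),
]

def apply_batch(wikitext):
    """Check every known replacement and apply all hits in one pass."""
    hits = 0
    for old, new in REPLACEMENTS:
        if old in wikitext:
            hits += wikitext.count(old)
            wikitext = wikitext.replace(old, new)
    return wikitext, hits

page = 'Delete. <font color="green"><b>ExampleUser</b></font> 01:23'
fixed_page, n = apply_batch(page)
if n:  # save a single edit only when something actually matched
    print("one edit, %d lint error(s) fixed" % n)
```

Literal matching like this cannot misfire the way a general regex can; the tradeoff is that a page gets revisited whenever new strings are added to the table later, which is the multiple-edits behaviour complained about above.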
it's not a COSMETICBOT violation as long as the changes clear a maintenance error
I disagree. The basis for COSMETICBOT in the first place is that cosmetic edits "clutter page histories, watchlists, and/or the recent changes feed with edits that are not worth the time spent reviewing them", so they should not be performed unless there is an overriding reason to do so. That is not in the text of the policy, but clearly a balance has to be struck between usefulness and spamminess; the present case has fairly low usefulness (clearing a maintenance error is hardly high-priority).
- Basically I agree with Scottywong. I have no strong feelings for or against COSMETICBOT, but if that bot task is deemed to be compliant, it means the policy is toothless, so we might as well remove it. (It might be argued that the bot task is against COSMETICBOT but should still be allowed as an explicit exception, but I do not see anyone making that argument.) TigraanClick here for my talk page ("private" contact) 13:08, 3 February 2022 (UTC)
- "unless there is an overriding reason to do so" -- yes, and one of the common examples is right below in the policy text: "egregiously invalid HTML [..]". I mean, I agree that the policy sets no limit on spamminess, but that's a separate matter. — HELLKNOWZ ∣ TALK 15:16, 3 February 2022 (UTC)
- The policy says "egregiously invalid HTML such as unclosed tags". Unclosed tags are a much more serious issue than deprecated tags, let alone deprecated tags that are still supported. "Egregiously invalid HTML" does not include only unclosed tags, but IMO it does not include font tags. At the very least, that is the sort of thing you would expect some discussion about at BRFA - if we are serious about enforcing the policy as written. TigraanClick here for my talk page ("private" contact) 13:13, 4 February 2022 (UTC)
- Again I ask: When should we start fixing these deprecated tags, if not now? There are millions of instances of deprecated tags. Based on our historical pace of fixing Linter errors, it will take multiple years to fix all of them, especially since font tags in signatures are still, inexplicably, allowed by the MediaWiki software. – Jonesey95 (talk) 14:30, 4 February 2022 (UTC)
- There is a saying that prevention is better than cure. Software updates are a natural part of all websites. That <font>...</font>, <center>...</center> and other obsolete tags are still supported doesn't mean they will continue to be so in future, hence why developers have marked them as errors, giving us time to replace them. Imagine logging in one day and seeing pages out of alignment and colors not being displayed, among other things. It had already happened once in July 2018. Will they not be egregiously invalid after that happens? These edits will benefit editors and readers by making sure that pages continue to display as editors intended when software changes. This is basically why COSMETICBOT allows fixing HTML "even if it does not affect browsers' display or is fixed before output by RemexHtml". My bot has already replaced about 2.5 million font tags while running the previous 10 BRFA Lint fixing tasks, hardly something there was no discussion about. ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ (talk) 14:36, 4 February 2022 (UTC)
- (edit conflict) I consider tags that will stop working on par with unclosed tag pairs. Invalid markup is invalid. Be it for visual display, parsing, screen readers, printing, accessibility, forwards-compatibility, etc. — HELLKNOWZ ∣ TALK 14:40, 4 February 2022 (UTC)
- @Hellknowz: But these deprecated HTML4 tags are still supported, and no plans have been announced by anyone to stop supporting them at a specific date. Sure, eventually they will be unsupported, maybe next year, maybe coinciding with the heat death of the universe, or maybe sometime in between. At this point, we're reacting to something that we think might happen in the future, but we don't know when, and we don't know the specifics of how that deprecation will be handled. Maybe we'll be given 5 years notice of the deprecation. Maybe, by the time HTML4 tags are fully unsupported, we'll already be using HTML6 tags, and we'll have to go through this whole process a second time when we could have just done it once. The point is: we're acting reflexively but we don't know anything yet. And our response is to edit millions of decades-old closed AfDs to mess with someone's signature. I'm amazed that so many people are pushing back on this one. I mean, I can understand going through article space to fix these issues. But 15 year-old AfDs? Really? How is that worth anyone's time? What is the worst case scenario if someone happens to open a 15 year-old AfD and someone's signature is displaying in the default font instead of the custom font that the editor originally intended? —ScottyWong— 17:31, 4 February 2022 (UTC)
- You say "worst case" but you describe one of the best cases. Worst case is the whole page throws a 5xx server error and doesn't render anything. I'm not saying it will happen, but I am saying we are trying to guess the future instead of fixing something that has been marked as deprecated. The original concern was that COSMETICBOT doesn't apply (or doesn't explicitly mention this use case). I only argue that it does follow policy even if people don't like it. But whether we actually want to fix this or leave it until it (may be) breaks is a different issue that wasn't opposed until watchlists lit up. This noticeboard can review approval on policy grounds, which I don't find a problem with. As I said, I don't like watchlist spam either and not doing all the replacements at the same time is pretty terrible. And this is likely not the first nor the last time something will need fixing. This sounds like a watchlist (search, sort, filter) functionality problem. — HELLKNOWZ ∣ TALK 19:10, 4 February 2022 (UTC)
- If some wikitext is causing 5xx errors, that's absolutely a developer/sysadmin problem, not one for on-wiki editors or bots to fix. Legoktm (talk) 06:42, 5 February 2022 (UTC)
- You say "worst case" but you describe one of the best cases. Worst case is the whole page throws a 5xx server error and doesn't render anything. I'm not saying it will happen, but I am saying we are trying to guess the future instead of fixing something that has been marked as deprecated. The original concern was that COSMETICBOT doesn't apply (or doesn't explicitly mention this use case). I only argue that it does follow policy even if people don't like it. But whether we actually want to fix this or leave it until it (may be) breaks is a different issue that wasn't opposed until watchlists lit up. This noticeboard can review approval on policy grounds, which I don't find a problem with. As I said, I don't like watchlist spam either and not doing all the replacements at the same time is pretty terrible. And this is likely not the first nor the last time something will need fixing. This sounds like a watchlist (search, sort, filter) functionality problem. — HELLKNOWZ ∣ TALK 19:10, 4 February 2022 (UTC)
- @Hellknowz: But these deprecated HTML4 tags are still supported, and no plans have been announced by anyone to stop supporting them at a specific date. Sure, eventually they will be unsupported, maybe next year, maybe coinciding with the heat death of the universe, or maybe sometime in between. At this point, we're reacting to something that we think might happen in the future, but we don't know when, and we don't know the specifics of how that deprecation will be handled. Maybe we'll be given 5 years notice of the deprecation. Maybe, by the time HTML4 tags are fully unsupported, we'll already be using HTML6 tags, and we'll have to go through this whole process a second time when we could have just done it once. The point is: we're acting reflexively but we don't know anything yet. And our response is to edit millions of decades-old closed AfDs to mess with someone's signature. I'm amazed that so many people are pushing back on this one. I mean, I can understand going through article space to fix these issues. But 15 year-old AfDs? Really? How is that worth anyone's time? What is the worst case scenario if someone happens to open a 15 year-old AfD and someone's signature is displaying in the default font instead of the custom font that the editor originally intended? —ScottyWong— 17:31, 4 February 2022 (UTC)
- The policy says
- "unless there is an overriding reason to do so" -- yes, and one of the common examples is right below in the policy text: "egregiously invalid HTML [..]". I mean, I agree that the policy sets no limit on spamminess, but that's a separate matter. — HELLKNOWZ ∣ TALK 15:16, 3 February 2022 (UTC)
- Some people have minor/bot edits showing on their watchlists for good reason. E.g., to monitor minor or bot edits to active pages. What they don't have them turned on for is to see a bot going back through a decade's worth of archived/closed AfDs making trivial corrections to errors that barely deserve the name. Congratulations, you just pinged a load of deleted articles (quite a few contentious) back to the top of the watchlists of editors. Well done. Countdown to recreation in 5, 4, 3.... Only in death does duty end (talk) 15:14, 31 January 2022 (UTC)
- I'm coming here with a related problem. I was trying to search for something in the ANI archives, sorted by date. But that's impossible, because the date searched for is the last modified one, which is distorted because the bot has fixed minor errors long after the archive was created. For example, Wikipedia:Administrators' noticeboard/IncidentArchive969 contains two comments saying "please do not edit the archives", and yet this bot did it anyway. I don't really care what problem the bots were trying to solve, they have broken Wikipedia's search mechanism, making it unusable. Ritchie333 (talk) (cont) 12:16, 3 February 2022 (UTC)
- Honestly Wikipedia search is doomed. @Ritchie333: The problem you mention (searching archives where 'last modified' is updated by bot edits) I've gotten around by filtering by creation date, which should roughly correspond to the real date of entries in the archive. ProcrastinatingReader (talk) 15:21, 3 February 2022 (UTC)
- The problem Ritchie333 describes has existed forever, and it happens whether bots clean up the archive page, humans do manual tidying, or Xfd-processing editors do it. Pages need to be edited when MW code changes; that's just reality. It's the search dating that is broken. – Jonesey95 (talk) 15:32, 3 February 2022 (UTC)
- That doesn't have to be reality if we don't make it reality. We could choose to have our reality be one where archived discussion pages (like AN, ANI, XfD) are never edited, by bots or anyone else. And if the consequence is that, 20 years from now, an editor's fancy signature shows up in the default font instead of the custom font that the editor originally intended, well... we'll just have to find a way to emotionally deal with that problem. Coming from someone who uses custom fonts in their signature, I'm confident that I can find a way to work through that problem. It might take some extra therapy sessions, but I think I can do it. —ScottyWong— 17:36, 4 February 2022 (UTC)
- For what it's worth -- I think that this type of discussion often ends up with a pessimistic bent because people who don't see a problem don't care enough to comment about it -- I don't see a problem. Okay, maybe it breaks search: this seems like a potential real problem, but the deeper problem is that search sucks if it gets broken by this. I don't see why you would keep a page on your watchlist for ten years, besides the fact that ten years ago there wasn't a temporary-watchlist feature. It's not like there is any benefit to watchlisting an AfD that expired ten years ago -- unless there is? Is there? jp×g 20:12, 8 February 2022 (UTC)
- @JPxG Slightly off topic, but I do occasionally want to go back to old AFD discussions for one reason or another, and viewing the raw watchlist is a great way to find them. ~ ONUnicorn(Talk|Contribs)problem solving 19:15, 30 March 2022 (UTC)
Why not resolve this in MediaWiki?
I seem to recall seeing in some prior discussions (on WP:VPT IIRC, though I could not find the discussion so apologies for not linking it) that MediaWiki was going to have an update at some point that would basically take the wikitext (bad HTML and all) and correct it for output. It seems like clogging up edit histories with tens of thousands (or probably millions when it's all said and done) of revisions to "correct" markup that can be massaged/fixed in the rendering pipeline is a massive waste of time and resources. —Locke Cole • t • c 03:27, 7 February 2022 (UTC)
- As opposed to clogging up the rendering pipeline by requiring translation, which would be a massive waste of time and resources for every person viewing every revision everywhere? :) Revisions in a database are fundamentally cheap, anyway.
- There was a brief time where one particular tag was translated, but it was quickly undone since it was used in places that were not compatible with the translation, among other reasons. Izno (talk) 04:40, 7 February 2022 (UTC)
- Considering rendering is only an issue for pages that are edited regularly, and most of the pages with errors seem to be old/stale discussion pages, I'm not convinced mass bot edits are somehow better. —Locke Cole • t • c 05:34, 7 February 2022 (UTC)
- No, the server needs to re-render every single viewed revision. The HTML for the revision of your comment is not stored as HTML, it is stored as wikitext. Izno (talk) 05:46, 7 February 2022 (UTC)
- Locke Cole, the opposite of what you suggest is what actually happened. The code that renders pages had been silently correcting syntax errors for years, and when a new renderer was deployed some of those workarounds were not carried over. Hence Special:Linterrors, which flags errors that could cause rendering errors (most of which have been fixed by gnomes and bots since 2018) as well as conditions that will presumably have their workarounds removed at some point. For a deep dive, see mw:Parsing/Replacing Tidy. – Jonesey95 (talk) 04:52, 7 February 2022 (UTC)
- @Jonesey95: Thank you for the pointer. What drew my attention to this was this edit, which replaced <tt> with <code> tags, which I thought was an odd thing to be done on a mass scale (semi-automated or not). —Locke Cole • t • c 05:45, 7 February 2022 (UTC)
The funny thing is that this doesn't even need to be fixed in MediaWiki, because no major browser in the world has trouble with HTML4 tags. If this bot didn't fix it, and MediaWiki software didn't fix it, then your browser would fix it. If anything, these issues should be fixed in article space only. I've still heard no legitimate reason why it's considered valuable to fix font tags on 12 year-old closed AfDs. —ScottyWong— 05:54, 7 February 2022 (UTC)
- True, in the case of <tt>, which rabbit-holed me to this discussion, MDN still shows the tag as being fully supported in every major browser on the market. Being deprecated in the spec doesn't mean we should chase down issues that don't exist. —Locke Cole • t • c 06:11, 7 February 2022 (UTC)
- "I've still heard no legitimate reason" is basically a fallacy, because I can say the same thing and have it be just as true.
- As for browsers, that's true today. For the same reason MediaWiki developers can shut off an arbitrary tag, so too could the browsers. And they have done it before, which cannot be said of the developers.
- Never mind that mobile today already applies a CSS reset to a bunch of the old tags—<tt> and <small> off the cuff, both of which render as normal text. Izno (talk) 06:15, 7 February 2022 (UTC)
- Those are all great reasons to fix these deprecated tags within article space. However, what is the value in making millions of edits to fix these deprecated tags on old AfD pages that closed over a decade ago? If anyone can provide one good reason why that would be valuable, I'll gladly shut up. I don't think it's fallacious to ask "why?" and hope for a cogent answer. —ScottyWong— 17:12, 7 February 2022 (UTC)
- I have, on multiple occasions, needed to check an old AFD log page (generally when someone improperly transcludes a TFD deletion template), and when there are linter errors on the page it takes ages to sort out where they're coming from in order for me to a) fix them, and b) find what I was originally looking for. To me, that is reason enough to fix old things.
- On the "watchlist spam" topic, I go through about once a year and clear out about half my watchlist of things that I will probably never need to see again, many of which are deletion discussion pages. Primefac (talk) 17:21, 7 February 2022 (UTC)
Request that Malnadachbot be limited to one edit per page
The fixing of lint errors continues ad infinitum, hopefully all of those AfDs from 12 years ago will display nicely for the hordes of editors that are reading them. Anyway, in all seriousness, it has come to my attention that MalnadachBot is making multiple edits per page to fix lint errors. My RfA page keeps popping up in my watchlist periodically. The history of that page shows that Malnadachbot has now made 8 separate edits to the page to fix lint errors. Five edits were made under Task 2, and three edits were made under Task 12. All of the edits are extremely similar to each other, and there is no reason that they couldn't be made in a single edit, if the bot owner had any idea how to properly use regex and AWB. How many more edits will he need to make to this 10 year-old archived page before it is free of lint errors? I honestly feel like this is evidence that User:ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ is not a competent enough bot editor to carry out this task properly. At the very least, if there isn't support to remove his bot access, I'd like to request that his bot task approvals make it extremely clear that he must not make more than one edit per page to fix these lint errors. —ScottyWong— 16:31, 2 March 2022 (UTC)
- Note the comment by @ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ:
To clarify, the code I have to correct font tags (i.e general purpose regexes to correct font tags) and some other Lint errors works fine most of the time, but gives some false positives which makes it not suitable for use in a fully automated task like this. You can read the first BRFA and this discussion for why I do not use such code with my bot.
― Qwerfjkltalk 16:40, 2 March 2022 (UTC)
- I don't see this as a problem; these are handled in waves and still seem to be reasonable edits. This bot appears to be properly asserting the "bot" flag on the edits, so anyone that doesn't want to see bot edits on their watchlist is able to easily filter them out. As these appear to require manual definitions for signature fixes, it isn't reasonable to expect that every case would be pre-identified. Now, if these were happening very rapidly, like 8 edits in a day, perhaps we should address it better, but when that is spread out over many months per page I don't. — xaosflux Talk 16:44, 2 March 2022 (UTC)
- Note: this isn't an endorsement or condemnation on the suitability of the task(s) in general (which is being discussed above) - just that I don't see this specific constraint as a good solution. — xaosflux Talk 16:46, 2 March 2022 (UTC)
- I don't see any reason why it wouldn't be possible to fix these issues in an automated way without requiring multiple edits or manual intervention. Anyone with a decent understanding of regular expressions can do this. This is not a complicated problem to solve for a competent coder. If this bot operator claims that he is not capable of fixing all the errors on a page in a single edit, or that his code is so inefficient that it produces "false positives" and requires him to manually supervise every edit, then I think we should find a different bot operator. I'll volunteer to take over the tasks if no one else wants to. FYI - the bot operator himself (not the bot) has now manually edited my old RfA page and has claimed to fix all of the lint errors on the page. —ScottyWong— 17:00, 2 March 2022 (UTC)
- @Xaosflux: It would be one thing if this bot was going through pages "in waves" and fixing different issues each time. That's not what's happening here. The bot is going to the same page to make multiple edits to fix different instances of the same issue. This is unnecessarily filling up watchlists, clogging up edit histories, and changing the last modified date of old archived pages, among other problems. If a page has 10 instances of the same exact lint error, there is no technical reason (besides poor coding) that it should take more than one edit to fix all 10 instances. I realize I'm probably being annoying by continuing to complain about this bot operator and the tasks he's carrying out, but it really is supremely annoying to me. —ScottyWong— 19:57, 2 March 2022 (UTC)
- The bot waves appear to be along the lines of "fix this batch of signatures" not "fix all instances of lint error:n" or even harder "fix all lint errors of all known types". I understand you don't like this, but I know the signature fixes can be a pain to deal with and multiple waves are often the best way to tackle them on an ad-hoc type basis. As far as the problems you identified, clogging watchlists is the most likely to get quick admin action - but as bot flags are being asserted it seems fairly trivial. I don't see any serious problems with last touch dates or a few extra revisions on any one page being a significant problem. Building a better bot is almost always universally welcomed, but stopping improvement while waiting for such to materialize usually isn't. — xaosflux Talk 22:58, 2 March 2022 (UTC)
- Exactly. We should not let perfect be the enemy of good. After all, the wiki model works by the principle of incremental improvement. Before I submitted my first BRFA, I looked at all previous en.wp Lint fixing bot task attempts to see what is workable. The successful tasks involved a batch of specific patterns per BRFA; failed tasks involved bot operators trying to fix everything and giving up after realising the scale of the problem. Running the bot in multiple waves using a divide-and-conquer approach is the only realistic way to reduce the backlog. ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ (talk) 05:28, 3 March 2022 (UTC)
- @Scottywong: I have manually fixed all 95 or so Lint errors in your RFA. Took me 15 minutes with script-assisted editing. Do you really expect a bot to get them all in a single edit? It really isn't true that "there is no reason that they couldn't be made in a single edit, if the bot owner had any idea how to properly use regex and AWB". I have more experience with bot fixing of Lint errors than anyone else; I am running the bot as efficiently as it is possible to do without leaving behind false positives. Even so, let me point out that the bot has fixed 1.4 million Lint errors since the time this thread was opened. RFA pages by their nature have a lot more signatures than most wiki pages, so the bot revisits them more than usual. This task is far more complicated than it looks. ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ (talk) 17:06, 2 March 2022 (UTC)
- What you're saying makes no sense. I understand regex quite well, and I used to operate a bot on Wikipedia that fixed similar issues (it probably also fixed millions of individual issues, but it didn't need to make millions of edits to do so because it was properly coded). This is not more complicated than it looks; in fact, it's not complicated at all: it's simple regex find-and-replace. There is no technical reason why a properly coded bot cannot fix all of these lint issues in a page with a single edit, without human intervention or supervision. If your regex was properly designed, the risk of false positives would be extremely low. In the case of my RfA page, your bot made multiple edits to fix different instances of the exact same problem. In this edit, you fix six different instances of <font color="something">Text</font>. Then, a few weeks later, you make this edit to fix 14 more of the exact same issue. Why did your code miss these issues on the first pass (or, more specifically, on the first 8 passes)? —ScottyWong— 18:31, 2 March 2022 (UTC)
- It was not fixing different instances of the exact same problem in your RFA page. Edits from task 2 were to fix Special:LintErrors/tidy-font-bug. Unlike the more numerous font tags inside a link, errors of this type already make a visual difference in the page. If you see all edits done in this task, the ones in lines 109 and 134 would be difficult for a bot to apply correctly in a single edit with others in the same task, if it was targeting a general pattern. No matter how well designed any regexes are, they cannot catch all of these in a single edit. For RFAs and other pages with a lot of signatures, we can only reduce the number of bot edits by using a larger batch of replacements, which I am already doing. You should spend some time fixing Lint errors to get an understanding of the problem; you wouldn't be casually dismissing this as an easy task then. Please submit your own BRFA if you think it is so simple to fix all errors in a single edit. ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ (talk) 05:01, 3 March 2022 (UTC)
- Scottywong: Wikipedia:Village pump (technical)/Archive 70 is a sample page with approximately 180 Linter errors, nearly all of them font-tag-related. I encourage you to try to create a set of false-positive-free regular expressions that can fix every Linter font tag error on that page, and other VPT archive pages, with a single edit of each page. If you can do so, or even if you can reduce the count by 90% with a single edit, you will be a hero, and you may be able to help Wikipedia get out from under the burden of millions of Linter errors much more quickly. Here's a sample of what you'll be looking at:
<font face="Century Gothic">[[User:Equazcion|<span style="color:#000080">'''Equazcion'''</span>]] <small>[[User talk:Equazcion|'''<sup>(<span style="color:#007BA7">talk</span>)</sup>''']]</small> 02:29, 16 Jan 2010 (UTC)</font> [[User:IP69.226.103.13|<font color="green"><strong>IP69.226.103.13</strong></font>]] | [[User talk:IP69.226.103.13|<font color="green"><strong>Talk about me.</strong></font>]] [[User:Terrillja|<font color="003300">Terrillja</font>]][[User Talk:Terrillja|<font color="black"><sub> talk</sub></font>]] [[User:December21st2012Freak|<font color="#922724">'''December21st2012Freak'''</font>]] <sup>[[user talk:December21st2012Freak|<font color="#008080">''Talk to me''</font>]]</sup> <font face="monospace" color="#004080">[[User:Flowerpotman|<span style="color:#004080; font-variant:small-caps">FlowerpotmaN</span>]]·([[User talk:Flowerpotman|t]])</font> <font face="Myriad Web">'''[[User:Mrschimpf|<span style="color:maroon">Nate</span>]]''' <span style="color:dark blue">•</span> <small>''([[User_talk:Mrschimpf|<span style="color:dodgerblue">chatter</span>]])''</small></font> <font face="Baskerville Old Face">[[User:the_ed17|<font color="800000">Ed</font>]] [[User talk:the_ed17|<font color="800000">(talk</font>]] • [[WP:OMT|<font color="800000">majestic titan)</font>]]</font> <font style="font-family: Vivaldi">[[User:Intelligentsium|<span style="color:#013220">Intelligent</span>]]'''[[User_talk:Intelligentsium|<span style="color:Black">sium</span>]]'''</font> <font color="blue"><sub>'''[[User_talk:Noetica |⊥]]'''</sub><sup>¡ɐɔıʇǝo</sup><big>N</big><small>oetica!</small></font><sup>[[User_talk:Noetica |T]]</sup> [[User:Ruhrfisch|Ruhrfisch]] '''[[User talk:Ruhrfisch|<sub><font color="green">><></font></sub><small>°</small><sup><small>°</small></sup>]]''' <font color="#A20846">╟─[[User:TreasuryTag|Treasury]][[User talk:TreasuryTag|Tag]]►[[Special:Contributions/TreasuryTag|<span style="cursor:help;">directorate</span>]]─╢</font> [[User:Screwball23|<font color="0000EE">Sc</font><font color="4169E1">r</font><font color="00B2EE">ew</font><font color="FF6600">ba</font><font color="FFFF00">ll</font><font color="9400D3">23</font>]] [[User talk:Screwball23|talk]] <font color="32CD32">''[[User:Jéské Couriano|Jeremy]]''</font> <font color="4682B4"><sup>([[User talk:Jéské Couriano|v^_^v]] [[Special:Contributions/Jéské Couriano|Boribori!]])</sup></font> [[User:Masem|M<font size="-3">ASEM</font>]] ([[User Talk:Masem|t]]) <span style="border:2px solid black;background:black;-webkit-border-radius:16px;-moz-border-radius:16px;color:white;width:20px;height:20px">([[user talk:Flyingidiot|<font color="white">ƒ''î''</font>]])</span><span style="position:relative;top:12px;left:-20px;">[[user:flyingidiot|<font color="black">»</font>]]</span> '''[[User:Floydian|<font color="#5A5AC5">ʄɭoʏɗiaɲ</font>]]''' <sup>[[User_talk:Floydian|<font color="#3AAA3A">τ</font>]]</sup> <sub>[[Special:Contributions/Floydian|<font color="#3AAA3A">¢</font>]]</sub>
- I omitted some easy ones. Note that the page should look the same (or better) when you are done compared to how it looks now. There are more types of Linter errors on that page, but if you can fix the font tags in signatures all at once in an automated or semi-automated fashion, that would be outstanding. I picked the above page at semi-random, knowing that VPT pages tend to have large numbers of interesting signatures; there are thousands of discussion pages with this level of complexity. – Jonesey95 (talk) 05:56, 3 March 2022 (UTC)
- I have to, I absolutely have to, point to my solution at WP:SIGRANT just because my signature will never become an HTML problem. Now I'm happy and will walk away silently. ~ ToBeFree (talk) 21:48, 31 March 2022 (UTC)
- Most days, most of the edits on my watchlist are made by this bot. I'd love it if its editing rate could be throttled back a bit, guys.—S Marshall T/C 08:51, 1 April 2022 (UTC)
- It's not due to the edit rate per se, but due to the sheer number of pages involved. If I were to run this to the fullest possible extent of clearing most of the remaining 13.4 million Lint errors, this would be one of the largest, if not the largest, bot tasks in en.wp's history. Older pages are more likely to be edited than newer pages. The more pages you have in your watchlist, the more bot edits you will see. This can be easily avoided by hiding bot edits from your watchlist. If you don't want to hide all bot edits, you can specifically hide MalnadachBot's edits by following WP:HIDEBOTS. ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ (talk) 09:40, 1 April 2022 (UTC)
- No, I'd like to check the bot edits as well, including this bot. What I'd prefer is if the bot could make fewer edits per day please.—S Marshall T/C 10:47, 1 April 2022 (UTC)
- It seems like cutting down on the edit rate would be an acceptable way to move forward. Enterprisey (talk!) 19:52, 1 April 2022 (UTC)
- I agree. It's obviously problematic if everybody has to opt-out of the bot's edits showing up on people's watchlists, rather than opt-in. WaltCip-(talk) 20:10, 1 April 2022 (UTC)
- I think bot edits are hidden by default. ― Qwerfjkltalk 20:34, 1 April 2022 (UTC)
- @S Marshall and @Scottywong, I'm assuming you have that setting (to hide bot edits) turned off to keep better track of the articles or something? Enterprisey (talk!) 01:35, 2 April 2022 (UTC)
- I've been able to see bot edits since ages ago. If it's on by default then I presume I turned something off early in my Wikipedia editing. I wish to continue reviewing bot edits because I think it's perfectly possible that a vandal could get access to a bot account.—S Marshall T/C 08:44, 2 April 2022 (UTC)
- My thinking has been that bot edits populating watchlists would be the main reason why people would complain about this task. Since most of the edits are to pages that haven't been edited in a while and very few of the edits are in mainspace, the chance of MalnadachBot's edits overlapping potential recent vandal edits in watchlists is very low. So I figured it is better to get this over with faster rather than continue showing up in watchlists over a longer period of time. If a slower speed spread over a longer time is desired, I will certainly do that. ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ (talk) 17:20, 2 April 2022 (UTC)
- While that's reasonable, I also think it's reasonable for bots to continue normal operations. If people manually show bot edits on watchlists, they have to accept their watchlists will become far more spammy and be filled with routine bot-like edits as well. I don't really think we can require the throttling of an otherwise acceptable task for this reason. ProcrastinatingReader (talk) 17:53, 2 April 2022 (UTC)
- Yup, that makes sense. Enterprisey (talk!) 19:34, 2 April 2022 (UTC)
- Could we please just reduce the editing rate of this bot? Thanks.—S Marshall T/C 00:23, 3 April 2022 (UTC)
- How about just having pipelines in the code, in which each stage of your edits is done within the code before submitting the final edit (see the sketch below)? You can still use multiple expressions, with each expression operating on the changed text, before submitting it all as one edit.
Popular web browsers may drop support for tt tags at some point in future
will not happen, and even if it does we can use older browsers to access the content (the pages in question are 10+ years old anyway and are not being accessed on a regular basis). After all, core parts of Wikipedia rely on sources that aren't even online in the first place (newspapers, etc.). Rlink2 (talk) 02:38, 3 April 2022 (UTC)
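A minimal sketch of the pipeline approach suggested above, assuming a Python bot; the two rules shown are hypothetical placeholders, not MalnadachBot's actual rule set:
<syntaxhighlight lang="python">
import re

# Hypothetical stages: each (pattern, replacement) pair fixes one class of
# lint error. Real rules would need to be far more careful about nesting.
STAGES = [
    (re.compile(r'<font color="(#[0-9A-Fa-f]{6})">(.*?)</font>',
                re.IGNORECASE | re.DOTALL),
     r'<span style="color:\1">\2</span>'),
    (re.compile(r'<tt>(.*?)</tt>', re.IGNORECASE | re.DOTALL),
     r'<code>\1</code>'),  # one possible modern replacement for <tt>
]

def run_pipeline(wikitext):
    """Feed each stage's output into the next stage and return the final
    text, so the caller saves once no matter how many stages ran."""
    for pattern, replacement in STAGES:
        wikitext = pattern.sub(replacement, wikitext)
    return wikitext
</syntaxhighlight>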
- It checks for everything I feed in a batch before saving it as a single edit.
<tt>...</tt>
tags already do not work in Mobile Wikipedia. We shouldn't be asking people to use older browsers to access content but should be making sure that content is accessible in modern browsers. ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ (talk) 18:05, 3 April 2022 (UTC)
- 12-year-old AfDs and RfAs are not valuable "content" that needs to be made highly available to anyone. Virtually no one views these kinds of pages, and the few that do wouldn't care about minor font rendering errors on them. The vast majority of this bot's work is not worth the disruption it's causing to the project. This bot should focus on fixing lint errors in articles, not long-dead pages that are deeply buried in the WP namespace and other project namespaces. —ScottyWong— 21:12, 4 April 2022 (UTC)
Issue with HBC AIV helperbot5 erroneously removing reports
I have encountered an issue with HBC AIV helperbot5 while reporting IP vandals. The bot will remove reports from AIV if the reported vandal is the subject of a partial rangeblock. This is problematic in cases where an IP is vandalizing pages from which they are not blocked. See this diff as an example. Note that the range 2804:14D:BAD5:826C:0:0:0:0/64 (talk · contribs · WHOIS) is only partially blocked as of my writing this. I have attempted to contact the bot's handler, JamesR, on this matter, but they appear to be inactive as they have not edited in several months. I wasn't sure where else to raise this issue. I hope that a solution can be found for this. TornadoLGS (talk) 00:55, 5 April 2022 (UTC)
AN/I thread
If anyone's available to take a look at this situation, it would be appreciated. ~Swarm~ {sting} 11:05, 16 April 2022 (UTC)
Shortcuts for Wikipedia:Bots
Currently, the info page Wikipedia:Bots lists three shortcuts at the top: WP:BOT, WP:BOTS and WP:B. The last one is ambiguous, with very few correct existing uses, so I've proposed to turn it into a disambiguation page (see this RfD). This leaves us with the other two. Should we keep listing both in the linkbox, or pick one for simplicity? WP:BOT is by far the most widely used (with tens of thousands of incoming links, compared with about 2,200 for WP:BOTS). – Uanfala (talk) 13:00, 21 April 2022 (UTC)
- 2000 incoming links? I think leaving both seems reasonable. –Novem Linguae (talk) 13:40, 21 April 2022 (UTC)
- Wikipedia has a rough standard for singular vs. plural in mainspace, but in this case "BOT" would refer to a singular bot, when it's often multiple, so it is a source of confusion worth disambiguating. Also, 7,282 for WP:BOT unless I am missing something. And the page's primary name is actually Wikipedia:Bots. -- GreenC 13:47, 21 April 2022 (UTC)
- Ah, I didn't know about the link count tool! I used "What links here" and saw there were at least several batches of 5,000 [5]. That's strange, why do we get different results there? – Uanfala (talk) 14:08, 21 April 2022 (UTC)
- Sorry my fault. 93k is the correct answer. The 7k is for "Wikipedia:Bot" which is a different page. -- GreenC 15:26, 21 April 2022 (UTC)
- It looks like a lot of the "WP:BOT" links are on user talk pages, and a lot of those were generated automatically by DASHBot years ago as part of a notification system. Thus I'm not sure we can say much from raw counts alone, because DASHBot's use of BOT vs. BOTS was a single person's decision that had an outsized impact. -- GreenC 15:33, 21 April 2022 (UTC)
- Shortcuts for a page that has 4 characters in its primary name are kind of silly. :) Izno (talk) 16:45, 21 April 2022 (UTC)
- Ah, that's silly, yes :) And there I was initiating an important deliberation over the important question whether the 3-letter shortcut is better than the 4-letter one. – Uanfala (talk) 20:58, 21 April 2022 (UTC)
- I note WP:B was added to the notice at the top of the page unilaterally a few weeks ago. I thought it was pointless (since WP:B could refer to so many other things) but I didn't feel like being the one to revert it at the time. In light of the discussion now about retargeting WP:B, I'm going to go ahead with that revert. Anomie⚔ 22:29, 21 April 2022 (UTC)
Project page move with potential functional repercussions
Per consensus in the move request for the page, I have moved Wikipedia:New pages patrol/Redirect whitelist to Wikipedia:New pages patrol/Redirect autopatrol list. I anticipate that it is possible that there may be bots or other tools that rely on the contents at the former title, and encourage anyone maintaining such properties to update them accordingly. BD2412 T 06:08, 23 April 2022 (UTC)
- @DannyS712: * Pppery * it has begun... 14:33, 23 April 2022 (UTC)
- MusikAnimal might be interested in updating User:MusikAnimal/userRightsManager.js#L-282. –Novem Linguae (talk) 15:58, 23 April 2022 (UTC)
- Thanks for the ping @Pppery. @BD2412 I had left a note in the discussion,
I propose that, if this RM is successful, instead of an admin moving the page directly, when I might not be around, the protection temporarily be lowered (to template editor) so that I can move and update the page at the same time as the bot updates. We can make it clear that the protection is temporary and only for this single purpose
but I guess that was overlooked. I'll update my code now. DannyS712 (talk) 22:01, 23 April 2022 (UTC)
- Overlooked? I left notices on three different project pages. BD2412 T 01:28, 24 April 2022 (UTC)
- I've updated the userRightsManager script. Thanks for the ping. — MusikAnimal talk 16:45, 25 April 2022 (UTC)
DumbBOT
I'm hoping that someone can help me with a small problem with DumbBOT and its categorization of maintenance categories.
The daily Orphaned non-free use Wikipedia files categories (like Category:Orphaned non-free use Wikipedia files as of 29 April 2022) should be placed in the Category:Orphaned non-free use Wikipedia files category but, for some reason, after April 29th, DumbBOT began placing them in the Category:All orphaned non-free use Wikipedia files category. The Image Deletion categories are organized pretty consistently, so this is out of the ordinary, and I'm not sure what caused the change in categorization last week. Of the 9 category areas created for daily review, this is the only daily Image Deletion category that was altered. It doesn't look like bot operator User:Tizio is active, but I was wondering if someone who was familiar with the bot could give this a look. It's a small glitch, not a huge problem, but I thought I'd bring it up here in case anyone knows of a solution or why the categorization would suddenly change. Many thanks. Liz Read! Talk! 21:12, 5 May 2022 (UTC)
- Fixed in Template:Orphaned non-free use subcat starter * Pppery * it has begun... 21:16, 5 May 2022 (UTC)
- Wow, that was fast. Thank you, * Pppery *. Liz Read! Talk! 21:58, 5 May 2022 (UTC)
Bot that fixes links to nonexistent category pages?
Just wondering: Is there a bot that currently performs edits related to nonexistent category pages, such as removing the links from articles or creating the category? (Preferably the former?) Steel1943 (talk) 18:35, 6 May 2022 (UTC)
- @Steel1943: The maintenance list Special:WantedCategories is typically very, very short these days; I think this is done by a few of our category specialists. I don't know whether they do this mostly by hand, but I wouldn't be surprised: you need to triage whether this is the result of a typo or vandal edit, a category deletion where articles or templates were not adjusted properly, or shows an actual need for the redlinked category. —Kusma (talk) 20:45, 11 May 2022 (UTC)
The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.
- ProcseeBot (BRFA · contribs · actions log · block log · flag log · user rights)
- ProcseeBot (talk · contribs · blocks · protections · deletions · page moves · rights · RfA)
It was raised at WP:BN (diff) that ProcseeBot has not performed any logged actions (i.e. blocks) since November 2020 (log). Given that the bot is not high-profile I'm not really surprised that its inactivity managed to pass under the radar of probably everyone except xaosflux, since they've been removing the bot's name from the inactive admins report for a while. That being said, Slakr seems to have become somewhat inactive as of late, and pppery has suggested the bot be stripped of its rights. Since its activity is primarily a bot-related task and not an admin-related task, I'm bringing it here for review. I have left Slakr a talk page note about this discussion. Primefac (talk) 07:29, 5 May 2022 (UTC)
- I feel like for security reasons we can probably apply the usual activity requirements to just the bot (rather than including if the operator is active). If an adminbot hasn't logged an admin action for a year it probably shouldn't be flagged as such and a crat can always reflag if it ever needs to be active again. Galobtter (pingó mió) 07:43, 5 May 2022 (UTC)
- @Primefac I did contact Slakr about this a few months ago (User_talk:Slakr/Archive_26#ProcseeBot), where they indicated it may be reactivated, which is why I have been skipping it during removals (as its admin access is an extension of its operator's, who is still an admin). So policy-wise, I think we are fine. Shifting off my 'crat hat and putting on my BAG hat - yes, I think we should deflag inactive adminbots; their operators can always ask at BN to reinstate so long as the bot hasn't been deauthorized. — xaosflux Talk 09:36, 5 May 2022 (UTC)
- I did figure you contacted them, and from a BAG perspective "it might be reactivated soon" is always good enough to leave things be. Shifting to my own 'crat hat, though, a temporary desysop until it's back up and running is reasonable, especially since it's been 1.5 years. Courtesy ping to ST47 who runs ST47ProxyBot. Primefac (talk) 09:47, 5 May 2022 (UTC)
- I don't think we should make a redline rule on this; if these rare cases arise, a BOTN discussion like this is the best way to deal with things. In this case, barring a response from the operator within a week indicating that this is going to be reactivated within the month, my position is that we should desysop the bot. — xaosflux Talk 09:53, 5 May 2022 (UTC)
- For the record I never intended this as any sort of rule-creating; we're discussing a singular bot. Primefac (talk) 10:06, 5 May 2022 (UTC)
- I think removing advanced perms from inactive bots is a good idea, and allowing them to be returned on-request if the botop wants to reactivate the bot (as long as the approval is still valid). ProcrastinatingReader (talk) 09:56, 5 May 2022 (UTC)
- So, does anyone intend to implement the unanimous agreement here? * Pppery * it has begun... 19:13, 11 May 2022 (UTC)
- I'll ask over at BN. — xaosflux Talk 19:18, 11 May 2022 (UTC)
- WP:BN request opened. — xaosflux Talk 19:22, 11 May 2022 (UTC)
FireflyBot is not running
User:FireflyBot has stopped. It last ran at about 1100 GMT on 2 June, and notified editors that their drafts had been untouched for 5 months. It also isn't updating the DRN case status. I left a note on the talk page of User:Firefly, who seems to be busy blocking spammers. Robert McClenon (talk) 15:12, 6 June 2022 (UTC)
- Hi @Robert McClenon the only person who can make that bot start would be its operator. You could drop a request at WP:BOTREQ to see if someone else would like to spin up a clone and take over some of those tasks. — xaosflux Talk 15:34, 6 June 2022 (UTC)
- The bot has returned from its six-day vacation. Robert McClenon (talk) 15:18, 8 June 2022 (UTC)
AnomieBOT
Keeps alerting me to the same years old problem, something about Genetic history of the Middle East. User:Primefac has been trying to fix it but it keeps going. Doug Weller talk 11:39, 9 June 2022 (UTC)
- Looks like Primefac was a little confused as to which ref was the problem, the bot was complaining about "pmid14748828" rather than ":3". Although this was also something of an odd edge case, in Special:Diff/534227170 you completely removed the reference but then it was re-added as an orphan by WikiUser4020 in Special:Diff/1092031894. Since it did exist in the article's history at one point, AnomieBOT pointed to its removal back then as the removal. Anomie⚔ 12:34, 9 June 2022 (UTC)
- Thanks. Doug Weller talk 13:44, 9 June 2022 (UTC)
- I swear I'm not going blind... I checked that page three times; the first time (removal of bot notice 1) there were no issues, then only ":3" (removal 2), and I swear the pmid reference wasn't flagged when I did the second check... thanks for the followup and apologies for the hassle. Primefac (talk) 08:23, 10 June 2022 (UTC)
Controversy About Report Being Generated by Bot
There is a deletion discussion at MFD which is really a bot issue. A bot is generating a report that appears to be a hierarchical list of deleted categories. Another editor has requested that the list be deleted, as an evasion of deletion policy.
- User:Qwerfjkl/preservedCategories (edit | talk | history | links | watch | logs)
- Qwerfjkl (bot) (talk · contribs · deleted contribs · logs · filter log · block user · block log)
- Wikipedia:Miscellany for deletion/User:Qwerfjkl/preservedCategories (edit | talk | history | links | watch | logs)
User:Qwerfjkl, who operates the bot and coded the task to generate the list, says that this has been coordinated with User:Enterprisey. User:Pppery says that the list should be deleted. I haven't studied the issue to have an opinion on whether the list should continue to be generated, or whether the bot task that generates the list should be turned off. However, I don't think that MFD is an appropriate forum to decide whether the bot should be generating the list. If the list is deleted, then the bot will generate a new version of the list, and Pppery says that they will tag the new version of the list as G4. Then maybe after that is done twice, the title may be salted, and the bot may crash trying to generate it. That doesn't seem like the way to solve a design issue. The question is whether there is a need for the bot to be producing the list. If so, leave it alone. If not, stop the bot. If this isn't the right forum either, please let us know where the right forum is, because it is my opinion that MFD is not the right forum. Robert McClenon (talk) 23:37, 17 May 2022 (UTC)
- Per Wikipedia:Bot policy,
if you are concerned that a bot no longer has consensus for its task, you may formally appeal or ask for re-examination of a bot's approval.
The policy links to this noticeboard for initiating an appeal discussion. I see BOTN as the appropriate venue for this. 🐶 EpicPupper (he/him | talk) 23:44, 17 May 2022 (UTC)
- My first inclination was to agree, but tasks that run under the policy exemption, as this one does, seem to be outside BAG's purview. As a practical matter, I think it's better for the community to directly decide (in a non-BON area) whether the task enjoys consensus. Even in bot appeals it helps to have the result of a relevant consensus process on the task (usually RfC). Userspace tasks may not require pre-approval, but as with any editing behaviour I think consensus is still able to put a halt to it if people find it to be problematic. ProcrastinatingReader (talk) 01:04, 18 May 2022 (UTC)
- The bot appears to be operating under WP:BOTUSERSPACE. There's no approval to review. Whether a BAG member was involved in the discussion that led to the creation of the bot has no weight. I'm not sure whether MFD is the right forum (versus say reopening the VPR discussion that led to the task in the first place), but it's better than here. Anomie⚔ 01:59, 18 May 2022 (UTC)
- MFD is a silly forum in which to discuss a bot task. A Delete would mean to throw away the output from the bot, rather than to stop the bot task as such. If the editors here think that Bot noticeboard is also the wrong forum, then maybe the bot should be allowed to continue to generate the list.
- I started out not having an opinion, and now have an opinion that the MFD is misguided.
- Thank you for your comments. Robert McClenon (talk) 02:31, 19 May 2022 (UTC)
- If MFD decides that the content shouldn't exist, then WP:G4 would apply and admins would be justified in taking appropriate action to prevent the bot from recreating it. The oddness comes from whether MFD is the appropriate forum for overriding the original VPR discussion. Anomie⚔ 11:48, 19 May 2022 (UTC)
- @Anomie, I'm willing to shutdown the bot if consensus is against it. ― Qwerfjkltalk 16:05, 19 May 2022 (UTC)
- On first pass, so long as this is low volume it doesn't seem to be in direct violation of the bot policy as it is in userspace. That doesn't mean that it is appropriate, or that it isn't disruptive. Would like to hear some feedback from the operator. — xaosflux Talk 14:09, 19 May 2022 (UTC)
- Operator notified. — xaosflux Talk 14:10, 19 May 2022 (UTC)
- I'm not sure what the best venue to deal with this is, but my initial feeling is that this is a bad idea, mostly because the bot keeps making pages that it seems no one is reading, then requesting that the same page be deleted - making needless work for admins who have to constantly clean up after it. — xaosflux Talk 14:14, 19 May 2022 (UTC)
- Can someone point to the VPR discussion that is being mentioned above? — xaosflux Talk 14:16, 19 May 2022 (UTC)
- OK, seems this is Wikipedia:Village_pump_(proposals)/Archive_187#Automatically_generate_a_record_of_the_contents_of_deleted_categories_at_the_time_of_deletion - which I don't really see as representative of any strong community consensus - seems like it just sort of died out. — xaosflux Talk 14:20, 19 May 2022 (UTC)
- Comment - This seems to be getting more complicated. However, having a bot generate a report that needs to be deleted without being used sounds like a bad idea. Robert McClenon (talk) 19:47, 19 May 2022 (UTC)
- Thanks for the link to the Village Pump discussion, that makes it much clearer what this is about. "Listify and delete" outcomes in CfD discussions are rare to begin with. But, if that is the outcome, the category is kept until listification has really taken place. However, it may happen that the list is initially created but deleted later, e.g. because sources were not provided. That very rare problem could be solved in a different way if (in case of a "listify and delete" outcome) closers of CfD discussions would list the category content on the talk page belonging to the discussion page. So we can stop the bot without any harm. Marcocapelle (talk) 21:12, 19 May 2022 (UTC)
- @Fayenatic london, Bibliomaniac15, and Explicit: pinging some administrators involved in CfD closing. Marcocapelle (talk) 21:15, 19 May 2022 (UTC)
MalnadachBot and watchlists
Hey folks, This has been brought up to the bot operator a number of times (e.g.[6]), but responses have been largely unhelpful. As such, I'd like to ask that the MalnadachBot be halted or its impacts on watchlists be removed. I understand it is trying to clean up pages for all sorts of reasons (e.g. accessibility) but it is a huge mess. I've had days where my watchlist in the last 24 hours was 50% from this bot. And there are pages where this bot is the only editor and has somehow found a dozen or more times where it needed to edit the page. See [7] for such a page.
We don't allow cosmetic edits for a reason. And I understand these aren't purely cosmetic, but the same issues apply. Do we really need signatures to be cleaned up in old RfAs? Its impact on watchlists is a molehill, but a really annoying one. I don't want to remove bot edits from my watchlist. And I do want to watchlist the pages I have listed. And yes, there are ways I could manage a script to remove this bot from my watchlist, but really that's not a good way forward--it only solves the problem for me, not everyone else. Perhaps we could have this particular bot not impact watchlists (basically a default script for everyone?). Or we could just halt it. Something, please. Hobit (talk) 22:44, 5 June 2022 (UTC)
- @Hobit this bot appears to be asserting the bot flag, I suggest you enable "bots" under the "hide" options on watchlist, that way you won't be bothered with such things (this is what fixes it for "everyone else" - mostly already). — xaosflux Talk 23:17, 5 June 2022 (UTC)
- Thanks Xaosflux. Above I said: "I don't want to remove bot edits from my watchlist." Yes, I could do this. But I've seen bot edits that have been problematic in the past. And I don't think ignoring bot edits is the default, so I'm not at all sure that fixes things for many, let alone most. I think you all are significantly underestimating the pain associated with this for folks. User retention is one of our primary issues and I suspect this hurts in its own small way. Hobit (talk) 23:54, 5 June 2022 (UTC)
- A "default script for everyone" is the MediaWiki feature that defines the user group "bot". Something in common.js has the same probability of being enabled as a dog has to fly.
Yes, I could do this. But I've seen bot edits that have been problematic in the past
Great! I would recommend you keep on reading MalnadachBot's edits, then; if you find one of the fixes
problematic
, you can raise that specific task at BOTN (here). Cheers! 🐶 EpicPupper (he/him | talk) 00:10, 6 June 2022 (UTC)
- Not sure what else to say here. That you want to see bot edits, but don't like these ones, is your own preference. As far as halting, I'm not seeing anything technically wrong here - but if there is a consensus that this class of edits is now inappropriate, we could certainly stop this bot (and all others that do similar updates). WP:VPM may be a good place to have or advertise such a discussion. — xaosflux Talk 00:14, 6 June 2022 (UTC)
- After 3.7 million edits and fixing over 10 million Lint errors, I have yet to have anyone bring up actual erroneous edits or malfunctions by the bot. That's because I am running it in the safest mode possible.
it only solves the problem for me, not everyone else
perhaps that's because most people don't have a problem with it and use the options available. What actually hurts user retention, or prevents people from becoming users in the first place, is knowingly letting accessibility problems lie in our pages just because some people don't want to customise their watchlists. ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ (talk) 06:29, 6 June 2022 (UTC)
- A "default script for everyone" is the MediaWiki feature that defines the user group "bot". Something in common.js has the same probability of being enabled as a dog has to fly.
- Thanks Xaosflux. Above I said: "I don't want to remove bot edits from my watchlist." Yes, I could do this. But I've seen bot edits that have been problematic in the past. And I don't think ignoring bot edits is the default, so I'm not at all sure that fixes things for many, let alone most. I think you all are significantly underestimating the pain associated with this for folks. User retention is one of our primary issues and I suspect this hurts in its own small way. Hobit (talk) 23:54, 5 June 2022 (UTC)
- Tbh my issue with the bot is that it makes 5-10 edits to the same page to fix a single issue like deprecated <font> tags. If it actually parsed the page instead of fixing one signature at a time using regexes, it should be able to fix all of them at the same time, properly. I'd probably support stopping the bot for very low-priority issues like <font> tags in signatures (which really aren't much of an issue); we can fix them when someone writes a bot to do it properly in one edit. (I basically agree with Legoktm's comments here).
- Most of MalnadachBot's edits seem to be about <font> tags, so I think a narrowly tailored RfC on just stopping edits that fix <font> tags if they don't fix all the errors on the page would stop a lot of the watch-list spam but allow real lint fixes to be done. Galobtter (pingó mió) 01:00, 6 June 2022 (UTC)
- One misunderstanding here is that the bot is not just fixing "a single issue like deprecated <font> tags", which makes the job at hand seem trivial. There is a useful thread about this bot task in the archive, in which I listed a selection from the wide variety of errors that this bot is attempting to fix. I put out an open call for assistance to editors who think that this bot can perform better:
Wikipedia:Village pump (technical)/Archive 70 is a sample page with approximately 180 Linter errors, nearly all of them font-tag-related. I encourage you to try to create a set of false-positive-free regular expressions that can fix every Linter font tag error on that page, and other VPT archive pages, with a single edit of each page. If you can do so, or even if you can reduce the count by 90% with a single edit, you will be a hero, and you may be able to help Wikipedia get out from under the burden of millions of Linter errors much more quickly.
Nobody has taken up the challenge, so it is still open. Meanwhile, the bot has edited that VPT archive page five times since that discussion, reducing the error count by about 25%. – Jonesey95 (talk) 03:48, 6 June 2022 (UTC)
- To be clear, I understand the bot is fixing a lot of issues - this is precisely why I'm only talking about limiting the bot on the specific issue of the obsolete font tag, which, from looking at the bot edits, is what most of the edits are about.
- Re "false-positive-free regular expressions" - that's precisely the issue I'm talking about. Regular expressions (in a very provable, mathematical sense - see [8]) cannot reliably parse HTML, which is why there are so many issues with false positives. But a bot using an actual wikitext parser should be able to do a much better job. Galobtter (pingó mió) 04:00, 6 June 2022 (UTC)
- The bot operator has, multiple times, offered to accept help with programming a bot that does a better job. As far as I know, nobody has been willing to put in the work to make those improvements. This bot, with apologies to Winston Churchill, is currently the worst way to fix these millions of errors except for all of the other ways that have been tried thus far. – Jonesey95 (talk) 04:28, 6 June 2022 (UTC)
- Are we certain the errors need to be fixed? And if so, is there any rush? I'm struggling to understand why we have a dozen edits made to old RfAs. Perhaps we could either not fix them or agree to not have it edit a page more than once a year and fix all those errors at once? Hobit (talk) 11:31, 6 June 2022 (UTC)
<font>...</font>
and some other tags have been deprecated, so yes, they need to be fixed, according to the Wikimedia developers, who set up this error tracking. No, there is no rush; a small group of editors has been working on these millions of errors for almost four years, and there are still more than 11 million errors left to fix. There is also no good reason to slow down; I wish the errors could be fixed more quickly, honestly, hence my plea for help in developing better code for the bot. Re fixing all the errors at once, please read what I posted and linked to above. – Jonesey95 (talk) 16:06, 6 June 2022 (UTC)
- @Jonesey95: This argument is a red herring. Yes, it would be challenging (although not impossible) to design a single regular expression that fixes all of the various linter errors in one go. But that is not necessary to address the editing behavior issues with this bot. You could have 100 separate individual regular expressions, each of which fixes a single issue, and that would be far less challenging to develop. However, that doesn't mean that you'd need to make 100 individual edits in order to fix those issues. Any minimally sophisticated bot could grab the content of a page, run it through the first regex, then take the result and pass it through the second regex, then take the result and pass it through the third regex ... then take the result and pass it through the 100th regex, and then make a single edit to the page. Instead, this bot operator insists on making upwards of 20 edits to a single page. There is no insurmountable technical hurdle that's in the way of fixing these issues with one edit per page. In other words, if you've written the code that can reliably fix all of the 180 linter errors on your example page by making 20 edits, then it would be trivial to use that code to fix all 180 linter errors in a single edit. Either this bot operator isn't sophisticated enough to do that, or doesn't want to put in the additional effort to reduce their edit count. —ScottyWong— 17:24, 6 June 2022 (UTC)
- Scottywong says that it is possible but still does not offer code to improve the bot. Please see the list of signatures I provided in the previous discussion, and develop a set of regexes to fix a decent fraction of them in one edit. If someone here can provide a false-positive-free regex or set of regexes that works as described above, I expect that the bot operator will be willing to include them in their code. – Jonesey95 (talk) 18:05, 6 June 2022 (UTC)
- I'm not going to spend hours developing code just to prove you wrong. Are you saying that it's technically impossible to combine multiple fixes into a single edit? Again, if you have the code to fix 20 different issues in 20 different edits, then you have the code to fix those 20 issues in a single edit. This shouldn't be difficult to understand. I could write an entire article in 10,000 edits by adding one word per edit, or I could write the same exact article in a single edit. The problem isn't with the bot's ability to fix the linter errors; the problem is that it makes far too many edits per page to fix those errors, and there is no legitimate reason that multiple edits are required. —ScottyWong— 01:40, 7 June 2022 (UTC)
- This was explained further down the page, but your response there indicated that it is possible you didn't understand, so I will make an attempt to address the
technically impossible to combine multiple fixes into a single edit
and
no legitimate reason that multiple edits are required
concerns. Let's step through a hypothetical example:
- A page has 100 linter errors
- The bot is coded so that it can fix all 100 in a single edit (be that regex, straight find/replace, etc, who cares)
- The bot/operator goes to a second page; it contains 5 errors from page 1 and 95 new errors. The bot operator does not notice this, and the page is edited.
- The bot operator now codes in regex/find/replace for those 95 other errors, and the process continues. The page is edited a second time.
- Rinse, repeat
- Now, take this scenario and scale it up to 1mil pages with errors - it's a branching tree with thousands of possibilities and iterations; not every page will have an error that has already been accounted for. So no, it is not "technically impossible" to code this, but mandating a single edit to every page basically reduces it to either a) the bot operator needs to manually edit every page, or b) the bot operator has to make a list of every lint error across the entirety of Wikipedia (i.e. 11 million pages affected) before running the bot. Since, as Jonesey mentions below, no one has stepped up to help out with either of those issues, we are at an impasse - either the bot works slightly more effectively than a human editor (and faster) but requires multiple passes to get rid of errors, or someone helps out, or we throw our hands up in the air and do nothing. In short, it's not technically impossible, but a practical impossibility.
- Now, I am certainly aware of the concerns being brought here (I too have had to scroll through these bot edits on my watchlist and try to visually filter them out) but I do agree with the other bot operators that "I'm annoyed because 10-year-old AFDs that are <for some reason> still on my watchlist are being edited" is a smaller concern than fixing these errors.
- That being said, I definitely agree that 50 edits to a page to fix errors is a bit much, but I also find it troubling that the solution to the issue has been stated and requested multiple times with no results. That means we're stuck with their "inefficient fixes" or "no fixes", and it is a difficult sell to accept the latter as the path forward. Primefac (talk) 08:48, 10 June 2022 (UTC)
The bot operator does not notice this, and the page is edited.
That's easy enough to handle if one edit per page is the goal: "Bot detects there are unfixed lint errors, logs the situation for the operator's attention, and does not make an edit." Then when the operator implements the other 95 fixes, the bot retries, and makes the single edit desired. Anomie⚔ 11:02, 10 June 2022 (UTC)
- That is a fair point, but then we're at a situation where the bot operator could potentially need to iterate through and pre-fix every error before ever being able to edit any pages (I am basing this on the hypothetical situation where every new page has at least one new lint error). This gets us onto the unfavourable possibility (b) I mentioned above, which we have not managed yet. Primefac (talk) 11:43, 10 June 2022 (UTC)
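A sketch of the defer-and-log gate Anomie describes, assuming a pywikibot-style page object; both regexes are placeholders, not the bot's real rules:
<syntaxhighlight lang="python">
import logging
import re

# Placeholder rule set: the fixes this bot currently knows how to make.
FIXES = [
    (re.compile(r'<font color="(#[0-9A-Fa-f]{6})">(.*?)</font>',
                re.IGNORECASE | re.DOTALL),
     r'<span style="color:\1">\2</span>'),
]
# Crude detector for lint that the rules above cannot yet handle.
UNHANDLED = re.compile(r"</?(?:font|tt|center|strike)\b", re.IGNORECASE)

def fix_or_defer(page):
    """Save only if every detectable error is fixed; otherwise log and skip.

    Deferred pages are retried after the rule set grows, so each page
    ultimately receives a single edit."""
    text = page.text
    for pattern, replacement in FIXES:
        text = pattern.sub(replacement, text)
    if UNHANDLED.search(text):
        logging.info("Deferring %s: unhandled lint remains", page.title())
        return False
    if text != page.text:
        page.text = text
        page.save(summary="Fix lint errors in a single pass", minor=True)
    return True
</syntaxhighlight>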
- Wouldn't have happened even if it was doing that from the start. The bot's edited some 2.8 million pages, and made only a single edit to two and a quarter million of them. (I haven't looked to see whether the bot's been making unrelated changes, but if so, it's reasonable to assume a similar ratio holds for the font tags it's been handling so poorly.) Some of those probably had font tags after the bot's edit, but by the same token, at least as many of the pages it's made second or third or whatever edits to have been responsible for removing the last font tag. The quarter of a percent of pages where it's had to make ten or more edits could easily have been left to human editors. Or to a botop who didn't need separate cases for <font color="blue"> and <font color="purple">. —Cryptic 12:45, 10 June 2022 (UTC)
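For what it's worth, a single generic rule along the lines Cryptic suggests is possible; a hypothetical sketch, deliberately limited to non-nested tags whose only attribute is color:
<syntaxhighlight lang="python">
import re

FONT_COLOR = re.compile(
    r'<font\s+color\s*=\s*"?([#0-9A-Za-z]+)"?\s*>(.*?)</font>',
    re.IGNORECASE | re.DOTALL)

def css_color(value):
    # Bare hex like 003300 is tolerated in font tags but needs "#" in CSS;
    # named colors (green, purple, ...) pass through unchanged.
    if re.fullmatch(r"[0-9A-Fa-f]{3}|[0-9A-Fa-f]{6}", value):
        return "#" + value
    return value

def replace_font_colors(text):
    """One rule for every color value instead of one rule per color.
    Nested font tags defeat the non-greedy match, so a real bot would
    need a parser for those cases."""
    return FONT_COLOR.sub(
        lambda m: '<span style="color:%s">%s</span>'
                  % (css_color(m.group(1)), m.group(2)),
        text)
</syntaxhighlight>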
- Well, my aim has been to fix as many Lint errors as possible. If you see the progression here and here, and compare it with the snail's pace of progress before, I would say it has been very successful in that regard. I knew the limitations of my bot and did the job to the best of my ability when no other way seemed possible. I am glad Galobtter is working on a better bot to replace font tags (which is what was most problematic and what my bot is inefficient about) so we can reduce the errors even faster. ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ (talk) 12:27, 10 June 2022 (UTC)
- If this specific bot annoys you, but you don't want to ignore bots in general, WP:HIDEBOTS has instructions on how to ignore a specific bot. Headbomb {t · c · p · b} 10:30, 6 June 2022 (UTC)
- Thanks. I did detail why I don't think that's a good way forward. Shutting down the bot, or taking a similar action to what you describe but making it the default for everyone, would address my issue. At the moment I think we are doing more harm than good. Hobit (talk) 10:51, 6 June 2022 (UTC)
- Support halting the bot until such time that it can be shown that the bot operator is capable of fixing all linter errors on a page with a single edit. The benefit of fixing trivial linter errors does not outweigh the disruption caused by taking multiple edits to accomplish it. Ensuring that the signature of a long-retired editor displays correctly on a 15-year-old AfD is not an important enough problem to trash everyone's watchlist for the remainder of eternity. None of the proposed methods of filtering the bot's edits from your watchlist are suitable for all circumstances. —ScottyWong— 17:28, 6 June 2022 (UTC)
- "disruption caused by taking multiple edits to accomplish it"
- AKA little-to-no disruption, which can easily be bypassed. We had a discussion on this just last month, and there's nothing that warrants stoppage of the bot. People are welcome to suggest concrete improvements, but a loud minority that refuses to mute the bot on their own watchlist is no reason to halt the bot. Headbomb {t · c · p · b} 18:31, 6 June 2022 (UTC)
- Lots of people seem to think it's a problem. If you've ever worked with programmers (I am one), you've had the experience where you say "this is a problem" and they say "no, it's not". And you're like "but I'm the user and yes, I understand why you think it's not a problem, but in practice, it is for me". We are having that discussion right now. Please don't assume your users are just being whiney or dumb. Hobit (talk) 19:15, 6 June 2022 (UTC)
- And lots of people have been given a solution which they refuse to use. MalnadachBot, on the whole, does more good (fixing hundreds of thousands / millions of lint errors) than harm (annoying a handful of editors). While it may not function optimally (everyone agrees it would be great if the bot could do one edit per page and fix all the errors in one go), this is not reasonably doable / very technically challenging (while perhaps not insolvable, the issue has not yet been solved), and the bot still functions correctly and productively. Those that don't like it can solve their annoyance with a single edit, as detailed on WP:HIDEBOTS. Headbomb {t · c · p · b} 21:10, 6 June 2022 (UTC)
- I think it's a lot more people who are annoyed than you are acknowledging. But that's why we have discussions. You may well be correct. I'm really unclear on why fixing these lint errors in old discussions is worthwhile. Is there a pointer to a discussion on this? Hobit (talk) 22:30, 6 June 2022 (UTC)
- You can start at Wikipedia:Linter. There are multiple links from there if you want to take a deeper dive. – Jonesey95 (talk) 00:03, 7 June 2022 (UTC)
- Let the bot continue with its excellent work. Anyone who wishes to do a better job is more than welcome to either write their own bot or contribute better code to this one. Putting the onus on ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ to make an extremely complex codebase better, while at the same time shutting down their bot, is ridiculous. If anyone has a problem with "watchlist spam", then hide bot edits. Don't want to? Then stop complaining. Gonnym (talk) 18:31, 6 June 2022 (UTC)
- No? This is making it harder for me to use Wikipedia. So I will complain. If it turns out it really is only a handful of us who find the cost/benefit analysis to be poor then I'll just put up with it (as I have for more than a year now). Hobit (talk) 22:33, 6 June 2022 (UTC)
- @Headbomb: What about the set of editors that want to continue monitoring bot edits? Or the set of editors that want to specifically monitor Malnadachbot's edits to the pages on their watchlists (without being bombarded by the bot editing the same page dozens of times)? If we're all ignoring the bot's edits, then major problems could slip through the cracks without anyone noticing. I don't understand why some of you are so adamant that there is no problem here, despite the continuous stream of complaints from various editors over the last few months. What's the rush here? Why can't we just pause for a brief moment, take a step back, and see if there is a better way to proceed? —ScottyWong— 01:51, 7 June 2022 (UTC)
- Then hide MalnadachBot and leave the others unhidden. Those who want to monitor MalnadachBot's edits but don't want to see MalnadachBot's edits are a minority so nonexistent we might as well discuss the current lives of dead people. And on the off chance they exist, Special:Contributions/MalnadachBot is there for that. Headbomb {t · c · p · b} 01:55, 7 June 2022 (UTC)
- Why are you always defending this bot so aggressively? Do you really believe that it is technically difficult to combine multiple fixes into a single edit? If you make 5 edits in a row to the same article, there is no technical reason that you couldn't have combined those 5 edits into 1 edit. I truly don't understand why you (and a few others) don't see this as being even a small issue. No one should have to hide a specific bot from their watchlist because its behavior is so disruptive. Keep in mind that this bot is also filling up the revision histories of millions of pages, making them more tedious to look through for editors. After a couple years of this bot running like this, a huge portion of the page histories on WP will be infected with the Malnadachbot Virus of 2022, as evidenced by a giant block of 50+ edits in a row to every discussion page in the project that has user signatures on it. How can you not see this as a problem? —ScottyWong— 02:04, 7 June 2022 (UTC)
- Do you really believe that it is technically difficult to combine multiple fixes into a single edit? The answer to that is an unequivocal yes. See the stackoverflow link Galobtter has shared above. As someone who has spent hundreds of hours fixing html errors, I fully agree with the pinned answer. Everybody gangsta until they try to build a bot that can fix all Lint errors in a single edit. You can sooner build a bot to write featured articles than a bot to fix all Lint errors of all types in a single edit. If this doesn't make sense to you, you have no idea what we are actually dealing with. With this bot run in future, people will not notice anything unusual when reading pages, which is exactly the point of these edits. Users have a greater necessity to read old discussions than to check their page histories. Page histories are irrelevant to readers, whose needs come first. ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ (talk) 05:58, 7 June 2022 (UTC)
- @ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ: You're misunderstanding the point. I understand the difficulties in detecting and fixing a wide variety of linter errors. It's not a simple problem to solve. However, the issue that is being brought up here is not that your fixes are inaccurate or invalid. The issue is that you're making far too many edits per page to solve these problems. Regardless of the complexity of the problem, if you have written code that can accurately correct 100 different linter errors by making 100 separate edits to the same page, then it should be trivial to amend that code so that it fixes all 100 errors in a single edit. There is no reason that your code has to actually commit an edit after fixing each individual error, before moving on to the next error. Your code analyzes the wikitext of a page, modifies it, and then submits the modified wikitext as an edit. Instead, your code should submit the modified wikitext to the next block of code, which can analyze it and modify it further, and then the next block of code, etc., until all possible modifications have been made, and then submit the final wikitext as a single edit. Instead, you are fixing one error, making an edit, fixing another error on the same page, making another edit. This is the problem. No one is asking you to develop code that magically fixes every linter error known to humanity in a single edit. All we're saying is that your code should fix all of the issues that it is capable of fixing within a single edit. I don't understand why that is difficult for you. —ScottyWong— 15:30, 7 June 2022 (UTC)
- I don't have code to fix 100 patterns in a page at the same time. Rather, what I usually have is code to fix 100 patterns spread across 50,000 pages. Individual pages among the 50,000 will have at least 1 and usually not more than 5 of the patterns that are being checked in that batch. All patterns are checked sequentially and then saved in a single edit. In the (highly unlikely) case that a page has all 100 patterns in a batch, it matches the first pattern, the output of which will be matched against the second pattern, and so on up to the 100th. Only the final output is saved, in a single edit; it does not save 100 times in a hundred edits. Once the 100 patterns in this batch across 50,000 pages are cleared, hundreds more issues that were previously buried deep come to the top of the Linter reports and the process is repeated. In this page that Cryptic has brought up below, the bot has made 21 edits over 11 months from 2 tasks. At the time of the edits from task 2, I did not have approval for the edits that were made from task 12. Basically MalnadachBot is very good at fixing Lint errors by "breadth" (35-40k per day) but is bad at fixing them by "depth". Hope this is clear. ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ (talk) 19:20, 7 June 2022 (UTC)
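A minimal sketch of the batch flow described above — every pattern applied in sequence, with a single save at the end — assuming illustrative, signature-specific regexes rather than the bot's actual AWB rules:

import re

# Illustrative signature-specific patterns for one batch run; the real
# rules live in AWB settings (e.g. the User:MalnadachBot/Task 12 subpages).
BATCH = [
    (re.compile(r'<font color="#?990000">(.*?)</font>', re.IGNORECASE),
     r'<span style="color:#990000;">\1</span>'),
    (re.compile(r'<font face="Verdana">(.*?)</font>', re.IGNORECASE),
     r'<span style="font-family:Verdana;">\1</span>'),
]

def fix_page(wikitext: str) -> str:
    # Each pattern runs on the output of the previous one, so every
    # match in the batch is fixed before the page is saved once.
    for pattern, replacement in BATCH:
        wikitext = pattern.sub(replacement, wikitext)
    return wikitext

Under this flow, a revisit only happens when a later batch introduces patterns that were not in an earlier one.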
- I understand that your code has developed over time, and perhaps 6 months ago you didn't have the code developed to fix an issue that you can fix today. And that might account for multiple edits on certain pages. Sure, that is reasonable. However, that doesn't explain why, for example on this page, you made 26 different edits that were linked to task 2 of your bot, and another 24 separate edits that were linked to task 12 of your bot. In total, you made 50 edits to the same page to carry out fixes for two approved tasks. If it was true that you simply hadn't gotten approval for task 12 at the time that you were running task 2, then I would have expected a maximum of 2 edits to this page (one for task 2, one for task 12), not 50. Much of your explanation above doesn't make any sense to me. You say that "all patterns are checked sequentially and then saved in a single edit", but then how can you explain why you made 50 edits to a single page? It doesn't add up. I'm honestly losing confidence that you have the technical expertise to carry out these tasks in a responsible manner. I'm glad you're getting some help from other editors in the conversations below, and I look forward to a day when you can make one edit per page to carry out all fixes at once. Until that day arrives, I believe that you should stop your bot from making edits. There is no rush to fix these issues, and it shouldn't be done until it can be done correctly. —ScottyWong— 22:25, 7 June 2022 (UTC)
- "Task 2" is approval for any user signature that creates high priority Lint errros. "Task 12" is approval for anything that creates any Lint error. They are hundreds of smaller tasks all rolled into a broad scoped approval. The page you linked has 2,347 signatures! Pages with so many signatures are what any bot will struggle to get in few edits. What has happened here is the bot has checked 100 users' signatures in 50 passes i.e it has checked a total of 5,000 users' signatures, of which about 80 of them happened to have their signature in that page. Legoktm's code looks promising and can replace general signature patterns instead of specific users' signatures. @Galobtter and Xaosflux: I will stop the bot if it is soley fixing font tags for now and focus on other Lint errors. ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ (talk) 04:27, 8 June 2022 (UTC)
- @ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ: you misunderstand what I mean by the stackoverflow link. The issue is that you are using regex to fix a problem that regex is not meant to solve. As the stackoverflow link states, using an actual parser makes doing HTML stuff easy. Galobtter (pingó mió) 21:39, 7 June 2022 (UTC)
- Halt. If you're not competent to handle even trivial cases like combining these edits [9][10][11][12] then you're not competent to be running a bot, let alone touching its code. —Cryptic 23:39, 6 June 2022 (UTC)
- Yeah, the bot seems to fix every page with a certain signature in one go rather than doing any form of batching, which means many, many edits to the same page. At the very least, stop for a few months, accumulate a big list of signatures to fix, and apply the fixes at the same time, rather than rushing. Galobtter (pingó mió) 01:53, 7 June 2022 (UTC)
- Exactly. I could maybe understand if the bot was making separate edits to fix distinctly different issues. But your diffs show that the bot is making multiple edits to fix different instances of the same exact issue on a single page. Combine that with the fact that these are purely cosmetic issues, and that's over the line for me. —ScottyWong— 01:56, 7 June 2022 (UTC)
- @Galobtter: I combine multiple signatures and run the bot in batches. If multiple signatures in a batch are present in a page, the bot replaces them in a single edit. The problem is that there are only so many signatures you can collect from the Linter lists before you start running into the same signatures again and again. To get new signatures, I would have to run a batch and remove a few hundred thousand entries from the lists. ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ (talk) 05:58, 7 June 2022 (UTC)
- Agree to a halt, to solidify my position into a bold one. 🐶 EpicPupper (he/him | talk) 04:11, 8 June 2022 (UTC)
- Look at this, one of the examples linked above. For 10 years, nobody edited that page except for bots fixing Lint errors. The page has a daily average of 1 page view. But there are like 50+ bot edits, mostly from MalnadachBot. That's just crazy and we should stop. Don't browsers already fix Lint errors anyway? Do we really need to fix Lint errors on pages from 10+ years ago that nobody reads? If upstream devs aren't focused on this "problem", maybe it's not a problem? (I know, I know, that last one was funny.) But seriously, even if we want to fix Lint errors, that can be done in one edit per page--it's shocking that more than one person in this discussion has claimed that these edits would be difficult to combine. I don't know if that was incompetent or disingenuous, both options kinda alarming. I guess the bot has already been halted and someone's already come up with new code -- that's awesome, thank you -- but please let's be mindful of edit history pollution and watchlist pollution... I know the WMF likes to brag about how many billions of edits are made, but it's more efficient to have fewer edits, fewer entries in the database, etc., and never have a page history like the one I just linked to... and maybe we should think about a rule that a bot shouldn't edit a page multiple times without confirmation of specific consensus for that, similar to cosmeticbot. Levivich[block] 17:47, 21 June 2022 (UTC)
- Specifically as to whether it's a problem if we don't fix it here: well, it is and it isn't. Browsers have entirely dropped support for tags before (<blink>, for example). If they did that for <font> - and I'd be frankly astounded if they did, at least for the next decade or so - they'd almost certainly just strip out the invalid tags, making, for example, the wikitext from the left hand side of this edit I linked above show up as Starblind. (Try making an html document with font tags like that, and renaming them all to "<fnot>".) If the Wikimedia developers drop support for it - and I think the last time they did anything comparable was in 2001ish when we moved away from CamelCase page links - we'd see the invalid tags, like <FONT COLOR="#FF0000">St</FONT><FONT COLOR="#FF5500">ar</FONT><FONT COLOR="#FF8000">bli</FONT><FONT COLOR="#FFC000">nd</FONT>, the same way a bare "<fnot>sic</fnot>" renders visibly. Honestly, I wouldn't be too upset with either of those in non-content namespaces if our only alternative truly was fix-one-tag-per-edit. —Cryptic 18:21, 21 June 2022 (UTC)
- @Cryptic: Thank you for that explanation. Can I bother you for a few more:
- 1. If either browsers or MediaWiki dropped support, then we could go about updating the old HTML, right? There's no reason to think support will be dropped anytime soon?
- 2. Maybe this is a "explain it to me like I'm 5 years old" situation, but if we're going to update old HTML, <font> isn't the only tag that would need to be updated, right? So why aren't we just making one edit to each page, calling it "update to HTML 5.0", and doing all the updates at once, and never having to edit any page a second time (to update old HTML)? I feel like I'm missing something. Levivich[block] 18:31, 21 June 2022 (UTC)
- Well, for 1, the consequence is that our pages would look ugly between support being dropped and them all being fixed. That could be a long time - we've been getting bot fixes of font tags for, what, a year now? - and there'd probably be a lot more pages to fix by then. For 2, yes, there are lots and lots of other classes of lint errors, most of them more problematic. Font tags aren't likely to start rendering badly by accident, only if they're deliberately disabled; but unclosed tags, for example, which used to get automatically tidied, broke spectacularly when we switched internal software. Which is why old revisions of my RFA are mostly green. Fixing every possible html error in one edit isn't practical, and I don't think anyone has a real problem with bots fixing one kind of error at a time. The issue with MalnadachBot is that, to all appearances, it's been fixing one specific instance of an error at a time. —Cryptic 18:47, 21 June 2022 (UTC)
- To answer the "why aren't we just making one edit to each page" question, see my explanation above, which can be summarised as "it's possible but also highly impractical to do so unless other folks help out". Primefac (talk) 08:31, 22 June 2022 (UTC)
Code
As requested:
import mwparserfromhell as mwph

def fix_font(wikitext: str) -> str:
    code = mwph.parse(wikitext)
    for tag in code.filter_tags():
        if tag.tag == "font":
            # Turn it into a <span>
            tag.tag = "span"
            # Turn color into style="color: ...;"
            if tag.has('color'):
                attr = tag.get('color')
                attr.name = "style"
                attr.value = f"color: {attr.value};"
            # TODO: face, size
    return str(code)
Using this replacement as an example:
>>> print(fix_font("""[[User:Ks0stm|<font color="009900">'''Ks0stm'''</font>]] <sup>([[User talk:Ks0stm|T]]•[[Special:Contributions/Ks0stm|C]]•[[User:Ks0stm/Guestbook|G]]•[[User:Ks0stm/Email|E]])</sup> 15:48, 13 December 2015 (UTC)"""))
[[User:Ks0stm|<span style="color: 009900;">'''Ks0stm'''</span>]] <sup>([[User talk:Ks0stm|T]]•[[Special:Contributions/Ks0stm|C]]•[[User:Ks0stm/Guestbook|G]]•[[User:Ks0stm/Email|E]])</sup> 15:48, 13 December 2015 (UTC)
It's entirely possible I've missed something, but seems like the general approach should work. Legoktm (talk) 20:49, 6 June 2022 (UTC)
- I think it is ridiculous that upstream devs won't just keep support for these tags (convert them in the parser or whatever) - but that's not the bot's fault. In general, I don't see that any specific edit this bot is making is bad; that is, if an editor made it, it would be OK. Now, could the bot be "better"? Sure - but as long as we are getting threatened by the software folks that our pages will be disrupted if we don't change the wikitext, I'm not too opposed to people slowly going through them. I don't like extra edits for sure, but I don't really have any sympathy for the When I don't hide bots in my watchlist, my watchlist is busy complaint. We keep putting up with unflagged editors spamming recent changes/watchlists and can't get support to get them to stop. — xaosflux Talk 23:52, 6 June 2022 (UTC)
- Legoktm, thank you! for being the first person to actually suggest code improvements. You have illustrated the complexity of constructing code that will work on the wide variety of font tags that are out there in the wild, because the code as written above does not work (a # symbol is needed before color specs in span tags, even though it wasn't needed in font tags). You also got lucky that the font tag was inside the wikilink; font tags outside of wikilinks sometimes need different processing. If you are willing to create code that works on a significant fraction of the list of signatures in the archived discussion that I linked above, I expect that the bot operator would be willing to engage with you, or perhaps another brave editor would be willing to create a new bot with your new approach. I think we can all agree that fewer edits per page, as long as the edits are error-free, is the optimal outcome, and the above code snippet, with some development, seems to promise some of that. – Jonesey95 (talk) 00:12, 7 June 2022 (UTC)
- I think that a potential solution would be using regex replaces sequentially. They could be stored in a dictionary, so the code would loop through it, do a replace, then do the next for the already-replaced text. 🐶 EpicPupper (he/him | talk) 00:29, 7 June 2022 (UTC)
- I'm pretty sure the bot does this already, applying multiple regexes to each edit, when applicable. The challenge is that in order to best avoid false positives, it is my understanding that editor signatures and other specific patterns are added one by one to the list of regexes. You can't just add a general pattern and hope to avoid false positives, because you could end up changing examples that are deprecated or invalid on purpose. You'd have to ask the bot operator to be sure. – Jonesey95 (talk) 01:36, 7 June 2022 (UTC)
- I think you're misunderstanding; the regexes can be applied sequentially, rather than all together. Right now, I believe what you mean by "applying multiple regexes" is doing, say, 5 together, all at once, but rather this can be applied one after the other (first replaces the first regex, then takes the result of that replace and triggers the second, etc). 🐶 EpicPupper (he/him | talk) 01:40, 7 June 2022 (UTC)
- I do apply changes sequentially, i.e. match one pattern in the whole page and use the output to match the next pattern, one after the other, for all patterns in a batch. After it has matched all of them, it will save the final output in a single edit. ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ (talk) 05:58, 7 June 2022 (UTC)
- @ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ, so why does your bot need 100+ edits to fix Lint errors? 🐶 EpicPupper (he/him | talk) 16:07, 7 June 2022 (UTC)
- Which page has 100+ edits by MalnadachBot? If you see the example given by Hobit, it shows about 16 edits spread across 11 months and 2 separate tasks. All of those edits are from different batch runs; it does not make any edits in rapid succession. A typical edit by MalnadachBot would be something like this. Most pages have only one or two Lint errors that are fixed using direct signature replacement, and it will not have to visit them again. RFA pages tend to accumulate a lot of different signatures, which is why they have 10+ edits. Now that I think of it, maybe I should just skip highly watched pages like RFAs since that is what at least 3 people have mentioned. ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ (talk) 16:29, 7 June 2022 (UTC)
- 100+ is probably an exaggeration. Pages with 50+ are easy to find. Example. Example. Example example example. —Cryptic 17:46, 7 June 2022 (UTC)
- Wow! That is just egregious. I didn't know pages like this existed. There's no legitimate reason for a bot to behave like this. —ScottyWong— 22:13, 7 June 2022 (UTC)
- @Jonesey95: heh, I should've previewed before posting :p yeah, it needs to check if the value is hex, and if so, prefix it with a hash if it isn't already. I would prefer if someone else took over my proof-of-concept and ran with it, but given that I do find the current behavior slightly annoying I might if no one else picks it up. The main point I wanted to make is that using a proper wikitext/HTML parser instead of regex makes this much more straightforward, I think a comprehensive font -> span function should be about 100 lines of Python. Legoktm (talk) 04:33, 7 June 2022 (UTC)
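A minimal sketch of that hex check (an assumption about the eventual fix, not code from the paste):

import re

def css_color(value: str) -> str:
    value = value.strip()
    # <font color="009900"> tolerated bare hex values; CSS requires the
    # leading "#". Named colours pass through unchanged.
    if re.fullmatch(r'[0-9a-fA-F]{3}|[0-9a-fA-F]{6}', value):
        return '#' + value
    return value

So css_color("009900") returns "#009900", while css_color("red") is left alone.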
- In the past few weeks, I have been building a bunch of general regexes using the same proof-of-concept as in Legoktm's code above. These regexes work with near 100% accuracy for very rigid sets of replacements applied sequentially. I am sure that if I use them, I can fix about 7 million Lint errors, fixing most of the common cases in a single edit. It will greatly decrease the number of revisits. The reason I have not used them is that this ignores Lint errors other than font tags. The bot will still have to revisit pages for other types of Lint errors, which is what some people have a problem with. ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ (talk) 05:58, 7 June 2022 (UTC)
- I have listed some safe regexes in User:MalnadachBot/Task 12/1201-1250. I am currently running the bot with it. If the wikitext parsing method mentioned above can handle nested font tags, misnested tags, multiple tags around the same text, tags both inside and outside a link, etc., I am willing to give it a try. ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ (talk) 13:26, 7 June 2022 (UTC)
- What's the reason for using regexes instead of a wikitext/HTML parser? Another strategy I've been thinking about since yesterday is to use some form of visual diffing to verify the output is unchanged before saving. Then you could have code that doesn't handle literally every single edge case run and clean up most pages in bulk, and then go back and write in code to handle pages with edge cases in smaller passes. Legoktm (talk) 17:15, 7 June 2022 (UTC)
- As for why I use regexes, that's because this is an AWB bot and regexes are easy to use with it. I just tried mwparserfromhell and got it to work fine for font tags with a single attribute. However, I am stuck at font tags with two attributes. I tried
if tag.tag == "font": tag.tag = "span" if tag.has('color') and tag.has('face'): attr1 = tag.get('color') attr2 = tag.get('face') attr1.name = "style" attr2.name = "style" attr1.value = f"color:{attr1.value};" attr2.value = f"font-family:{attr2.value};"
- For this, if I pass <font color="red" face="garamond">abc</font>, it returns <span style="color:red;" style="font-family:garamond;">abc</span>. How can I get them in a single style? ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ (talk) 18:22, 7 June 2022 (UTC)
- @ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ try phab:P29487. Maybe we should move to a Git repo for easier collaboration? Also we want fixers for all the other deprecated tags. Legoktm (talk) 19:00, 7 June 2022 (UTC)
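One way to get a single style attribute — a sketch only, not the phab paste, and assuming mwparserfromhell's Tag.add()/Tag.remove() attribute helpers behave as documented — is to collect all the declarations first and write one attribute at the end:

# `tag` is a mwparserfromhell Tag node, e.g. one yielded by
# mwph.parse(wikitext).filter_tags() as in fix_font() above.

# Rough CSS equivalents for <font size="1".."7">; this mapping is an
# assumption, and relative sizes like "+1" are not handled.
SIZES = {"1": "x-small", "2": "small", "3": "medium", "4": "large",
         "5": "x-large", "6": "xx-large", "7": "xxx-large"}

def font_to_span(tag):
    declarations = []
    if tag.has("color"):
        # (Bare hex values would still need the "#" fix discussed above.)
        declarations.append(f"color:{tag.get('color').value};")
        tag.remove("color")
    if tag.has("face"):
        declarations.append(f"font-family:{tag.get('face').value};")
        tag.remove("face")
    if tag.has("size"):
        size = str(tag.get("size").value).strip()
        if size in SIZES:
            declarations.append(f"font-size:{SIZES[size]};")
            tag.remove("size")
    tag.tag = "span"
    if declarations:
        tag.add("style", " ".join(declarations))

With that, <font color="red" face="garamond">abc</font> should come out as <span style="color:red; font-family:garamond;">abc</span>.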
- I updated the paste to include my screenshot code in it and various other deprecated tags, but got a bit stuck with opencv while getting uprightdiff to build. Maybe tomorrow. Legoktm (talk) 02:55, 8 June 2022 (UTC)
- @Xaosflux afaik there's no indication that browser devs are actually going to drop support for the tags; isn't it just in the linter because font tags are obsolete in HTML5? Or did I miss a discussion. Galobtter (pingó mió) 01:57, 7 June 2022 (UTC)
- @Xaosflux: that's a big [citation needed] from me. Please provide links to where developers (MediaWiki or browser) have made these "threats" about <font> and other deprecated HTML tags being removed - I'm not aware of any. The tag is marked as deprecated because it shouldn't be used in new code, and it would be nice to clean up old uses of it, but there's no deadline to do so. Certainly if browsers did even propose to remove it there would be huge blowback given how much legacy content on random geocities type websites that would be broken, I doubt it'll ever happen. That said, how people choose to spend their time on Wikipedia is up to them, and if people are interested in cleaning this stuff up, more power to them. Legoktm (talk) 04:10, 7 June 2022 (UTC)
- @Legoktm good point, I'd have to go find more on that - so are you saying that "font" deprecation is basically useless and we need not worry about it really - because I'd certainly rather nobody bother with any of the Special:LintErrors/obsolete-tag's if doing so is going to be useless. — xaosflux Talk 10:01, 7 June 2022 (UTC)
- I would put it one or two steps above useless. I think it is nice that people are cleaning this stuff up, but there's absolutely no urgency to do so. I'm struggling to come up with an analogy to other wiki maintenance work at the moment that clearly articulates the value of this work without overstating its importance...I'll try to get back to you on that. Legoktm (talk) 17:12, 7 June 2022 (UTC)
I don't see that any specific edit this bot is making is bad
- then you're not thinking it through. I. See. (No exaggeration.) HUNDREDS OF THOUSANDS. Of. Bad. Edits. If a bot's reason for editing a page is to eliminate font tags, then it is an unambiguous error to save that page with a font tag remaining in it. If it can't yet handle all the cases on a given page, then the blindingly obvious solution is to log it for later inspection and then either skip that page or halt entirely. And yes, if a human made fifty edits in a row to a given page fixing a single font tag per edit and continued to do so over millions of pages for years while dozens of editors begged them to stop, we'd block them in a heartbeat. —Cryptic 17:46, 7 June 2022 (UTC)
- If I edit a page and fix something on it - the page is better than it was before. Not also fixing something else doesn't make my first edit "bad". If an article had two misspelled words and you fixed one, should you be cautioned for not fixing the other? My comment was in the broadest sense, along the lines of If a human editor made this edit, should it be reverted? - and I'd say no. And yes, if someone without a bot flag was flooding recent changes we'd have an issue - because of that. All that being said, following from my last comment: if these edits are useless then they shouldn't really be getting made by any bots at all - and that is something that we can measure community support for to make a decision. — xaosflux Talk 18:16, 7 June 2022 (UTC)
- Add me to any list of editors who would like a solution to this. I’m tired of seeing them. Doug Weller talk 18:47, 7 June 2022 (UTC)
- @Doug Weller: et al: I'm kind of agnostic on the need for this bot, and the level of headache it's causing, and the extent of support and opposition to the bot, and the ease/necessity of reducing the number of edits per page, and whether it makes sense to pause this until it is more efficient. But just a reminder (the bot's userpage says this, and Headbomb says it somewhere up above, but this is a long thread now) that WP:HIDEBOTS works well if you want to hide just one bot's contribs (or, I'm surprised to find out, one user. I didn't know that was possible). I'm <s>an idiot</s> "very non-technical", and I just now set it up by myself, and now I see no Malnawhateverbot edits in my watchlist, but all the other bots are still there. You're all of course free to keep arguing - and I'll be watching out of curiosity - but that does solve the immediate problem for those who find this to be ... an immediate problem. --Floquenbeam (talk) 19:33, 7 June 2022 (UTC)
- Correction: it seems to work fine on my computer, but apparently it doesn't work on my phone (even when in desktop mode). So not quite the perfect solution I marketed it as above. --Floquenbeam (talk) 19:39, 7 June 2022 (UTC)
- Using the mw.loader.load version of this aka {{lusc}} is likely to sort that on mobile. Izno (talk) 19:46, 7 June 2022 (UTC)
- Thanks, looks useful. I wonder if it will work on my iPad. Doug Weller talk 19:57, 7 June 2022 (UTC)
- @Xaosflux and ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ: I saw one recent edit where a fix was made to one instance of an issue but 4 other near-identical instances weren't fixed. That's just silly - we wouldn't want this kind of behavior from any other bot. Galobtter (pingó mió) 04:09, 8 June 2022 (UTC)
Replacement bot for obsolete font lint errors
I wrote some code for User:Galobot to fix obsolete font lint errors here. It adds the # as needed, maps sizes to the css equivalent, and adds color styling to inner wikilinks to try to match previous Tidy behaviour (though I'm not sure exactly what that behaviour was and there's an issue of duplicating existing styling I need to fix). Sample diff. I'm sure there's other edge cases that people know of that have to be handled too. @ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ: I think it's best if you pause task 12 on this lint error so that it can be fixed better in this way; I'll try to file a BRFA soon. Galobtter (pingó mió) 04:19, 8 June 2022 (UTC)
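For the inner-wikilink part, something along these lines — heavily hedged, since (as noted above) the exact Tidy behaviour is itself uncertain: link text ignores a colour set on a wrapping <span>, so the styling has to be pushed inside the link label:

def style_inner_links(wikicode, span_style):
    # `wikicode` is a mwparserfromhell fragment that sat inside a
    # <font color=...> tag. Wrapping the link in a coloured <span> does
    # not recolour the link text, so wrap the link's label instead
    # (an approximation of what Tidy-era rendering did with <font>).
    for link in wikicode.filter_wikilinks():
        label = link.text if link.text is not None else link.title
        link.text = f'<span style="{span_style}">{label}</span>'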
- That looks promising. Would you be willing to try it on a copy of this version of Wikipedia:Village pump (technical)/Archive 70 in your bot's user space? That page has an amazing array of signatures with Linter errors. – Jonesey95 (talk) 04:31, 8 June 2022 (UTC)
- @Jonesey95 That's exactly the sample diff I linked :) Galobtter (pingó mió) 04:40, 8 June 2022 (UTC)
- I used the current version of that page but I can try it on the old version. Galobtter (pingó mió) 04:40, 8 June 2022 (UTC)
- I have stopped it. Note that WOSlinkerBot is working on font tags with color attribute outside links, so you can ignore those. I will gladly stop font tag fixes and leave it for others. You can browse Special:PrefixIndex/User:MalnadachBot/Task 12 to see the numerous variations to consider for coding. ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ (talk) 04:42, 8 June 2022 (UTC)
- tl;dr Would it be possible to create an option in watchlist to hide edits by particular editors? —usernamekiran (talk) 08:50, 10 June 2022 (UTC)
- @Usernamekiran: WP:HIDEBOTS Headbomb {t · c · p · b} 10:15, 10 June 2022 (UTC)
- Only if you're using the "enhanced" watchlist that shows every edit instead of the latest. The developers have refused to make the existing filters work properly with the normal watchlist, so the chances of them letting you hide specific editors is slim indeed. —Cryptic 10:30, 10 June 2022 (UTC)
- Hi all, I'm sorry to say that I'm swamped in real life and no longer can commit the time to finish the code, push this through BRFA, and run the bot. Apologies for leaving this issue in limbo. I personally think this issue is not that urgent and can wait for someone to fix it properly without making so many edits to the same page (just because it's a lint error doesn't mean that it's a big issue or something that has to be fixed now - I feel like real reasons need to be there to fix it urgently, like a rendering or parsing issue). I hope someone can eventually do something similar to the code I wrote, but I think there's enough opinions either way that there should be some well-written RfC/formal discussion on this. Galobtter (pingó mió) 04:59, 28 June 2022 (UTC)
Issue with User:ImageTaggingBot
I have observed an issue with User:ImageTaggingBot. The bot is operated by User:Carnildo, but it appears that he very seldom edits any more. Is there an alternate bot operator? The issue has to do with Isha Gurung, which was just copied into article space. The article has an image of the subject, which apparently doesn't have a proper license template. The bot has put the notice about the template at the bottom of the article, which has the anomalous effect of causing two more copies of the image to appear in the article. I don't think that is what the bot is supposed to do. Robert McClenon (talk) 02:31, 30 June 2022 (UTC)
- I see what has happened. The user or users who created the BLP for some reason created it in User Talk space and then moved it to article space, which left a redirect from user talk to the article. The bot was trying to put the notice on the user talk page, where it belongs. "Interesting". Robert McClenon (talk) 02:48, 30 June 2022 (UTC)
- The bot actually did put it on the user's talk page; the move happened afterwards. —Cryptic 02:52, 30 June 2022 (UTC)
- Oh. So the bot was not the problem. The user was the problem. Robert McClenon (talk) 03:24, 30 June 2022 (UTC)
- I can confirm Cryptic's explanation. The bot's action log doesn't make any mention of edits to Isha Gurung, but it does show a notification being added to User talk:LS Dhami. (And although I don't edit very much, I read quite a bit, and I'm usually logged in when I do so. Messages on my user talk page will usually be seen reasonably quickly, though if it's urgent, "email this user" is probably faster.) --Carnildo (talk) 18:40, 30 June 2022 (UTC)
- The extra images were because of the user's edits, too. They're currently deleted, but you can see here and here where I manually undid them after pasting the (still-mangled) notification back onto User Talk:. —Cryptic 19:16, 30 June 2022 (UTC)
Global bot approval request for Dušan Kreheľ (bot)
Hello!
I apologize for sending this message in English. Please help translate to other languages.
In accordance with the policy, this message is to notify you that there is a new approval request for a global bot.
The discussion is available at Steward requests/Bot status#Global bot status for Dušan Kreheľ (bot) on Meta.
Thank you for your time.
Best regards,
—Thanks for the fish! talk•contribs 21:07, 11 July 2022 (UTC)
- Note that, per local policy, that bot task may not be run here without separate local approval via WP:BRFA. Courtesy ping: @Dušan Kreheľ: Anomie⚔ 11:35, 12 July 2022 (UTC)
- @Anomie: Thanks for the information. Dušan Kreheľ (talk) 12:19, 12 July 2022 (UTC)
- There is already Wikipedia:Bots/Requests for approval/Dušan Kreheľ (bot), but if it's for a different task, we will need a new BRFA yes. Headbomb {t · c · p · b} 13:46, 12 July 2022 (UTC)
- Oh goody, another URL tracking bot task request. Primefac (talk) 13:54, 12 July 2022 (UTC)
Something called ApiFeatureUsage will disappear soon
I don't know if this will affect any of you, but "ApiFeatureUsage" will be removed soon (end of the month?) as part of an Elasticsearch upgrade. It appears that the official announcement on wikitech-l on May 5th might not have been received by everyone. If you know what this thing is, please <s>panic at your earliest convenience</s> have a look at the linked task and keep an eye on Tech/News in the coming weeks. Whatamidoing (WMF) (talk) 23:59, 18 July 2022 (UTC)
- See Wikipedia talk:Bots/Archive 22#How to use Special:ApiFeatureUsage. See also User:AnomieBOT/header. I'm under the impression that this tool was Anomie's thing. I'm curious to know whether the Search team was involved in any of the aforementioned office politics. Is this tool obsolete because future breaking changes have been deprecated or because the WMF couldn't care less whether bot operators were alerted to breaking changes before they were implemented? – wbm1058 (talk) 04:03, 19 July 2022 (UTC)
- No, it was mainly management and HR. There were a lot of different things going on, some of which seem to still be going on. One big one was that WMF had a spate of growing more layers of management that seemed to have no purpose other than "management" and so had to throw their weight around to try to justify their existence and generally kiss up kick down. I've heard that the manager primarily responsible for my troubles has since "left" after finally trying and failing to get rid of someone else who might have been a threat to his rise to power, and turnover has taken care of some others. Whether the new ED and CPTO will turn things around there, we'll have to wait and see.
- The problem with ApiFeatureUsage, as far as I can tell, is that the Search team doesn't want to maintain the little bit of code that puts the sanitized data into an index on the public search cluster anymore. As far as I know it was pretty standard logstash and ElasticSearch stuff, just not the same stuff that's needed for MediaWiki's Search functionality. For that matter the ElasticSearch storage is not actually necessary for ApiFeatureUsage, the data just has to be somewhere that can support the necessary queries (and then someone needs to implement two functions to return the data to the extension). But when there are more important things that it seems no one officially cares about, we can't really expect them to care about something like ApiFeatureUsage that as far as I know never really took off in the first place. Anomie⚔ 12:48, 19 July 2022 (UTC)
- Update: It appears that they decided yesterday to migrate this instead of turning it off. Whatamidoing (WMF) (talk) 17:41, 21 July 2022 (UTC)
RM notice 20 July 2022
An editor has requested for Wikipedia:History of Wikipedia bots to be moved to another page. Since you had some involvement with Wikipedia:History of Wikipedia bots, you might want to participate in the move discussion (if you have not already done so). — Preceding unsigned comment added by 2601:647:5800:1a1f:4144:3a67:f74d:9f8c (talk) 18:11, 21 July 2022 (UTC)
HBC AIV helperbot5 down?
This bot is usually really on top of removing dealt-with reports at UAA, but isn't doing so and hasn't done anything in the last six hours. Its maintainer has been inactive for the last seven months. Beeblebrox (talk) 15:59, 20 July 2022 (UTC)
- It seems to now be back up and running, which probably doesn't answer any questions. -- zzuuzz (talk) 19:47, 20 July 2022 (UTC)
- That's fine, if it had been explained I probably wouldn't have understood a word of it anyway. Beeblebrox (talk) 19:16, 21 July 2022 (UTC)
Help!
Hi, Bot folks,
We have a category that many editors working in the AFC area check, Category:AfC G13 eligible soon submissions. It contains drafts and user pages that will become eligible for CSD G13 status in the next 30 days. Sometimes editors check the category to tag drafts when they become eligible, other times they "rescue" promising drafts that would otherwise be deleted. Typically, the total number of pages is around 3,000-5,000 drafts and user pages. The number can sometimes go down to 2,000 but the category always refills itself. We used to have a problem with the category becoming empty about a year or a year and a half ago, but I worked with ProcrastinatingReader and he had some way of making sure that the category periodically refilled.
Anyway, there haven't been any problems with this category for a long time, but I periodically check it and the category is down to 1,426 pages! We typically have 200-225 drafts expire each day, so that isn't even a week's worth of pages. I went to ProcrastinatingReader's talk page but he has been away since May 2022. I'm not sure what he did to refill the category, but I'm guessing it had something to do with bots categorizing drafts, and I'm hoping someone here will know something or at least have a good guess.
We do have a back-up system, SDZeroBot's lists, but some editors prefer to use this category because it has had the most up-to-date listing. Any ideas or fixes you can think of? Thanks in advance for any help you can offer. Liz Read! Talk! 05:38, 18 August 2022 (UTC)
- Proc was around two weeks ago (c.f. T314688) so they're not totally inactive, but I'm not sure what the best way to get in contact with them is. Legoktm (talk) 21:15, 18 August 2022 (UTC)
- I'll try emailing them. I think I was kind of a nuisance about this issue when it was a constant problem but everything has been working fine for over a year now. Liz Read! Talk! 01:38, 21 August 2022 (UTC)
- As there didn't seem to have been much progress on this matter I wrote and ran this script to "purge" the relevant pages at a reasonably sedate pace. That seems to be causing the relevant modules to re-evaluate which categories to add to the pages, thereby populating the category.
- Reading through past requests, BAG don't seem particularly concerned about bots that just do purges, because they can't tell whether they are being run anyway. William Avery (talk) 23:54, 29 August 2022 (UTC)
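A rough sketch of that purge approach, assuming the standard action API rather than William Avery's actual script — a null purge with forcelinkupdate makes the parser re-evaluate date-based category membership:

import time
import requests

API = "https://en.wikipedia.org/w/api.php"

def purge(titles, batch_size=20, pause=10):
    # Null-purge pages in small batches; forcelinkupdate also refreshes
    # the links tables, which is what re-populates categories whose
    # membership is computed from the current date.
    session = requests.Session()
    session.headers["User-Agent"] = "g13-purge-sketch/0.1 (hypothetical)"
    for i in range(0, len(titles), batch_size):
        session.post(API, data={
            "action": "purge",
            "format": "json",
            "forcelinkupdate": 1,
            "titles": "|".join(titles[i:i + batch_size]),
        })
        time.sleep(pause)  # keep a reasonably sedate pace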
- This thread was cross-posted to Wikipedia:Village pump (technical)#Problem with G13 eligible soon category a few days ago, and I filed Wikipedia:Bots/Requests for approval/NovemBot 5 to try to solve the problem. Looks like your algorithm is similar to the one I intended to use. Nice work refilling the category. If you want to make your script a bot that runs weekly, you are welcome to take over the BRFA if you want. –Novem Linguae (talk) 00:21, 30 August 2022 (UTC)
- Sorry. I'm afraid intellectual curiosity got the better of me and I got a bit carried away. The burden of your BRFA is obviously correct. I was going to use the queries at https://github.com/siddharthvp/SDZeroBot/blob/master/reports/g13-soon.js to generate the page list to purge, extending it to leave out pages that are already in the maintenance category. However, you probably understand the draft article workflow better than me, who's just rocked up and thought "let's try purging some pages." I assume you are intending to publish code and hopefully run the task as a cron job on Toolforge, in which case I'd be very happy to leave it to you.
- Thank you for being so gracious about it. William Avery (talk) 01:23, 30 August 2022 (UTC)
- Nah you're fine, no toes stepped on. And @Liz will probably be very happy to have the category filled. I'll move forward with my BRFA so that the category stays replenished. –Novem Linguae (talk) 02:50, 30 August 2022 (UTC)
- This thread was cross-posted to Wikipedia:Village pump (technical)#Problem with G13 eligible soon category a few days ago, and I filed Wikipedia:Bots/Requests for approval/NovemBot 5 to try to solve the problem. Looks like your algorithm is similar to the one I intended to use. Nice work refilling the category. If you want to make your script a bot that runs weekly, you are welcome to take over the BRFA if you want. –Novem Linguae (talk) 00:21, 30 August 2022 (UTC)
Dams articles
Hello, I am involved in creating articles on dams manually (not by bot); most of them are stubs. Currently, I am working on Japanese dams. I am starting this discussion to get community consensus to create dam stubs around the world. For information, there is a separate WikiProject on dams. Best! nirmal (talk) 01:13, 28 July 2022 (UTC)
- Thank you nirmal for opening this. For context, I asked them to do so based on WP:MASSCREATE after noticing they were engaged in the good-faith creation of large numbers (slightly over 1,000) of similar articles on dams based on a single source, damnet. These articles take the following form:
NAME (Japanese: JAPANESE NAME) is a TYPE dam located in LOCATION in Japan. The dam is used for PURPOSE. The catchment area of the dam is AREA km2. The dam impounds about AREA ha of land when full and can store SIZE thousand cubic meters of water. The construction of the dam was started on YEAR and completed in YEAR.
- A few examples can be seen at Yukiyagawa Dam, Yanagawa Dam (Iwate, Japan), and Bicchuji-ike Dam.
- I am not certain whether such creations, as opposed to inclusion in a list, are beneficial to the reader, particularly since I am not convinced that all of them meet WP:GNG - I note there is no presumed notability for dams, per WP:GEOFEAT. As such, I am hoping the community will discuss this and decide whether to endorse their continued creation in this manner. BilledMammal (talk) 03:28, 28 July 2022 (UTC)
- Before we get too deep into a discussion on the merits of geostubs, I feel it might be worth noting that I am inclined to a positive (or at worst neutral) view on the subjects, having procedurally written a number of stubs based on database entries for islands in California. This was how I created Chain Island, Tinsley Island, Bull Island (California), Kimball Island, Joice Island, Island No. 2, Russ Island, Atlas Tract, Empire Tract, Brewer Island, Spud Island, Hog Island (San Joaquin County), Tule Island, Headreach Island, Bradford Island, Van Sickle Island, and Hooks Island. All of these articles are now Good Articles. I also did so for a number of islands in Michigan (Fordson Island, Stony Island, Fox Island, and Powder House Island). All of these are GAs as well, with the exception of Powder House Island, which lost its GA designation when it became a Featured Article. jp×g 04:53, 28 July 2022 (UTC)
- A similar approach was used to create some of the individual articles of List of gaunpalikas of Nepal and members of parliaments of Nepal. nirmal (talk) 07:10, 28 July 2022 (UTC)
- I fully support creation of such articles. They can always be expanded later, as Jpxg points out above. NemesisAT (talk) 09:40, 28 July 2022 (UTC)
- nirmal, what is your intent for the scope of these creations? Do you intend to create articles on every dam listed in that index, or is it only a subset? BilledMammal (talk) 11:02, 28 July 2022 (UTC)
Thanks to nirmal for opening this. My opinion is that it would be better to have these be lists of dams (or maybe a table), rather than individual pages, if the pages are going to consist of the five-sentence template quoted above. List of dams in Japan is too long, but I think it could be split into regional lists, and the list entries can have all of the information currently contained in the stubs. For example, it seems to me everything currently contained in articles like Ameyama Dam and Gokamura-ike Dam could be included in two rows in a table in List of dams in Aichi Prefecture. The individual pages could become redirects, and then if somebody wanted to expand the article on a particular dam, they could easily do that (and then link to it on the list/table). Levivich (talk) 03:06, 29 July 2022 (UTC)
- I'm not sure if independent articles on dams are always the best idea if there are already articles on the lakes they sustain. I'd also advise gearing this towards dams that do more than simply hold water (something more important like a hydroelectric dam would be preferable). Bicchuji-ike Dam is a mound of packed dirt which helps area farmers. I doubt its notability unless WP:GNG is eventually demonstrated. -Indy beetle (talk) 08:00, 29 July 2022 (UTC)
- Note that almost all of them are large dams. nirmal (talk) 13:17, 29 July 2022 (UTC)
For the record, I have in the past thanked Nirmaljoshi for their work on these articles, but also asked him/her to slow down a bit and take care to link new articles to Wikidata or existing articles in other languages, which they have not yet done. I agree that improving the existing list articles would be a good idea, and I stand ready to help. List of dams in Saga Prefecture might be a reasonable example of what can be done quite easily using existing data. — Martin (MSGJ · talk) 19:36, 31 July 2022 (UTC)
- I would like to thank nirmal for coming here. Every other editor who has mass-created articles has refused even to acknowledge that we had a policy on mass creation, or that any consultation at all was needed before creating huge numbers of articles.
- My question to Nirmal is this: why not just use a bot? Wouldn't it make far more sense to do so than trying to do this stuff by hand? I mean, I'm not a programming genius but I think using excel/Notepad++ I could populate a whole bunch of templates with the requisite data in one go.
- All of this WP:MEATBOT stuff would be much better if it were just done automatically. FOARP (talk) 21:28, 31 July 2022 (UTC)
- @FOARP: Thanks for asking. Actually, I find pleasure in checking the size of dams and imagining the age in which each was constructed. A bot would take away that emotional part :P . Having said that, I use a locally running python script to gather data. nirmal (talk) 04:53, 1 August 2022 (UTC)
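(For the curious: the "populate templates from data in one go" approach FOARP describes comes down to a few lines of Python. This is a minimal sketch only; the file name, column names, and infobox fields below are hypothetical, not nirmal's actual script.)

import csv

# Hypothetical wikitext skeleton; double braces are escaped for str.format().
TEMPLATE = """{{{{Infobox dam
| name   = {name}
| height = {height} m
}}}}
'''{name}''' is a dam in {prefecture} Prefecture, Japan.<ref>{source}</ref>"""

with open("dams.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):      # one dam per row
        print(TEMPLATE.format(**row))  # one article body per dam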
- At worst, these dams should all exist as blue links, with a redirect to information about the dam in either a list or the adjacent lake article, right? And there's been no dispute over correctness of the information? So this is an introduction of new, reliably sourced information into the English Wikipedia on a topic we underrepresent (Japan), and turning red links into blue. From perusing List of dams and reservoirs in the United States and Category:Dams in the United States by state, it seems roughly in line for us to have articles on dams of this nature. I'm inclined to say that these mass creations are appropriate. — Bilorv (talk) 22:06, 3 August 2022 (UTC)
- I agree with your point about parity, but looking at the US dams, not every dam has a stand-alone article; instead, they are listed in lists of dams-by-state. From a quick glance at the Alabama dams, the ones that do have stand-alone articles have multiple sections or at least several paragraphs of prose. I think Japanese dams should be similarly organized, but that would mean not creating short stand-alone pages for every dam, and instead having lists of dams-by-prefecture (or some other regional subdivision). Levivich 23:29, 3 August 2022 (UTC)
As discussed above, I started splitting List of dams in Japan into separate articles by prefecture, but Fram has moved these into draft space which has broken all the links from List of dams in Japan. I intend to revert these moves shortly, but just letting people know — Martin (MSGJ · talk) 11:28, 4 August 2022 (UTC)
- These were moved to draft because they were Wikidata lists, not lists with local content. That this has broken links is not a reason to have these, and reinstating such lists which were disallowed per RfC is not a wise move. Fram (talk) 12:40, 4 August 2022 (UTC)
- It is irresponsible to leave other articles in a broken state. By the way, I would be interested to see a link to the exact RfC by which you are claiming this consensus. As explained at the AfD, this has nothing to do with ListeriaBot. — Martin (MSGJ · talk) 20:04, 4 August 2022 (UTC)
- The link to the RfC has been provided at the AfD. And redlinks in an article is not the same as "leaving articles in a broken state". What any of this has to do with the bot noticeboard is also not clear. Fram (talk) 11:18, 5 August 2022 (UTC)
- I suggest a more technical guideline for inclusion or exclusion, instead of deciding subjectively. Having worked on dam and electricity related articles for a long time, I think I have a good understanding of which dams should be included and which should not. I suggest the following:
- - Include dams that are defined as large dams by ICOLD. Because this is the most renowned body in the dam world, I think no one will disagree with their regulations. They have clear guidelines as listed here. By applying this rule, most of the tiny dams will be excluded automatically. For example, among the 92,017 dams of the USA (Ref: https://nid.usace.army.mil/#/), if we apply a height >= 50 ft (15 m) criterion, only 6,853 will qualify for inclusion in the list (see the sketch just after this list).
- - Any smaller dam can be included if it has extensive coverage and passes GNG.
- - Also, I would like to clear up some misunderstanding (or shortsightedness) in the above discussion. ① Lay Dam, for example, has been a stub since 2012; the stubs I have been creating are of a similar nature. ② I think a pile of dirt 20 m high (seven stories) and 100 m long (a football field) is surely notable. I plan to add more sources. Note that the older the dam, the more difficult it is to find sources in the mass media. Excuse me for that.
Best regards! nirmal (talk) 07:53, 8 August 2022 (UTC)
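(A sketch of the height filter proposed above, assuming the NID export has been saved as nid.csv; the column name is a guess and should be checked against the actual download.)

import csv

LARGE_DAM_FT = 50  # the ~15 m "large dam" threshold discussed above

with open("nid.csv", newline="", encoding="utf-8") as f:
    large = [row for row in csv.DictReader(f)
             if float(row.get("Dam Height (Ft)") or 0) >= LARGE_DAM_FT]
print(len(large), "dams meet the height criterion")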
- English Wikipedia has had many problems with mass creation of stubs. Just because other stubs exist (and I am not speaking of your efforts) doesn't mean that we should proceed with automated creation. If there is merit, base it on your own work and potential output. Izno (talk) 01:57, 15 August 2022 (UTC)
- I support the idea of consolidating these in lists/tables and creating standalone articles only if GNG is demonstrated. Right now it's unclear what criteria are being used to choose which dams get articles (are we duplicating the entire database?) and I've already found a few that are just small irrigation ponds (Yamanashi Dam, Higashibaru Choseichi Dam) that will have to be cleaned up eventually. This seems to be an area where the encyclopedia should be expanded, but mass-creating articles from a single database source with no evidence of notability is not the proper way to go about it. –dlthewave ☎ 05:57, 30 August 2022 (UTC)
- I want to thank nirmal for seeking consensus before going ahead with this. I support creation of a list article, as others have said, consolidating the information there without needing to create geostubs. Articles can certainly be created when there is sufficient coverage to meet WP:GNG, and jp has demonstrated how stub articles can be created and iterated to good articles, but I note that this was because they diligently did this themself rather than relying on the stubs to be noticed and expanded by anyone else. I would caution against using bots to do the creation of stubs - if the task of creation is overwhelming, then no doubt the task of expansion would be Sisyphean. Much better to create a list and then expand out from the list wherever WP:GNG is met. Sirfurboy🏄 (talk) 08:09, 30 August 2022 (UTC)
Two Bot-related RFCs at Wikipedia:Village pump (miscellaneous)/Archive 71#Mass addition of Cleanup bare URLs template
The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.
Please opine. This is a follow up to Wikipedia:Bots/Requests for approval/BHGbot 9 and Wikipedia:Bots/Noticeboard/Archive 16#BHGbot 9 review sought. Headbomb {t · c · p · b} 01:07, 16 August 2022 (UTC)
- Note that both RFCs were disruptive revenge exercises, and were speedily closed as such.
- They were both opened by Headbomb out of pique that my work had not been derailed by Headbomb's multiply-abusive antics at WP:Bots/Requests for approval/BHGbot 9, and by BAG's disgraceful endorsement of those antics at WP:Bots/Noticeboard/Archive 16#BHGbot 9 review sought.
- It's long past time for BAG to eject Headbomb, and to take a long hard look at what led it to repeatedly endorse his antics. There are some great people on BAG, but collectively BAG was unable to see some severe problems. For example:
- Nobody in BAG could even see a problem in a BAG member describing 100 lines of bot code as "dumb as a brick". That was patently false, and its only purpose was to try to insult and humiliate the editor who had written the code.
- Nobody in BAG saw a problem in Headbomb's decision to spurn my request for further input, bombarding the discussion and then closing it before any further input was offered.
- Nobody in BAG saw any problem in Headbomb's aggressive disdain for WP:CLEANUPTAG
- That's only the start of it: a full list of the misconduct by Headbomb would run to several pages. But either of items 1 & 2 above should have been enough for the rest of BAG to suspend Headbomb and intervene to restore proper, civil scrutiny of BRFAs. BAG members should be utterly ashamed of the fact that BAG instead endorsed such misconduct. BrownHairedGirl (talk) • (contribs) 11:25, 29 August 2022 (UTC)
- "disruptive revenge exercises" No, they were not (and revenge for what even?). You were repeatedly told so, and you repeatedly assume bad faith of everyone involved. Headbomb {t · c · p · b} 11:32, 29 August 2022 (UTC)
- Not true, Headbomb. You repeatedly told me that, and your atrocious track record on this issue makes your statements worthless.
- Nobody else at VPM endorsed your actions. Absolutely nobody. Nada, zilch, zero, nothing. An uninvolved editor speedily closed those revenge RFCs.
- And as I told you before, your bogus RFCs were clearly opened as revenge for my disclosure that your abuse of BRFA had not prevented me from using much the same code to remove ~10,000 redundant cleanup banners over 8 months, with no objections or concerns from anyone at all ... at least until you objected at VPM, where you were supported by nobody.
- And no, I do not "repeatedly assume bad faith of everyone involved". I assume bad faith only of you, because of your long history of it.
- Your record is clear, and you are utterly shameless. I see no reason to expect that you will mend your ways, but I have some small sliver of hope that other BAG members will call time on your bullying antics in pursuit of your personal hobbyhorse. BrownHairedGirl (talk) • (contribs) 12:08, 29 August 2022 (UTC)
- PS for the record, here is the point at which I realised that I could no longer sustain any assumption of good faith in Headbomb's conduct towards me.
- On 28/29 December 2021, Headbomb twice accused[13][14] me of seeking to "kneecap" Citation bot (CB). (The full discussion is most easily read in the archives at User talk:Citation_bot/Archive_30#Feature:_usurped_title: search the page for "kneecap" to find the relevant exchange.)
- Kneecapping is a severe form of malicious wounding, often used as torture. Its objective is to permanently disable the victim.
- However, my proposal wrt CB was not to reduce the bot's capacity in any way. I merely proposed to postpone the addition of extra functionality to CB unless and until CB had the capacity to handle this big extra task.
- So even as metaphor, Headbomb's allegation was wholly misplaced: I proposed no wounding of CB, let alone a malicious wounding, and in no way anything intended as torture.
- It was also highly offensive to equate my civil, reasoned objection with an act of violent torture. That was in no way whatsoever a civil or reasonable response: it was a form of hyperbolic smearing designed to delegitimise my concerns ... a more extreme version of Headbomb's use of aggressive hyperbole at BRFA, when he described my 100 lines of bot code as "dumb as a brick".
- In the CB case, there was an even nastier element to Headbomb's conduct. I am Irish (it's openly noted on my talk page), and kneecapping is a form of torture practised especially by the paramilitary groups which emerged in Northern Ireland during The Troubles, and still practised by some residual thugs. So for an Irish person, that is a particularly offensive allegation ... and given Headbomb's history of aggressive bullying of me, I think it is possible that his choice of such a loaded and geographically-specific term was racially motivated.
- But just as Headbomb has never in any way backed off from his three rounds of unfounded and unprovoked aggression towards me at BRFA, he has not in any way backed off from his smearing of me at User talk:Citation_bot/Archive_30#Feature:_usurped_title. Instead, he is here defending his disruption at VPM, and repeating his bogus denials.
- I did try there to civilly explain to him why this language was completely out of order, but his response[15] was pure gaslighting: "You really need to have a major WP:AGF/reality check".
- 8 months later, Headbomb remains unrepentant about that too.
- Why on earth does WP:BAG still tolerate this vicious person on BAG pages, let alone retain him as a member of BAG and endorse his misconduct? BrownHairedGirl (talk) • (contribs) 13:32, 29 August 2022 (UTC)
- Re: "Revenge", I'll remind you you specifically asked me to open an RFC on the matter. For the rest, you're taking completely innocuous remarks completely out of context. All bots are dumb. This isn't a slight on anyone, it's a reality that all bot coders have to deal with. Likewise, saying 'let's not kneecap citation bot' isn't something that's remotely a slight against the Irish or a threat of violence or whatever it is that you're making it to be. It's an exceedingly common metaphor, used all the time in politics, sports, software etc. (e.g. [16], [17]).
- I'll note that you have a long-established history of ABF. Headbomb {t · c · p · b} 02:16, 30 August 2022 (UTC)
- @Headbomb: there is nothing innocuous about your choices in that BRFA.
- You had made your points, and you had been asked to allow other BAG members to comment.
- Your first act of bad faith was your failure to do that. Instead you stepped in with a pile of aggressive hyperbole designed to insult and demean me. That is classic bullying behaviour, a clear demonstration of your bad faith.
- Even now, 8 months later, you still stand by your blatantly false claim that 105 lines of code are "dumb as a brick". No brick runs any code, so that is utterly false.
- And your claim that "kneecapping" is an "exceedingly common metaphor" is false. It is linked to a section of a Wikipedia article which was unsourced. And no, it is not "used all the time in politics, sports, software"; your individual examples do not in any way demonstrate widespread usage, let alone the extreme case you claim that it is "used all the time".
- In each case, it was made abundantly clear to you at the time that your remarks were offensive and unfounded. But you persisted, and you have again persisted here.
- That persistence is itself clear evidence of your bad faith: a person acting in good faith would amend their language and terminology to prevent offence from damaging the discussion.
- And of course, your language and terminology were only part of the problem. You repeatedly claimed that guidance on cleanup tag usage was my personal view, even though it was in the template documentation; and when corrected you did not retract or strike or apologise; instead, you repeated such false claims in relation to other templates.
- That's your conduct, Headbomb: smear, insult, attack, abuse your power, attack, insult, never concede even when wrong, never apologise.
- Now back to those RFCs.
- Headbomb writes that I "specifically asked me to open an RFC on the matter". That is only partially true, and mostly false. Yet again, Headbomb is being deceitful.
- The linked diff shows that I wrote: "The RFC question is simple: 'Should we retain at the top of an article a big banner about filling bare URL refs which has already been tagged as unsuitable, and which should therefore be replaced or removed'"
- That is the core question, the core of Headbomb's objection ... but it was not asked in either of Headbomb's fake RFCs. Instead Headbomb used the RFCs to challenge my doing this without a bot flag, which was transparently an act of revenge ... which was why the RFCs were speedily closed.
- Note that the speedy closure of these revenge RFCs was appealed by Headbomb to WP:AN, where no action was taken. See WP:Administrators' noticeboard/Archive345#Closure_review:_Wikipedia:Village_pump_(miscellaneous)#Mass_addition_of_Cleanup_bare_URLs_template.
- And please note that after over 10,000 edits of me removing the {{Cleanup bare URLs}} banner with AWB, not one person other than Headbomb has objected. Not even when I posted a list of contribs at VPM for scrutiny.
- This whole saga is all about Headbomb being a bully who abused his position on BAG to try to block my work, and who continues to attack and smear despite having no support at all.
- That is the substance of the whole thing. All of Headbomb's aggression and bullying and violent imagery and insults and abuse of process are about work to which NOBODY else objects ... yet even after 8 months, Headbomb cannot let go. BrownHairedGirl (talk) • (contribs) 07:28, 30 August 2022 (UTC)
- smear, nope haven't done that
- insult, nope haven't done that
- attack, nope haven't done that
- abuse your power, haven't done that either
- attack, see above
- insult, see above
- never concede even when wrong, there's nothing to 'concede' to start with
- never apologise, apologize for what? While you may feel you've been attacked, smeared, insulted or whatever, I haven't done so and have acted in good faith towards you the whole time.
- You, however, have one of the worst WP:BATTLEGROUND mentalities I've ever seen anywhere on Wikipedia, both at the original BRFA, and here, and your failure to assume good faith is why you perceive everything as an attack, rather than as an attempt to help (however successful I've been at it). This is why we have WP:AGF in the first place and this is why I always and repeatedly invited you to reread what was written in light of it. Headbomb {t · c · p · b} 08:50, 30 August 2022 (UTC)
- In context, the phrase "a bot, i.e. dumb-as-a-brick-no-context-mindless-automaton" is clearly a statement about the limitations of all actually-existing bots as a class, rather than a particular piece of code. William Avery (talk) 02:31, 30 August 2022 (UTC)
- @William Avery
- if it is a statement about bots as a class, it is clearly false. No brick runs 105 lines of code.
- it is wholly inappropriate hyperbole, designed to insult and to inflame rather than to inform
- it was posted in direct response to my clear request for "feedback from other BAG members".
- Instead of stepping back and allowing others to comment, Headbomb piled in to attack me and to bludgeon the discussion.
- Having done several rounds of aggressively dismissing all my comments and questions, Headbomb then engaged in a classic piece of gaslighting: he projected his own aggression and abuse of process onto me, falsely accusing me of having a "WP:BATTLEGROUND mentality".
- After 8 months, neither Headbomb nor any of the BAG members who shamefully endorsed Headbomb's antics has yet provided a clear answer to the simple question at the core of this: "Why should we retain at the top of an article a big banner about filling a bare URL ref which has already been tagged as unsuitable, and which should therefore be replaced or removed". BrownHairedGirl (talk) • (contribs) 07:45, 30 August 2022 (UTC)
- That question has been answered multiple times. The bare URL tag is to flag that bare URLs remain. If bare URLs remain, then the underlying issue hasn't been fixed and the tag should remain and not be removed by bots (and I would argue, meatbots too). This is independent of other issues which may be present in the article. That you don't like that answer doesn't mean you weren't given one. Headbomb {t · c · p · b} 15:33, 30 August 2022 (UTC)
I note that this discussion is closed. I just want to place on the record that I reject all of Headbomb's responses. --BrownHairedGirl (talk) • (contribs) 20:22, 30 August 2022 (UTC)
Request review for possible bot task for fixing dead URLs
...at User:FABLEBot/New URLs for permanently dead external links. — Qwerfjkltalk 20:41, 29 August 2022 (UTC)
- Thank you @Qwerfjkl!
- Everyone else: For more context, please see Wikipedia:Village pump (miscellaneous)/Archive 72#Request for feedback on research project for fixing dead links HarshaMadhyastha (talk) 18:09, 31 August 2022 (UTC)
Issue with File:NWA 74 Anniversary.jpg
There is currently an issue with File:NWA 74 Anniversary.jpg. Namely, an edit by DeltaQuadBot is followed by an edit by Liz, which is then followed by an edit by JJMC89 bot, and the pattern has been repeated two more times for a total of three. If we do nothing, then the pattern will just keep on repeating every week. Perhaps, Liz tried using User:Legoktm/rescaled.js three times, but the script failed to revision delete the 00:06, 29 June 2022 and 18:12, 18 August 2022 versions all three times. If this fails a fourth time, then either a Phabricator task will need to be created, or an administrator should try revision deleting the older versions manually. GeoffreyT2000 (talk) 04:57, 12 September 2022 (UTC)
- I've deleted the 18 August version, which might be enough to stop the warring bots. Not much confidence in that. The problem here is the 29 June version, which produces a File not found: /v1/AUTH_mw/wikipedia-en-local-public.2e/archive/2/2e/20220818181232%21NWA_74_Anniversary.jpg error when clicked on, and a 'Revision visibility could not be updated: The file "mwstore://local-multiwrite/local-public/archive/2/2e/20220818181232!NWA_74_Anniversary.jpg" is in an inconsistent state within the internal storage backends' error when I try to delete the file content. —Cryptic 05:10, 12 September 2022 (UTC)
- The file looks like it was overwritten, which is generally not a good thing to do in the case of non-free content (except in the case of a relatively minor change) because older unused versions are eventually going to be deleted per WP:F5. Maybe the thing to do here would be to split the files, tag the old version with {{npd}} and then see what happens. If nobody tries to use the older version, it will be deleted, leaving only the new version to be used in the article. -- Marchjuly (talk) 07:12, 12 September 2022 (UTC)
- Oh, my. Here is where my technical ignorance about files becomes apparent. Each night, I respond to files tagged by User:DeltaQuadBot that show up in the Category:Non-free files with orphaned versions more than 7 days old needing human review category. I have installed the tool that deletes previous revisions. That's about all I know about this situation. I do not know why this sequence would be repeating itself. But if I don't "rescale" the photos, there are other admins who will, I just usually get to them first. Has what you've done, Cryptic and Marchjuly, fixed this problem? I've been handling these files for over a year now without this problem coming up before. The only problem I run into is that the script doesn't handle .ogg files which have to be deleted manually. Liz Read! Talk! 07:21, 12 September 2022 (UTC)
- I'm not an admin; so, I can't really split a file. I'm not even sure if splitting is even possible, but it's the first thing I thought of when I saw the versions were fairly different. Maybe Graeme Bartlett can help sort this out since he's helped me before regarding overwritten non-free files. -- Marchjuly (talk) 07:27, 12 September 2022 (UTC)
- Well I can split the file if you like. Another way to go is to actually delete the whole thing and start again under a new name. Then corrupt entries should not cause a stuff up. And if it is just bots playing up, we can add {{nobots}} and just correct issues manually. Graeme Bartlett (talk) 07:50, 12 September 2022 (UTC)
- I have deleted the old revision rather than hiding it, to see what the bots do about this. Graeme Bartlett (talk) 07:56, 12 September 2022 (UTC)
- Thanks for taking a look at this Graeme. -- Marchjuly (talk) 11:38, 12 September 2022 (UTC)
- Aha, I hadn't thought to try that. FWIW, the image does display ok from the Special:Undelete/File:NWA 74 Anniversary.jpg interface; perhaps unwisely, I attempted to undelete it so that it could be hidden properly, on the theory that it had been fixed, and got the same "inconsistent state" error message as when I tried to hide the file version. Oh well. I am confident that the bots will now leave it be, at least. —Cryptic 15:00, 12 September 2022 (UTC)
- This is a known issue: phab:T291137, phab:T244567. As usual, no interest from the WMF devs in providing a fix. -FASTILY 09:35, 12 September 2022 (UTC)
- Thanks for that info Fastily. -- Marchjuly (talk) 11:38, 12 September 2022 (UTC)
Misplaced BRFA
Hello, I just found Wikipedia:Bots/Requests for approval/Santali MessageDeliveryBot, which was created a couple of days ago but wasn't transcluded onto the main approval page. What's the accepted way of dealing with this kind of thing? The request is for the wrong project. 163.1.15.238 (talk) 14:36, 9 September 2022 (UTC)
- If the bot is meant to operate on the Santali Wikipedia, then the bot operator can add {{BotWithdrawn}} to the request. If the bot is meant to run on the English Wikipedia, the bot operator would follow the instructions at Wikipedia:Bots/Requests_for_approval section II: "Your request must now be added to the correct section of the main approvals page". GoingBatty (talk) 19:54, 11 September 2022 (UTC)
- I concur with the latter part of GoingBatty's statement - there are all sorts of reasons why a BRFA subpage/request would exist but not yet be transcluded onto WP:BRFA. In other words, just leave well enough alone; it's not like it's doing any harm sitting there. Primefac (talk) 09:16, 21 September 2022 (UTC)
bot password
Hello. I have been running a pywikibot (on the Marathi Wikipedia) from Toolforge for a few months now. Using the same bot (which is unapproved on enwiki), I successfully made two edits in userspace (Special:Contributions/KiranBOT_II). But when I try to make edits using the other bot account (which is approved here — User:KiranBOT), I get an error stating "incorrect bot password" even though it is correct. Just to be sure, I changed the bot password at Special:BotPasswords, but I am still getting the same error. Any help would be appreciated a lot. —usernamekiran (talk) 07:55, 24 September 2022 (UTC)
- @Usernamekiran sorry I'm a little lost. You are talking about 2 different accounts, correct? (User:KiranBOT II and User:KiranBOT). Which one is having a log on problem, and on which project? When you log on to the webui as the bot account and go to Special:BotPasswords, you should see the grant you made. Click on the grant and validate the username (it will look like Fluxbot@FluxbotAWB). Make sure you are using that entire part as the username, and it is case sensitive. — xaosflux Talk 14:53, 24 September 2022 (UTC)
- @Xaosflux: Yes, two different user accounts. I'm trying to edit from Toolforge. KiranBOT is approved on both wikis, and KiranBOT II is approved only on mrwiki, but I can perform edits through that account on enwiki as well. For editing through KiranBOT on enwiki, I kept all the format/syntax exactly the same except for changing the username, bot name, and bot password. When I enter the command on the terminal/shell (using PuTTY), I get the message that my password is incorrect, and then it prompts me to enter the password for "KiranBOT@AWB"; even after entering the correct password (I tried both formats), it says the password is incorrect, and the login gets "aborted". Or is there some process that I completely forgot? I went through the pywikibot guide, but it seems like I'm doing everything right. —usernamekiran (talk) 16:07, 24 September 2022 (UTC)
- @Usernamekiran sorry some of these pronouns still seem ambiguous (You have a sentence with 2 usernames in it, and then say "that account"). Exactly which account are you unable to log on to, on which project? For the account you can not log on to via the api, are you able to manually log on to it on that project via the webui? — xaosflux Talk 16:26, 24 September 2022 (UTC)
- @Xaosflux: ouch.
- KiranBOT_II: approved on mrwiki, but not approved on enwiki. I can log in and edit with this bot on both wikis using a bot password.
- KiranBOT: approved on both wikis, just edited on enwiki using the webui. I kept all the settings the same as for KiranBOT II, but still can't log in to KiranBOT. —usernamekiran (talk) 17:30, 24 September 2022 (UTC)
- @Usernamekiran ok, while logged in as User:KiranBOT via webui, go to Special:BotPasswords; go to your grant, just delete it and recreate it. When it is recreated make sure in the "Allowed IP ranges" at the bottom it has:
0.0.0.0/0
::/0
- This is probably the "quickest" way to see if it is working. You can try something besides your script to log on as well (like AWB for example). — xaosflux Talk 17:50, 24 September 2022 (UTC)
- already changed, then deleted/recreated the botpass before posting here. Will login to AWB tomorrow from computer. Ciao. —usernamekiran (talk) 18:19, 24 September 2022 (UTC)
- @Xaosflux: Hi. I don't know what happened, but I logged in to AWB, where I had to enter the username as KiranBOT@AWB. I did not perform any edits at that time. I then tried to edit through Toolforge/terminal again, and without any changes at all, I edited successfully. So I don't understand what that login problem was. The password line in my file is in the ('Fluxbot', BotPassword('FluxbotAWB', 'botpassword')) format. In short, yesterday it was saying the password was incorrect with the exact same settings as today. —usernamekiran (talk) 17:16, 25 September 2022 (UTC)
- @Usernamekiran the only thing that comes to mind right now is that your pywiki instance had a bad cookie that was stuck and may have finally been replaced. — xaosflux Talk 17:21, 25 September 2022 (UTC)
- yes, that could be possible. —usernamekiran (talk) 17:53, 25 September 2022 (UTC)
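(For anyone who lands here with the same symptom, the relevant pywikibot configuration looks roughly like this. The account and grant names are taken from this thread; the password is a placeholder.)

# user-config.py
family = 'wikipedia'
mylang = 'en'
usernames['wikipedia']['en'] = 'KiranBOT'
usernames['wikipedia']['mr'] = 'KiranBOT'
password_file = 'user-password.py'

# user-password.py -- the first element is the plain account name; the
# BotPassword name ('AWB' here) is the grant created at Special:BotPasswords.
# Both parts are case sensitive.
('KiranBOT', BotPassword('AWB', 'botpassword-goes-here'))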
FYI BernsteinBot
BernsteinBot's operator has decided to stop running this bot, it mostly makes reports (Wikipedia:Bots/Requests for approval/BernsteinBot). — xaosflux Talk 14:19, 11 October 2022 (UTC)
Can a BAGger please get around to looking at this request? It has been sitting in the queue for a month and a half at this point. If it needs additional rationale, I am happy to provide. Izno (talk) 21:35, 20 October 2022 (UTC)
- I have added a {{BAG assistance needed}} tag to the request, mentioning this comment. William Avery (talk) 18:19, 24 October 2022 (UTC)
A question
I asked here whether AWB could do this, but I would also love to know if there is any bot that can help me add a template to all the articles in a specific category. I have been searching Toolhub to no avail.
Danidamiobi (talk) 11:57, 28 October 2022 (UTC)
- I have answered your question there. I have also changed your link to link directly to the discussion in question; if this is not desired please feel free to revert me. Primefac (talk) 13:58, 28 October 2022 (UTC)
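(For future reference, this is also a one-screen pywikibot job. A minimal sketch, with placeholder category and template names:)

import pywikibot
from pywikibot import pagegenerators

site = pywikibot.Site('en', 'wikipedia')
cat = pywikibot.Category(site, 'Category:Example category')  # placeholder
for page in pagegenerators.CategorizedPageGenerator(cat, namespaces=[0]):
    if '{{Example template' not in page.text:  # skip already-tagged pages
        page.text = '{{Example template}}\n' + page.text
        page.save(summary='Adding [[Template:Example template]]')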
Cewbot malfunctioning
There is currently an ANI thread regarding Cewbot malfunctioning and demoting around 6,000 articles. ― Blaze WolfTalkBlaze Wolf#6545 19:12, 11 November 2022 (UTC)
- Thanks; it looks like it was blocked, the operator has been contacted, and they are responsive and working on a repair. — xaosflux Talk 22:12, 11 November 2022 (UTC)
Global bot approval request for JhsBot
Hello!
In accordance with the policy, this message is to notify you that there is a new approval request for a global bot.
The discussion is available at Steward requests/Bot status#Global bot status for JhsBot on Meta-Wiki. All Wikimedia community members are invited to participate.
Thank you for your time.
Best regards,
--Martin Urbanec (talk) 23:25, 13 November 2022 (UTC)
- For the record: As far as I can tell, this bot would not do anything on enwiki. It's intended for fixing links from newly-created wikis to Wikidata. Anomie⚔ 01:19, 14 November 2022 (UTC)
BRFA is backlogged
Hey folks, just a friendly reminder that there's a growing backlog at BRFA. Several requests have completed trial period and there are a few in-trial requests that appear to be stale. Additional eyes on these would be appreciated. Courtesy pings: @Cyberpower678, @Enterprisey, @Headbomb, @HighInBC, @MBisanz, @MusikAnimal, @Primefac, @ProcrastinatingReader, @SD0001, @TheSandDoctor, @Xaosflux, @Anomie, @Hellknowz, @Jarry1250, @MaxSem, @Slakr, @The Earwig, @Tawker -FASTILY 02:05, 14 November 2022 (UTC)
- As explanation (not excuse), I generally deal with BAG-related matters on the weekends, and I have been away for the last few. Should hopefully get time this weekend. Primefac (talk) 13:39, 14 November 2022 (UTC)
- IMO more BAG members never hurts. If anyone (maybe you?) is interested in bots and reviewing some BRFAs: Wikipedia_talk:Bot_Approvals_Group#Requests_for_BAG_membership
- (I imagine Primefac will be especially happy to have more reviewers, and a few more hours in his weeks/months ;) ) ProcrastinatingReader (talk) 22:27, 15 November 2022 (UTC)
- I'm sure the dog will be happy for that too :-p Primefac (talk) 08:43, 16 November 2022 (UTC)
Does anyone know why this discussion was not advertised here? I know I would have liked to be made aware of it. Headbomb {t · c · p · b} 19:35, 17 November 2022 (UTC)
- I find it helpful to have {{Centralized discussion}} on my watchlist. – Jonesey95 (talk) 21:08, 17 November 2022 (UTC)
- It was advertised at Wikipedia talk:Bot policy though. ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ (talk) 02:21, 18 November 2022 (UTC)
Unauthorized bot making WP:NOTBROKEN edits
User:Alaa Aly is an unauthorized bot and should be blocked. They are unnecessarily fixing wikilinks to redirects and making edits every 10 seconds, which is not possible for a human. The edit summary also says "Bot edit". RoostTC(ping me!) 09:56, 12 December 2022 (UTC)
- I brought it to Wikipedia:Administrators'_noticeboard/Incidents#Please_block_User:Alaa_Aly_(currently-active_unapproved_bot) which is where you should go for quick intervention of that sort. The bot has been shut down but Alaa Aly would be well-inspired to explain what they thought they were doing. TigraanClick here for my talk page ("private" contact) 10:23, 12 December 2022 (UTC)
Do large language models and chatbots (like ChatGPT) fall under the bot policy?
See these current discussions:
- Wikipedia:Village pump (policy)#Wikipedia response to chatbot-generated content
- Wikipedia:Village pump (policy)#Large language models: capabilities and limitations
There's disagreement as to whether they fall under the bot policy and BAG's jurisdiction. — The Transhumanist 12:05, 14 December 2022 (UTC)
- Seems like you could as well ask "Does Python fall under the bot policy?". Some things done with it do fall under the policy, but it's possible to use it in ways that don't too. Or are we concerned that these AI projects are going to start editing on their own, rather than being copy-pasted by a human or used by a human-written bot? Anomie⚔ 13:01, 14 December 2022 (UTC)
- Agreed. The extent to which BAG has jurisdiction over users' actions depends on the scale and frequency of the operation. Although ChatGPT is a "chatbot", content produced by it is not necessarily under bot policy. 0xDeadbeef→∞ (talk to me) 13:25, 14 December 2022 (UTC)
- I posted a couple times on the thread trying to elaborate, but it's a confusing discussion because it throws together a lot of concepts. It's not so much disagreement there, as people are talking about different things and doing a bit of unintended synthesis. Questions like "does LLM fall under bot policy" are indeed--like Anomie well put it--meaningless without specifics. It's not a blanket binary choice. The term "bot" is also irrelevant because bot policy talks about automated, semi-automated and large-scale manual bot-like editing, regardless of whether you call it "bot", "script" or "very fast copy-pasting". The policy is also deliberately vague because it's impossible to account for every novel way editors manage to mechanize their editing. LLM is just a tool and it falls under the bot policy like any other tool does - when it starts becoming editing beyond what a human could (and, practically, would) review. In practice, large-scale editing gets noticed, issues get discovered, it goes through places like ANI and then ends up here. BAG's jurisdiction is actually very limited beyond BRFA approvals. Any sysop can enforce bot policy, but in practice it's all very much "if it doesn't hurt anyone..." — HELLKNOWZ ∣ TALK 14:31, 14 December 2022 (UTC)
- Yeah, I think I should have used "extent to which the bot policy applies" instead of about BAG jurisdiction which should just be about BRFAs and handing out bot flags. 0xDeadbeef→∞ (talk to me) 14:55, 14 December 2022 (UTC)
- For the most part, I'd say no. The bot policy is largely about how we manage trusted, useful, repeated, automated processes that have community support. Now, should someone want to use such a framework to make contributions - they would need to show that those criteria are all met. — xaosflux Talk 14:59, 14 December 2022 (UTC)
Mistakenly ran semi-automatic edits on bot account
- @Tol thanks for the note; I don't see anything wrong with the actual edits themselves that would require other editors to go give them extra recent changes scrutiny this time - glad you caught this and have a plan to prevent it from reoccurring. — xaosflux Talk 03:47, 18 December 2022 (UTC)
- No problem; I thought (& hope) this would be the most appropriate place to let other people know. Thanks! Tol (talk | contribs) @ 03:54, 18 December 2022 (UTC)
How to make it obvious that you are logged in to your bot account
The post above makes me think that there could be some standard way to set up a bot account so that it is more clear to the bot owner that they are logged in to their bot account instead of their regular account. Do any bot owners use custom CSS or other styling to alert themselves that they are not logged in to their personal account? – Jonesey95 (talk) 14:49, 18 December 2022 (UTC)
- It sounds like the above was done via the API, so CSS wouldn't have helped there. — xaosflux Talk 21:37, 18 December 2022 (UTC)
- I use the Modern skin on my bot account so that I'm reminded it's not my main account on the rare occasions when I do a web login to that account. – SD0001 (talk) 13:11, 19 December 2022 (UTC)
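- (One approach, sketched for the Vector skin; selectors may differ in other skins. Put something loud in the bot account's Special:MyPage/common.css:)

/* User:ExampleBot/common.css -- a deliberate eyesore so it is obvious
   which account is logged in. "ExampleBot" is a placeholder. */
body { background-color: #fff0f0; }
#pt-userpage a { color: #c00000; font-weight: bold; }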
Bots, AWB, and 'crats
The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.
There is a discussion at Wikipedia talk:Requests for permissions § AWB and bot access about the appropriate venue for requesting AWB access for bots. Your input is requested. Primefac (talk) 16:28, 8 January 2023 (UTC)
Remove bot flag from two bots
The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.
- LaraBot (t · th · c · del · cross-wiki · SUL · edit counter · pages created (xtools · sigma) · non-automated edits · BLP edits · undos · manual reverts · rollbacks · logs (blocks · rights · moves) · rfar · spi · cci)
- BernsteinBot (t · th · c · del · cross-wiki · SUL · edit counter · pages created (xtools · sigma) · non-automated edits · BLP edits · undos · manual reverts · rollbacks · logs (blocks · rights · moves) · rfar · spi · cci)
Please remove the bot flags from LaraBot and BernsteinBot. I don't know what the passwords for these accounts are off-hand and I have no intention of using them again. We should independently fix the bot policy, but my request got a bit buried by me noticing that the bot policy is goofy and not in alignment with actual practice. --MZMcBride (talk) 16:01, 20 January 2023 (UTC)
Another bot flag removal request
The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.
- WelcomerBot (t · th · c · del · cross-wiki · SUL · edit counter · pages created (xtools · sigma) · non-automated edits · BLP edits · undos · manual reverts · rollbacks · logs (blocks · rights · moves) · rfar · spi · cci)
As prompted by Izno, please can you remove the bot flag from User:WelcomerBot? The functionality of the bot has been completely replaced within the ACC tool and edits are now done via OAuth instead stwalkerster (talk) 23:31, 20 January 2023 (UTC)
- Done thanks for the note @Stwalkerster:. — xaosflux Talk 23:58, 20 January 2023 (UTC)
AnomieBOT Disruptive bot edits and dismissive operator
- AnomieBOT (t · th · c · del · cross-wiki · SUL · edit counter · pages created (xtools · sigma) · non-automated edits · BLP edits · undos · manual reverts · rollbacks · logs (blocks · rights · moves) · rfar · spi · cci)
User:AnomieBOT has a pattern of "rescuing" references by finding a similarly named reference anchor in the history of an article, or in some related article, and injecting it as a replacement for an undefined reference. Sometimes, this works fine. Other times, it introduces a reference that's completely irrelevant or inappropriate for the context. Of course, to any reader, it just looks like a reference.
As far as I can tell, edits made by this bot aren't reviewed by any human. The bot churns out these edits mechanically, and they're unlikely to be caught unless someone happens along to check the "rescuing" edits the bot has made.
I've cataloged errors made by this bot and others, but received no response from pings in those notes. I have also directly reported these problems to the operator and received a very curt and dismissive reply, inviting me to -- myself! -- review the edits the bot was making.
For sure, the bot is trying to do something very aggressive, and its correctness is hard to verify. But because it is operating unsupervised and trusted to be helpful, it should be held to a high standard. If it can't guarantee its correct behavior, it shouldn't take action at all.
If the bot is making bad edits and the author isn't interested in addressing the issue or even directly monitoring the bot's actions, what can be done? -- Mikeblas (talk) 19:26, 15 January 2023 (UTC)
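(For context, the general shape of the "rescue" technique under discussion can be sketched as follows. This is an illustration of the approach, not AnomieBOT's actual code.)

import re
import pywikibot

def find_orphan_definition(page, ref_name):
    # Look back through the page history for the full definition of a
    # named ref that is now used in the article but no longer defined.
    pattern = re.compile(
        r'<ref name\s*=\s*"?{}"?\s*>.*?</ref>'.format(re.escape(ref_name)),
        re.DOTALL)
    for rev in page.revisions(content=True):
        m = pattern.search(rev.text or '')
        if m:
            return m.group(0)  # candidate definition to re-insert
    return None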
- @Mikeblas the page you linked to shows rare, rather stale, edits. Is this bot currently making bad edits (i.e. an edit that would be reverted immediately if it were made by a non-bot editor)? Please provide a few very recent diffs. — xaosflux Talk 19:30, 15 January 2023 (UTC)
- Operator notified of discussion. — xaosflux Talk 19:33, 15 January 2023 (UTC)
- I've linked several pages. You must mean the catalog? "Directly reported" references an edit I fixed this morning, along with another that I brought to the operator's attention last month. This issue is ongoing and has been happening for at least a couple of years, which is part of the point. -- Mikeblas (talk) 19:35, 15 January 2023 (UTC)
- @Mikeblas please provide a few recent diffs showing "disruptive" edits here so we are all on the same page. Don't need a big explanation right now, just a few diffs from this year will be fine. — xaosflux Talk 21:51, 15 January 2023 (UTC)
- Here are the ones I linked above, explicitly copied here
- Here's one from Harry Styles: irrelevant reference injected
- From Anuradha Koirala: user erroneously edited reference name; AnomieBOT claims a fix, but just removed a space from the bogus text and copied the bogus text to a subsequent invocation of that ref
- Again from Anuradha Koirala: disrupts reference name again; AnomieBOT fixes by removing some spaces but the error remains; user correctly removes bogus reference def
- From Iranian Revolution: user copies text with references (but not reference definitions) from another article; AnomieBOT arbitrarily copies an irrelevant reference previously deleted from this article; Proper fix gets reference from source of original copy
- Here are some new ones:
- From Lewisville, Texas: user removes reference for 2019 stats, adds 2021 stats without references; AnomieBOT replaces the 2019 reference, which doesn't support the 2021 numbers
- At Lviv Oblast: user partially updates population and reference-generating template causing undefined ref error; AnomieBOT replaces missing reference with a dead link to outdated stats from another article that doesn't support the content
- At List of best-selling albums of the 21st century in the United Kingdom: user mistakenly (?) removes several entries, references; AnomieBOT replaces only the reference to help the mistake (?) blend in
- At Steamboat Willie: user removes IMDB as a non-RS, but incompletely; AnomieBOT replaces it
- At Snoqualmie Valley School District: user updates stats with undefined source; AnomieBOT places reference for stats -- from a completely different school district
- At 2023 Nigerian Senate election: user tries to update, provides results without defining reference; AnomieBOT places reference from a different Nigerian state, no mention of this state and no support for this state's election results
- At 2023 24 Hours of Daytona: User tries to add driver and team, gives bad reference; AnomieBOT chooses a reference from another article that mentions neither that driver, nor this race. The user probably meant "TurnerBMWLineups".
- At Crystal Mountain, Michigan: user slams an update by changing stats, then renaming the reference anchor name; rescue just slams another use of the old name to the new name, so the new stats are falsely referenced and the old stats have an anachronistic anchor name
- At Indian Armed Forces: user forces an update with new stats but no references, just renames existing reference anchor; AnomieBOT finds another use of the old reference name and updates it to match the new one, even though it still refers to the old stat
- At Holmdel Township, New Jersey: User tries to update population and density, makes a mistake or two; AnomieBOT replaces partially deleted references with old links that don't support the 2019 population they claim to reference, probably should've deleted the invocation of the reference and the outdated values instead
- -- Mikeblas (talk) 04:08, 16 January 2023 (UTC)
- Also, thanks for notifying the operator. The header here says "by following the steps outlined in WP:BOTISSUE", but I didn't see any specific steps (certainly not any kind of numbered list), and wasn't sure if notification was required or how it was required to be done. -- Mikeblas (talk) 19:37, 15 January 2023 (UTC)
- I note Mikeblas has been fairly tendentious on this topic; for some time he was in the habit of edit-summary pinging me every time he reverted an edit by the bot that he did not like, even simple reversions of the bot's edit concurrent with reversion of IP vandalism. This is also the first I've heard of this "User:Mikeblas/Robots Behaving Badly" page, nor do I see any evidence that I was ever pinged on that page. Overall the tenor of complaints I've seen from this user suggests to me that they expect human-level AI from the bot, which is IMO an unreasonable request. The request here and at the bot's talk page to "stop the bot" or convert it to a semi-automated task similarly seems far beyond what is reasonable considering how often other editors have specifically thanked me or the bot for the service it provides. On the other hand, I have been considering looking into the error rate related specifically to orphaned references rescued under Wikipedia:Bots/Requests for approval/AnomieBOT 6, to decide whether that specific portion of the task should be stopped or modified to also check for the source article containing text similar to adjacent text in the article being processed. I haven't yet found time to actually conduct this analysis or look into the feasibility of such modification. If anyone unbiased (e.g. not Mikeblas) feels like conducting that analysis, I'd appreciate the data. Anomie⚔ 20:01, 15 January 2023 (UTC)
- I usually mention whomever's work I'm editing in edit summaries. This gives them the opportunity to review my revisions of their work and re-check my changes. I also think that Wikipedia will automatically notify a user when their contributions are reverted or undone.
- My expectation for the bot (and any other, as well as any other user) is that they'll do what's possible to avoid making regressive or disruptive edits. Mistakes happen when people make edits, and we forgive those and try to repair them; but we do rely on policies to decide what to do when that behaviour isn't corrected or appears malicious. Bots are held to a higher standard because we expect them to do work for us, and correct them promptly if they're doing something wrong.
- Aside from the inappropriate reference insertion I raise here, I often discover the bot is cleaning errors in vandalism edits rather than cleaning the vandalism itself. And so it only follows that the bot's edits are reverted along with the vandalism.
- This bot is somehow (I don't know how) tasked with "rescuing" referencing errors. This task itself requires far more than simple algorithmic replacement or reaction, and instead demands that the bot know what a reference means in the context of the article, gauge its propriety, understand the previous editor's intention about editing or removal, choose an appropriate replacement (or other action), evaluate whether it makes the situation any better, and so on. Nothing like fixing a date format or deleting hyphens, or even rebalancing parentheses. The "rescuing" task is an ambitious order, but certainly is not something that I established or expected. OTOH, since the robot is trying to do it, anyone should expect it to do it correctly and reliably.
- Fact is this bot often violates that standard and I've provided feedback to that effect. Perhaps worse, it's making edits that have subjective results (or, at least, unclearly defined outcomes) and its efficacy and safety aren't being monitored or checked. In what percentage of its edits is it being productive or disruptive? We don't objectively know. Maybe I'm wrong in that, but I don't think it's a tendentious position.
- When pointing out these errors, I've been met with excuses (just responding to "garbage input") and dismissal, as in the current examples. That the bot's owner actively explains they're not listening to me (biased? because I've reported and documented problems?) and themselves won't monitor the bot at all unless someone else volunteers to do it indicates they're not a responsible bot operator.
- Letting the bot make automated edits, dismissing and ignoring feedback, and "considering looking into" problems until some volunteer offers to own the task on their behalf doesn't seem like a responsible way to run a bot. Instead, the position should be to not release wide-spread active behaviour until that behaviour can be verified reasonably correct. Or, to set aside their own bias and face the realization (and feedback) that their goal is too ambitious to be completed reliably without human intervention or any monitoring of its outcome and instead abandon it. -- Mikeblas (talk) 20:57, 15 January 2023 (UTC)
- You have just posted another 513 words of generic complaint without the diffs of recent problems that you were asked for above. Johnuniq (talk) 01:05, 16 January 2023 (UTC)
- After reviewing the so called 'bad' edits, I can't say I see any problem with them. AnomieBot is functioning exactly as intended. E.g. someone removes the 'declared' instance of a reference that's reused multiple times in an article without removing all references to it, causing issues like [18]. AnomieBot then rescues the reference, solving the issue. This is neither disruptive nor regressive. Headbomb {t · c · p · b} 05:45, 16 January 2023 (UTC)
- That specific edit is a valid complaint, and is why I'm considering disabling or changing the AnomieBOT 6 task. The problem is that the generic reference name "Canada" was used for an unrelated reference in just one linked article; no idea where the IP got the reference name from in Special:Diff/1124983363. One of the other linked complaints is the same. OTOH, other of the complaints are less valid. For example Special:Diff/1102679930 did fix an error, just not in the "right" way; in Special:Diff/1133658542 Mikeblas would rather have had the orphaned copy removed instead of rescued; and in Special:Diff/1133439631 he'd rather the IP edit had been reverted (I guess). Anomie⚔ 13:22, 16 January 2023 (UTC)
- I'll once again dutifully AGF in the face of the sneering response and spell out the details:
- Special:Diff/1102679930 made an error message go away, but didn't fix or objectively improve anything. It's pretty obvious that the original root cause was the addition of article prose to the name of a reference anchor. Instead of making the prose visible, the bot codified the error by reusing the article prose in other invocations of the anchor:
<ref name="Anuradha Koirala named CNN Hero of the Year 2010 In 1993, Maiti Nepal started with two rooms to protect women from abuse and trafficking. After establishing Maiti Nepal, Ms. Koirala plunged into the service of humanity. Her first work was setting up a home so that women and girls who have nowhere else to turn could find themselves a place to call theirs. After almost three decades today, Maiti Nepal has one Prevention Home, sixteen Transit Homes, two Women Rehabilitation Homes, one Child Protection Home, two Hospice Centers, one Information and Surveillance Center at Ministry of Foreign Affairs (MoFA) and a Formal School (Teresa Academy). More than 1000 children receive direct services from Maiti Nepal every day. All of these were possible because of Ms. Koirala’s firm determination and unprecedented leadership." />
- Ignoring the more obvious cause, WP:ILCLUTTER explains that long anchor names aren't appropriate. Why would the bot enforce them? Now, a human must come along and clean up both the robot's meddling and the original error.
- In Special:Diff/1133439631, the anonymous edit was indeed unreferenced -- the new 2020 stats cite a reference from 2010 that cannot verify them. AnomieBOT abetted the addition of unreferenced material by correcting the newly mismatched anchor name. Why is it acceptable to update objective statistics but not their sources?
- In Special:Diff/1133658542, a user was trying to remove a reference to IMDB, which is listed at WP:USERGENERATED as a user-generated source. But AnomieBOT insisted on keeping the reference and subverted their edit.
- In every one of these cases, AnomieBOT demonstrates that it does not (because it largely cannot) consider user intent or surrounding context when making edits. Instead of realistically acknowledging that it doesn't have a high probability of making a constructive edit, it brashly assumes it knows best and takes action without regard to the outcome.
- I strenuously object to this pattern. (And, sorry for the awkward anthropomorphic wording. I just can't figure any better way to write it just now.) -- Mikeblas (talk) 16:37, 16 January 2023 (UTC)
- In the Harry Styles example, the article claimed the Harry's House album was certified platinum by Music Canada, but used an undefined reference anchor name to do so. AnomieBOT inserted a new reference to an article about Amazon using Alexa to bring its Prime Music service to Canada -- with no mention of Harry's House or its certifications, or even of Music Canada itself.
- Can you help me understand why you think mechanically adding completely irrelevant references is acceptable? -- Mikeblas (talk) 16:05, 16 January 2023 (UTC)
I have provided the requested examples and answered other questions, but haven't heard anything back. What are the next steps? -- Mikeblas (talk) 18:50, 21 January 2023 (UTC)
- Walk away and find an issue that can actually be solved? No bot is perfect, and expecting perfection is unreasonable. Your primary concerns have been responded to, and now you are just nitpicking. Primefac (talk) 19:25, 21 January 2023 (UTC)
- It's hard for me to see either the injection of irrelevant references or the automated undoing of user-intended beneficial edits as "nitpicking". 130-word reference anchor names aren't any kind of "fix".
- The solution is easy: if a bot can't make edits reliably, then it shouldn't attempt them ... particularly when no human is actively monitoring the edits to try and improve the behaviour of the bot.
- Let's try it another way: why is your position that the status quo can't be improved upon at all, and that asking for improvement is to be met with outright dismissal? Why isn't it important to consider that bots could be made better -- both more accurate and less invasive? -- Mikeblas (talk) 23:31, 21 January 2023 (UTC)
- Bots can be made better - to a point. Errors, especially when dealing with this particular subject, are inevitable. Most of the examples you give in the "Here are some new ones" list are GIGO situations; predicting, let alone fixing, these types of errors is unreasonable. Additionally, the point of this bot is not to evaluate orphaned references, it is to fix them, so the argument that "restoring non-RS isn't acceptable" is also unreasonable (specifically w.r.t. the IMDb ref you mention). The "horribly long named reference" issue also falls under that umbrella: it's not the bot's job to evaluate the names of refs.
- Out of curiosity, I looked at the bot's contributions for the period you pulled your examples from (13-15 January). Out of 172 orphaned references that were fixed, you listed 10 -- a 5.8% failure rate, which to me is perfectly acceptable for a task such as this.
- So yes, the status quo can be improved, to a point. Asking for improvement is acceptable, to a point. Telling a bot operator to
turn off the robot until its correctness can be verified
(source) when it is already operating at a 94.2% accuracy rating is not acceptable. To paraphrase/build on what Anomie said on the bot's talk page - if there is a specific or often-repeated issue that needs to be looked at, that's one thing, but "this bot isn't perfect so shut it down" is not. Primefac (talk) 13:19, 22 January 2023 (UTC)
- It's hard to overcome this defeatist attitude -- that the problem can't be fixed, that the errors are inevitable, or that these issues are "just GIGO". They aren't unfixable, obviously: turning off the bot stops the errors completely. Of course, the bot does make desirable changes, and makes them reliably. My suggestion is to turn off the unpredictable parts until they can be made acceptably reliable.
- GIGO is a cop-out. Properly-programmed computers evaluate their input; in the face of invalid input, good software raises an error and takes no further action. If we truly think these are GIGO cases, then we're describing a bot that doesn't bother to validate its input and acts regardless -- or are we describing a bot that does evaluate its input, finds it invalid, and still acts anyway? A great improvement would be to do away with the unconditional action and instead either evaluate further, or quit trying to do the impossible when the change is so likely to be debatable.
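For illustration only -- a hypothetical sketch in Perl (the bot's implementation language), not AnomieBOT's actual code, with the routine name and the 100-character limit invented for the example -- "validate, or raise an error and take no action" looks something like this:
    sub fix_ref_if_valid {
        my ($name) = @_;
        # Refuse input that is clearly not a sane reference name.
        if (!defined $name || length($name) > 100 || $name =~ /\n/) {
            warn "Skipping edit: implausible reference name\n";
            return 0;   # raise the error; take no further action
        }
        # ... only now would the bot proceed with its edit ...
        return 1;
    }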
- Here, I was asked to provide "a few recent diffs", and I did. I did not set out to comprehensively evaluate all the edits made in a particular period; had I tried to, I'd undoubtedly have found more problems than I reported and driven your computed error rate higher. Note that nobody -- most irresponsibly, not even the bot operator -- has evaluated the accuracy rate either, and that's another thing I suggest be done.
- But why would you deliberately misapply my list in order to compute statistics in this way?
- I also wonder: why is it your position that any bot should be free to run before its correctness can be verified? That seems contrary to everything we know about automation in IT. Yet on the same talk page, the bot's owner {{diff2|1125613668|1125568193|dismissively invited me to monitor the bot for problems}} rather than consider any solution or monitor their bot's edits themselves. -- Mikeblas (talk) 04:32, 23 January 2023 (UTC)
- Do you even know how BRFA or bots operate? Every bot task gets evaluated before it is approved, and you can't just "turn off the unpredictable parts" of a bot's code. Code can be updated if something specific is found to be a problem (see for example my Task 17 and its code updates), but not everything can be planned for or prevented. As I believe multiple BAG members have said in this discussion - if you find a repeated, often-problematic bug, it can likely be fixed, but 100% is simply not feasible, and 90-95% is a pretty decent target to shoot for. Additionally, Anomie has said they will look into some of the coding in the Task 6 expansion. Stop expecting bot operators to bend over backwards to fix these edge cases and you might actually get somewhere; flies, honey, etc. Primefac (talk) 09:37, 23 January 2023 (UTC)
- These aren't edge cases, and I've provided plenty of examples. Is the code for AnomieBOT publicly available?
- Meanwhile, you've decided to ignore the inconvenient questions I've asked you. -- Mikeblas (talk) 14:31, 23 January 2023 (UTC)
- Feel free to ask it again if I've missed a question that still needs a reply, instead of playing coy. Primefac (talk) 14:33, 23 January 2023 (UTC)
- Nothing coy here: the questions are above, and you know which ones you didn't answer. To me, the most interesting ones -- about your misinterpretation of my list of issues as an exhaustive evaluation, and about your expectation that bots can run before they're demonstrated correct (and, further, that raising concerns and evidence about bots making errors "is unacceptable") -- are still outstanding.
- You also didn't answer my question about the availability of source code. But I found it anyway!
- After reviewing the code for just a few minutes: it looks like OrphanReferenceFixer is the "task" that implements some of the dubious edits. I'm not much of a Perl guy, but it seems the bot starts in bot-instance.pl by reading all the tasks in the supplied tasks directory.
- It seems, then, that we could remove this task just by removing its package code from the tasks directory -- or by adding code to the enumeration of that directory that finds that file and skips creating the task object for it.
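For instance, assuming the enumeration is an ordinary directory read (a sketch only -- $tasks_dir and load_task are stand-ins, not the real identifiers in bot-instance.pl):
    my %skip = ('OrphanReferenceFixer.pm' => 1);
    opendir(my $dh, $tasks_dir) or die "Can't open $tasks_dir: $!";
    for my $file (readdir $dh) {
        next unless $file =~ /\.pm$/;    # only consider task modules
        next if $skip{$file};            # skip creating this task object
        load_task("$tasks_dir/$file");   # stand-in for the real loader
    }
    closedir $dh;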
- Looks like each task has an "approved" method that returns -- well, maybe there's some way to modify "approved" for the OrphanReferenceFixer task to have it not run without actually removing the code. But I couldn't find code that uses a task's "approved" value. (Maybe that's because I'm surfing a bunch of web pages instead of browsing a directory full of files, so searching isn't so facile.) We could probably also disable a single task by returning failure from its init() method.
- Maybe we want to keep the ORF task but modify its behaviour more granularly. Within the task, it's pretty easy to see that the more direct replacements are done with a series of regex replacements in process_page. After the regexes are applied, I don't see any checks that test the results of the change. That is, anything that trips up the regexes will produce undesirable output, and the code won't notice. This is how we end up with reference anchor names that are 600-something characters long.
- This code could check for new errors after its changes are applied and abort its intended change if new ones appear; a sketch of such a check follows. A check like that would help make the code resilient against input it wasn't prepared to work with. And it's at this point in the code that the "it's just GIGO" argument falls over: the code tests neither the validity of its input nor that of its own output. GIGO is avoidable.
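A minimal sketch of such a self-check (hypothetical -- process_page's real structure may differ, save_edit is a stand-in, and the 100-character limit is an arbitrary illustrative threshold):
    sub result_looks_sane {
        my ($newtext) = @_;
        # Scan every reference name in the transformed wikitext.
        while ($newtext =~ /<ref\s+name\s*=\s*"([^"]*)"/g) {
            return 0 if length($1) > 100;   # suspiciously long anchor name
        }
        return 1;
    }
    # After the substitutions produce $newtext (sketch):
    # save_edit($page, $newtext) if result_looks_sane($newtext);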
- These regex substitutions could be conceptually simple, but as implemented they're quite involved.[19] More complicated still is the reference-copying code -- which, we find, sometimes injects irrelevant references and unconditionally claims victory. This behaviour could be selectively disabled by not executing the code starting at the loop commented # Any orphaned refs?. And so I think your claim that we
can't just "turn off the unpredictable parts" of a bot's code
is entirely false. -- Mikeblas (talk) 15:26, 23 January 2023 (UTC)
- If I had known which question I hadn't answered, I would have answered it, and you still haven't told me, so I am done with this farcical conversation. Ask for a question and get eight paragraphs of response... jeez. Primefac (talk) 19:39, 23 January 2023 (UTC)
- I directly answered:
To me, the most interesting ones -- about your misinterpretation of my list of issues as an exhaustive evaluation, and about your expectation that bots can run before they're demonstrated correct (and, further, that raising concerns and evidence about bots making errors "is unacceptable") -- are still outstanding.
-- Mikeblas (talk) 23:52, 24 January 2023 (UTC)
- What is the error rate; how bad is it? No one expects 100% perfection given the great benefit of the bot, but we might expect at least 95% accuracy or so, and be quite concerned if it were 80%, because the 80/20 Rule suggests the first 80% is easy and will be error-free while the remaining 20% is very hard -- does it at least get beyond the 80/20 Rule? IMO any bot that is 80% accurate should not be running. -- GreenC 14:38, 22 January 2023 (UTC)
- That's part of the problem -- we don't know, since nobody is monitoring the bot for correctness. This bot is very active, so manually tallying its actions would be quite a chore. The "some new ones" list I present above spans about 25 hours, but I certainly did not exhaustively examine every edit the bot made in that time span. -- Mikeblas (talk) 04:04, 23 January 2023 (UTC)
- I don't have the time to read all of the above, but I skimmed it and have one drive-by comment: it seems to me that an article this bot task has to edit already contains an error (the orphaned reference) or some very weird wikitext that confuses the bot. Without the bot's edits, far more articles would contain errors. Some of the nitpicks, like internal reference names containing vandalism obscured by a bot edit, are far less problematic than that. It's a task that's been running for over a decade with success; I'm not convinced a strong case has been made here for the approval to be revoked or modified. ProcrastinatingReader (talk) 15:51, 23 January 2023 (UTC)
- I think people would like to believe that, but there's no collected evidence to support it. That is, I don't think there's a strong case for the comprehensive claim that these tasks are running "with success". Meanwhile, not thoroughly reading the evidence hardly counts as upholding your duties as a BAG member to address concerns raised about a bot and its operator. Doesn't your voicing a decision about the same mean that you've ignored the presented facts and are acting on bias instead of reason? -- Mikeblas (talk) 23:51, 24 January 2023 (UTC)
Doesn't your voicing a decision about the same mean that you've ignored the presented facts and are acting on bias instead of reason?
Or discarding them because they don't think they're sufficiently valid to be of concern. Please try to avoid presenting false dichotomies.
- At this point, with multiple BAG members, including the bot author, responding in the negative to changing how the bot functions in the way you would prefer, you should consider a WP:RFC. Otherwise, you should drop the stick. I doubt an RFC would resolve in the way you would prefer either, but it does remain an available option, and who knows, maybe people would indeed prefer that the ref-saving task be turned off. Izno (talk) 00:25, 25 January 2023 (UTC)