Wikipedia:Bots/Requests for approval/Lightbot 13
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.
Operator: Lightmouse (talk · contribs)
Automatic or Manually assisted: Automatic supervised
Programming language(s): AWB, monobook, vector, manual
Source code available: Source code for monobook or vector is available. Source code for AWB will vary but versions are often also kept as user pages.
Function overview: Janitorial edits to units that contain at least one unit of length, area, or volume e.g. 50 mph, 30 cubic feet per minute.
Links to relevant discussions (where appropriate):
This request duplicates the 'units of measure' section of Wikipedia:Bots/Requests for approval/Lightbot 3. That BRFA was very similar to the two previous approvals: Wikipedia:Bots/Requests for approval/Lightbot and Wikipedia:Bots/Requests for approval/Lightbot 2. This is an extension of the existing approval Wikipedia:Bots/Requests for approval/Lightbot 5.
Lightbot 5 is permitted to convert miles, but not miles per hour. It is permitted to convert cubic inches but not fluid ounces. This request seeks to correct that. A relevant guideline is at:
- mosnum - Unit symbols "Where English-speaking countries use different units for the same measurement, follow the "primary" unit with a conversion in parentheses."
The guideline is stable and has existed in various forms for a long time. Other editors and I have done many edits along these lines over a long period. Examples of such conversions exist in the contributions list but it would be easier just to demonstrate with new edits.
Edit period(s): Multiple runs. Often by batch based on preprocessed list of selected target articles.
Estimated number of pages affected: Individual runs of tens, or hundreds, or thousands.
Exclusion compliant (Y/N): Yes, will comply with 'nobots'
Already has a bot flag (Y/N): No
Function details:
For units that contain at least one unit of length, area, or volume:
- Edits may add conversions to units e.g. "50 mph" -> "50 miles per hour (80 km/h)"
- Edits may edit the format or spelling e.g. "5 gal(US)" -> "5 US gallons (19 L)"
- Edits may add or remove links e.g. "50 miles per hour (80 km/h)" -> "50 miles per hour (80 km/h)". This will be in accordance with Wikipedia:Link#What_generally_should_not_be_linked.
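As an illustration of the first bullet only, here is a minimal Python sketch of turning "50 mph" into "50 miles per hour (80 km/h)". It is not the bot's actual AWB or monobook code. The function name `add_mph_conversion`, the regular expression, and the choice to round to the nearest whole km/h are all assumptions made for this example. Ranges and comma-grouped numbers are deliberately left untouched.

```python
import re

# Exact factor: one international mile is 1.609344 km.
MPH_TO_KMH = 1.609344

# Hypothetical pattern: a plain number followed by "mph".
# The lookbehind skips digits that belong to a longer number such as "1,250".
# The lookahead skips values that already have a parenthesised conversion.
_MPH = re.compile(r"(?<![\d.,])(\d+(?:\.\d+)?)\s*mph\b(?!\s*\()")


def add_mph_conversion(text: str) -> str:
    """Spell out 'mph' and append a km/h conversion rounded to a whole number."""

    def repl(match: re.Match) -> str:
        value = float(match.group(1))
        kmh = round(value * MPH_TO_KMH)
        unit = "mile per hour" if value == 1 else "miles per hour"
        return f"{match.group(1)} {unit} ({kmh} km/h)"

    return _MPH.sub(repl, text)


print(add_mph_conversion("The limit is 50 mph."))
```

Text that already carries a conversion, such as "50 mph (80 km/h)", passes through unchanged, so a second run over the same article should not stack conversions.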
Discussion
Please can we move to a 50 edit trial? Lightmouse (talk) 10:28, 16 April 2011 (UTC)
Some questions -
- How will you determine the precision of the conversion to use?
- Where the unit is ambiguous (e.g. "gallons") how will you determine which is meant?
- Why would you change the format of units, and what would you do if there was an objection to a change you make?
- What will you do if you find an article that uses a mix of metric and imperial units? I've seen this, including ones that use the convert template with 'imperial (metric)' and 'metric (imperial)'. See for example Minimum gauge railway; other articles about railway gauges will be similar as some are defined in imperial and others in metric. I understand it is standard practice on railways in Northern Ireland to measure distances in miles and metres - what will your bot do if it encounters this? Thryduulf (talk) 12:14, 23 April 2011 (UTC)
- Precision: The 'convert template' will usually be used with default precision. If you want to know more details about how it works, feel free to ask at Template talk:Convert. In many cases, the output precision matches the input, or differs by ±1 significant figure. With the template conversion in place, it's easy for an editor to adjust precision.
- Ambiguous units. In many cases of 'ambiguous units', there is no ambiguity in the context. For example, it's almost always easy to see when the author writes 'gallon' but means the US gallon because it's in a US article about a US topic using US sources. Ambiguity will be avoided where the ambiguity is real.
- Reasons for format change. It's not possible to use the convert template *without* adopting a standard format in accordance with guidelines. Non-template and template conversions will be (as far as I know) consistent. If somebody disagrees with the format used by the convert template, then the convert template will have to come out. But the issues are usually trivial or esoteric for example the addition of 'US' to gallon and/or the use of upper and lower case. From time to time, new variations on these issues do crop up and I've started many discussions myself in the relevant guideline pages following feedback on a conversion.
- Mix of metric and non-metric. The bot isn't designed to resolve mixed units and has no provision for it in the code. Articles often contain primary metric alongside primary non-metric. Sometimes it's for a good reason (such as mixing miles and metres on transport, as you suggest, which I'm aware of), sometimes it's not. Over the years using automation and seeing lots of articles pass in front of me, I have noticed suboptimal unit sequences and responded with human edits in either the Lightbot or Lightmouse accounts. So the option is useful, but is a low priority for Lightbot.
- {{BAGAssistanceNeeded}} Please can we move to a 50 edit trial? Lightmouse (talk) 09:52, 25 April 2011 (UTC)
- Small point: In the United States, the 'United States' is most commonly abbreviated "U.S." (with dots). Therefore, if you are editing only American articles/topics, it should probably use the more common, U.S.
- I am in favor of determining the meaning of which gallon is meant. Ambiguity would be real in non-American articles. I'd stay away from (or at least be very, very aware and cautious of) any articles, like British Empire/Commonwealth related, where the imperial gallon is most likely meant. Twelve miles from my house, when someone says gallon without saying U.S., they mean imperial gallon. Something to think about... It will be interesting to see the trial runs. —MJCdetroit (yak) 02:36, 1 May 2011 (UTC)
- 'U.S. gallon' versus 'US gallon': The Manual of style states 'US gallon' and the convert template complies. I'll do the same. I really don't care either way as long as it's unambiguous and consistent.
- Regional variation/overlap for gallon: Ironically, this is why work is required to eliminate the ambiguity for readers. The automation is required to simplify the work but it will be the human that makes the decision. It hasn't been my main focus but over the years I've done enough to know that the 'gallon' is highly suitable for human-overseen-automation (either as Lightbot or Lightmouse). Note that there is also a time dimension: references to 'gallon' in the British Empire/Commonwealth outside North America have declined massively as the metric system has been adopted. It's easy to worry too much about the ambiguity, but in practice the naked gallon can almost always be specified beyond reasonable doubt. This is trivial to demonstrate. As you suggest, I'll certainly avoid making a decision where the time-frame or region makes the ambiguity difficult to resolve.
As you say, a trial will demonstrate how it can be done. Please can we proceed to a trial without further delay? Regards Lightmouse (talk) 14:17, 1 May 2011 (UTC)
- The manual of style was written in part by many people and I know that outside of the United States the abbreviation US is more popular. I remember when we wrote that. People didn't want 'U.K.' and didn't want 'UK' with 'U.S.', therefore US was used (see WT:MOSABBR). Convert complies with U.S. as well: {{convert|10|u.s.gal|impgal}}-->10 U.S. gallons (8.3 imp gal). Good Luck. —MJCdetroit (yak) 14:53, 1 May 2011 (UTC)
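The split described above, where automation handles the explicit cases and a human decides the naked gallon, could be sketched roughly as follows. This is an assumption-laden Python illustration, not Lightbot's code: the function name `classify_gallon`, the patterns, and the `"needs-review"` label are all hypothetical.

```python
import re
from typing import Optional

# Hypothetical patterns. "US" is matched case-sensitively so that ordinary
# prose such as "gave us gallons of water" is not mistaken for a qualifier.
_US = re.compile(r"\bU\.?S\.?\s*[Gg]al(?:lons?)?\b|\b[Gg]al\s*\(US\)")
_IMP = re.compile(r"\bimp(?:erial)?\.?\s*gal(?:lons?)?\b", re.IGNORECASE)
_BARE = re.compile(r"\bgal(?:lons?)?\b", re.IGNORECASE)


def classify_gallon(text: str) -> Optional[str]:
    """Return 'US', 'imperial', 'needs-review', or None if no gallon is found."""
    if _US.search(text):
        return "US"
    if _IMP.search(text):
        return "imperial"
    if _BARE.search(text):
        # An unqualified gallon: leave the decision to the human operator.
        return "needs-review"
    return None


for sample in ["5 gal(US)", "10 U.S. gallons", "8.3 imp gal", "5 gallons"]:
    print(sample, "->", classify_gallon(sample))
```

The point of the sketch is only that a bare "gallon" is never resolved automatically; article region and time-frame, which the discussion treats as the real evidence, are not modelled here.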
It doesn't surprise me. I think the following quote is apt for style guides:
- Laws, like sausages, cease to inspire respect in proportion as we know how they are made. [1]
Regards Lightmouse (talk) 15:50, 1 May 2011 (UTC)
- Recused MBisanz talk 01:49, 4 May 2011 (UTC)
Approved for trial (50). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Task is a suitable expansion of Lightbot 5. Note that a successful trial will not necessarily result in approval, however, since I still have some reading to do with the ARBCOM mess, and I'm also wondering if Lightbot 5 should have been approved at all... But let's trial it for now. Headbomb {talk / contribs / physics / books} 15:47, 4 May 2011 (UTC)
Trial complete. Lightmouse (talk) 22:38, 5 May 2011 (UTC)
- Yes. See Trial edits. Edit summary is 'Fix units that include length, area, volume' Lightmouse (talk) 09:30, 10 May 2011 (UTC)
Seems all in order. However, here [2] it missed a conversion for 0-60 MPH. Not really a problem, but should be handled before approval nonetheless (this way, the bot doesn't have to run twice over the same articles). Could you add fixes such as km² → km2 per the last bullet of Wikipedia:MOSUNIT#Unit_symbols? Headbomb {talk / contribs / physics / books} 19:18, 20 May 2011 (UTC)
- Ranges like '0-60 MPH', '0 to 60 MPH', 'between 0 and 60 MPH' are a big challenge. It's not just mph, it's all units. There are so many variations of how ranges are expressed that I spend a lot of effort trying to avoid false positives and changes that look odd. It's a lot more work to make a change that looks good. I don't think I can deliver high performance on ranges right now. Could I appeal to you and your colleagues to allow me to tackle ranges later? Lightmouse (talk) 21:59, 20 May 2011 (UTC)
- With respect to km² → km2, I agree. It's tackled by default in the template. I'll update the code and that should address many of the non-template instances in text. Lightmouse (talk) 22:03, 20 May 2011 (UTC)
- When I come across the likes of "0 to 60 mph in 10 seconds" I do this:
{{convert|0|to-|60|mph|abbr=on}} in 10 seconds
JIMp talk·cont 22:55, 13 June 2011 (UTC)
Yes. The correct convert template isn't the issue. The issue is correctly detecting target text while avoiding false positives. Believe me, I'd prefer to have a static Wikipedia and a single process and a single piece of code. My conservative approach is to do the easy stuff (e.g. unambiguous units, non-ranges) together where possible. The scope of that expands with experience. Special issues like ranges, ambiguous units, grouped units are best done in special runs where code and human can be optimised to reduce risks.
- We're in the desirable position of discussing task priorities and permutations within the bot scope. In order to get started, can we move to approval? Lightmouse (talk) 16:09, 14 June 2011 (UTC)
- The scopes of Wikipedia:Bots/Requests for approval/Lightbot 5 (already approved) and Wikipedia:Bots/Requests for approval/Lightbot 10 (the addition of 'inch') are entirely contained within this application. If this application is approved, Lightbot5 and Lightbot10 will be withdrawn. Lightmouse (talk) 10:14, 15 June 2011 (UTC)
Approved. Headbomb {talk / contribs / physics / books} 16:24, 15 June 2011 (UTC)
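The conservative approach described in this comment (handle plain values, leave ranges for a special run) might look something like the Python sketch below. It is only an illustration under stated assumptions: the function name `safe_targets` and the three range forms it recognises are taken from the examples in this discussion, and real article text will contain many more variations.

```python
import re

_NUM = r"\d+(?:\.\d+)?"

# Hypothetical range forms, based on the examples quoted in the discussion:
# "0-60 MPH", "0 to 60 MPH", "between 0 and 60 MPH" (plus an en dash variant).
_RANGE = re.compile(rf"{_NUM}\s*(?:-|–|to|and)\s*{_NUM}\s*mph\b", re.IGNORECASE)
_SINGLE = re.compile(rf"{_NUM}\s*mph\b", re.IGNORECASE)


def safe_targets(text: str) -> list:
    """Return mph values that are not part of a recognised range expression."""
    range_spans = [m.span() for m in _RANGE.finditer(text)]
    targets = []
    for m in _SINGLE.finditer(text):
        inside_range = any(
            start <= m.start() and m.end() <= end for start, end in range_spans
        )
        if not inside_range:
            targets.append(m.group(0))
    return targets


print(safe_targets("It does 0-60 MPH in 10 seconds and tops out at 120 mph."))
```

A range form that the pattern does not anticipate would still slip through as a plain value, which is exactly the false-positive risk the comment describes.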
- The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.