Module talk:Citation/CS1/Archive 10
This is an archive of past discussions about Module:Citation. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
Checking for invalid |lccn=
This one won't come up very often, but an invalid |lccn= can make it so that the automatic link to lccn.loc.gov does not work. I came across one in a citation today. I'm guessing that there are a few dozen invalid LCCN parameters in all of WP, but they should be easy to detect.
Here is a straightforward explanation (scroll to "identifier-syntax") of valid LCCN syntax that will work with the LCCN web site. – Jonesey95 (talk) 05:16, 17 March 2014 (UTC)
- length=10
- pass. LCCN aa12345678.
- pass. LCCN 9912345678.
- fail. LCCN a912345678. {{cite book}}: Check |lccn= value (help)
- fail. LCCN 9a12345678. {{cite book}}: Check |lccn= value (help)
- length=11
- pass. LCCN aaa12345678.
- pass. LCCN a9912345678.
- fail. LCCN 99912345678. {{cite book}}: Check |lccn= value (help)
- fail. LCCN 9aa12345678. {{cite book}}: Check |lccn= value (help)
- fail. LCCN aa912345678. {{cite book}}: Check |lccn= value (help)
- fail. LCCN a9a12345678. {{cite book}}: Check |lccn= value (help)
- length=12
- pass. LCCN aa9912345678.
- fail. LCCN 0a9912345678. {{cite book}}: Check |lccn= value (help)
- fail. LCCN a09912345678. {{cite book}}: Check |lccn= value (help)
- length=13
- fail. LCCN aa99123456789. {{cite book}}: Check |lccn= value (help)
- If retained, error category will be Category:CS1 errors: LCCN.
- length=12
- fail. LCCN 779912345678. {{cite book}}: Check |lccn= value (help)
- Looks good to me. – Jonesey95 (talk) 03:55, 21 March 2014 (UTC)
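The pass/fail pattern in the test list above can be sketched as a length-keyed check. This is only an illustrative Python sketch, not the module's Lua code: the name `is_valid_lccn` is hypothetical, the rules for lengths 8 and 9 are an assumption taken from the Library of Congress syntax description rather than from the tests above, and restricting letters to lowercase is also an assumption.

```python
import re

# Allowed shapes of an already-normalized LCCN, keyed by total length.
# Lengths 10, 11 and 12 mirror the test citations in this thread;
# lengths 8 and 9 are an assumption based on the LoC syntax description.
_LCCN_PATTERNS = {
    8: r"[0-9]{8}",                            # digits only
    9: r"[a-z][0-9]{8}",                       # one letter, then digits
    10: r"(?:[a-z]{2}|[0-9]{2})[0-9]{8}",      # two letters OR two digits first
    11: r"[a-z](?:[a-z]{2}|[0-9]{2})[0-9]{8}", # letter, then two letters OR two digits
    12: r"[a-z]{2}[0-9]{10}",                  # two letters, then ten digits
}


def is_valid_lccn(lccn: str) -> bool:
    """Return True when the string has one of the allowed shapes."""
    pattern = _LCCN_PATTERNS.get(len(lccn))
    return pattern is not None and re.fullmatch(pattern, lccn) is not None


if __name__ == "__main__":
    for candidate in ("aa12345678", "a912345678", "a9912345678", "aa99123456789"):
        print(candidate, "pass" if is_valid_lccn(candidate) else "fail")
```

Keying the patterns by length keeps each rule readable and makes any length outside 8 to 12 fail without a separate test.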
Well, not quite right. The check needs to be improved so that lccns with hyphens are normalized before they are checked.
—Trappist the monk (talk) 14:20, 30 March 2014 (UTC)
- Ok, I think that I've fixed the issue. New function normalize_lccn() normalizes the lccn according to the Normalization of LCCNs procedure. These test citations all work correctly. normalize_lccn() is able to normalize them all; the two fails are because there are spaces in the lccn that cause improper display of the lccn link.
- pass. LCCN n78-890351.
- pass. LCCN n78-89035.
- fail (white space). LCCN 78890351 n 78890351. {{cite book}}: Check |lccn= value (help) [http://lccn.loc.gov/n 78890351 n 78890351]
- pass. LCCN 85000002.
- pass. LCCN 85-2.
- pass. LCCN 2001-000002.
- pass. LCCN 75-425165//r75. {{cite book}}: External link in |lccn= (help)
- fail (white space). LCCN /AC/r932 79139101 /AC/r932. {{cite book}}: Check |lccn= value (help) [http://lccn.loc.gov/79139101 /AC/r932 79139101 /AC/r932]
- —Trappist the monk (talk) 17:41, 30 March 2014 (UTC)
- Should the red error message be set to "hidden=true" in the live module until this bug fix is rolled out? I recommend doing so in order to avoid false positives. – Jonesey95 (talk) 00:03, 31 March 2014 (UTC)
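For readers unfamiliar with the Library of Congress normalization procedure mentioned above, a minimal Python sketch follows. It is not the module's normalize_lccn() (which is Lua); the name `normalize_lccn_sketch` is hypothetical. It follows the published procedure as I understand it: remove blanks, drop a forward slash and everything after it, then remove the hyphen and left-pad the digits after it to six places. Note that this procedure removes whitespace, whereas the two "fail (white space)" citations above fail because of how the link is displayed.

```python
import re
from typing import Optional


def normalize_lccn_sketch(raw: str) -> Optional[str]:
    """Normalize an LCCN; return None when the part after a hyphen is malformed."""
    # Step 1: remove all whitespace.
    lccn = "".join(raw.split())
    # Step 2: drop a forward slash and everything to its right.
    lccn = lccn.split("/", 1)[0]
    # Step 3: remove the hyphen; the serial after it must be 1 to 6 digits
    # and is left-padded with zeros to exactly six digits.
    if "-" in lccn:
        prefix, _, serial = lccn.partition("-")
        if re.fullmatch(r"[0-9]{1,6}", serial) is None:
            return None
        lccn = prefix + serial.zfill(6)
    return lccn


if __name__ == "__main__":
    for value in ("n78-890351", "n78-89035", "85-2", "2001-000002", "75-425165//r75"):
        print(value, "->", normalize_lccn_sketch(value))
```

The normalized value is what a shape check would then be applied to, so hyphenated input such as 85-2 is padded before its length is tested.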
Possible small bug in new year range code
I think I have found a small bug in the new year range code (which is great, by the way!). Here's what I have so far:
Author (1901–02). "Foo Title". Journal Name. 23: 4. {{cite journal}}: |author= has generic name (help); Check date values in: |year= (help)
Author (1901–04). "Foo Title". Journal Name. 23: 4. {{cite journal}}: |author= has generic name (help); Check date values in: |year= (help)
Author (1909–10). "Foo Title". Journal Name. 23: 4. {{cite journal}}: |author= has generic name (help); Check date values in: |year= (help)
Author (1911–12). "Foo Title". Journal Name. 23: 4. {{cite journal}}: |author= has generic name (help); Check date values in: |year= (help)
Author (1918–20). "Foo Title". Journal Name. 23: 4. {{cite journal}}: |author= has generic name (help)
Author (1921–22). "Foo Title". Journal Name. 23: 4. {{cite journal}}: |author= has generic name (help)
Author (1931–36). "Foo Title". Journal Name. 23: 4. {{cite journal}}: |author= has generic name (help)
Author (1984–86). "Foo Title". Journal Name. 23: 4. {{cite journal}}: |author= has generic name (help)
Author (2001–02). "Foo Title". Journal Name. 23: 4. {{cite journal}}: |author= has generic name (help); Check date values in: |year= (help)
Author (2001–04). "Foo Title". Journal Name. 23: 4. {{cite journal}}: |author= has generic name (help); Check date values in: |year= (help)
Author (2009–10). "Foo Title". Journal Name. 23: 4. {{cite journal}}: |author= has generic name (help); Check date values in: |year= (help)
Author (2011–12). "Foo Title". Journal Name. 23: 4. {{cite journal}}: |author= has generic name (help); Check date values in: |year= (help)
Author - future year (2018–20). "Foo Title". Journal Name. 23: 4. {{cite journal}}: |author= has generic name (help)
The year ranges are changed and are in |year=. The citations are otherwise identical except for the future year. – Jonesey95 (talk) 00:17, 31 March 2014 (UTC)
- Not a bug. All of those errors occur where the two-digit year is less than 13 which makes for a possibly ambiguous date: YYYY-MM? or YYYY-YY?
- The warning violates WP:MONTH which reserves the YYYY–YY format for years with this statement: "Do not use YYYY-MM format (e.g. 2001-03 for March 2001, which may be confused with the year range 2001–2003)." Jc3s5h (talk) 01:27, 31 March 2014 (UTC) Corrected format; year ranges should use an n-dash, not a hyphen. 14:55, 31 March 2014 UT.
- What is the proposed way to remove the date error for "date=1901–02"?
- The RFC on the YYYY-MM date format was just closed in favor of YYYY-MM remaining as a proscribed format, so 1901–02 can only be a year range.
- If I recall correctly, the date-checking code is picky about hyphens (required for YYYY-MM-DD, marked as an error in other formats) versus endashes (required for all ranges, marked as an error in other formats). The examples above all use endashes, so they should be valid, whereas the YYYY-MM format, if it were acceptable, would use a hyphen. Since the above examples (a) are all in |year= and (b) all use endashes, I propose that they should be acceptable to the date-checking code. – Jonesey95 (talk) 02:44, 31 March 2014 (UTC)
- Yes, Module:Citation/CS1 is picky about hyphens and endashes. Hyphens are only allowed in YYYY-MM-DD dates; ranges require endashes.
- I'm not sure that we can rely on endashes, or on particular date-holding parameter names, or on editors adhering to the strictures of WP:MONTH, to divine intended meaning when we find AAAA-BB format in a date-holding parameter. You've been sorting through the gibberish that editors dump into CS1 template parameters long enough to know that editors aren't all that careful. I, for one, would like to see |year= go the way of |day= and |month=.
- I don't think that readers (who aren't versed in the intricacies of WP:MOS) can easily determine by inspection if Journal Name 23, 1901–02, is the February issue or covers the period 1901–1902. Interpretation of such date ranges occurring in article text at least has some possibility of context to aid the reader; context in an isolated citation is much more limited and may not exist.
- So, the fix for |date=1901–02 is |date=1901–1902.
- I'll add this to the CS1/WP:DATESNO compliance table.
- "So, the fix for |date=1901–02 is |date=1901–1902." This is wrong on a few levels. First, this thread is about a bug in checking code. The checking code doesn't fix anything, an editor does. Next, since MOSNUM allows 1901–02 (that's an n-dash) this style of range should be acceptable by the checking code. I think the code already would disallow a hyphen in any date expression except YYYY-MM-DD, so even after the code is fixed to accept 1901–02 the expressions 1901-02 and 1901-99 (with hyphens) would still be flagged as errors. Jc3s5h (talk) 15:11, 31 March 2014 (UTC)
- As I wrote before, not a bug, the code was intentionally written to exclude AAAA–BB dates where BB is less than 13. AAAA-BB dates are invalid because of the hyphen, regardless of the value in BB.
- Yep, the code doesn't fix anything; never has, never will, and I never said that it would. Please stop putting words in my mouth that I have never written nor spoken.
- I have noted before that CS1 is compliant with a subset of WP:DATESNO, itself a subset of WP:MOSNUM; CS1 will never be fully compliant with either.
- YYYY-MM-DD is the only date format where hyphens are allowed and in fact required by CS1.
- We seem to have a breakdown in terminology. There is a Citation Style 1, as described in Help:Citation Style 1, which is the way citations should be entered into templates and be displayed. Then there is the implementation of Citation Style 1 by the various bits of template code. Sometimes the template implementation is deficient. For example, if a source were written in the year 46, it would have to be so described in the citation, but the implementation doesn't support it. So the editor most likely would write a hand-coded citation that resembles Citation Style 1 as much as possible.
- Help:Citation Style 1#CS1 compliance with Wikipedia's Manual of Style describes certain aspects of dates that, at present, are not feasible to implement, or are still on the to-do list for implementation. It should not be a description of free choices to create differences between CS1 and MOSNUM, because no such free choices were agreed to by the community. It would be inappropriate for a template coder to choose to implement templates in defiance of WP:MOSNUM if it is reasonably feasible to follow WP:MOSNUM. Jc3s5h (talk) 17:47, 31 March 2014 (UTC)
- Perhaps it should be a lower class of warning, but we are used to ignoring certain date errors, already. I see how this range can be ambiguous, even though it is compliant. Warning that it could be a problem is not actively trying to fix something that isn't broken. I'd support adding the word possibly to this detection, or a different differentiation, if the code became that sophisticated. (It is already quite complicated.) —PC-XT+ 00:47, 1 April 2014 (UTC)
PC-XT, I think this is the wrong place to argue that YYYY–YY is too ambiguous to use in a citation, because a citation has less context than other parts of an article. If that's what you believe, you should bring it up at Help talk:Citation Style 1, and argue that page should declare that Citation Style 1 permanently rejects YYYY–YY, as an exception to the general acceptance of WP:MOSNUM date guidance, and it should be spelled out the rejection is on the basis of ambiguity (which is permanent) rather than it being unfeasible to check (which may be temporary). Jc3s5h (talk) 01:13, 1 April 2014 (UTC)
- I don't mean to argue. I simply think there is room for compromise, at least for the moment. Some people find it ambiguous. I'm not sure I understand the arguments of whether it should be ambiguous, as I haven't read most of the other discussions about it. I may propose that the help page mention that some find it ambiguous, but I don't plan to try to outlaw it. I also have no opinion on whether this check remains the way it is due to ambiguity or turns into something else, other than that I prefer that templates, modules, userscripts, etc. follow consensus. I don't think this change is as bad as it could be. We have already had date errors against the MOS for a while, so I expect editors are used to them by now, and will generally use proper judgement. It's not perfect, but it's generally improving, and there is time. —PC-XT+ 01:56, 1 April 2014 (UTC)
- CS1 does not do repairs. It can't. Besides, we have editors and robots to do the menial labor.
- I'm sure that it's possible to have different levels of errors but what I think that amounts to is hidden error messages and perhaps different categorization. These particular error messages are currently hidden so there isn't much to be gained there. Where would the word 'possibly' go?
- —Trappist the monk (talk) 01:09, 1 April 2014 (UTC)
- I meant that the current implementation doesn't do anything wrong. It simply alerts editors to possible ambiguity, without changing anything in the display. I don't think it really needs different levels, but since it may be controversial, it could possibly be separated into a different category or use some kind of lesser error orange color, though I don't really see the point. As far as the word possible, it could go into the documentation at Help:CS1 errors#bad date. I prefer templates to follow documentation, but this case is not going to be so straightforward. —PC-XT+ 01:56, 1 April 2014 (UTC)
- At present the code not only issues a warning message, but also throws a page into Category:CS1 errors: dates. One goal of more precise date checking is to reduce the number of false alarms in this category. So the code would have to have a level of warning which would issue a warning, but not put the page in the category. Also, eventually this is supposed to be fully turned on so all readers see it. I don't think we want to be showing warnings for things that might be wrong, rather than a clear violation of Citation Style 1. Jc3s5h (talk) 02:07, 1 April 2014 (UTC)
- I've added a section at Help talk:CS1 errors with more details. I agree that reducing false alarms is a goal worth pursuing. I also agree that when the messages are turned on for everyone, each message should ideally link to help text with MOS links supporting its statements. It seems to be too soon to try for that, at the moment. I might support removing the category from this error, but leaving the hidden red text for now. That way, pages with less contentious errors can be given priority over those with this error only. —PC-XT+ 02:26, 1 April 2014 (UTC)
- Regrettably, the development of CS1 was not a neatly planned, well organized engineering project. It was/is a mess. In the best of all possible worlds, the developers of CS1 would have started with a specifications document. That document would have directed the implementation; it would have served as the basis for the user documentation. Alas, 'twas not to be. This is Wikipedia. Multiple authors created multiple templates that evolved into multiple templates using the common {{citation/core}}. Continuing evolution is bringing them all into Module:Citation/CS1; new features are added, old features are pared away, and somehow, somehow, it is beginning to coalesce into a single entity. There is no plan for this, it just happens because Wikipedia happens. Documentation in this kind of environment lags behind the implementation; it will always lag behind.
- Help:Citation Style 1 is not a specifications document, nor a design guide, nor is it even a style guide; it is not "a description of free choices to create differences between CS1 and MOSNUM", though it does reflect those choices; it isn't even good user documentation. It is a mess. Help:Citation Style 1 is merely a collection of writings that attempts to describe how CS1 works and how to use it. Expecting more from it than that will lead to despair.
- I choose to think that this particular coding choice was made not in "defiance of WP:MOSNUM", but rather, to the benefit of CS1.
It is a mess, and documentation isn't going to be perfect. Nevertheless, when it is clear what the documentation says, and it is reasonably feasible to write checking code that follows the documentation, that should be done. It is inappropriate to use a position as a code writer to ignore the consensus process and just implement whatever the coder prefers. If you think 1901–02 is ambiguous in the context of a citation, get consensus to modify Help:Citation Style 1 accordingly, instead of throwing it on a list of stuff that is infeasible or in the queue. Jc3s5h (talk) 01:21, 1 April 2014 (UTC)
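The checking rule debated in this thread (hyphens only in YYYY-MM-DD, en dashes required for ranges, and YYYY–YY flagged when the two-digit part is less than 13) can be sketched as follows. This is a hedged Python illustration of the behaviour as described by the participants, not the module's Lua date-validation code; `classify_year_value` and its return strings are hypothetical names, and the sketch does not check month or day values or the ordering of the two years.

```python
import re

EN_DASH = "\u2013"


def classify_year_value(value: str) -> str:
    """Classify a |year= style value according to the rules described in this thread."""
    # Hyphens are allowed, and required, only in YYYY-MM-DD.
    if re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", value):
        return "ok"
    if "-" in value:
        return "error: hyphen outside YYYY-MM-DD"
    # A single four-digit year.
    if re.fullmatch(r"[0-9]{4}", value):
        return "ok"
    # Full year range, YYYY–YYYY, with an en dash.
    if re.fullmatch(rf"[0-9]{{4}}{EN_DASH}[0-9]{{4}}", value):
        return "ok"
    # Abbreviated range, YYYY–YY: flagged when YY could be read as a month.
    short = re.fullmatch(rf"[0-9]{{4}}{EN_DASH}([0-9]{{2}})", value)
    if short:
        if int(short.group(1)) < 13:
            return "error: possibly ambiguous (YYYY-MM or YYYY-YY?)"
        return "ok"
    return "error: unrecognized format"


if __name__ == "__main__":
    for sample in ("1901\u201302", "1918\u201320", "1901\u20131902", "1901-02"):
        print(sample, "=>", classify_year_value(sample))
```

Under these assumptions 1901–02 is flagged while 1918–20 and 1901–1902 pass, which matches the pattern in the test citations at the top of the thread.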
origyear --> origdate?
Should we deprecate origyear and make it a synonym of a new parameter called origdate so that the naming format is consistent? Jason Quinn (talk) 03:43, 3 April 2014 (UTC)
- No. |origyear= is mainly used for books, where a second or subsequent edition is common, but the exact publication date is unimportant; for a non-first edition of a book, the actual and original year of publication are both useful. The exact publication date is mainly of use for periodicals, where although there is almost always more than one issue, there is rarely more than one edition. Some newspapers do have one or more pages re-set during a print run, in order to cover breaking news; but the cover date doesn't change, and so the "original date" is pretty much a non-existent concept. --Redrose64 (talk) 10:02, 3 April 2014 (UTC)
No consensus on whether YYYY-MM is acceptable or unacceptable
An RFC that sought to determine whether YYYY-MM was an acceptable date format was recently closed.
The YYYY-MM format is currently in the Unacceptable column of the table at WP:BADDATEFORMAT, but I expect that to change soon. It was added there because the initial RFC closure said that there was "no consensus to change anything", implying that the state of the table at the opening of the RFC (YYYY-MM was in the Unacceptable column at that point) was how it should remain. The closure was subsequently revised to read: "There is no consensus that YYYY-MM is an acceptable format, nor any consensus that it is an unacceptable format. I would recommend against any mass changes being made purely on the basis of this RfC."
Based on this reasonably-attended RFC, despite the lack of consensus, it appears that the CS1 module's date-checking code should stop flagging YYYY-MM as an invalid date format. Thoughts? – Jonesey95 (talk) 00:48, 3 April 2014 (UTC)
- The RfC pretty much starts off with: "The recent (29 Nov 2013) banning of the yyyy-mm format ..." which apparently arises from this conversation and this change to the table at WP:BADDATEFORMAT.
- The table at WP:BADDATEFORMAT was then subjected to quite a few edits to change its format but YYYY-MM remained in the table until this edit (3 Feb 2014) when it was hidden pending the outcome of the RfC.
- Another version of the ban was added 31 Mar 2014. That same day, the ban was modified a bit and then hidden only to be almost immediately unhidden following the closure of the RfC.
- It doesn't appear to me that the ban on YYYY-MM was added to WP:BADDATEFORMAT because of the closure of the RfC but rather, was restored following the closure.
- The ban was always there in the "Month" section, it was just repeated in the unacceptable date format table for convenience. I think it was a mistake to comment it out from the unacceptable date format table while leaving the "Month" section alone. I think we have always taken the position that when we mention the YYYY-MM-DD format, we mean exactly that, and do not allow any of the related forms mentioned in ISO 8601 such as 2014-04, 20140403, 2014-04-03T13:12, etc.
- I think this puts the English Wikipedia in an analogous position to the UK House of Lords[1]; a bill was introduced to clarify whether the legal time in the UK, which is called in the law "Greenwich Mean Time", was UTC or UT1. The lords had a debate but left the question unanswered. Wikipedia editors had a debate but couldn't come to a conclusion about whether YYYY-MM is acceptable. Jc3s5h (talk) 13:18, 3 April 2014 (UTC)
- Sorry if I wasn't clear above. Here's the timeline:
1. The ban was added to the table (by me, after a discussion with clear consensus against YYYY-MM on the MOS/Dates Talk page) in November.
2. After the RFC started, someone commented out the ban from the table.
3. When the RFC was initially closed with the statement that there was "no consensus to change anything", the ban was restored (since "no consensus to change" implied that the ban should stay, i.e. no change from the state of the table when the RFC started).
4. After the ban was reinstated, the resolution of the RFC was edited to its current state. After reading the comments above and rereading the Talk page discussion, it appears that the proper path may be to leave the YYYY-MM prohibition in the CS1 module code, remove YYYY-MM from the Unacceptable list, and leave the recommendation against YYYY-MM in the "Month" section of MOS/Dates. That would restore everything to its pre-RFC state, I believe. – Jonesey95 (talk) 17:20, 3 April 2014 (UTC)
- I think that the current state of things is just as it was prior to the RfC. So, I agree with all of item 4 except: "remove YYYY-MM from the Unacceptable list", which would leave us in a state different from that which existed at the initiation of the RfC. If you are looking to undo this change to the table at WP:BADDATEFORMAT, then I think that Module talk:Citation/CS1 is the wrong forum.
- I think leaving the "Month" section alone, removing YYYY-MM from the unacceptable format table, and leaving it in the CS1 module would be sweeping the problem under the rug. Advice in MOSNUM should be easy to find; it is fundamentally sneaky to keep controversial advice in there and hope nobody notices it. As for CS1 warnings, if the community can't decide what they want, they don't deserve help from automated tools, so remove the warning. Jc3s5h (talk) 18:04, 3 April 2014 (UTC)
- Re item 4 above: I'm not saying that I am going to modify the date format table. There's too much kerfuffle on that page and its Talk page for my taste. I tried to be helpful once, after a clear consensus, and it got me nowhere. I'll stick to what I'm good at. Personally, I think the error message should stay, because the YYYY-MM guidance has been in the "month" section and because YYYY-MM will either be "unacceptable" or not displayed in the date format table.
- And because YYYY-MM is fundamentally ambiguous and less clear than it should be, but I recognize that this amounts to the same thing as me just not liking it. – Jonesey95 (talk) 18:10, 3 April 2014 (UTC)
Validating |mr=
This is pretty esoteric, so it's OK if it goes onto the Feature Requests list, but I think we can validate |mr=. I haven't found a spec for the number, but it appears to be seven numeric digits, optionally preceded by "MR".
We might want to consult with Wikipedia_talk:WikiProject_Mathematics about the preferred formatting for this link in the citation templates. We could show it as "MR1234567" or "MR MR1234567" or "MR 1234567". We show one of the latter two now, depending on whether someone puts "MR" in the value of the parameter. The first "MR" is linked to Mathematical Reviews. The number is linked to the cited source at mathscinet.org.
The article Uniform module contains links to a number of MR citations (some of them recently fixed so that they link to the right cited source). I haven't played around with case sensitivity, spaces, or other formatting to see how good the mathscinet.org processor is at handling what you throw at it. – Jonesey95 (talk) 23:49, 8 April 2014 (UTC)
- Seems immune to leading zeros and the MR if included. Seems to start at 1 and the current end seems to be 3117748; so, monotonically increasing list of numbers. How pragmatic. Error checking would seem to be pretty simple: nothing but digits and the value of the number must be greater than zero and less than say, 4000000.
- Goldie, A. W. (1960), "Semi-prime rings with maximum condition", Proc. London Math. Soc. (3), 10: 201–220, doi:10.1080/00927877908822364, ISSN 0024-6115, MR MR111766 (22 #2627), (full bolded text copied from citation page fails) {{citation}}: Check |mr= value (help)
- The web site looks pretty tolerant, but not infinitely so. – Jonesey95 (talk) 03:24, 9 April 2014 (UTC)
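The check proposed in this thread (digits only, value greater than zero and below roughly 4000000) could look something like the Python sketch below. It is an illustration only, not the module's code: `is_plausible_mr` is a hypothetical name, the upper bound is the "say, 4000000" figure suggested above, and accepting an optional leading "MR" is an assumption based on the observation that the site tolerates it.

```python
import re

# Upper bound suggested in the discussion; the real list keeps growing.
MR_UPPER_BOUND = 4_000_000


def is_plausible_mr(value: str) -> bool:
    """Return True for an optional 'MR' prefix followed only by digits in range."""
    match = re.fullmatch(r"(?:MR)?([0-9]+)", value.strip())
    if match is None:
        return False
    # int() ignores leading zeros, which the site also appears to tolerate.
    return 0 < int(match.group(1)) < MR_UPPER_BOUND


if __name__ == "__main__":
    for candidate in ("1234567", "MR111766", "MR111766 (22 #2627)", "0"):
        print(repr(candidate), is_plausible_mr(candidate))
```

With these assumptions the value copied in full from the citation page, MR111766 (22 #2627), is rejected because of the trailing text, as in the example above.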
Cite book with period for title makes everything bold
OK, it's an edge case, but it looks like a tiny little bug that may be manifesting itself in other situations.
Cite book with only period in |title= makes everything after it bold and adds a single quote mark before the title. Something about the wikimarkup difference between using single quotes for bold and using single quotes for italics, perhaps.
Wikitext | {{cite book
---|---
Live | Author (2001). p. 3. {{cite book}}: |author= has generic name (help)
Sandbox | Author (2001). p. 3. {{cite book}}: |author= has generic name (help)
Using |url= makes the problem go away.
Wikitext | {{cite book
---|---
Live | Author (2001). p. 3 http://www.example.com. {{cite book}}: |author= has generic name (help); |url= missing title (help)
Sandbox | Author (2001). p. 3 http://www.example.com. {{cite book}}: |author= has generic name (help); |url= missing title (help)
- When would you need to set |title=.? --Redrose64 (talk) 16:13, 11 April 2014 (UTC)
- I think that the real flaw is that a raw title containing only a terminating character that matches the separator character is not flagged as a missing title error. If a terminating character matches the separator character, it is removed in safejoin(). Change to |title=; and |separator=; and the same thing occurs:
- The |url= fix 'works' because the last character in the title string is not the separator but is the closing ] of the assembled external link: [http://www.example.com ''.'']. You can see in your second example that there is a linked period followed by an unlinked period.
- Is it worth the effort needed to fix it? Unless there are untold thousands of these peculiar citations out there, probably not.
- —Trappist the monk (talk) 16:20, 11 April 2014 (UTC)
- I have never actually seen this format in the wild, and I've seen a lot of crazy, crazy stuff in my travels through Category:Articles with incorrect citation syntax. I created a citation with this format by accident and noticed the weird formatting when I previewed it.
- I just figured I'd report it here to see if anyone else had noticed it or could think of why it might happen. I like the explanation. Probably not worth fixing, but if it comes up again, we'll have this discussion in the archives. – Jonesey95 (talk) 18:49, 11 April 2014 (UTC)
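The explanation given in this thread can be illustrated with a simplified stand-in for the joining step. The Python sketch below is not the module's safejoin() (which is Lua and more involved); `join_segments` is a hypothetical helper, and looking through trailing italic markup to find the last visible character is an assumption made for illustration. It shows how removing a lone period from an italicized title leaves four apostrophes, which wikitext can read as an apostrophe plus the start of bold.

```python
def join_segments(segments, sep="."):
    """Join citation segments with sep, dropping a trailing sep from each segment."""
    cleaned = []
    for segment in segments:
        body, markup = segment, ""
        # Assumption: look through closing italic markup ('') to reach
        # the last visible character of the segment.
        if body.endswith("''"):
            body, markup = body[:-2], "''"
        # A trailing character equal to the separator is removed.
        if body.endswith(sep):
            body = body[: -len(sep)]
        cleaned.append(body + markup)
    return (sep + " ").join(cleaned) + sep


if __name__ == "__main__":
    # Title consisting only of a period, wrapped in italic markup.
    print(join_segments(["Author (2001)", "''.''", "p. 3"]))
    # With a URL the segment ends in "]", so the period inside survives.
    print(join_segments(["[http://www.example.com ''.'']"]))
```

In the first call the title segment collapses to four apostrophes; in the second the linked period is kept and an unlinked period follows it, matching the description of the |url= case.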
New alias, "lang"?
Would it be possible to make |lang= an alias for |language=? It Is Me Here t / c 11:33, 13 April 2014 (UTC)
- Reason? -- Gadget850 talk 13:08, 13 April 2014 (UTC)
- Because IMO it's easy to think that this already exists and so to put it in (as I did earlier), and I can't think of any other uses someone might have for typing "lang=", so it won't be misleading. It Is Me Here t / c 13:25, 13 April 2014 (UTC)
- |language= is clear, straightforward, and unambiguous. |lang= is an abbreviation that is not as clear. I believe that we generally avoid abbreviations for parameter names or aliases, except where the full name of the parameter would be absurdly long, e.g. |internationalstandardbooknumber=. – Jonesey95 (talk) 04:25, 14 April 2014 (UTC)
- Well, re. "lang" specifically, it's already used as a parameter on e.g. {{Link-interwiki}}, {{Sec link}}, {{Braille cell}}, and {{Broken ref}}. Plus, there are over 500 templates that have "lang" in their name. This is why I had thought this would be fairly uncontroversial, to be honest. It Is Me Here t / c 22:50, 15 April 2014 (UTC)
PDFlink
I just noticed that {{PDFlink}} is being merged into CS1 as of May 2013. I don't recall the discussion. -- Gadget850 talk 19:57, 14 April 2014 (UTC)
- There was a discussion here that resulted in a consensus decision to eliminate the template and add parameters to CS1 citations. A CS1 feature request was submitted here, but it didn't go anywhere. It's not clear to me that |formatsize= is required in order to eliminate the template, nor is it clear to me that there is a CITEVAR-friendly path from instances of PDFlink to CS1 citations in all or even most cases. The mechanics of how to make the transition, showing how existing instances of the template would be converted, were not explored thoroughly. – Jonesey95 (talk) 20:33, 14 April 2014 (UTC)
remove /sandbox
hi what about removing /sandbox from
--local cfg = mw.loadData( 'Module:Citation/CS1/Configuration/sandbox' );
and
--local whitelist = mw.loadData( 'Module:Citation/CS1/Whitelist/sandbox' );
please
86.173.55.186 (talk) 14:53, 21 April 2014 (UTC)
- Greetings, User:Google6666. --MF-W 15:12, 21 April 2014 (UTC)
Update to the live CS1 module week of 2014-03-23
In about a week's time I intend to update these files from their respective sandboxes:
- Module:Citation/CS1 (diff);
- Module:Citation/CS1/Configuration (diff);
- Module:Citation/CS1/Whitelist (diff)
The update makes these changes to Module:Citation/CS1:
- Add PMC error checking; (discussion)
- Fixed a circa year date validation bug; (discussion)
- Add url in |authorlink parameter error checking; (discuassion and discussion)
- Expand DOI error checking; (discussion)
- Fix longstanding bug that broke citation terminal punctuation if the value assigned to |postscript= is multicharacter (like html entities); Moved citation template's default assignments for |separator=, |postscript=, and |ref=harv from the invoking template into the module; Added support for |postscript=none; (discussion)
- Limit acceptable years in dates to current year+1; (discussion)
- Expand date validation; all allowable date formats should now be supported; (discussion)
- Migrate cite interview; (discussion)
- Move date validation code into a separate page Module:Citation/CS1/Date validation;
- Extract page numbers from external wikilinks in any of the |page=, |pages=, or |at= parameters for use in COinS; (discussion)
- Add lccn error detection; (discussion)
- Migrate cite AV media notes; (discussion)
- Migrate cite DVD notes; (discussion)
to Module:Citation/CS1/Configuration:
- PMC error checking;
- url in |authorlink parameter error checking;
- Move |postscript= and |separator= default initialization into Module:Citation/CS1/sandbox;
- Add subject and subject link for cite interview migration;
- Add artist, albumlink, albumtype, notestitle, publisherid for cite AV media notes migration;
- Add lccn error detection;
- Delete albumtype; merge deprecated parameters albumlink, artist, director, notestitle, publisherid, titleyear as aliases of other parameters; remove these parameters after 1 October 2014;
to Module:Citation/CS1/Whitelist:
- Add subject and subjectlink for cite interview migration;
- Add artist, albumlink, albumtype, notestitle, publisherid for cite AV media notes;
- Invalidate albumtype; deprecate artist, albumlink, director, notestitle, publisherid, titleyear; these last to be invalidated after 1 October 2014;
—Trappist the monk (talk) 11:54, 25 March 2014 (UTC)
Corrected item 5 for Module:Citation/CS1 to read: Added support for |postscript=none;
—Trappist the monk (talk) 12:53, 25 March 2014 (UTC)
- Done.
- Thanks for fixing the year-range issue. Kanguole 12:54, 30 March 2014 (UTC)
Discussion
- I object to the allowable date checking as it exists in the sandbox. There is no clear consensus [fixed link] for prohibiting "Feb." or "Sept", and "Feb." is given as an acceptable example in WP:MOS. Documentation and function should proceed in lockstep; if the community won't let you change the documentation, you shouldn't change the code. Jc3s5h (talk) 14:31, 25 March 2014 (UTC) Fix wikilink for the abbreviation "Feb." 14:52 UT. Another link fix 15:59 UT.
- Per MOS:MONTH:
- Months are expressed as capitalized whole words (e.g. March).
- Abbreviations such as Mar. or Mar are used only where space is extremely limited, such as in tables and infoboxes.
- -- Gadget850 talk 14:43, 25 March 2014 (UTC)
- Sorry, I had the wrong wikilink for "Feb." Jc3s5h (talk) 14:52, 25 March 2014 (UTC)
- Reviewing Module_talk:Citation/CS1/sandbox#Invalid_year_doesn.27t_generate_error, I see no discussion about months. -- Gadget850 talk 15:26, 25 March 2014 (UTC)
- Now, wait a minute. You yourself have written:
I do not agree that WP:MOS or WP:MOSNUM control date formats in citations (although Wikipedia talk:Manual of Style/Archive 128#Which guideline for citation style? shows there is no consensus about this).
But here you are invoking WP:MOS#Months to support your argument that Sept. and Feb. should be allowed in CS1 citations.
- It should be noted that short month names longer than three characters have not been acceptable to CS1 since the first iteration of the date validation code was released 9 November 2013. Except for implementation details, the functionality of that code hasn't changed and isn't changed with this update.
- Your "There is no clear consensus" link above points to Module_talk:Citation/CS1/sandbox#Invalid_year_doesn.27t_generate_error. Was that what you intended?
- Please don't put words in my mouth that I have not spoken. I have not asked for nor attempted change to the MOS with regard to date formatting.
- Sorry, my link to the consensus discussion should be Module talk:Citation/CS1/Archive 9#Legitimate date range examples to add to the date checking part of the CS1 module
- As for the timeliness of this objection, it isn't clear to me if this change will make the error messages visible to everyone; if not, resolution of this could wait until the changes will be visible to everyone. Jc3s5h (talk) 16:00, 25 March 2014 (UTC)
- Date errors are hidden by default and will likely remain hidden until the number of pages with these errors has been significantly reduced.
- As for which guideline controls citations, in the general case, WP:CITE says any consistent style is allowed. The CS1 style (but not other styles) has chosen to adopt the date formats in WP:MOSNUM (which contains "Mar."). Also, the RFC mentioned above shows consensus that WP:MOS and WP:MOSNUM should agree with each other, and WP:MOS contains "Feb." Jc3s5h (talk) 16:06, 25 March 2014 (UTC) Fixed abbreviation 22:40 UT.
- CS1 does not comply with a lot of WP:DATESNO. Here is a table that indicates CS1 compliance with WP:DATESNO that I will probably copy over to Help:Citation Style 1#Dates so that CS1's compliance is documented for all to see.
section | compliant | comment |
---|---|---|
Acceptable date formats table | yes | Exceptions: linked dates not supported; sortable dates not supported ({{dts}} etc.); proper name dates not supported; |
Unacceptable date formats table | yes | |
Consistency | no | article level restriction beyond the scope of CS1 |
Strong national ties to a topic | no | |
Retaining existing format | no | |
Era style | no | dates earlier than 100 not supported; |
Julian and Gregorian calendars | limited | Module:Citation/CS1 cannot know if a date is Julian or Gregorian; assumes Gregorian |
Ranges | yes | Exceptions: does not support the use of – or does not support dates prior to 100; does not support solidus separator (/); does not support " to " as a date separator; |
Uncertain, incomplete, or approximate dates | yes | Exceptions: does not support {{circa}} or {{floruit}}; does not support dates prior to 100; |
Days of the week | no | |
Months | yes | Exceptions: shortened month names longer than three characters or with terminating periods are not supported in keeping with the Acceptable date formats table; |
Seasons | no | seasons are treated as if they were months so must be capitalized; |
Decades | no | |
Centuries and millennia | no | |
Abbreviations for long periods of time | no |
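The "Months" row of the table can be illustrated with a small check. This is only a sketch in Python of the rule as the table describes it (whole capitalized month names and three-letter abbreviations without a terminating period); it is not the module's Lua code, and the function name `is_supported_month` is hypothetical.

```python
# Sketch of the month rule described in the "Months" row above.
# Assumption: only full names and three-letter abbreviations are accepted,
# always capitalized and never followed by a period.
FULL_MONTHS = [
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
]
SHORT_MONTHS = {name[:3] for name in FULL_MONTHS}  # "Jan", "Feb", ...


def is_supported_month(token: str) -> bool:
    """Return True if token is a month spelling the table lists as supported."""
    return token in FULL_MONTHS or token in SHORT_MONTHS
```

Under these assumptions "March" and "Mar" are accepted, while "Mar.", "Feb." and "Sept" are rejected, which is the behaviour being disputed in this thread.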
- As for Trappist the monk changing date formats in MOS, I did not mean to imply Trappist had done so, or tried to. I am saying that MOS and MOSNUM apply to CS1 because Help:Citation Style 1 says MOSNUM applies, and there is consensus MOS and MOSNUM should agree with each other. Therefore, the code should allow what MOS and MOSNUM allow, and if the coders don't like it, they should change MOS and MOSNUM before making error messages visible to everyone. Jc3s5h (talk) 16:11, 25 March 2014 (UTC)
- This editor has no interest in doing battle over the discrepancies among Wikipedia:MOS#Months and Wikipedia:DATESNO#Months and the Acceptable date formats table. When those discrepancies have been resolved, I am quite content to adapt Module:Citation/CS1 so that it complies where it is possible to comply.
- It seems to me that since both MOS and MOSNUM contain "Feb." and "Mar." respectively, the status quo is that periods after dates are currently allowed. A prior version of Acceptable date formats table spelled out the dates in such detail that it implied that abbreviated dates with periods and "Sept" were not acceptable, but the current version carries no implied prohibition of these formats. Jc3s5h (talk) 19:23, 25 March 2014 (UTC) Fixed abbreviation 22:40 UT.
- @Jc3s5h: Which statement in the MOS is at odds here? -- Gadget850 talk 17:52, 25 March 2014 (UTC)
- The MOS states "Abbreviations for months, such as Feb. in the United States or Feb in most other countries, are used only where space is extremely limited." But the date syntax check in the sandbox version of the CS1 Lua-based templates flags month abbreviations followed by a period as errors. Jc3s5h (talk) 18:31, 25 March 2014 (UTC)
- This was the subject of a RFC which is still pending closure by an uninvolved admin. Suggest that we leave as is until this is closed. Keith D (talk) 19:15, 25 March 2014 (UTC)
- WP:MOS#Months (which goes to Wikipedia:Manual of Style#Months on the general MOS page) does say "Abbreviations for months, such as Feb. in the United States or Feb in most other countries, are used only where space is extremely limited." But before that it has "Further information: MOS:MONTH". This goes to Wikipedia:Manual of Style/Dates and numbers#Months; and as I understand it, the general MOS page summarises the more specific MOS subpages - it can't include all of the details, otherwise there would be no point to having subpages. MOS:MONTH does give more information: "Months are expressed as capitalized whole words (e.g. March). Abbreviations such as Mar. or Mar are used only where space is extremely limited, such as in tables and infoboxes." The bolding is mine: it shows which words are only in the specific page, not in the general page. The last phrase, "such as in tables and infoboxes", does not necessarily include references. It could include references, if the article has a very large number of refs, and those refs are high in information. But if space for refs is at a premium, abbreviating months will save a maximum of eighteen letters per ref (by using Sep for September in the |date=, |accessdate= and |archivedate=), whereas a lot more space can be saved by other means: using initials instead of author's first names; by the use of |displayauthors=; by judicious use of |location= and |publisher=; by the non-use of |quote= - there are several other ways of reducing the length of a ref, which can easily achieve a saving of more than 18 characters. --Redrose64 (talk) 19:49, 25 March 2014 (UTC)
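The "eighteen letters" figure above can be checked with a few lines of Python. This is a sketch that assumes three-letter abbreviations and the three date-holding parameters named in the comment; the function name is hypothetical.

```python
# Worked check of the "maximum of eighteen letters per ref" estimate.
# Assumptions: months are abbreviated to three letters, and a reference has
# at most the three date parameters mentioned in the comment above.
MONTHS = [
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
]
DATE_PARAMS = ["date", "accessdate", "archivedate"]


def max_saving_per_ref(abbrev_len: int = 3) -> int:
    """Largest number of letters saved when every date uses the longest month."""
    per_date = max(len(month) - abbrev_len for month in MONTHS)  # September: 9 - 3
    return per_date * len(DATE_PARAMS)
```

The longest month name, September, loses six letters when shortened to Sep, and three dates give 6 × 3 = 18.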
- The YYYY-MM-DD format is also for places where space is limited, and that format is widespread in CS1 citations. I think you'd have a hard time arguing that we should forbid Jan 1, 2014 but allow 2014-01-01. Personally, I'd be happy to get rid of both Jan 1, 2014 and 2014-01-01, but I don't think you'll convince the community of that. Jc3s5h (talk) 20:21, 25 March 2014 (UTC)
- That is a holdover from date linking. At one point dates were linked by the templates so they would show per the user's preferences. It was eventually realized that the majority of readers had no preference set, thus they saw a variety of date styles in an article. After two years of discussion, date linking was removed from the templates in 2008, but the dates were never systematically cleaned up. There have been a number of bike shed discussions since. The existence of YYYY-MM-DD dates in citations doesn't mean they are correct. -- Gadget850 talk 20:37, 25 March 2014 (UTC)
- I am very much in favour of |date=31 December 1999 (or |date=December 31, 1999 if you really have to) for publication dates, and |archivedate=1999-12-31 and |accessdate=1999-12-31 for archive and access dates. This visually separates the publication date from other dates which are relevant only within Wikipedia. -- 79.67.241.76 (talk) 14:55, 28 March 2014 (UTC)
The HTML entity for an en dash (–) does not seem to be supported in date ranges. Example:
Jc3s5h (talk) 18:43, 25 March 2014 (UTC)
- It is not supported because, for the time being, HTML entities in certain date-holding parameters corrupt COinS metadata. Use {{ndash}}. Also, {{cite journal/sandbox}} invokes the live module, not the sandbox version as you might expect. I don't know what the IP editor who made that change had in mind. Use {{cite journal/new}}:
:- Smith, Joseph III (1879–1910). "Last Testimony of Sister Emma". The Saints' Herald: 289.
So far, I haven't seen any objections to the changes listed in the original list above, only objections to the current operation of the module code. It might be better to split the above discussion into sections with appropriate titles. I can try to do that in an NPOV manner unless there are objections. If there are objections, I will leave the discussion as is and will not be offended.
The only note I see above that may be read as an objection to the list is in reference to the date checking. Trappist the monk may have been overly concise in item 7 on the first list, which might be clearer if it read something like "Expand date validation; all acceptable date formats in the table at WP:DATESNO should now be supported, along with most ranges listed at WP:DATERANGE (see exceptions)" – Jonesey95 (talk) 20:55, 25 March 2014 (UTC)
- I don't mind if certain items are placed in different sections. As for "Expand date validation; all acceptable date formats in the table at WP:DATESNO should now be supported", I don't think that is a correct reading of the table (although it would have been a reasonable reading of an earlier version of the table). The current table is silent about whether a period may follow a month abbreviation, or whether "Sept." is allowed. Both MOS and MOSNUM contain abbreviations followed by a period ("Feb." and "Mar." respectively). Jc3s5h (talk) 22:41, 25 March 2014 (UTC)
- @Trappist the Monk: THANK YOU THANK YOU THANK YOU for expanding the date validation! When all allowable date formats will shortly be supported, I hope the number of articles in Category:CS1 errors: dates will drop off dramatically over the next few weeks. GoingBatty (talk) 01:16, 26 March 2014 (UTC)
Invalid parameter not detected
I found this attempt at |issn= in the wild, and it did not cause a citation error:
{{cite journal | author=Moraes KCM, Quaresma AJ, Kobarg, J |title= Identification and characterization of proteins that selectively interact with isoforms of the mRNA binding protein AUF1 (hnRNP D) |journal=BIOLOGICAL CHEMISTRY |volume=384 |issue= 1 |pages= 25–37 |year= 2003 |pmid= 12674497 | ISSN: 1431-6730 pmc= |doi=10.1515/BC.2003.004}}
- Moraes KCM, Quaresma AJ, Kobarg, J (2003). "Identification and characterization of proteins that selectively interact with isoforms of the mRNA binding protein AUF1 (hnRNP D)". BIOLOGICAL CHEMISTRY. 384 (1): 25–37. doi:10.1515/BC.2003.004. PMID 12674497.
{{cite journal}}: Cite has empty unknown parameter: |ISSN: 1431-6730 pmc= (help); CS1 maint: multiple names: authors list (link)
Look specifically at the attempted ISSN parameter. Is there no error because there is nothing following the "=", and parameters with blank values are ignored?
I fixed this one, but I thought I'd drop this example here to offer some food for thought. There's a lot of craziness out there. – Jonesey95 (talk) 23:40, 21 April 2014 (UTC)
- Yes, that would be why it is not detected. There are probably a reasonable number of these out in the wild. However, the last non-whitespace character has to be "=" and it must be the only "=". Such text can also be part of an incorrectly encoded URL which contains a "|" followed by some text ending in "=".
- Another error which can not be detected by the module is duplicate parameter names. I have encountered several of those while running through pages fixing the identified "unknown parameter" errors. I was not looking for such and only detected some in specific situations. There are probably a reasonable quantity of both types of issues out there. Finding them would require a database scan. — Makyen (talk) 00:16, 22 April 2014 (UTC)
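As a rough illustration of the database scan suggested above, the two cases (an unknown parameter name with an empty value, and a duplicated parameter name) could be looked for along the following lines. This is a sketch in Python, not anything that exists in the module: the names `parse_params`, `audit_template` and `KNOWN_PARAMS` are hypothetical, the whitelist is a tiny stand-in, and the parser naively splits on "|" so it assumes no nested templates or piped wikilinks.

```python
from collections import Counter

# Hypothetical, deliberately tiny stand-in for the real parameter whitelist.
KNOWN_PARAMS = {
    "author", "title", "journal", "volume", "issue", "pages",
    "year", "pmid", "pmc", "doi", "issn",
}


def parse_params(template: str):
    """Split '{{name |a=b |c=d}}' into (name, value) pairs.

    Naive on purpose: assumes no nested templates or piped wikilinks.
    """
    body = template.strip()
    if body.startswith("{{") and body.endswith("}}"):
        body = body[2:-2]
    pairs = []
    for piece in body.split("|")[1:]:  # first piece is the template name
        if "=" not in piece:
            continue  # positional parameter, ignored in this sketch
        name, _, value = piece.partition("=")
        pairs.append((name.strip(), value.strip()))
    return pairs


def audit_template(template: str, known=KNOWN_PARAMS):
    """Return (empty unknown parameter names, duplicated parameter names)."""
    counts = Counter()
    empty_unknown = []
    for name, value in parse_params(template):
        counts[name] += 1
        if value == "" and name not in known:
            empty_unknown.append(name)
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    return empty_unknown, duplicates
```

Applied to the citation quoted at the top of this section, the fragment between `|pmid=` and `|doi=` is reported as an empty unknown parameter named "ISSN: 1431-6730 pmc", which matches the observation that everything before the single trailing "=" is taken as the name.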
A date range error I have been unable to fix
This date range is marked as invalid. It is from 1999 in archaeology.
Baker, Dorie (December 13, 1999 – January 17, 2000). "Finding sheds new light on the alphabet's origins". Yale Bulletin and Calendar. 28 (16). Retrieved 2012-03-16.
Can anyone help turn it into a valid date range? It looks to me like it matches MOSDATE. I checked the source, and the date range matches that of the source. – Jonesey95 (talk) 04:17, 24 April 2014 (UTC)
- That format isn't currently supported. I'll fix that shortly.
- —Trappist the monk (talk) 13:13, 24 April 2014 (UTC)
- You're the best. – Jonesey95 (talk) 14:31, 24 April 2014 (UTC)
- In the sandbox:
- Title. 13 December 1999 – 17 January 2000a. {{cite book}}: Invalid |ref=harv (help)
- Pass. 31 December 2014 – 1 January 2015.
- Fail – same year. 30 December 2014 – 31 December 2014.
{{cite book}}: Check date values in: |date= (help)
- Title. December 13, 1999 – January 17, 2000a. {{cite book}}: Invalid |ref=harv (help)
- Pass. December 31, 2014 – January 1, 2015.
- Fail – same year. December 30, 2014 – December 31, 2014.
{{cite book}}: Check date values in: |date= (help)
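The pass and fail cases above can be summarised in a small check. This is a sketch in Python inferred only from those examples, not the sandbox's Lua code: it assumes a full-date range in this format is accepted only when the two dates fall in different years and the first precedes the second, and the function name is hypothetical.

```python
import re
from datetime import date

MONTHS = {name: number for number, name in enumerate([
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
], start=1)}

# "December 13, 1999 – January 17, 2000" and "13 December 1999 – 17 January 2000"
MDY_RANGE = re.compile(
    r"^([A-Z][a-z]+) (\d{1,2}), (\d{4})\s–\s([A-Z][a-z]+) (\d{1,2}), (\d{4})$")
DMY_RANGE = re.compile(
    r"^(\d{1,2}) ([A-Z][a-z]+) (\d{4})\s–\s(\d{1,2}) ([A-Z][a-z]+) (\d{4})$")


def _make_date(day: str, month: str, year: str):
    """Build a date, or return None for an unknown month or impossible day."""
    if month not in MONTHS:
        return None
    try:
        return date(int(year), MONTHS[month], int(day))
    except ValueError:
        return None


def is_valid_cross_year_range(text: str) -> bool:
    """Accept full-date ranges only when they span two different years."""
    m = MDY_RANGE.match(text)
    if m:
        start, end = _make_date(m[2], m[1], m[3]), _make_date(m[5], m[4], m[6])
    else:
        m = DMY_RANGE.match(text)
        if not m:
            return False
        start, end = _make_date(m[1], m[2], m[3]), _make_date(m[4], m[5], m[6])
    if start is None or end is None:
        return False
    return start.year != end.year and start < end
```

With these assumptions the range from the Yale Bulletin citation is accepted in both the mdy and dmy forms, while the two same-year examples are rejected.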
Open library parameter syntax, checking, and linking
We have a parameter, |ol=, which displays in the citation and links to the Open Library. Currently the link prefixes an "OL" to the value of the parameter and an "OL" is displayed to identify that this is an OLID:
{{cite book |last=Last |first=First |title=Title |ol = 1135607M }}
Last, First. Title. OL 1135607M.
However, in their listing pages the Open Library lists their identifiers already including the "OL" prefix (e.g. "OL1135607M"). It would be normal for an editor to expect to be able to copy and paste the identifier from the Open Library page into the citation template:
{{cite book |last=Last |first=First |title=Title |ol = OL1135607M }}
Last, First. Title. OL 1135607M.
Unfortunately, this does not currently work. The link is non-functional. In addition, no indication is given to the editor that there is a problem. In order to determine that there is an issue, the editor has to examine or test the link.
The "OL" in the ID appears to actually be a part of the ID. While I have not found an actual spec for the ID, looking at their API implies that the OL is part of the ID. Because the link does not work when the "OL" is present, we have in effect been requiring editors to omit the "OL" from |ol=. That means that we can not suddenly switch to requiring it. We need to accept |ol= both with and without "OL" as the first two characters.
At a minimum, we should change the module such that it does not add an additional "OL" to the link if one already exists in the provided OLID.
As to the visual aspect: While I do not find it visually appealing, having it appear as "OL OL1135607M" is consistent with the format which we have adopted of having a descriptor prior to each ID and displays the complete OLID.
The result of this is that |ol= parameters should be processed prior to both linking and display to add an "OL" to the |ol= value if an "OL" does not already exist as the first two characters of the parameter value. — Makyen (talk) 06:11, 16 May 2014 (UTC)
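The proposed processing step can be sketched as follows. This is Python for illustration only, not the module's Lua, and the function names `normalize_olid` and `display_olid` are hypothetical; it does nothing beyond the rule stated above (add "OL" when the value does not already start with it).

```python
def normalize_olid(value: str) -> str:
    """Return the OLID with exactly one leading 'OL', as proposed above."""
    olid = value.strip()
    if not olid.startswith("OL"):
        olid = "OL" + olid
    return olid


def display_olid(value: str) -> str:
    """Descriptor followed by the complete OLID, e.g. 'OL OL1135607M'."""
    return "OL " + normalize_olid(value)
```

Both `|ol=1135607M` and `|ol=OL1135607M` then yield the same identifier for linking, so existing citations keep working while pasted identifiers stop producing broken links.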
Subscription required message
I was wondering if it could be possible to make the subscription required message a little bit prettier. Right now it has a double parentheses: "(subscription required (help))", and the "help" shows a tooltip. Couldn't the whole "(help)" just be removed and the tooltip applied to the whole message instead? --Atethnekos (Discussion, Contributions) 17:04, 20 May 2014 (UTC)
ISBN =
If I'm understanding this correctly.... If the parameter is ISBN=, the module will not check the ISBN number for errors. Should the 4,964 articles that contain | isbn = be converted via a bot to | isbn = ? Number of articles obtained from April's dump. Bgwhite (talk) 07:16, 27 April 2014 (UTC)
- If there is an ISBN in |id= it is not completely checked for format, but is linked if it does not fail some coarse format checking. If it is 13 digits and starts with 978 or 979 it is linked (e.g. ISBN 9781234567890), but is not linked if it does not start with those digits (e.g. ISBN 9801234567890). If it is 10 digits (with X as a possible 10th character) it is linked (e.g. ISBN 123456789X). If it is not 10 or 13 digits it is not linked (e.g. ISBN 12345678901). [NOTE: I have not looked at the code for this which is part of MediaWiki, not the citation templates.]
- At a minimum, there should be some additional logic to moving ISBNs out of |id= into |isbn=. In many cases |id= was used because |isbn= is already occupied and would generate an error if it contained more than one ISBN. In addition, if the editor desired to have additional text prior to, or after, the ISBN then it may have been placed in |id= for that reason. The |isbn= parameter accepts nothing other than a strictly formatted ISBN with no other text permitted. If the |isbn= is already occupied, then obviously an additional ISBN should not be moved out of |id= into |isbn=. If there is additional text in |id= then it is a contextual edit where human editorial judgement should be applied and should not be performed by bot.
- If the edit is strictly that |isbn= does not exist and an ISBN is in |id= without additional text – other than "ISBN" – then yes it should be moved into |isbn=. The contents of |id= are not included in the COinS data, but |isbn= is – NOTE: This is contrary to the documentation stating that "any of the identifiers" are included in the COinS data. However, |isbn= is included in the COinS without any format corrections, which, I assume, is why it has been programmed to generate an error if the value is not strictly compliant as an ISBN (i.e. no other characters are tolerated).
- In my opinion, it would be better for us to somewhat relax the formatting required in the |isbn= parameter. We could easily strip out all non-numeric characters prior to performing the ISBN format/check-digit verifications and passing that stripped version in the COinS. This would result in fewer errors, both for our editors and in the COinS data at the cost of a single regular expression substitution. In effect we would be permitting additional non-numeric text in the |isbn= value. If desired, the regular expression could also strip a preceding "1[03]:" as that sequence is somewhat commonly used by editors, for some reason, to indicate that it is a 10, or 13 digit ISBN. — Makyen (talk) 08:42, 27 April 2014 (UTC)
- Why do we need additional text? Do you have an example where this is needed? And multiple ISBNs or other identifiers are always suspect. I have only seen multiple ISBNs where someone is trying to identify multiple versions of a source, not the particular source they are using.
- It is not a question about when I think additional text is needed. My personal opinion is that it is a very rare occasion when it is actually needed. The one occurrence which I recall was on an author's Wikipedia page. The {{Cite book}} templates were used to format a list of the author's works. As part of the list, the ISBNs were supplied for all of the different versions of each book. A brief piece of text was supplied inline to describe the version of the book for each ISBN. I'm not sure I would make the same editorial choice, but I respect the fact that they had made that choice on that page.
- The additional text issue is a question of when a significant number of editors consider it appropriate to include such text and how we should handle the fact that it happens a significant amount of the time. Our checking for strict formatting on the ISBN appears to be due to using it in COinS, not just based on verifying that the provided ISBN text would enable a human to find the book, or that linking the ISBN to Special:BookSources will function. Special:BookSources appears to strip all non-numeric characters from what is passed to it. Humans can handle a much wider variety than the strict requirements we are currently applying to this field. We are imposing much stricter requirements that do not need to exist in order to accomplish the primary task of enabling someone to find the reference. The strict format requirement makes the template less user friendly when being a bit more user friendly (tolerant of a somewhat larger range of formats) costs very little and actually improves the quality of the data we are passing via COinS (i.e. we strip any extraneous text instead of only flagging an error).
- In going through Category:Pages with ISBN errors the most common additional text that actually has some meaning is to append a short descriptor about which version of the book the ISBN is for. For example: "{paperback}", "(pbk)", "(hardback)", "(hdb)", etc. Are these strictly necessary for identifying the book – assuming the ISBN is actually correct: no. As a human looking to acquire the exact book is it helpful information to know: yes.
- There are also a significant number of citations where effectively useless information is provided. For example prefixing the value with "10:", "13:", "ISBN", etc.
- I question why we consider the additional text as "errors" when they are in fact not an actual error, merely a deviation from strict formatting of this specific parameter. This is when the strict formatting is not needed for it to be functional in the way that the information is primarily used (link to Special:BookSources) and most deviations from the strict formatting are trivially handled in the module to provide good data, in most cases, via COinS. The processing necessary to provide good data via COinS is a regular expression replacement. This is something we at least come close to doing already. Even for a properly formatted ISBN we have to strip out the "-" or " " characters in order to calculate the checksum.
- To cover a specific issue: I am not suggesting that we change what we display in the citation (except no error when it is now not an error). We currently display all text supplied in the |isbn= value. We should continue to do so.
- As to multiple ISBNs in the same citation. Yes, of course, it is suspect. However, please note that what I said about multiple ISBNs was that the proposed move-the-ISBN-from-id-to-isbn bot should not create an error where none currently exists by either creating a duplicate |isbn= or by moving a second ISBN into the |isbn= where it will be an error when an editor has already placed the second ISBN in |id= where it does not create an error. I made no comment about the editorial choice to have multiple ISBNs in the citation, only that the bot should be programmed to not create errors in the citation when it comes across some situations that are known to exist. — Makyen (talk) 13:57, 27 April 2014 (UTC)
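The relaxed handling proposed in this thread (strip a leading "10:" or "13:" marker and any extraneous text, then run the check-digit test on what remains) might look roughly like this. It is a sketch in Python rather than the module's Lua; the function names are hypothetical, and keeping "X" while stripping is an added assumption so that ISBN-10 check characters survive.

```python
import re


def normalize_isbn(raw: str) -> str:
    """Strip labels and extra text, leaving only ISBN digits (and X)."""
    # Drop an optional leading "ISBN" label and a "10:" / "13:" length marker.
    text = re.sub(r"^\s*(?:ISBN)?[\s:-]*(?:1[03]\s*:)?", "", raw,
                  flags=re.IGNORECASE)
    # Assumption: keep X as well as digits, since X is a valid ISBN-10 check
    # character. Stray x's in descriptive text will make validation fail.
    return re.sub(r"[^0-9Xx]", "", text).upper()


def is_valid_isbn(raw: str) -> bool:
    """Check-digit test on the normalized value (standard ISBN-10/13 rules)."""
    isbn = normalize_isbn(raw)
    if len(isbn) == 10 and re.fullmatch(r"\d{9}[\dX]", isbn):
        values = [10 if ch == "X" else int(ch) for ch in isbn]
        return sum((10 - i) * v for i, v in enumerate(values)) % 11 == 0
    if len(isbn) == 13 and isbn.isdigit():
        total = sum(int(ch) * (1 if i % 2 == 0 else 3)
                    for i, ch in enumerate(isbn))
        return total % 10 == 0
    return False
```

A value such as "13: 978-0-306-47708-9 (Print)" normalizes to 9780306477089 and passes, whereas a value holding two ISBNs collapses to 26 digits and fails, so that case would still be flagged for a human editor rather than silently accepted.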
|type= is the proper parameter for your examples "{paperback}", "(pbk)", "(hardback)", "(hdb)" – without the brackets.
- —Trappist the monk (talk) 16:07, 27 April 2014 (UTC)
- @Trappist the monk: I both agree and disagree with |type= being most appropriate. When making these changes I have no history on the page, and no knowledge of any possible agreement about format. In my opinion, changes to correct citation errors should remain as close to the original editor's intent as possible. Thus, for many cases I feel that it is more important to retain the intent of the original editor rather than use the "correct" parameter |type=.
- Here is an example which I encountered today:
- As originally in the page:
- Fifty Years of the Shell Model — The Quest for the Effective Interaction. Advances in Nuclear Physics, Volume 27. Springer-Verlag. 2003. doi:10.1007/b100519. ISBN 978-0-306-47708-9 (Print) 978-0-306-47916-8 (Online).
{{cite book}}: Check |isbn= value: invalid character (help); Unknown parameter |editors= ignored (|editor= suggested) (help)
- Using |type= and |id= (location of "Print" disassociates it from the ISBN):
- Fifty Years of the Shell Model — The Quest for the Effective Interaction (Print). Advances in Nuclear Physics, Volume 27. Springer-Verlag. 2003. doi:10.1007/b100519. ISBN 978-0-306-47708-9.
{{cite book}}: More than one of |ISBN= and |isbn= specified (help); Unknown parameter |editors= ignored (|editor= suggested) (help)
- Using |id=:
- Fifty Years of the Shell Model — The Quest for the Effective Interaction. Advances in Nuclear Physics, Volume 27. Springer-Verlag. 2003. doi:10.1007/b100519. ISBN 978-0-306-47708-9. (Print) ISBN 978-0-306-47916-8 (Online).
{{cite book}}: Unknown parameter |editors= ignored (|editor= suggested) (help)
- In my opinion, the version which does not use |type= is closer to what the original editor intended.
- Note that this citation has other problems and would likely be better as (retaining the 2 ISBN numbers):
- Talmi, Igal (2003). "Fifty Years of the Shell Model — The Quest for the Effective Interaction". In Negele, J. W.; Vogt, E. W. (eds.). Advances in Nuclear Physics, Volume 27. Advances in the Physics of Particles and Nuclei (APPN). Vol. 27. Springer-Verlag. pp. 1–275. doi:10.1007/0-306-47916-8_1. ISBN 978-0-306-47708-9. (Print) ISBN 978-0-306-47916-8 (Online) ISSN 0065-2970.
{{cite book}}: External link in |series= (help)
- — Makyen (talk) 23:58, 27 April 2014 (UTC)
- Yeah, as you show it, |type= doesn't work so well in your example, not because |type= is wrong but because the original editor is wrong. The CS1 templates are designed to provide information about a single source. Here, the editor is trying to cite two versions of the same source in a single template. We should be glad that he didn't want to include the softcover version as well (ISBN 978-1-4757-8801-3). Perhaps the better solution to the multiple isbn problem is to choose one to use in the template and include the other(s) parenthetically outside the template. This at least avoids the error, includes an isbn in the COinS metadata, and still keeps the rest available:
- Talmi, Igal (2003). "Fifty Years of the Shell Model — The Quest for the Effective Interaction". In Negele, J. W.; Vogt, E. W. (eds.). Advances in Nuclear Physics (hardback). Advances in the Physics of Particles and Nuclei (APPN). Vol. 27. Springer-Verlag. doi:10.1007/0-306-47916-8_1. ISBN 978-0-306-47708-9. ISSN 0065-2970. (alternate: ISBN 978-0-306-47916-8 (Online); 978-1-4757-8801-3 (softcover))
- I took out |url=, |chapter-url=, |pages=, and removed the external link from |series=. |doi= gets the reader to the same place as |chapter-url=, where all you get is a sample of the table of contents and part of the introduction teaser as part of the publisher's effort to sell you a copy of the book; |url= and the external link in |series= are more selling. There is no point in listing a chapter and all of the pages that make up the chapter; that does nothing to help a reader find the cited information.
id = ISBN should not be changed wholesale to ISBN =, for the reasons noted above. I think it would be reasonable for an editor using AWB to convert instances of id = ISBN that contain plain ISBNs with no extraneous text, in citations where an ISBN is not present.
Some data: I have fixed about 3,000 of the 8,000 articles in Category:Pages with ISBN errors using an AutoEd script in the past couple of months. I have about 2,500 more articles to examine. The script has been able to fix about 60% of the articles I have examined. The most common fixable error, by far, is two ISBNs separated by a comma. These two ISBNs are usually the 10-digit ISBN followed by the 13-digit ISBN.
As for extra text, the examples given above are often present. Sometimes a "printing" or "edition" is present, though it is almost always redundant with |year=. Sometimes multiple volumes, each with its own ISBN, are specified; I don't touch those.
When I am done going through the category, I expect there to be about 2,500 articles left. The large majority of those errors will be legitimate errors: ISBNs with too few or too many numbers. There will be somewhere under 1,000 "low-hanging fruit" still left, primarily ASINs, multiple ISBNs that were too strange or ambiguous for my scripting skills to handle, ISSNs, publisher names, and other easy fixes. After those are fixed, I expect we'll have under 2,000 actual ISBN problems to track down.
Anyone who would like to contribute to clearing out this category is welcome to do so. I recommend starting at the end of the alphabet, since the remaining articles that my script hasn't touched are in the A–N portion of the alphabet (I've been working my way from Z to A). – Jonesey95 (talk) 17:39, 27 April 2014 (UTC)
- @Jonesey95: I have been working on them from "A" forward. I was splitting multiple ISBNs in |isbn= into |isbn= and |id= until Redrose64 commented that a large number of them were just both the 10 and 13 digit ISBN for the same book and expressed a belief that the 10 digit one should be removed. I don't agree that there is consensus for us to wholesale override the choice of editors to put both a 10 and 13 digit ISBN into the citation. I fully agree that it is not needed, and would not do so myself. I just don't think that there is a wide enough consensus for us to remove them from thousands of articles. I have not been splitting them wholesale since that point. My intent was to go back through once it was clearer as to how to handle them. I also have not translated the code I wrote for a different purpose which decodes/formats/checks ISBNs from JavaScript to what is needed for AWB (which is the tool I use). Something which actually compares the two and verifies that they are 10/13 duplicates would be needed.
- Looking at your script: Your script appears to delete the first ISBN unless it starts with 97[89] without any checks to see that this occurrence is actually a 10/13 duplicate. I consider this to be inappropriate. You may be deleting a non-duplicate. In addition, even in the case where it is a 10/13 duplicate, the editor has made the choice to include both. While I don't agree with that choice, I have not seen something that indicates a wide consensus for removing 10/13 duplicates from thousands of articles.
- I disagree with your choice to comment out any ISBN starting with 977. I have seen a good number of ISBNs which have had "97[89]" mistyped as "977". In these cases, changing the 977 to 97[89] was sufficient for the ISBN to be valid and find the correct book.
- I am not familiar with scripts for AutoEd. However, the replacements you are performing appear to be performed on the complete text of the article, not limited to citations. For the |isbn= parameter this might be sufficiently specific. On the other hand it might not. You might want to consider adding/changing your regular expressions to more specifically limit them to only being within citation templates. I use the following (or a variation upon): ({{\s*[cC]it[ea](?:[^}{]*(?:\{\{[^}{]*}}[^}{]*)*)\|\s*)isbn(\s*=\s*)
- It also prevents matches with any parameters within one level of sub-template within the citation template. It could be more specific and prevent low probability matches within wiki-links (within citation templates), but a wiki-link with the displayed portion being the format of a parameter, "|\s*isbn\s*=\s*", is a low probability and these are not intended for unattended operation. Note that if there is more than one |isbn= in the citation this will match the one furthest from the {{\s*[Cc]it[ae].
- As to ASINs: you change any that are explicitly called out as ASINs. I would suggest adding additional cases to that. My experience so far is that a sequence matching B0[0-9A-Za-z]{8} can safely be considered an ASIN even when not explicitly stated as an "ASIN". However, I have been actually clicking on the links created to verify that it is an ASIN and is valid. I have not found a formal specification for ASIN numbers, but aside from those which are also ISBNs, that format has fit the ones I have seen. — Makyen (talk) 23:58, 27 April 2014 (UTC)
- Looks like I spoke a bit too soon about using B0[0-9A-Za-z]{8} as indicating an ASIN. I just encountered 4 on a page. Three of them were invalid as ASINs. Although, I have not previously encountered ones which turned up invalid when changed to |asin= based on that criteria. — Makyen (talk) 00:07, 28 April 2014 (UTC)
- Thanks for the tips. I will see if I can incorporate some of them into my editing.
- My answer to most of your concerns is that I visually inspect each article's ISBN errors before running my script, and then I visually inspect each of the script's proposed edits before saving. There are plenty of articles that I skip because I can see in advance or after running the script (but before saving) that the script will produce undesirable results.
- I believe that I am commenting out only 13-digit "977" numbers, which are typically UPC bar codes; I don't see many of these. I look at the citation to confirm that it does not appear to be a book before doing so, but I comment it out instead of deleting it because I can't be sure. There is a particular editor who has inserted many "977" numbers, allegedly for Billboard Brasil, as ISSNs and ISBNs. I did a ton of research to try to find a valid ISSN for these, and failed, so I resorted to commenting them out.
- ASINs: There are a couple hundred apparent ASINs in the category. I didn't feel comfortable changing them without checking each one manually, so I have saved them for a second pass.
- As for removing a 10-digit ISBN when a 13-digit ISBN is also present, my understanding is that they contain identical information and lead the reader to the same book (at worldcat.org, for example) when clicked. The CS1 error help text explicitly says to "Use the 13-digit ISBN when it is available" and that "Only one ISBN is allowed in this field" because it breaks the metadata and breaks the link to Special:BookSources. – Jonesey95 (talk) 01:08, 28 April 2014 (UTC)
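A minimal sketch of the check asked for above, one that actually compares two ISBNs and confirms they are a 10/13 pair before either is removed. The function names are hypothetical and are not taken from any script discussed here. The ISBN-10 value in the usage lines is derived from the 978-0-306-47708-9 cited in this thread and does not itself appear in the discussion. The sketch assumes both inputs are plausible ISBNs and does not verify the ISBN-10's own check digit.

```python
import re

def isbn13_check_digit(first12: str) -> str:
    """Check digit for the first 12 digits of an ISBN-13 (weights 1,3,1,3,...)."""
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)

def isbn10_to_isbn13(isbn10: str) -> str:
    """Prefix 978, drop the old ISBN-10 check digit, and recompute the new one."""
    core = re.sub(r"[^0-9Xx]", "", isbn10)
    if len(core) != 10 or not core[:9].isdigit():
        raise ValueError(f"not a 10-character ISBN: {isbn10!r}")
    body = "978" + core[:9]
    return body + isbn13_check_digit(body)

def is_10_13_duplicate(a: str, b: str) -> bool:
    """True only if one value is an ISBN-10 and the other is its ISBN-13 form."""
    a, b = (re.sub(r"[^0-9Xx]", "", s).upper() for s in (a, b))
    if len(a) == 13 and len(b) == 10:
        a, b = b, a  # put the 10-character value first
    if len(a) != 10 or len(b) != 13:
        return False
    return isbn10_to_isbn13(a) == b

print(is_10_13_duplicate("0-306-47708-4", "978-0-306-47708-9"))  # True
print(is_10_13_duplicate("0-306-47708-4", "978-0-306-47916-8"))  # False
```

A script could run a comparison like this first and leave the citation untouched whenever it returns False, since the two numbers then identify different editions.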
- Including multiple ISBNs, such as for print and online is an issue, since we can not definitively determine which version was consulted. Fixing these has the same problem, where we cannot determine the definitive source. -- Gadget850 talk 01:11, 28 April 2014 (UTC)
- Multiple ISBNs may be useful outside of references, in a list of works. For example, the subject of an article might be the editor of a multi-volume encyclopedia where each volume has its own ISBN. In that case, putting all of the ISBNs into |isbn= is not appropriate, but neither is removing all but one ISBN. Using |id= or putting the ISBNs outside of the citation template might work; I haven't given it enough thought yet, since I've been working on the easy fixes. – Jonesey95 (talk) 01:17, 28 April 2014 (UTC)
- wp:SAYWHEREYOUGOTIT pertains. If we can't tell which was seen due to multiple ISBNs, we imply they are equivalent (down to pagination). In that case it might be cleaner to cite OCLC 70752232 or OL 9534802M. LeadSongDog come howl! 13:49, 28 April 2014 (UTC)
- 13-digit numbers beginning 977 are the EAN-13 representation of an ISSN, but they are not ISSNs: a true ISSN has eight digits. It is not always easy to convert an EAN-13 to an ISSN: for example, The Railway Magazine is ISSN 0033-8923 and the barcode is 977-0033-89229-3 - clearly seven digits correspond, but I don't know about the rest. --Redrose64 (talk) 17:48, 28 April 2014 (UTC)
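On the Railway Magazine barcode above: in the EAN-13 form of an ISSN, the seven digits after 977 are the ISSN without its check digit, the next two digits are a variant code, and the final digit is the EAN-13 check digit. The ISSN's own check digit is not carried in the barcode and has to be recomputed. A minimal sketch, with hypothetical function names:

```python
def issn_check_digit(first7: str) -> str:
    """ISSN check digit: weights 8..2 over the first seven digits, modulo 11."""
    total = sum(int(d) * w for d, w in zip(first7, range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)

def ean13_to_issn(ean: str) -> str:
    """Recover the 8-character ISSN from a 977-prefixed EAN-13 barcode number."""
    digits = "".join(ch for ch in ean if ch.isdigit())
    if len(digits) != 13 or not digits.startswith("977"):
        raise ValueError(f"not a 977-prefixed EAN-13: {ean!r}")
    first7 = digits[3:10]  # ISSN digits without the ISSN check digit
    return f"{first7[:4]}-{first7[4:]}{issn_check_digit(first7)}"

print(ean13_to_issn("977-0033-89229-3"))  # 0033-8923
```

This sketch does not validate the barcode's own trailing check digit; it only extracts and completes the ISSN.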
- If a multi-volume work has an ISBN for each volume, then I recommend listing each volume individually with the appropriate ISBN. Otherwise, there is no connection between the volume and the ISBN. -- Gadget850 talk 13:03, 29 April 2014 (UTC)
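Returning to the citation-scoped regular expression Makyen quotes earlier in this thread, the following is a rough sketch of how it behaves. The pattern is the quoted one with its braces escaped; the third capture group for the parameter value, the function name, and the sample wikitext are all assumptions added for illustration.

```python
import re

# Makyen's pattern with braces escaped, plus an assumed third group
# that captures the parameter value up to the next "|" or "}".
CITATION_ISBN = re.compile(
    r"(\{\{\s*[cC]it[ea](?:[^}{]*(?:\{\{[^}{]*\}\}[^}{]*)*)\|\s*)"
    r"isbn(\s*=\s*)([^|}]*)"
)

def citation_isbn_values(wikitext: str) -> list[str]:
    """Return |isbn= values found inside citation templates only."""
    return [m.group(3).strip() for m in CITATION_ISBN.finditer(wikitext)]

sample = (
    "{{Infobox book |isbn=111}} "
    "{{cite book |title=A |isbn=222}} "
    "{{cite book |author={{smallcaps|Doe}} |isbn=333 |year=2003}}"
)
print(citation_isbn_values(sample))  # ['222', '333']
```

The infobox value is skipped because the match must begin at a template whose name starts with "cit" followed by "e" or "a", and the third citation shows the pattern stepping over one level of nested template.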
Multiple ISBNs
Would it be feasible to have multiple instances of {{{isbn}}}, each associated with a {{{type}}}?
For example, the above example could be converted to
{{cite book |chapter=Fifty Years of the Shell Model — The Quest for the Effective Interaction |date=2003 |publisher=[[Springer-Verlag]] |doi=10.1007/0-306-47916-8_1 |title=Advances in Nuclear Physics |volume=27 |first=Igal |last=Talmi |editor1-first=J. W. |editor1-last=Negele |editor2-first=E. W. |editor2-last=Vogt |isbn1 = 978-0-306-47708-9 |type1=hardback |issn=0065-2970 |series = Advances in the Physics of Particles and Nuclei (APPN)|isbn2 = 978-0-306-47916-8 | type2 = Online | isbn3 = 978-1-4757-8801-3 | type3 = softcover}}
We would default to {{{isbn1}}} or simply {{{isbn}}} for generating COinS metadata, just like at present.
HTH HAND —Phil | Talk 17:40, 15 May 2014 (UTC)
- No. Where would it stop? Some books have many more than one ISBN - paperback/hardback; audio; USA/UK/Australia/etc. publisher; separate volumes or all-in-one; special coffee-table binding. How many do you need? The answer to that is: give the ISBN of the edition that you actually consulted, and no other. --Redrose64 (talk) 17:52, 15 May 2014 (UTC)
- There should only be one - the one the page numbers were taken from. Keith D (talk) 18:40, 15 May 2014 (UTC)
- We should not be encouraging storing a significant list of different ISBN numbers. The one which should be selected is the one, without modification, which is printed in the book actually being referenced. If there is more than one printed, use the one that matches the version of the book in-hand. If there is both a 10-digit and a 13-digit version printed in the book, the 13 digit version is preferred. Do not convert from a 10-digit version to a 13-digit version by just adding the 978-; it will be wrong. Do not convert a 13-digit version to a 10-digit version by removing the 978-; it will also be wrong. Use the version as printed in the book.
- There are ways to have more than one ISBN if the |id= is used, but that should be an exception, not a rule. If we were going to start listing all of the different identifiers for every edition/version of a book, as Redrose64 said "where would it stop?" As an example: a reference on which I was attempting to fix the ISBN earlier today was citing Magic and Mystery in Tibet. Should we be listing identifiers for all of the 60 versions listed in WorldCat?
- If the citing editor has actually checked multiple versions to find that the page numbers and text are exactly the same, then it is reasonable for them to list more than one identifier. The |id= parameter can be used for this purpose and as long as the text "ISBN" precedes a valid format ISBN it will be linked to Special:BookSources by the MediaWiki software. (see Help:Magic links)
- On the other hand, we should not generate badly formed COinS data if there are extraneous non-numeric characters in the |isbn= parameter. Removing everything other than digits is trivial.
- I also believe that we should not generate an error if there is extraneous non-numeric text in the ISBN parameter. All non-numeric text can be removed prior to processing with a single regular expression substitution. We are already performing one regular expression substitution to remove the "-" marks. Given the ease with which all extraneous non-numeric text can be removed – particularly given we are already removing some such text (hyphens) – it feels like we are going out of our way to make the requirements for this parameter more stringent than is needed in order to meet the goals of an accurate link to Special:BookSources and valid COinS data. In fact, we appear to choose to provide bad COinS data when providing good COinS data in a larger percentage of cases is trivial. Just removing such extraneous text prior to checksum verification and forwarding to COinS is slightly easier, from a processing point of view, than what is currently done and results in both that parameter being much more user friendly and our providing good COinS data in a higher percentage of citations. — Makyen (talk) 02:35, 16 May 2014 (UTC)
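The strip-then-verify approach described above can be sketched as follows. This is an illustration only, not the module's code (which is Lua), and the function names are hypothetical. It keeps the letter X so that ISBN-10 check digits survive, which means descriptive text that itself contains an "x" would need extra care in a real implementation.

```python
import re

def clean_isbn(raw: str) -> str:
    """Drop everything except digits and X in a single substitution."""
    return re.sub(r"[^0-9Xx]", "", raw).upper()

def is_valid_isbn(cleaned: str) -> bool:
    """Verify the check digit of a cleaned 10- or 13-character ISBN."""
    if re.fullmatch(r"\d{9}[\dX]", cleaned):
        # ISBN-10: weights 10..1, X counts as 10, sum divisible by 11
        total = sum((10 - i) * (10 if ch == "X" else int(ch))
                    for i, ch in enumerate(cleaned))
        return total % 11 == 0
    if re.fullmatch(r"\d{13}", cleaned):
        # ISBN-13: weights alternate 1 and 3, sum divisible by 10
        total = sum(int(ch) * (1 if i % 2 == 0 else 3)
                    for i, ch in enumerate(cleaned))
        return total % 10 == 0
    return False

for raw in ("978-0-306-47708-9 (Print)",
            "ISBN 123456789X",
            "9781234567890",
            "978-0-306-47708-9 (Print) 978-0-306-47916-8 (Online)"):
    print(raw, "->", is_valid_isbn(clean_isbn(raw)))
```

Under this sketch a trailing descriptor such as "(Print)" no longer causes a failure, while a bad check digit or two ISBNs in one value are still rejected, because the cleaned string has the wrong checksum or the wrong length.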
ISBN =
If I'm understanding this correctly.... If the parameter is id = ISBN, the module will not check the ISBN number for errors. Should the 4,964 articles that contain | id = ISBN be converted via a bot to | isbn = ? Number of articles obtained from April's dump. Bgwhite (talk) 07:16, 27 April 2014 (UTC)
- If there is an ISBN in |id= it is not completely checked for format, but is linked if it does not fail some coarse format checking. If it is 13 digits and starts with 978 or 979 it is linked (e.g. ISBN 9781234567890), but is not linked if it does not start with those digits (e.g. ISBN 9801234567890). If it is 10 digits (with X as a possible 10th character) it is linked (e.g. ISBN 123456789X). If it is not 10 or 13 digits it is not linked (e.g. ISBN 12345678901). [NOTE: I have not looked at the code for this which is part of MediaWiki, not the citation templates.]
- At a minimum, there should be some additional logic to moving ISBNs out of |id= into |isbn=. In many cases |id= was used because |isbn= is already occupied and would generate an error if it contained more than one ISBN. In addition, if the editor desired to have additional text prior to, or after, the ISBN then it may have been placed in |id= for that reason. The |isbn= parameter accepts nothing other than a strictly formatted ISBN with no other text permitted. If the |isbn= is already occupied, then obviously an additional ISBN should not be moved out of |id= into |isbn=. If there is additional text in |id= then it is a contextual edit where human editorial judgement should be applied and should not be performed by bot.
- If the edit is strictly that |isbn= does not exist and an ISBN is in |id= without additional text – other than "ISBN" – then yes it should be moved into |isbn=. The contents of |id= are not included in the COinS data, but |isbn= is – NOTE: This is contrary to the documentation stating that "any of the identifiers" are included in the COinS data. However, |isbn= is included in the COinS without any format corrections, which, I assume, is why it has been programmed to generate an error if the value is not strictly compliant as an ISBN (i.e. no other characters are tolerated).
- In my opinion, it would be better for us to somewhat relax the formatting required in the |isbn= parameter. We could easily strip out all non-numeric characters prior to performing the ISBN format/check-digit verifications and passing that stripped version in the COinS. This would result in fewer errors, both for our editors and in the COinS data at the cost of a single regular expression substitution. In effect we would be permitting additional non-numeric text in the |isbn= value. If desired, the regular expression could also strip a preceding "1[03]:" as that sequence is somewhat commonly used by editors, for some reason, to indicate that it is a 10, or 13 digit ISBN. — Makyen (talk) 08:42, 27 April 2014 (UTC)
- Why do we need additional text? Do you have an example where this is needed? And multiple ISBNs or other identifiers are always suspect. I have only seen multiple ISBNs where someone is trying to identify multiple versions of a source, not the particular source they are using.
- It is not a question about when I think additional text is needed. My personal opinion is that it is a very rare occasion when it is actually needed. The one occurrence which I recall was on an author's Wikipedia page. The {{Cite book}} templates were used to format a list of the author's works. As part of the list, the ISBNs were supplied for all of the different versions of each book. A brief piece of text was supplied inline to describe the version of the book for each ISBN. I'm not sure I would make the same editorial choice, but I respect the fact that they had made that choice on that page.
- The additional text issue is a question of when a significant number of editors consider it appropriate to include such text and how we should handle the fact that it happens a significant amount of the time. Our checking for strict formatting on the ISBN appears to be due to using it in COinS, not just based on verifying that the provided ISBN text would enable a human to find the book, or that linking the ISBN to Special:BookSources will function. Special:BookSources appears to strip all non-numeric characters from what is passed to it. Humans can handle a much wider variety than the strict requirements we are currently applying to this field. We are imposing much stricter requirements that do not need to exist in order to accomplish the primary task of enabling someone to find the reference. The strict format requirement makes the template less user friendly when being a bit more user friendly (tolerant of a somewhat larger range of formats) costs very little and actually improves the quality of the data we are passing via COinS (i.e. we strip any extraneous text instead of only flagging an error).
- In going through Category:Pages with ISBN errors the most common additional text that actually has some meaning is to append a short descriptor about which version of the book the ISBN is for. For example: "{paperback}", "(pbk)", "(hardback)", "(hdb)", etc. Are these strictly necessary for identifying the book – assuming the ISBN is actually correct: no. As a human looking to acquire the exact book is it helpful information to know: yes.
- There are also a significant number of citations where effectively useless information is provided. For example prefixing the value with "10:", "13:" "ISBN", etc.
- I question why we consider the additional text as "errors" when they are in fact not an actual error, merely a deviation from strict formatting of this specific parameter. This is when the strict formatting is not needed for it to be functional in the way that the information is primarily used (link to Special:BookSources) and most deviations from the strict formatting are trivially handled in the module to provide good data, in most cases, via COinS. The processing necessary to provide good data via COinS is a regular expression replacement. This is something we at least come close to doing already. Even for a properly formatted ISBN we have to strip out the "-" or " " characters in order to calculate the checksum.
- To cover a specific issue: I am not suggesting that we change what we display in the citation (except no error when it is now not an error). We currently display all text supplied in the |isbn= value. We should continue to do so.
- As to multiple ISBNs in the same citation: yes, of course, it is suspect. However, please note that what I said about multiple ISBNs was that the proposed move-the-ISBN-from-id-to-isbn bot should not create an error where none currently exists, either by creating a duplicate |isbn= or by moving a second ISBN into |isbn=, where it will be an error, when an editor has already placed the second ISBN in |id=, where it does not create an error. I made no comment about the editorial choice to have multiple ISBNs in the citation, only that the bot should be programmed not to create errors in the citation when it comes across situations that are known to exist. — Makyen (talk) 13:57, 27 April 2014 (UTC)
- Why do we need additional text? Do you have an example where this is needed? And multiple ISBNs or other identifiers are always suspect. I have only seen multiple ISBNs where someone is trying to identify multiple versions of a source, not the particular source they are using.
|type= is the proper parameter for your examples "{paperback}", "(pbk)", "(hardback)", "(hdb)" – without the brackets.
- —Trappist the monk (talk) 16:07, 27 April 2014 (UTC)
- @Trappist the monk: I both agree and disagree with |type= being most appropriate. When making these changes I have no history on the page, and no knowledge of any possible agreement about format. In my opinion, changes to correct citation errors should remain as close to the original editor's intent as possible. Thus, for many cases I feel that it is more important to retain the intent of the original editor rather than use the "correct" parameter |type=.
- Here is an example which I encountered today:
- As originally in the page:
- Fifty Years of the Shell Model — The Quest for the Effective Interaction. Advances in Nuclear Physics, Volume 27. Springer-Verlag. 2003. doi:10.1007/b100519. ISBN 978-0-306-47708-9 (Print) 978-0-306-47916-8 (Online).
{{cite book}}: Check |isbn= value: invalid character (help); Unknown parameter |editors= ignored (|editor= suggested) (help)
- Using |type= and |id= (location of "Print" disassociates it from the ISBN):
- Fifty Years of the Shell Model — The Quest for the Effective Interaction (Print). Advances in Nuclear Physics, Volume 27. Springer-Verlag. 2003. doi:10.1007/b100519. ISBN 978-0-306-47708-9.
{{cite book}}: More than one of |ISBN= and |isbn= specified (help); Unknown parameter |editors= ignored (|editor= suggested) (help)
- Using |id=:
- Fifty Years of the Shell Model — The Quest for the Effective Interaction. Advances in Nuclear Physics, Volume 27. Springer-Verlag. 2003. doi:10.1007/b100519. ISBN 978-0-306-47708-9. (Print) ISBN 978-0-306-47916-8 (Online).
{{cite book}}: Unknown parameter |editors= ignored (|editor= suggested) (help)
- In my opinion, the version which does not use |type= is closer to what the original editor intended.
- Note that this citation has other problems and would likely be better as (retaining the 2 ISBN numbers):
- Talmi, Igal (2003). "Fifty Years of the Shell Model — The Quest for the Effective Interaction". In Negele, J. W.; Vogt, E. W. (eds.). Advances in Nuclear Physics, Volume 27. Advances in the Physics of Particles and Nuclei (APPN). Vol. 27. Springer-Verlag. pp. 1–275. doi:10.1007/0-306-47916-8_1. ISBN 978-0-306-47708-9. (Print) ISBN 978-0-306-47916-8 (Online) ISSN 0065-2970.
{{cite book}}: External link in |series= (help)
- — Makyen (talk) 23:58, 27 April 2014 (UTC)
- Yeah, as you show it, |type= doesn't work so well in your example, not because |type= is wrong but because the original editor is wrong. The CS1 templates are designed to provide information about a single source. Here, the editor is trying to cite two versions of the same source in a single template. We should be glad that he didn't want to include the softcover version as well (ISBN 978-1-4757-8801-3). Perhaps the better solution to the multiple isbn problem is to choose one to use in the template and include the other(s) parenthetically outside the template. This at least avoids the error, includes an isbn in the COinS metadata, and still keeps the rest available:
- Talmi, Igal (2003). "Fifty Years of the Shell Model — The Quest for the Effective Interaction". In Negele, J. W.; Vogt, E. W. (eds.). Advances in Nuclear Physics (hardback). Advances in the Physics of Particles and Nuclei (APPN). Vol. 27. Springer-Verlag. doi:10.1007/0-306-47916-8_1. ISBN 978-0-306-47708-9. ISSN 0065-2970. (alternate: ISBN 978-0-306-47916-8 (Online); 978-1-4757-8801-3 (softcover))
- I took out |url=, |chapter-url=, |pages=, and removed the external link from |series=. |doi= gets the reader to the same place as |chapter-url=, where all you get is a sample of the table of contents and part of the introduction teaser as part of the publisher's effort to sell you a copy of the book; |url= and the external link in |series= are more selling. There is no point in listing a chapter and all of the pages that make up the chapter; that does nothing to help a reader find the cited information.
id = ISBN should not be changed wholesale to ISBN =, for the reasons noted above. I think it would be reasonable for an editor using AWB to convert instances of id = ISBN that contain plain ISBNs with no extraneous text, in citations where an ISBN is not present.
Some data: I have fixed about 3,000 of the 8,000 articles in Category:Pages with ISBN errors using an AutoEd script in the past couple of months. I have about 2,500 more articles to examine. The script has been able to fix about 60% of the articles I have examined. The most common fixable error, by far, is two ISBNs separated by a comma. These two ISBNs are usually the 10-digit ISBN followed by the 13-digit ISBN.
As for extra text, the examples given above are often present. Sometimes a "printing" or "edition" is present, though it is almost always redundant with |year=. Sometimes multiple volumes, each with its own ISBN, are specified; I don't touch those.
When I am done going through the category, I expect there to be about 2,500 articles left. The large majority of those errors will be legitimate errors: ISBNs with too few or too many numbers. There will be somewhere under 1,000 "low-hanging fruit" still left, primarily ASINs, multiple ISBNs that were too strange or ambiguous for my scripting skills to handle, ISSNs, publisher names, and other easy fixes. After those are fixed, I expect we'll have under 2,000 actual ISBN problems to track down.
Anyone who would like to contribute to clearing out this category is welcome to do so. I recommend starting at the end of the alphabet, since the remaining articles that my script hasn't touched are in the A–N portion of the alphabet (I've been working my way from Z to A). – Jonesey95 (talk) 17:39, 27 April 2014 (UTC)
- @Jonesey95: I have been working on them from "A" forward. I was splitting multiple ISBNs in |isbn= into |isbn= and |id= until Redrose64 commented that a large number of them were just both the 10 and 13 digit ISBN for the same book and expressed a belief that the 10 digit one should be removed. I don't agree that there is consensus for us to wholesale override the choice of editors to put both a 10 and 13 digit ISBN into the citation. I fully agree that it is not needed, and would not do so myself. I just don't think that there is a wide enough consensus for us to remove them from thousands of articles. I have not been splitting them wholesale since that point. My intent was to go back through once it was clearer as to how to handle them. I also have not translated the code I wrote for a different purpose which decodes/formats/checks ISBNs from JavaScript to what is needed for AWB (which is the tool I use). Something which actually compares the two and verifies that they are 10/13 duplicates would be needed.
- Looking at your script: Your script appears to delete the first ISBN unless it starts with 97[89] without any checks to see that this occurrence is actually a 10/13 duplicate. I consider this to be inappropriate. You may be deleting a non-duplicate. In addition, even in the case where it is a 10/13 duplicate, the editor has made the choice to include both. While I don't agree with that choice, I have not seen something that indicates a wide consensus for removing 10/13 duplicates from thousands of articles.
- I disagree with your choice to comment out any ISBN starting with 977. I have seen a good number of ISBNs which have had "97[89]" mistyped as "977". In these cases, changing the 977 to 97[89] was sufficient for the ISBN to be valid and find the correct book.
- I am not familiar with scripts for AutoEd. However, the replacements you are performing appear to be performed on the complete text of the article, not limited to citations. For the |isbn= parameter this might be sufficiently specific. On the other hand it might not. You might want to consider adding/changing your regular expressions to more specifically limit them to only being within citation templates. I use the following (or a variation upon): ({{\s*[cC]it[ea](?:[^}{]*(?:\{\{[^}{]*}}[^}{]*)*)\|\s*)isbn(\s*=\s*)
- It also prevents matches with any parameters within one level of sub-template within the citation template. It could be more specific and prevent low probability matches within wiki-links (within citation templates), but a wiki-link with the displayed portion being the format of a parameter, "|\s*isbn\s*=\s*", is a low probability and these are not intended for unattended operation. Note that if there is more than one |isbn= in the citation this will match the one furthest from the {{\s*[Cc]it[ae].
- As to ASINs: you change any that are explicitly called out as ASINs. I would suggest adding additional cases to that. My experience so far is that a sequence matching B0[0-9A-Za-z]{8} can safely be considered an ASIN even when not explicitly stated as an "ASIN". However, I have been actually clicking on the links created to verify that it is an ASIN and is valid. I have not found a formal specification for ASIN numbers, but aside from those which are also ISBNs, that format has fit the ones I have seen. — Makyen (talk) 23:58, 27 April 2014 (UTC)
- Looks like I spoke a bit too soon about using B0[0-9A-Za-z]{8} as indicating an ASIN. I just encountered 4 on a page. Three of them were invalid as ASINs. Although, I have not previously encountered ones which turned up invalid when changed to |asin= based on that criterion. — Makyen (talk) 00:07, 28 April 2014 (UTC)
- Thanks for the tips. I will see if I can incorporate some of them into my editing.
- My answer to most of your concerns is that I visually inspect each article's ISBN errors before running my script, and then I visually inspect each of the script's proposed edits before saving. There are plenty of articles that I skip because I can see in advance or after running the script (but before saving) that the script will produce undesirable results.
- I believe that I am commenting out only 13-digit "977" numbers, which are typically UPC bar codes; I don't see many of these. I look at the citation to confirm that it does not appear to be a book before doing so, but I comment it out instead of deleting it because I can't be sure. There is a particular editor who has inserted many "977" numbers, allegedly for Billboard Brasil, as ISSNs and ISBNs. I did a ton of research to try to find a valid ISSN for these, and failed, so I resorted to commenting them out.
- ASINs: There are a couple hundred apparent ASINs in the category. I didn't feel comfortable changing them without checking each one manually, so I have saved them for a second pass.
- As for removing a 10-digit ISBN when a 13-digit ISBN is also present, my understanding is that they contain identical information and lead the reader to the same book (at worldcat.org, for example) when clicked. The CS1 error help text explicitly says to "Use the 13-digit ISBN when it is available" and that "Only one ISBN is allowed in this field" because it breaks the metadata and breaks the link to Special:BookSources. – Jonesey95 (talk) 01:08, 28 April 2014 (UTC)
- Including multiple ISBNs, such as for print and online is an issue, since we can not definitively determine which version was consulted. Fixing these has the same problem, where we cannot determine the definitive source. -- Gadget850 talk 01:11, 28 April 2014 (UTC)
- Multiple ISBNs may be useful outside of references, in a list of works. For example, the subject of an article might be the editor of a multi-volume encyclopedia where each volume has its own ISBN. In that case, putting all of the ISBNs into |isbn= is not appropriate, but neither is removing all but one ISBN. Using |id= or putting the ISBNs outside of the citation template might work; I haven't given it enough thought yet, since I've been working on the easy fixes. – Jonesey95 (talk) 01:17, 28 April 2014 (UTC)
- wp:SAYWHEREYOUGOTIT pertains. If we can't tell which was seen due to multiple ISBNs, we imply they are equivalent (down to pagination). In that case it might be cleaner to cite OCLC 70752232 or OL 9534802M. LeadSongDog come howl! 13:49, 28 April 2014 (UTC)
- 13-digit numbers beginning 977 are the EAN-13 representation of an ISSN, but they are not ISSNs: a true ISSN has eight digits. It is not always easy to convert an EAN-13 to an ISSN: for example, The Railway Magazine is ISSN 0033-8923 and the barcode is 977-0033-89229-3 - clearly seven digits correspond, but I don't know about the rest. --Redrose64 (talk) 17:48, 28 April 2014 (UTC)
- If a multi-volume work has an ISBN for each volume, then I recommend listing each volume individually with the appropriate ISBN. Otherwise, there is no connection between the volume and the ISBN. -- Gadget850 talk 13:03, 29 April 2014 (UTC)
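On the 977 barcode question above: as far as I know, an ISSN barcode is laid out as 977, then the first seven digits of the ISSN, then a two-digit variant code, then the EAN-13 check digit. The eighth character of the ISSN is its own mod-11 check character, which is why only seven digits line up. A minimal Python sketch under that assumption follows; the function names are made up for illustration and are not part of any script or module discussed here.

```python
import re


def issn_check_char(first7):
    """ISSN check character for the first seven digits (mod 11, weights 8 down to 2)."""
    total = sum(int(d) * w for d, w in zip(first7, range(8, 1, -1)))
    remainder = (11 - total % 11) % 11
    return "X" if remainder == 10 else str(remainder)


def ean13_is_valid(digits):
    """EAN-13 check: weights 1,3,1,3,... over the first 12 digits."""
    if not re.fullmatch(r"[0-9]{13}", digits):
        return False
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    return (10 - total % 10) % 10 == int(digits[12])


def issn_from_ean13(barcode):
    """Return the ISSN for a 977-prefixed EAN-13, or None if the input does not fit."""
    digits = re.sub(r"[^0-9]", "", barcode)
    if not digits.startswith("977") or not ean13_is_valid(digits):
        return None
    first7 = digits[3:10]  # digits[10:12] is the variant code and is discarded
    return first7[:4] + "-" + first7[4:] + issn_check_char(first7)
```

With the Railway Magazine barcode quoted above, `issn_from_ean13("977-0033-89229-3")` returns `0033-8923`. A number that does not start with 977, or that fails the EAN-13 check, returns `None` and would still need a human to look at it.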
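For anyone who wants to try the citation-limited pattern quoted in this thread outside AWB, here is a minimal Python sketch. It reuses that regular expression with the braces escaped for Python's `re` module; the third capture group and the helper name `isbn_values_in_citations` are additions for illustration only, not something any of the scripts above are known to contain.

```python
import re

# The pattern from the discussion, braces escaped for Python.
# Group 3 (an addition) captures the raw parameter value.
CITE_ISBN = re.compile(
    r"(\{\{\s*[cC]it[ea](?:[^}{]*(?:\{\{[^}{]*\}\}[^}{]*)*)\|\s*)"
    r"isbn(\s*=\s*)([^|}{]*)"
)


def isbn_values_in_citations(wikitext):
    """Return the raw |isbn= value found in each citation template."""
    return [m.group(3).strip() for m in CITE_ISBN.finditer(wikitext)]


sample = (
    "An isbn = 111 in prose. "
    "{{cite book |title=Foo {{small|x}} |isbn=0306406152, 9780306406157}} "
    "and {{Citation |isbn = 978-0-306-47708-9 |year=2003}}"
)
print(isbn_values_in_citations(sample))
```

The prose "isbn = 111" is skipped because it is not inside a template, and the one level of nested template in the first citation does not stop the match. As noted in the thread, only one |isbn= per citation template is matched.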
Multiple ISBNs
Would it be feasible to have multiple instances of {{{isbn}}}, each associated with a {{{type}}}?
For example, the above example could be converted to
{{cite book |chapter=Fifty Years of the Shell Model — The Quest for the Effective Interaction |date=2003 |publisher=[[Springer-Verlag]] |doi=10.1007/0-306-47916-8_1 |title=Advances in Nuclear Physics |volume=27 |first=Igal |last=Talmi |editor1-first=J. W. |editor1-last=Negele |editor2-first=E. W. |editor2-last=Vogt |isbn1 = 978-0-306-47708-9 |type1=hardback |issn=0065-2970 |series = Advances in the Physics of Particles and Nuclei (APPN)|isbn2 = 978-0-306-47916-8 | type2 = Online | isbn3 = 978-1-4757-8801-3 | type3 = softcover}}
We would default to {{{isbn1}}} or simply {{{isbn}}} for generating COinS metadata, just like at present.
HTH HAND —Phil | Talk 17:40, 15 May 2014 (UTC)
- No. Where would it stop? Some books have many more than one ISBN - paperback/hardback; audio; USA/UK/Australia/etc. publisher; separate volumes or all-in-one; special coffee-table binding. How many do you need? The answer to that is: give the ISBN of the edition that you actually consulted, and no other. --Redrose64 (talk) 17:52, 15 May 2014 (UTC)
- There should only be one - the one the page numbers were taken from. Keith D (talk) 18:40, 15 May 2014 (UTC)
- We should not be encouraging storing a significant list of different ISBN numbers. The one which should be selected is the one, without modification, which is printed in the book actually being referenced. If there is more than one printed, use the one that matches the version of the book in-hand. If there is both a 10-digit and a 13-digit version printed in the book, the 13 digit version is preferred. Do not convert from a 10-digit version to a 13-digit version by just adding the 978-; it will be wrong. Do not convert a 13-digit version to a 10-digit version by removing the 978-; it will also be wrong. Use the version as printed in the book.
- There are ways to have more than one ISBN if the |id= is used, but that should be an exception, not a rule. If we were going to start listing all of the different identifiers for every edition/version of a book, as Redrose64 said "where would it stop?" As an example: a reference on which I was attempting to fix the ISBN earlier today was citing Magic and Mystery in Tibet. Should we be listing identifiers for all of the 60 versions listed in WorldCat?
- If the citing editor has actually checked multiple versions to find that the page numbers and text are exactly the same, then it is reasonable for them to list more than one identifier. The |id= parameter can be used for this purpose and as long as the text "ISBN" precedes a valid format ISBN it will be linked to Special:BookSources by the MediaWiki software. (see Help:Magic links)
- On the other hand, we should not generate badly formed COinS data if there are extraneous non-numeric characters in the |isbn= parameter. Removing everything other than digits is trivial.
- I also believe that we should not generate an error if there is extraneous non-numeric text in the ISBN parameter. All non-numeric text can be removed prior to processing with a single regular expression substitution. We are already performing one regular expression substitution to remove the "-" marks. Given the ease with which all extraneous non-numeric text can be removed – particularly given we are already removing some such text (hyphens) – it feels like we are going out of our way to make the requirements for this parameter more stringent than is needed in order to meet the goals of an accurate link to Special:BookSources and valid COinS data. In fact, we appear to choose to provide bad COinS data when providing good COinS data in a larger percentage of cases is trivial. Just removing such extraneous text prior to checksum verification and forwarding to COinS is slightly easier, from a processing point of view, than what is currently done and results in both that parameter being much more user friendly and our providing good COinS data in a higher percentage of citations. — Makyen (talk) 02:35, 16 May 2014 (UTC)
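To make two points from this thread concrete (stripping extraneous text before the checksum test, and why prefixing 978 to a 10-digit ISBN is not enough), here is a minimal Python sketch. It is not the module's Lua code, and all function names are hypothetical. One detail beyond "remove everything other than digits": a 10-digit ISBN can end in the check character X, so the sketch keeps an X that directly follows the ninth digit.

```python
import re


def clean_isbn(raw):
    """Reduce a messy |isbn= value to bare ISBN characters."""
    # Keep an X that directly follows the ninth digit (hyphens/spaces allowed).
    m = re.match(r"[^0-9]*((?:[0-9][\s-]*){9}[Xx])\b", raw)
    if m:
        return re.sub(r"[\s-]", "", m.group(1)).upper()
    # Otherwise drop everything that is not a digit.
    return re.sub(r"[^0-9]", "", raw)


def is_valid_isbn(isbn):
    """Checksum test for a cleaned 10- or 13-character ISBN."""
    if re.fullmatch(r"[0-9]{9}[0-9X]", isbn):
        total = sum((10 - i) * (10 if c == "X" else int(c))
                    for i, c in enumerate(isbn))
        return total % 11 == 0
    if re.fullmatch(r"[0-9]{13}", isbn):
        total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(isbn))
        return total % 10 == 0
    return False


def isbn10_to_13(isbn10):
    """Prefix 978, drop the old check character, compute the new check digit."""
    body = "978" + isbn10[:9]
    total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(body))
    return body + str((10 - total % 10) % 10)


def is_10_13_pair(a, b):
    """True only if a and b are the 10- and 13-digit forms of the same ISBN."""
    a, b = clean_isbn(a), clean_isbn(b)
    if not (is_valid_isbn(a) and is_valid_isbn(b)):
        return False
    if len(a) == 13:
        a, b = b, a
    return len(a) == 10 and len(b) == 13 and isbn10_to_13(a) == b
```

For example, `clean_isbn("978-0-306-47916-8 (Online)")` gives `9780306479168`, which passes the checksum. `isbn10_to_13("0306406152")` gives `9780306406157`, whereas the naive `"978" + "0306406152"` fails `is_valid_isbn`. A field holding two ISBNs separated by a comma is not handled here; it would need to be split first.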
website
|website= is listed on the Whitelist; however, if used in conjunction with |archiveurl=, it generates an error message:
{{cite book |archiveurl=//www.web.archive.org |archivedate=May 15, 2014 |deadurl=no |author=Johnson, Malcom |title=Sample Title |website=http://www.example.com}}
If |url= is populated, the error is resolved, but a bare url displays in the citation. (This is also what happens if |archiveurl= isn't populated.)
{{cite book |url=http://www.example.com |archiveurl=//www.web.archive.org |archivedate=May 15, 2014 |deadurl=no |author=Johnson, Malcom |title=Sample Title |website=http://www.example.com}}
- Johnson, Malcom. Sample Title. Archived from the original on May 15, 2014. {{cite book}}: |website= ignored (help); External link in |website= (help); Unknown parameter |deadurl= ignored (|url-status= suggested) (help)
Is this the desired behavior of this parameter, or is it a glitch? I would think that |website= is an alias of |url=; the AWB renaming script currently replaces it with |url=. Should it continue to do so? Or is this a glitch that will be fixed? I'm about to post a lengthy list of valid alias parameters that the script is currently replacing on that talk page; if the script shouldn't continue to replace |website= with |url=, please be sure to comment there. Thanks!—D'Ranged 1 VTalk 00:14, 28 May 2014 (UTC)
- |website= is an alias of |work=, for the name of a website, not the address. Imzadi 1979 → 00:17, 28 May 2014 (UTC)
- Imzadi1979 Thank you; I was mistaken, the script is currently replacing |website= with |work=, which is unnecessary. Sorry I was confused; I've straightened it out over there. Thanks again!—D'Ranged 1 VTalk
Valid parameters missing from Whitelist
These parameters are not on the Whitelist, but are used successfully in templates.
- |eprint= in {{cite arXiv}}; a valid alias for |arxiv=, which is required.
- |class= in {{cite arXiv}}; optional
- |pmc-embargo-date= in {{arxiv}} coding; however, it doesn't appear in the documentation. Possibly an alias for |embargo= on the Whitelist.
I thought this was supposed to be a current, complete listing of all approved parameters for the templates. If it isn't, one is needed. I've made changes to the list of parameters for Citation bot based on this list in an attempt to avoid the bot making errors; I now have to undo some of those changes.—D'Ranged 1 VTalk 16:58, 30 May 2014 (UTC)
- I think you meant {{Cite arXiv}}, which does not yet use the CS1 Lua module to render its citation. The Whitelist is only for cite templates that use the module, and |eprint= is not used in any of those cite templates.
- It's confusing, but the Whitelist is correct without those parameters. If {{Cite arXiv}} is migrated to use the module, |eprint= may need to be added to the Whitelist. There is a list of cite templates that use the module at Help:Citation Style 1; they are highlighted in light green in the left column. – Jonesey95 (talk) 19:25, 30 May 2014 (UTC)
- Yes, I meant {{cite arXiv}}; I've changed my original post to reflect that; thank you. I'm still confused, however. Help:Citation Style 1#Specific source states: "There are a number of templates that are CS1 compliant but are tied to a specific source; these are listed in Category:Citation Style 1 specific-source templates." (emphasis mine) It then goes on to specifically list {{cite arXiv}}; however, that template is not in the stated category, but is in the "core" category, Category:Citation Style 1 templates. Either way, if a template is considered to be "CS1 compliant", I would expect its parameters to be part of the Whitelist. Additionally, Template:Citation Style documentation/cs1, which is transcluded on {{cite arXiv}} and other modules that are not highlighted in the list of modules, doesn't distinguish between those using Module:Citation/CS1 and those using {{citation/core}}. I thought since they were listed, they were using CS1. So, questions: Should {{cite arXiv}}'s category be changed? Should the language "CS1 compliant" be modified on the Help page? Should the templates listed at Template:Citation Style documentation/cs1 indicate their source? Sorry to be a bother; thanks for your patience—I really appreciate the help.—D'Ranged 1 VTalk 20:31, 30 May 2014 (UTC)
- {{Cite arXiv}} is essentially a special case of {{cite journal}}, where some of the parameters (like |journal=, |work= and |publisher=) put the page into an error category, and a few extra parameters are recognised. These, |eprint= (and its alias |arxiv=), |version= and |class=, are used to construct special links. To cope with these variations, it still uses the older {{Citation/core}} method instead of Module:Citation/CS1. --Redrose64 (talk) 21:39, 30 May 2014 (UTC)
Author check
Could there be a check on the author field to detect when it contains date type information, such as |author=Published on Mon May 19 14:30:16 BST 2008, which occurs in numerous articles? Probably need to add a tracking category when this occurs so that they can be fixed by removal or transferring information to the |date= field. Regards. Keith D (talk) 00:46, 20 June 2014 (UTC)
- Interesting idea. Do you have ideas for patterns that would detect erroneous author values while preventing false positives? It seems possible that a valid author value might contain a date, like "May 2014 Conference Organizing Committee" or something like that.
- The root of this problem, in many cases, is lazy use of Reflinks. Reflinks could be programmed to be more clever about sites that put bad data in author fields. I don't know if Dispenser is interested in fixing this problem with Reflinks. People who fix citations could provide a list of the most common web sites that have this bad data, like latimes.com and some web sites in India. – Jonesey95 (talk) 01:50, 20 June 2014 (UTC)
- Probably need to start with picking up some and then expanding as other ones are found. I was thinking of something like " BST nnnn", " EST nnnn", " GMT nnnn" as a starting point. Keith D (talk) 10:43, 20 June 2014 (UTC)
- I suggest also looking for the name of the publication at the end of the |title= parameter - usually separated by an HTML entity for a hyphen, dash or suchlike. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:02, 20 June 2014 (UTC)
- This looks like a great task for someone with AWB skills who can search the whole database of articles periodically for questionable patterns. A bot could even be set up to create a report page and update it periodically. If we find certain patterns that never produce false positives, those patterns could be added to the CS1 module and used to create an error category. – Jonesey95 (talk) 16:02, 20 June 2014 (UTC)
- @Keith D and Jonesey95: See my BattyBot 24 task for author fixes.
- @Pigsonthewing: - User:Ohconfucius/script/Sources removes some publications from the end of the |title= parameter. GoingBatty (talk) 16:47, 20 June 2014 (UTC)
- All - it appears that Reflinks might go away at the end of the month - see Wikipedia:Village pump (technical)/Archive 127#Migrating Reflinks, Dab solver, and User:Dispenser's other tools to Tool Labs. GoingBatty (talk) 16:47, 20 June 2014 (UTC)
- @GoingBatty: thanks - looking at status page it looks as though task 24 has not been run - could it be run? Keith D (talk) 18:09, 20 June 2014 (UTC)
- @Keith D: I added the last run column this month, because I realized that it's been a while since I've run some of the bot tasks. I last ran that task in March, so I'll run it again soon. Thanks! GoingBatty (talk) 22:42, 20 June 2014 (UTC)
- @Keith D: I would like to run task 24 to fix the authors with task 31 in early July once the Toolserver is taken down - see User:Dispenser/Toolserver migration. If you can think of any patterns for author fixes you'd like me to include, please let me know. Thanks! GoingBatty (talk) 23:11, 22 June 2014 (UTC)
- You could try the patterns " BST nnnn", " EST nnnn", " GMT nnnn" that I indicated above, but you would need to check that the date info was in the |date= field before removing. Keith D (talk) 23:18, 22 June 2014 (UTC)
- I specifically mentioned in my RFBA that I wouldn't be removing dates from the author field, since it's beyond my bot creation ability to ensure that the same date is also in the |date= field. GoingBatty (talk) 02:22, 23 June 2014 (UTC)
- I did start the thread by suggesting a tracking category for the field so that they could be tackled manually if a BOT cannot do this. Keith D (talk) 12:46, 23 June 2014 (UTC)
- @Keith D: My bot completed its run. Once Reflinks gets restabilized, I would like to ask Dispenser if there is a possibility to update Reflinks so incorrect author parameters don't get added in the first place. GoingBatty (talk) 04:43, 10 July 2014 (UTC)
- Thanks for the BOT run. As reflinks looks as though it is staying then it needs to be updated in several ways not just the author parameter. Options for date format would also be useful so that we do not have to run round after users setting the dates to the appropriate format. Using a publisher name rather than a web site would be another. Keith D (talk) 12:30, 10 July 2014 (UTC)
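The starting patterns suggested in this thread (" BST nnnn", " EST nnnn", " GMT nnnn") can be sketched as a single regular expression. The Python below is only an illustration of that idea, with a hypothetical function name; it is not BattyBot's code or anything in the CS1 module, and the list of time zones is just the three named above.

```python
import re

# A time-zone abbreviation followed by a four-digit year, e.g. " BST 2008".
# Only the three zones suggested in the thread are listed; extend as needed.
AUTHOR_TIMESTAMP = re.compile(r"\s(?:BST|EST|GMT)\s[0-9]{4}\b")


def author_looks_like_timestamp(author):
    """True if an |author= value contains a zone-plus-year timestamp tail."""
    return AUTHOR_TIMESTAMP.search(author) is not None
```

This flags "Published on Mon May 19 14:30:16 BST 2008" but not "May 2014 Conference Organizing Committee", the kind of false positive raised earlier in the thread. It only detects; deciding whether the date is already in |date= is still left to a human or a separate check.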
doix
doix is used in {{Cite doi/preload}} to prefill a version of the DOI that the citation bot fixes up; it then changes doix to doi. doix was added to the whitelist but subsequently removed in an update from the sandbox. I think that was unintentional. Would an administrator please add doix back into Module:Citation/CS1/Whitelist, maybe with a comment to keep it from disappearing again? Thank you.
– Minh Nguyễn (talk, contribs) 21:55, 6 July 2014 (UTC)
- Not clear to me why |doix= should be included in Module:Citation/CS1/Whitelist. I think that including parameters that have nothing to do with CS1 blurs a very distinct line that should not be blurred. Without some discussion that shows that |doix= must be included, I'm not ready to accommodate this request.
- —Trappist the monk (talk) 22:56, 6 July 2014 (UTC)
- |doix= is an unsupported parameter used temporarily by Citation Bot, I believe. The bot removes it during creation of cite doi templates. The bot is currently blocked, which is why you are seeing |doix=. It should never be seen by a human editor under normal circumstances. – Jonesey95 (talk) 23:12, 6 July 2014 (UTC)
- Thanks for your feedback. Another user asked me why their Cite doi subpages were showing a CS1 error. In the meantime, they had "corrected" the error by changing doix to doi, but without decoding the .2F escape, the link was broken. I went ahead and reimplemented the conversion from doix to doi in Module:Cite doi. Let me know if you see any bugs. – Minh Nguyễn (talk, contribs) 23:17, 6 July 2014 (UTC)
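To illustrate why simply renaming doix to doi breaks the link, here is a deliberately minimal Python sketch. It assumes that the only escape involved is ".2F" standing for "/", which is the one escape mentioned in this thread; the real conversion in Module:Cite doi is written in Lua and may handle more cases. The function name is hypothetical.

```python
def doix_to_doi(doix):
    """Decode the '.2F' escape back to '/' (assumption: no other escapes are used)."""
    return doix.replace(".2F", "/")


print(doix_to_doi("10.1007.2F0-306-47916-8_1"))
```

Using the DOI from the Talmi citation earlier on this page, the escaped form `10.1007.2F0-306-47916-8_1` decodes to `10.1007/0-306-47916-8_1`. A DOI whose suffix happens to contain the literal text ".2F" would be decoded wrongly, so this is not suitable for unattended use.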
Wrong capitalization in check for season names
The module apparently checks that season names are spelled Spring, Summer, Fall (or Autumn), or Winter. But season names are not capitalized (look them up in, for example, American Heritage Dictionary 3rd. ed.) Jc3s5h (talk) 05:29, 26 August 2014 (UTC)
- @Jc3s5h: Prior discussions about seasons include:
- Module talk:Citation/CS1/Archive 8#Anchors from dates with seasons, with examples of capitalized seasons
- Module talk:Citation/CS1/Archive 10#Discussion, which says "seasons are treated as if they were months so must be capitalized"
- Help talk:Citation Style 1/Archive 4#Month / season range order validation, with examples of capitalized seasons
- On the other hand, the documentation at Help:Citation Style 1#Dates shows uncapitalized seasons.
- Since CS1 is designed to conform with a subset of WP:DATESNO and WP:SEASON states "Seasons are uncapitalized", I suggest that either the CS1 module be changed to follow the MOS or CS1 documentation is added in various places to state why CS1 citations use capitalized months. Thanks! GoingBatty (talk) 13:55, 26 August 2014 (UTC)
- There are a lot of words that are capitalized in citations that are not capitalized in running text or in the dictionary: words in article titles, words in journal or book titles, and more. Season names are the same. They would look ridiculous in lower case in a |date= or |issue= field.
- Are you really suggesting that this would be a reasonable citation format?
- P. E. Dant (summer 2002). "seasons, months, and days: what to do?". journal of capitalization. 4 (2). {{cite journal}}: Check date values in: |date= (help)
- I have posted a question at the MOS:DATE talk page. – Jonesey95 (talk) 13:52, 26 August 2014 (UTC)
- The Chicago Manual of Style, 16th ed., section 14.180 "Journal Volume, Issue, and Date: Seasons, though not capitalized in running text (see 8.87), are capitalized in source citations." -- Gadget850 talk 23:08, 28 August 2014 (UTC)
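The behaviour under discussion (capitalized season names accepted, lower-case ones flagged) can be illustrated with a small Python sketch. It is an assumption-laden stand-in rather than the module's actual Lua date check: it covers only the plain "Season YYYY" form, and the function name is invented for the example.

```python
import re

# Capitalized season name, one space, four-digit year.
SEASON_DATE = re.compile(r"(?:Spring|Summer|Autumn|Fall|Winter) [0-9]{4}")


def season_date_ok(date):
    """True for dates such as 'Summer 2002'; False for 'summer 2002'."""
    return SEASON_DATE.fullmatch(date) is not None
```

Under this check `season_date_ok("Summer 2002")` is true and `season_date_ok("summer 2002")` is false, which matches the "Check date values" error shown in the example citation above.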
Suggestion; if url="null" and PMID is valid then set URL to PMID-url
In the list of identifiers, PMID is one result that usually (if not always) results in a free full text of the citation. I'm not a programmer so I'm suggesting this in the form of a logical sentence: if 'url'="null" and PMID is valid (not flag "bad pmid") then 'URL' can be set to PMID-url. It looks like this could be built into function 'buildlist'. ~Technophant (talk) 14:04, 28 August 2014 (UTC)
- Free full text is probably the exception rather than the rule for citations that are indexed in PubMed. If a |pmid= parameter is present, then one link is already there. I don't see a particular advantage of creating a redundant link from the citation's title. Boghog (talk) 14:18, 28 August 2014 (UTC)
- I bring this up because I tried to add a full-free text url that had a PMID link already and was reverted here. When I tried to complain about it I was told that it was a perfectly appropriate revert. I thought of this as an alternative. Google Scholar automatically gives links to full-free texts when available. I'm not sure if there's any other way for those to be found and sorted than manually however. ~Technophant (talk) 14:35, 28 August 2014 (UTC)
- Ah, in this case, the |pmc= parameter already does exactly what you want to do. PubMed Central by definition does contain the free full text. In addition, if |pmc= is present and |url= is empty, then the citation's title is automatically linked. Boghog (talk) 14:45, 28 August 2014 (UTC)
- This then seems to just be my own confusion between PMC and PMID. There's a section of code prefaced with "--Account for the oddity that is {{cite journal}} with |pmc= set and |url= not set" that seems to do exactly what I was wishing for already in place. ~Technophant (talk) 15:42, 28 August 2014 (UTC)
- Yes we have pmc= when free full text is available. Doc James (talk · contribs · email) (if I write on your page reply on mine) 21:37, 28 August 2014 (UTC)
- I think the title of the article should only be linked to a free full text version (unless subscription is set to yes, but then only to full text version). Linking the title to the PMID entry does not provide full text access. If no url is set and a the PMC parameter is set the template automatically links the title. PMC is free full text. Both the PMID and DOI parameters link their entries and I think that is the appropriate place for those links not the title. - - MrBill3 (talk) 03:18, 29 August 2014 (UTC)
- A consensus was developed here that if
|pmc=
is present, then the title should be linked because pmc provides full free text. Boghog (talk) 06:17, 29 August 2014 (UTC)
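The title-linking rule described in this thread can be sketched roughly as follows. This is only an illustration in Python, not the module's Lua code; the function name `title_link` and the PMC URL pattern in `PMC_BASE` are assumptions made for the example.

```python
from typing import Optional

# Assumed URL pattern for PubMed Central articles (illustrative only).
PMC_BASE = "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC"


def title_link(url: Optional[str], pmc: Optional[str]) -> Optional[str]:
    """Return the URL the citation title should link to, or None."""
    if url:
        # An explicit |url= always wins.
        return url
    if pmc:
        # |pmc= set and |url= empty: link the title to the free full text.
        return PMC_BASE + pmc.strip()
    # Neither is set: the title stays unlinked; |pmid= and |doi= keep
    # their own identifier links elsewhere in the citation.
    return None
```

Under these assumptions, `title_link(None, "1234567")` yields a PMC link, while a citation that has only `|pmid=` gets no title link.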
URL in title check
Could some form of check be made on title fields that include a URL? The URL is usually at the end of the title after something like "Read more". Maybe these could be categorised and a bot set up to remove this clutter. Thanks. Keith D (talk) 01:36, 8 September 2014 (UTC)
- I don't think that I've seen what it is that I think you're describing. Is it common? Can you show some examples?
- —Trappist the monk (talk) 13:33, 8 September 2014 (UTC)
- On some newspaper websites, if you drag the mouse over text like the headline, copy to clipboard, and then paste into an empty
|title=
parameter, you sometimes find that you've copied more than you intended. Among the extra text is often "Read more:" and a URL. I always delete the unintended extras, but there must be people who either didn't notice, or didn't realise that it was not part of what we want in a|title=
--Redrose64 (talk) 14:09, 8 September 2014 (UTC)
- Yeah, I've experienced that but have not yet seen such stuff left in citations. Does it happen enough that we should attempt to detect a uri scheme in the
|title=
parameter's value?
- —Trappist the monk (talk) 15:24, 8 September 2014 (UTC)
- This sort of junk often shows up in the CS1 unnamed (or maybe the unsupported) parameter category, which I patrol every couple of days. Here are some examples: [2] [3]. Titles of articles from the Daily Mail often have this junk, although I don't see one in my edit history right now. – Jonesey95 (talk) 18:23, 8 September 2014 (UTC)
- Probably over 300 at a guess, I have cleared about 30 in the last few days, example. Keith D (talk) 21:31, 8 September 2014 (UTC)
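A check of the kind being discussed could look something like the sketch below. It is written in Python purely for illustration; the function name `find_url_in_title` and the list of schemes are assumptions, and nothing here reflects what the module actually does.

```python
import re
from typing import Optional

# Hypothetical pattern: a few common URI schemes followed by non-space text.
URI_SCHEME = re.compile(r"\b(?:https?|ftp)://\S+", re.IGNORECASE)


def find_url_in_title(title: str) -> Optional[str]:
    """Return the first URL found in a |title= value, or None if it is clean."""
    match = URI_SCHEME.search(title)
    return match.group(0) if match else None
```

A title such as "Council approves plan Read more: http://www.example.com/news/1" would be flagged, which is enough to put the citation in a maintenance category for a human or bot to clean up.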
Changing the Zbl URL
Would it be possible to change, at /Configuration, the |zbl=
URL from http://www.zentralblatt-math.org to https://zbmath.org? E.g. for me, http://www.zentralblatt-math.org/zmath/en/search/?format=complete&q=an:1177.37036 redirects to https://zbmath.org/?format=complete&q=an:1177.37036. It Is Me Here t / c 11:46, 4 September 2014 (UTC)
- The HTTP URL also appears under "JFM". It Is Me Here t / c 12:18, 9 September 2014 (UTC)
- Have you discussed this change with the Wikipedia communities that are most likely users of these identifiers? If I make this change and something breaks it will be my fault.
- —Trappist the monk (talk) 12:34, 9 September 2014 (UTC)
- Also {{zbl}} and {{citation/identifier}} which is still in use. -- Gadget850 talk 13:10, 9 September 2014 (UTC)
- {{JFM}} and the CS1
|jfm=
parameter also redirect. -- Gadget850 talk 13:51, 9 September 2014 (UTC)
- I've now notified WP:WPMATH. It Is Me Here t / c 16:04, 9 September 2014 (UTC)
- Seems correct: I land on URL https://zbmath.org/ too. Deltahedron (talk) 21:28, 9 September 2014 (UTC)
Ok. Done in the sandbox.
- Title. JFM 1177.37036. Zbl 1177.37036.
{{cite book}}
: Check|jfm=
value (help)
—Trappist the monk (talk) 12:37, 13 September 2014 (UTC)
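For reference, the target URL form quoted at the top of this section can be built from an identifier as in this small Python sketch. The helper name `zbl_url` is hypothetical; only the base URL and query shape come from the example above.

```python
from urllib.parse import urlencode

ZBMATH_BASE = "https://zbmath.org/"


def zbl_url(identifier: str) -> str:
    """Build a zbMATH lookup URL for a Zbl (or JFM) identifier."""
    # safe=":" keeps the colon in "an:..." readable, as in the quoted URL.
    query = urlencode({"format": "complete", "q": "an:" + identifier}, safe=":")
    return ZBMATH_BASE + "?" + query
```

Calling `zbl_url("1177.37036")` reproduces the redirect target mentioned in the request.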
Support Format ISO 8601 dates
At Norwegian Bokmål Wikipedia we have added support for formatting ISO 8601 dates (YYYY-MM-DD) and recommend using them. If more wikipedias could support ISO 8601, it would be easier to copy citations from one wikipedia to another. It could also make life easier for bots that need to insert dates, or scripts relying on Citoid. Are there any objections to adding ISO 8601 date support on enwiki? Here's an example of how it could be done (with testcases), inspired by our implementation at nowiki. One possible problem is that any fixed output format like 'j M Y' might not conform to WP:MOS#Choice_of_format. – Danmichaelo (talk) 16:58, 14 September 2014 (UTC)
- I suspect that this will not be accepted at enwiki. There was a similar sort of thing in the old Wiki markup /
{{citation/core}}
days where the templates automatically converted from year-initial numeric dates to the user's preferred format. For a while. That functionality was removed.
- —Trappist the monk (talk) 17:37, 14 September 2014 (UTC)
- The pertinent guideline is MOS:DATEUNIFY. -- Gadget850 talk 19:19, 14 September 2014 (UTC)
- Ok, thanks for clarifying. – Danmichaelo (talk) 20:26, 14 September 2014 (UTC)
- ISO 8601 does not allow the use of the Julian calendar, so any publication that bears a Julian calendar publication date could not be cited. On the surface Danmichaelo's post hints that the Norwegian Wikipedia suffers from Wikipedia:Recentism. Jc3s5h (talk) 22:46, 14 September 2014 (UTC)
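To make the proposal concrete, here is a rough Python sketch of converting a YYYY-MM-DD value to the fixed 'j M Y' style mentioned above (day without leading zero, short month name, four-digit year). It is not the nowiki implementation; the function name and the choice to pass unrecognised values through unchanged are assumptions.

```python
import datetime
import re

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def format_iso_date(value: str) -> str:
    """Render YYYY-MM-DD as 'j M Y'; return anything else unchanged."""
    if not ISO_DATE.fullmatch(value):
        return value
    try:
        parsed = datetime.date.fromisoformat(value)
    except ValueError:
        # Looks like ISO but is not a real date (e.g. 2014-02-30).
        return value
    return f"{parsed.day} {MONTHS[parsed.month - 1]} {parsed.year}"
```

The fixed output style is exactly the point of contention: the article's established date format is not known to a function like this.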
PDFlink
I see from the template that {{PDFlink}} is in the process of being merged into CS1. The TfD discussion noted that wikipedia's CSS adds the PDF icon where an external link indicates the document is a pdf, however not all links will do this. Some (e.g. this one used on Ford Island) generate a pdf and force the browser to download it. Even for sites which don't force a download, sometimes the content type is displayed in the header, not the url. For resources like that we should have some way of indicating to the user that the resource is a PDF. How do we do that when the merger is complete with CS1? Protonk (talk) 14:26, 19 September 2014 (UTC)
- The pdf icon is added when the file extension of the associated url is '.pdf'. So [http://www.example.com/example.pdf Example] turns into this: Example – there is no such file, by the way. Your Ford Island example does not have the '.pdf' extension so Wikimedia can't know its file format. To notify readers that the link is to a pdf file, use |format=pdf which gives: "Historic Hawaii" (pdf).
- There has been precious little discussion about merging
{{PDFlink}}
into CS1 so don't expect anything soon.{{PDFlink}}
should not be used within CS1 templates.
- —Trappist the monk (talk) 15:06, 19 September 2014 (UTC)
- The PDF icon is not accessible, thus visually impaired readers have no idea that it exists. A recent MediaWiki update removed all of the other icons (AVI, OGG, News, etc.) from the Vector skin and a more recent update removed the HTTPS icon. The PDF icon exists only because it is locally added. -- Gadget850 talk 18:04, 19 September 2014 (UTC)
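The extension test described above can be approximated as in the following Python sketch. The real rule lives in site CSS and may differ in detail; the function name and the case-insensitive comparison are assumptions made for illustration.

```python
from urllib.parse import urlsplit


def has_pdf_extension(url: str) -> bool:
    """Rough check: does the path part of the URL end in '.pdf'?"""
    path = urlsplit(url).path
    return path.lower().endswith(".pdf")
```

A URL that generates a PDF on request, with no '.pdf' in its path, fails this test, which is the case where `|format=pdf` has to carry the information instead.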
Range of seasons in "date="
CS1 doesn't seem equipped to handle a range of seasons as the "date=" parameter; see for example here. I managed to get both a single season and a range of years displayed correctly, so I don't think it's me. Huon (talk) 20:35, 20 September 2014 (UTC)
- When dates contain spaces (Winter 2014 has a space), separate the other date with space ndash space. Do not use
{{ndash}}
or–
; like this:{{cite book |title=Title |date=Winter 2013 – Spring 2014}}
→ Title. Winter 2013 – Spring 2014.
- —Trappist the monk (talk) 20:49, 20 September 2014 (UTC)
- And is documented in the help page that was linked in the error message. You have the full error message display enabled, so you are seeing the error. -- Gadget850 talk 21:04, 20 September 2014 (UTC)
- Not so well documented, I think. For example, it actually is ok to use
{{ndash}}
, it's{{spaced ndash}}
that shouldn't be used. I'll tweak the help a bit.
How to format the range of winter crossing two years?
How should Winter 2005 / 2006 be formatted to not get cs1 errors? I cannot get it to work in the Susanna Paine article: Michael R. and Suzanne R. Payne (Winter 2005 / 2006). "Roses and Thorns: The Life of Susanna Paine". Folk Art. p. 63. Check date values in: |date= (help)
Thanks!--CaroleHenson (talk) 13:28, 22 September 2014 (UTC)
|date=Winter 2005–2006
- Thanks!--CaroleHenson (talk) 16:33, 22 September 2014 (UTC)
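The two accepted forms from these threads (a spaced en dash between two season-and-year dates, and an unspaced en dash for a season spanning two years) can be expressed as patterns. The Python below is a sketch only, not the module's validation code; the list of season names is an assumption.

```python
import re

EN_DASH = "\u2013"
SEASON = r"(?:Winter|Spring|Summer|Fall|Autumn)"  # assumed season names
SEASON_YEAR = SEASON + r" \d{4}"

PATTERNS = [
    # Single season: "Winter 2014"
    re.compile(SEASON_YEAR),
    # Range of seasons: "Winter 2013 – Spring 2014" (space, en dash, space)
    re.compile(SEASON_YEAR + " " + EN_DASH + " " + SEASON_YEAR),
    # One season across two years: "Winter 2005–2006" (unspaced en dash)
    re.compile(SEASON_YEAR + EN_DASH + r"\d{4}"),
]


def is_valid_season_date(value: str) -> bool:
    """Return True if the value matches one of the sketched season forms."""
    return any(pattern.fullmatch(value) for pattern in PATTERNS)
```

With these patterns "Winter 2005 / 2006" is rejected, which matches the error reported for the Susanna Paine citation.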
rtl language support in CS1 titles
I have split this off from Module talk:Citation/CS1/Archive 11#non-italic titles. It is related but I want to focus this discussion on right-to-left language support. |script-title=
is a new parameter in the sandbox version of Module:Citation/CS1 Its purpose is to hold citation title text that must not be italicized in the final rendered citation. It is concatenated with the value in |title=
which is to be italicized. See non-italic titles for the discussion that led to the creation of |script-title=
. At the time of this writing, |script-title=
is just hacked into the CS1 sandbox as a proof of concept. Its full use and purpose is not clearly defined. I am seeking a better name for this parameter. Ideas for a better name and for use definition and restrictions welcome.
The post that I split from non-italic titles begins here:
This citation comes from a discussion now in WP:VPT archive 129. (I changed it from {{cite web}}
to {{cite book}}
so that |script-title=
would apply because {{cite web}}
doesn't italicize |title=
). You'll notice that title and translated title are malformed:
- Tova Green (6 May 2010). 12 ימים (in Hebrew). Maybe So. Retrieved 15 May 2010.
{{cite book}}
: Unknown parameter|trans_title=
ignored (|trans-title=
suggested) (help)
I added code to Module:Citation/CS1/sandbox to wrap |script-title=
in <bdi>...</bdi>
tags. The sandbox version of the citation renders correctly:
{{cite book/new |author=Tova Green |date=6 May 2010 |script-title=12 ימים|language=he |trans_title=13 days |publisher=Maybe So |url=http://MaybeSo.com/12days.html |accessdate=15 May 2010}}
I'm beginning to think that values assigned to |title=
and|script-title=
(or whatever it finally becomes) should be wrapped in <bdi>...</bdi>
tags – the value assigned to |trans-title=
is always supposed to be English so wrapping it seems unnecessary.
—Trappist the monk (talk) 14:46, 13 September 2014 (UTC)
- In Module:Citation/CS1/sandbox I have replaced
|logogram=
with|script-title=
. All instances of|logogram=
in the above post have also been replaced.
- One problem is, it seems that IE and Safari don't support <bdi>. I wonder if it's needed. I visited the Arabic C++ page and it looked the same in both Chrome and IE. LTR "C++" was handled properly in both, and I didn't see any bdi tags in the HTML or anything that looked like wrappers around "C++". Is it possible that the browser would just do the right thing with Arabic text in the title field? At W3C, this page goes into some of the issues in wrapping elements in spans to bulletproof the code. It looks really hairy.--Margin1522 (talk) 02:54, 20 September 2014 (UTC)
- According to this page, only IE doesn't support
<bdi>...</bdi>
. We're in the position of not really knowing what direction the value assigned to|script-title=
might use. We could wrap|script-title=
in<span dir=auto>...</span>
but, yet again, IE doesn't supportdir=auto
. That, I think, leaves us with two options: do nothing and wait for IE to join the 21st century, or wrap|script-title=
in<bdi>...</bdi>
. In the former case, the problem that led to this discussion is a problem for everyone; in the latter, it's only a problem for those who use IE. I think that we should use <bdi>...</bdi>
because eventually, I presume, IE will get its act together. - —Trappist the monk (talk) 13:50, 20 September 2014 (UTC)
- Er, I followed your link, and it doesn't say that "only IE doesn't support
<bdi>...</bdi>
"; it says "Internet Explorer, Safari and Opera do not support bdi." --Redrose64 (talk) 15:28, 20 September 2014 (UTC)
- Allow me to clarify what I think that page says. I think that the first listed basic test, 'bdi has dir=auto by default' is the result that matters to us. Because the default
dir=auto
is how we will be using<bdi>...</bdi>
, then it would seem that the basic test results indicate that all but IE 'support'<bdi>...</bdi>
.
- I have current versions of Chrome and Opera. Both of them render the Hebrew portion of this citation's title correctly:
- (where I understand that correctly is four Hebrew characters, a space, followed by '12' – reading left to right)
- Anyone out there have current versions of Firefox, Safari, and IE? Run the test at the link above; for example click the link 'bdi has dir=auto by default'.
- After looking into it some more, it seems that every browser implements the Unicode bidirectional algorithm, which normally works well. But it has trouble when neutral characters or characters with weak directionality (numbers) occur at the boundaries between opposite-direction runs. "12 ימים " might be a special case of this. This thread about XeTeX and the bidi algorithm has an example of an Arabic page about Lionel Messi, where the citations at the bottom of the page are messed up, even on Wikipedia. That looks like what we are trying to prevent. Wrapping the title in
<bdi>...</bdi>
would probably fix it. The title string would get the direction of its first strongly directional character and wouldn't be affected by neighboring strings. - The reason I still feel kind of reluctant to do it is,
<bdi>...</bdi>
is 11 bytes. With a lot of cites, that could add a considerable amount of data to everyone's download, on every page. Would it work to look at the first byte in the title string and wrap the title in<bdi>...</bdi>
if it's a number or another weak character? --Margin1522 (talk) 22:30, 20 September 2014 (UTC)
- The value provided by
|script-title=
will be wrapped with<bdi>...</bdi>
to isolate whatever directionality it has from the ltr direction in|title=
.<bdi>...</bdi>
will only be applied when a citation contains a non-empty|script-title=
so this imposes relatively little download penalty. This is how the sandbox is working now. - —Trappist the monk (talk) 13:08, 21 September 2014 (UTC)
- Oh, I see. Of course. In that case, maybe we should wrap every field that might contain RTL script. E.g., I can imagine a browser mishandling an author name next to "(2010)" or a journal name next to the issue number. There would have to be some way of invoking this handling only when it's needed. --Margin1522 (talk) 21:08, 21 September 2014 (UTC)
- Yep, an rtl author name followed by date will get buggered-up. For now, I think that we should limit this 'feature' to
|script-title=
so that we can get some experience with it. Then let us decide how to proceed. It may be that we should consider renaming|script-title=
to|rtl-title=
because the original reason for the parameter's existence (italics and Asian scripts) is probably not the issue it was now that wikimarkup can be used to undo default title italics without corrupting the metadata. - —Trappist the monk (talk) 22:02, 21 September 2014 (UTC)
- The rtl text could go in the title field, for that matter, as long as we knew that it had to be wrapped. How about parameters? Say
{{cite web|rtl=y|...
or{{cite web|cjk=Yes|...
. This would be an easy-to-understand fix for editors who enter the cite with the regular menu UI and then notice that the output looks funny. They wouldn't have to edit the|title=
parameter name or escape the kanji. Just add the parameter. In a future UI, they could be check boxes instead of separate entry fields. --Margin1522 (talk) 09:14, 22 September 2014 (UTC)
- If an editor includes a transcription with the script title in
|title=
,<bdi>...</bdi>
isn't clever enough to figure out which part is which. To mimic what you suggest, I have changed our current favorite citation. Here I put both the translation and the rtl script in|script-title=
so that it contains both ltr and rtl. The whole is wrapped with<bdi>...</bdi>
. The title should be: 13 days ימים 12 but instead we get this:- Tova Green (6 May 2010). 13 days 12 ימים (in Hebrew). Maybe So. Retrieved 15 May 2010.
{{cite book}}
: Invalid|script-title=
: missing prefix (help)
- This is because the first strong character is ltr. We also lost the proper italic formatting of the transliterated portion of the title.
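The "first strong character" rule that decides the direction of an isolated run can be illustrated with Python's `unicodedata` module. This is a sketch of the idea only, not browser code; the function name and the fall-back to ltr when no strong character is present are assumptions.

```python
import unicodedata


def first_strong_direction(text: str) -> str:
    """Return 'ltr' or 'rtl' based on the first strongly directional character."""
    for char in text:
        bidi_class = unicodedata.bidirectional(char)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            # Hebrew letters are class R, Arabic letters are class AL.
            return "rtl"
        # Digits (EN), spaces (WS) and punctuation are weak or neutral: skip.
    return "ltr"  # assumption: no strong character means ltr
```

For the Hebrew-only title the digits "12" are skipped and the first Hebrew letter makes the run rtl. When a Latin translation comes first, the same rule picks ltr for the whole run, which is the result seen in the citation above.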
- We would necessarily have to have several
|rtl=
and|cjk=
parameters; one each for every standard parameter that might contain rtl or cjk script (and it isn't just cjk scripts that should be rendered in upright font style). Wikimarkup is something that editors are familiar with and understand so allowing them to undo default styling that way seems best. If we have to have another parameter to indicate that this 'thing' is rtl, why not just assign that parameter the value of 'thing'?|script-title=
,|script-chapter=
,|script-author=
, etc.
- I have been wondering if we might add a prefix to a script that indicates its language. For example:
|script-title=he:ימים 12
. This would then let us wrap the script with<bdi lang="he">...</bdi>
. This prefix would not be required and could be escaped in the unlikely event that the first three characters in a|script-title=
value make a legitimate ISO639-1 code followed by a colon. If the prefix is used, we can check it for validity and emit appropriate error messages.
- I see. The apostrophes part is OK, although it is kind of counterintuitive that double apostrophes select non-italic. I guess we would need parameters for every field, because sometimes a title would be in Japanese while the journal name could be either in English or Japanese. No way to tell without a parameter or parsing the string.
- About rtl, you're right. It wouldn't work with transcriptions + title in the title field. That might be an argument for a separate "transcription" field. Which I would say would be more intuitive for cjk editors as well. Compared to putting the transcription in the title field, and the title in another field. (Either way, having them separate would solve this problem and allow us to enforce the right order.)
- I'm trying to think how this could be as simple as possible for editors, and as close as possible to what they are used to. I just added a cite with Japanese in it, cite 2 in Districts of Japan. I managed to get the title part close to the format we've been discussing, but overall my impression was: a pain in the neck. One open question is how to add the kanji for the author's name, which I wanted to do but didn't because of the Last, first name order.
- The language codes might be a good idea. It might help the browser choose the right fonts, which it might not be able to do when code points are shared across languages. --Margin1522 (talk) 15:25, 22 September 2014 (UTC)
- I think that editors are trainable. If they once learn that double apostrophes change the style from upright to italic, it isn't a big leap to the realization that double apostrophes change the style from italic to upright.
- Surely not every parameter needs a script option. The biggies,
|title=
,|chapter=
,|work=
and maybe|author=
. This is en.wikipedia. There probably needs to be more thought given to what parameters after|title=
get this option. Just you and me in this ghost-town of a talk page is not enough.
- We already have a
|transcript=
parameter that is used as the title of an external link to the transcript of something (interview, speech, etc). We can certainly alias|title=
with another parameter, perhaps|xscr-title=
or some-such, if that is required.
- Perhaps, instead of
{{cite book|last1=Kinoshita|first1=Masahi ...
, write{{cite book|author=Kinoshita Masahi <kanji> ...
But, that might be problematic because: What will the metadata contain? That's why I suggested|script-author=
above.|author=
is part of the metadata but should|script-author=
or for that matter any|script-whatever=
parameter values be part of the metadata? But you want to use|author=
anyway, don't you? Aren't Japanese names properly surname first without the comma?
I've added code that accepts an ISO639-1 prefix (no validity testing yet, spaces aren't allowed) but the language code is added to <bdi>...</bdi>
. Our favorite citation, stripped of just about everything for clarity:
{{cite book/new |script-title=he:12 ימים |trans_title=13 days}}
produces this:
Here is what the html looks like:
<cite class="citation book cs1 cs1-prop-script"> <bdi lang="he" >12 ימים</bdi>.</cite><span title="ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=12+%D7%99%D7%9E%D7%99%D7%9D&rfr_id=info%3Asid%2Fen.wiki.x.io%3AModule+talk%3ACitation%2FCS1%2FArchive+10" class="Z3988"></span> <span class="cs1-visible-error citation-comment"><code class="cs1-code">{{[[Template:cite book|cite book]]}}</code>: </span><span class="cs1-visible-error citation-comment">Unknown parameter <code class="cs1-code">|trans_title=</code> ignored (<code class="cs1-code">|trans-title=</code> suggested) ([[Help:CS1 errors#parameter_ignored_suggest|help]])</span>
Because this citation doesn't use |title= for a transcription, there isn't any title information in the metadata. I guess that this argues for inclusion of the |script-title=. Do we always include it? Or, do we only include it when |title= is missing or empty?
—Trappist the monk (talk) 00:12, 23 September 2014 (UTC)
- Yes, I agree, adding parameters for every field that might need special handling would also make the documentation longer and more complicated, when the vast majority of editors don't need it. I was wondering whether there should be an entire family of templates (or template documentation) for international templates. People who don't need it could be blissfully unaware that anything has changed, and people who need it can read the new documentation.
- Anyway, we seem to have reached a consensus about what we want it to look like (like Chicago). The next step should be to take it to other forums? To MOS-JP to see what they want, and MOS to discuss whether it's a good idea to standardize transcription/script cite formats across languages, and whether to encourage transcriptions. I think it's a good idea, and yes. Others may have different ideas, suggestions for the names, syntax, etc. MOS is pretty active. At that time, I think we should make it clear that the bdi issues really need to be fixed. The CJK changes would be mainly cosmetic, but bdi seems urgent. And if we are going to fix it, we might as well decide the format now.
- (BTW I changed that cite to use author= and the kanji. It really helps to have the name if you're trying to look up the book. About the order, in Japanese it's always family name first. In English, let's just say it varies. The MOS explanation is at WP:JATITLE. The language syntax looks like a possibility that could be suggested.) --Margin1522 (talk) 06:34, 23 September 2014 (UTC)
- A week or more ago I mentioned these discussions at:
- We didn't get a flood of opinion. I suspect that the dearth of comment has a lot to do with the topic's somewhat technical nature. This may be a case where we just go ahead. As long as we don't break anything we'll be fine.
- It occurs to me that when editors include a non-Latin title in
|title=
, the non-Latin title is included in the metadata. So, lacking direction to the contrary, I'll make sure that the content of|script-title=
is also part of the metadata.
- As an aside, shouldn't there be a space in 木下正史, per WP:JATITLE?
- Metadata now contains either or both of
|title=
and|script-title=
.
- Language prefix now tested to make sure it is a valid ISO639-1 language code. If it isn't valid, we don't include the
lang
attribute and leave the prefix with the script.
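The prefix handling described in the last few comments might be sketched like this. It is a Python illustration, not the sandbox's Lua; the function name `wrap_script_title` and the tiny `KNOWN_CODES` set are assumptions, standing in for a full ISO 639-1 table.

```python
import html
import re

# Illustrative subset only; a real check would use the full ISO 639-1 list.
KNOWN_CODES = {"ar", "he", "ja", "ko", "ru", "zh"}

# Two lowercase letters, a colon, then the script text (no spaces in the prefix).
PREFIX = re.compile(r"^([a-z]{2}):(.+)$", re.DOTALL)


def wrap_script_title(value: str) -> str:
    """Wrap a |script-title= value in <bdi>, adding lang= for a valid prefix."""
    match = PREFIX.match(value)
    if match and match.group(1) in KNOWN_CODES:
        lang, script = match.group(1), match.group(2)
        return f'<bdi lang="{lang}">{html.escape(script)}</bdi>'
    # Unknown or missing prefix: no lang attribute, prefix stays with the script.
    return f"<bdi>{html.escape(value)}</bdi>"
```

So `he:` followed by the Hebrew title produces a `<bdi lang="he">` wrapper, while a value starting with an unrecognised code such as `xx:` is wrapped as-is.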
- I'm having a little trouble following this because until yesterday I knew nothing about COinS. I did find this paper, which gives an example of Dublin Core encoding for titles in multiple languages (section 6.3). They just repeat the encoding of <DC.Title LANGUAGE="x">once for each language, where "x" is an ISO code. Would that work? Apparently it works if everything gets displayed to a metadata client. That way in the metadata the script title, romanization, and translation could all get encoded as separate versions of the title, each with a language code. I've always thought of romanization as more of a pronunciation guide, but it is a title, in a sense. (Also thanks for the space in the name. Fixed.)--Margin1522 (talk) 13:52, 23 September 2014 (UTC)
- As far as I know, COinS doesn't support any differentiation by language; there isn't a keyword for language. So, what works for Dublin Core, doesn't work for COinS.
- I don't know how we came to decide on COinS for extracting metadata from CS1 citations. It is what we have, so for now we must live with its limitations.
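To show what "either or both" in the metadata might amount to, here is a Python sketch that builds the title portion of a COinS key-encoded-value string like the one in the HTML above. The function name is hypothetical, and joining `|title=` and `|script-title=` with a space is an assumption; the thread does not say how the module combines them.

```python
from urllib.parse import urlencode


def coins_title_kev(title: str = "", script_title: str = "") -> str:
    """Build a minimal COinS KEV string carrying the book title."""
    # Assumption: title and script title are joined with a single space.
    combined = " ".join(part for part in (title, script_title) if part)
    fields = [
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:book"),
        ("rft.genre", "book"),
    ]
    if combined:
        fields.append(("rft.btitle", combined))
    # urlencode percent-encodes values as UTF-8 and turns spaces into '+'.
    return urlencode(fields)
```

There is a single `rft.btitle` value and no field for its language, which is the limitation noted above.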
I have adjusted the code so that |script-title=
is applied to all CS1 templates (heretofore only those that italicized |title=
). Here is a simplified {{cite journal}}
:
- "Transcription title" 12 ימים. Journal.
{{cite journal}}
: Unknown parameter|trans_title=
ignored (|trans-title=
suggested) (help)
—Trappist the monk (talk) 15:02, 24 September 2014 (UTC)
- I gather that the metadata is used by clients like Zotero, e.g. by college students who can grab a cite and check whether the library has a copy. Zotero apparently doesn't support multiple languages. There is a Multilingual Zotero which does. But it seems this uses the Citation Style Language, which is a whole different standard and xml-based, so I guess that's out. --Margin1522 (talk) 16:04, 24 September 2014 (UTC)
For this citation:
- {{cite book/new |author=Tova Green |date=6 May 2010 |script-title=12 ימים|language=he |trans_title=13 days |publisher=Maybe So |url=http://MaybeSo.com/12days.html |accessdate=15 May 2010}}
With Firefox 33 and IE 11, I see 12 space followed by the Hebrew then [13 days]]. -- Gadget850 talk 23:21, 20 September 2014 (UTC)
- As I understand it, that rendering isn't correct. But, it is more correct than the rendering that both Chrome and Opera make if we don't put the Hebrew text in
|script-title=
and wrap the script in<bdi>...</bdi>
:- 12 13] xxxx days] – here I replaced the Hebrew characters with xxxx to make my life easier
- So, even if the rendering isn't perfect, it is better than that mishmash. I suspect that the various browsers will converge on correct solutions eventually. Wrapping the value provided in
|script-title=
with<bdi>...</bdi>
doesn't break anything and appears to provide mostly-correct rendering across browsers. This, I think, argues for its inclusion in the module.
- —Trappist the monk (talk) 13:08, 21 September 2014 (UTC)
- There may be some other CSS for this:
unicode-bidi: embed; unicode-bidi: -webkit-isolate; unicode-bidi: -moz-isolate; unicode-bidi: -ms-isolate; unicode-bidi: isolate;
- -- Gadget850 talk 02:29, 23 September 2014 (UTC)
- If I understand these correctly,
unicode-bidi:
works withdirection:
which can have the valuesltr
,rtl
, orinherit
. That means that we would need to know beforehand the direction required by the content of|script-title=
. Therein lies the value of<bdi>...</bdi>
.