Wikipedia:Bots/Requests for approval/Chartbot 3
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.
Operator: Kww
Time filed: 06:23, Tuesday March 19, 2013 (UTC)
Automatic, Supervised, or Manual: Automatic
Programming language(s): PHP
Source code available:
Function overview: Correct links to Billboard stories and articles
Links to relevant discussions (where appropriate):
Edit period(s): one time run
Estimated number of pages affected: 5000
Exclusion compliant (Yes/No): Yes
Already has a bot flag (Yes/No): Yes
Function details: This is a continuation of Chartbot's mission to get our links to Billboard repaired. This bot will require interaction with Billboard's existing redirect system, which will remain in place until May.
We have links that have not worked in years, using the format http://www.billboard.com/bbcom/esearch/article_display.jsp?vnu_content_id=<article id>. Billboard dropped this URL format in 2008, but we still have several thousand links that use it. We have even more links that use the format introduced in 2008, http://www.billboard.com/news/<title>-<article id>.story. The article IDs remained constant from the 1990s through January 2013.
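The article ID is the only part the two legacy formats share, so the first step is to recover it from either style. The following is a minimal Python sketch, not Chartbot's PHP code. It assumes links use the exact www.billboard.com host shown above, and the function name extract_article_id is hypothetical.

```python
import re
from urllib.parse import urlparse, parse_qs

# Pre-2008 style:  /bbcom/esearch/article_display.jsp?vnu_content_id=<article id>
# 2008-2013 style: /news/<title>-<article id>.story
STORY_PATH = re.compile(r"^/news/.*-(\d+)\.story$")


def extract_article_id(url):
    """Return the numeric article ID from either legacy Billboard URL format, or None."""
    parts = urlparse(url)
    if parts.netloc != "www.billboard.com":  # assumption: only this host is handled
        return None
    if parts.path == "/bbcom/esearch/article_display.jsp":
        ids = parse_qs(parts.query).get("vnu_content_id")
        if ids and ids[0].isdigit():
            return ids[0]
        return None
    # Greedy ".*-" backtracks to the last hyphen, so digits inside the title are skipped
    match = STORY_PATH.match(parts.path)
    return match.group(1) if match else None
```

Both example links from this request yield the same ID, 1000808321. That shared ID is what makes the old-to-new mapping possible.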
Billboard currently has a redirect system in place to aid in the transition. Presented with a URL of the form http://www.billboard.com/news/<title>-<article id>.story, it will redirect to a link of the form http://www.billboard.com/articles/news/<new article id>/<new article title>.
Links of the form http://www.billboard.com/bbcom/esearch/article_display.jsp?vnu_content_id=<article id> simply return a 404. For example, http://www.billboard.com/bbcom/esearch/article_display.jsp?vnu_content_id=1000808321 now returns a 404, although it used to point at http://www.billboard.com/news/chart-beat-bonus-1000808321.story.
Note that Billboard's redirection system ignores the title text. http://www.billboard.com/news/dummy-noise-for-url-retrieval-1000808321.story successfully redirects to http://www.billboard.com/articles/news/64057/chart-beat-bonus, even though that text is obviously not the original title.
In this bot, I will search for links of the form http://www.billboard.com/bbcom/esearch/article_display.jsp?vnu_content_id=<article id> and http://www.billboard.com/news/<title>-<article id>.story.
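Searching a page's wikitext for both styles could look like the following sketch. The combined pattern and the name find_legacy_links are illustrative assumptions. In particular, the rule that whitespace, "]", "|", "<" or ">" ends a URL is a guess at how links sit in wikitext, not a rule taken from the bot.

```python
import re

# Hypothetical combined pattern: group "jsp" captures the ID from the pre-2008 style,
# group "story" captures it from the 2008 style.
LEGACY_LINK = re.compile(
    r"http://www\.billboard\.com/"
    r"(?:bbcom/esearch/article_display\.jsp\?vnu_content_id=(?P<jsp>\d+)"
    r"|news/[^\s\]|<>]*?-(?P<story>\d+)\.story)"
)


def find_legacy_links(wikitext):
    """Return (url, article_id) pairs, in page order, for every legacy Billboard link."""
    found = []
    for m in LEGACY_LINK.finditer(wikitext):
        found.append((m.group(0), m.group("jsp") or m.group("story")))
    return found
```

The same article can appear in both styles on one page. Returning every match in page order leaves de-duplication to a later caching step.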
If it is the former style, I will synthesize the link http://www.billboard.com/news/dummy-noise-for-url-retrieval-<article id>.story. I will then retrieve that link from Billboard and extract the href of the HTML element <link rel="canonical" href="... link text ... " /> (in this case, <link rel="canonical" href="http://www.billboard.com/articles/news/64057/chart-beat-bonus" />). I will validate that the link appears to point to a news story and, if it does, replace the link.
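The lookup-and-validate step can be sketched as below, assuming Python's standard html.parser is adequate for reading the <link rel="canonical"> element. Several pieces are hypothetical: the helper names, the caller-supplied fetch_html function (which would need to follow Billboard's redirect), and the exact pattern used to decide that a URL "appears to point to a news story".

```python
import re
from html.parser import HTMLParser


def lookup_url(article_id):
    """Synthesize a .story URL for the redirect system; the title text is ignored by Billboard."""
    return "http://www.billboard.com/news/dummy-noise-for-url-retrieval-%s.story" % article_id


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> element in a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # The default handle_startendtag calls this too, so <link ... /> is covered
        if tag == "link" and self.canonical is None:
            attributes = dict(attrs)
            if attributes.get("rel") == "canonical":
                self.canonical = attributes.get("href")


def extract_canonical(html):
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonical


# Assumed shape of a valid target: http://www.billboard.com/articles/news/<new id>/<slug>
NEWS_STORY = re.compile(r"^http://www\.billboard\.com/articles/news/\d+/[^/\s]+$")


def replacement_for(article_id, fetch_html):
    """Return the validated canonical URL for an article ID, or None to skip the replacement."""
    canonical = extract_canonical(fetch_html(lookup_url(article_id)))
    if canonical and NEWS_STORY.match(canonical):
        return canonical
    return None
```

The network fetch stays outside these functions, so the extraction and validation can be checked against saved pages. A missing or non-news canonical link yields None, which means the link is left alone.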
This is closely related to existing Chartbot functions. All of the supporting scripts that created {{BillboardURLbyName}} interacted with the redirection functions at Billboard to determine where things had been placed, so this is just more of the same.
Discussion
Approved for trial (50 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. MBisanz talk 11:52, 19 March 2013 (UTC)
Trial complete. The trial was held on 21 March 2013 between 16:48 and 17:32. All the edits have a summary of "***TRIAL Chartbot function 3: Repair of article links. Contact User talk:Kww if there are problems. Edits being monitored.***"
- This edit contains article links of both kinds described in the original bot request.
- This edit shows a link style I didn't anticipate in the bot request, but which appeared to be a harmless extension. http://www.billboard.com/news/1926761.story doesn't redirect correctly, but after changing it to http://www.billboard.com/news/dummy-text-1926761.story, the redirect works, and the bot was able to locate http://www.billboard.com/articles/news/70145/beyonce-branch-albums-storm-the-chart. Note that the original webcite title parameter was "Beyonce, Branch Albums Storm The Chart", so we can be quite confident in the replacement. I've checked several dozen proposed replacements of this kind, and all appear accurate.
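Handling the unanticipated bare-ID style amounts to making the title segment optional before synthesizing a URL that the redirect accepts. The following is a hedged sketch. The pattern and the function name redirectable_story_url are illustrative, not taken from the bot.

```python
import re

# Matches /news/<id>.story as well as /news/<title>-<id>.story
ANY_STORY = re.compile(r"^http://www\.billboard\.com/news/(?:.*-)?(\d+)\.story$")


def redirectable_story_url(url):
    """Rewrite a legacy .story URL into a form the redirect accepts, or return None."""
    m = ANY_STORY.match(url)
    if not m:
        return None
    # The redirect ignores the title text, so a placeholder title is enough
    return "http://www.billboard.com/news/dummy-text-%s.story" % m.group(1)
```

This sketch normalizes titled links the same way. That is harmless given that Billboard's redirect ignores the title text.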
Looking at the bot log, I can see that it correctly handled articles that cannot be found, refusing to perform the replacements in those cases. For the technically minded, the bot is smart enough to cache replacements: if it finds the same link multiple times in the same run, it stores the canonical version on the first occurrence and doesn't query Billboard again on subsequent hits. This phase will replace 11,679 links in 5,280 articles. Today's trial replaced 352 links in 50 articles.—Kww(talk) 17:47, 21 March 2013 (UTC)
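The per-run cache described above is essentially memoization keyed on article ID. The following is a minimal sketch, assuming a hypothetical resolve function that performs the Billboard query. The discussion does not say whether the real bot also caches failed lookups, so caching None here is an assumption.

```python
def make_cached_resolver(resolve):
    """Wrap a resolver (article ID -> canonical URL or None) so each ID is queried once per run.

    `resolve` is a caller-supplied, hypothetical function that queries Billboard.
    Failed lookups (None) are cached too, so a missing article is not re-queried.
    """
    cache = {}

    def cached(article_id):
        if article_id not in cache:
            cache[article_id] = resolve(article_id)
        return cache[article_id]

    return cached
```

A plain dictionary scoped to one run matches the "first occurrence" behaviour described. A persistent cache would risk serving stale targets if Billboard changed its redirects.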
- Approved. MBisanz talk 20:42, 23 March 2013 (UTC)
- The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.