Wikipedia talk:Link rot/Archive 3
This is an archive of past discussions on Wikipedia:Link rot. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
Internet Archive
The Internet Archive doesn't seem to have archived anything since about August 2008. What does this mean for dead links that should have been archived since then? AnemoneProjectors (talk) 14:36, 23 January 2010 (UTC)
- See Wayback_Machine#Growth_and_storage: "Snapshots become available 6 to 18 months after they are archived." -- Quiddity (talk) 20:55, 23 January 2010 (UTC)
The Wayback Machine and IA
This has been talked about over there and a huge number of websites are now tagging their sites as "do not archive" which the IA respects. Thus it no longer indexes any information and removes the site from being seen, from the time of the first archive. — Preceding unsigned comment added by 99.180.245.237 (talk) 21:20, 16 January 2013 (UTC)
- It is true that almost every news outlet now blocks indexing by Internet Archive, but that is understandable as their sites feature advertising and often a pay-for-archive-view, meaning that free access reduces their potential revenue in a market where the margins are quite thin. Some sites do still have extensive free archives, like the Arizona Daily Star which I included in a table below, which has free archives stretching back to at least 2006. --User:Ceyockey (talk to me) 00:49, 17 January 2013 (UTC)
removing a dead link?
if I find a dead link and I don't feel like fixing it is it cool to remove it, esp if I think the claim it supported was kinda retarded anyway? --n-dimensional §кakkl€ 18:52, 26 January 2010 (UTC)
- uh... not with reasoning like that, no. 'kinda retarded' does not qualify as an objective assessment of the merits of the link, since other editors can easily say 'it aint so retarded' - an equally valid statement without any further evidence. if you're just fixing linkrot, fix the link or flag it for others; if you want to get involved with content editing (to remove 'retarded' content) go ahead and do it explicitly as an edit; don't call it a linkrot fix. --Ludwigs2 20:25, 26 January 2010 (UTC)
- Tag it with a {{deadlink}}.--Blargh29 (talk) 22:17, 26 January 2010 (UTC)
Dead link vs. linkrot
What's the difference, exactly? 85.76.80.10 (talk) 20:56, 30 January 2010 (UTC)
- There isn't one. "Linkrot" is a term used to describe the phenomenon of good links going dead over time. --ThaddeusB (talk) 04:40, 4 February 2010 (UTC)
Linkrot and sustained notability
If an article is at first supported by a series of links to establish notability, and 100% of those links go bad, does that mean that in some cases the subject of the article can be considered not notable and the article be deleted? Sebwite (talk) 16:21, 18 February 2010 (UTC)
- It shouldn't happen. Notability is forever, even if all of the links go bad. That's why it's probably a good idea to use a citation template, so that there is plenty of documentation about the former link. Also, check out WP:OFFLINE, which would apply to dead links. --Blargh29 (talk) 19:16, 18 February 2010 (UTC)
- I have seen articles get put up for deletion on the basis that all the links have gone bad, and the noms use the WP:PROVEIT argument to support their cause, while those who support keeping cannot prove it. Those favoring deletion do not buy the WP:OFFLINE argument in these cases. Sebwite (talk) 15:23, 23 February 2010 (UTC)
- Editors delete content all the time based on dead links. The Orwellian memory hole lives, and it lives here in the Wikipedia. I think it is a huge problem. --Marcwiki9 (talk) 23:18, 4 March 2010 (UTC)
Archiving British web pages
The following Wired article explores some of the problems regarding archiving British web pages: Archiving Britain's web: The legal nightmare explored These problems affect the strategy used here. Squideshi (talk) 19:23, 6 March 2010 (UTC)
- Does it though? A web archiving service acts on the laws of its resident country, not on those of the site it is archiving (as I understand the law). So archive.org and WebCite are fine, and that is their concern anyway; only if "we" (the WMF) were to set up our own archiving server in the UK would "we" be affected (as I understand it). - Jarry1250 [Humorous? Discuss.] 11:22, 7 March 2010 (UTC)
- That is not how I understand it. In fact, the article itself mentions that this is a problem for organizations like the Internet Archive, which hosts the Wayback Machine. It affects us because, as part of our strategy, we specifically recommend using tools, such as the Wayback Machine, which are affected by this law. I'm not asking for a change in the article--I just wanted to make people aware that the Wayback Machine isn't a magic bullet in the effort to help stave off linkrot. Squideshi (talk) 21:28, 8 March 2010 (UTC)
External links are not references
Just to explain my recent changes:
You should (almost) never remove this:
==References== * Long dead reference
You should cheerfully remove this:
==External links== * Calculator that you'd think was cool, except it no longer exists
It is not possible to justify a dead "External link" under the External links guideline. WhatamIdoing (talk) 18:32, 18 March 2010 (UTC)
- You are correct. However, caution should certainly be used since inexperienced users often put stuff they've used a reference under "External links". --ThaddeusB (talk) 20:30, 29 March 2010 (UTC)
- Besides this, it seems to me that an archived copy of an external link may well be a good replacement for the original (as it is for a reference), so linking to such a copy (if available) is preferable to simply removing the link. JudahH (talk) 16:30, 3 January 2012 (UTC)
linkrot vs. stability, e.g. News Corp vs. Fairfax in Australia
I've noticed that links to many articles published by News Corp in Australia are especially susceptible to linkrot, whereas links to articles in the Fairfax papers, The Age and the SMH, are quite solid. If there were enough evidence to support my statement, would WP ever have a guideline such as "Use paper X, Y, Z, if possible, instead of P, Q, as these are less susceptible to linkrot?" cojoco (talk) 20:52, 1 April 2010 (UTC)
- We do advise against using Yahoo news stories (which typically decay within weeks), so it is certainly possible. --ThaddeusB (talk) 02:01, 11 April 2010 (UTC)
- Of course, nobody reads the directions, so I wouldn't get my hopes up, but you're certainly welcome to include the advice. WhatamIdoing (talk) 03:44, 11 April 2010 (UTC)
Archiving every reference?
Is it suggested that we should archive every reference used in our articles? I see there's a WebCiteBOT, but I've never seen it in action, and certainly not on any article I've worked on. I just recently lost a very important reference and I'm still trying to work on finding a fix (contacting the editors, etc.). This was a great lesson to me about link rot, but now I'm wondering if I'm supposed to archive every reference I use? – Kerαunoςcopia◁galaxies 20:48, 9 June 2010 (UTC)
- Quite simply put, WebCite cannot handle the volume that Wikipedia provides; even the small run of 10-50 PDFs a night by Checklinks seems to be contributing to the problem. — Dispenser 22:15, 9 June 2010 (UTC)
- That I suppose would explain the bot, but what about manual submissions to the archive? Should I just archive references as I see fit? WayBack's six-month lag seems to be a bit of a long wait considering some website pages disappear in only a few weeks. – Kerαunoςcopia◁galaxies 22:18, 9 June 2010 (UTC)
- Is this still the case? It could affect issue PYWP-18. — Jeff G. ツ (talk) 04:24, 29 February 2012 (UTC)
Impossible archiving
Some cited sources use various forms of presentation, including streaming audio (sometimes integrated within a written interview), streaming video, and, especially in the case of Billboard's website, flash or some similar method of loading articles. These sites can't be archived at all. Without transcripts published elsewhere, these sites seem to me to be absolutely vulnerable to link rot. – Kerαunoςcopia◁galaxies 19:04, 12 June 2010 (UTC)
dafuq? impossible to archive a .mov file? or a .mp4 file? or a .swf file? this is usually no problem... (although most search engines CHOOSE not to do it, but it's entirely optional.) 88.88.102.239 (talk) 21:13, 2 May 2012 (UTC)
- Most media embeds are not simple .mov/.mp4 references and usually an archival crawler might only be able to grab the .swf that is doing the embedding. It doesn't run the .swf, so it can't grab the video stream itself (which may not even be over HTTP). HTML5 has the potential to improve things somewhat, but progress is slow, and there are still lots of other tricky cases that are difficult to archive. AndyJ 14:54, 2 January 2013 (UTC)
Link Rescue Bots
Two new bots have just been approved to find archives for dead links. User:DASHBot, the first one, is written and operated by User:Tim1357. It has gone through all the featured articles, and has made a large dent in the good articles. However, due to some small technical difficulties, it is down for the moment. User:H3llBot is written and operated by User:H3llkn0wz. It does pretty much the same thing. As the two bots finish up the Featured articles and the Good articles, I think we will do articles by request. Any ideas of which articles we could let the bots run on next? (Categories are good) Tim1357 talk 17:12, 15 June 2010 (UTC)
- I'd say A-Class articles and then all Vital articles that are B-class and below, that is if the bot is able to make that distinction. -- Ϫ 02:25, 21 July 2010 (UTC)
blogs.nzherald.co.nz
URLs on http://blogs.nzherald.co.nz will shortly stop 301-redirecting to URLs on http://www.nzherald.co.nz. Checking my logs, I note that a few articles (such as Gordon Ramsay) have references/links to articles on blogs.nzherald.co.nz. These should be updated as soon as possible. The equivalent articles should still exist but will be harder to find after the redirect is gone. Could somebody please inform a bot operator? I have no idea how many links are in place. - NZH Admin —Preceding unsigned comment added by 203.99.66.20 (talk) 03:46, 10 August 2010 (UTC)
Web Link Checking Bot
Hi, I'm currently running a bot on my server against Wikipedia to check the external links, using pywikipediabot and the included weblinkchecker.py script. What this bot does is scan the contents of articles for external links, and then proceeds to check the links for 404s or timeouts, and creates a datafile of the non-working links. After about one week, the bot will then recheck the links, and report on the talk pages of the articles which links are dead, according to the data that the bot collected. In the report submitted, the bot will automagically suggest a link to archive.org, which if it was caught, should be a valid archived version of the link. The reason for my post here is to request input from the community, per the suggestion of Tim1357 in this thread. I am watching both this page, and the BRFA thread, so commenting at either location is ok, and your input is greatly appreciated. Thanks, Phuzion (talk) 14:34, 17 August 2010 (UTC)
- On dewiki we decided that a delay of at least 4 weeks and 3 tests are required, because many links come back online 2-3 weeks after a change of hosting service. But the script in the repository has some bugs you should care about. You could test the script on this page:
- which report errors on all four links above. Merlissimo 16:13, 17 August 2010 (UTC)
- Thanks for the input. Do you know if there is an updated version of the script that has the bugs fixed? Phuzion (talk) 16:45, 17 August 2010 (UTC)
- What bugs are meant by "the script on repository has some bugs you should care about."? — Jeff G. ツ (talk) 04:57, 17 January 2012 (UTC)
- How can I help? I'm interested in helping with any automated deadlink detection/mitigation. Since archive.org stopped archiving as of late 2008, checking it is necessary, but not sufficient. Automated checking of, and pre-emptive archiving with, Webcitation is needed, IMHO (or other service, especially for pages poorly captured by Webcitation - conditionals, Javascript, AJAX, etc have problems). I'm in favor of an on-demand full-rendered-web-page screengrab service, or an as-rendered-html+CSS-only service if one exists - these seem to be the only way to simultaneously guarantee pixel accuracy and actual content presence. Of course, respecting robots.txt. --Lexein (talk) 01:35, 12 September 2010 (UTC)
- We mostly need people filling out references. Currently Reflinks is probably the best in filling out references, but I haven't updated it with the feedback/learning mechanisms and the WebCite interface is a bit hard to use. You can also use Checklinks to semi-automatically fix links. — Dispenser 22:37, 12 September 2010 (UTC)
- I know and use those tools frequently, but I would certainly participate in revising and betatesting semi-auto tools which help as well. --Lexein (talk) 23:18, 13 September 2010 (UTC)
I have a proposal in for such a bot, and could use some responses at m:Talk:Pywikipediabot/weblinkchecker.py##Questions_from_BRFAs_and_elsewhere_on_English_Wikipedia. — Jeff G. ツ 03:02, 23 March 2011 (UTC)
- My request for responses linked above has moved here. — Jeff G. ツ (talk) 20:57, 21 January 2012 (UTC)
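The recheck policy discussed in this thread (report a link only after repeated failures, with dewiki requiring at least 3 tests spread over at least 4 weeks) can be sketched roughly as follows. This is only an illustration, not the logic of weblinkchecker.py: the `LinkHistory` class and its method names are hypothetical, and the thresholds are simply the dewiki numbers quoted above. The `wayback_listing_url` helper builds the standard Wayback Machine "all snapshots" address that a report could suggest.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class LinkHistory:
    """Hypothetical record of failed checks for one external URL."""
    url: str
    failures: list = field(default_factory=list)

    def record(self, ok: bool, when: datetime) -> None:
        # A successful check clears the history: the link came back,
        # for example after the site changed hosting service.
        if ok:
            self.failures.clear()
        else:
            self.failures.append(when)

    def is_confirmed_dead(self, min_tests: int = 3,
                          min_span: timedelta = timedelta(weeks=4)) -> bool:
        # Require enough consecutive failed tests, spread over a long
        # enough period, before the link is reported on a talk page.
        if len(self.failures) < min_tests:
            return False
        return self.failures[-1] - self.failures[0] >= min_span


def wayback_listing_url(url: str) -> str:
    # Wayback Machine page listing every snapshot held for this URL.
    return "https://web.archive.org/web/*/" + url
```

Clearing the history on any success is a design choice made for this sketch; a real bot might instead keep the failures and weigh them against recent successes.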
Solution against the broken external_links: backup the Internet
Please find the concept description on the Village Pump. JackPotte (talk) 09:53, 3 September 2010 (UTC)
- 2013 update: NSA does this, and email, IMs, texts, VOIP and phone calls now. New 'pedia: NSApedia.gov. Soon all us editors will be out of work. --Lexein (talk) 02:45, 18 August 2013 (UTC)
Marking a dead link within a citation template
How is one to mark a dead link within a citation template, e.g.:
- "Gujrat Police official website, Standard Operating Procedures" (PDF). Retrieved 2009-03-08.
I did a hack by adding |publisher={{Dead link}} into the template, but that may not be the preferred way to do this. __meco (talk) 16:33, 5 September 2010 (UTC)
- It's better not to do so, but rather follow the }} with {{dead link|date=August 2010}}.
- "Gujrat Police official website, Standard Operating Procedures" (PDF). Retrieved 2009-03-08.[dead link]
- Yes, it seems to look odd, but I believe it's best practice for "deadlink" to always appear as the last text on a citation or link line. Of course, make an attempt to repair with Checklinks, too... --Lexein (talk) 18:51, 5 September 2010 (UTC)
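For anyone scripting this, the placement described above (the tag goes directly after the closing }} of the citation template, not inside it) might look like the sketch below. The function name `tag_dead_link` is made up for illustration and is not part of Checklinks or any bot; it only matches plain {{ and }} pairs and does not try to handle triple-brace parameters or nowiki sections.

```python
def tag_dead_link(wikitext: str, url: str, date: str) -> str:
    """Append {{dead link|date=...}} after the template containing url."""
    pos = wikitext.find(url)
    if pos == -1:
        return wikitext  # URL not present, nothing to tag

    stack = []  # start offsets of templates that are currently open
    i = 0
    while i < len(wikitext) - 1:
        pair = wikitext[i:i + 2]
        if pair == "{{":
            stack.append(i)
            i += 2
        elif pair == "}}" and stack:
            start = stack.pop()
            i += 2
            # The first template to close around the URL is the
            # innermost one that encloses it.
            if start < pos < i:
                rest = wikitext[i:]
                if rest.lstrip().lower().startswith("{{dead link"):
                    return wikitext  # already tagged
                tag = "{{dead link|date=" + date + "}}"
                return wikitext[:i] + tag + rest
        else:
            i += 1
    return wikitext  # URL is not inside any template
```

A bare URL outside any template is left untouched here; tagging those would need a separate rule.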
All links eventually go bad
I think that in the fullness of time, on geologic time scales, all links will go bad. This is simply because those who sponsor such web pages will ultimately die off. Web servers will be lost in fires and floods. Wikipedia administration needs to recognize this reality. The future expansion of Virtual Servers with NO PERSISTENT STATES will only make this worse. Please see Amazon Virtual Private Cloud. There are many Wikipedia editors who delete content that has a dead link, and use WP:proveit to make a point. Most editors are too lazy to go to the library to verify older information, and just delete things. It is hard to maintain "presumption of good faith" when undereducated editors are denying a lot of history. Look at this example: Wikipedia:Articles_for_deletion/Event_Driven_Language. We can see that Beeblebrox, by all accounts a good wikipedian, justified a delete because the Library was too far away. Wikipedia should not exist at the convenience of the editors, but should exist in the service of truth. Perhaps there can be some kind of "grandfathering" clause on links. Perhaps, I would suggest, that if a link exists for a long enough period of time, that the standard of proof should shift from the creators/maintainers to those who would delete. In other words, if the link was there for a number of years, and then it rotted, then the link would be "presumed valid" instead of the present case, where it seems to be presumed a fabrication of someone's imagination. This way, the content in Wikipedia could age gracefully, becoming more authoritative as it got older. This feels more proper to me. This would be a good alternative to the present case where good content is deleted willy nilly by those who would deny history, simply because it is hard to verify. — Preceding unsigned comment added by Marcwiki9 (talk • contribs) 03:30, 20 December 2010 (UTC)
- You seem to have declared everyone's opinion on a single incident. The closing administrators should be experienced enough to separate valid reasons from invalid reasons. The content was not lost, it was merged. Verifiability is a principle of Wikipedia, and the reader cannot verify the material if the website rotted years ago. That's why we have this page. Given you posted here, is there an actual change/removal/addition you propose to this guideline? The "more authoritative as it gets older" will in my opinion not pass. — HELLKNOWZ ▎TALK 10:55, 20 December 2010 (UTC)
- I don't mean to impugn everyone. What I am proposing is not a reduction in verifiability. Wikipedia must remain verifiable, of course. But the system we have now is that overzealous and undereducated editors will deny history, simply because the links have rotted. They are too lazy to verify content, so they delete it. They do it because "the library is 250 miles away", and they cannot just pop over there. I am making the suggestion that this is wrong and bad. Wikipedia ought to do something about the very long term problem of rotted links, because all links eventually will rot. WP:linkrot seems to show this as an accelerating problem. As links rot through distant time scales, under the present system, the whole of wikipedia will have to be slowly rewritten. I think this is revisionist history, and it is objectionable to me. It can lead to history being manipulated by those who control search engines. Of course, you all might think I'm wrong. Whatever. I intend it only as food for thought. I am not declaring everyone's opinion on a single incident. I see a pattern here of editors denying history and deleting content, simply because they see the verification as too much work. I see it all the time. It is as if the orwellian memory hole lives. Editors will chuck all content without a valid link, even if the link was good in the past. They do this despite the wikipedia policies expressly forbidding it. --Marcwiki9 (talk) 02:51, 21 December 2010 (UTC)— Preceding unsigned comment added by Marcwiki9 (talk • contribs) 02:40, 21 December 2010 (UTC)
- If their actions are against policy, then their edits should be reverted. If their good-faith edits are against policy or guidelines, then they should be educated. If they remove previously undisputed content because a link is bad, they should be informed not to do this. I don't see what solution you propose for the hyperbolic problems you are describing. Wikipedia has a strong bias towards electronic sourcing, because frankly websites are easy to access without driving 150 miles to the library. As far as actual record of history is concerned, there is much much written material elsewhere that doesn't "linkrot". — HELLKNOWZ ▎TALK 10:25, 21 December 2010 (UTC)
- So, my thoughts are meaningless drivel? To be chucked into the ether? No, the problem is much worse than you are even able to comprehend. Your unshakable defense of the status quo blinds you to even see that there is a problem, much less forge a solution. You admit there is a bias, but yet, fail to point to any solution at all. And when one is put forward as food for thought, not a serious proposal, you dismiss it as hyperbole. And then you make the astonishing claim that Wikipedia doesn't matter, because the "actual record of history" lies elsewhere. I guess that Wikipedia will overcome all of these problems someday. I was just trying to help.--Marcwiki9 (talk) 00:52, 22 July 2011 (UTC)
- It seems you have misinterpreted every sentence I said to the level of personal remarks. Personally, having run a bot that tags and replaces thousands of dead links, I do not see a need to explain my stance or motivation if my replies are misinterpreted anyway. — HELLKNOWZ ▎TALK 07:32, 22 July 2011 (UTC)
Solving link rot problem
We are working to solve the link rot problem here. We would like everybody to voice their concerns here. Thanks - Hydroxonium (H3O+) 14:25, 6 February 2011 (UTC)
Conflict between guidelines
This guideline and WP:DEADREF give conflicting advice about dealing with dead links used to support article content. Please join the conversation at WT:Citing sources#Question_regarding_.22Preventing_and_repairing_dead_links.22. WhatamIdoing (talk) 22:12, 17 February 2011 (UTC)
- The lengthy conversation has closed, and I have updated the advice at WP:DEADREF. If anyone wants to check over this page and improve its contents, please feel free. WhatamIdoing (talk) 19:43, 28 March 2011 (UTC)
Proposal for new WikiProject to repair dead links
Just a notice for anyone who's interested. Wikipedia:WikiProject Council/Proposals/Dead Link Repair. -- Ϫ 06:39, 20 April 2011 (UTC)
A new WebCiteBOT
Hi all. I'm working on a new WebCiteBOT. I have opened a request for approval. It is free software and written in Python. I hope we can work together on this. Archiving regards. emijrp (talk) 17:15, 21 April 2011 (UTC)
RfC to add dead url parameter for citations
A relevant RfC is in progress at Wikipedia:Requests for comment/Dead url parameter for citations. Your comments are welcome, thanks! — HELLKNOWZ ▎TALK 10:49, 21 May 2011 (UTC)
Simple answer
Use more print references...
Obvious really. Wikipedia is a joke if it leans too heavily on the web alone.--MacRusgail (talk) 16:32, 10 August 2011 (UTC)
- If only more people were aware of the fact that references don't have to be online.. we should promote WP:Offline sources more.. -- Ϫ 15:58, 16 August 2011 (UTC)
- But they're so eeeeeeasy! But seriously, in practice, there's a balance to be struck. Some editors such as Cirt have created articles which are fantastically sourced, but completely offline, leaving out all convenience links. I don't know why; it may be due to the research tools he uses, which, though deep, are not at all accessible to non-subscribers. Very annoying.
- Over at WP:AN/I I finally twigged to Bare link rot harms verifiability. Seems I don't care so much if a link rots if it has been properly, verifiably expanded. --Lexein (talk) 17:23, 16 August 2011 (UTC)
Extension:ArchiveLinks
http://www.mediawiki.org/wiki/Extension:ArchiveLinks
Is it possible to ask WMF to enable (maybe also finish) this wonderful extension? Bulwersator (talk) 10:20, 10 January 2012 (UTC)
Incompatibility with Wikipedia:Citing sources#Preventing and repairing dead links (even if that is linked here)
This page (Wikipedia:Link rot) states in its lead section that "These strategies should be implemented in accordance with Wikipedia:Citing sources#Preventing and repairing dead links, which describes the steps to take when a link cannot be repaired."
But how can we act in accordance with Wikipedia:Citing sources#Preventing and repairing dead links if some sentences in this page's lead section (for example "Do not delete factual information solely because the URL to the source does not work any longer. WP:Verifiability does not require that all information be supported by a working link, nor does it require the source to be published online.
Except for URLs in the External links section that have not been used to support any article content, do not delete a URL solely because the URL does not work any longer. Recovery and repair options and tools are available.") and the whole "Keeping dead links" section are incompatible with that page?
Does the explicit instruction to "implement in accordance with Wikipedia:Citing sources#Preventing and repairing dead links" mean that that page is predominant? --79.17.150.185 (talk) 22:25, 8 February 2012 (UTC)
- I'm not seeing a direct incompatibility. (I'd like to wikify your comment to make readability easier, may I?)
- As an encyclopedia, I think part of our mission is preservation. This means not to delete "dead links" just because they're dead. We also should not delete content just because a link goes dead. That's why we archive sources, and why we must, IMHO, always, as fast as possible, expand bare urls in inline citations so that if their links go dead, the title, date, publisher, and author still permit verification of claims.
- As for harmonizing the text of various policy, guidelines, essays, and info pages, that's an important ongoing task. It's good to wait for editing flurries to die down before trying to harmonize text. And thanks for discussing before making radical changes, by the way. --Lexein (talk) 02:24, 17 September 2012 (UTC)
Archive.is
I think we should go slow on advocating http://archive.is. The field is littered with defunct archive sites - just look at this article history. Archive.is looks good, very good in fact, and its performance and coverage of essentially all used sources is very encouraging. But IMHO Wikipedia can't afford to depend on a brand new site which so far, discloses no public information about its funding, affiliation, or future. I have communicated with the owner, and I am confident the owner is acting in good faith, but it's a solo effort. I'd like to see if the site is here in a year. In the meantime, I would like to advocate using WebCite in parallel with Archive.is, meaning at least archiving at WebCitation, if not citing in ref. I hope this is received as a sensible precaution, in the best interest of Wikipedia's future source verifiability. --Lexein (talk) 02:10, 17 September 2012 (UTC)
- I agree that we need to be circumspect. Just before seeing your comment above, I asked at http://blog.archive.is/ask :
- "Who runs this site? If we're going to trust it (see Wikipedia:Link_rot#Repairing_a_dead_link) we need to have good reason to think it's stable/funded/likely to stick around indefinitely. The webcite faq is CC-NC-SA, so consider using it as a starting point for your own faq."
- If we don't hear back soon, we should remove it. If Archive.is triggered a WebCite archive, in addition to its own, then I'd support its continued mention here starting now. Also, the IA now supports on-demand archiving. It just doesn't appear online for months. --Elvey (talk) 17:25, 5 October 2012 (UTC)
- IA certainly supports on-demand archiving. Traditionally, archived pages took many (three to six?) months to appear. In mid-2012, many pages seemed to be returned within about three to five weeks. By early 2013, this seems to have further reduced to about three to seven days, especially when archiving pages from several well-known sites. In more recent times, some archived pages have been returned in around 200 minutes by IA but this very much depends on the site being archived. -- 31.52.117.100 (talk) 20:27, 29 July 2013 (UTC)
- +1 for removing archive.is from the instructions, or at least not promoting it so strongly over sites like archive.org and other institutions that are part of the International Internet Preservation Consortium --Edsu (talk) 16:52, 16 November 2012 (UTC)
- +2 for removing http://archive.is from the instructions, until such time as its reliability and persistence is better demonstrated. Beyond the web archives already mentioned, the List of Web archiving initiatives and Memento Project pages may be other useful resources to point to in the instructions. --nullhandle (talk) 21:46, 16 November 2012 (UTC)
- Sorry, I did not find your message in my inbox or in the Tumblr control panel :( Luckily, I found this conversation by searching for archive.is on twitter. As I found the questions here, I answer here as well.
- About FAQ and more info on the page: a new design is being prepared. It will have more information (both textual and infographic) about how to use the site, how to search for saved pages, etc.
- About funding: it was started as a side project, because I had a computational cluster with huge hard drives and that disk space was not used. It was a kind of experiment, to see if people would need a service like this and to choose a way to develop the service based on how people use it.
- About stability: currently it is hosted on budget hosting providers (ovh.net and hetzner.de) using Hadoop File System. Although the hardware is cheap, all data is duplicated 3 times in 2 datacenters (in Germany and France) and the system is designed to survive hardware faults with minimal downtime.
- Almost all external links of Wikipedia (all Wikipedias, not only English) were archived in May 2012, pursuing two goals: to preserve the pages which may disappear and to stress test and find bugs in my software. If you see that your link has rotted, you can check it on archive.is and change the link to the saved page. If you feel you do not trust archive.is but it is the only site which has preserved your content, you can save archive.is's page on WebCite or another site, thus making more copies and increasing redundancy.
- Vice-versa, you can save WebCite's or IA pages on archive.is to increase redundancy. (IA is not likely to go offline, but the new domain owner may put "Disallow: /" in robots.txt and thus remove the previous domain owner's content from IA, so it may make sense).--Rotlink (talk) 04:25, 18 November 2012 (UTC)
- Also, there are some popular sites IA and WebCite cannot work with. Facebook.com is a big example. --Rotlink (talk) 04:58, 21 November 2012 (UTC)
- I've rewritten the archive.is mention as "under evaluation", and emphasized that it should not be used alone until consensus agrees it is reliable. I did not delete it because we have quite a history of suggesting trying out services without advocating them. Back when IA was broken in 2008-2010, I was desperate, and used anything that seemed like it would work. Many of those services later vanished. But WebCitation, as sketchy and unfunded as it first seemed, has survived, Javascript malscripts be damned. So can we AGF for archive.is as "under evaluation"? --Lexein (talk) 23:09, 16 November 2012 (UTC)
It very much looks like Archive.is keeps only the newest shots when it archives external links automatically. It archives the external links once in a while, discarding the old archived versions. In the end, it's archiving dead links. And that is very bad. I detailed the process at Talk:Archive.is#How does automatic archiving work?. The owner of Archive.is probably doesn't realize that the program deletes old versions. — Ark25 (talk) 00:35, 27 July 2013 (UTC)
- I've written to archive.is both on the Ask Me Anything form and by email, to ask about this behavior. I have not yet checked out old archive.is links I've used to see if this is a global problem. --Lexein (talk) 02:29, 18 August 2013 (UTC)
- I am sorry, my bad. I didn't know that Archive.is is making incremental backups and that it started to create backups on all Wikipedias in May-June 2013 - see Talk:Archive.is#How does automatic archiving work? and User talk:Rotlink#Questions about Archive.is. Sorry for the false alarm! — Ark25 (talk) 22:52, 21 August 2013 (UTC)
- I think Archive.is is very nice, for making automatic backups for all links in Wikipedia. It really deserves to be integrated as a WikiMedia project or at least it deserves to be paid by Wikipedia. It's very important to preserve the archives of the newspapers. — Ark25 (talk) 23:00, 21 August 2013 (UTC)
- The proprietor of Archive.is, User:Rotlink, assures us here and on its FAQ page that it is financially secure. However Webcitation.org has stated on its home page that it will be in financial trouble later this year. This has become a topic of discussion here:meta:WebCite. --Lexein (talk) 18:59, 25 August 2013 (UTC)
Per WP:Archive.is RFC all archive.is links are to be removed from the English Wikipedia. This has been added so that anyone finding this discussion will be aware of both the existence of this RfC and its results. — Makyen (talk) 05:11, 20 March 2014 (UTC)
We need an anti-link-rot bot!!!
What the hell is really going on here, big picture? I just read Wikipedia:Bots/Requests_for_approval/RotlinkBot and it's a long discussion between botmaster HELLKNOWZ and Rotlink. To me, at first, at least, it looked a bit like the botmaster of a competing bot that is INACTIVE hassled Rotlink and drove him out of town in Aug 2013. User:H3llBot has made no edits since November 5, 2013 ; I guess its trial run was not a success? I haven't been able to investigate deeply enough and wonder what the hell is really going on here. Why aren't any of these bots running? Seems like someone is anti-archive. I guess I could go read H3llBot, DASHBot and BlevintronBot and their talk pages, which HELLKNOWZ referred Rotlink to, but perhaps someone who has can summarize? Lexein? Ark25? Rotlink? (I know, you're blocked; we can copy any comments you make on your talk page here.) Hellknowz? Blevintron? I don't give a shit what happened AFTER the bot was Withdrawn by operator. (though I have read the RFC page). I want to understand why that happened in the first place. It seems to me more than a bit odd that despite all the efforts that have been made, no bot is performing this task. Perhaps HELLKNOWZ is merely familiar with big, hidden roadblocks to getting/keeping such a bot running and was just trying to help Rotlink do what HellBot has been unable to do, at least lately. The RFC closer wrote, and I too "urge the folks at Archive.is to come forward, apologize for policy violations carried out thus far, and work within the system." AND, I'd like to have a bot taking corrective, or better yet preventative action to address link rot, so I'd like to see us do what it takes to make that happen. The first step is getting an idea of what went wrong. Did Rotlink just run out of patience? Did some folks tell Rotlink to calm down so they could stop whipping him, as so often happens 'round here? I'm guessing there was lots more discussion beyond what's on the RFBA. 
The RFC closer said, " It seems likely this service could be valuable to the community". I think the enormous potential value of an anti-link-rot bot is certain. So, just to be clear, I'm not looking to assign blame, I'm looking to understand what keeps going wrong, so it can be addressed, as that seems to me to be the first step. Help? --{{U|Elvey}} (t•c) 19:05, 5 June 2014 (UTC)
Off Topic Discussion about what happened AFTER the bot was Withdrawn by operator. I clearly stated that I was ONLY interested in what happened before that.
- AGAIN: I want to understand why the bot was withdrawn by operator in the first place. AGAIN: I don't give a shit what happened AFTER the bot was Withdrawn by operator. The RFC happened AFTER the bot was Withdrawn by operator. I don't recall seeing anything there that discussed why the bot was withdrawn by operator in the first place. If I should take another look, what should I look at, Makyen?--{{U|Elvey}} (t•c) 05:41, 10 June 2014 (UTC)
- I've got some ideas for a bot and I've been thinking about proposing one for a while. I guess the place to start would be to raise an RFC and check that there is consensus for a bot to do this (or look for a past discussion on the subject)? I'm also somewhat daunted by the process and feel that I'm too inexperienced with the RFC / BRFA processes, and with Wikipedia in general - not to mention lacking the time to formulate the RFC...--Otus scops (talk) 22:05, 5 June 2014 (UTC)
- I see a separate RFC as inappropriate. A BRFA is the process for getting a bot approved. If it's approved, the bot should be able to do the thing it was approved to do. Someone could add an RFC tag to the BRFA if they felt wider notice was needed. That's as simple as adding {{rfc|proj}} to the BRFA. That makes the BRFA an RFC too. Presto! --{{U|Elvey}} (t•c) 05:38, 10 June 2014 (UTC)
- The only information I have seen as to why the bot was withdrawn by the operator is the comment rotlink left in response to that question on his talk page: "I need time to rewrite the description, it is too unclear."
- Note that he withdrew the bot close to 24 hours after making posts to Wikipedia:Bots/Requests for approval/RotlinkBot which appeared to indicate that he was proceeding with development. His answer (linked above) to the why? question was provided 9–15 hours later (multiple edits to his response). — Makyen (talk) 07:20, 10 June 2014 (UTC)
- Maybe we should go back further and see what happened before he filed his bot request.
- RotlinkBot edited prior to approval, and prior to even requesting approval
- The "bot" was editing in good faith, as its operator mistakenly believed that as a supervised, but unautomated script, it could be run without prior approval
- But other editors were reporting problems with these edits, so evidently the supervision was lacking
- Only after it was blocked, was a bot request filed.
- "You would be surprised at the amount of tiny problems that all need to be fixed (and why previous bots aren't even running)."
- I do not in any way see Hellknowz hassling Rotlink. I see a civil conversation where legitimate concerns are being raised. Reading between the lines and speculating a bit, from the later email conversation that you said was off-topic, I think that Rotlink, who was just doing this as a hobby or something like that (I'm assuming good faith and that he was not setting up some nefarious moneymaking enterprise), just misjudged the amount of work it would take to bring up a robust, acceptable bot (as apparently had several others before him) and decided that, as a volunteer, it was just not worth the time and trouble, so he just abandoned the project. Bots are frequently withdrawn after their operators realize that they tried to bite off more than they could chew. Unfortunately the fact that Rotlink was operating an archive site himself clouds the situation, as that gives rise to suspicions that he could favor linking to his own archive site.
- Again, I repeat my assertion that I don't think that a bot is the way to go, as I don't think that restoration of archived links should be automated, due to copyright, BLP, "right to be forgotten", etc. issues. A human needs to check for those issues. But as Rotlink has said, saving copies of linked references is relatively easy, heck hard drive prices are down to where most anyone could afford to build the server for archive storage. It's the legal issues that are a bigger concern than the technical issues. I think the Foundation should just automatically save copies of all external URL-linked pages whenever a new link is added. These archived copies could then be later restored by an administrator, after verifying that there were no issues with that. Is this a viable model for dealing with link rot? Wbm1058 (talk) 16:55, 10 June 2014 (UTC)
- A big thanks for all that. I think your speculation - that misjudgement, by several who tried, of the amount of work it would take to bring up a robust, acceptable bot is the main issue - may well be spot-on, with respect to what the hell is really going on here, big picture. If Rotlink or anyone else wants to archive every URL added to a MediaWiki as it's added, by monitoring recent-changes-type feeds, there's nothing anyone can do that would stop him. We could stop him from adding links to them to articles, yes. But I see that, and any other restriction based on some sort of "right to be forgotten" theory, as an attempt to close the barn door after the horse has already run out. Very similarly, http://en.wikichecker.com/ can't be stopped (or forced to honor opt-in). --{{U|Elvey}} (t•c) 20:52, 10 June 2014 (UTC)
Newspaper websites which have undergone link format changes
Might be good to record a list of newspaper websites which have changed their article link format. This would help systematic, albeit manual, review of citations based on said newspapers. For instance:
periodical | old format | new format | change date |
---|---|---|---|
Arizona Daily Star | azstarnet.com/{section}/{article id} | azstarnet.com/{section}/{abbreviated title}/article_{identifier}.html | sometime after 2010 |
Just a thought. --User:Ceyockey (talk to me) 11:47, 16 January 2013 (UTC)
- Yes, I find it a very good idea. I am doing that extensively on my native language Wikipedia (Romanian). I am putting such information on the talk pages of every newspaper's article. Check for example:
- ro:Discuție:Jurnalul Național#Reindexări articole
- ro:Discuție:Evenimentul#Reindexare articole
- ro:Discuție:Ziarul Lumina#Link rot
- ro:Discuție:Adevărul#Reindexări articole
- ro:Discuție:Evenimentul zilei#Reindexări de articole
For example the website of Adevărul changed link formatting 2 times:
==>
- http://www.adevarul.ro/actualitate/Cosmonautul-Prunariu-trimis-rezerva_0_43197528.html Cosmonautul Prunariu, trimis în rezervă, 8 februarie 2007, Adevărul
==>
- http://adevarul.ro/news/societate/cosmonautul-prunariu-trimis-rezerva-1_50ac14727c42d5a663849b01/index.html Cosmonautul Prunariu, trimis în rezervă, 8 februarie 2007, Adevărul (2012)
It's also an interesting exercise. I learned that most sites changed their formatting like this:
- http://www.evz.ro/detalii/stiri/averile-celor-mai-bogati-1000-de-britanici-s-au-redus-la-jumatate-833971.html (having the article index at the end - the change happened in 2010 or so)
In this case, first it was like:
and then:
It's interesting that you can access the article like this:
Print version:
Mobile version:
- http://m.evz.ro/news/833971 - doesn't keep them for long it seems. A link that works: http://m.evz.ro/news/1049504
PDF version:
It's useful to know all those things. It's a little bit of reverse engineering. But it helps those who try to repair broken links. Such knowledge even helped me to repair broken links with a robot. Links like
Transformed into:
Here is the robot at work: [1] — Ark25 (talk) 00:18, 27 July 2013 (UTC)
Yahoo news - when they disappear, do they disappear altogether?
I found recently that at least some content at Yahoo News has been captured at archive.org (internet archive). The item which caught my attention ... http://web.archive.org/web/20090502072711/http://news.yahoo.com/s/ap/20090427/ap_on_re_mi_ea/ml_odd_israel_kosher_flu . It might be that after a certain date, Yahoo! blocked the archiving and they've not bothered to reach back and request removal of older content?? --User:Ceyockey (talk to me) 15:09, 26 January 2013 (UTC)
Mementos - cross-archive searching
Hi all,
A colleague of mine has just alerted me to the Mementos interface - it's hosted by the UK Web Archive, but searches across a range of archive sites. Here's an example of a search run for news.bbc.co.uk; as you can see, it picks up a couple of smaller repositories, such as the LoC, as well as the usual suspects.
Any objections to my pointing to this as a resource in the "Web archive services" section? Andrew Gray (talk) 16:15, 29 January 2013 (UTC)
- Hi, I work for the UK Web Archive and created that interface. We have no problem with it being publicised more widely. AndyJ 13:08, 19 February 2014 (UTC)
- Cool! I've added it, but to the (Internet archives section of the) Repairing a dead link section, as repair is what the tool is most useful for.--Elvey (talk) 18:06, 19 March 2014 (UTC)
- I have tried using the interface to which we are directed. I found it to have a significant issue. On the first page I tried, it removed all parameters passed with the URL. I was trying to find archives of a dead reference link I stumbled upon. Specifically the link was:
http://www.autonews.com/apps/pbcs.dll/article?AID=/20011203/ANE/112030837
- When this URL is entered, the interface strips it to:
http://www.autonews.com/apps/pbcs.dll/article
- This, obviously, produces erroneous results. As such, the interface was completely unusable for this particular search.
- While the interface will work for many URLs, it appears that it is useless for any that require parameters. I am going to add some explanatory text to the project page and move it from the top of the section. While it is useful, this issue significantly limits its usability. I also found that it was not immediately obvious that the interface had effectively corrupted the URL for which I was searching. Thus, I expect that users will find it confusing when they encounter this issue. — Makyen (talk) 05:36, 20 March 2014 (UTC)
- You could simply encode it http://www.webarchive.org.uk/mementos/search/http%3A%2F%2Fwww.autonews.com%2Fapps%2Fpbcs.dll%2Farticle%3FAID%3D%2F20011203%2FANE%2F112030837. Or even just the parameter delimiters http://www.webarchive.org.uk/mementos/search/http://www.autonews.com/apps/pbcs.dll/article%3FAID=/20011203/ANE/112030837 — HELLKNOWZ ▎TALK 10:56, 20 March 2014 (UTC)
- Thank you for the information. I came to the same conclusion after experimenting a bit. However, my main point was that we should not have Mementos as the first thing mentioned in the section on archiving unless it is working perfectly (e.g. without the need for users to hand-edit the URL for which they are searching). I believe the text now reflects the need for encoding the query string separator (i.e. the "?"). This appears to get people 90% of the way to usability without having to go into great detail about how to encode a URL. The bookmarklet continues to function as intended. In the example I used above the bookmarklet was not very useful due to the site now in control of the domain redirecting the page. — Makyen (talk) 12:40, 20 March 2014 (UTC)
- Good catch and good workarounds. @AndrewNJackson: Hope the memento folks make use of an appropriate urlencode/urldecode library soon. Kudos all 'round. --Elvey (talk) 15:47, 2 April 2014 (UTC)
- Yes, thank you, we have resolved the underlying issue and will endeavour to get this deployed across the relevant web archives ASAP. AndyJ 15:51, 2 April 2014 (UTC)
- Currently, it looks like it would be a good idea for us to also state that Mementos should not be the only site checked, as Mementos can sometimes return no results when archives exist at sites which it normally includes. An example of this is trying to find archives of Battle of the Atlantic. As of April 2014, Archive.org reports it has 63 or 64 archives (https, http). Mementos reports 0 archives (https, http). Mementos usually finds archives at Archive.org, but clearly that is sometimes not the case. We should recommend that editors also do their own searches in addition to searching on Mementos.
- AndrewNJackson, given your response above, my hope is that this also gets fixed.
- Note: I said 63 or 64 archives reported by Archive.org because the number it reports changes upon different refreshes of the results pages. — Makyen (talk) 20:50, 2 April 2014 (UTC)
- Hmm, the central Memento service my UI depends upon treats Wikipedia as special, and attempts to redirect to the different versions of content held on Wikipedia itself, rather than web archives (and my UI does not cope well with this). I'll follow up with the Memento developers as to their intentions. AndyJ 21:20, 2 April 2014 (UTC)
- Hmm, http://www.webarchive.org.uk/mementos/search is down yet again. Been down all day, but displaying "Sorry, but there was an unexpected error that will prevent the Memento from being displayed. Try again in 5 minutes."--{{U|Elvey}} (t•c) 06:33, 6 June 2014 (UTC)
UK Web Archive is just one of dozens of participants in the Memento project. The master or default Memento site is timetravel.mementoweb.org which has a search interface. As noted, the memento data on archive availability is spotty - still worth checking if you can't find an archive at Wayback/WebCite/Archive.is -- GreenC 16:17, 13 December 2016 (UTC)
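The two encoding workarounds suggested by HELLKNOWZ above can be sketched in a few lines of Python. This is only an illustration, not part of the Mementos service: the function name `mementos_search_url` and its `encode_all` flag are made up for this example, and the base URL is simply the one quoted in the discussion.

```python
from urllib.parse import quote

# Base of the Mementos search interface, as quoted in the discussion above.
MEMENTOS_SEARCH = "http://www.webarchive.org.uk/mementos/search/"


def mementos_search_url(target_url, encode_all=True):
    """Build a Mementos search URL that keeps the target's query string.

    Hypothetical helper for illustration only.
    """
    if encode_all:
        # Percent-encode every reserved character (":", "/", "?", "=" ...).
        return MEMENTOS_SEARCH + quote(target_url, safe="")
    # Minimal variant: encode only the "?" that starts the query string.
    return MEMENTOS_SEARCH + target_url.replace("?", "%3F")


dead_link = "http://www.autonews.com/apps/pbcs.dll/article?AID=/20011203/ANE/112030837"
print(mementos_search_url(dead_link))
print(mementos_search_url(dead_link, encode_all=False))
```

Either form keeps the `AID=...` parameter attached to the URL being searched for, which is the part the interface was stripping.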
Citations on Wikipedia and discussion at meta:WebCite
There is a discussion at meta:WebCite regarding citations on Wikipedia that would be of interest to those that watchlist this page. Wikipedia currently has 182,368 links to this archive site. Regards. 64.40.54.47 (talk) 11:41, 11 February 2013 (UTC)
suggest removing: Web archiving is especially important when citing web pages that are unstable or prone to changes, like time sensitive news articles or pages hosted by financially distressed organizations.
I believe we should remove "Web archiving is especially important when citing web pages that are unstable or prone to changes, like time sensitive news articles or pages hosted by financially distressed organizations." If a news article moves, you can just update the link to point to its new location. Most don't change themselves though. If ever the link stops working, then you can add an archive to replace it. Over at Talk:Garrett_(character), an editor is quoting that sentence as a reason to include archive links all over an article, when it's not needed since links to the content still work fine on their own. Dream Focus 22:34, 12 March 2013 (UTC)
- Late reply: I don't agree with removing this sentence. All sources are ephemeral. Some are more ephemeral than others, like news such as AP(with contractual expiration times), UPI, NYT, Google News(15-30 days), and anything Google caches(15-120 days). Many Archive links in an article "all over" aren't a problem, since they're just an incremental burden when using templates, and can be filled in automatically by some tools, like reflinks. Eventually, all links rot. This is invariant. Archive.org and Webcitation.org don't/can't archive all websites due to robots.txt. I've even seen whole sites which used to be archiveable by them completely disappear behind a domain owner's new robots.txt (archive.org respects current robots.txt, not past ones). My argument is that we should archive early, defensively, redundantly(multiple archives), and often, to avoid being caught flatfooted by such blackouts. I use the webcitation bookmarklet like a Tourette syndrome twitch. --Lexein (talk) 00:55, 30 July 2013 (UTC)
Pay Wall
The "Web archive services" section implies that the use of a web archiving service is useful in cases where material is moved behind a paywall. This position is troubling. Do we really mean this, and if so, how do we justify it?--SPhilbrick(Talk) 23:45, 19 March 2013 (UTC)
- Not likely to get a response over here. That's why I asked that over at User_talk:Jimbo_Wales#violating_copyright_laws_by_linking_to_archived_sites_when_original_site_is_still_live. Also have the discussion still going on at Talk:Garrett (character). Dream Focus 23:53, 19 March 2013 (UTC)
Should the original url= be required when using archiveurl=
People here may be interested in commenting on the issue described at:
Wikipedia:Village pump (policy)/Archive_105#Citations: Should the original url.3D be required when using archiveurl.3D. Dragons flight (talk) 18:47, 8 April 2013 (UTC)
Link Rotting Across the Universe
Tmol42 and I have been discussing link archiving, and I'd just like some clarification on a matter. Sorry, I know you've probably had this so many times in so many forms, but I'd be grateful if you'd humour me! I found a dead link to a PDF file at Parish councils in England, and found a backup at the Wayback Machine. I added the archiveurl= and archivedate= parameters, and took the information from Wayback. My revision can be found here. The other user then changed this, removing the archive parameters, and setting the URL to the Wayback archive. His change can be seen here. Which, if either, is correct? drewmunn talk 16:22, 24 May 2013 (UTC)
- Yours is correct. We don't link directly to archives, for many reasons. We even have bots to correct this. — HELLKNOWZ ▎TALK 20:24, 24 May 2013 (UTC)
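As a rough illustration of the correction described in this thread, the sketch below splits a direct Wayback Machine link into the three values a citation needs: the original url=, the archiveurl= and the archivedate=. It is not the code any existing bot uses. The function name `split_wayback_url` is hypothetical, and the pattern assumes the snapshot form seen elsewhere on this page (`web.archive.org/web/<14-digit timestamp>/<original URL>`); other Wayback URL variants are not handled.

```python
import re

# Assumed snapshot form: http://web.archive.org/web/<14-digit timestamp>/<original URL>
WAYBACK_RE = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/(.+)$")


def split_wayback_url(snapshot_url):
    """Return citation parameters for a Wayback snapshot URL, or None if it does not match.

    Hypothetical helper for illustration only.
    """
    match = WAYBACK_RE.match(snapshot_url)
    if match is None:
        return None
    timestamp, original = match.groups()
    return {
        "url": original,              # the original (possibly dead) link
        "archiveurl": snapshot_url,   # the full snapshot link
        # The timestamp starts with YYYYMMDD; format it as an ISO date.
        "archivedate": f"{timestamp[0:4]}-{timestamp[4:6]}-{timestamp[6:8]}",
    }


snapshot = ("http://web.archive.org/web/20090502072711/"
            "http://news.yahoo.com/s/ap/20090427/ap_on_re_mi_ea/ml_odd_israel_kosher_flu")
for name, value in split_wayback_url(snapshot).items():
    print(f"|{name}={value}")
```

A human should still check that the snapshot actually shows the cited content before saving the change.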
Better archiving of external links
On Romanian Wikipedia we were trying to use a WikiWix gadget. Each external link is accompanied by its WikiWix cache link. In order to archive the external link, you just have to click on its corresponding WikiWix link. The first time you click it, it will archive your external link/reference. The next time you click on it, you will get the archived page. This is a much better way to archive web pages than submitting a link on WebCite or even than using a bookmarklet. It works very fast: if you want to archive 20 references, you just have to open them all in tabs. However, WikiWix has some issues: it seems to have some daily, weekly or monthly quota - you can't archive more than, say, 100 links per week - which makes it quite unusable.
You can see the cached links on WikiWix using a gadget that you can activate in your preferences.
The best solution would be something like the WikiWix gadget because you don't have to bother to present the archived link, the gadget will show it automatically. And it's very easy to archive it by just clicking on the archive for the first time. However, we need a better solution: A robot to cache all the external links in Wikipedia automatically (I just noticed in the discussion above that Archive.is did just that). And without such quotas like WikiWix, of course.
One solution would be to create a gadget for WebCite, to show the archived (cached) links near each external link. Together with a robot to take care of archiving all external links.
Another solution would be to create a gadget for Archive.is, to show the cached links, since its owner claims it cached all the external links last year. And, if possible, to arrange with him to archive the new external links each month or so.
For those who are not clear how WikiWix works, check this page: ro:Șantierul Naval Constanța. In the "Note" section, you will see the following link:
- Bosanceanu si-a dublat afacerile la Santierul Naval Constanta, 31.08.2009, zf.ro, accesat la 11 februarie 2010
In order to see the WikiWix cached links, you have to activate the WikiWix gadget: http://ro.wiki.x.io/wiki/Special:Preferences#mw-prefsection-gadgets - the last checkbox: Versiunea arhivată pentru legăturile externe
Now, near each link you can see a small yellow image (like 10x10 pixels). In this case, right before the date (31.08.2009), the WikiWix archive link: http://archive.wikiwix.com/cache/?url=http://www.zf.ro/burse-fonduri-mutuale/bosanceanu-si-a-dublat-afacerile-la-santierul-naval-constanta-4823139/&title=Bosanceanu%20si-a%20dublat%20afacerile%20la%20Santierul%20Naval%20Constanta . — Ark25 (talk) 19:05, 26 July 2013 (UTC)
Memento for Wayback Machine links
I just discovered Memento, a protocol that has been proposed in the past as a way for MediaWiki to provide easier access to historical revisions of pages; and that has a draft extension which could implement such a thing. AFAIK it isn't implemented on any wikis yet. But the idea is a neat one, and according to the bug request asking for the extension to be added to mediawiki core, Archive.org now recognizes memento URLs for referencing cached pages in the wayback machine. So this seems like a good time to revisit setting up a linkbot that caches links with them.
I sent an email to Alexis @ IA and Kevin who wrote the unfinished ArchiveLinks extension to see if any recent progress had been made. If so, it would be nice to have a guideline for including a memento timestamp in links to the archive.org cache. – SJ + 22:41, 15 August 2013 (UTC)
I created the document Memento Capabilities for Wikipedia that describes areas in which the Memento protocol could be leveraged to add end-user value to Wikipedia. One of the described capabilities relates to the link rot problem. I will propose a Wikiproject that uses the document as its starting point. Hvdsomp (talk) 21:57, 19 September 2013 (UTC)
Again about the Gadget
I have modified the Gadget I was talking about in the above section. Now, every external link on Romanian Wikipedia has another link near to it that takes you to the Archive.org version of the page on that link. The gadget can be modified to take you to the Archive.is version of the page, or to show both archives - ro:MediaWiki:Cache.js (function addcache). I think English Wikipedia should have such a thing too. This is the most convenient solution, since Archive.is is archiving all the external links in Wikipedia pages, and Archive.org is archiving almost all the newspaper sites. It's far more efficient than having to archive pages manually or than searching for archives manually. — Ark25 (talk) 05:08, 27 August 2013 (UTC)
Wikipedia:Archive.is RFC republicizing
Recent events related to archive.is have left Wikipedia links to archive.is in a state that requires a community decision.
This constitutes broader publicizing of the above Request for Comment as suggested/requested on October 3. This RFC was started September 20, 2013. --Lexein (talk) 08:05, 7 October 2013 (UTC)
Wayback API
I just learned of this: mw:Archived Pages: "The Internet Archive wants to help fix broken outlinks on Wikipedia and make citations more reliable. Are there members of the community who can help build tools to get archived pages in appropriate places? If you would like to help, please discuss, annotate this page, and/or email alexis archive.org." - leaving the link here in case anyone else is interested or can help. –Quiddity (talk) 16:38, 31 October 2013 (UTC)
Are there any functioning linkfix bots?
http://www.language-museum.com/ used to be
- a linguistic website which offers the samples of 2000 languages in the world. Every sample includes 4 parts: (1) a sample image, (2) an English translation, (3) the speaking countries and populations, (4) the language's family and branch. … constructed and maintained by Zhang Hong, an internet consultant and amateur linguist in Beijing China.
The domain's top level is now used by a language teaching company, LM Languages. But by using the Wayback Machine, I found that the content is still up there, at http://www.language-museum.com/encyclopedia/, though I don't know if Zhang Hong or anyone is still maintaining it.
I discovered this while following a link from Toba Batak language, pointing to
That content is now at
I have updated the links in Toba Batak language. But the "museum" claims to have examples from 2000 languages, and there may be many more links in Wikipedia. So the thing to do, as I see it, is to update all links of the form http://www.language-museum.com/LETTER/LANGUAGE.php by inserting encyclopedia/ after .com/. Or, even simpler, change http://www.language-museum.com/ to http://www.language-museum.com/encyclopedia/ unless "encyclopedia" is already in the URL.
Following the links in §Bots, I posted the request at User:RileyBot/Requests. Then I saw that the user is "semiretired: no longer very active on Wikipedia as of 29 April 2013", so I tried MerlLinkBot, but that one doesn't seem to be much better. Is there ANY BOT that can do a clearly defined search-and-replace?
If you have any questions, please {{ping}} me so I don't have to add this page to my watchlist. Thanks. --Thnidu (talk) 19:30, 11 January 2014 (UTC)
- I took care of it. --Ysangkok (talk) 16:53, 4 July 2015 (UTC)
- @Ysangkok: Thanks. --Thnidu (talk) 16:21, 5 July 2015 (UTC)
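The search-and-replace requested in this thread can be sketched as a small Python function. This is an illustration only, not the edit that was actually run. The function name is hypothetical, and "already in the URL" is read here as "already starts with the encyclopedia/ prefix", which is slightly stricter than the wording of the request.

```python
# Hypothetical sketch of the requested rewrite for language-museum.com links.
OLD_PREFIX = "http://www.language-museum.com/"
NEW_PREFIX = "http://www.language-museum.com/encyclopedia/"


def rewrite_language_museum_url(url: str) -> str:
    """Insert 'encyclopedia/' after '.com/' unless it is already there."""
    if not url.startswith(OLD_PREFIX):
        return url  # a different site: leave untouched
    if url.startswith(NEW_PREFIX):
        return url  # already points at the moved content
    return NEW_PREFIX + url[len(OLD_PREFIX):]


if __name__ == "__main__":
    print(rewrite_language_museum_url(
        "http://www.language-museum.com/LETTER/LANGUAGE.php"))
```

Applying the function a second time to its own output changes nothing, so a bot run could be repeated safely.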
Archiving Archive.is snapshots with WebCite/Wayback
Since Archive.is may be on its way to being blacklisted soon, per the RfC, would it be possible to archive any Archive.is snapshots whose original URL is dead and has not been archived through WebCite/Wayback, in order to circumvent the blacklist? — Whisternefet (t · c) 00:08, 18 January 2014 (UTC)
- You can circumvent now by using MementoWeb. --Ysangkok (talk) 16:53, 4 July 2015 (UTC)
any way to search for dead links within a category
When I look at the list of dead links, it's huge. Is there a way to search it to find articles within particular categories with dead links? I guess I am more motivated to maintain articles within categories of interest to me than articles in general. Kerry (talk) 00:38, 21 January 2014 (UTC)
- Many WikiProjects have their own to-do listings, which organize maintenance tasks by subject. Check out Category:To-do list templates for WikiProjects, and also the Cleanup Listing tool, which will allow you to search maintenance tasks by category. Hope that helps! - Ϫ 11:56, 19 February 2014 (UTC)
- Thanks, this is exactly what I was looking for! Kerry (talk) 12:12, 19 February 2014 (UTC)
Repair vs. archive
I've been doing some link fixing, mostly to news articles – hopefully some of it even correctly. If I come across a dead link and can find both a replacement live link to the same article and an archive copy of it based on the original location, should I prefer one of the following, or does policy / established consensus let me pick whichever I prefer?
- Leave the dead url, and add the archiveurl and archivedate
- Update url to the working one
I tend to use option 2, always archive the new page and usually add the archiveurl, archivedate and deadurl=no, but I guess those are all optional.
If I do option 2, should I update the accessdate to today or leave it set to the original date? —Otus scops (talk) 17:53, 24 April 2014 (UTC)
- Personally, I'd go with option 1, because it's much less work for me, and is almost as valuable. However, I think option 2 is better, and say go for it; you could add "(updated)" after the updated URL to help anyone who was confused and failed to look at the edit history. --{{U|Elvey}} (t•c) 19:19, 5 June 2014 (UTC)
- I usually consider the |accessdate= as the date on which the article text was checked against the reference to verify that what the article says is supported by the reference. If you are not re-verifying the article text, I would add the archiveurl, but not the new URL. I would probably put the new, updated URL in as a wiki-comment to aid the next person who comes through. The purpose of any reference is to support the article text. For the next person who comes through to verify, the best situation is to be able to see the reference as close as possible to how it existed when last verified. If you are verifying the article text against the reference, I would do option 2, including the preemptive archiving you mention. Given that creating the preemptive archive is a one-click action, I might initiate archiving at both archive.org and WebCite. Obviously, you only include one archive in the citation, but having the second exist doesn't hurt and may be desirable at some point in the future. — Makyen (talk) 20:06, 5 June 2014 (UTC)
- @Elvey and Makyen: Thank you both. I'll tend towards 1 when I don't reverify and 2 when I do, then. I suppose that I should probably have another go at getting the WebCite bookmarklet to work properly...—Otus scops (talk) 21:49, 5 June 2014 (UTC)
- When I fix a URL matter for a reference, I usually choose option 1. Sometimes there is no archive URL, and it's simply a matter of locating the updated URL (because the URL was moved instead of archived). Flyer22 (talk) 21:58, 5 June 2014 (UTC)
Memento MediaWiki extension
Does wikipedia.org have it installed? https://www.mediawiki.org/wiki/Extension:Memento — Preceding unsigned comment added by 89.47.81.188 (talk) 17:08, 25 June 2014 (UTC)
All those references are dead. --93.216.69.148 (talk) 15:02, 10 July 2014 (UTC)
Wikipedia:Archive.is RFC 3 republicizing
This constitutes broader publicizing of the above Request for Comment.--{{U|Elvey}} (t•c) 19:59, 11 September 2014 (UTC)
sportsillustrated.cnn.com moved to www.si.com
- Old link: http://sportsillustrated.cnn.com/vault/article/magazine/MAG1152491/4/index.htm
- New link: http://www.si.com/vault/1996/12/16/220460/someone-to-lean-on-in-upstate-south-carolina-a-man-who-was-left-out-on-the-margins-has-found-a-family-and-a-purpose-in-a-high-school-football-team
Enwiki has about 16,600 external links to this domain, and dewiki has about 500. Is there somebody who is able to make a list mapping old links to new links? Boshomi (talk) 09:31, 26 October 2014 (UTC)
Boshomi, how would one know what the new link is? There's nothing in the old links to provide guidance. Probably the thing is to link to Internet Archive ie. https://web.archive.org/web/20130125052017/http://sportsillustrated.cnn.com/vault/article/magazine/MAG1152491/4/index.htm -- GreenC 18:16, 21 June 2015 (UTC)
- Meanwhile I fixed all 500 links in dewiki [LinkSearch ns=0 sportsillustrated.cnn.com]. I found the articles at http://www.si.com/, though sometimes I needed archive.org and archive.is. The archives can help to find the article at si.com. Boshomi (talk) 20:17, 22 June 2015 (UTC)
- Boshomi: Nice work. How do the archives help finding the si.com link? -- GreenC 21:34, 24 June 2015 (UTC)
- GreenC: Both web.archive.org/web/ and archive.is/ are very useful.
- The first step was to search the web archives. I have written software for creating lists like de:Wikipedia:WikiProjekt_Weblinkwartung/Toter_Link/Liste_afp_google (a list for afp.google.com that I created today).
- The second step was to google the archived title, like »"Sports Title" site:si.com«, sometimes with the date.
- If nothing was found at si.com, I used web.archive.org/web/ or archive.is/ (web.archive.org/web/ is preferred; we also use archive.is, but only with the full URL, no short links, and we monitor the use of this web archive).
- If nothing was found either at si.com or in the archives, I searched other media. Boshomi (talk) 19:40, 25 June 2015 (UTC)
- I see. Good idea to use Google to site:si.com based on title. -- GreenC 16:42, 26 June 2015 (UTC)
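Two pieces of the workflow described in this thread can be sketched in Python: the title search restricted to si.com, and the timestamped Wayback address used as a fallback. This is an illustration, not Boshomi's software. The function names are hypothetical, and the Google search address `https://www.google.com/search?q=` is an assumption. The Wayback form follows the example link quoted earlier in the thread.

```python
from urllib.parse import quote_plus


def site_search_query(title: str, site: str = "si.com") -> str:
    """Build a query like '"Sports Title" site:si.com' from an archived title."""
    return f'"{title}" site:{site}'


def google_search_url(query: str) -> str:
    """Assumed search address; the query is percent-encoded for use in a URL."""
    return "https://www.google.com/search?q=" + quote_plus(query)


def wayback_url(timestamp: str, original_url: str) -> str:
    """Fallback: address of a Wayback snapshot taken at `timestamp` (YYYYMMDDhhmmss)."""
    return f"https://web.archive.org/web/{timestamp}/{original_url}"


if __name__ == "__main__":
    query = site_search_query("Sports Title")
    print(query)
    print(google_search_url(query))
    print(wayback_url(
        "20130125052017",
        "http://sportsillustrated.cnn.com/vault/article/magazine/MAG1152491/4/index.htm",
    ))
```

The title still has to be read from the archived copy by hand or by a separate scraper, which is why the archives help even when the final link points to si.com.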
Links that have now become spam
On page "Rae Baker" there is a link to "burnettgrangercrowther.co.uk" which I assume once was the actress's agent's page but now appears to be controlled by a blatant spammer. I have marked the link as [dead link] but wonder whether the link should be removed altogether. Can anyone advise? — Preceding unsigned comment added by Toffeepot (talk • contribs) 16:20, 25 November 2014
- In general, I would remove such links. If the link was something like the website in the infobox or an external link, removal should occur. If it is to verify some fact, the situation is more murky because in principle the dead link could be reactivated, or may have moved to some other website. If the verified text is uncontroversial, it might be ok to remove the ref and hope someone will fix it eventually. We are supposed to search for the information and try to find if it is verified anywhere else as at WP:LINKROT. In this article, there is a problem because the verified assertion is rather weird. I would be inclined to search the article history to see how it was added, and remove most of the sentence if the editor's intention was unclear. Johnuniq (talk) 22:59, 25 November 2014 (UTC)
Found a Great Tool (Save Page to Wayback Machine Bookmarklet)
This tool makes saving pages much faster and easier. All you have to do is click on the big blue "save to wayback machine" banner and drag it to your bookmarks bar (it must be visible; in Chrome it sits under the search bar). Now when you click on the bookmark, it automatically saves the current page to the Wayback Machine, instead of you having to copy and paste single links manually into a bunch of Wayback Machine tabs. Never again shall there be a dead link lol. WikiOriginal-9 (talk) 20:54, 3 December 2014 (UTC)
Reverse linkrot
You might want to look at a discussion that I've started at Wikipedia talk:Moving a page#Breaking incoming links. Thanks, Bazonka (talk) 08:27, 11 December 2014 (UTC)
- There wasn't a lot of discussion, so I have been bold and added a section to this page. I think any further discussion should take place here now. Bazonka (talk) 21:48, 16 December 2014 (UTC)
- Bazonka - Some websites have a "Permanent URL" link that never changes, though I have not thought through how that would help at Wikipedia. -- GreenC 17:57, 21 June 2015 (UTC)
Userbox related to link rot
Is there a user box I can place on my user page which says something like "This user always adds archived URLs to citations"? Lugevas (talk) 21:23, 5 January 2015 (UTC)
- Alright folks, I answered my own question. There are a whole bunch here. Lugevas (talk) 21:30, 5 January 2015 (UTC)