Wikipedia:Wikipedia Signpost/2009-06-22/Vandalism
Study of vandalism survival times
- Loren Cobb (User:Aetheling) holds a Ph.D. in mathematical sociology and is a research professor in the Department of Mathematical and Statistical Sciences at the University of Colorado Denver.
This study has a narrow focus: to determine the distribution of the length of time that vandalism remains on the English-language Wikipedia. This distribution is also known as the survival function for vandalism. The two primary results from this study are: (a) the median time to correction is down to four minutes, and (b) some subtle forms of vandalism still persist for months and even years.
In the past there have been other statistical studies, both formal and informal, of how long vandalism remains in Wikipedia until it is corrected, but almost all of them express their results as a mean time to correction (i.e., as a simple arithmetic average of the observed times). I will show in this study that the distribution function for time to correction has such a fat tail that the mean time to correction is both mathematically and substantively meaningless. The median time to correction, on the other hand, conveys useful information.
Methods
A random sample of 100 articles from the English-language edition of Wikipedia was obtained through the use of the random article link in the navigation toolbar. For each article, the history log was used to examine each recorded change, starting from the most recent and going back until a clear instance of vandalism was found. The subsequent changes were then scanned going forward in time until the vandalism was corrected.
For each such instance of vandalism, the elapsed time until correction was computed, in minutes. These are the fundamental data on which this report is based.
In addition, some notes were taken on the general nature of the vandalism. All data collection occurred on 2009-06-11.
Results
- Of the 100 articles, fully 75 had never been vandalized.
- Of the 25 articles that were vandalized at least once, the most recent such instance of vandalism was eventually corrected in 23 articles.
- In five (20%) of the vandalized articles, the most recent instance of vandalism was corrected in less than one minute. A further four instances were corrected in less than two minutes.
- The median time to correction was four minutes.
- Two articles were found to have suffered vandalism that was never corrected. One of these was a subtle act of vandalism that was committed on 2007-02-23, and still not detected by the date of the study, 2009-06-11.
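For scale, the age of that uncorrected case can be worked out from the two dates given above. The following is a minimal Python sketch; the variable names are chosen for illustration, and the result is only a lower bound on the survival time, since the vandalism was still in place when the study found it.

```python
from datetime import date

# Dates taken from the Results list above.
vandalized_on = date(2007, 2, 23)
study_date = date(2009, 6, 11)

# Elapsed time between the act of vandalism and the study date.
elapsed_days = (study_date - vandalized_on).days
elapsed_minutes = elapsed_days * 24 * 60

print(f"{elapsed_days} days, or {elapsed_minutes} minutes")
```

Expressed in minutes, this puts the case on the same scale as the corrected instances.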
Discussion
A histogram of times to correction is shown in the chart to the right. Note that the horizontal axis is depicted on a logarithmic scale, to accommodate its enormously long right-hand tail.
In this histogram there are evidently two separate processes at work. The bulk of the histogram follows a curve that declines as a power function of elapsed time: this is the process by which ordinary readers and editors of Wikipedia stumble across and correct instances of vandalism.
The first two bars on the left, however, are significantly higher than the curve would suggest. The difference between the actual height of the bars and the height predicted by the curve is accounted for by the independent activity of Wikipedia's Recent Change Patrol (RCP). Members of the RCP typically monitor the Recent Change Log for suspicious edits. The RCP is able to correct most blatant vandalism within seconds of occurrence.
Both of these vandalism-correction processes act in concert to produce a remarkable result: the median time to correction for vandalism in this study was found to be just four minutes. Similar (unpublished) studies performed by this author one and two years ago yielded median times to correction of five and six minutes, respectively. It seems apparent that Wikipedia is improving its already impressive rate of vandalism detection and correction.
Problems with Mean Time to Correction
The fact that the estimated curve for the survival function is exponential on a graph whose horizontal axis is logarithmic indicates that the probability density function itself follows a power law distribution, also known as a Pareto distribution, given by the formula
If the parameter in the above formula is less than one — as it is in this case — then the mean of the distribution is infinite. The practical significance of this unusual situation is that any sample mean calculated from empirical data conveys absolutely no information whatsoever about the typical length of time that it takes for an instance of vandalism to be corrected.
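As a sketch of why the mean diverges, consider the standard Pareto form with a minimum time t_m > 0 and a shape parameter α. Both symbols are chosen here for illustration and may differ from the author's own notation.

```latex
% Density and survival function of a Pareto distribution
f(t) = \frac{\alpha\, t_m^{\alpha}}{t^{\alpha+1}}, \qquad
S(t) = \Pr(T > t) = \left(\frac{t_m}{t}\right)^{\alpha}, \qquad t \ge t_m .

% Mean: the integral converges only when \alpha > 1
\mathbb{E}[T] = \int_{t_m}^{\infty} t\, f(t)\, dt
             = \alpha\, t_m^{\alpha} \int_{t_m}^{\infty} t^{-\alpha}\, dt
             = \begin{cases}
                 \dfrac{\alpha\, t_m}{\alpha - 1}, & \alpha > 1, \\[1ex]
                 \infty, & \alpha \le 1 .
               \end{cases}

% Median: solve S(m) = 1/2; finite for every \alpha > 0
m = t_m \, 2^{1/\alpha} .
```

Under this form the median stays finite for any positive α, while the mean is infinite whenever α is at most one.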
The only useful alternative to a sample mean in this situation is the sample median, which is fully robust with respect to long-tailed distributions.
Depending upon what assumptions are made concerning the rate of activity of the RCP, the parameter for the Pareto distribution lies in a range between about 0.25 and 0.40. This range is comfortably below one, indicating that the tail of the distribution is huge and that sample means are completely and utterly useless for describing the data.
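The instability of the sample mean can be illustrated with a small simulation. The sketch below is not part of the study: it assumes a pure Pareto distribution with minimum time 1 and a shape parameter of 0.3 (an arbitrary value inside the range quoted above), and the function name `pareto_sample` is hypothetical.

```python
import random
import statistics

def pareto_sample(alpha, n, rng, t_min=1.0):
    """Draw n Pareto(alpha) values by inverse-transform sampling."""
    # 1 - rng.random() lies in (0, 1], so the negative power is always defined.
    return [t_min * (1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]

ALPHA = 0.3             # assumed shape, inside the 0.25-0.40 range
rng = random.Random(0)  # fixed seed so the run is repeatable

# For t_min = 1 the Pareto median is 2 ** (1 / alpha).
theoretical_median = 2 ** (1 / ALPHA)

results = []
for n in (100, 10_000, 100_000):
    sample = pareto_sample(ALPHA, n, rng)
    results.append((n, statistics.median(sample), statistics.fmean(sample)))

print(f"theoretical median: {theoretical_median:.2f}")
for n, med, mean in results:
    print(f"n={n:>7}  median={med:10.2f}  mean={mean:.3e}")
```

In a run like this, the sample median should settle near its theoretical value as the sample grows, whereas the sample mean is dominated by the few largest draws and does not settle.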
Observations on types of vandalism
About 84% of the vandalism that I observed in this random sample seemed to be just adolescent fooling around. Of the 16% that appeared more adult, half seemed to be adult humor or anger, and half seemed to come from people whose intent was to leave a permanent but nearly invisible mark upon Wikipedia. For example, the perpetrator will carefully change the spelling of an obscure name to an incorrect form, or change a location to something that still looks plausible at first glance. I imagine them coming back over and over again to the page that they altered, to see if that subtle little change is still there. Perhaps this impulse is roughly the same as the one which causes people to carve their initials into trees, or to scratch them on rocks.
Conclusions
The fact that 50% of all vandalism is being detected and reverted within an estimated four minutes of appearance should go a long way to allay fears about the susceptibility of English-language Wikipedia articles to malicious vandalism. On the other hand, the fact that an estimated 10% of all vandalism endures for months and even years indicates that some new tools and strategies are needed for rooting out the most subtle and persistent forms of vandalism.
Raw data
The elapsed times (in minutes) to correction for the instances of vandalism found in this study were as follows: { 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 4, 5, 8, 9, 19, 73, 213, 490, 672, 2442, 14176, 152996 }. In addition, two cases of vandalism had never been corrected (until discovered by the author).
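The contrast between median and mean can be checked directly on these 23 corrected cases. The following is a minimal Python sketch with illustrative variable names; it leaves out the two uncorrected cases, which have no finite time to correction.

```python
import statistics

# Elapsed times to correction, in minutes, as listed above.
times_min = [0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 4, 5, 8, 9, 19, 73,
             213, 490, 672, 2442, 14176, 152996]

median_min = statistics.median(times_min)
mean_min = statistics.fmean(times_min)

# Fraction of the total elapsed time contributed by the single largest value.
largest_share = max(times_min) / sum(times_min)

print(f"median: {median_min} minutes")
print(f"mean:   {mean_min:.1f} minutes")
print(f"share of total from the largest value: {largest_share:.1%}")
```

The median is 4 minutes, while the mean is roughly 7,440 minutes (a little over five days), with about 89% of the total coming from the single largest observation.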
Discuss this story
I appreciate any and all commentary and criticism of this study; use the Discussion page for this. If you edit this report, please do so with extreme care. Aetheling (talk) 05:35, 15 June 2009 (UTC)
Sample size
Have you thought about using a larger sample? You'll admit, a sample size of only 100 has some pretty big error bars on it. I know it's tedious to do more, but hey ... that's what grad students are for :-P
Also, I'd like to see some way to take into account the "importance" of a page. You could use monthly page view numbers as a rough proxy. My guess is that vandalism reversion time and the popularity of an article are highly correlated. So while 4 minutes might be the median across all articles, the median across articles that people are actually reading (let's admit, most articles barely get read at all) might be smaller still. --Cyde Weys 03:09, 23 June 2009 (UTC)
Some thoughts on tools
Thanks for the study -- it was quite interesting to read. My first thought was, did you correct the two instances of vandalism that had not been corrected before? If not, tell me what they were, and I'll do it. JesseW, the juggling janitor 19:05, 16 June 2009 (UTC)
Regarding what tools might be helpful -- better history analysis tools would seem to be of considerable help. For quite a while now, I've wanted to take the time to craft a number of such tools: one to show all the text that has been added to an article over a given time-frame (even if it was removed within the time period); one to highlight the age of text; one to highlight text that has not stayed unaltered during a given time frame; etc. I think such tools would go a long way to rooting out vandalism that got lost in the history. The remaining problem would be intentionally subtle vandalism incorporated within otherwise correct changes, or subtle factual lies or bias, which is even harder to handle. Your thoughts would certainly be appreciated. JesseW, the juggling janitor 19:05, 16 June 2009 (UTC)
uncorrected vandalism
I have to ask: What were the two articles with uncorrected vandalism? Kaldari (talk) 00:49, 17 June 2009 (UTC)
Hmm...
I think this examination is a very good start, but the small size of the sample, combined with the use of such expressions as "50% of all vandalism is being detected" and "10% of all vandalism endures for months" (emphasis mine) makes me very uncomfortable. While those were the percentages that turned up in your (rather small) sample, it's a bit over-reaching to assert that 100 samples are absolutely representative of "all vandalism". – ClockworkSoul 05:23, 23 June 2009 (UTC)
Query
This is excellent work. I’ve been sceptical about the usual reassuring statements about vandalism reversion for a long time, having come across many instances of ancient vandalism persisting, even in rather high-traffic articles. This neatly describes what is going on. A question: is it possible to estimate from these results what percentage of articles are currently vandalised? (I realise that 2% of the sample was in this state, but I am not clear what, with any confidence, can be drawn from that.) Ian Spackman (talk) 06:30, 23 June 2009 (UTC)
It is probably impossible to prevent all vandalism - whether creative/humorous or destructive - and some examples will overlap with truthiness, POV-ism and/or genuine misunderstanding. Even if there was a drive to ensure that "every last article as of 1 January 2010 is free of error, vandalism, POV and other problems" a few examples will survive - and there will be a fresh crop of such things emerging.
I would guess that eg (present Pope, Prime Minister, Monarch, President, Sports Champion etc) will be more subject to vandalism than the equivalents from 100/200/500 years/other date ago. - — Preceding unsigned comment added by 83.104.132.41 (talk • contribs)
Possible bias
Thanks for this interesting study. I do agree with your general conclusions. However I think that the results may be biased (in the technical sense) due to the sample design, and in particular your investigation of only the most recent instance of vandalism from each article. This means that instances of vandalism in heavily vandalised articles would be given equal weight in the results to instances in less vandalised articles, and thus each instance of vandalism in a heavily vandalised article is less likely to enter your sample than instances in less vandalised articles. If vandalism to heavily vandalised articles is corrected more quickly and thoroughly, e.g. because people expect it and watch for it more closely, then your measures of time to correction would tend to be overstated.
It might be possible to correct for this effect, e.g. by weighting based on some measure of vandalism rate. However there are other potential biases lurking here too, e.g. due to some articles being older than others. Adjusting for everything may be difficult. Another option would be to think through any assumptions you are making, and hedge the results accordingly. -- Avenue (talk) 08:38, 24 June 2009 (UTC)
Some topics are vandal magnets, while "in the news topics" (in the broadest sense) are likely to suffer much vandalism, "errors arising from overlapping editing" and other sources of error, which will drop significantly after the event passes into history (eg articles on George W Bush and Tony Blair are likely to show this phenomenon). And "vandalism and errors" in articles on obscure topics are likely to remain undetected for some time. Could "someone statistical" be brought in to determine suitable bases for "low", "medium" and "high" activity articles? (A more technical analysis would involve comparing articles across the various languages in which they appear - to see the way in which particular controversy "travels.") - Preceding unsigned comment added by 83.104.132.41 (talk)
Wikiproject
Ideally, I had set up the Wikipedia:WikiProject Vandalism studies to do just the sort of study that is mentioned here. Hopefully, with this study there may be more interest in getting that project going again. Remember (talk) 16:52, 24 June 2009 (UTC)
Methodology
I see some problems with matching the conclusions to the results. We (or you) state that "50% of all vandalism is being detected and reverted within an estimated four minutes." I'm not sure you studied that. I think what you found was that 50% of previously vandalized articles had their most recent vandalism reverted within an estimated four minutes. I think there are two effects you are ignoring.
1. You're not taking a good sample of "vandalism." "Vandalisms" are edits, and so your sample should select randomly from edits. Instead, you sample randomly from articles. This substantially overweights Ted Chabasinski, which is one article but has 6 edits, and substantially underweights George W. Bush, which has more edits and thus more vandalism. If those two articles were the entire sample, you would say that 50% of articles have never been vandalized and 50% of articles have their vandalism reverted in seconds, for a median of "in seconds." In fact, what you should have done was take a random sampling of edits, determine which of those edits were vandalism, and determine the reversion time on those edits. You can select a random edit from the database in multiple ways - I'm certain the more technically literate can help you figure out the most random way.
2. You're ignoring the "still exists" vandalism that was covered by more recent vandalism. Imagine an article was vandalized in a very subtle and damaging way a year ago (say, alleging that the person was involved in the assassination of JFK). Then, imagine that someone, 1 minute ago, wrote "PENIS PENIS PENIS PENIS" over the header, which was instantly reverted. Your study would show that the vandalism TTL on this article was instant, when, in actuality, a sample of all vandalism would show 1 TTL of instant and 1 TTL of never reverted.
These two effects would seem, to me, to pull in different directions. My expectation is that if you gave a distribution, you would show a median TTL that was too long (4 minutes too large), but with a distribution that was far too normal (i.e., vandalism has a fatter tail than even you discuss, consisting of subtle, damaging vandalism designed to disparage people the vandal does not like). This is also being discussed offsite, at [1] but one should have a very thick skin and be able to deal with all comers if they engage at that location. Hipocrite (talk) 16:41, 25 June 2009 (UTC)
Never?
I have to say, I like this study and what it sets out to do. We can learn from this, then repeat it, with bigger samples to see if we have improved.
I quibble with your use of the word "never" - why not just state the time between the vandalism and the time you found it? It could have been only 1 day for all we know. I can't really see what "never" could mean in this context. Stevage 01:13, 26 June 2009 (UTC)
Studies
Would a "compare and contrast" of "rearrangements and vandalism" to Michael Jackson, Farrah Fawcett and AN Other Minor Notable be useful?
Guestimating the likelihoodness of non-constructive rearrangements for the three persons.
Mistake?