Below are the findings from my recent disambiguation analysis. There were no real surprises, but a few conclusions did stand out for me.

Conclusions

  • Between January 2007 and May 2009, the number of articles increased by 64%, while the number with "(disambiguation)" in their names increased by 66%, suggesting that the overall need for disambiguation has grown in line with the total number of articles.
  • United States-related terminology was common: "United States" was the most common country to appear, and individual state names turned up more often than the second country in the list, the United Kingdom. The trend, however, has been to remove state names from bracketed disambiguation: in January 2007, seven different states were each more common than "United States", and one state (Missouri) appeared more often then than all 50 states put together do today.
  • Music-related terminology was the most common theme: three of the top four terms were music-related. These accounted for just under a sixth of the whole population, and just under a fifth of terms used more than once (53,789).
  • The need to disambiguate between people was clear: a profession was a component of a sixth of the sample. The profession-born-year notation, however, was surprisingly uncommon (only 1,437 times in the entire population), considering how often it causes controversy.
  • Since January 2007, the following other shifts in usage have occurred:
    • movie has successfully been amalgamated into the term film;
    • both game and computer game have fallen out of usage, many instances becoming video game;
    • football player has become footballer;
    • every use of constituency (there were 565) now specifies what it is a constituency of;
    • many uses of single have been phased out in favour of alternatives such as song, or the more specific EP;
    • MO, a contraction, has been deprecated;
    • television has been replaced by more specific alternatives;
    • the names of sports such as ice hockey and American football have been some of the biggest gainers.
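
For anyone wanting to check figures like these against a list of article titles, the counting can be sketched roughly as below. This is only a minimal illustration, not the script behind the analysis: the function names, the regular expression and the sample titles are all assumptions of mine, and it simply treats whatever sits in a trailing pair of brackets as the disambiguating term.

```python
import re
from collections import Counter

# Assumption: the disambiguating term is the last bracketed part of a title,
# e.g. "Mercury (planet)" -> "planet". Nested brackets are not handled.
QUALIFIER = re.compile(r"\(([^()]*)\)\s*$")


def qualifier(title):
    """Return the trailing bracketed term of a title, or None if there is none."""
    match = QUALIFIER.search(title)
    return match.group(1).strip() if match else None


def count_qualifiers(titles):
    """Count how often each bracketed term occurs across an iterable of titles."""
    return Counter(q for q in map(qualifier, titles) if q)


def usage_shift(old, new):
    """Change in count per term between two snapshots (new minus old)."""
    return {term: new[term] - old[term] for term in set(old) | set(new)}


# Hypothetical sample titles, for illustration only.
sample = [
    "Mercury (planet)",
    "Mercury (element)",
    "Mercury (disambiguation)",
    "Thriller (album)",
    "Bad (album)",
    "Mercury",
]
counts = count_qualifiers(sample)
print(counts.most_common(3))
```

Running count_qualifiers over two dumps of titles (say, one from January 2007 and one from May 2009) and passing both results to usage_shift would show which terms gained or lost ground, which is the kind of comparison the list above summarises.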

Tabulated data
