Wikipedia talk:Statistics/Archive 1
This is an archive of past discussions about Wikipedia:Statistics. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
Discussion of article size from 2002
OK, somebody has to say this: The fact that we are patting ourselves on the back for intentionally undercounting our articles is just plain silly. I just now went and looked under "short pages" at all 28 pages with exactly 100 bytes, and 13 of them contained a comma. Not a single one of them deserves to be called an article, but almost half are counted. Next I looked at all 33 pages with exactly 200 bytes, and 27 of those contained a comma. A few of them (not eighty percent!) might be considered articles under an extremely lenient definition of article, but does anyone outside of Wikipedia consider a single, brief paragraph to be an article? Are ANY of Britannica's articles under 500 bytes?
I estimate our median article size as 1000 bytes, because that's the size of our 18943rd longest page according to long pages. (18943 would be the median of 37886 total articles.) To my mind, a conservative count of articles would set a 1000-byte minimum, rather than a 1000-byte median, which would cut our total article count in half. But no matter how we count articles, let us at least prominently post the median size of the articles which are included in the count. And please, please don't call the count "unimpeachable". (For reference, my little tirade (including this sentence) is 1367 bytes long, i.e. rather longer than our median article.)
--Fritzlein 02:55 Aug 17, 2002 (PDT)
I agree.
I'm sure that I found information about the total database size of Wikipedia recently, but I can't find it again. Could this be made available again, please?
It would be good to have the info on this page, or a link on this page to where it can be found.
If I get an answer - this time I'll try to keep the info! David Martland 15:20 Dec 13, 2002 (UTC)
Don't know if this is what you want, but:
$ ls -l
total 2801904
-rw-rw---- 1 mysql mysql 8852 Aug 9 19:27 archive.frm
-rw-rw---- 1 mysql mysql 21077440 Dec 13 17:19 archive.MYD
-rw-rw---- 1 mysql mysql 1024 Dec 13 17:19 archive.MYI
-rw-rw---- 1 mysql mysql 8586 Jul 20 19:30 brokenlinks.frm
-rw-rw---- 1 mysql mysql 7599784 Dec 13 17:58 brokenlinks.MYD
-rw-rw---- 1 mysql mysql 6398976 Dec 13 17:58 brokenlinks.MYI
-rw-rw---- 1 mysql mysql 9114 Nov 22 08:21 cur.frm
-rw-rw---- 1 mysql mysql 451396440 Dec 13 17:59 cur.MYD
-rw-rw---- 1 mysql mysql 201449472 Dec 13 17:59 cur.MYI
-rw-rw---- 1 mysql mysql 8756 Jul 20 19:30 image.frm
-rw-rw---- 1 mysql mysql 8586 Jul 20 19:30 imagelinks.frm
-rw-rw---- 1 mysql mysql 192136 Dec 13 17:51 imagelinks.MYD
-rw-rw---- 1 mysql mysql 175104 Dec 13 17:51 imagelinks.MYI
-rw-rw---- 1 mysql mysql 457448 Dec 13 15:53 image.MYD
-rw-rw---- 1 mysql mysql 215040 Dec 13 15:53 image.MYI
-rw-rw---- 1 mysql mysql 8706 Jul 20 19:30 ipblocks.frm
-rw-rw---- 1 mysql mysql 7300 Dec 12 15:23 ipblocks.MYD
-rw-rw---- 1 mysql mysql 3072 Dec 12 15:23 ipblocks.MYI
-rw-rw---- 1 mysql mysql 8582 Jul 20 19:30 links.frm
-rw-rw---- 1 mysql mysql 41151856 Dec 13 17:59 links.MYD
-rw-rw---- 1 mysql mysql 26686464 Dec 13 17:59 links.MYI
-rw-rw---- 1 mysql mysql 8898 Nov 22 08:43 old.frm
-rw-rw---- 1 mysql mysql 8790 Jul 20 19:30 oldimage.frm
-rw-rw---- 1 mysql mysql 54436 Dec 13 09:51 oldimage.MYD
-rw-rw---- 1 mysql mysql 11264 Dec 13 09:51 oldimage.MYI
-rw-rw---- 1 mysql mysql 2082194432 Dec 13 17:59 old.MYD
-rw-rw---- 1 mysql mysql 19528704 Dec 13 17:59 old.MYI
-rw-rw---- 1 mysql mysql 8598 Jul 20 18:49 random.frm
-rw-rw---- 1 mysql mysql 85000 Dec 13 04:47 random.MYD
-rw-rw---- 1 mysql mysql 1024 Dec 13 04:47 random.MYI
-rw-rw---- 1 mysql mysql 8964 Oct 28 08:06 recentchanges.frm
-rw-rw---- 1 mysql mysql 3014196 Dec 13 17:59 recentchanges.MYD
-rw-rw---- 1 mysql mysql 1556480 Dec 13 17:59 recentchanges.MYI
-rw-rw---- 1 mysql mysql 8700 Jul 20 18:49 site_stats.frm
-rw-rw---- 1 mysql mysql 29 Dec 13 17:59 site_stats.MYD
-rw-rw---- 1 mysql mysql 2048 Dec 13 17:59 site_stats.MYI
-rw-rw---- 1 mysql mysql 8874 Aug 24 02:54 user.frm
-rw-rw---- 1 mysql mysql 1838736 Dec 13 17:48 user.MYD
-rw-rw---- 1 mysql mysql 174080 Dec 13 17:48 user.MYI
-rw-rw---- 1 mysql mysql 8590 Nov 27 15:15 watchlist.frm
-rw-rw---- 1 mysql mysql 291006 Dec 13 17:37 watchlist.MYD
-rw-rw---- 1 mysql mysql 471040 Dec 13 17:37 watchlist.MYI
I'm sure I've seen some info regarding the total size of Wikipedia (in bytes, or Mbytes) and whether the whole database can be downloaded.
I'd be glad if someone could point me in the direction of that info or the relevant page again. I'll save the info this time!
It'd be good to have that info on this page - or a link - too. David Martland 15:18 Dec 13, 2002 (UTC)
There really needs to be a more conservative total article count that distinguishes between encyclopedia articles and almanac articles and that excludes certain other pages. What particularly troubles me is that there are now thousands of year almanac pages, and most of them can't even be considered almanac articles because they are just templates.
I therefore propose the following (in addition to the current criteria): 1) any page linked to centuries should be excluded from the total article count and given its own line in special:statistics, at least until most of these pages become almanac articles (the vast majority are either templates or templates with one or two entries); 2) any page with a link to Wikipedia:Disambiguation should be excluded from the count; 3) any page that is less than 500 bytes should be excluded from the count (E. coli is 610 bytes); and 4) there should be three "total article counts" for everything not excluded by the above: one for anything with the string "list", "chart", "timeline" or "table" in its title (these would be "almanac-like" articles), one for everything left over (these would be "encyclopedia articles"), and one grand total count that would still be the number displayed on the Main Page.
Our current count is exaggerating the true number of articles we have and is harming the project as a result. We need to be honest with our article counts and very conservative -- otherwise we will lose credibility with passers-by who are at first impressed by our article count but then find out that it is bloated. --mav 13:36 Aug 28, 2002 (PDT)
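mav's four criteria are all mechanical, so they can be written as one small classification function. This is a hypothetical sketch: the `Page` fields (`year_page`, `links_to_disambiguation`) are assumptions standing in for whatever the software would actually compute from page text and link tables, and how year pages are identified from the centuries pages is left open.

```python
import re
from dataclasses import dataclass

# Title words that mark an "almanac-like" page (criterion 4).
# Matching at a word start avoids false hits such as "Ballistics" for "list".
ALMANAC_RE = re.compile(r"\b(list|chart|timeline|table)", re.IGNORECASE)

@dataclass
class Page:
    # Hypothetical fields; real values would come from the database.
    title: str
    size_bytes: int
    year_page: bool = False                # criterion 1: linked to centuries
    links_to_disambiguation: bool = False  # criterion 2

def classify(page: Page, min_size: int = 500) -> str:
    """Return 'year', 'excluded', 'almanac' or 'encyclopedia'."""
    if page.year_page:
        return "year"  # reported on its own line, outside the article count
    if page.links_to_disambiguation or page.size_bytes < min_size:  # criteria 2 and 3
        return "excluded"
    if ALMANAC_RE.search(page.title):
        return "almanac"
    return "encyclopedia"

def article_counts(pages):
    """Tally each class; the grand total is almanac plus encyclopedia pages."""
    tally = {"year": 0, "excluded": 0, "almanac": 0, "encyclopedia": 0}
    for page in pages:
        tally[classify(page)] += 1
    tally["grand_total"] = tally["almanac"] + tally["encyclopedia"]
    return tally
```

The title test is only a heuristic: it would still count, say, "Listening" as almanac-like, which is one reason the size threshold ends up doing most of the work.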
- I think that sounds pretty reasonable. --Brion 13:47 Aug 28, 2002 (PDT)
As long as the criteria you select are easily computable, I'm happy to make whatever change in the software is necessary to reflect a better count. I also don't think anyone is making any claims about the accuracy of the count--the statistics page itself is careful to point out that these are just estimates. But I agree, a more conservative estimate is entirely warranted. --LDC
- Great! While you are at it, a link to Wikipedia:What is an article under the first occurrence of that word on the special stats page would be nice. --mav
- I never thought I'd say this, but I'd like something to be more conservative. :-) (Just the article count, not any of Bush's cabinet). --KQ
- From a random sampling of pages, I would say that something like a third of our pages would truly count as useful articles in the eyes of a new user (that agrees with earlier observations from Kajakit on the mailing list). That would mean we have something like 10,000-15,000 'useful articles' in the database. We could proxy that by counting, say, articles over 1500 characters long. At the time of writing, that would give 14,148 'useful articles' compared to a headline number on the main page of 39,654.
- The 1500 character threshold has the advantage of being high enough to exclude most of the non-articles identified by mav's criteria (century pages, disambiguation pages, etc.) automatically.
- I would not like to see the headline count on the main page reduced - I think that would be confusing for new users and perhaps a bit demotivating for the rest of us. We could consider changing the main page wording to something like:
- ... We started in January 2001 and are already working on 6,912,651 articles, with more being added and improved all the time. We want to make over 100,000 complete articles, so let's get to work! Anyone, including you, can edit any article ....
- That would let us keep the headline count without creating the suggestion that they are all finished, polished articles. We could keep a running total of '1500 character articles', and perhaps other sizes too, on the statistics pages for those that are interested.
- Enchanter 17:19 Aug 28, 2002 (PDT)
I don't think 1500 characters is a particularly useful number -- there are many subjects for which 500 characters would suffice as a minimum article size (although something in the range of 500 - 1000 characters wouldn't bother me too much). Could you maybe run some quick numbers to see how much of a reduction would occur if my proposal were to be enacted (this could be done easily if there were a page count in "what links here")? I was thinking about a reduction of 5 - 7 (maybe 10) thousand. Even if it is more than that I don't think that a temporary reduction in the total article count would hurt. We've already been through one round of this back when we upgraded to phase II and it didn't hurt anybody's morale that I know of. All that we have to do is write up an announcement that we are enacting a far more conservative definition of what we consider to be an article as far as automatic detection goes. --mav
- Mav - here are some figures broadly following your proposal above:
Total articles: 45179 (including those without a comma)
Under 500 bytes: -14314
Disambiguation: -289
Year in review (pages with numbers as title): -1292
'list' in title: -386
'century' in title: -60
'timeline' in title: -59
'table' in title: -55
TOTAL: 28903
- The main message here is that what really drives the numbers is the threshold for article size that you use. The other exclusions make a relatively small difference (although I'm sure there are in fact many more 'list like' non articles that aren't picked up by these criteria).
I also don't think we will ever be able to teach a computer to determine just what is, or is not, a 'useful' article. --mav
- I agree. That's why I think the best process is to:
- Decide the proportion of pages we want to count, by randomly sampling recent changes and making subjective decisions.
- Choose an article size that gives broadly the number we want.
- That's how I came up with the threshold of 1500. I absolutely agree that some articles that are shorter than 1500 are worthwhile, but these are offset by the longer articles that are not much use (according to my relatively strict subjective definition of an article).
- The impression I get is that the average quality of Wikipedia has been fairly constant. That is, the tendency for the average quality to rise as articles are improved and the tendency for average quality to fall as new stubs are added broadly cancel out. If so, then picking an article size threshold should give a reasonably stable indicator of articles up to a certain quality.
- Enchanter 01:39 Aug 29, 2002 (PDT)
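Enchanter's two steps (estimate the share of "real" articles by sampling, then pick the size cut-off that yields that many) amount to reading a quantile off the size distribution. A minimal sketch, assuming the page sizes are available as a plain list of byte counts; the function names and the sample sizes in the usage are made up for illustration.

```python
def threshold_for_share(sizes, share):
    """Size cut-off (bytes) such that pages at least that long make up about
    `share` of all pages, e.g. share=1/3 for "a third are real articles"."""
    if not sizes or not 0 < share <= 1:
        raise ValueError("need a non-empty list and 0 < share <= 1")
    ranked = sorted(sizes, reverse=True)       # longest page first
    keep = max(1, round(share * len(ranked)))  # number of pages to count
    return ranked[keep - 1]                    # length of the shortest page kept

def count_at_least(sizes, threshold):
    """Pages counted under a given threshold (ties can push this above `keep`)."""
    return sum(1 for s in sizes if s >= threshold)

sizes = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1500, 3000]
print(threshold_for_share(sizes, 1 / 3))  # 900
```

With `share=0.5` the same function returns the median-size page, which is the rule of thumb Fritzlein used at the top of this page (the 18,943rd longest of 37,886 pages).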
- Thanks for doing the numbers -- this should give the developers plenty to chew on. I think a figure of 28,000 is about right. --mav
- I like the format of this count. I'd love to see it replicated (i.e. automatically generated) on the statistics page. I agree that the size threshold is the most important decision we have to make, although excluding the other types of articles should also be done if it doesn't bog down the server to cut them out on the fly. Just to make things more confusing, I vote for a minimum of 1000 bytes, which cuts our count roughly in half. However, I would support either 500 or 1500 in preference to what we have now.
- I don't care a great deal what number we use for the headline count, as long as the more detailed statistics are only one click deeper. --Karl Juhnke
Would it be possible to now and again run a query (perhaps from crontab) that showed how many "article" pages we have with c characters: how many with c<1000, how many with 1000<=c<2000, 2000<=c<3000, et cetera? DanKeshet
- At the moment:
=0: 2
<16: 3 (1-15: 1)
<31: 21 (16-30: 18)
<63: 111 (31-62: 90)
<125: 1222 (63-124: 1111)
<250: 4646 (125-249: 3424)
<500: 14138 (250-499: 9492)
<1000: 25474 (500-999: 11336)
<2000: 34739 (1000-1999: 9265)
<4000: 40849 (2000-3999: 6110)
total: 45172 (4000+: 4323)
- The queries are of the form SELECT COUNT(*) FROM cur WHERE LENGTH(cur_text)<500 AND cur_is_redirect=0 AND cur_namespace=0 --Brion 20:38 Aug 28, 2002 (PDT)
- Thanks, Brion! I think it's pretty interesting how it works out. DanKeshet
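Brion's table can be reproduced with one cumulative COUNT(*) per cut-off, using the query shape he gives; each bracketed band is just the difference between neighbouring cumulative counts. The sketch below runs that query against an in-memory SQLite table that borrows the column names from his query (`cur_text`, `cur_is_redirect`, `cur_namespace`). SQLite stands in for the real MySQL server here, and note that SQLite's LENGTH counts characters for text while MySQL's LENGTH counts bytes (the two agree for plain ASCII).

```python
import sqlite3

# Exclusive upper bounds used in the table above.
CUTOFFS = [16, 31, 63, 125, 250, 500, 1000, 2000, 4000]

def length_histogram(conn):
    """Cumulative (label, count) rows: '=0', '<16', ..., '<4000', 'total'."""
    base = ("SELECT COUNT(*) FROM cur "
            "WHERE cur_is_redirect=0 AND cur_namespace=0")

    def count(extra="", params=()):
        return conn.execute(base + extra, params).fetchone()[0]

    rows = [("=0", count(" AND LENGTH(cur_text)=0"))]
    for cut in CUTOFFS:
        rows.append((f"<{cut}", count(" AND LENGTH(cur_text)<?", (cut,))))
    rows.append(("total", count()))
    return rows

def bands(rows):
    """Per-band counts: differences between consecutive cumulative counts."""
    return [later[1] - earlier[1] for earlier, later in zip(rows, rows[1:])]
```

Doubling cut-offs spread the short pages over several rows; DanKeshet's fixed 1000-character bands would only need a different `CUTOFFS` list.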
Can we have some figures for mean article word/character counts (ignoring markup and HTML), please? This would enable better comparisons with existing encyclopedias: see the article text for comparisons.
Character count... bah. That's unreliable too. Can I suggest a simple, effective, and _working_ solution? Yeah, that's what I thought. Since Wikipedia is user-moderated, why not give registered users (or maybe unregistered ones too) the option to vote on how useful they found an article, including a reason why? Think Slashdot: -7 too short, +5 well written, etc. It wouldn't be hard to implement, I think it would be good, and then you could count the "real" articles based on their user approval. Of course, this would hurt articles that were voted really low and then substantially updated, since they would still have a low score. That's why I think ratings should be cleared every time there is a major update (e.g. a non-minor edit, plus some type of change comparison with the .diff file). Ideas, anyone? I think this is pretty good, and I'd be willing to implement it if people like the idea. I don't know how long it would take me, because I'm not familiar with the codebase, but I'm very proficient in PHP and database work as well as other programming languages. I am already registered on SF too... so if anyone likes this idea, leave a comment here, on my talk page, or [e-mail me].
Lightning, Sept 29 3:17
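The bookkeeping behind Lightning's proposal (one scored vote with a reason per user and article, cleared whenever a non-minor edit lands) is small. The class below is a hypothetical sketch in Python rather than the site's PHP; the names, the -10..+10 score range and the net-score rule for counting "real" articles are assumptions, and deciding what counts as a major update is left to the caller.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Vote:
    user: str
    score: int   # assumed range -10..+10, Slashdot-style
    reason: str  # e.g. "too short", "well written"

class ArticleRatings:
    def __init__(self):
        self._votes = defaultdict(dict)  # article -> {user: Vote}

    def vote(self, article, user, score, reason):
        if not -10 <= score <= 10:
            raise ValueError("score out of range")
        # One vote per user per article; a new vote replaces the old one.
        self._votes[article][user] = Vote(user, score, reason)

    def record_edit(self, article, minor):
        # A major (non-minor) edit makes earlier judgements stale.
        if not minor:
            self._votes.pop(article, None)

    def score(self, article):
        return sum(v.score for v in self._votes.get(article, {}).values())

    def count_real(self, articles, threshold=1):
        """Articles whose net approval reaches `threshold`."""
        return sum(1 for a in articles if self.score(a) >= threshold)
```

Clearing on every non-minor edit is the simplest reading of the proposal; a diff-size test could replace the `minor` flag without changing the rest.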
Alexa
move to wikipedia talk:statistics
According to a recent Wikipedia:Announcement Wikipedia is as popular as Slashdot. I was quite surprised! Is it really true? Anyone know how Alexa measures popularity? I see they offer a toolbar to download... do they extrapolate data from toolbar downloaders? Are Wikipedians more likely to have a toolbar than other users? Alexa Website Pete 12:05, 4 Sep 2003 (UTC)
- Yes, everyone who has the Alexa toolbar installed effectively sends the URL currently being viewed to the Alexa server, thus allowing them to monitor which sites are visited, and how often. How valid these data are can of course be debated; those who worry about privacy will almost certainly not install it. But around the 1,000th most popular site I doubt that a few very active Wikipedians with the toolbar can make much difference anymore; around the 100,000th it has much more impact. andy 12:21, 4 Sep 2003 (UTC)
- Thanks for the info guys. I wonder if the nature of Wikipedia, where each edit means two page views (or more if you preview!), has an inflationary effect on our figures. I am pretty sure if we got another slashdotting we would still have to batten down the hatches pretty hard because of weight of numbers. And Tannin, just to check... did you mean Alexa is activated with every installation of the Windows OS?? That's a lot of data! Pete 14:49, 4 Sep 2003 (UTC)
- Alexa does distinguish between page views (e.g. the numerous views in an edit process) and number of viewers (independent IP addresses), and then combines both in a magic formula to get the actual rank. But don't forget that a big percentage of viewers will not edit, but just view. andy 14:53, 4 Sep 2003 (UTC)
- Pete: this page (which I found more or less at random on Google) has quite a bit of detail. Someone should write this up for the 'pedia. I see (from another page) that there is a class action against Alexa pending. As spyware goes, there are worse ones. But just the same, I don't like people messing with my computer without my knowledge, and (I understand) neither does the law in most countries. I think Alexa is installed as part of Internet Explorer, rather than as part of Windows - not that that distinction makes much of a difference these days. Tannin
- Re Pete, "nature of Wikipedia" - most users of Wikipedia never edit an article... Martin 19:22, 4 Sep 2003 (UTC)
- Wow, I guess I had always just assumed that we were all writers and no readers... but Wikipedia:Statistics informs me that there are 40 page views per edit... This thread has certainly reduced my Doubting Thomas stance. Pete 23:28, 4 Sep 2003 (UTC)
- It should also be noted that editing a page is a different thing from reading one. Thus it is fair to count it twice. --mav
Erik's statistics pages are just what the doctor ordered for the geeky, stats-inclined people such as myself. Another popular page is Wikipedia:Wikipedians by number of edits. However it is rarely updated as the SQL script required to create the page is apparently fraught with difficulties. Looking at some of the stats on Erik's page (e.g. recently active contributors) it might be possible to provide the data currently at Wikipedia:Wikipedians by number of edits using Erik's code, making for much more frequent and hassle-free updates. Anyone else think this idea is worth doing? I would email Erik myself, but won't do so just in case he has been bombarded with similar requests since setting up the stats pages. Pete 10:52, 16 Oct 2003 (UTC)
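The list at Wikipedia:Wikipedians by number of edits is, at its core, a GROUP BY over the revision tables. A hypothetical sketch, assuming (not confirmed on this page) that current and old revisions sit in `cur` and `old` tables with user-name columns called `cur_user_text` and `old_user_text`; SQLite stands in for MySQL, and a real list would also need to drop anonymous IP editors and bots.

```python
import sqlite3

# Edits per user across current and old revisions (assumed column names).
EDIT_COUNT_SQL = """
SELECT user_text, COUNT(*) AS edits
FROM (SELECT cur_user_text AS user_text FROM cur
      UNION ALL
      SELECT old_user_text FROM old) AS all_edits
GROUP BY user_text
ORDER BY edits DESC, user_text
LIMIT ?
"""

def top_editors(conn, limit=100):
    """Return [(user, edit_count), ...], most edits first."""
    return conn.execute(EDIT_COUNT_SQL, (limit,)).fetchall()
```

UNION ALL (not UNION) matters here: plain UNION would collapse duplicate rows and count each user only once.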
Wikipedia in March 2004: a month in stats
I've been perusing the en.wikipedia stats for the last month's trends. Here are the headline figures:
- 115,080,901 hits
- 952,395,093 Kb transferred
- Daily average: 3,712,287 hits/day
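The daily average is the monthly hit total divided by the 31 days of March, and the transfer total converts to more familiar units the same way. A small check of both figures; the conversion assumes the log analyser's "Kb" means 1024 bytes, which is not stated here.

```python
HITS = 115_080_901  # March 2004 total hits
KB = 952_395_093    # March 2004 Kb transferred
DAYS = 31           # days in March

def daily_average(total, days):
    return total / days

def kb_to_gib(kb):
    """Kilobytes to gibibytes, assuming 1 Kb = 1024 bytes."""
    return kb / 1024 ** 2

print(f"{daily_average(HITS, DAYS):,.0f} hits/day")  # 3,712,287 hits/day
print(f"{kb_to_gib(KB):,.1f} GiB transferred")      # 908.3 GiB transferred
```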
10 most popular items
excluding Main Page, Current Events, Special pages, admin pages
- 100px-Beowulf.firstpage.jpeg - does anyone know why??
- I think that was a conflation of a URL and an image link; I've corrected it, but the actual image page is Image:Beowulf.firstpage.jpeg. Marnanel 17:04, Apr 1, 2004 (UTC)
- My suspicion is that somebody has been leeching this - including it inline in something other than a Wikipedia article (a manuscript as a forum avatar?). A hunt through the logs for the referer on requests for it would soon confirm that, and tell us who the culprit is. Either that, or it's a pretty weird bug in the log analyser. - IMSoP 12:06, 2 Apr 2004 (UTC) (oh dear, must resist the urge to tidy up leeching and disambig avatar properly: too much work to do...)
- In fact, looking at the most popular referer stats, I'd say it was someone on this messageboard here - IMSoP 12:13, 2 Apr 2004 (UTC)
- Hmm. That site really doesn't have anything to do with Beowulf. I'm guessing few if any of the people featured are spear-danes (although I believe I did see Grendel's Mother) -- Finlay McWalter | Talk 20:38, 2 Apr 2004 (UTC)
- Um, I'm not sure if you're joking or not, but given that I haven't time to put any decent info on avatar, I'll explain briefly for anyone who is confused. People will put any image that they think looks cool into their preferences for a messageboard, just to make them stand out from the crowd. I notice one member there has a (badly squashed) image of a bank-note, for instance. You'll note that the image in question is a thumbnail, not the original - perfect size for such a use. This kind of leeching can actually be a real nuisance for smaller websites, because of the huge amount of bandwidth it eats - a friend of mine almost had to pay his host for excess use because someone liked his b3ta submission, but didn't even scale it down! It's perhaps not such a big deal for Wikipedia, but if it's still happening, it might be worth tracking down the user responsible (through, as I say, the referer logs) and politely asking them to host the image themselves.
- If, on the other hand, you were making a subtle comment about the somewhat adult content of that messageboard, I apologise - I meant to warn readers when I realised, but became ensnared in other matters. - IMSoP 22:10, 2 Apr 2004 (UTC)
- Yes, I was trying to be a smart-alec, but only those who've followed the link (which hopefully is no-one) will get it. I have to go wash my eyeballs out now... -- Finlay McWalter | Talk 22:24, 2 Apr 2004 (UTC)
- Oh, come on, it's not that bad - it's not like it's some kind of goatse fan forum or something (if you don't know, you don't want to, trust me). In fact I glanced at their FAQ or whatever, and they seemed to have pretty decent rules, considering. - IMSoP 22:29, 2 Apr 2004 (UTC)
- Seriously, that's pretty damn tame. I was expecting a whole lot worse ;) →Raul654 22:33, Apr 2, 2004 (UTC)
- Seven dirty words
- United States
- World War II
- Goatse.cx
- March 11, 2004 Madrid attacks
- List of sex positions
- Wiki
- Sheikh Ahmed Yassin
- Mathematics
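IMSoP's suggestion for the Beowulf thumbnail (look up the referers on requests for it) takes one pass over the access log. A minimal sketch, assuming the logs use Apache's "combined" format, which this page does not confirm for Wikipedia's servers; requests are matched by a substring of the requested path, and the function name is made up.

```python
import re
from collections import Counter

# Apache combined format: host ident user [time] "request" status size "referer" "agent"
LINE_RE = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]*\] "(?:\S+) (\S+)[^"]*" \d{3} \S+ "([^"]*)"'
)

def top_referers(lines, path_fragment, n=10):
    """Most common referers for requests whose path contains `path_fragment`."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m and path_fragment in m.group(2):
            counts[m.group(3)] += 1  # "-" means no referer was sent
    return counts.most_common(n)
```

Calling `top_referers(open(logfile), "Beowulf.firstpage.jpeg")` would list the pages embedding the image, with lines that don't parse silently skipped.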
10 most popular search terms
- wikipedia
- wiki
- the answer to life the universe and everything
- encyclopedia
- penthouse
- saddam hussein
- ahmed yassin
- sheikh ahmed yassin
- sexual intercourse
- free encyclopedia
More at http://en.wiki.x.io/stats/usage_200403.html .
- I'd just like to give the obligatory plug for the autoupdating web links I wrote:
- [http://wikimedia.org/stats/en.wiki.x.io/url_{{CURRENTYEAR}}{{CURRENTMONTH}}.html Current month's hits]
- [http://wikimedia.org/stats/en.wiki.x.io/usage_{{CURRENTYEAR}}{{CURRENTMONTH}}.html Current month's webalizer]
- [http://mail.wikimedia.org/pipermail/wikien-l/{{CURRENTYEAR}}-{{CURRENTMONTHNAME}}/date.html Autoupdating link to the mailing list]
- →Raul654 16:21, Apr 1, 2004 (UTC)
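Raul654's links update themselves because {{CURRENTYEAR}}, {{CURRENTMONTH}} (a two-digit month) and {{CURRENTMONTHNAME}} expand to the current date when the page is rendered. A script that fetches a particular month's reports can build the same URLs; this sketch simply mirrors the three patterns above, and the English month names are hard-coded to avoid depending on the system locale.

```python
from datetime import date

STATS = "http://wikimedia.org/stats/en.wiki.x.io"
ARCHIVE = "http://mail.wikimedia.org/pipermail/wikien-l"
MONTH_NAMES = ("January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December")

def monthly_links(day: date) -> dict:
    ym = f"{day.year}{day.month:02d}"         # {{CURRENTYEAR}}{{CURRENTMONTH}}
    month_name = MONTH_NAMES[day.month - 1]  # {{CURRENTMONTHNAME}}
    return {
        "hits": f"{STATS}/url_{ym}.html",
        "webalizer": f"{STATS}/usage_{ym}.html",
        "mailing_list": f"{ARCHIVE}/{day.year}-{month_name}/date.html",
    }
```

For example, `monthly_links(date(2004, 3, 1))["webalizer"]` ends in `usage_200403.html`, matching the March 2004 report linked above.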
- It would be really good if we could change the header of each page so that it says $PAGENAME - Wikipedia, the free encyclopedia - BROWSER SPECIFIC TAG at the top toolbar instead of just $PAGENAME - Wikipedia - BROWSER SPECIFIC TAG... it would be nice not to have such a low google rank for "encyclopedia" - and this might help a notch. Pete/Pcb21 (talk) 16:27, 1 Apr 2004 (UTC)
- I don't think this would be an improvement; I expect the opposite. Using just the title as the HTML title (without Wikipedia) should increase our relevance for searches matching the title. There's no need to get a higher ranking for the search phrase 'wikipedia'; there's nothing better than #1. Including 'Encyclopedia' in the title of the main page and/or in default keywords in the header of each page could help to improve the ranking for that search term, though. A small skin hack could do this. -- Gabriel Wicke 13:33, 2 Apr 2004 (UTC)
- Your more refined approach sounds good. A specialized hack for the main page sounds really good, because "Main Page - Wikipedia" is awful. Pete/Pcb21 (talk) 13:52, 2 Apr 2004 (UTC)
When will Wikipedia reach??
- 300,000 articles??
- 400,000 articles??
- 500,000 articles??
- 600,000 articles??
- 700,000 articles??
- 800,000 articles??
- 900,000 articles??
A million articles??
66.245.104.154 02:09, 10 Apr 2004 (UTC)
I thought the statistics that allowed us to look at the number of hits to each page were really cool, however, there's one problem. That file is quite long, many many megabytes.
I know it would be a bit of work, but a really cool potential addition to Wikipedia would be something which allows us to request the number of hits to a given site. For example, we could input "downsizing" and it would tell us that there were 103 hits to that site in the month of March 2004, for example.
Also cool would be a feature which include the rank of each site, that would say that the site was, for example, the 46381st most-visited site during that month, out of 221682 (another made-up number). Mike Church 07:12, 18 Apr 2004 (UTC)
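Mike Church's lookup (hits for one page plus its rank among all pages that month) only needs the per-URL counts loaded once. A sketch assuming a hypothetical plain-text dump with one "hits URL" pair per line; the real statistics pages are HTML and would need their own parsing first.

```python
def load_hits(lines):
    """Parse 'hits url' lines into {url: hits}, skipping malformed lines."""
    hits = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2 and parts[0].isdigit():
            hits[parts[1]] = hits.get(parts[1], 0) + int(parts[0])
    return hits

def lookup(hits, url):
    """Return (hits, rank, total_pages); rank 1 is the most visited page.
    Pages with equal hits share the better rank."""
    if url not in hits:
        return 0, None, len(hits)
    count = hits[url]
    rank = 1 + sum(1 for h in hits.values() if h > count)
    return count, rank, len(hits)
```

Each lookup scans every page once, which is fine for occasional queries; a service answering many requests would sort the counts once and binary-search instead.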
How many "Real" articles in the English Wikipedia
I see the number of articles every time I come to the English Wikipedia. It is now approaching 300,000 articles.
My question is: How many of these are actually real encyclopedia articles?
If I do a random page, is it actually random? If I do a hundred or a thousand random pages and keep track of how many are just summaries of census information for US geographical places, how many are detailed descriptions of some character in a video game, how many "really" belong on E! online (music and movies), how many have really no information on them (stubs), and how many are actually "real" encyclopedia articles, would that be a good estimate?
I know I am being a little snooty here, and I know that what I am thinking about is not the only true goal of the Wikipedia project, but I have just been thinking about how to evaluate this stuff from the point of view of someone who is using Wikipedia instead of another encyclopedia such as Encarta or Encyclopedia Britannica.
There was some discussion of the true article count a couple of years ago on this talk page, and if the discussion is somewhere else, please just point me to it.
Thanks. nroose Talk 17:29, 10 Jun 2004 (UTC)
- The most recent survey in this area that I am aware of is at m:English Wikipedia Quality Survey conducted by Adam Carr in October 2003. According to that data, and if you have a reasonably exacting standard of what constitutes a "real" article, at least 20% and probably as much as 30% of articles are "real". Thus 60,000 seems a reasonable ball-park figure for number of real articles. I am sure a lot of people would be interested in an updated and expanded (1000 instead of 200 articles?) survey but these things take time. Pcb21| Pete 18:54, 10 Jun 2004 (UTC)
- Well, I don't have time to do in-depth analysis. I am really just curious about what a good estimate would be of the number of articles I would consider to be real articles. I am not saying that other articles should not be in Wikipedia. Actually I think it is great that Wikipedia has a broader range of stuff than other encyclopedias. But, since I was curious, I wrote an HTML/Javascript page (http://home.earthlink.net/~eroose/wikisurvey.htm - it resizes the page, so you probably don't want it to come up in this browser window) to make it easier to do a survey. Just click on real or not real for each page that comes up, and it keeps track of how many of each and immediately shows you the next random page in a different window. It does not send any information back to the server. It works OK in IE, but I have not tried it in other browsers. The result I got from 100 pages was that 65% of them were "Real" to me. I'm not very picky about length or completeness. Perhaps it was too few to really provide good stats. nroose Talk 06:37, 14 Jun 2004 (UTC)
- I did a similar exercise a couple of times, once a couple of years ago and once more recently. I rated 100 random pages for how much they resembled encyclopedia articles. I reckoned that about one third of articles were of real encyclopedia quality, about one third were nowhere near encyclopedia quality, with the third in the middle as promising works in progress. Interestingly, I could see no obvious sign that the average quality of articles was getting any better or worse; it looks to me that the increase in average quality through editing and the decrease through new stubs roughly cancelled out. Enchanter 18:48, Jun 14, 2004 (UTC)
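Whether 100 pages is "too few" can be put in numbers: for a random sample, the share of "real" articles has a standard error of about sqrt(p(1-p)/n). A sketch using the normal approximation (a Wilson interval behaves better near 0% or 100%); it assumes Special:Random picks pages uniformly, which is exactly the open question raised above, and the function names are made up.

```python
from math import ceil, sqrt

def share_interval(real, n, z=1.96):
    """Approximate 95% confidence interval for the share of 'real' articles."""
    p = real / n
    half_width = z * sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)

def sample_size_for(margin, p=0.5, z=1.96):
    """Pages to rate so the 95% half-width is at most `margin` (worst case p=0.5)."""
    return ceil(z * z * p * (1 - p) / margin ** 2)
```

For 65 "real" pages out of 100 the interval comes out at roughly 56% to 74%, or somewhere around 167,000 to 223,000 of 300,000 pages. Narrowing it to plus or minus 3 points takes about 1,070 rated pages, close to the 1000-article survey suggested above.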
Hi, I don't know where to point this out, but there seems to be a fluke in the usage stats. Please check it out and correct it:
http://wikimedia.org/stats/en.wiki.x.io/
Notice that the stats for the last 12 months (pasted below) are wrong for Jan and Dec. These are not the stats for the last 12 months but rather a mixture that includes Jan 2003 and Dec 2002. Maybe it's a SELECT statement bug???
Can it be corrected? I wanted to use the stats for a statistics project...
regards
<pasted>
Month | Daily avg: Hits Files Pages Visits | Monthly totals: Sites KBytes Visits Pages Files Hits
Jul 2004 6786854 5903348 4128308 221202 1057155 333444835 1548418 28898156 41323437 47507980
Jun 2004 8708699 7606574 5281634 295800 2406100 777885880 3845409 68661252 98885464 113213099
May 2004 3872154 3412794 2437365 274077 3506025 742530837 5755636 51184678 71668674 81315244
Apr 2004 2528076 2244552 1604463 189930 3484303 648796021 5697918 48133914 67336582 75842288
Mar 2004 3712287 3269369 2406952 280845 5134274 952395093 8706201 74615518 101350462 115080901
Feb 2004 3085573 2583379 3012502 265556 3861758 663607322 6638905 75312562 64584497 77139341
Jan 2003 258494 223980 141012 21995 409498 125092770 681872 4371372 6943405 8013325
Dec 2002 389507 343187 196038 49208 959749 146121521 1525456 6077188 10638822 12074727
Nov 2003 1630590 1157635 856691 139330 2090602 250582681 3343942 20560585 27783244 39134168
Oct 2003 1507552 1201196 721562 157068 2997839 337440023 4869113 22368431 37237095 46734120
Sep 2003 1479193 1181710 608869 155665 2894993 305957717 4669978 18266091 35451313 44375793
Aug 2003 998932 798697 421723 90404 1746797 228149162 2802537 13073417 24759634 30966904
<pasted />
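The misplaced rows can be found mechanically: in a newest-first summary each month should be exactly one calendar month before the row above it. A sketch that parses the month labels as pasted above and reports every break in the sequence; it only checks the ordering, not the figures themselves.

```python
from datetime import datetime

def month_index(label):
    """'Jan 2004' -> months since year 0, so consecutive months differ by 1."""
    d = datetime.strptime(label, "%b %Y")
    return d.year * 12 + d.month - 1

def order_breaks(labels):
    """Adjacent label pairs that are not consecutive, newest-first months."""
    idx = [month_index(label) for label in labels]
    return [(labels[i], labels[i + 1])
            for i in range(len(labels) - 1)
            if idx[i] - idx[i + 1] != 1]

pasted = ["Jul 2004", "Jun 2004", "May 2004", "Apr 2004", "Mar 2004", "Feb 2004",
          "Jan 2003", "Dec 2002", "Nov 2003", "Oct 2003", "Sep 2003", "Aug 2003"]
print(order_breaks(pasted))
# [('Feb 2004', 'Jan 2003'), ('Dec 2002', 'Nov 2003')]
```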
Wikipedia's headline stats for July 2004
The July stats are in (see http://wikimedia.org/stats/en.wiki.x.io/usage_200407.html ) and they make some interesting reading...
July was the English Wikipedia's busiest month ever (I think), with:
- 9,439,508 hits
- 8,208,960 files were downloaded
- 5,672,051 pages were served
- 316,295 visits (not clear if this refers to unique visitors or just page impressions)
- 2,083,869,414 Kb of data was downloaded
Excluding project and special pages (and the Main Page), the 10 most requested articles were:
- Nick Berg (Iraq hostage)
- John Kerry (new entry)
- Kim Sun-il (Iraq hostage)
- OS-tan (deeply bizarre; a must-read) (new entry)
- List of sex positions
- United States
- Crushing by elephant (yay, go elephants! ;-)
- Bobby Fischer (former chess champion) (new entry)
- Wikipedia
- Wiki
For comparison, the 10 most requested for June were:
- Paul Johnson (hostage)
- Kim Sun-il
- Paul Marshall Johnson, Jr.
- Beheading
- Decapitation
- Redmond, Washington
- Goatse.cx
- SpaceShipOne
- Wikipedia
- United States
The top 10 search terms for July were:
- wikipedia
- wiki
- nick berg
- cristiano ronaldo
- teresa heinz kerry
- encyclopedia
- beheading
- harry potter and the half blood prince
- marlon brando
- ken jennings
From this, it looks pretty clear that Wikipedia is being heavily used as a resource for major ongoing news events, particularly Iraq. -- ChrisO 16:37, 2 Aug 2004 (UTC)
Editing experience
I've been keeping an eye on Combined live stats, and none of the graphs there seem to reflect the overall "slowness" I experience when browsing or editing. The second one, which deals with server response time, would seem in theory to reflect the overall experienced "slowness", but it doesn't seem to. Is there another stat that would be more meaningful for what I'm trying to look at? P.Riis 21:24, 31 Aug 2004 (UTC)
500,000 Articles!!!
Wikipedia has finally reached 500,000 articles! What do people have to say, I wonder? --Andrew 22:03, 17 Mar 2005 (UTC)
least links
Is there a way to find the articles that have the fewest or no links to them? Falphin 20:20, 9 Jun 2005 (UTC)
- Orphaned pages Nroose 12:46, 26 Jun 2005 (UTC)
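Behind a list of orphaned pages is an anti-join: pages that no link points to. The sketch below uses a hypothetical two-table schema (`page(title)`, `links(from_title, to_title)`), not MediaWiki's real one, with SQLite standing in; it also ranks pages by incoming-link count, which answers the "fewest links" half of the question. Self-links and redirects are not treated specially here.

```python
import sqlite3

# Pages ordered by how many links point at them (hypothetical schema).
FEWEST_LINKS_SQL = """
SELECT p.title, COUNT(l.to_title) AS incoming
FROM page AS p
LEFT JOIN links AS l ON l.to_title = p.title
GROUP BY p.title
ORDER BY incoming, p.title
LIMIT ?
"""

# Pages with no incoming links at all.
ORPHANS_SQL = """
SELECT p.title FROM page AS p
WHERE NOT EXISTS (SELECT 1 FROM links AS l WHERE l.to_title = p.title)
ORDER BY p.title
"""

def fewest_incoming(conn, limit=50):
    return conn.execute(FEWEST_LINKS_SQL, (limit,)).fetchall()

def orphans(conn):
    return [title for (title,) in conn.execute(ORPHANS_SQL)]
```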
Stats are over a month old
It appears that the Stats have not been updated in over a month (since May 16th). Why is that? Nroose 12:48, 26 Jun 2005 (UTC)
How to get statistics for a Wikipedia article
How do you get the statistics for an article, e.g. http://en.wiki.x.io/wiki/Don_Saklad? Hits, hourly hits, referers, et al.
Major error on special page
The special page for statistics lists this page [http://en.wiki.x.io/wikistats/EN/Sitemap.htm] as one that updates automatically, but it hasn't been updated since 16 May. Could someone with the authority to do so sort this out, please? Osomec 19:35, 3 August 2005 (UTC)
User statistics as of September 15, 2005
See: User:JIP/User statistics. — JIP | Talk 11:54, 15 September 2005 (UTC)
Edit count
Why has my favorite, Kate's Tools, stopped working? Vald 10:10, 14 November 2005 (UTC)
Dubious link
The new link to wikiside.com is to an independent site that carries ads, and I don't think the stats look correct or up to date anyway. I'm thinking of removing it. Any comments? Calsicol 16:53, 5 January 2006 (UTC)
Active users
How many users have actually made an edit this month? Probably like 3% are real users.Voice of AllT|@|ESP 02:54, 13 January 2006 (UTC)
- Compare to the number of those who have checked their watchlist over the same time period. Creating an account is the only way to get a portable watchlist. --James S. 17:51, 21 January 2006 (UTC)
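Answering "how many users edited this month" needs a count of distinct registered editors within the month's time range. A sketch against a hypothetical `edits(user_id, timestamp)` table in which anonymous edits carry `user_id` 0 and timestamps are 14-digit YYYYMMDDHHMMSS strings in MediaWiki style; both are assumptions for illustration, and watchlist checks are not modelled because reads leave no edit record.

```python
import sqlite3

ACTIVE_SQL = """
SELECT COUNT(DISTINCT user_id)
FROM edits
WHERE user_id <> 0
  AND timestamp >= ? AND timestamp < ?
"""

def month_bounds(year, month):
    """Half-open [start, end) timestamp strings for one calendar month."""
    start = f"{year:04d}{month:02d}01000000"
    year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return start, f"{year:04d}{month:02d}01000000"

def active_users(conn, year, month):
    return conn.execute(ACTIVE_SQL, month_bounds(year, month)).fetchone()[0]

def active_share(active, registered):
    """Fraction of registered accounts that edited in the period."""
    return active / registered if registered else 0.0
```

Fixed-width timestamp strings compare correctly as text, which is why the month range can be a plain string comparison.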