2021 Article study

In 2021 I did a study of 1,000 randomly chosen articles. The catalysts were some prominent discussions about potential gender bias in Wikipedia articles, a preponderance of certain types of sports articles that I saw in my New Page Patrol (NPP) work, and the ongoing discussion about NSports letting in too many non-notable articles. The preponderant types of sports articles I noticed at NPP were biographies and "intersection" (compound criteria) articles such as "The xyz team's 2003 season".

Regarding potential gender bias in Wikipedia biographies, the question arises whether Wikipedia is just covering the real world, even if the real world is biased. Because the real world has been shifting in a way that reduces gender bias, I decided to split biographies into "recent" and "not recent".

In addition to the particular gender and sports areas of interest above, I decided to also categorize the rest of the 1,000 articles into common major divisions.

Selection methodology

I used the English Wikipedia "random article" button. This was done over a few weeks in late summer 2021. I classified every article that it generated except for disambiguation articles, and continued until I reached 1,000 classified articles.
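The selection procedure above can be sketched as a simple loop. This is only an illustration, not the tool used for the study (the sampling was done by hand with the "random article" button). The names `collect_sample`, `draw_random_title` and `is_disambiguation` are hypothetical stand-ins for the button and for the manual check, and the toy article pool is invented for the demo.

```python
import random
from typing import Callable, List


def collect_sample(
    draw_random_title: Callable[[], str],
    is_disambiguation: Callable[[str], bool],
    target: int = 1000,
) -> List[str]:
    """Draw random articles until `target` non-disambiguation ones are collected."""
    sample: List[str] = []
    while len(sample) < target:
        title = draw_random_title()
        if is_disambiguation(title):
            continue  # excluded; does not count toward the target
        # Assumption: repeats are not filtered, because the study does not
        # say how a repeated article would have been handled.
        sample.append(title)
    return sample


# Toy stand-ins so the sketch runs offline (hypothetical data).
_rng = random.Random(0)
_pool = [f"Article {i}" for i in range(50)]
_pool += [f"Topic {i} (disambiguation)" for i in range(5)]

demo = collect_sample(
    draw_random_title=lambda: _rng.choice(_pool),
    is_disambiguation=lambda t: t.endswith("(disambiguation)"),
    target=20,
)
print(len(demo))  # 20
```

The loop counts only classified articles toward the target, which matches the description: disambiguation pages are skipped rather than counted.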

Key Definitions

Sports: Follows the common meaning of the term, which includes involving some physical activity. There were few edge cases. One was automobile racing, which was included.

Recent / not recent: For biographies, "recent" means that the subject was active at their "main thing" in the last 15 years (since 2006), and "not recent" means that they weren't.

Places, broadly construed: Mostly geographic, but there were many edge cases (buildings, train stations, schools when identified with a single physical facility, other objects in the universe), and they were all included.

Plants, animals and other individual life forms: Self-explanatory, but does not include humans.

"Everything else": Does not include disambiguation articles, which were excluded from the study.

Raw granular data

| Description | Count | % of all articles |
|---|---|---|
| Biographies, sports, male, recent | 41 | 4.1% |
| Biographies, sports, male, not recent | 34 | 3.4% |
| Biographies, sports, female, recent | 8 | 0.8% |
| Biographies, sports, female, not recent | 4 | 0.4% |
| Biographies, all except sports, male, recent | 43 | 4.3% |
| Biographies, all except sports, male, not recent | 106 | 10.6% |
| Biographies, all except sports, female, recent | 30 | 3.0% |
| Biographies, all except sports, female, not recent | 16 | 1.6% |
| Other sports | 59 | 5.9% |
| Geography and places, broadly construed | 277 | 27.7% |
| Plants, animals and similar | 75 | 7.5% |
| All other articles | 397 | 39.7% |

Male vs Female comparisons

| Category | % Male | % Female |
|---|---|---|
| Biographies, sports, recent | 83.7% | 16.3% |
| Biographies, sports, not recent | 89.5% | 10.5% |
| Biographies, sports, all | 86.2% | 13.8% |
| Biographies, not sports, recent | 58.9% | 41.1% |
| Biographies, not sports, not recent | 86.9% | 13.1% |
| Biographies, not sports, all | 76.4% | 23.6% |
| Biographies, all | 79.4% | 20.6% |
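The male/female percentages follow directly from the biography counts in the raw granular data. As a minimal sketch of that arithmetic (the dictionary layout and the `gender_split` helper are illustrative names, not part of the study):

```python
# Biography counts copied from the raw granular data table.
# Key: (field, gender, period) -> number of articles
COUNTS = {
    ("sports", "male", "recent"): 41,
    ("sports", "male", "not recent"): 34,
    ("sports", "female", "recent"): 8,
    ("sports", "female", "not recent"): 4,
    ("non-sports", "male", "recent"): 43,
    ("non-sports", "male", "not recent"): 106,
    ("non-sports", "female", "recent"): 30,
    ("non-sports", "female", "not recent"): 16,
}


def gender_split(field=None, period=None):
    """Return (% male, % female) for matching biographies; None means no filter."""
    male = female = 0
    for (f, gender, p), n in COUNTS.items():
        if field is not None and f != field:
            continue
        if period is not None and p != period:
            continue
        if gender == "male":
            male += n
        else:
            female += n
    total = male + female
    return round(100 * male / total, 1), round(100 * female / total, 1)


print(gender_split("sports", "recent"))      # (83.7, 16.3)
print(gender_split("non-sports", "recent"))  # (58.9, 41.1)
print(gender_split("non-sports"))            # (76.4, 23.6)
print(gender_split())                        # (79.4, 20.6)
```

Each percentage uses only the biographies in that row as its denominator, not the full 1,000 articles.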

% of articles that are primarily sports

| Category | % Sports |
|---|---|
| Of biographies | 30.9% |
| Of all articles | 14.6% |
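These two figures can be reproduced from the raw counts. A minimal sketch, assuming the denominator for "all articles" is the stated sample size of 1,000 (the variable names and the `pct` helper are illustrative):

```python
# Counts copied from the raw granular data table.
SPORTS_BIOGRAPHIES = 41 + 34 + 8 + 4          # male/female, recent/not recent
NON_SPORTS_BIOGRAPHIES = 43 + 106 + 30 + 16
OTHER_SPORTS = 59                             # non-biography sports articles
TOTAL_ARTICLES = 1000                         # sample size stated in the study


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


all_biographies = SPORTS_BIOGRAPHIES + NON_SPORTS_BIOGRAPHIES
sports_share_of_biographies = pct(SPORTS_BIOGRAPHIES, all_biographies)
sports_share_of_all = pct(SPORTS_BIOGRAPHIES + OTHER_SPORTS, TOTAL_ARTICLES)

print(sports_share_of_biographies)  # 30.9
print(sports_share_of_all)          # 14.6
```

The "of all articles" figure counts both sports biographies and the "Other sports" row as primarily sports.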

Analysis on possible gender bias

The Wikipedia rule for the existence of articles that is most relevant here is WP:notability, which is ultimately mostly coverage-based. One could also ask whether other, non-rule realities of Wikipedia affect gender bias in articles. Historically, women participated less in prominent roles and in the types of roles that are the subject of coverage. More recently this has been less the case, although most would agree that it has not yet reached 50/50 overall. The study found that recent biographies include a higher share of women: among non-sports biographies, 41.1% of the recent ones were about women versus 13.1% of the not-recent ones.

There is also the possibility that sources give unequal coverage to equally prominent women and men. This seems implausible (e.g. giving less coverage to a woman mayor of a town than to a male mayor of the same town), but this study could not resolve that question either way.

The study also showed that there is a comparatively immense number of sports biographies (30.9% of all biographies), to the point where sports biography statistics heavily influence overall biography statistics. The recent removal of the "did it for a living for one day" criterion from NSports will eventually mitigate this. Sports biographies are also more heavily male (86.2%) than biographies overall (79.4%).

Possibly the result most indicative on this question is the mix for recent non-sports biographies, which was 59% male and 41% female.

  • If women have achieved more than a 41% presence in prominent roles in the real world, then this indicates that Wikipedia has a male bias in the number of biographies.
  • If women have achieved less than a 41% presence in prominent roles in the real world, then this indicates that Wikipedia has a female bias in the number of biographies.

The large quantity of sports biographies, combined with their being more "male-heavy" than Wikipedia biographies as a whole, has a significant impact on the mix of male vs. female biographies.

Solving long-running disputes

(copy of my posts from village pump)

Causes

"Cause" is a subjective term, but I'd say that long-term disputes usually have 2-4 of these as their primary causes:

  1. There's a conflict / contest out in the real world, and editors want to advantage their real-world "side" by how the Wikipedia article is written.
  2. Using Wikipedia guidelines and policies for other than their intended purpose, whether you call that weaponizing policies, wikilawyering, gaming, or civil POV pushing.
  3. Some psychological factor that tends to deepen during long battles, such as personal clashes, a "gotta win" mentality, or stubbornness.
  4. Failure to recognize and handle mere definitional/terminology issues as such; Wikipedia does more harm than good there.
  5. High complexity of the question/task at hand.


Methods and fixes

Trying to distill some ideas out of my responses: my one preface is that the causes and natures of such disputes vary, and accordingly so do the ways to resolve them. My thoughts:

  1. Strategically, evolve policies and guidelines where they currently either contribute to the problem or fail to do their job of helping resolve it. Easier said than done, but it needs saying.
  2. The main "bones" of a solution are RFCs on the article talk page, albeit done much more effectively than they currently are. Ways to achieve that would be new help pages that show how to run them, and experts (given extra influence / a role) who can help orchestrate them.
  3. Experts who are given extra influence / a role to guide/navigate the discussion. This is a different focus than mediation or dispute resolution.
  4. Find a way to paint a scarlet letter on those who use policies for other than their intended purpose. This ranges from mild versions through severe versions such as weaponizing Wikipedia policies/guidelines/systems to get rid of or wiki-deprecate "opponents". It relates to wikilawyering, gaming, and "civil POV pushing", but we need a better and more usable scarlet-letter term than those. This is needed because such behavior is easily disguised as "just enforcing policies" or "just identifying problematic editors".
  5. Start recognizing disputes that are founded on "mere definitions of words" issues as being merely such, and deal with them accordingly.