This essay is an effort to clearly explain, and offer suggestions to improve, the Wikipedia category system and the Categorization guideline. It reflects only the opinions and understanding of User:Codrdan. It is not a Wikipedia guideline.
This essay is a work in progress. It may temporarily contain redundancies, omissions, or inconsistencies.

The category system

Wikipedia's categories are sets of articles that help readers find articles on similar topics. Each category is defined by its own unique set of characteristics, which all articles in that category must have. Wikipedia's editors and software organize the system by associating articles with categories, and by associating categories with each other through subset relationships. If an article satisfies the requirements of a certain category, an editor places a category declaration in the article to make it a member of that category. Similarly, if one category's member articles are a subset of the articles in a more general category, an editor places a category declaration on the more specific category's page to make it a subcategory, or "child", of the general category, which is called the subcategory's "parent". The software then automatically lists each category's subcategories and member articles on that category's own page. Articles and category pages both list their parent categories in a box at the bottom of the page called the "categories box".
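
The mechanism described above can be modeled in a few lines of Python. This is a minimal sketch, not a description of how MediaWiki actually stores categories: the input is a hypothetical dictionary mapping each page to the categories it declares, and `build_listings` (a hypothetical helper) inverts it to produce what each category page would list. The category names come from the examples in this essay, but the data is simplified.

```python
from collections import defaultdict

# Hypothetical, simplified input: each key is a page title and each value lists
# the categories that page declares itself a member of, as [[Category:...]]
# lines in its wikitext would. Titles starting with "Category:" are category
# pages; all other titles are treated as articles.
declarations = {
    "Gone with the Wind (film)": ["1939 films", "American war drama films"],
    "Category:American romantic drama films": [
        "American drama films",
        "Romantic drama films",
        "American romance films",
    ],
    "Category:Comedy films": ["Films by genre"],
}


def build_listings(declarations):
    """Invert per-page declarations into per-category listings.

    Only child pages carry declarations; each parent's listing is derived
    from them, roughly as the software does when it renders a category page.
    """
    listings = defaultdict(lambda: {"subcategories": [], "pages": []})
    for page, parents in declarations.items():
        for parent in parents:
            if page.startswith("Category:"):
                listings[parent]["subcategories"].append(page.removeprefix("Category:"))
            else:
                listings[parent]["pages"].append(page)
    return dict(listings)


listings = build_listings(declarations)
print(listings["American drama films"])
# {'subcategories': ['American romantic drama films'], 'pages': []}
```

Note that each parent's listing is derived entirely from declarations on the child pages; nothing is written on the parent's own page.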

Subcategories and subdivision

A category may be systematically divided into subcategories according to some specific criterion. For example, the category Films is divided into subcategories by genre. In fact, the parent may be subdivided in more than one way: Films is also subdivided by country, date, director, language, and several other criteria.

Not all subcategories are part of a systematic subdivision scheme. Some are created because their member articles are particularly notable or interesting. These are called ad-hoc subcategories.

A regular category represents a subset of its parent category and is called a subset category. Each set of subcategories defined by the same criterion is called a metacategory. A metacategory contains all of the parent's subcategories that are defined by some specified property in addition to the parent's defining characteristics. For example, Films by genre is a metacategory that divides Films into subcategories according to genre. It has many member categories, including Comedy films, Drama films, Action films, and Romance films. The purpose of regular categories (subset categories) is to organize content by topic, and the purpose of metacategories is to organize (sub)categories by criterion.

Note that some of the categories listed above overlap. When two or more categories partially overlap, the articles that belong in more than one of the overlapping categories each have more than one parent. This also applies to subcategories of the overlapping categories. For example, the article Gone with the Wind (film) has more than twenty parent categories, including 1939 films and American war drama films, and the category American romantic drama films has three parents: American drama films, Romantic drama films, and American romance films.

Visualizing the category system

One can visualize the relationship between articles and categories by thinking of articles as physical objects and categories as boxes that contain their member articles. Similarly, a parent category can be visualized as a large box that contains other, smaller boxes, which represent the parent's subcategories. Any of the parent's member articles that are not in any subcategory can be thought of as "loose" items: they lie inside the large box but outside all of the small boxes.

Unfortunately, this picture has a serious limitation: Since Wikipedia categories are allowed to share member articles, the boxes in the box model would have to overlap each other somehow. Real physical boxes don't overlap, so it would be more accurate to visualize categories as abstract geometrical shells instead of real physical containers.

The category hierarchy

A simpler way to visualize categories is as nodes in a graph or organizational chart, with parent-child relationships represented by lines that connect the nodes. Because many categories have multiple subcategories and multiple parents, the parent-child relationships between categories form a web of interconnected chains that relate more general categories to more specific ones.

Although articles and categories may have more than one parent, most categories have more children than parents, and there is one category, called Contents, that contains all Wikipedia content and all other categories. As a result, the web is a tree-like structure that organizes all Wikipedia content; it is called the category hierarchy. Users can find specific categories by following the relevant child links down from the Contents page, and general categories can be found from more specific categories by following parent links. All categories lying on one or more chains between Contents and a particular category are called that category's "ancestors". A category's parents are its most immediate (closest) ancestors, and Contents is every category's most remote ancestor. If one category is another category's ancestor, then the second category is called the first category's "descendant". A category's subcategories, if it has any, are its most immediate descendants.

The category hierarchy is not a simple tree, because a category can have more than one parent. In technical terms, the hierarchy is a directed acyclic graph with a single root node (Contents). It is commonly visualized as an upside-down tree, rooted at the top and branching downward, whose branches are allowed to merge. In this picture, each category lies below its ancestors and above its descendants.
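
To make the ancestor and descendant definitions concrete, here is a minimal Python sketch that treats the hierarchy as a mapping from each category to its parents. The fragment is hypothetical and heavily simplified (the real film categories sit much deeper below Contents), and the helper names `ancestors`, `descendants`, `is_acyclic`, and `roots` are illustrative only.

```python
# Hypothetical, heavily simplified fragment of the hierarchy:
# each category maps to the list of its parents ("Contents" has none).
parents = {
    "Contents": [],
    "Films": ["Contents"],
    "American films": ["Films"],
    "Drama films": ["Films"],
    "Romance films": ["Films"],
    "American drama films": ["American films", "Drama films"],
    "Romantic drama films": ["Drama films", "Romance films"],
    "American romance films": ["American films", "Romance films"],
    "American romantic drama films": [
        "American drama films",
        "Romantic drama films",
        "American romance films",
    ],
}


def ancestors(category, parents):
    """Every category on at least one chain from `category` up to the root."""
    seen = set()
    stack = list(parents.get(category, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, []))
    return seen


def descendants(category, parents):
    """Every category that has `category` among its ancestors."""
    return {c for c in parents if category in ancestors(c, parents)}


def is_acyclic(parents):
    """A category lies on a cycle exactly when it is its own ancestor."""
    return all(c not in ancestors(c, parents) for c in parents)


def roots(parents):
    """Categories with no parents; a single root (Contents) is expected."""
    return [c for c, ps in parents.items() if not ps]


print(sorted(ancestors("American romantic drama films", parents)))
print(roots(parents), is_acyclic(parents))  # ['Contents'] True
```

A category with several parents, such as American romantic drama films, simply contributes several starting points to the search, which is why the structure is a graph rather than a simple tree.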

Category pages

Each category has its own Wikipedia page. A category's subcategories are listed in the top section of its page, followed by its member pages in the "Pages in category" section and any files in the "Media in category" section. Each category page also has a lead section, laid out much like an article's, except that it may contain classifications.

Types of categories

A category's title can describe the articles in the category in two ways: It can denote either a set or a topic. Such categories are called "set categories" and "topic categories" respectively. Articles in a set category discuss members of that set, and a topic category's member articles are about subjects that pertain to that topic. For example, Music is a topic category, and Musicians is a set category. Music contains all articles related to music, whereas Musicians contains only articles about specific musicians. Since a topic category may contain anything related to that topic, but a set category contains only members of that set, topic categories may contain set categories, but set categories should not contain topic categories.
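
The containment rule can be stated as a simple check on parent-child pairs. The sketch below assumes each category has been labeled "set" or "topic" by hand (the labels and the `Category` type are hypothetical; the software records no such property), and it exempts eponymous categories, following the convention described under Eponymous categories. The Music-inside-Musicians pair is a deliberately wrong example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Category:
    name: str
    kind: str              # hypothetical label: "set" or "topic"
    eponymous: bool = False


def violations(edges):
    """Yield (child, parent) names where a set category contains a topic category.

    Eponymous topic categories are skipped, following the convention that
    they may sit inside set categories.
    """
    for child, parent in edges:
        if parent.kind == "set" and child.kind == "topic" and not child.eponymous:
            yield child.name, parent.name


music = Category("Music", "topic")
musicians = Category("Musicians", "set")
france = Category("France", "topic", eponymous=True)
countries = Category("Countries in Europe", "set")

# (child, parent) pairs; the last one is intentionally wrong.
edges = [(musicians, music), (france, countries), (music, musicians)]
print(list(violations(edges)))  # [('Music', 'Musicians')]
```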

Besides content categories, which are meant for readers, Wikipedia also has non-content categories used by editors, such as project, stub, and maintenance categories. For more concrete descriptions and techniques, see Category:Wikipedia help.

Categorizing articles

Categorization is the process of assigning an article to one or more categories. Every Wikipedia article should belong to at least one category, and many articles are listed in more than one category. Each article should be placed in all of the lowest-level (most specific) existing categories to which it logically belongs. In most cases, these categories will be childless. Higher-level (more general) category pages (parents) typically list only subcategories. Exceptions to this rule are listed below.

It should be clear from verifiable information in the article why it was placed in each of its categories. Use the {{Category unsourced}} template if you find an article in a category that is not shown by sources to be appropriate, or the {{Category relevant?}} template if the article gives no clear indication for inclusion in a category.

  • Disambiguation pages belong to special categories (see Disambiguation); most redirects are not categorized, though there are exceptions (see Categorizing redirects). For the categorization of pages in other namespaces, and categories used for project management purposes, see Project categories below.
  • Normally a new article will fit into one or more existing categories. Compare articles on similar topics to find what those categories are. If you think a new category needs to be created, see the section What categories should be created below. If you don't know where to put an article, add the {{uncategorized}} template to it. Other editors, such as those monitoring Wikipedia:WikiProject Categories/uncategorized, will find good categories for it.
  • Categorize articles by characteristics of the topic, not characteristics of the article. A biographical article about a specific person, for example, does not belong in Category:Biography (genre). (For exceptions, see Project categories below.)
  • An article should never be left with a non-existent (redlinked) category on it. Either the category should be created (most easily by clicking on the red link), or else the link should be removed or changed to a category that does exist.
  • Articles on fictional subjects should never be categorized in a manner that confuses them with real subjects. A set category such as Category:Countries in Europe or Category:Presidents of the United States should contain only real examples of those sets. If a set category for fictional subjects has a real-life counterpart, as with Category:Fictional presidents of the United States, its contents should be expressly identified as fictional in the name of the category itself. This is not necessary where the grouping is purely fictional, as with Category:Klingons. Fictional subjects may be mixed with real ones only in topic categories, where, unlike in set categories, there is no risk of confusing fiction with fact.

The order in which categories are listed in an article is not governed by any single rule. In particular, it does not need to be alphabetical, although partially alphabetical ordering can sometimes be helpful. Normally the most important categories appear first. If an article has an eponymous category (see below), then that category should be listed first. For example, Category:George Orwell is listed before other categories in the George Orwell article.

Eponymous categories

The name of a category may be the same as the name of an article. This phenomenon is called eponymy.

Often an article and a topic category will have the same name, as in George W. Bush and Category:George W. Bush, or occasionally similar names referring to the same thing, as with Mekong and Category:Mekong River. Such a category is called eponymous. Naturally the article itself will be a member of the category. It should be sorted to appear at the start of the listing (see the sort order instructions in Wikipedia:Categorization).

By convention, eponymous categories are an exception to the rule that topic categories should not be subcategories of set categories. Many eponymous categories are placed in the same categories as their corresponding articles. For example, Category:France is a subcategory of Category:Countries in Europe, which also contains the article France, even though the subjects pertaining to France are not themselves European countries.

In other cases, eponymous categories have been categorized separately from their articles. In this case it will be helpful to readers if there are links between the category page containing the articles and the category page containing the eponymous categories. An example of this setup is the linked categories Category:American politicians and Category:Wikipedia categories named after American politicians, using the template {{CatRel}}.

A clear link to the main topic article from an eponymous category page can be created using the template {{cat main}}.

What categories should be created

Categories should be useful for readers to find and navigate sets of related articles. They should be the categories under which readers would most likely look if they were not sure of where to find an article on a given subject. They should be based on essential, "defining" features of article subjects, such as nationality or notable profession (in the case of people), type of location or region (in the case of places), etc. Do not create categories based on incidental or subjective features. Examples of types of categories which should not be created can be found at Wikipedia:Overcategorization. Discussion about whether particular categories should exist takes place at Wikipedia:Categories for discussion.

Categorizations appear on pages without annotations or references to justify or explain their addition, so editors should be conscious of the need to maintain a neutral point of view when creating categories or adding them to articles. Categorizations should generally be uncontroversial; if the category's topic is likely to spark controversy, then a list article (which can be annotated and referenced) is likely to be more appropriate.

Before creating a new category, check whether a similar category already exists under a different name (for example, by looking on the likely member pages or in likely parent categories).

Categories follow the same general naming conventions as articles; for example, common nouns are not capitalized. For specific rules, see Wikipedia:Naming conventions (categories).

For proposals to delete or rename categories, follow the instructions at Categories for discussion.

Subcategorization

Small parent categories should list all of their member articles, even the ones in subcategories, in order to provide readers with a complete listing of the articles. If a category is too large to conveniently list all of its member articles, its page should indicate some convenient way for the reader to find all of them. This is done by constructing one or more sets of subcategories that completely subdivide, or "break down", the parent according to some additional criterion. Each such set collectively contains every member article of the parent at least once, and can be called a "full", "complete", "exhaustive", "comprehensive", or "systematic" set of subcategories. For example, Category:Rivers of Europe is subdivided by country into the subcategories Rivers of Albania, Rivers of Andorra, etc.
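
Whether a set of subcategories completely subdivides its parent is a question about sets, so it can be checked directly. In this minimal sketch, `uncovered`, `strays`, and `is_complete` are hypothetical helpers and the river names are placeholders, not real data; an article may appear in more than one subcategory, as the definition above allows.

```python
def uncovered(parent_members, subcategories):
    """Members of the parent that appear in none of the subcategories."""
    covered = set().union(*subcategories.values())
    return set(parent_members) - covered


def strays(parent_members, subcategories):
    """Subcategory members that are not members of the parent (subset violations)."""
    parent = set(parent_members)
    return {name: members - parent
            for name, members in subcategories.items()
            if members - parent}


def is_complete(parent_members, subcategories):
    """True if every subcategory is a subset and together they cover the parent."""
    return (not uncovered(parent_members, subcategories)
            and not strays(parent_members, subcategories))


# Placeholder data, not real rivers.
rivers_of_europe = {"River A", "River B", "River C"}
by_country = {
    "Rivers of Albania": {"River A"},
    "Rivers of Andorra": {"River B"},
}
print(uncovered(rivers_of_europe, by_country))  # {'River C'}
print(is_complete(rivers_of_europe, by_country))  # False
```

Adding a subcategory that contains River C (even one that also repeats River A) would make the subdivision complete.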

A complete set of subcategories may be either listed in its own section of the parent's page, or placed in its own Wikipedia page, which is listed in the parent category's page in place of the individual subcategories. If this is done, the separated set of subcategories can be called a "subdivision category", "category list", "subcategory set", or "metacategory".

Subcategories that are not part of a complete set are called "ad-hoc" or "standalone" categories.

A category may be subdivided using several coexisting schemes; for example, Category:Albums is broken down by artist, by date, by genre, etc.

To suggest that a category is so large that it ought to be broken down into subcategories, you can add the {{verylarge}} template to the category page.
This template should be renamed to largecat or bigcat.

Subcategorization issues include duplication, circular references between branches, and overcategorization (roughly, a question of notability for categories). As articles are added to a category, new subcategories may be created to hold them, which prevents individual category pages from becoming too big.

When making one category a subcategory of another, ensure that the members of the subcategory really can be expected (with possibly a few exceptions) to belong to the parent. If two categories are closely related but are not in a subset relation, then a link to one can be included in the other's category description (see below).

The category Humans belongs to the category Primates. For the Humans category to appear in the Primates category, the Category:Humans page makes the category declaration [[Category:Primates]].

If B is a subcategory of A, then A is said to be a parent category of B. The branch Humans-Primates-Mammals-Vertebrates contains four categories. If each category declaration is drawn as an arrow from the page that contains it (the subcategory) to its parent category, and the root of the branch is drawn at the bottom, all the arrows point downward. (See the figure and its caption.) The direction of a category branch (a sequence of logical categorizations of pages) is counter-intuitive, because each link is declared on the more specific page rather than on the more general one. Of the four category pages in the branch given, the first three contain the category declarations that make them subcategories, and the fourth page, Category:Vertebrates, contains no declaration that points toward the other three. Vertebrates is the most general category in this branch: it is the parent of Mammals and an ancestor of Primates and Humans.
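
The direction of the links can be seen by parsing category declarations out of page text. This is a simplified sketch: the regular expression handles only plain declarations such as [[Category:Primates]] or [[Category:Primates|sort key]] and ignores templates, first-letter case folding, and underscores. The page texts and the helpers `declared_parents`, `branch_upward`, and `children` are hypothetical, cut down to the four-category branch above.

```python
import re

# Matches [[Category:Name]] or [[Category:Name|sort key]] and captures Name.
# Simplified: a link such as [[:Category:Name]] (leading colon) is not matched.
CATEGORY_DECLARATION = re.compile(r"\[\[\s*Category\s*:\s*([^\]|]+?)\s*(?:\|[^\]]*)?\]\]")

# Hypothetical page texts, cut down to the branch discussed above.
pages = {
    "Category:Humans": "Articles about humans. [[Category:Primates]]",
    "Category:Primates": "[[Category:Mammals]]",
    "Category:Mammals": "[[Category:Vertebrates]]",
    "Category:Vertebrates": "",  # the real page has parents; omitted here
}


def declared_parents(text):
    """Category names declared in a page's text, in order of appearance."""
    return CATEGORY_DECLARATION.findall(text)


def branch_upward(start, pages):
    """Follow the first declared parent from `start` until a page declares none."""
    chain = [start]
    while True:
        found = declared_parents(pages.get("Category:" + chain[-1], ""))
        if not found:
            return chain
        if found[0] in chain:
            raise ValueError(f"circular categorization at {found[0]}")
        chain.append(found[0])


def children(category, pages):
    """Subcategories of `category`, found by scanning the child pages."""
    return [title.removeprefix("Category:")
            for title, text in pages.items()
            if category in declared_parents(text)]


print(branch_upward("Humans", pages))  # ['Humans', 'Primates', 'Mammals', 'Vertebrates']
print(children("Vertebrates", pages))  # ['Mammals']
```

Vertebrates ends the chain even though its own page says nothing about the other three; its subcategory Mammals can only be found by scanning the child pages, which is the same inversion the software performs when it builds a category listing.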

Other sections

Wikipedia:Categorization#Display of category pages

Wikipedia:Categorization#Project categories

Wikipedia:Categorization#Categorization using templates

Wikipedia:Categorization#Redirected categories

Wikipedia:Categorization#Interlanguage links to categories

Wikipedia:Categorization#Tips

See also
