Wikipedia:Participation by academic projects

This document is part of the AHRC/Wikipedian in Residence program, and aims to provide guidance for academics interested in working with Wikipedia as part of a broader research project.

Introduction

Wikipedia is now ubiquitous: a free-content encyclopedia, available online, covering millions of topics in around 280 languages. It is one of the most high-profile examples of user-generated content, and by far the most heavily used non-commercial website. Together, it and its sister projects have around a hundred thousand active contributors, five hundred million readers, and twenty billion pageviews a month.

Since around 2006, there has been a significant amount of public debate about the accuracy, suitability, and intended role of Wikipedia. The academic response remains divided, but there is a growing recognition that, while Wikipedia may be imperfect and some of its approaches may be problematic, there is nonetheless a massive audience willing to engage with complex and demanding subjects through it. Many of these readers might not have sought out this material had it been presented in a more traditional or formal context, and Wikipedia offers a remarkable opportunity to help them find better and more robust information.

Over recent years, a substantial number of researchers and academics have started working with the Wikipedia community to help achieve these goals, with positive results. We’ve learned a lot about the types of project that can work, and how to go about starting them. This guide will present a series of examples, discuss what does and doesn’t work in principle, and explain how to get started with a project and where to ask for help. We’ll also summarise ways to work with some of the other Wikimedia projects, including Wikidata, Wikisource and Wikimedia Commons; these are also user-developed content communities, but they focus on different work: aggregating linked data, transcribing source material, and hosting media content respectively.

Wikipedia is an open community and an open resource, and we welcome all well-intentioned contributions. Ultimately, though, remember that you are working with an existing community, one that has evolved its own standards and practices over time. It has its own ways of working, and its own organisational structures; you’ll need to bear this in mind to get the best out of what you’re doing.

Working with Wikipedia

From the perspective of public engagement, research dissemination, or knowledge exchange, Wikipedia is a powerful tool. It reaches an existing audience that is already actively seeking to learn, and provides information to them in a context with which they are familiar. As a result, the potential public impact, measured in number of readers, can be disproportionately high relative to the work required. Wikipedia’s emphasis on sourcing and attribution also means that it surfaces a large amount of primary and secondary research material, encouraging readers to move on to these original sources to continue their reading.

Through 2012–13, the Arts and Humanities Research Council supported a “Wikipedian in Residence” program at the British Library, aiming to study and experiment with ways in which Wikipedia could be used by academic research projects. This guide draws on the experiences of this program, as well as previous collaborations. It focuses on how to work with Wikipedia as part of a broader research program, rather than simply contributing as an individual; but if you’d like to do that, great! The best approach is just to stop reading here and wade straight in. The “Ten Simple Rules” article below, from PLoS Computational Biology, is probably the best short introduction to working with Wikipedia, and there are also practical guides on how to edit on the site itself.

Ten Simple Rules:

Links
  • Logan DW, Sandal M, Gardner PP, Manske M, Bateman A (2010) Ten Simple Rules for Editing Wikipedia. PLoS Comput Biol 6(9): e1000941. doi:10.1371/journal.pcbi.1000941
  • The Wikipedia tutorial
  • Editing help - old system
  • Editing help - new system

General principles

Let’s start with some basic principles of Wikipedia, and what it’s trying to achieve. While not a conventional reference work, Wikipedia does have its own key standards covering its scope, its content, and its structure.

Notability is the "significance" of a subject: whether it is notionally important enough to have an article covering it. Wikipedia’s threshold for notability is generally held to be "significant coverage in reliable sources that are independent of the subject". This is a flexible threshold, with a lot of room for debate around what counts as "independent", "reliable", and so on, but the general principle is clear: if little of substance has been written about a subject before, it’s probably not suitable for Wikipedia. Verifiability requires that these sources be cited and made visible to the reader as clearly as possible.

Associated with this is Wikipedia’s policy on original research: in short, Wikipedia is fundamentally a secondary or tertiary source. Articles should not include material not previously published elsewhere or make novel syntheses or interpretations of sources, and they should avoid personal opinion. Linked to this is neutrality; from an academic standpoint, this means giving appropriate weight to all significant perspectives on a topic, rather than taking sides in a particular debate.

Finally, Wikipedia’s policy on conflict of interest states that you should avoid making significant changes or additions to articles about yourself or a project you are closely associated with, and that you should not edit Wikipedia solely to promote an external resource. (It’s fine to suggest and discuss such changes, and indeed this is encouraged, but making them yourself is generally discouraged.)

These are all generally restrictive statements, but there are also more positive principles. Wikipedia does not have firm rules; it has a set of descriptive policies that explain what the community broadly agrees on, but it recognises that these policies may not be universally applicable. It is an open and collaborative project, and you are welcome to engage with and improve its content. We encourage people to be bold—experimenting and making mistakes is fine, and it’s very hard to permanently break anything.

Links

So what should you think about doing? Let’s look at some examples of past projects, grouped roughly by theme:

Contextualising research

Wikipedia can be used to write about the subject of a research program, but in many cases this focus is very specialised—perhaps, for example, the social history of a particular area in a particular period—and may not feel suitable as a Wikipedia article. A productive approach here can be to use Wikipedia to build supporting materials explaining the overall setting of the research question; readers can then fall back on this material to give them a more general context for understanding the project’s results. Through Wikipedia, this material is made permanently available for other uses, serving not just to aid the original project’s audience but also hundreds or thousands of other readers.

 
Students from UCL at the Dunhuang event.

At the British Library, we worked on a program of contextualisation for the International Dunhuang Project. This was a research program focused on the Central Asian archaeology of the Silk Road; the collection and its subjects are not well known to the general public, and information about the periods and themes covered is limited outside specialised academic material. As a result, most people encountering the material would need supporting information in order to engage with it, and would likely turn to standard resources such as Wikipedia. The project aimed to improve the coverage of this background material on Wikipedia; through a week-long program, we focused on writing articles about the languages and nations involved, the archaeological sites themselves, and the early archaeological expeditions.

Links

Capturing research

This approach uses Wikipedia to collate work that might otherwise not be particularly visible, or might never be formally published. Such work can feel undervalued, especially if it is not an intended outcome of the project, but it is nonetheless potentially useful in other contexts and for other users. Beyond its own policies, Wikipedia places no barriers on publication, so suitable incidental material can be put on Wikipedia with relatively little work, making it permanently available in a central resource for future users and for future development.

For example, the Darwin Correspondence Project at the University of Cambridge produced a series of capsule biographies of all of Darwin’s correspondents. These were published on the project’s website, but were unlikely to be found by anyone not already seeking information in the context of the correspondence. Many of these figures are interesting and apparently significant, but not well covered in current historical writing. The project organised an “editing day” for a dozen people, aiming to produce a batch of articles quickly; around a dozen Wikipedia articles were created or heavily expanded using the material assembled during the project.

Similar opportunities for capture might come from projects building provenance records or bibliographies, for example. Some material could be stored on the other Wikimedia projects—such as transcribed documents on Wikisource or incidentally-produced page scans deposited on Commons—and where the outputs are only notes rather than prose, Wikipedia’s article talk-pages can be useful for holding short notes and snippets of information for later work.

Links

Distributing digital content

One of the simplest ways of working with Wikipedia in the past, especially for cultural institutions, has been to release a large amount of digitised material under a free license, upload it to Wikimedia Commons (which we'll touch on later), and use this as a vehicle for widespread dissemination and interpretation. Unlike many similar services, Commons places a high value on quality metadata, encouraging reusers to attribute content to its original source and to maintain license conditions.

Through Wikipedia, images uploaded to Commons are reused in reader-facing articles, giving them wide public visibility; the work done in order to use them, and community response, can provide valuable research about the content, scope and nature of the material. An early example of this work was with the German Bundesarchiv, whose donation of archive photographs in 2006 spurred several hundred corrections to captions and metadata which were integrated back into the original catalogue.

 
BL HS/85/10/13885, a young Cree man photographed in Western Canada, 1903

A similar project was carried out at the British Library, working with the Canadian Copyright Photograph Collection, which was digitised and released on Commons as well as through the Library's own site. The collection had not been heavily studied since it was catalogued in the 1980s, and working on the release produced a large amount of useful information just from the early checks. Once released to the public—and particularly to Canadian audiences—we were able to draw together a large amount of community knowledge to identify the subjects of unlabelled images, and to help reconstruct the context and significance of some of the material. All this material is being aggregated to pull back into the Library’s collections, and work can now begin to distribute it to Canadian heritage institutions.

Both of these examples deal with institutions releasing content, but the model can easily apply to a research project producing or collating suitable digital content.

Links

Exposing digital resources

Rather than releasing resources to the world, a project can also use Wikipedia to help expose copyrighted resources through links and citations. A substantial amount of Wikipedia content is written from sources accessible online, and making historic or reference material available can support the creation of a large number of interpretative articles. Links to such material can be added as further reading on relevant Wikipedia pages or left on discussion pages for future use, and interested wikiprojects can be contacted and told about the availability of the material. This can stimulate the creation of new, transformative works dealing with the subject at hand, as well as providing a demonstrable use case for digitisation, especially of monographs.

This type of activity often takes place entirely unmediated by the originating project, so being actively involved may not require much additional labour. Beyond exposing existing digitisation programs, an interesting opportunity for academic programs is to link self-archived "green open access" copies of material in institutional repositories, supplementing Wikipedia references to the same content, which normally use the canonical publisher links.
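
As a sketch of how such a link might look, the Python snippet below renders an English Wikipedia {{cite journal}} template in which the DOI continues to identify the publisher's version while the |url= parameter points at a self-archived copy. The parameter names used (last1, first1, title, journal, volume, issue, page, year, doi, url) are standard {{cite journal}} fields; the helper name cite_journal is hypothetical, and the reference, DOI and repository URL are invented placeholders.

```python
from typing import Optional

# Order in which fields are emitted; all are standard {{cite journal}}
# parameters on the English Wikipedia.
FIELD_ORDER = ["last1", "first1", "title", "journal", "volume",
               "issue", "page", "year", "doi"]


def cite_journal(ref: dict, repository_url: Optional[str] = None) -> str:
    """Render a {{cite journal}} template, optionally adding a green OA link.

    The DOI keeps identifying the canonical publisher version, while |url=
    gives readers a freely accessible self-archived copy.
    """
    params = [f"|{key}={ref[key]}" for key in FIELD_ORDER if ref.get(key)]
    if repository_url:
        params.append(f"|url={repository_url}")
    return "{{cite journal " + " ".join(params) + "}}"


# Placeholder reference: every value here is invented for illustration.
example_ref = {
    "last1": "Author",
    "first1": "Example",
    "title": "An example article",
    "journal": "Journal of Examples",
    "volume": "1",
    "year": "2013",
    "doi": "10.1234/example.5678",
}

print(cite_journal(example_ref, "https://repository.example.ac.uk/1234/"))
```

In an article, the resulting template would normally sit inside <ref>...</ref> tags.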

Education & student programs

Academic programs involving students working with Wikipedia have been around for a decade, the first having taken place in early 2003 with a group of computer-science undergraduates. (Interestingly, some of their contributions are still in place with very little change.) These programs have grown over the last few years, with large-scale programs in a number of countries and languages; there has not been much work in the UK to date, but this is expected to change over the next two years.

The majority of these projects focus on students writing new articles or heavily expanding existing ones. If organised well, these can be very productive: students often respond well to producing unusual work, and benefit from having to research, summarise and synthesise material while also making it accessible to a lay audience. And, of course, a well-composed Wikipedia article has lasting public value. Other student projects have included translation tasks, transcription, content curation, illustration, and various forms of critical analysis.

Links

Research projects

Finally, it's worth discussing the wide body of research *about* Wikipedia. There has been a flood of papers studying Wikipedia, on topics as diverse as the reliability of its content or its social structures, or using it as a linguistic or interactional corpus. There are far too many to go into here, but a partial bibliography is at [link]; since 2011 it has been supplemented by the monthly research newsletter, which is well worth reading.

The main body coordinating both internal and external research on Wikipedia is the Wikimedia Research Committee. It is best placed to advise on research topics, and can provide limited access to non-public data in some circumstances. It can also advise on ethical issues surrounding any "active" experiments on the site; such experiments can be very contentious, especially where they involve testing the community's response to breaching actions.

Problematic approaches

The corollary to good examples is bad examples, and previous projects that haven't worked out so well have taught us about a good number of pitfalls.

The most common of these is the "project-centric approach": engaging with Wikipedia by using it to publish an article about the project. It makes sense at first glance, but there are problems. Firstly, as mentioned earlier, the Wikipedia community strongly discourages people from writing about their own projects or organisations. Secondly, many research projects have a very low public profile, and may not meet Wikipedia's inclusion guidelines. And most critically, such an article is of relatively limited value: the general public are vastly more likely to read about the subject studied than to seek out information on the particular project.

Another issue is scope and ambition. There are many examples of people announcing an ambitious “crowdsourcing” project and assuming that Wikipedia will direct a thousand active volunteers their way. While the Wikimedia community is large and active, it is not a bottomless source of labour, and you will need to engage with existing volunteers and encourage them to work with your project. Likewise, simply dropping links or text sections into articles without attempting to discuss a planned project is often unproductive.

Finally, there are sometimes issues around style and content. Wikipedia has an unusual house style, which can be at odds with most academic writing; it can take a little getting used to, and it can sometimes be an unsettling experience to have your prose chopped around by a faceless copyeditor with no subject expertise!

However, all of these can be avoided with some planning, some familiarity with the way Wikipedia works, and careful consideration of what you aim to achieve.

Getting started

Key decisions

Before you start a project, there are some key issues you will want to consider.

 

Licensing is a vital issue. All of the content produced and disseminated through Wikimedia projects is required to be made available under a very open copyright license. This allows it to be reused and modified for any purpose, including commercial use, while requiring that authors and contributors are credited. If you aren't happy licensing your work this openly, then it won't be suitable for Wikipedia—the project is fairly strict on copyright licensing and won't accept restricted material, including a general "permission to use this on Wikipedia" or "permission to use for educational purposes". If your project includes content produced by other people (for example, user-submitted site photographs) then you'll want to check that they are willing to agree to license the material appropriately as well.

The specific license used is the Creative Commons Attribution-ShareAlike license (CC-BY-SA); note that this is not compatible with the non-commercial or no-derivatives Creative Commons licenses (CC-BY-NC, CC-BY-ND). If you're using Crown Copyright material sourced from a UK government agency, it may be available under the Open Government Licence (OGL), which is designed to be fully compatible with CC-BY-SA. If you're unsure, please ask: the Wikimedia community is well used to dealing with the vagaries of clashing licenses and will be happy to advise.
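
The licensing rule can be expressed as a simple pre-release check over a project's own metadata. The Python sketch below rejects any Creative Commons license containing a non-commercial (NC) or no-derivatives (ND) element and accepts a small, illustrative set of license families. That set is an assumption for this example: it includes CC-BY-SA and the OGL as described above, plus CC-BY and public-domain material, both of which Commons also accepts. It is not an official list, and anything it rejects is worth asking the community about rather than simply discarding.

```python
# Creative Commons elements that make a license unacceptable on Wikimedia
# projects: non-commercial (NC) and no-derivatives (ND).
RESTRICTED_ELEMENTS = {"NC", "ND"}

# Illustrative, non-exhaustive set of acceptable license families
# (an assumption for this sketch, not an official list).
ACCEPTED = {"CC-BY-SA", "CC-BY", "OGL", "PUBLIC-DOMAIN"}


def normalise(code: str) -> list[str]:
    """Split a code such as 'cc-by-sa-3.0' into ['CC', 'BY', 'SA']."""
    parts = code.strip().upper().replace(" ", "-").split("-")
    # Drop a trailing version number such as '3.0' or '4.0'.
    if parts and parts[-1].replace(".", "").isdigit():
        parts = parts[:-1]
    return parts


def is_acceptable(code: str) -> bool:
    """Return True if the license looks usable on Wikipedia or Commons."""
    parts = normalise(code)
    if RESTRICTED_ELEMENTS & set(parts):
        return False
    return "-".join(parts) in ACCEPTED


for code in ["CC-BY-SA-3.0", "CC-BY-NC-SA-3.0", "CC-BY-ND", "OGL",
             "Permission for Wikipedia only"]:
    verdict = "acceptable" if is_acceptable(code) else "not acceptable as-is"
    print(f"{code}: {verdict}")
```

Note that a bespoke "permission to use on Wikipedia" falls through to the rejection branch, matching the policy described above.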

(A frequently asked question is why Wikipedia won't accept "non-commercial" or "educational only" restrictions—the answer is simply that we want Wikipedia content to be reused and distributed as widely as possible, including to the half of the world without internet access. The more restrictions we have, the more difficult it becomes for reusers to feel confident they are acting within the scope of the license, especially when republishing material offline. See below for a more detailed discussion of the issues surrounding "non-commercial use".)

Closely linked to ownership is the issue of authority. Wikipedia is a collaborative, decentralised project, where articles are worked on by dozens of different people, none of whom have final control or ownership. There is no formal hierarchy, no accreditation of contributors, and no central editorial board; while there are various methods for dispute resolution, these tend to shy away from making binding statements on article content. This is very different from traditional publishing, or even from most forms of social media, where individual works are attributed to specific authors, who have ultimate control over what is and isn't said.

The practical effect of this approach is that a Wikipedia article may not remain the same in future; it may be expanded, or cut down; it may evolve over time to say quite different things from its original version. While the original author is welcome (and encouraged) to engage with this process, ultimately, the content (and existence) of any article is decided on by the broader community. To engage productively with Wikipedia, you should bear this in mind; it is a collaborative draft rather than a channel for publishing final works. If you're not comfortable with this, then Wikipedia may not be suitable for what you want to do.

Links

Finally, you will want to consider how you want to approach the project. There are two general ways that people tend to work with the Wikimedia community:

Firstly, they contribute: they write articles, update content, and so on, alongside the existing community. This is often the most effective approach, especially if you're aiming to use Wikipedia as a way of improving the context of your work, but it has some caveats. The main one is that article writing can be more time-consuming than you anticipate; depending on your project's plans, there might also be a risk of a perceived conflict of interest.

Secondly, they collaborate with the existing Wikimedia community, offering help and support to people working on specific topics, or recruiting collaborators with specific technical skills. This can be a very productive approach, but it does mean that you need to find interested volunteers within the community!

In practice, many projects do a combination of these two approaches; it depends on what your project is aiming to do and how you feel most comfortable working.

So, let’s recap. Hopefully you now have an idea of:

  1. what topic your project will cover
  2. whether it is appropriate for Wikipedia
  3. what broad approach you plan to take

Let's begin!

First steps

Once you've decided on your project and how you see it working, you'll want to make contact with the community. The most efficient way to do this is on the site itself. We won't look at the practicalities of editing Wikipedia in this guide, but it's important to remember that almost all communication and organisation is carried out on Wikipedia by editing pages, in much the same way that articles are written. Actually editing the site in some way is probably unavoidable!

As mentioned earlier, this guide isn't intended to cover the basics of editing Wikipedia itself; we recommend you use one of the existing guides to get familiar with the site and some of the terminology. If you haven’t yet read the Ten Simple Rules article from earlier, it’s also worth going over; it’s only two pages.

First, register an account. We recommend using an individual account, rather than one named for the project—many contributors dislike organisational names. Once you have registered this, create a user page—essentially your "profile" page—outlining who you are and what you’re planning to work on. You can access this through the link to your username in the top right-hand corner, and edit it as a normal Wikipedia page. It will now be visible to other contributors throughout the project, wherever you edit or leave a comment.

Secondly, we recommend you introduce your project to the Wikipedia community. Wikipedia has a number of centralised noticeboards, but it also has hundreds of subject-specific "wikiprojects", covering broad topics such as politics or visual arts, more specific fields such as feminism or numismatics, and specific geographical regions or historic periods. These are the best places to make contact with relevant users; a thematic list of projects is below. If there is no obviously relevant project, you could try looking for active users contributing to relevant topics and asking their advice. If you’re planning to run a student project of some form, we recommend also leaving a note at the school and university projects page, where these are coordinated.

If practical, you could also try getting in touch with the local Wikipedia community. There are regular or semi-regular meetings of Wikipedia volunteers in a number of locations around the country, and there is a local organisation, Wikimedia UK, with dedicated staff members for education & cultural-sector projects. At the time of writing, JISC has also funded a project to support academic programs working with Wikipedia, and this will be a useful point of contact while the project is underway, until early 2014.

Links

Once you've laid the groundwork—or while you’re doing it—jump in and get started! How you actually go about your first work is very dependent on what the project is, but we recommend you start relatively small. It’ll give you a chance to figure out what does and what doesn’t work, whether the outcome resembles what you expect, and what the time commitments are likely to be.

For a "contextualisation" or "capture" project focusing on writing articles, you could start by expanding parts of an existing article, to get familiar with the process. Once you've done this and you're happy with the outcome, work up: try a single new article, get an idea of the amount of time and effort it is likely to take, and adjust the plan accordingly. At this point you'll want to make contact with the Wikipedia community, if you haven't already done so, and get some feedback on your article; you'll then be in a good position to move forward with the main project. Likewise for exposing resources: try with a couple of articles, gauge the response, and work onwards from there.

For disseminating digital content, you could identify perhaps a dozen varied images, then prepare the metadata, have the necessary internal discussions about licensing, then work out how the attribution should work and where it should point to. Upload these and add them to articles (or alert interested contributors) and see what the response is like. This should help build an idea of what did and didn't work, and the usage can help provide a basis for deciding whether to go ahead with larger licensing decisions. If you do plan to make a larger release, it's worth getting in touch with the Commons community directly in order to discuss tools for large-scale upload.
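
Much of the metadata preparation amounts to turning catalogue records into Commons file description pages. The Python sketch below assumes a hypothetical ImageRecord structure exported from a project's own catalogue and builds wikitext with Commons' {{Information}} template, using the source field to point back to the catalogue entry for attribution and a license template such as {{cc-by-sa-3.0}}. The record, shelfmark and URL are placeholders; the license template should be whichever one your project has actually agreed on.

```python
from dataclasses import dataclass


@dataclass
class ImageRecord:
    """One catalogue record prepared for a pilot upload (hypothetical fields)."""
    shelfmark: str
    description: str
    date: str
    author: str
    catalogue_url: str
    license_template: str = "cc-by-sa-3.0"  # name of a Commons license template


def description_page(rec: ImageRecord) -> str:
    """Build Commons file-description wikitext with the {{Information}} template."""
    return "\n".join([
        "=={{int:filedesc}}==",
        "{{Information",
        f"|description={rec.description}",
        f"|date={rec.date}",
        # Attribution: link back to the record in the project's own catalogue.
        f"|source=[{rec.catalogue_url} {rec.shelfmark}]",
        f"|author={rec.author}",
        "}}",
        "",
        "=={{int:license-header}}==",
        "{{" + rec.license_template + "}}",
    ])


# Placeholder pilot batch; a real pilot might hold a dozen varied images.
pilot = [
    ImageRecord(
        shelfmark="PROJ/0001",
        description="Site photograph, trench 3, looking north (placeholder)",
        date="2013-05-01",
        author="Example Research Project",
        catalogue_url="https://catalogue.example.org/PROJ-0001",
    ),
]

for rec in pilot:
    print(description_page(rec))
    print()
```

Generating pages this way for a small pilot makes it easy to revise the attribution wording before committing to a larger release.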

Finally, it’s harder to pilot a student-oriented project, but you could try running through their intended workload yourself and seeing if it feels achievable!

Asking for help

If you run into trouble, Wikipedia has a number of different help venues. As before, all of these are linked below.

For practical help (in editing, technical issues, etc.), there is an on-wiki help desk, and the “Teahouse” project offers advice for new users. There is also a reference collection of help pages.

For content issues around specific articles, it is always good to discuss matters on the article’s talk page, or to refer matters to wider discussion by a wikiproject. Again, looking for active users in a field is also a good approach—most active users will be happy to help or advise if asked. Broader questions of how to handle policy can be taken to the community discussion fora, the "village pumps".

If you find yourself in a conflict, or there is a dispute about how best to handle an article, we have recommended guidelines on dispute resolution. Disputes on Wikipedia can get heated, as with any potentially anonymous internet environment, and it may be that the best option is sometimes to disengage rather than escalate; the Teahouse, mentioned earlier, is often reported to be very useful here for advice and support. It may be that simple changes to the way you are carrying out your project can defuse tensions—for example, leaving notes on article talkpages mentioning your intent to make major changes a day or two in advance will show that you are being transparent about your activity, and often makes people happier to support the actual changes!

There are also the off-wiki support networks mentioned earlier; in addition to local groups and the chapters, there are a number of mailing lists and IRC support channels. Major issues requiring confidential treatment can be reported by email to volunteers.

Links

Other projects

We have focused so far on the English Wikipedia. It is by far the largest edition of Wikipedia (with around 4.3 million articles) but is one of around 280 language editions, each with its own community. While they are all working towards a similar goal, the different communities tend to have subtly different standards for inclusion, different editorial practices, and as a result can show quite different coverage both in scope and detail; they are not simply translations of each other. Working with the other Wikipedias will differ in small details, but the general principles above should be consistent.

Outside of Wikipedia, the Wikimedia community operates ten other main projects. These cover a wide range of fields, including a collaborative news site, a quotations directory, a travel guide, and a dictionary; as with Wikipedia, many of them exist in hundreds of multilingual editions. In total, there are around six to eight hundred smaller communities. There are three that may be of particular interest for external collaborative projects: Commons, Wikidata, and Wikisource.

Note that user accounts on the Wikimedia projects will function seamlessly between projects—there is no need to create separate accounts, and usually no need to log in to each site separately—but contributions and preferences are recorded separately.

Wikimedia Commons

Wikimedia Commons was originally established as a central media repository for the various Wikimedia projects, but has grown from this internal focus to become one of the largest cross-disciplinary hosts of free content, made available for use inside or outside of the projects. The site is designed to support multilingual audiences, though almost all of the internal work is done in English.

It hosts almost eighteen million media files, all of which are either released under a free license, or in the public domain both in the United States and in their country of origin. Metadata is sometimes patchy, but can be very detailed, and there is an infrastructure in place for dealing with and attributing large amounts of uploaded content from a particular source.

Any project involving a significant media element is likely to be working with Commons. It is much more efficient than uploading directly to Wikipedia, as files on Commons can be used by every Wikipedia language edition and sister project, making the content visible to a much larger audience.

Links

Wikidata

Wikidata is the newest of the projects, and is a centralised multilingual database of structured linked data. At the time of writing, it only includes entities with Wikipedia pages, though this may expand in future; even with this limitation, it still covers eight to ten million concepts, entities or objects. The information is predominantly drawn from structured data already present in the Wikipedia articles, but is curated and extended by hand.

The database supports both data values and semantic relationships between entities, with the latter held in a language-independent fashion. The most critical part of Wikidata’s structure is that it can handle fuzzy and unreliable data, especially useful for contested or disputed information, with the system able to support multiple values for all properties, and attribute these claims to different sources. Future work aims to allow a declaration of "reliability" for any given claim, potentially marking results as contested or of unknown accuracy.
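
To illustrate how multiple values and their sources sit side by side, the Python sketch below reads a trimmed-down entity in the JSON layout that Wikidata exports (for example from Special:EntityData/<id>.json): each property maps to a list of statements, and each statement may carry references. It collects every value of one property together with the items cited via P248 ("stated in"). The entity, its two conflicting birth years (P569, date of birth) and the source IDs are placeholders invented for the example, and the helper names are hypothetical.

```python
from typing import Any


def make_statement(prop: str, time: str, source_id: str) -> dict[str, Any]:
    """Build one time-valued statement with a single P248 ("stated in") reference.

    Real Wikidata JSON carries more keys (ids, hashes, ranks); only the
    ones read below are included here.
    """
    return {
        "mainsnak": {
            "snaktype": "value",
            "property": prop,
            "datavalue": {"type": "time",
                          "value": {"time": time, "precision": 9}},  # 9 = year
        },
        "type": "statement",
        "references": [{"snaks": {"P248": [{
            "snaktype": "value",
            "property": "P248",
            "datavalue": {"type": "wikibase-entityid",
                          "value": {"id": source_id}},
        }]}}],
    }


def values_with_sources(entity: dict[str, Any], prop: str) -> list[tuple[Any, list[str]]]:
    """Return every value of `prop`, each paired with the items it is stated in."""
    results = []
    for statement in entity.get("claims", {}).get(prop, []):
        snak = statement["mainsnak"]
        # "somevalue" / "novalue" snaks carry no datavalue, so skip them.
        if snak.get("snaktype") != "value":
            continue
        sources = [
            ref_snak["datavalue"]["value"]["id"]
            for ref in statement.get("references", [])
            for ref_snak in ref.get("snaks", {}).get("P248", [])
            if ref_snak.get("snaktype") == "value"
        ]
        results.append((snak["datavalue"]["value"], sources))
    return results


# Placeholder entity: two conflicting birth years, each from a different source.
entity = {
    "id": "Q_PLACEHOLDER",
    "claims": {"P569": [
        make_statement("P569", "+1850-00-00T00:00:00Z", "Q_SOURCE_A"),
        make_statement("P569", "+1851-00-00T00:00:00Z", "Q_SOURCE_B"),
    ]},
}

for value, sources in values_with_sources(entity, "P569"):
    print(value["time"], "stated in", ", ".join(sources))
```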

For an external project, its value lies in the fact that it offers a wide range of freely-licensed, easily queryable metadata, and that it provides a range of unique identifiers. Wikidata is currently aggregating a large number of identifiers which have been matched to Wikipedia articles, and a side-effect of this is that it will be possible to walk between them using only a single query. If you’re building a database of some form—particularly where most of its constituent elements will be covered in Wikipedia in some way—then it may well be worth investigating embedding Wikidata IDs at an early stage, for future use.
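
Walking from one identifier to another can be sketched as a single SPARQL query. The Python below assumes the Wikidata Query Service endpoint (https://query.wikidata.org/sparql), which was introduced after this guide was first written. It builds a query that finds the item carrying a given VIAF ID (P214) and returns its GND ID (P227), then parses the standard SPARQL JSON results format. The function names, identifier and sample response are hypothetical; fetch() needs network access and a descriptive User-Agent of your own, so the demonstration below stays offline.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def crosswalk_query(from_prop: str, identifier: str, to_prop: str) -> str:
    """Find the item holding `identifier` under `from_prop`; return its `to_prop`.

    e.g. from_prop="P214" (VIAF ID), to_prop="P227" (GND ID).
    json.dumps quotes the identifier; its basic escapes are also valid SPARQL.
    """
    return (
        "SELECT ?item ?target WHERE {\n"
        f"  ?item wdt:{from_prop} {json.dumps(identifier)} .\n"
        f"  OPTIONAL {{ ?item wdt:{to_prop} ?target . }}\n"
        "}"
    )


def query_url(sparql: str) -> str:
    """Encode the query as a GET request that asks for JSON results."""
    params = urllib.parse.urlencode({"query": sparql, "format": "json"})
    return f"{WDQS_ENDPOINT}?{params}"


def fetch(sparql: str, user_agent: str) -> dict:
    """Run the query against the live service (requires network access)."""
    request = urllib.request.Request(query_url(sparql),
                                     headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def parse_bindings(result: dict) -> list[tuple[str, Optional[str]]]:
    """Turn standard SPARQL JSON results into (item URI, target value) pairs."""
    return [
        (row["item"]["value"], row.get("target", {}).get("value"))
        for row in result["results"]["bindings"]
    ]


# Offline demonstration with a placeholder identifier and a hand-made response.
query = crosswalk_query("P214", "12345", "P227")
print(query)
print(query_url(query))

sample_result = {
    "head": {"vars": ["item", "target"]},
    "results": {"bindings": [{
        "item": {"type": "uri",
                 "value": "http://www.wikidata.org/entity/Q_PLACEHOLDER"},
        "target": {"type": "literal", "value": "GND_PLACEHOLDER"},
    }]},
}
print(parse_bindings(sample_result))
```

Swapping the property IDs walks in the other direction, or on to any other identifier Wikidata records for the same item.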

Links

Wikisource

Wikisource is a project aiming to build a library of freely-licensed or public-domain source documents. As with Wikipedia, it exists in several independent language editions, with some differences in the way they operate. The focus of Wikisource is predominantly on transcribing scanned documents (or proofreading OCR), and it has built a number of tools to facilitate this.

A number of external projects have worked with Wikisource, either to host already-produced texts, or to use its transcription and correction features. Perhaps the most interesting of these was a French project which used it to host a paleography course, training students in understanding archival documents while at the same time transcribing them for public access. In the United States, a project at the National Archives and Records Administration uploaded documents and used Wikisource to transcribe them, then linked out to these transcripts from within the catalogue for the benefit of researchers.

Sister projects

The other projects, not covered above, may still be of interest in specialist cases, but are probably of less interest to most academic projects. Note that while all of them have names beginning "Wiki" or similar, the term “wiki” is generic and there are many non-Wikimedia projects using similar names; we don’t operate every wiki! In alphabetical order, they are:

  • Wikibooks (*.wikibooks.org) - a project for producing textbooks.
  • Wikinews (*.wikinews.org) - a project for collaborative reporting/citizen journalism.
  • Wikiquote (*.wikiquote.org) - a collaborative dictionary of quotations.
  • Wikispecies (species.wikimedia.org) - a project to index and taxonomically display all known species.
  • Wikiversity (*.wikiversity.org) - a project to create open educational resources and "learning communities".
  • Wikivoyage (*.wikivoyage.org) - a user-created travel guide. Derived in part from the (non-Wikimedia) Wikitravel project.
  • Wiktionary (*.wiktionary.org) - a dictionary project. Each language edition has multilingual coverage, with definitions and other structure in the local language. There have been various moves to produce a single project with localised translations, but these have not yet succeeded.

All but Wikispecies have multiple language editions; Wikispecies is multilingual in much the same way as Commons, with most work done in English.

See also
