Wikipedia:Wikipedia Signpost/Single/2012-07-02

The Signpost
Single-page Edition
2 July 2012

 

2012-07-02

Uncovering scientific plagiarism

Debora Weber-Wulff (User:WiseWoman) is a professor at Hochschule für Technik und Wirtschaft in Berlin. Both authors are active on the VroniPlag Wiki; WiseWoman is also active on the German Wikipedia.

Have you ever found yourself sitting with some text, thinking: "Where have I read this before?" Wikipedians face this question every day, when they have to deal with plagiarized content. But plagiarism does not just affect the quality and credibility of articles; nor is it just an issue for university professors and school-teachers marking their students' assignments. It is found at all levels of university research, right up to the writing of scientific papers and doctoral theses.

Over the past year and a half, the German academic community has been rocked by continual plagiarism scandals. Two wiki-based groups have been instrumental in uncovering "text parallels" in doctoral theses by jurists, scientists, industry managers, and politicians. The latest case to be exposed was a textbook that warned against taking material from the German Wikipedia – while itself plagiarizing Wikipedia in at least 18 places.

Karl-Theodor zu Guttenberg, German minister of defence 2009–11. He was derided at the time as "Baron cut and paste", and "zu Googleberg".
On 16 February 2011 the daily newspaper Süddeutsche Zeitung published the suspicions of a law professor from Bremen, Germany, that the doctoral thesis of the minister of defence, Karl-Theodor zu Guttenberg, contained extensive plagiarism; zu Guttenberg called the accusations "absurd", insisting he would fix the odd erroneous footnote in a second edition.

This angered a number of scientists who had found blatant plagiarism just by googling pieces of text from the thesis. They tried documenting the plagiarism collaboratively using Google Docs, but the platform could not support the more than one hundred people who wanted to edit the document simultaneously. Some computer scientists in the group decided that a wiki would solve the problem, so they moved to the Wikia platform, co-founded by Jimmy Wales in 2004.

As one of the initiators, User:PlagDoc, describes in an essay recently co-authored with a journalist and published in German and in English, the choice of a wiki enabled an investigative crowdsourcing effort of tremendous proportions: GuttenPlag Wiki. When the dust settled, zu Guttenberg had his doctorate revoked (63% of the lines on 94% of the pages in the thesis submitted were plagiarised) and stepped down as a government minister, moving to the US to escape the heat. The GuttenPlag Wiki received the Grimme Online Award in the "Special" category in 2011; a representative of Wikia accepted the prize on behalf of a group of more than 20,000 occasional and daily editors on the site (press release).

It didn't stop there. In April 2011, large amounts of plagiarism were found in a PhD thesis by the daughter of a high-ranking former Bavarian politician. Those interested in investigating this decided to set up a new wiki for the documentation, VroniPlag Wiki (website). In quick succession, more and more plagiarised theses were documented on the same platform, because people did not want to have to set up a new wiki for each case. Although far fewer contributors are working on this wiki than on the GuttenPlag Wiki, they have continually documented plagiarism since the site's inception.

The wiki has an anonymous drop-box where people suggest theses that should be scrutinised, but many tips come in by email, either to the anonymous email addresses set up for the purpose, or to the few people who are reachable by their real name. People come and go, often working intensively on a particular case. Some have stayed and been active on all of the new cases. The group has coalesced into a team of around ten administrators and a handful of sympathetic onlookers, along with the obligatory trolls. A workflow has been set up for collaboratively and transparently documenting plagiarism, announcing the name of the author only when it's clear that a document contains a significant number of text parallels.

Currently 26 cases are documented on the site. Of these, eight doctorates have been rescinded (with several lawsuits pending); three have been declared to be within the bounds of acceptability by the awarding universities, although those institutions have provided no explanations for the substantial numbers of text parallels. The extensive documentation has demonstrated that plagiarism is not just an occasional incident, but something that the German university system must now get serious about. Case 25, unusually not a thesis but a textbook for law students on scientific methods in the age of the Internet, was a striking case that would be humorous if it were not so serious: not only was the chapter on plagiarism itself plagiarized, but the book warned of the dire consequences of taking material from Wikipedia while lifting a good 18 passages from it. The book was promptly withdrawn by the publisher after it was outed on VroniPlag Wiki.

Massive text parallels have been documented on VroniPlag Wiki in two dissertations from Poland and Denmark, suggesting that plagiarism in university research degrees is widespread. The Danish case is also interesting, as the plagiarist is a Pakistani citizen who published many papers as well as his dissertation on "terrorist" networks, partly by taking text blocks, often word for word, from older papers about criminal networks and simply replacing the word "criminal" with "terrorist". Other cases not on VroniPlag Wiki have involved the Romanian minister of education, the Romanian prime minister, the Hungarian president, an official in Thailand, and a parliamentarian in South Korea. Documentation is also underway in Russia concerning the dissertation of the country's new education minister.

In Germany, many universities seem unable to come to terms with the ethics of Internet-based research and publishing methods. The administrations have tended to react to revelations of plagiarism among their graduates in a way that might be labeled Kafkaesque; and there is no real in-university support for plagiarism education or detection, no training for tutors or teachers, and no procedure for dealing with lower levels of plagiarism.

The work at these wikis shows how urgent it is to educate people about plagiarism and how to avoid it. Scientific online publishing would also contribute to reducing the amount of plagiarism: if it can be indexed by a search engine, it can more easily be found by software or a simple search on three to five terms from a paragraph.

GuttenPlag Wiki and VroniPlag Wiki are now taken seriously and have contributed to accelerating the otherwise glacial progress in this area in the (German) university system. The writing is on the wall now, with public reaction on-side, although there are significant pockets of resistance; for example, an open letter penned by eight high-ranking former heads of German universities and research organizations and published on 14 June in the Süddeutsche Zeitung requested that this "undignified spectacle" [of published evidence of plagiarism] cease immediately and that the universities be left to their own devices to carry on as before. Public discussion like this about scientific matters does not happen often in Germany.

The experiences of the past year and a half have shown that plagiarism is a widespread phenomenon – not only in Germany. It affects universities large and small, in many fields of study at all levels. Plagiarists may think they are being smart in reusing electronically available materials for their own texts – but they forget that there are people well versed in online research instruments and scientific texts who are no longer willing to let others achieve scientific merit by illegitimate means. Using wiki technology to collaboratively fight plagiarism, the latter have joined forces and have become major new players in the scientific community.


2012-07-02

Representing knowledge – metadata, data and linked data

Neil Jefferies is the research and development project manager at the Bodleian Libraries. He was involved with the initial setup of the Eprints and Fedora Repositories at the University of Oxford, and is now working on the implementation of a long-term digital archive platform. He is the technical director of Cultures of Knowledge, a collaborative project launched in 2009 "to reconstruct the correspondence networks central to the revolutionary intellectual developments of the early modern period".

This piece examines a key question that new Wikimedia projects such as Wikidata are concerned with: how to properly represent knowledge digitally at the most basic level. There is a real danger that an inflexible, prescriptive approach to data will severely limit the scope, capabilities and ultimate utility of the resulting service.

At one level, the textual representation of information and knowledge in books and online can be viewed as simply another serialisation and packaging format for information and knowledge, optimised for human rather than machine consumption. Within the Wikipedia community – Wikidata and elsewhere – there is a perceived utility in using more structured, machine-friendly formats to enable better information sharing and computer-assisted analysis and research. However, there remains a lot of debate about the best approach, to which I will contribute the views I have developed over nearly a decade of research and development projects at the Bodleian Library[1] and before that, through my involvement with knowledge management in the commercial domain.

A graphical depiction of a very simple XML document
My first point is that metadata and data are really different aspects of a continuum. In the majority of cases, data acquires much of its meaning only in connection with its context, which is largely contained within so-called metadata. This is especially true for numerical data streams, but holds even for data in the form of text and images: when and where a text was written are often critical elements in understanding the meaning.[2] Data and metadata should be considered not as distinct entities but as complementary facets of a greater whole.

Secondly, there will be no single unifying metadata "standard" (or even a few such standards), so deal with it! For example, biosharing.org lists just under 200 metadata standards for experimental biosciences alone. The notion of a single standard that led to the development of MARC, and latterly RDA, in the library sphere is simply not applicable to the way in which metadata is now used within the field of academic enquiry. This means that any solution to handling digital objects must have a mechanism for handling a multiplicity of standards, ideally within an individual object – for example, bibliographic, rights and preservation metadata may quite reasonably be encoded using different standards.[3] The corollary of this is that if we have such a mechanism there is no need to abandon existing standards prematurely. This avoidance of over-prescribing and premature decision-making will be familiar to Agile developers. Consequently, Wikidata developers would be ill-advised to aim for a rigid, unitary metadata model – even at a basic level, representing knowledge is too complex and variable for such an approach.

So how do we balance this proliferation of standards with the desire for sharing and interoperability? We can find several key areas in which a consensus view is emerging, not through explicit standard-setting activities but through experience and necessity. This gives us a good indication that these are sensible points on which to base longer-term interoperability.
  1. An emergent data/object model. Besides the bibliographic entities, such as digitised texts, images and data, a number of key types of "context-object" recur when we start to try to build more complex systems for handling digital information. This can be seen in such diverse areas as the specifications for TEI, Freebase, CERIF and schema.org. The most important of these elements are people, places, vocabularies/ontologies and the notion of time dependency. Indeed, for many projects in the humanities, these objects actually form the basis for expressing ideas and framing discourse using the conventional bibliographic objects to provide an evidentiary base.
  2. Aggregations as a key organising tool for this expanded universe of digital objects. In many cases, these aggregations are also objects in their own right, representing content collections, organisations, geopolitical entities and even projects – each potentially with a history and other attributes. An essential characteristic of aggregations is that they need not be hierarchical, but rather a graph capable of capturing the more unstructured, web-like way people have of organising themselves and their knowledge.[4]
  3. Agreement on essential common properties. For each object type there is usually a general consensus on a minimal set of properties that are sufficient to both uniquely identify an object and provide enough information to a human reader that the object is the one that they are interested in. Often, the latter is actually a less strict requirement as a person can use circumstantial evidence such as the context in which an object occurs for disambiguation. While it is desirable to try to capture contextual information systematically, we have to accept that this is frequently not done. Sources for this common baseline could include Dublin Core (or dcterms to be explicit) records, DataCite records, gazetteers, and name authority lists, for example.

These common properties are obviously very amenable to storage and manipulation in a relational database. Indeed, for large-scale data ingestion followed by clean-up, de-duplication and merging of records/objects, this is likely to be the best tool for the job. However, once this task has been completed and we delve into the more varied elements of the objects, the advantages of a purely relational database approach are less clear-cut.

Instead, we can treat each object as an independent, web-addressable entity – which in practice is desirable in its own right as a mode of publication and dissemination. In particular, we can use search engines to index across heterogeneous fields – Apache Solr excels at faceting and grouping, while ElasticSearch can index arbitrary XML without schemas (i.e. all of the varied domain-specific metadata). These tools give users ways into the material that are much easier to use and more intuitive.

The objects alone are only a part of the picture – the relationships between objects are critical to the structure of the overall collection. In fact, in many cases (especially in the humanities) a significant proportion of research activity actually involves discovering, analysing and documenting such relationships. The Semantic Web or, more precisely, the ideas behind the Resource Description Framework (RDF) and linked data, provide a mechanism for expressing these relationships in a way that is structured, through the use of defined vocabularies, but also flexible and extensible, through the ability to use multiple vocabularies. While theoretically it is possible to express all metadata in RDF, this is not practical for performance[5] and usability[6] reasons, and is unnecessary.

This model of linked data, combining a mix of standardised fields and less-structured textual content, should not be entirely unfamiliar to people used to working with Semantic MediaWiki, sharing their metadata on Wikidata, or using data boxes in Wikipedia! However, when applying this model to practical research projects it emerges that a critical element is still lacking. Although we can describe relationships between objects using RDF, we are limited to making assertions of the form <subject><predicate/relationship><object> (the RDF "triple"). In practice, relatively few statements of this form can be considered universally and absolutely true. For example: a person may live at a particular address but only for a certain period of time; the copyright on a book may last for 50 years, but only in a particular country. Essentially, what is needed is a mechanism to define the circumstances under which a relationship can be considered valid. Several mechanisms could do this, for example replacing RDF triples with "quads" that include a context object, or annotating relationships using OAC.
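The difference between a bare triple and a statement qualified by a context object can be sketched in a few lines of Python. This is only an illustration of the idea, not any particular RDF library's data model: the class names (Triple, Context, Quad), the helper holds(), and all "ex:" identifiers are hypothetical and chosen to mirror the two examples in the paragraph above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class Triple:
    """A bare <subject><predicate><object> assertion."""
    subject: str
    predicate: str
    obj: str


@dataclass(frozen=True)
class Context:
    """Hypothetical context object: the circumstances under which a triple is valid."""
    valid_from: Optional[date] = None
    valid_to: Optional[date] = None
    jurisdiction: Optional[str] = None

    def covers(self, when: date, where: Optional[str] = None) -> bool:
        # A bound that is None means "unbounded" on that side.
        if self.valid_from is not None and when < self.valid_from:
            return False
        if self.valid_to is not None and when > self.valid_to:
            return False
        # A statement tied to a jurisdiction only holds when the query names that place.
        if self.jurisdiction is not None and where != self.jurisdiction:
            return False
        return True


@dataclass(frozen=True)
class Quad:
    """A triple plus the context in which it can be considered valid."""
    triple: Triple
    context: Context


def holds(quads: Iterable[Quad], triple: Triple, when: date,
          where: Optional[str] = None) -> bool:
    """True if some quad asserts `triple` in a context covering (when, where)."""
    return any(q.triple == triple and q.context.covers(when, where) for q in quads)


# Invented example data: a residence limited in time, a copyright term limited by country.
lives = Triple("ex:alice", "ex:livesAt", "ex:12HighStreet")
fifty = Triple("ex:someBook", "ex:copyrightTermYears", "50")
statements = [
    Quad(lives, Context(valid_from=date(2001, 1, 1), valid_to=date(2005, 12, 31))),
    Quad(fifty, Context(jurisdiction="ex:CountryA")),
]
```

With this data, asking whether the residence statement holds in 2003 succeeds, while the same question for 2010 fails; likewise the copyright statement holds only when the query names ex:CountryA. The bare triple alone could not express either limit.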

These examples are really just special cases of a more general requirement that is of great interest to scholars. This is the ability to qualify a relationship or assertion to capture an element of provenance. Specifically, we need to know who made an assertion, when, on the basis of what evidence, and under which circumstances it holds. This may be manifested in several ways:

  • Differences of scholarly opinion – it should be possible for there to be contradictory assertions in the data relating to an object, provided we can supply the evidence for each point of view.
  • Quality of the evidence – information can be incomplete, or just unclear if we are dealing with digitised materials. In this case we want to capture the assumptions under which an assertion is made.
  • Proximity of evidence – we may have an undated document but if we know the biography of the author we can place some limits on probable dates. This evidence is not intrinsic to the object but can be derived from its context.
  • Omissions – collections are usually incomplete for various reasons. It is important to distinguish the absence of material as a result of inactivity or specific omission from subsequent failures in collection building.
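As a rough sketch of what such qualified assertions might look like in practice, the Python fragment below records who made a claim, when, on what evidence and under which assumptions, and then groups contradictory claims about the same object instead of forcing a single "true" value. The field names, the helper functions and the example letter and scholars are all hypothetical and serve only to illustrate the requirement.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Assertion:
    """A claim about an object together with its provenance (hypothetical schema)."""
    subject: str
    predicate: str
    value: str
    asserted_by: str                  # who made the assertion
    asserted_on: str                  # when it was made (ISO date string)
    evidence: Tuple[str, ...] = ()    # what the assertion rests on
    assumptions: Tuple[str, ...] = () # circumstances under which it holds


def competing_claims(assertions: Iterable[Assertion], subject: str,
                     predicate: str) -> Dict[str, List[Assertion]]:
    """Group all assertions about (subject, predicate) by the value they claim."""
    by_value: Dict[str, List[Assertion]] = defaultdict(list)
    for a in assertions:
        if a.subject == subject and a.predicate == predicate:
            by_value[a.value].append(a)
    return dict(by_value)


def is_contested(assertions: Iterable[Assertion], subject: str, predicate: str) -> bool:
    """True if more than one distinct value has been asserted."""
    return len(competing_claims(assertions, subject, predicate)) > 1


# Invented example: two scholars date the same undated letter differently.
records = [
    Assertion("ex:letter42", "ex:dateWritten", "1641",
              asserted_by="Scholar A", asserted_on="2011-03-01",
              evidence=("mentions an event of 1641",),
              assumptions=("the event is correctly identified",)),
    Assertion("ex:letter42", "ex:dateWritten", "1643",
              asserted_by="Scholar B", asserted_on="2012-01-15",
              evidence=("author's biography places him in the city of writing in 1643",)),
]
claims = competing_claims(records, "ex:letter42", "ex:dateWritten")
```

Here both dates stay in the data, each with its evidence attached, so a reader or a visualisation tool can see that the date is disputed and why. Scholar B's claim is an instance of the "proximity of evidence" case: the date is derived from the author's biography rather than from the letter itself.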

These qualifications become especially important when we try to use computational tools such as analytics and visualisation. Indeed, projects such as Mapping the Republic of Letters (Stanford University) are expending significant effort to find ways of representing uncertainty and omission in visualisations.

I believe there needs to be a subtle change in the mindset when creating reference resources for scholarly purposes (and, arguably, more generally). Rather than always aiming for objective statements of truth we need to realise that a large amount of knowledge is derived via inference from a limited and imperfect evidence base, especially in the humanities. Thus we should aim to accurately represent the state of knowledge about a topic, including omissions, uncertainty and differences of opinion.

Notes

  1. ^ In particular, Cultures of Knowledge.
  2. ^ Usefully, most books come with a reasonable amount of metadata (author, publisher, date, version etc.) encapsulated in the format, but this is something of an anomaly. Before the advent of the book and, more recently, in online materials, metadata tends to be scarcer.
  3. ^ However, I concede that it is not unreasonable to expect that things are generally encoded in XML with a defined schema.
  4. ^ Our own experience of trying to model the organisational structure of the University of Oxford (notionally hierarchical) convinced us that this was essential.
  5. ^ RDF databases (triple stores) currently scale to the order of billions of triples – this limit can be reached quite easily when you consider that the information in a MARC record for a book in a library may have well over 100 fields.
  6. ^ RDF is a very verbose format. Existing domain-specific XML formats can be much easier to read and manipulate.



2012-07-02

RfC on joining lobby group; JSTOR accounts for Wikipedians and the article feedback tool

WMF RfC on lobbying

On June 28, the Wikimedia Foundation started a request for comment (RfC) on whether the community feels the foundation should participate in the Internet Defense League, a proposed lobbying organization with the goal of protesting future anti-piracy legislation.

According to the RfC, the organization to be launched aims to build a network of stakeholders interested in activism against legislation such as SOPA and the PROTECT IP Act. League members would be notified if protests such as the SOPA blackout (Signpost coverage) are proposed, but no one would be bound by their membership to take part in any action. The proposed network is a cooperative effort of Mozilla and Fight for the Future. Organizations such as the Electronic Frontier Foundation, WordPress, and Reddit have already joined.

The WMF's legal and community advocacy department published an evaluative statement. While the proposal could turn out to be "very valuable", it says, the initiative involves many uncertainties yet to be clarified, and joining such a network might lead to perceptions that Wikimedia projects are becoming more political.

At the time of writing, two users have supported membership of the league, while more than 30 are opposed. Supporters pointed out that it is possible to deal with the problems raised and that membership would not be politically problematic. The opposers tabled reasons such as negative implications for the perceived character of Wikimedia as an educational organization and the questioning of our neutrality. Eight users were undecided, primarily saying that insufficient information is presented to judge the merits of such an initiative appropriately. The WMF asks editors to share their views at the RfC's discussion page to ensure wide participation from its communities.

Brief notes

  • 100 JSTOR accounts for Wikipedians: On June 26, it was announced that the WMF is looking at an agreement over 100 JSTOR accounts for Wikimedians. Applications for access can be filed at the project page.
    Jury and winner of the Zedler prize 2012 at the German Wikipedia Academy 2012
  • Article feedback tool: The fifth version will be expanded from 0.6% to 10.0% of the English Wikipedia's articles by July 3. Central notice information for the release will soon be available.
  • Zedler Award: The VroniPlag Wiki, a wiki community examining and documenting the extent of plagiarism in German doctoral theses, has won the German Wikimedia chapter's Zedler award for free knowledge as the best external project in 2012. More details on the work of this community can be found in this week's Signpost analysis. The other two main category prizes – article of the year and internal project of the year – were awarded to the German Wikipedia article on the Fukushima nuclear disaster and the Austrian monument list project, which aims to improve and create articles on Austrian monuments. Two special awards went to the flower of the week, an award for significant contributions to the project and its community, and to the authors of the good article on pizza boxes.
  • WMF board resolution on board visitors: The WMF board of trustees published a resolution on its visitors. It was decided to formalize the 2011 experiment and have up to two visitors taking part in the board's meetings. Visitors will have significant content, financial or other relevant expertise.


2012-07-02

Public relations on Wikipedia: friend or foe?

Conflict of interest guide

The Chartered Institute of Public Relations (CIPR), a British association of public relations professionals, has released the first version of a conflict of interest guideline in collaboration with Wikimedia UK (WMUK).

Paid editing has long been a contentious topic on Wikipedia. The Signpost has reported on it many times in the last several years, including the MyWikiBiz debacle ("Account used to create paid corporate entries shut down" [2006]), Microsoft's attempt to monitor articles ("Microsoft approach to improving articles opens can of worms" [2007]), issues surrounding diploma mills ("Report of diploma mill offering pay for edits" [2007]), and a public relations firm's edits ("The Bell Pottinger affair" [2011]), but the topic received its most substantial treatment in 2012, with "Does Wikipedia Pay?", a series of interviews with paid editing supporters ("The Facilitator: Silver seren", "The Consultant: Pete Forsyth", and "The Communicator: Phil Gomes"). The short answer is that paid editing has traditionally been severely discouraged on Wikipedia, but recent attempts by the public relations industry to forge links with Wikipedia have garnered some support for the idea, though it is far from a consensus.

[W]here there is a clear conflict of interest created by the relationship between the public relations professional and the subject of the Wikipedia entry, such as a client or employer, they should not directly edit it.

—Jane Wilson, CIPR CEO

After the Bell Pottinger incident, CIPR and WMUK began a collaboration to draft 'best practice guidelines' for public relations professionals' relationship with Wikipedia's articles. The CIPR guidelines demonstrate the fruits of this, as the document has much in common with Wikipedia's own conflict of interest guidelines and was developed on the Wikimedia UK wiki. Among its most notable provisions is the highly visible and repeated stipulation that public relations professionals should not directly edit articles they have a conflict of interest with except in extremely limited circumstances, along with its recommendation to "operate within the system" and a step-by-step guide to addressing problems in a topic they are being paid to correct.

Statements from the association also reflect the Wikipedia guideline's influence: "The main theme of the guidance is quite simple – where there is a clear conflict of interest created by the relationship between the public relations professional and the subject of the Wikipedia entry, such as a client or employer, they should not directly edit it. Such an activity would be unethical and lacking in transparency and therefore potentially against CIPR’s own guidance on digital communication and social media" (CEO Jane Wilson). The Wikimedia Foundation, through its head of communications Jay Walsh, also reacted positively, saying "CIPR's basic message, ... that PR folks editing Wikipedia directly is problematic, echoes what we hear from the community of Wikipedia contributors. Those who come to Wikipedia with a clear conflict of interest are generally going to face real challenges in terms of editing and contributing to the project."

There is some resistance to the guidelines as they currently stand, however. The Public Relations Society of America released a statement praising the CIPR-WMUK collaboration, yet cautioning that working within Wikipedia's guidelines is not their ideal solution, and collaboration with entities like the Corporate Representatives for Ethical Wikipedia Engagement is needed (see previous Signpost coverage on CREWE).

The association plans to release future versions as the guidelines mature and receive more attention from Wikipedians and public relations professionals. (more information in the Holmes Report, Mediabistro; CIPR guidelines)

In brief

  • Australian political scandal moves on-wiki: Despite the Health Services Union expenses affair's diminishing popularity in the "real world", the Sydney Morning Herald has reported on edit wars revolving around the articles relating to the scandal. After the article was published, one Wikipedia editor commented that they had "one good thing [to say] about that Fairfax article. It seems to have silenced those Wikipedia editors it suggested may have conflicts of interest."


2012-07-02

Discussion reports and miscellaneous articulations

The following is a brief overview of the current discussions on the English Wikipedia.

Extend use of authority control
Authority controls are unique identifiers to differentiate objects. Only about 4,000 biography articles on the English Wikipedia use these controls, while the German Wikipedia has about 220,000 such articles, and Commons an unknown number.
New user group: moderator
"Moderators" would face the standard RFX process but would receive fewer tools. Moderators would not receive tools that "deal with the assessing of editor behaviour", like block or protect. The number of user rights has been reduced from an administrator's 54 to 17, just a few more than autoconfirmed users. As the originator of the proposal, Jc37 states, "The goal here is to not add to admin's work, but to give the moderator ... [the] ability to assess consensus [and] handle content-related issues without needing to run to an admin, because the moderator, in these situations, will be as trusted as an admin to perform them."
3rd party unblock requests
Comments have been requested regarding allowing a third party to request an unblock. An editor can see a block they believe is unjust and request a review of the block. Experienced editors who can interpret policy can help new editors who may have a harder time understanding the vast policies of Wikipedia.
Can it be verifiable and not the truth?
A look into the rewording of the content of the verifiability policy. Five versions of the lede are under discussion, along with 12 questions regarding the content. The last request for comment on this topic ended without a consensus.
Updating level-one user warnings
A study looking at how users reacted to warnings was conducted to improve the warnings so they do not bite the newcomers, while still deterring vandalism. Comments are requested on whether the new warnings are adequate or still need improving.
Internet Defense League and the Wikimedia Foundation
The Internet Defense League has approached the Wikimedia Foundation to join their ranks. The Wikimedia Foundation is requesting your input as to whether or not they join the league. The Internet Defense League is a group of websites that will join in any future protests against anti-piracy legislation (see related Signpost coverage).
Global ban policy
Since the update of the terms of service in May, a process for a global ban policy has yet to be decided. The policy is intended for problematic editors who have been blocked from multiple communities.


2012-07-02

Summer sports series: burning rubber with WikiProject Motorsport

Formula One racing involves single-seater open-wheel cars
NASCAR is the largest sanctioning body of stock car racing in the United States
Motorcycle racing can come in a variety of formats including road racing, oval track racing, and motocross
Rally racing takes place on public or private roads using modified yet still road-legal cars
Demolition derby involves competitors deliberately ramming their vehicles into one another
Lawn mower racing in Australia

This week, we caught up with WikiProject Motorsport. The project, which dates back to December 2006, oversees a wide variety of child projects covering auto racing around the world. Among the project's 4,636 articles are 14 featured articles and featured lists. Members of WikiProject Motorsport work on a list of important tasks and burn through a large backlog of unassessed articles. The project also maintains a portal and two task forces. We interviewed NaBUru38 and Royalbroil.

What motivated you to join WikiProject Motorsport? Do you prefer one form of auto racing over others? Are you also a member of any child projects of WikiProject Motorsport? Have you been involved with the history of motorsport or touring cars task forces?

NaBUru38: I've been a motorsport fan since I was a young child. My family tells me that I could pronounce car brands before the simplest words! One of the reasons that I joined Wikipedia was to find and share information and stories about my passion – both in Spanish (my mother tongue) and English – so joining the projects was natural.
I enjoy all types of motorsport. I'm better informed and more interested in some disciplines (formula racing, sports car racing, touring car racing, rallying), but Wikipedia is special in that it includes less known disciplines, and I like helping to build those articles too.
Royalbroil: I have been involved with the WikiProject since its inception. We started WikiProject Motorsport to coordinate items that apply to multiple genres of motorsports. Many drivers have switched genres throughout their careers.
I prefer attending motorsports events to watching them on television. I like to see the local weekly drivers who are doing it for the fun of it, knowing that they'll probably lose money in the process. My favorites are off-road racing and stock car racing.
I'm a member of WikiProjects NASCAR, American Open Wheel Racing and Sports Car Racing, as well as the History of Motorsport task force. I was also in the IROC project before the series became defunct.

Since WikiProject Motorsport serves as an umbrella project for an array of auto racing projects, how does WikiProject Motorsport differ from Wikipedia's other umbrella projects? Is it difficult to attract contributors from the child projects? How has WikiProject Motorsport been able to foster lively talk page discussions when other umbrella projects like WikiProject Sports and WikiProject Arts have failed?

NaBUru38: I only know the Motorsport and Automobiles project in Spanish and English, so I can't compare them with other projects. I guess that umbrella projects lack the unconditional passion of fans of specific varieties of sports and arts. It's hardcore fans like us who build most of Wikipedia. However, my view is that Wikipedians should get involved in more subjects. We never know what can fascinate us, so just explore the millions of articles.
Royalbroil: WikiProject Motorsport has been used as the binding decision maker for things affecting child WikiProjects. Also, some types of articles don't fit neatly into the child WikiProjects, such as those relating to drag racing.

Are there particular aspects of motorsport that are poorly covered by Wikipedia? What can be done to fill these voids?

NaBUru38: In the English Wikipedia, motorsport in English-speaking regions tends to be well covered, as well as world, European and German competitions. Other regions lack coverage, in particular Eastern Europe, Latin America and Asia (except Japan). The alternatives are to attract more fans from those regions and to translate articles from other Wikipedia editions (e.g. Spanish and Portuguese in the case of Latin America).
Another issue is that there are so many competitions, drivers and teams that articles get outdated very quickly, especially the less popular ones. For that reason, in Spanish I don't maintain articles on seasons, and edit articles on drivers only once or twice a year. My weekly work is to add race winners to the tables in articles on races and circuits.
Recentism hurts us too: there are lots of articles on current drivers, even the less successful, but retired and deceased drivers are often ignored. There is a reason for that: the Internet has more information on current drivers. As a project, we should tackle that issue.
Royalbroil: Major American professional drivers in the off-road racing and drag racing genres are poorly covered. There don't appear to be many interested contributors and I have worked on a bunch of the biggest names. WikiProject Formula One has done an excellent job with developing articles on their drivers.

How does the project handle the notability of races, teams, owners, and drivers? What are some helpful resources you've used when sourcing motorsport-related articles?

NaBUru38: Sports has a rule of thumb on notability: winners are more relevant than losers! That doesn't happen with academics and artists, so there's an advantage for us. Some project members dedicate a lot of time to chasing particularly irrelevant articles, so the obvious cases are dealt with already. Anyway, as with other subjects, notability rules in motorsport are more relaxed than I would prefer. We haven't found a solid policy, which we need. For example, I would allow much less content on development series. They often enjoy considerable media coverage, but aren't as relevant to me as national semi-professional championships.
The best sources are closed, for example Autosport requires subscription for stories older than a month. There are some very interesting open databases, which are useful to find results. However, they lack prestige, which can be an issue with non-motorsport editors. More importantly, they lack the stories behind the numbers. Motorsport is always about racers first. A solution could be to use newspaper archives.
Royalbroil: WP:NMOTORSPORT was developed as part of WP:ATHLETE to include only professionals. I disagree with NaBUru38 and think it's a bit too restrictive to not allow some of the top semi-professional / development people who have made major accomplishments like national championships – but only if they meet the general notability guideline. Sources vary depending on the series and the country. Some sanctioning bodies have lots of articles about their drivers. Other sources include sports television networks like ESPN/Speed, newspapers, and magazines.

Is it difficult to obtain images for motorsport-related articles? Is it harder to find images for particular periods in the history of motorsport? How can auto racing fans contribute images from their next trip to the racetrack or tour route?

NaBUru38: Wikimedia Commons hosts lots of images of contemporary motorsport in the United States, United Kingdom and Germany, mainly in circuit racing and rallying. We need to contact fans from other disciplines and other countries to take pictures and upload them, which is very easy. Promoters don't mind – actually a fellow Uruguayan promoter was happy for my work.
Another matter is older motorsport, where most of the material is proprietary and in print. One way to find pictures of older vehicles is at historic events, where collectors show old vehicles – and often race them! Nevertheless, it's crucial to find historic motorsport pictures. Crowds and safety measures were very different back then! A more relaxed copyright policy would help us a lot.
Royalbroil: I get around a lot and have uploaded images from many types of motorsports from my collection. I've used Flickr to find excellent photographers from many genres who have agreed to freely license their images. All you have to do is ask. I've been successful about 50% of the time, provided I ask someone who is currently active on Flickr. Two of my Flickr friends used to photograph professionally, one for NASCAR and another for a major national motorsports magazine in the United States. I've been able to find free photographs of most American drivers back into the 1970s.

Has WikiProject Motorsport planned any collaborations between the various motorsport projects? Are there any initiatives WikiProject Motorsport could undertake on its own?

Royalbroil: WikiProject Motorsport has developed standardized infoboxes that are used in most genres of motorsports. New members usually find out about us from child WikiProjects.

Anything else you'd like to add?

NaBUru38: My two favourite motorsport quotes in fiction are:
Royalbroil: I'd add "Second place is the first loser" – Dale Earnhardt

The summer sports series continues next week with the most popular sport on Earth. Until then, kick around our previous articles in the archive.

Reader comments

2012-07-02

Heads up

This edition covers content promoted between 24 and 30 June 2012
A new featured picture showing a flock of James's Flamingos during their mating ritual. The males vocalize and stick their necks and heads straight up in the air, turning them back and forth; females initiate mating by walking away.
A collection of animal penises at the Icelandic Phallological Museum, subject of a new featured article
L'Oceanogràfic in Spain; a new featured picture

Six featured articles were promoted this week:

  • Smith Act trials of Communist Party leaders (nom) by Noleander. A series of trials were held from 1949 to 1958 in which 144 leaders of the Communist Party of the United States were accused of violating the Smith Act. The prosecution argued that the Party's policies promoted violent revolution, while the defendants countered that they advocated a peaceful transition to socialism, and that the trials violated their First Amendment rights. The trials led to the US Supreme Court decisions Dennis v. United States and Yates v. United States.
  • Arthur Mold (nom) by Sarastro1. Mold (1863–1921) was an English professional cricketer who played first-class cricket for Lancashire as a fast bowler between 1889 and 1901. He began his career in the mid-1880s, and quickly became successful after qualifying at the county level. He became one of the best bowlers in the country, but his achievements were tainted by charges of throwing, which eventually led him to leave the sport. He took 1,673 wickets in first-class matches.
  • Grand Teton National Park (nom) by MONGO. Grand Teton National Park is a United States National Park in northwestern Wyoming that was established in 1929 and measures approximately 310,000 acres (130,000 ha) in size. The park includes the major peaks of the 40-mile (64 km) long Teton Range, as well as most of the northern sections of the valley known as Jackson Hole. Humans have had a presence in the area for 11,000 years, but the ecosystem remains pristine.
  • Icelandic Phallological Museum (nom) by Prioryman. The Icelandic Phallological Museum in Reykjavík, Iceland, houses the world's largest collection of penises and penile parts. The collection of 280 specimens from 93 species of animals includes 55 penises taken from whales, 36 from seals and 118 from land mammals, as well as a single human penis; the museum hopes to find a younger, larger specimen of the latter. The museum, established in 1997, has thousands of visitors annually – mostly women.
  • Doc Adams (nom) by Giants2008. Adams (1814–1899) was an American baseball player and executive who is regarded by historians as an important figure in the sport's early years. With the New York Knickerbockers he served as a player, and later president, vice president, treasurer, and director. While still maintaining his medical practice, Adams helped write the rules of baseball, although this was forgotten for nearly a century.
  • Byzantine civil war of 1341–1347 (nom) by Cplakidas. The civil war of 1341–1347 was a conflict that broke out after the death of Andronikos III Palaiologos over the guardianship of his young heir, John V Palaiologos. It pitted the emperor's chief minister, John VI Kantakouzenos, against an alliance from the regency. The war polarized Byzantine society along class lines, with the aristocracy backing Kantakouzenos and the lower and middle classes supporting the regency. John VI eventually won, but the instability proved disastrous for the foundering empire.

Four featured pictures were promoted this week:

Middle Teton and Grand Teton in the winter; both mountains are in Grand Teton National Park, the subject of a new featured article.


Reader comments

2012-07-02

Three open cases, motion for the removal of Carnildo's administrative tools

No cases were closed or opened, leaving the number of open cases at three. One motion was filed this week.

Open cases

Fæ (Week 6)

The case concerns alleged misconduct by Fæ, namely aggressive responses to and harassment of users who question his actions. The case was brought before the committee by MBisanz. The other parties are Michaeldsuarez and Delicious carbuncle. A decision is expected on 6 July.

In response to a workshop proposal calling for the removal of his adminship, Fæ's administrator rights were removed at his request on 18 June; he has declared he will not pursue RfA until June 2013, and that should another user nominate him and he feels confident to run, he will launch a reconfirmation RfA rather than requesting the tools back without community process.

Falun Gong 2 (Week 5)

The case was referred to the committee by Timotheus Canens, after TheSoundAndTheFury filed a "voluminous AE request" concerning behavioural issues related to Ohconfucius, Colipon, and Shrigley. The accused have denied his claims and decried TheSoundAndTheFury for his alleged "POV-pushing". According to TheSoundAndTheFury, the problem lies not with "these editors' points of view per se"; rather, it is "fundamentally about behaviour". A decision is expected on 8 July.

Perth (Week 3)

The case, filed by P.T. Aufrette, concerns wheel-warring on the Perth article after a contentious requested move discussion (initiated by the filer) was closed as successful by JHunterJ. The close was a matter of much contention, with allegations that the move was not supported by consensus. After a series of reverts by Deacon of Pndapetzim, Kwamikagami and Gnangarra, the partiality of JHunterJ's decision was discussed, as was the intensity of Deacon of Pndapetzim's academic interests in the topic. Questions were also raised about the suitability of the new move review forum.

In a workshop proposal, uninvolved user Ncmvocalist set out proposed principles on the need for administrators to lead by example, behave respectfully and civilly in their interactions with other users, learn from experience, and avoid wheel-warring irrespective of the circumstances or nature of the dispute. The proposal also states that WikiProjects are not platforms for point-of-view pushing or the pushing of one's own agenda, and that where consensus cannot be reached, other venues of discussion should be sought out. Proposed decisions are due on 12 July.

Motions

A motion was filed by arbitrator PhilKnight calling for the removal of Carnildo's administrative tools for "long-term poor judgement" in his use of the tools. Carnildo may regain the tools via a successful request for adminship. At the time of writing, seven arbitrators have voted, all in support of the motion; a majority of eight is needed for it to pass.

Reader comments

2012-07-02

Initialisms abound: QA and HTML5

What is: QA?

[The Wikimedia Foundation's] strategy is to focus on two areas: [testing] automation; and building a testing community. We’re hiring people to coordinate these two areas.

—WMF QA Lead Engineer Chris McMahon

The logo of Jenkins, a key piece of software used to provide automated testing facilities and the subject of a query on the wikitech-l mailing list this week as to its exact role in that process.

This week a blog post by WMF engineer Chris McMahon put the spotlight on an area that does not often reach the pages of the Signpost: quality assurance (QA), a diverse remit spanning interface testing, process improvement, and project monitoring.

McMahon is the only employee of the foundation with specific responsibility for quality assurance; the WMF is currently seeking a volunteer QA coordinator and a QA engineer to work alongside him. Their work will centre not only on discovering defects, McMahon writes, but "investigating software to provide valuable information about that software from every point of view [and] examining the process by which the software is created, from design to code to test to release and beyond". If recent experience is anything to go by, McMahon and the two new hires will have their work cut out: many, if not all, Wikipedians can recite a list of bugs that have affected them in the recent past.

What makes QA across MediaWiki (the software that powers Wikimedia wikis) and the day-to-day running of those sites so difficult? "The development process involves so many contributors, with code coming in from so many sources and projects," writes McMahon, who also hints at the problems of being leader rather than follower in the world of rapid website testing. When finished, the processes currently being formulated are "intended to be a reference implementation, an industry standard for high-quality browser test automation".

According to the blog post, the foundation is also cultivating two relationships in the world of QA: the first with crowdsourcing website Weekend Testing; the second with technology non-profit OpenHatch.org, for whom MediaWiki testing constitutes their first foray into the world of software testing (the WMF is also employing OpenHatch in an area closer to its expertise – technology education; see previous Signpost coverage). With the WMF QA department still in its infancy, the long-term utility of the measures it is now embarking on is not yet known.

HTML5 coming (again) (maybe)

The HTML5 logo

Version 5 of the HTML standard may once again be enabled for use on Wikimedia wikis, well over a year after the first attempt to flick the switch was abandoned almost immediately (see previous Signpost coverage). WMF Director of Platform engineering Rob Lanphier this week expressed renewed interest in the switchover, suggesting a late July date for what would be the second attempt to implement the increasingly common standard (wikitech-l mailing list).

Fundamentally, the change is not a difficult one, requiring only the simple replacement of a single line of code. However, as the Signpost reported in February 2011, changing even that one line has the potential to break any tool reliant on so-called "screen-scraping" – reliant, in other words, on reading a page's HTML rather than a more machine-friendly version, such as that provided by the MediaWiki API. Then, even major tools like Twinkle were vulnerable to such problems; thankfully, all of the big-name tools are now far less reliant on the exact code used to generate the page, and as such will almost certainly survive the switchover. But other less well-maintained tools may not be so lucky, requiring the change to be well-trialed. The other bug raised at the time, relating to citation IDs, looks to have been resolved since, making a July switchover look all the more feasible.
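
To illustrate why screen-scraping is fragile, here is a minimal sketch in Python. The scraper function and both sample pages are hypothetical, invented for this example; it also assumes, purely for illustration, that the tool checks for an XHTML doctype string. The only real names are the MediaWiki API endpoint path and its standard `action=query` parameters. The scraper works on the XHTML page and fails on the HTML5 one, whereas a tool that asks the API for data never inspects the page's HTML at all.

```python
from urllib.parse import urlencode

# Hypothetical sample pages: same content, different doctype declarations.
XHTML_PAGE = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n'
    '<html><head><title>Example - Wikipedia</title></head><body></body></html>'
)
HTML5_PAGE = (
    '<!DOCTYPE html>\n'
    '<html><head><title>Example - Wikipedia</title></head><body></body></html>'
)


def brittle_scrape_title(page):
    """Hypothetical screen-scraper that insists on an XHTML doctype."""
    first_line = page.split("\n", 1)[0]
    if "XHTML 1.0" not in first_line:
        raise ValueError("unexpected doctype; scraper cannot parse page")
    start = page.index("<title>") + len("<title>")
    end = page.index("</title>")
    return page[start:end]


def api_query_url(title, endpoint="https://en.wikipedia.org/w/api.php"):
    """Build a MediaWiki API request URL; no HTML parsing is involved."""
    params = {"action": "query", "titles": title, "prop": "info", "format": "json"}
    return endpoint + "?" + urlencode(params)
```

Calling `brittle_scrape_title(HTML5_PAGE)` raises `ValueError`, which is the kind of breakage a trial period is meant to catch. The URL built by `api_query_url` is the same whichever doctype the wiki serves.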

Enabling HTML5 mode tells browsers to treat Wikimedia wiki pages as HTML5, complete (once MediaWiki's own support is improved) with <video> tags, canvases and native support for form validation. Users should note that certain long-deprecated markup will cease to function, most notably <font> and <center> tags, which are common in user signatures and on user pages, despite not being officially supported by MediaWiki itself.
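
Editors who want to check their own signature or user page for the two tags named above could use something like the following minimal sketch. It relies only on Python's standard `html.parser` module; the class and function names are hypothetical, and the tag list covers only the two tags mentioned here, not every deprecated element.

```python
from html.parser import HTMLParser

# Only the two tags named in the article; not a complete list.
DEPRECATED_TAGS = {"font", "center"}


class DeprecatedTagFinder(HTMLParser):
    """Collects the names of deprecated opening tags, in order of appearance."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case, so <FONT> matches too.
        if tag in DEPRECATED_TAGS:
            self.found.append(tag)


def find_deprecated_tags(markup):
    """Return the deprecated tags used in a snippet of markup."""
    finder = DeprecatedTagFinder()
    finder.feed(markup)
    finder.close()
    return finder.found
```

For example, `find_deprecated_tags('<font color="red">Example</font>')` returns `['font']`, while a signature styled with a `<span>` and inline CSS returns an empty list.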

In brief

Signpost poll
Visual Editor
You can now give your opinion on next week's poll: Which of these best sums up your view about HTML5 on Wikimedia wikis?

Not all fixes may have gone live to WMF sites at the time of writing; some may not be scheduled to go live for several weeks.

  • MediaWiki 1.20wmf6 hits Wikimedia wikis: 1.20wmf6 – the sixth release to Wikimedia wikis from the 1.20 branch – was deployed to its first wikis on 25 June and, in a condensed schedule due to the demands of US national holidays and Wikimania, is now in use across all Wikimedia wikis. The release incorporates some 158 changes to the MediaWiki software that powers Wikipedia, comprising 87 "core" changes and 71 changes to affected WMF-deployed extensions. Among the changes (themselves the product of approximately two weeks' worth of development time) are the creation of a {{PAGEID}} magic word and the recapitalisation of the language names used in the sidebar. A release to external sites including the same selection of bug fixes and new features is not expected for some time.
  • Three bots approved: 3 BRFAs were recently approved for use on the English Wikipedia:
    • AvicBot's 11th BRfA, performing Category (re)moves as listed on WP:CFD/W;
    • DPL bot's 3rd BRfA, tagging and removing tags from articles based on whether they should have the {{dablinks}} template;
    • TowBot's 1st BRfA, purging the templates Cite web, Cite news, and Cite book daily so that they show the correct date.
At the time of writing, 14 BRFAs are active. As usual, community input is encouraged.

Reader comments