Wikipedia:Wikipedia Signpost/2016-08-04/News and notes

News and notes

Foundation presents results of harassment research, plans for automated identification; Wikiconference submissions open

Maggie Dennis's presentation (13:35–23:20) and Ellery Wulczyn's presentation (23:30–34:25)

Among the most common forms of harassment reported in the slide at 17:31 were content vandalism (27%), trolling or flaming (24%), name-calling (17%), discrimination (14%), and stalking (13%). Less prevalent but more concerning were threats of violence (6%), outing (6%), impersonation (5%), hacking (3%), and revenge porn (2%). Unspecified experiences were rated at 15%.

What becomes clear from viewing Maggie Dennis's presentation is that harassment is a highly prevalent behaviour at the interface of three problematic phenomena that continue to plague the WMF's sites: the gender gap, the flatlining of editor numbers, and the maintenance of the quality of the sites for readers. The Foundation is investigating measures to address the harassment problem in the communities; proposals for impending action include the default protection of user pages, the creation of a help page on all Wikipedias, and research into current mechanisms for dealing with harassment.

Dennis then introduced Ellery Wulczyn, from Wikimedia Research, who explained the progress of a program to develop an algorithmic approach to detecting personal attacks on the English Wikipedia. This is a collaborative project between the WMF and Jigsaw, a division of Alphabet, a holding company for Google. The project has created a data "pipeline" of examples of personal comments on the site, used this to develop a model for automated detection of harassment, and analysed the data to try to develop a system with the same level of accuracy as humans. Samples of comments were judged by 10 humans, and from their judgements a scale was derived of how likely each comment was to be harassing. From this a model was developed, and the presentation claimed that it is a 95% match with a later pooled human assessment of whether examples constituted harassment. A demonstration is at wikidetox.appspot.com, which readers are invited to visit and test for themselves. The algorithm determined that there is an 82% likelihood that this statement of mixed but ultimately insulting intention was harassment:

"Congratulations. I don't know whether you are aware of this fact or not, but you have shown your qualified stupidity."

The algorithm determined a 69% probability that "F#@$ you, a$$h0l3" was a personal attack; and the different grammatical contexts of "I will punch your lights out" and "Let's drink punch" were rated at 59% and 17% likelihoods of harassment, respectively. However, Wulczyn pointed out that the system is only as good as the depth of the corpus of personal attack patterns to which it has been exposed, with human rankings; for example, "Your intellect is lacking" was determined as having only a 10% probability of being an attack.
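As a rough illustration of the scoring idea described above, the following Python sketch turns a set of human judgements into a likelihood score and compares model scores with pooled human labels. It is not the project's code: the function names, the 0.5 threshold, the sample values, and the use of a simple agreement rate are all assumptions, and the presentation did not specify how the 95% match was calculated.

```python
def attack_score(judgements):
    """Fraction of human judges who labelled a comment a personal attack.

    `judgements` is a list of 0/1 values, one per judge (hypothetical format).
    """
    if not judgements:
        raise ValueError("at least one judgement is required")
    return sum(judgements) / len(judgements)


def agreement(model_scores, pooled_labels, threshold=0.5):
    """Share of comments where the thresholded model score equals the pooled human label.

    The threshold of 0.5 is an assumption made for this sketch.
    """
    if len(model_scores) != len(pooled_labels):
        raise ValueError("scores and labels must have the same length")
    matches = sum(
        (score >= threshold) == bool(label)
        for score, label in zip(model_scores, pooled_labels)
    )
    return matches / len(pooled_labels)


if __name__ == "__main__":
    # Invented example: 8 of 10 judges consider a comment an attack.
    print(attack_score([1, 1, 1, 1, 1, 1, 1, 1, 0, 0]))

    # Invented model scores and pooled human labels for four comments.
    print(agreement([0.9, 0.2, 0.6, 0.4], [1, 0, 0, 0]))
```

In this sketch a comment's score is simply the proportion of judges who flagged it, which gives the kind of graded scale the presentation described; a real system would also need to handle disagreement between judges and a far larger sample.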

The intention now is to continue the program of "training" the system to achieve scores approaching zero false positives. The immediate goal is to explore the prevalence, dynamics, and impact of personal attacks on the English Wikipedia, and to create a complete historical dataset of talkpage comments with probability scores (which will be released publicly) for input to the "training" process.

The program is still at an early stage. Among the next goals is to integrate the algorithm with the ORES API system to enable extensions and tools to be built on top of the model. Readers with questions or suggestions are welcome to visit the dedicated page on Meta. T

Wikiconference submissions open

The San Diego Central Library, envisaged to be one of the venues for the 2016 Wikiconference North America, which will occur in San Diego, California, from October 7–10.

The 2016 Wikiconference North America, which will occur in San Diego, California, from October 7–10, invites interested editors to submit proposals to host a workshop, seminar, panel, tutorial, or other program during the event. Submissions can be made here. GP

Brief notes

  • Revamped app unveiled: A revamped Android app was released last week for Wikipedia. Its new homepage has been designed to help users access information more quickly and efficiently, and the app is now available worldwide in the Google Play store. More information was released in a Foundation blog post. GP
  • Language detection added to Wikimedia search engine: Wikipedia co-founder Jimmy Wales's vision to provide everyone in the world with an encyclopedia "in their own language" took another step last week when a feature was deployed on the English, French, German, Italian, and Spanish-language Wikipedias that will detect unsuccessful searches that may have been intended in another language. The blog post that announced the feature did not give a timeline for deploying it globally, though it did name a few other language Wikipedias that will get it next, and it encouraged users to test the feature in an online demo. GP