Wikipedia:Wikipedia Signpost/2015-07-29/Recent research

Recent research

Wikipedia and collective intelligence; how Wikipedia is tweeted

A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.

An article[1] in Social Science Computer Review argues that Wikipedia is an example of collective intelligence. It is primarily a theoretical piece, but the author is well-informed about Wikipedia's everyday workings and illustrates the theory with that knowledge. The article relies heavily on Pierre Lévy's notion of "humanistic collective intelligence". The author argues that Wikipedia displays some key characteristics of a collective intelligence process: software optimized for stigmergy (a mechanism of indirect coordination between agents or actions, supported by features such as edit histories and talk pages); distributed cognition (such as the existence of bots, and the division of tasks between various tools and individuals, facilitating their actions); and possibly, though this cannot be proven beyond doubt, emergence (a process whereby larger entities, patterns, and regularities arise through interactions among smaller or simpler entities that themselves do not exhibit such properties). The author concludes that Wikipedia thus exemplifies a special kind of collective intelligence, the aforementioned humanistic collective intelligence proposed by Lévy.

#Wikipedia and Twitter

review by Kim Osman

This study from OpenSym '15[2] analysed 2.5 million tweets linking to Wikipedia pages, collected on Twitter over a five-month period. The authors found that tweets in English and Japanese linked to pages from their respective language versions of Wikipedia nearly all the time (97 and 94 percent respectively). In other languages, however, tweets linked to a different language version of Wikipedia roughly one fifth of the time. Interestingly, tweets in Indonesian referenced another language version more than half the time (linking to English Wikipedia in half the tweets), and among the links to English Wikipedia the authors found that 75% of the linked articles did not have an equivalent Indonesian version. The articles in the analysed tweets followed a long-tail distribution, with the authors noting certain “events” (like the Gamergate controversy) generating multiple tweets. Of the top 20 Twitter users in the dataset, 19 were bots, with the most prolific tweeter being Wikipedia Stub Bot (@wpstubs). The authors do note that their study does not provide enough evidence to support a relationship between “how actively edited a certain article is and its popularity on Twitter.” The study does, however, raise interesting questions about the platform relationship between Wikipedia and Twitter and the role of bots in creating and maintaining this association. The authors note that future research could consider the role of events in popularising Wikipedia articles on Twitter, and further examine the motivations for inter-language linking on Twitter.
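The kind of comparison described above (tweet language versus the language edition of the linked page) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `wiki_language` and `same_language_share`, the assumption that each tweet comes with a language code, and the sample data are all hypothetical and only serve to show the idea.

```python
from collections import Counter
from urllib.parse import urlparse


def wiki_language(url):
    """Return the language subdomain of a Wikipedia URL, or None if there is none."""
    host = urlparse(url).hostname or ""
    parts = host.lower().split(".")
    # Accept hosts such as "en.wikipedia.org" or the mobile form "en.m.wikipedia.org".
    if len(parts) >= 3 and parts[-2:] == ["wikipedia", "org"] and parts[0] != "www":
        return parts[0]
    return None


def same_language_share(tweets):
    """Given (tweet_language, url) pairs, return the per-language share of
    tweets that link to the Wikipedia edition in the tweet's own language."""
    total = Counter()
    same = Counter()
    for tweet_lang, url in tweets:
        edition = wiki_language(url)
        if edition is None:
            continue  # not a link to a language edition of Wikipedia
        total[tweet_lang] += 1
        if edition == tweet_lang:
            same[tweet_lang] += 1
    return {lang: same[lang] / total[lang] for lang in total}


# Made-up sample data, for illustration only (not taken from the study).
SAMPLE_TWEETS = [
    ("en", "https://en.wikipedia.org/wiki/Stigmergy"),
    ("id", "https://en.wikipedia.org/wiki/Jakarta"),
    ("id", "https://id.wikipedia.org/wiki/Jakarta"),
    ("ja", "https://ja.m.wikipedia.org/wiki/Tokyo"),
    ("en", "https://example.com/not-wikipedia"),
]

shares = same_language_share(SAMPLE_TWEETS)
```

In this sketch, links that do not point to a language edition are simply skipped; a real analysis would also need to resolve shortened URLs and detect the tweet language, neither of which is attempted here.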

Briefly

"As of early 2015, the typical edit [on the English Wikipedia] is made by an account that is over 5 years old."
  • How old is the account making an average edit? Among other charts recently created by Dragons flight to visualize statistical data about the English Wikipedia community, this one shows that "the long-term trend is for the active community to gain about 6 months in average age for every year of time that passes in real life."
  • Simplifying sentences by finding their equivalent on Simple Wikipedia: A preprint[3] by researchers at the University of Washington describes a method to automatically align sentences about the same facts on the English Wikipedia and the Simple English Wikipedia. Besides a hand-annotated dataset of corresponding (and non-corresponding) sentence pairs used to test and adjust the algorithm, their approach uses a "novel similarity metric" between pairs of words which is based on synonym information from Wiktionary, resulting in a weighted graph called "WikNet" that consists of "roughly 177k nodes and 1.15M undirected edges. As expected, our Wiktionary based similarity metric has a higher coverage of 71.8% than WordNet, which has a word coverage of 58.7% in our annotated dataset". These datasets are available online. The following pair of sentences is presented as an example of a good match found by the resulting method:
    "The castle was later incorporated into the construction of Ashtown Lodge which was to serve as the official residence of the Under Secretary from 1782" (en:Ashtown Castle) vs.
    "After the building was made bigger and improved, it was used as the house for the Under Secretary of Ireland from 1782." (simple:Ashtown Castle)

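To make the idea of aligning sentences through a synonym graph more concrete, here is a toy sketch. It does not reproduce the paper's WikNet metric or its alignment algorithm: as a stand-in, word similarity is the Jaccard overlap of each word's neighbourhood in a tiny hand-made graph, and sentence similarity is the average best match per word. The graph entries, the function names, and the distractor sentence are all hypothetical.

```python
import re

# Tiny hand-made synonym graph (hypothetical edges; the paper derives its
# graph from Wiktionary, which is not attempted here).
TOY_EDGES = {
    ("residence", "house"),
    ("residence", "home"),
    ("house", "home"),
    ("big", "large"),
}


def build_neighbors(edges):
    """Map each word to the set containing itself and its direct synonyms."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, {a}).add(b)
        neighbors.setdefault(b, {b}).add(a)
    return neighbors


NEIGHBORS = build_neighbors(TOY_EDGES)


def word_similarity(w1, w2):
    """Jaccard overlap of the two words' neighbourhoods (a stand-in metric)."""
    if w1 == w2:
        return 1.0
    n1 = NEIGHBORS.get(w1, {w1})
    n2 = NEIGHBORS.get(w2, {w2})
    return len(n1 & n2) / len(n1 | n2)


def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", sentence.lower())


def sentence_similarity(s1, s2):
    """Average, over the words of s1, of the best word match found in s2."""
    t1, t2 = tokenize(s1), tokenize(s2)
    if not t1 or not t2:
        return 0.0
    return sum(max(word_similarity(a, b) for b in t2) for a in t1) / len(t1)


def best_match(simple_sentence, standard_sentences):
    """Pick the standard sentence most similar to the simple sentence."""
    return max(standard_sentences,
               key=lambda s: sentence_similarity(simple_sentence, s))


EN_SENTENCE = ("The castle was later incorporated into the construction of "
               "Ashtown Lodge which was to serve as the official residence "
               "of the Under Secretary from 1782")
SIMPLE_SENTENCE = ("After the building was made bigger and improved, it was "
                   "used as the house for the Under Secretary of Ireland from 1782.")
DISTRACTOR = "The park is open to visitors every day."  # made-up sentence

match = best_match(SIMPLE_SENTENCE, [DISTRACTOR, EN_SENTENCE])
```

With this toy graph, "house" and "residence" count as a match even though the strings differ, which is the effect a synonym-based metric is meant to achieve. Common function words such as "the" also contribute to the score here; a more careful version would down-weight or remove them.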
Other recent publications

A list of other recent publications that could not be covered in time for this issue – contributions are always welcome for reviewing or summarizing newly published research.

  • "The Virtues of Moderation"[4] presents "a novel taxonomy of moderation in online communities", including a case study of Wikipedia (p.88).
  • "Studying the Wikipedia Hyperlink Graph for Relatedness and Disambiguation"[5] From the abstract: "We show that using the full graph is more effective than just direct links by a large margin, that non-reciprocal links harm performance, and that there is no benefit from categories and infoboxes ..."
  • "Wikidata through the Eyes of DBpedia"[6] From the introduction: "All DBpedia data is extracted from Wikipedia and Wikipedia authors thus unconciously also curate the DBpedia knowledge base. Wikidata on the other hand has its own data curation interface ... While DBpedia covers a very large share of Wikipedia at the expense of partially reduced quality, Wikidata covers a significantly smaller share, but due to the manual curation with higher quality and provenance information."
  • "WikiMirs: A Mathematical Information Retrieval System for Wikipedia"[7]
  • "Content Translation: Computer-assisted translation tool for Wikipedia"[8]
  • "Peer-production system or collaborative ontology development effort: what is Wikidata?"[9] (to be presented at the OpenSym 2015 conference in August)
  • "Big data and Wikipedia research: social science knowledge across disciplinary divides"[10]
  • "Comparing language development in Wikipedia in terms of page views per Internet users"[11] See also Wiki-research-l mailing list discussion
  • "Understanding Graph Structure of Wikipedia for Query Expansion"[12]
  • "Turning Introductory Comparative Politics and Elections Courses into Social Science Research Communities Using Wikipedia: Improving Both Teaching and Research"[13]
  • "Utilizing the Wikidata System to Improve the Quality of Medical Content in Wikipedia in Diverse Languages: A Pilot Study"[14]
  • "Is it Possible to Enhance our Expert Knowledge from Wikipedia?"[15] From the English-language abstract: "In September 2013 two different questionnaires about medical issues were given to medical students, resident physicians and one medical specialist. The questioning was about diseases/symptoms, examinations/classifications and conservative therapy/surgery of the department of orthopaedics and traumatology. ... The survey has proven the up-to-dateness of Wikipedia articles and their listing on the first or second position on Google. Wikipedia contains a lot of bibliographical references, high-quality images and video material. Almost half (42,5 %) of all evaluated articles are appropriate for use in medical exams and in the daily clinical work."
  • "Predicting elections from online information flows: towards theoretically informed models"[16] From the conclusions: "We have shown good evidence that an 'uncertainty effect' drives much Wikipedia traffic: newer parties which attracted a lot of swing voters received disproportionately high levels of Wikipedia traffic. By contrast, there was no evidence of a 'media effect': there was little correlation between news media mentions and overall Wikipedia traffic patterns. Indeed, the news media and Wikipedia appeared biased towards different things: with news favouring incumbent parties, whilst Wikipedia favoured new ones." (See also coverage of an earlier preprint by the same authors: "Attempt to use Wikipedia pageviews to predict election results in Iran, Germany and the UK")

References

  1. ^ Livingstone, Randall M. (2015-06-26). "Models for Understanding Collective Intelligence on Wikipedia". Social Science Computer Review. 34 (4): 497–508. doi:10.1177/0894439315591136. ISSN 0894-4393. S2CID 60657789. (closed access)
  2. ^ Zangerle, Eva; Schmidhammer, Georg; Specht, Günther (2015). "#Wikipedia on Twitter: Analyzing Tweets about Wikipedia" (PDF). OpenSym '15. doi:10.1145/2788993.2789845. S2CID 5959813.
  3. ^ William Hwang, Hannaneh Hajishirzi, Mari Ostendorf, and Wei Wu: Aligning Sentences from Standard Wikipedia to Simple Wikipedia. NAACL-HLT, 2015. PDF
  4. ^ James Grimmelmann. "The Virtues of Moderation.…" Yale Journal of Law and Technology. 17.42 (2015) http://yjolt.org/virtues-moderation
  5. ^ Agirre, Eneko; Barrena, Ander; Soroa, Aitor (2015-03-05). "Studying the Wikipedia Hyperlink Graph for Relatedness and Disambiguation". arXiv:1503.01655 [cs.CL].
  6. ^ Ali Ismayilov, Dimitris Kontokostas, Sören Auer, Jens Lehmann, Sebastian Hellmann. "Wikidata through the Eyes of DBpedia". http://arxiv.org/abs/1507.04180
  7. ^ Hu, Xuan; Gao, Liangcai; Lin, Xiaoyan; Tang, Zhi; Lin, Xiaofan; Baker, Josef B. (2013). "WikiMirs: A Mathematical Information Retrieval System for Wikipedia". Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries. JCDL '13. New York, NY, USA: ACM. pp. 11–20. doi:10.1145/2467696.2467699. ISBN 978-1-4503-2077-1. (closed access)
  8. ^ Laxström, Niklas; Giner, Pau; Thottingal, Santhosh (2015-06-05). "Content Translation: Computer-assisted translation tool for Wikipedia articles". arXiv:1506.01914 [cs.CL].
  9. ^ Müller-Birn, Claudia; Karran, Benjamin; Lehmann, Janette; Luczak-Rösch, Markus (2015-05-24). "Peer-production system or collaborative ontology development effort: what is Wikidata?". doi:10.1145/2788993.2789836. S2CID 15126336. OpenSym 2015
  10. ^ Schroeder, Ralph; Taylor, Linnet (2015-02-24). "Big data and Wikipedia research: social science knowledge across disciplinary divides". Information, Communication & Society. 18 (9): 1039–1056. doi:10.1080/1369118X.2015.1008538. ISSN 1369-118X. S2CID 144817168.
  11. ^ Liao, Han-Teng (2015-03-15). "Comparing language development in Wikipedia in terms of page views per Internet users". Blog of Han-teng Liao, Oxford Internet Institute.
  12. ^ Guisado-Gámez, Joan; Prat-Pérez, Arnau (2015-05-06). "Understanding Graph Structure of Wikipedia for Query Expansion". Proceedings of the GRADES'15: 1–6. arXiv:1505.01306. doi:10.1145/2764947.2764953. ISBN 9781450336116. S2CID 8058094.
  13. ^ Kennedy, Ryan; Forbush, Eric; Keegan, Brian; Lazer, David (April 2015). "Turning Introductory Comparative Politics and Elections Courses into Social Science Research Communities Using Wikipedia: Improving Both Teaching and Research". PS: Political Science & Politics. 48 (2): 378–384. doi:10.1017/S1049096514002157. ISSN 1537-5935. S2CID 147555546. (closed access) / Author's copy
  14. ^ Pfundner, Alexander; Schönberg, Tobias; Horn, John; Boyce, Richard D; Samwald, Matthias (2015-05-05). "Utilizing the Wikidata System to Improve the Quality of Medical Content in Wikipedia in Diverse Languages: A Pilot Study". Journal of Medical Internet Research. 17 (5): 110. doi:10.2196/jmir.4163. ISSN 1438-8871. PMC 4468594. PMID 25944105.
  15. ^ Rechenberg, U.; Josten, C.; Klima, S. (2015). "Is it Possible to Enhance our Expert Knowledge from Wikipedia?". Zeitschrift für Orthopädie und Unfallchirurgie. 153 (2): 171–176. doi:10.1055/s-0034-1396207. ISSN 1864-6743. PMID 25874396. S2CID 196457871. (closed access; German, with English abstract)
  16. ^ Yasseri, Taha; Bright, Jonathan (2015-05-05). "Wikipedia traffic data and electoral prediction: Towards theoretically informed models". EPJ Data Science. 5. arXiv:1505.01818. doi:10.1140/epjds/s13688-016-0083-3. S2CID 256241960.