Wikipedia:WikiProject Video games/Newsletter/20161003/Feature
Feature interview: A brief look at Wikidata
Interviewed by Thibbs
This quarter saw the addition of a new WP:VG guide aimed at assisting editors in the use and management of Wikidata. Located at WP:VG/WD, this guide was drafted by ferret and expanded by Izno. In this issue of the WP:VG Newsletter we sit down with both of them to hear their thoughts and to help us understand the concept a little better.
- When did you first hear of Wikidata? When did you first start editing at Wikidata? Do you find it easier or more difficult than editing at Wikipedia?
I started editing Wikidata in February 2013, which was about a half-year after the project became available for public editing. From memory, that's right around the time that Wikidata started gulping inter-language wikilinks for the majority of wikis, in what is now known as "phase 1", though my involvement may have been inspired by The Signpost. I believe I had heard of it earlier, but only in the rare announcement.
Wikidata is definitely easier to edit than Wikipedia, for a couple of reasons:
- The UI is very intuitive, especially for digital natives. The editing process is fairly straightforward, and there's lots of auto-completion to help.
- The help pages are fairly brief but comprehensive just the same.
- There are a large number of gadgets and semi-automated tools.
- The community is quick to help with problems.
- It's like Wikipedia in the olden days, when there was a great interest in "make content" and a lesser one in "make good content" (which is not to suggest that "content" and "good content" differ except in verifiability).
- Probably the biggest part that sucks about editing Wikidata is the addition of references. There is a Phabricator task for Citoid support in Wikidata, and there is a gadget to "duplicate" references, but that's about where "easy" ends. I have some other issues with Wikidata, especially with certain elements of the community, but the direct act of editing doesn't really fall into them. --Izno (talk)
- I heard of Wikidata through enwiki. It would come up occasionally on various talk pages, notably template pages I had on my watchlist. Interactions with Izno and Czar, in particular, led me to research its use with WP:VG. I believe that started in late 2015, and we've worked at it in spurts since then. Most of my editing in Wikidata has been in support of those efforts. Ease of editing depends greatly on understanding the concept. Once you wrap your head around properties, statements and relationships between items, it's pretty straightforward, with an easy-to-use user interface. Like Izno, I find the reference tools to be a sore point, especially when adding lots of Metacritic references. :) -- ferret (talk) 14:20, 3 October 2016 (UTC)
- What is it like editing at Wikidata? Are there Welcome committees and Teahouses like we have at Wikipedia or do editors tend to stick more to themselves? I've heard that there is a Wikidata Game that has been designed to encourage editing Wikidata. Have you heard of it and do you know if it's an educational game like Wikipedia's The Wikipedia Adventure, or whether it's more like gamified editing?
We have an analog to the village pump (in English, that's the project chat) where people tend to gather. Most of the change on Wikidata is driven by solo editors on a mission, and the majority of that effort is (semi-)automated. Most of the collaboration happens when setting up the data model for a particular domain, or when solving a local problem of modeling one specific piece of data on an entity--especially given the disparity of information contained in sources. Sometimes these efforts take place on the project chat page and sometimes they take place on one of the many WikiProjects. Once a data model is set or a problem has been solved, people move forward fairly quickly with the automation.
As for the game, it's definitely more gamification than it is "The Adventure". I think the best analog to "The Adventure" might be the tours we have for users to take (I've never done "The Adventure" nor a tour), though I don't know how up-to-date those are. --Izno (talk)
- Wikidata also has an administrator's noticeboard, much like enwiki. I keep it on watchlist and issues seem to be tended to quickly when reported, including typical tasks like protecting items, performing rollbacks, etc. -- ferret (talk) 14:20, 3 October 2016 (UTC)
- How active is Wikidata.org? One of the key safeguards against vandalism at en.wikipedia is the practice of watchlisting and the vigilant eyes of our many volunteers. How easily detectable is vandalism at a place like Wikidata?
- "Fairly" active? Wikidata has the advantage of being multi-lingual, and having some limited client watchlist integration, so we get a lot of help from the client wikis. Gadgets like ORES help too. There's a lot of database though--some 23 million items, not all of which have the advantage of being inter-language linked--so more help is always welcome. As a result, the routine vandalism (blanking, false values, and that sort of thing) is easy to deal with. I can't comment much on the less-routine--hoaxes and such--but from my 10k-item watchlist I might suggest that there is very little of that kind of vandalism. --Izno (talk)
- My watchlist is not extensive at Wikidata, but it is possible to see the changes on enwiki as well. In your preferences, under the Watchlist tab, there is a check box to "Show Wikidata edits in your watchlist". This will cause your watchlist to show changes to Wikidata items that are linked in your watched articles. While it may not be a perfect implementation, it has helped me see some of the typical drive by vandalism that video game articles get hit with on Wikidata, all while just reviewing my normal enwiki watchlist. -- ferret (talk) 14:20, 3 October 2016 (UTC)
- Wikipedia's page on Wikidata describes it as a knowledge base - essentially a database of structured data. From its name alone, "structured data" sounds very orderly, but in practice it seems to require a huge number of tiny gnomish edits by individuals (any number of which might be inaccurate or susceptible to misuse by biased editors). Can you break the concept down for us a little? What do you see as the primary benefit of the Wikidata system?
- When the phrase "structured data" is used, it refers to the notion that I have established some relation between entities A and B that, with the proper interface, a human could understand, as well as a computer, without applying heuristics for that computer to understand the data. So in this case, we're not referring to what the users are doing, but whether the data can be understood by a computer. As for the second question, probably the biggest benefit Wikidata brings is increasing inclusion of WMF-operated wikis into the Semantic Web--which enables a whole chunk of interesting things. I'll defer to Wikidata:Introduction for what those things might be. :D --Izno (talk)
- I'll defer to Izno on this one. :) -- ferret (talk) 11:33, 10 October 2016 (UTC)
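To make the "relation between entities A and B" idea concrete, here is a minimal sketch of statements stored as subject/property/value triples. It is an illustration only, not how Wikidata stores data internally: the item identifiers (`Q_GAME`, `Q_VIDEO_GAME`, `Q_PLATFORM_A`, `Q_PLATFORM_B`) are hypothetical placeholders, P31 is assumed to be "instance of", and P400 is assumed to be the platform property.

```python
from typing import List, NamedTuple


class Statement(NamedTuple):
    subject: str  # the item the statement is about
    prop: str     # the property relating the subject to the value
    value: str    # another item, or a literal value


# Hypothetical placeholder identifiers; real items have numeric Q-ids.
# P31 is assumed to be "instance of" and P400 the platform property.
STATEMENTS = [
    Statement("Q_GAME", "P31", "Q_VIDEO_GAME"),
    Statement("Q_GAME", "P400", "Q_PLATFORM_A"),
    Statement("Q_GAME", "P400", "Q_PLATFORM_B"),
]


def values_of(statements: List[Statement], subject: str, prop: str) -> List[str]:
    """Return every value recorded for one property of one item."""
    return [s.value for s in statements if s.subject == subject and s.prop == prop]


# A program can answer "which platforms?" by exact lookup, with no need
# to guess the answer from free-form article text.
print(values_of(STATEMENTS, "Q_GAME", "P400"))
```

The point of the sketch is that the lookup needs no heuristics: the relation is explicit, so the same query works for any item that uses the property.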
- It seems like the idea here is that down the road WP:VG's "Infobox video game" templates should draw their content from Wikidata, which will act as a central repository of this kind of data that can then be used in many other contexts (other Wikipedias, Google Knowledge Panel, etc). Why can't Wikidata fill in empty entries by drawing its information from our existing "Infobox video game" templates (many of which are already filled in)? Has nobody bothered to write programs to automate this, or is the manual re-entry of all of this info intended to act as a double check?
- Yes, that's the plan. Wikidata has, in the past, pulled information from the wikis, especially the Wikipedias, to populate its data. This has slowed in recent times, owing to Wikidata's calcified requirements for sourcing, a lack of contribution in certain domains, and no active bots in others. That said, you don't have to manually re-enter this information (well, sometimes)! There exists a tool called HarvestTemplates, though I don't know exactly how it works. --Izno (talk)
- Yes. Bots have been used in the past to import infobox data to Wikidata. But this creates an issue for some people who view it as using Wikipedia as a source, i.e. Wikidata sourced by enwiki, which then sources from Wikidata... This is where reference tools sorely need improvement. Additionally, it can be complex work to handle some of the more advanced fields, such as Infobox Video Games' release field. That field currently is not "Wikidata enabled" due to the many rules surrounding its format. -- ferret (talk) 14:27, 3 October 2016 (UTC)
- From the names of the property variables (P136, P400, P444, etc) it sounds like a new property is defined for every new attribute possible for every new class of object. In other words a concept like "review score" may be a property unto itself distinct, let's say, from related concepts like "polling numbers" (for a political race) or "race results" (for a horse race). Is that accurate? Let's say that WP:VG editors agreed that "review score" wasn't narrowly enough tailored to the needs of WP:VG articles and that a "video game review score" property was needed. How would this be accomplished? Can first-time Wikidata editors simply create new properties, can this only be accomplished by requesting an addition from the admins, or is there a pre-existing master list or plan like the one defining the structure of the Dewey decimal system?
Not quite the case. Properties are what establish a relationship between two entities, or an entity and some numerical value, and so on. Those properties may be used across multiple domains, if it makes sense to do so across those domains; in some cases, they are used only in one domain, or in all domains. For example, with one large exception of a domain, the use of the "instance of" property establishes the kind of entity you're looking at (and it's based on well-established concepts in the Semantic Web). Another example which is instead fairly specific to video games is the Steam ID, which is the number assigned to a video game published on the Steam service. Properties run the gamut from generic to specific.
To answer your question about adding properties, there is the property proposal process. This process was instituted not long after Wikidata phase 2 began (which was when Wikidata became more than an inter-language link host and started to classify entities), when the Wikidata community decided (with evidence to do so) that the process of creating properties should be restricted--prior to that time, anyone could create properties.
What is missing from the question: Do properties change? Yes, they sometimes do. Almost always, it is to increase their scope (or delete them). This change usually occurs because some domain has had its model fleshed out, a property was identified as being "the way" to model that information, and so a discussion was held and subsequently assessed to see whether the property could support that use case.
To answer the hypothetical "video game review score" question, I might suggest that, were it to run the gamut of the community, the proposed property would not be able to be shown to differ significantly from the currently existing "review score". Sometimes this leaves certain users in limbo as to what to do--in which case an RFC can be held.
Overall, the property proposal process works to ensure the ontology of Wikidata is preserved, even with new users and domains to be modeled. We've got most of the pieces in place for most relationships, and so usually the trouble is identifying which pieces those are. :D --Izno (talk)
- As Izno noted, there is a property proposal process for creating new properties. We have actually had to request several for WPVG, including the properties for archive-date (to use with citations that need archive-url) and game artist (for the artist parameter of the infobox). -- ferret (talk) 14:27, 3 October 2016 (UTC)
- Are there any internal checks and safeguards in place? And how flexible are the properties in terms of the values they allow? Let's take the example of review scores. If I were to fill in a review score for Famitsu of 150 (normally Famitsu has a score from 1 to 40 or less commonly from 1 to 10), would such an entry raise any automatic flags? Or let's say that a Nintendo Power score (Nintendo Power shut down in 2012) was assigned to Pokémon Go (a 2016 game). Would an entry like that get caught by the system or would it require human review to detect?
Wikidata is deliberately designed to accept the sometimes-insanity of the world (and certainly of the fictional), where a census may count cats and a city may belong to 6 countries at a time... or none at all! So to answer the question in a word: "flexible". That said, each property has a datatype, meaning it can only accept certain kinds of data. An "item datatype" property will only accept other Wikidata items; a "number datatype" property will only accept hard-coded numbers; and so forth. The correct type is often a point of discussion at the property proposal. Besides those limits, there are a number of "soft" constraints which provide a bot with the information to produce a series of reports of presumed "bad" values. These are the most basic of checks, however. In this example, no, nothing of that sort would be caught in the filters. One area where Wikidata currently falls short, and needs to improve, is the ability to detect "odd" values; a number of solutions have been proposed but nothing implemented. It's a point of future discussion. --Izno (talk)
- Basic format rules are in place, but they are global and not specific to the reviewer. I.e. the property should only contain numbers, /, or A-F grade scores. However, the UI will allow you to enter anything. Wikidata has a reporting bot that notes when an item has a property with invalid formatting. Other errors require human detection. There is nothing to automatically catch issues like referencing a 2012 magazine for a 2016 game, any more than there would be on enwiki unless something like Cluebot caught it. -- ferret (talk) 14:27, 3 October 2016 (UTC)
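The difference between a global format rule and a reviewer-specific sanity check can be sketched as follows. This is a hedged illustration, not the actual constraint or the reporting bot: the regular expression, the optional `+`/`-` on letter grades, the function names, and the placeholder item ids are all assumptions made for the example.

```python
import re
from typing import List, Tuple

# Assumed format rule: a number, optionally "/" and a second number,
# or a letter grade A-F (the optional +/- is an extra assumption).
SCORE_FORMAT = re.compile(r"(\d+(\.\d+)?(/\d+(\.\d+)?)?|[A-F][+-]?)")


def matches_format(value: str) -> bool:
    """Check only the shape of the value, not whether it is plausible."""
    return SCORE_FORMAT.fullmatch(value) is not None


def format_report(statements: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return the (item, value) pairs a format-only report would flag."""
    return [(item, value) for item, value in statements if not matches_format(value)]


# Hypothetical review-score statements on placeholder items.
scores = [
    ("Q_GAME", "38/40"),
    ("Q_GAME", "150/40"),        # well-formed, but implausible for a 40-point scale
    ("Q_OTHER", "pretty good"),  # malformed
]

print(format_report(scores))
```

Only the malformed value is reported. The out-of-range "150/40" passes because the rule knows nothing about any particular reviewer's scale, which is why such errors still need human review.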
- Outside of WP:VG, especially in areas like WikiProject Classical music and Wikipedia:WikiProject Opera, the concept of infoboxes is a controversial one. The issue has arisen time and time again at Wikipedia's "drama boards" and has come to be known as "The Infobox wars". There are still occasional flare-ups. Does the Wikidata model provide some kind of a path toward compromise? If not, which side (Team "infoboxes should be used whenever possible" or Team "infoboxes should only be used when necessary or not at all") does Wikidata support?
- Wikidata just holds the data; it's up to the wikis to use it. And while the Wikidata community would like Wikidata to be used, it's up to those wikis' communities to decide how to use it. That said, the next phase of Wikidata development is "automated list development", for which use cases are presently being solicited for understanding, so this is another point about which the WMF-operated wikis will have to decide. --Izno (talk)
- I don't think Wikidata solves the infobox issue. It is a source for data, and doesn't really affect the basic arguments around infobox usage. While it can improve consistency across the infoboxes of many projects, if adoption is widespread, it won't ever address concerns that infoboxes are simply awkward or cumbersome. Wikidata is a useful tool for those that support infoboxes, but it is unlikely to change those who oppose them. -- ferret (talk) 11:33, 10 October 2016 (UTC)
- Are there any specific WP:VG-related areas at Wikidata.org that are particularly in need of support and assistance? What would be the best way to start editing Wikidata?
- Pick your favorite video game or series, navigate to its Wikidata item page, and add data (and sources)! Specific needs include the more-human side of video games--developers and designers--as well as fully expressing data about a video game's release. The infobox currently includes only certain information about a video game's release, when that may not be the entire story! --Izno (talk)
- Our first effort here was the review score template (the infobox came later). I see a lot of benefit in using Wikidata for Metacritic and other aggregators on games that have series. This allows the score to be consistent between the series article and the main article. If it changes, one update handles both articles, keeping the score (and reference) in sync. Setting this up requires a few steps, such as ensuring that the Wikidata item for the series lists all of its parts. Once set up, though, you can have something like Fallout (series) where it is no longer necessary to put any local data values in the review score template. -- ferret (talk) 11:33, 10 October 2016 (UTC)
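The "one update handles both articles" idea can be sketched with a toy data store. This is only an illustration under stated assumptions: the item ids, labels, aggregator name, and scores are made-up placeholders, and the two functions stand in for what a review score template might do on a game article and on a series article; they are not the template's actual code.

```python
from typing import Dict

# Toy stand-in for the central data store; every id, label, and score
# here is a hypothetical placeholder, not real data.
ITEMS = {
    "Q_SERIES": {"has_part": ["Q_GAME_1", "Q_GAME_2"]},
    "Q_GAME_1": {"label": "Game One", "scores": {"Aggregator": "80/100"}},
    "Q_GAME_2": {"label": "Game Two", "scores": {"Aggregator": "70/100"}},
}


def game_scores(items: dict, game_id: str) -> Dict[str, str]:
    """What the game's own article would display."""
    return dict(items[game_id]["scores"])


def series_scores(items: dict, series_id: str) -> Dict[str, Dict[str, str]]:
    """What the series article would display: one row per listed part."""
    return {
        items[part]["label"]: game_scores(items, part)
        for part in items[series_id]["has_part"]
    }


# A single central update...
ITEMS["Q_GAME_1"]["scores"]["Aggregator"] = "85/100"

# ...is reflected in both views, with no local values to keep in sync.
print(game_scores(ITEMS, "Q_GAME_1"))
print(series_scores(ITEMS, "Q_SERIES"))
```

The series view only works if the series item lists all of its parts, which is the setup step mentioned above: a game missing from `has_part` would simply not appear in the series table.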
- Do you have any other advice (not listed above or in the guide) for WP:VG editors hoping to move into this new area?
- Wikidata is easy for video gaming enthusiasts. So, bug me later when you start. ;D ALSO Ferret teechs u 2 WIKIDATA![1] --Izno (talk)
- Definitely read the Wikidata guide I put together for the VG project. :) Izno has done some proofing over it as well. If you find anything confusing feel free to contact me, and I'll work to make the guide clearer. -- ferret (talk) 11:33, 10 October 2016 (UTC)
- ^ A reference to Alamo of World of Warcraft fame.