Lua Task 9 - Using Wikidata in Wikipedia

Prerequisite: Lua Task 6 - MediaWiki libraries. This task requires considerable research and independent learning, and is much more difficult than any of the preceding tasks. It is not suitable for beginners to programming, although students who are new to Lua but have experience in other programming languages may be able to produce acceptable work. Some familiarity with Wikimedia projects, especially Wikipedia and Wikidata, will be helpful. It is likely to take a considerable amount of your time, so don't embark on it lightly.

Wikimedia Projects

There are over 300 Wikimedia projects, including local Wikipedias in many languages. A list can be found at meta:Wikimedia projects. Some projects make their contents available to other projects. For example, Wikimedia Commons contains free images, videos and sound files, which are used by all the other projects when they need a media file.

Wikidata

Wikidata is the free database of facts. It can be used by many tools and programs which search and collate information, but it can also be used to provide facts for other projects such as Wikipedia. The information in Wikidata can be accessed by simple calls like {{#property:P19|from=Q47447}}, which gives the result Halifax. Wikidata is designed to be language-independent, so each entry is uniquely defined by an "entity-id" which is the capital letter "Q" followed by a number. Q47447 is the entry for Ed Sheeran, and it can be found on Wikidata at d:Q47447.

Look through the page at d:Q47447. Most of the facts there are part of statements, which give the value for a given property. In a similar way to the entries, properties are identified by the capital letter "P" followed by a number. P19 identifies the property "place of birth", and it has a page on Wikidata at d:Property:P19. So the call {{#property:P19|from=Q47447}} will retrieve the place of birth for Ed Sheeran, which is Halifax (Q826561).

This is adequate for simple cases, such as when the property has a single value, but we often want to display multiple values in a particular way and to have them linked to an existing article on Wikipedia, where possible. There is a Lua library that gives access to the Wikidata database, and it is documented at mw:Extension:Wikibase Client/Lua. You will need to read through that page to get an idea of what functions are available.

Requirements

You will create a function that is similar to the #property call, with the following differences:

  1. It will display multiple values as a list on separate lines;
  2. Each value that has an article on Wikipedia will be linked to that article.

To demonstrate that your function works, you will create a table similar to this:

Wikidata for Ed Sheeran

  Name              Ed Sheeran
  Place of birth    Halifax, West Yorkshire
  Occupation        Singer-songwriter
                    Musician
                    Composer
  Spouse

The values in the second column will be the output of your function for given name (P735), family name (P734), place of birth (P19), occupation (P106), spouse (P26), and should be linked where possible. A simple link is made by placing [[ ]] around the text.

A sitelink is the title of the article on English Wikipedia that corresponds to a Wikidata entry, so a sitelink sometimes includes a disambiguator in parentheses. For example, [[Ed (given name)]] is the article for the name "Ed". We can use that sitelink to create what is called a piped link, like this: [[Ed (given name)|Ed]]. The text before the | is the article title and the text after it is what is displayed. You can get the display text by removing the parentheses, whatever is inside them, and the space that precedes them.
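The display-text rule above can be sketched in plain Lua. This is only one possible approach, not the required solution: the function name makeLink is hypothetical, and the sketch assumes the disambiguator is a single parenthesised group at the very end of the title.

```lua
-- Hypothetical helper: build a wikilink from an article title (sitelink).
-- If the title ends with a parenthesised disambiguator, return a piped
-- link that displays only the part before it.
local function makeLink(sitelink)
    -- Capture the text before a trailing " (...)"; nil if there is none.
    -- %b() matches a balanced pair of parentheses.
    local display = sitelink:match("^(.-)%s+%b()$")
    if display and display ~= "" then
        return "[[" .. sitelink .. "|" .. display .. "]]"
    end
    -- No disambiguator: a simple link is enough.
    return "[[" .. sitelink .. "]]"
end
```

Calling makeLink("Ed (given name)") returns [[Ed (given name)|Ed]], while a title without trailing parentheses, such as "Halifax, West Yorkshire", comes back as a simple link.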

You will need to create similar tables showing that your function works even when data is missing. Include one table each for Ed Sheeran (Q47447) and Richard Burton (Q151973), and for at least two other people of your choice.

You must work in a fresh module sandbox and user sandbox. If I were doing the task, I would use Module:Sandbox/RexxS/Wikidata and User:RexxS/Sandbox/Wikidata.

Hints and tips

Use the function mw.wikibase.getBestStatements( entityId, propertyId ) to retrieve a table from Wikidata. This is what the table looks like for the place of birth (P19) of Ed Sheeran (Q47447):

table#1 {
  table#2 {
    ["id"] = "q47447$B84F18FA-1B8B-48F3-ADEB-1B6F2053B47A",
    ["mainsnak"] = table#3 {
      ["datatype"] = "wikibase-item",
      ["datavalue"] = table#4 {
        ["type"] = "wikibase-entityid",
        ["value"] = table#5 {
          ["entity-type"] = "item",
          ["id"] = "Q826561",
          ["numeric-id"] = 826561,
        },
      },
      ["property"] = "P19",
      ["snaktype"] = "value",
    },
    ["rank"] = "normal",
    ["type"] = "statement",
  },
}

You can generate similar output by putting {{examine |P18 |Q1396889}} into your user sandbox.

If the table is stored in a variable called statementstbl, then statementstbl[1].mainsnak.datavalue.value.id will be the id of the entity for the value, provided that statementstbl[1] exists and its snaktype is "value" (a statement recorded as "no value" or "unknown value" has no datavalue). In Ed Sheeran's case, this is Halifax (Q826561).
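A defensive way to read that id might look like the sketch below. The helper name valueId is hypothetical, and the sample table is a cut-down copy of the dump shown above that keeps only the fields the helper reads.

```lua
-- Hypothetical helper: return the entity id held by one statement,
-- or nil if the statement is missing or has no item value.
local function valueId(statement)
    local snak = statement and statement.mainsnak
    -- Only snaks of type "value" carry a datavalue.
    if not snak or snak.snaktype ~= "value" then
        return nil
    end
    local dv = snak.datavalue
    if dv and dv.type == "wikibase-entityid" then
        return dv.value.id
    end
    return nil
end

-- Cut-down copy of the table shown above (place of birth of Ed Sheeran).
local statementstbl = {
    {
        mainsnak = {
            snaktype = "value",
            property = "P19",
            datavalue = {
                type = "wikibase-entityid",
                value = { id = "Q826561" },
            },
        },
    },
}

local firstId = valueId(statementstbl[1])   -- "Q826561"
local secondId = valueId(statementstbl[2])  -- nil: there is no second statement
```

Returning nil rather than raising an error lets the caller decide what to display when data is missing.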

Use the function mw.wikibase.getSitelink( id ) to get the sitelink for an entity such as Halifax (Q826561), and create a link by surrounding it like this: "[[" .. articlename .. "]]". If there is no sitelink, supply the label instead, using the function mw.wikibase.getLabel( id ). Get this part working first.
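As a rough sketch of this first step, the function below tries the sitelink and falls back to the label. The name linkOrLabel and the optional wb parameter are assumptions of this sketch: wb exists only so the function can be tried outside a wiki with a stand-in table, and on a wiki it defaults to mw.wikibase. Falling back to the bare id when there is no label either is also a choice made here, not something the task requires.

```lua
-- Hypothetical helper: return a wikilink for an entity if it has an
-- article on this wiki, otherwise its plain label.
-- "wb" is a test hook (an assumption); on a wiki, leave it out.
local function linkOrLabel(id, wb)
    wb = wb or mw.wikibase
    local sitelink = wb.getSitelink(id)
    if sitelink then
        return "[[" .. sitelink .. "]]"
    end
    -- No article here: use the label, or the raw id as a last resort.
    return wb.getLabel(id) or id
end
```

Inside a module you would call it as linkOrLabel("Q826561"). Once this works, the simple link can be upgraded to a piped link for titles that carry a disambiguator.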

In case there are multiple values, you'll need to step through statementstbl[1], statementstbl[2], etc. Use for k, v in ipairs(statementstbl), which visits the entries in order, and read v.mainsnak.datavalue.value.id inside the loop. For each value found in the loop, store it as the next value in a table that you have declared to hold the output of the function. At the end of your function, return the table converted to a single string, with separators that produce new lines in HTML (for example <br>). Review mw:Extension:Scribunto/Lua reference manual #Table library for the table.insert and table.concat functions.
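Putting the hints together, a loop of this kind could be sketched as follows. It is an illustration under stated assumptions rather than a model answer: the name valuesList and the wb test hook are hypothetical, it uses ipairs so that values keep their order, it joins the values with <br>, and it makes simple links only (no piped links).

```lua
-- Hypothetical function: all best values of a property, one per line.
-- "wb" is a test hook (an assumption); on a wiki it defaults to mw.wikibase.
local function valuesList(entityId, propertyId, wb)
    wb = wb or mw.wikibase
    local out = {}
    local statementstbl = wb.getBestStatements(entityId, propertyId) or {}
    for _, v in ipairs(statementstbl) do
        local snak = v.mainsnak
        -- Skip "no value" / "unknown value" snaks and non-item values.
        if snak.snaktype == "value"
            and snak.datavalue.type == "wikibase-entityid" then
            local id = snak.datavalue.value.id
            local sitelink = wb.getSitelink(id)
            if sitelink then
                table.insert(out, "[[" .. sitelink .. "]]")
            else
                table.insert(out, wb.getLabel(id) or id)
            end
        end
    end
    -- An empty list gives an empty string, which suits missing data.
    return table.concat(out, "<br>")
end
```

To call something like this from wikitext, the module would still need an exported function that reads the entity and property ids from the frame arguments and passes them on.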