Report on the Biodiversity Information Standards Conference 2023 (TDWG2023)

Siobhan Leachman (User:Ambrosia10) ORCID: 0000-0002-5398-7721 Licence: CC0 DOI: 10.5281/zenodo.10005437

Background

Wikimedia Aotearoa New Zealand provided me with funding to attend the Biodiversity Information Standards Conference (TDWG 2023) in person. More about TDWG can be found at https://www.tdwg.org/. The conference was held in Hobart from the 9th to the 13th of October and was hosted by the Atlas of Living Australia https://www.ala.org.au/ and the National Research Collections Australia https://www.csiro.au/en/about/facilities-collections/collections. The conference website is https://tdwg2023.zohobackstage.com.au/TDWG2023.

Pre-Conference Engagement

Conference paper and slides

Prior to the conference I had been meeting and collaborating with a group of internationally affiliated natural history institution professionals on data standards for research expeditions. This informal working group was formed to discuss standards and share best practices and recommendations regarding terminology, data modelling and contextualisation of research expeditions. See Wikidata WikiProject Research Expeditions https://www.wikidata.org/wiki/Wikidata:WikiProject_Research_expeditions for more information on this collaboration and our aims.

We created a TDWG2023 presentation titled Modelling Research Expeditions in Wikidata: Best Practice for Standardisation and Contextualisation intended to update the wider community on the formation of this group, explain our aims and efforts to date and invite other participants to join in discussions. We also created slides for that presentation. My colleague Dag Endresen, Global Biodiversity Information Facility (GBIF) Node Manager for Norway, University of Oslo Natural History Museum, presented our paper at the conference.

Some of my collaborators on the Women Genera research project (see this poster for background) were also attending the conference, either in person or online. In the lead-up to the conference I had conversations, particularly with Sabine von Mering, coordinating how we would share topics of interest from the conference with these collaborators. I will be reporting back to this group on the 18th of October.

Social media

As I was keen to engage with the wider TDWG community before attending the conference, I joined the TDWG 2023 Slack channel. This allowed discussions between both virtual and physical attendees. I also posted on my personal Twitter and Mastodon accounts, as well as my LinkedIn profile, to keep any interested followers up to date about my upcoming attendance at the conference.

Pre-Conference meetings and discussions

I kept the Aotearoa New Zealand Wiki meetup and the Wellington Wiki meetup informed about the event, including my application for funding, my travel planning and my work on the Research Expeditions presentation and with the working group.

I also had discussions regarding the conference with fellow Aotearoa New Zealand Wikimedia member, Wiki editor and Te Papa Digital Channels Outreach Manager Lucy Schrader, as she was also attending the conference.

We had these discussions in the lead-up to the flight out of New Zealand. Lucy attends the Wellington Wiki Meetup, so I was well aware of her work with Te Papa, but I also wished to ensure that her work with the Te Papa Wiki project, and particularly its documentation, became more widely known in the Biodiversity Standards community.

Lucy also explained that Leanne Elder https://orcid.org/0000-0001-5244-9780 from Manaaki Whenua Landcare Research would be attending the conference. I was keen to catch up with Leanne, as I wanted to meet her in person for the first time. I also wanted to discuss my current workflows for reusing Manaaki Whenua species images in Wikimedia Commons, Wikipedia and Wikidata.

I arrived in Hobart on Sunday the 8th. TDWG members were on the bus with me travelling from the airport to various hotels. During the 40-minute bus ride I had a conversation with Vince Smith https://www.nhm.ac.uk/our-science/departments-and-staff/staff-directory/vincent-smith.html, Research Leader and Head of Digital, Data and Informatics at the Natural History Museum, London. We caught up on his most recent outreach work for the museum and also discussed the Research Expeditions working group as well as the Women Genera research work I had been contributing to.

After I had checked into my hotel I met Nicole Kearney https://www.linkedin.com/in/nicole-kearney-22904925/, the manager of Biodiversity Heritage Library Australia. We spent the afternoon of the 8th together discussing our mutual projects, updating each other on our personal and professional lives and discussing the upcoming TDWG conference as well as the wider Biodiversity Data Standards community.

This was an invaluable chance to learn more about the current state of BHL in Australasia, as well as the progress being made to digitise biodiversity content across the world. I also learnt more about BHL Australia’s exploration of creating a Wikipedian in Residence post. Nicole is particularly interested in creating a time-limited post with the aim of increasing the quality and coverage of articles on a specific list of species. We discussed how the post might work and strategised about which types of species might benefit from better quality articles in Wikipedia. We also discussed other outcomes BHL might want from such a post. We also discussed Auckland Museum’s contributions to BHL; during the conference, Auckland Museum released this blog post on their BHL contributions: https://blog.biodiversitylibrary.org/2023/10/flora-fauna-photography-five-years-of-digitising-content-for-bhl-in-aotearoa.html

We later joined Simon Sherrin from the Atlas of Living Australia for coffee. We discussed his work on national species lists at the Atlas of Living Australia, for which he is the technical lead. It was very interesting to discuss the work of reconciling multiple datasets and uses of species names, and the issues raised by the use of synonyms across datasets. We also discussed the Catalogue of Life, and I explained my workflow for taxonomic issues: raising an issue on the Catalogue of Life GitHub data issues platform. Raising these issues brings me into contact with the people working to resolve taxonomic issues in the Global Biodiversity Information Facility (GBIF).

In the case of New Zealand endemic moths, this is Donald Hobern https://www.linkedin.com/in/donald-hobern-355b6012/, who is leading a project to update the Natural History Museum, London database LepIndex via the platform TaxonWorks. LepIndex is one of the more influential databases for Lepidoptera and still guides Wiki editing, but unfortunately the dataset is currently out of date. This led on to a more general discussion about resolving conflicting names for endemic species and the difficulties of reconciling Australian and New Zealand taxonomy with international taxonomy.

Welcome reception

I attended the TDWG2023 Welcome Reception at the Tasmanian Museum and Art Gallery. There I caught up with many of the attendees whom I had met at biodiversity-related conferences over the last few years, mainly virtually.

My main and successful objective was to meet up with Dag Endresen who was presenting our paper on Research Expeditions and check in with him regarding presentation preparation and any last minute edits. Dag had this well in hand.

I also had the opportunity to finally meet, in person, Leanne Elder, the digitisation lead at Manaaki Whenua Landcare Research. We had a very detailed discussion about how Manaaki Whenua funds and prioritises digitisation. She explained how she will be collaborating with the Manaaki Whenua lepidopterist Robert Hoare to reinstate and update his large moths project on the Manaaki Whenua website. I was very much in favour of this, as this project was one of the main inspirations for my endemic moth Wiki project. I was also very gratified to hear that funding would be allocated to more moth digitisation and image work. We had a general discussion about uploading data (including images) to GBIF where taxonomy is an issue, and how to resolve this. Again, my “workaround” of raising a GitHub issue with the Catalogue of Life was discussed, and this conversation laid the groundwork for me introducing Leanne to Donald Hobern during the conference.

I had a discussion with David Fichtmueller https://www.bgbm.org/en/staff/david-fichtmuller from the Botanic Garden and Botanical Museum (BGBM) Berlin, generally catching up with him and his work and discussing the topics of interest at the conference.

I had a discussion with Shelley James (Collections Manager at Western Australian Herbarium and TDWG conference co-organiser https://www.linkedin.com/in/shelley-james/) about her project to visit the Cambridge University Herbarium. This herbarium is currently managed by another contributor to our Women Genera project, Lauren M. Gardiner https://orcid.org/0000-0002-8843-0317. Shelley is obtaining funding to digitise Australian botanical specimens in the Cambridge Herbarium collection, particularly type specimens. She then intends to upload the images into DigiVol, have the labels transcribed by the public and curated by Australian botanical experts, and, once complete, upload the data into GBIF. I discussed the possible reuse licensing of the images, as I was keen to ensure they were openly licensed to enable reuse in the Wikiverse. Shelley assured me she would be advocating for the use of open licences.

ALA dinner invitation

After the welcome reception, Nicole Kearney arranged for me to receive an invitation to a dinner with staff and affiliated people from the Atlas of Living Australia (ALA) https://www.ala.org.au/. As a result of this generous invitation I had the opportunity to network with both ALA staff and members of the wider Biodiversity Data Standards community. ALA is both a consumer of data from, and a provider of data to, the Wikiverse.

ALA direct ingest of Wikipedia articles

Of particular interest was a discussion I had with Peggy Newman https://www.linkedin.com/in/peggydnewman/, the data manager at the Atlas of Living Australia, about the recent news that ALA is now directly ingesting appropriate English Wikipedia species articles onto their site. Prior to this, ALA had ingested information from the Encyclopedia of Life website, which may also have contained information sourced from English Wikipedia. ALA decided to ingest English Wikipedia articles directly in order to have more control over both the linking to Wikipedia and the ability to refresh those articles.

I also talked with Ely Wallis https://people.csiro.au/W/E/ely-wallis, the conference co-organiser and TDWG chairperson, about the planning of next year's joint SPNHC & TDWG conference https://spnhc.org/update-on-the-proposed-joint-2024-conference-with-tdwg-in-okinawa-japan/. She is already in the midst of organising the 2024 conference and anticipates more attendees from the Asia Pacific region as a result of its location in Okinawa, Japan. Organisers also anticipate a significant number of presentation submissions for this joint conference from around the world. We discussed how important it is that these two overlapping but diverse communities engage with each other to understand the challenges and opportunities each faces.

TDWG 2023

Overall impression of the conference

This conference was an amazing opportunity to reconnect and enrich existing relationships with attendees and participants in the biodiversity information standards community, and also to network and make new connections with fellow attendees. It gave me the opportunity to meet New Zealanders I had yet to meet in person, such as Leanne Elder, as well as attendees with whom I had previously engaged only online, such as David Iggulden and Pieter Huybrechts. I also had the opportunity to learn from the many presentations given.

The conference also gave me the chance to advocate for more engagement with various Wiki projects but most particularly Wikidata. It also gave me, along with Dag Endresen, the opportunity to help raise awareness of our Wikidata WikiProject Research Expeditions https://www.wikidata.org/wiki/Wikidata:WikiProject_Research_expeditions

I was able to begin organising potential outreach events, arrange individual support for Wiki engagement efforts by particular attendees and advocate for the use of open licensing in certain projects undertaken by attendees. Examples included Alison Vaughan of Royal Botanic Gardens Victoria suggesting that another Wikidata training session with her staff would be timely, Elspeth Haston of Royal Botanic Garden Edinburgh supporting my continued outreach efforts to help a technician at the Edinburgh Botanic Garden learn to edit Wikidata, and a conversation with Shelley James in which I emphasised my hope that her imaging project with the Cambridge Herbarium would result in openly licensed images.

I also co-led a session at the “unconference” portion of the conference explaining to attendees how to propose a Wikidata property. This was highly useful as it elicited interest from multiple institutions and led to further conversations about donating data to Wikidata.

What follows is a summary of just some of the sessions that interested me or resonated with me. I have also included some of the many conversations I had with attendees and the networking opportunities I took advantage of during the conference.

Plenary Sessions

Jess Melbourne-Thomas - Monday

https://en.wiki.x.io/wiki/Jessica_Melbourne-Thomas

Jess is from CSIRO https://people.csiro.au/m/j/jess-melbourne-thomas and presented on standardised, large scale ecosystem assessment for the southern ocean and the underpinning role of biodiversity data.

She discussed her work and the massive effect Antarctica and the Southern Ocean have on the earth's climate. She presented on the effects of climate change on the Southern Ocean, how researchers obtain their results, and how they engage with policy makers. The key takeaway from Jess’s talk, as expressed by Nicole Kearney, is that the Southern Ocean may (just) survive a warming of 1.5 °C, but things get dire at 2 °C, and the ecosystem may never recover from the species losses and ice melt at 2 °C+.

Maui Hudson - Monday

Maui is from the University of Waikato https://profiles.waikato.ac.nz/maui.hudson and presented on Recognising Indigenous Provenance in Biodiversity Records and Traditional knowledge.

He discussed Traditional Knowledge Labels and Biocultural Labels and how these can be used to transform data infrastructure to recognise indigenous provenance. He used the work he has undertaken with Manaaki Whenua Landcare Research to show how adding Traditional Knowledge Labels to their data has helped ensure that the rights of Māori over these data are better recognised.

Manaaki Whenua Landcare Research, in consultation with the appropriate communities, has added these labels into their data management system. They map specimens by geolocation to work out where the specimens come from. They then consult the iwi or hapū of that location regarding those specimens. The iwi or hapū decide which Traditional Knowledge and/or Biocultural Labels should be placed on the specimen records. Manaaki Whenua then updates its system so the label sits on the record.

Hudson explained that these labels are not legally binding. They sit separately from other legal instruments, for example Creative Commons licences. However, he emphasised that they are a way to raise awareness of the cultural status of these records and data, and that they help encourage conversations and engagement with the indigenous communities from which these specimens were collected.

In a later discussion, Maui expressed interest in learning more about Wikidata, but as he was leaving halfway through the conference, we made plans to meet at a later date.

Tim Sherratt - Tuesday

https://timsherratt.org/

Tim is a historian and hacker, and is well known throughout the GLAM community for his digital humanities research and for creating digital tools that encourage engagement with, research into, and visualisation of data relating to GLAM collections. He is particularly well known for his research into Trove content using Jupyter Notebooks, a web-based interactive computing platform.

I had previously met Tim and was extremely excited to watch him present to a community that I believe needed to be exposed to the potential of this type of research and engagement with their data. Tim emphasised his open approach and gave inspiring summaries of the type of work that can be undertaken via Jupyter Notebooks. He also explained where to find guidance on using Jupyter Notebooks, encouraging those in the audience to experiment.

He also discussed his engagement with the Australian Research Data Commons (ARDC) https://ardc.edu.au/ and how they may be looking to expand their engagement with other organisations.

I had the opportunity to have a conversation with Tim both prior and subsequent to his presentation and I enthusiastically advocated for him and ARDC to consider engaging with the Biodiversity Heritage Library (BHL). BHL has an API, a huge corpus of digitised literature from multiple centuries and also in multiple formats including handwritten content in multiple languages. I also reached out to JJ Dearborn https://www.linkedin.com/in/jjdearborn/, the BHL Data Manager, updating her about this conversation and encouraging her to reach out to Tim to encourage this type of engagement.

Arthur Chapman - Friday

Arthur Chapman https://www.linkedin.com/in/arthur-chapman-35288812, Australian Biodiversity Information Services, gave a fabulous overview of the historical and current digital landscape of Australian biodiversity data. It was extremely interesting and educational to hear about the pivotal role Arthur has played in ensuring Australian biodiversity data was shared and reused widely, and about the progress made from the pre-World Wide Web era to the digital technologies used nowadays to share biodiversity data.

Monday Sessions and Engagement

Morning

Keynote by Jess Melbourne-Thomas (See above in Plenary Sessions section)

Keynote by Maui Hudson (See above in Plenary Sessions section)

After the two keynote presentations at the start of the morning there were several other papers I particularly want to highlight.
Do our Project Delimitations Display a Continued Legacy of Colonialism? Towards an independent Flora of Cambodia. តើការកំណត់ព្រំដែននៃគម្រោងរបស់យើងបង្ហាញពីការបន្តនៃអាណានិគមនិយមទេ? ឆ្ពោះទៅរករុក្ខជាតិឯករាជ្យរបស់កម្ពុជា។ by Visotheary Ung
Of Monday's presentations, I found this one by Visotheary Ung https://orcid.org/0000-0002-4049-0820 particularly impactful. It addressed the effect of colonisation on a country, particularly in relation to contributing to and accessing biodiversity knowledge, and how the project Ung is involved with intends to help rectify this. She discussed Cambodia, a developing country and a biodiversity hotspot, and how France’s colonisation of the wider Indo-China area has affected it. France’s influence in this area also affected biodiversity publications. For example, the General Flora of Indo-China began publication in 1907, continued until 1951 and was edited by French editors. In 1960, this flora was reinitiated as the Flora of Cambodia, Laos, and Viêt-Nam. Since 2013, it has been jointly edited by the Museum National d'Histoire Naturelle and the Royal Botanic Garden, Edinburgh and has been produced in English and French.

The Flora of Cambodia project intends to compile an up-to-date understanding of Cambodia's plant life, including an inventory of collections housed at the Museum National d'Histoire Naturelle and of collections whose data can be obtained via the Global Biodiversity Information Facility (GBIF) and other online sources. The ultimate goal is to produce a comprehensive flora of Cambodia in multiple languages, including Khmer, the official and national language of Cambodia. This project seeks to empower both Khmer botanists and the broader local community, allowing them to reclaim and cherish their intrinsic knowledge of native plants.

Documenting Biodiversity in Underrepresented Languages using Crowdsourcing by Mohammed Kamal-Deen Fuseini, Researcher (User:Dnshitobu) and Wikimedian User Agnes Abah, Volunteer Wikimedia, Wiki Mentor Africa.

Unfortunately, due to technical difficulties, the presenters were unable to present their paper online.

Afternoon

The action-packed afternoon had multiple presentations of interest to me, including the following:
Can Biodiversity Data Scientists Document Volunteer and Professional Collaborations and Contributions in the Biodiversity Data Enterprise by Paul Flemons
Paul Flemons https://www.linkedin.com/in/paul-flemons-481b1512/ gave a presentation on obtaining better attribution and credit for citizen science work. He emphasised the value of citizen science and the need for respect as well as attribution for the work of citizen scientists. He used the FrogID project https://www.frogid.net.au/about-frogid as an example of a successful citizen science project that has collected important research data which would otherwise have cost institutions significant funding to obtain. He emphasised that individuals, the platforms that enable citizen science, and citizen science as a discipline are not sufficiently recognised. The Darwin Core standard uses the term “recordedBy” to attribute work. He argued this is not effective for citizen science and encouraged the development of data standards that recognise the contributions of individuals, platforms and citizen science in general.

Global Lepidoptera Index: Progress and Issues with Developing a Comprehensive Global Checklist of Moths and Butterflies by Donald Hobern
Donald Hobern https://www.linkedin.com/in/donald-hobern-355b6012 gave a very interesting presentation on developing a global checklist of moths and butterflies. As an active Wikipedia editor of New Zealand endemic moth articles, I have made a very small contribution to this effort. If the Catalogue of Life shows taxonomic issues for moth species I am writing Wikipedia articles on, I raise a GitHub issue with the Catalogue of Life, which normally ends up in front of Donald to work through. Once an issue is resolved and updated in the Catalogue of Life, this has a flow-on effect in GBIF and helps ensure that the taxon names in data uploaded to GBIF by institutions such as Auckland Museum and Manaaki Whenua Landcare Research, as well as by iNaturalist, resolve correctly.

Safeguarding Access to 500 Years of Biodiversity Data: Sustainability planning for the Biodiversity Heritage Library by David Iggulden
This was a particularly pertinent presentation, as it emphasised that the current funding model for BHL may not be sustainable and that alternatives need to be considered. Given how pivotal BHL is for Wiki projects (BHL provides a rich source of information, data and images), I was extremely interested to hear how this issue might be addressed. David mentioned that the Smithsonian Libraries and Archives is undertaking a BHL 20th Anniversary Assessment. If you would like to participate in this assessment by providing feedback as a BHL stakeholder, you can provide your contact information at https://s.si.edu/bhlassessment. I have signed up to ensure my voice is heard.

Monday conversations and engagement

I introduced Leanne Elder from Manaaki Whenua Landcare Research to Donald Hobern after Donald, in his presentation, requested help from taxonomists around the world to update LepIndex. Leanne is keen to help sort out New Zealand moth taxonomy in the Catalogue of Life and said she intends to discuss this with Robert Hoare, the lepidopterist at Manaaki Whenua Landcare Research.

I had a discussion with Bob Mesibov https://en.wiki.x.io/wiki/Robert_Mesibov. He had reviewed our dataset for the Women Genera project and had given very helpful feedback on how to improve it. I thanked him profusely for his help, and we discussed the Women Genera group’s process for creating the dataset and how the dataset was being improved as a result of his feedback.

I met David Iggulden https://www.kew.org/science/our-science/people/david-iggulden in person for the first time. He is the Head of Data & Digital in the Library, Art & Archives at the Royal Botanic Gardens, Kew and current Chair of the BHL Members’ Council. We had a discussion about BHL and about his presentation. We also discussed Kew’s current herbarium digitisation project. I asked whether there were similar efforts for Kew’s library and archives and learned, to my disappointment, that the current digitisation project is prioritising herbarium sheets only.

I also talked with Alison Vaughan https://www.linkedin.com/in/alison-vaughan-59a80333/, Manager Collections, National Herbarium of Victoria. We had an amazing discussion relating to the research expeditions project I’m participating in. Her herbarium is involved in research about the Hann expedition, which is being undertaken by the direct descendants of the Aboriginal guide, the geologist and the naturalist involved in that expedition. See https://openresearch-repository.anu.edu.au/handle/1885/279715. This led to a discussion about how the research expeditions project might help connect, document and support this type of research. We covered such topics as getting recognition for the indigenous guide, improving the map of the expedition using the descriptions of places in the field notebooks, and the researchers obtaining current indigenous knowledge about landscape descriptions in order to improve the location data of the specimens collected.

I talked with Paul Flemons from the Atlas of Living Australia and WeDigBio (https://wedigbio.org/) about citizen science and about his attending the CitSciOz conference in November, at which I will be giving a keynote. Paul is also on the WeDigBio advisory board with me, so it was wonderful to have a conversation with him in person. We discussed his presentation and how he wants to get the TDWG community thinking about how to credit and acknowledge citizen scientists and their contributions.

Deb Paul (Biodiversity Informatics Community Liaison for the Species File Group https://www.linkedin.com/in/debbie-paul-64b1bb1a) and I discussed round-tripping of data and how this needs to be built into grant applications to ensure data isn’t just created, enriched and cleaned but also pushed out to the communities that need it. Funding applications need to ensure the researchers have the funds to cover the costs of integrating the new data into existing systems. It was an interesting conversation that inspired me to think about grant funding more generally, and about grants including funding to ensure any pertinent data is added to Wikidata.

Dinner was with Shelley James, Deb Paul, Julia Percy-Bower, Lucy Schrader, Leanne Elder, Alexander Amies and Maui Hudson. We had wide-ranging discussions, including on data.govt.nz, licensing and archiving of data, and herbarium imaging and the equipment used.

Tuesday Sessions and Engagement

Morning

This morning was another densely packed period of presentations, many of which were relevant to my work or of interest to me. Just some of the highlights included:

Modelling Research Expeditions in Wikidata: Best Practice for Standardisation and Contextualisation by Dag Endresen, GBIF Node Manager for Norway, University of Oslo Natural History Museum.
Abstract for presentation: https://biss.pensoft.net/article/111427/
Slides: https://docs.google.com/presentation/d/1MibCiFMq27UU-sfg9zhOSuiMv8kn804v/edit?usp=sharing&ouid=102472176143891481589&rtpof=true&sd=true
Obviously this presentation was of interest to me as I helped co-author the abstract and the slides. Dag did a fabulous job presenting our paper at the conference. Several of the attendees expressed interest in joining the group or following our work. After the presentation and during question time Alison Vaughan, Manager Collections, National Herbarium of Victoria, stated she would be contacting me in the near future to undertake more training on Wikidata with her staff.

Celebrating BHL Australia through the Eye of the (Tasmanian) Tiger by Nicole Kearney, BHL Australia Manager
Nicole’s presentation was inspirational. It covered her work on expanding the impact of BHL Australia throughout Australia, gathering content from multiple institutions and organisations and digitising it, ensuring everyone has access to Australian biodiversity knowledge. She discussed the impact of her work on improving the breadth and depth of works covered in BHL, and of generating retrospective DOIs for historic literature, which ensures that descriptions of species can be connected to the taxonomic names of those species. She called for more participation to ensure biodiversity literature can be found where most folk look for it, which she emphasised is normally via a Google search and/or Wikipedia.

On a BiCIKL to Wikidata: Harmonizing the chaotic universe of natural history collectors by Mathias Dillen, Biodiversity Data Scientist, Meise Botanic Garden
This particular presentation was extremely interesting and very inspiring, particularly in relation to Mathias’ use and enrichment of Wikidata. Mathias has been doing a lot of work on collectors and identifiers of specimens, attempting to ensure they are linked to their ORCID or Wikidata item and then using that connection to enrich the links to the specimens they collected and identified. Of particular interest to me was Mathias’ work to bulk upload “collection items at” statements in Wikidata, linking collectors of specimens to the institutions that hold those specimens. I had not been adding those statements in detail, as I had assumed that, as a result of my work with Wikidata and Bionomia https://bionomia.net/, this information could be added in bulk at a later date. I was so pleased to see that this belief was accurate.

Connecting the Dots: Aligning human capacity through networks toward a globally interoperable Digital Extended Specimen (DES) infrastructure by Nicky Nicolson Senior Research Leader, Biodiversity Informatics, Royal Botanic Gardens, Kew
Nicky discussed how the digitisation of specimens is leading to Digital Extended Specimens (DES) which in turn can improve research outcomes. However to create DES, standards are needed. These standards are being discussed and created by an international partners group that meets regularly. Nicky emphasised that one of the more important processes the group went through was to fill out a template which asked the question “what is the problem”. The individual answers to this question were helpful as they ranged from “identifiers of papers using collections aren’t being returned into an institution's collection management system” to “there is a biodiversity crisis”.

She discussed practical workshops held at a conference, in which participants worked out how they would build a digital extended specimen, and how useful those workshops were. She feels these workshops helped solidify the digital extended specimen concept, raised practical issues and helped participants brainstorm solutions to those issues.

This presentation was really interesting to me as it followed on from the DES workshops I took part in at SPNHC 2023, as well as other DES presentations at that conference. Looking at Nicky’s presentation through Wikidata eyes, I immediately thought of the role Wikidata can play as a crosswalk or linking platform for DES elements, and for the identifiers and institutions linked to and via the DES. Wikidata can provide links to all sorts of information and identifiers that can help enrich a digital extended specimen.

Demonstration of Taxonomic Name Data Services through ChecklistBank by Olaf Bánki, Executive Secretary, Catalogue of Life
Olaf gave a fabulous presentation about ChecklistBank, which I was aware of but had yet to use. He provided a link to a tutorial https://docs.gbif.org/course-checklistbank-tutorial/en/ and showed how ChecklistBank can be used to compare taxonomic names and also to compare datasets. It was immediately obvious to me that this tool would be very useful for discovering synonyms of species and for solving taxonomy issues when editing Wikipedia and Wikidata.

You log in to ChecklistBank https://www.checklistbank.org/ with your GBIF account. While the presentation was ongoing, I searched for Wikipedia-sourced checklists. I found https://www.checklistbank.org/dataset/54203, a checklist for the taxonomy of the Coccinellidae ladybird beetle family, generated from information found in Wikipedia (see https://en.wiki.x.io/wiki/List_of_Coccinellidae_genera). I also came across a checklist provided by Rod Page (see https://www.checklistbank.org/dataset/223917/about). After the conference I want to explore this resource further, both as a tool for my Wikipedia and Wikidata species work and to see whether this is an area where I can contribute. I also intend to bring this resource to the attention of the Wikidata WikiProject Biodiversity group and of New Zealand editors of biodiversity articles.

Afternoon

Keynote by Tim Sherratt (see above in the Plenary Sessions section)

Ongoing Work with the Global Registry of Scientific Collections by Marie Grosjean, Data Administrator, GBIF Secretariat

GRSciColl is a registry of scientific collections and their associated institutions. Wikidata currently has a property, Biodiversity Repository ID (see Wikidata property https://www.wikidata.org/wiki/Property:P4090), that allows an institution’s GRSciColl code to be attached to its Wikidata item. However, one of the issues raised during the conference was that no Wikidata property has yet been created for the GRSciColl identifiers of the scientific collections within institutions. Attendees with some Wikidata experience discussed this, and the intention is to propose a property in the near future. I intend to support the property proposal when it is drafted.

Unearthing the Past for a Sustainable Future: Extracting and transforming data in the Biodiversity Heritage Library for climate action by JJ Dearborn, Data Manager, Biodiversity Heritage Library, Smithsonian Libraries and Archives

In this presentation JJ outlined the progress BHL is making in extracting the biodiversity and climate data contained in its corpus. She gave examples of five projects that are piloting approaches to this aim.

These included a project to upgrade legacy OCR text using Tesseract OCR https://github.com/tesseract-ocr/tesseract; a project to evaluate handwritten text recognition (HTR) engines such as Microsoft Azure Computer Vision, Google Cloud Vision API and Amazon Textract, to improve scientific name-finding in BHL’s handwritten archival materials using algorithms developed by the Global Names Architecture https://globalnames.org/; a project to extract collecting-event data from HTR coordinate outputs, using DataFrames from the Python library pandas to create structured data; a project to classify BHL page-level images with OpenAI's CLIP, a neural network model, to accurately identify the handwritten sub-corpus of primary source materials in BHL; and a project to run an A/B test evaluating the efficiency and accuracy of human-keyed transcription for data extraction, to provide high-quality, human-vetted datasets that can be deposited with data aggregators.
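The collecting-event project worked from HTR coordinate output with pandas DataFrames. As a rough illustration of the kind of step involved, and not BHL’s actual pipeline, the sketch below swaps pandas for plain Python and groups recognised words into text lines by the vertical position of their bounding boxes. The `Word` record, the coordinate convention (top-left origin, pixels) and the 10-pixel tolerance are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # left edge of the word's bounding box (assumed pixels)
    y: float  # top edge of the word's bounding box (assumed pixels, origin top-left)


def group_into_lines(words, tolerance=10.0):
    """Group words into lines, reading top to bottom and left to right.

    A word joins the current line if its top edge is within `tolerance`
    of the top edge of the first word that started that line.
    """
    lines = []  # list of (reference_y, [Word, ...])
    for word in sorted(words, key=lambda w: (w.y, w.x)):
        if lines and abs(word.y - lines[-1][0]) <= tolerance:
            lines[-1][1].append(word)
        else:
            lines.append((word.y, [word]))
    # Within each line, order words by their horizontal position.
    return [" ".join(w.text for w in sorted(ws, key=lambda w: w.x)) for _, ws in lines]


if __name__ == "__main__":
    # Hypothetical HTR output for a specimen label.
    label = [Word("Coll.", 10, 100), Word("J.", 60, 102), Word("Smith", 80, 101),
             Word("Hobart", 40, 140), Word("Loc.:", 0, 141)]
    for line in group_into_lines(label):
        print(line)
```

Once words are reassembled into lines, label prefixes such as a collector or locality marker can be split off into separate fields, which is where a tabular structure like a DataFrame becomes convenient.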

Although I work closely with BHL, I was unaware of some of these projects, so I found this presentation very educational.

Tuesday conversations and engagement

I had a discussion with Pieter Huybrechts https://orcid.org/0000-0002-6658-6062 about his iNaturalist tool (see https://pietrh.github.io/2023_tdwg_uncommon_inaturalist/), which is based on a snapshot of iNaturalist data. By adding their ORCID iD to the end of the URL, people can get a summary of their iNaturalist observations and filter them by rarity. We also discussed Wikidata and iNaturalist more generally.

After the discussion with Pieter I talked to Paul Flemons about Pieter’s work as an example of how similar tools might be used to give iNaturalist citizen scientists attribution and credit for their work.

I had discussions with Sarah Tassell https://www.linkedin.com/in/sarah-tassell-91a2aa5a about the Wikidata WikiProject Research Expeditions https://www.wikidata.org/wiki/Wikidata:WikiProject_Research_expeditions and also iNaturalist.

I also talked with Tim Sherratt about his presentation (see the plenary sessions for more information).

Dinner

That evening I went to dinner with Elspeth Haston (https://www.rbge.org.uk/about-us/who-we-are/staff/herbarium-staff/dr-elspeth-haston-deputy-herbarium-curator/), Alison Vaughan and Nicole Kearney. We had wide-ranging conversations about the conference and the presentations we had seen, updates on what each of us was working on, and my tutoring of Courtney Kemnitz, a digitisation technician at Royal Botanic Garden Edinburgh, in Wikidata. Elspeth had facilitated this tutoring, and I particularly wanted her to let Courtney know that she could email me at any time with questions or issues.

Wednesday Sessions and Engagement

Morning

This was the excursion day. In the morning I worked on my report back to Wikimedia Aotearoa New Zealand and the wider community about the conference.

Afternoon

In the afternoon I visited the Tasmanian Herbarium, located on the Sandy Bay campus of the University of Tasmania. The Herbarium is responsible for the development, maintenance and management of the botanical collections of Tasmania and is internationally recognised as the most comprehensive record of the Tasmanian flora in the world. The Herbarium houses more than 312,000 plant specimens, with flowering plants being the largest group represented. See https://www.tmag.tas.gov.au/collections_and_research/tasmanian_herbarium for more information.

I visited the herbarium with Shelley James and Julia Percy-Bower, both of the Western Australian Herbarium, as well as Alexander Amies from Manaaki Whenua Landcare Research https://www.landcareresearch.co.nz/about-us/our-people/alexander-amies. While there we were also joined by Alison Vaughan.

We had interesting discussions on the practical workings of the herbarium, the digitisation of herbarium records, the imaging of specimens and the herbarium’s workflow for getting its data into GBIF for reuse. Their images are openly licensed. I learned more about the high level of endemism in Tasmania and the threat posed by invasive species. There are definite parallels between Tasmania and New Zealand in this regard.

Another discussion that occurred concerned WeDigBio https://wedigbio.org/. Shelley mentioned that this may be a topic of the unconference to be held on Friday. I am on the advisory board of WeDigBio, which is a global data campaign to mobilise participants to create digital data about biodiversity specimens. The campaign was underway during the conference and I supported the idea of holding a “WeDigBio” transcription meeting for the conference attendees. I was keen to see attendees use the extended transcription workflow of taking the collector or identifier and ensuring they have a Wikidata item, or researching them to find their ORCID and then adding them to Bionomia https://bionomia.net/.

Conference dinner

I was seated beside Nicky Nicolson https://www.kew.org/science/our-science/people/nicky-nicolson and Nicole Kearney, with Quentin Groom and David Iggulden nearby.

I had discussions with Nicky about Tim Sherratt’s keynote, Jupyter Notebooks and related topics. We talked about how she and her students use Jupyter Notebooks, in particular one student who is studying scientific illustrations and attempting to discover in bulk whether those illustrations were drawn from specimen sheets and, if so, which specimen sheet was used. I had previously seen a presentation in which a specimen sheet was identified as the source of an illustration, and I promised Nicky that I would let her know if I could find it.

I caught up with Quentin Groom https://be.linkedin.com/in/quentin-groom-9075a812, a botanist at Meise Botanic Garden. I had previously worked with him and our other co-authors on The disambiguation of people names in biological collections https://doi.org/10.3897/bdj.10.e86089. He will be presenting on Thursday and Friday. We had a general catch-up about what he has been working on and how the conference was proceeding.

I also had a really interesting discussion with Visotheary Ung about her paper on generating a Cambodian flora, mentioned earlier in this report. We discussed issues surrounding the colonisation of knowledge resulting from the language used and difficulties of obtaining biodiversity information in the Khmer language as opposed to French or English.

Thursday Sessions and Engagement

Morning

Planning the Migration to a New Database: Implications for the Collections of Meise Botanic Garden by Henry Engledow, Database Manager, Meise Botanic Garden

This presentation covered how the complexity of the physical collection can affect both the migration of data and the planning of a physical move. It emphasised how the digitisation of the herbarium, and the data generated, are helping to ensure the physical move of the herbarium is successful and are also assisting with the ongoing enrichment of data about the collection. Henry also discussed further work needed and future aims; among them, Meise Botanic Garden is keen to link collectors of botanical specimens to their Wikidata items.

Elevating the Fitness of Use of GBIF Occurrence Datasets: A proposal for peer review by Vijay Barve, Researcher, Natural History Museum of Los Angeles

Vijay’s presentation on improving datasets prior to upload to GBIF was very interesting, as was his proposal for a peer review structure. I was aware that Lucy Schrader from Te Papa is currently preparing data for upload to GBIF and thought I should introduce her to Vijay. I managed to do so later in the day.

Afternoon

Name IDs and Name Matching for Catalogue of Life: Existing Services and Prospects by Markus Döring, ChecklistBank & GBIF Backbone Developer, GBIF / Catalogue of Life

This presentation explained that the Catalogue of Life (CoL) checklist is one of many lists in ChecklistBank, but that it is also an aggregation of more than 160 checklists, all of which are contained in ChecklistBank. Each of those checklists is issued with a digital object identifier and a dataset key, and each taxon name in the Catalogue of Life is issued with a name usage identifier. The combination of a name usage identifier and the dataset key allows names to be tracked between CoL checklist versions. Recent developments in ChecklistBank have also made available a new name usage (i.e., taxon or synonym) matching service that works against any dataset in ChecklistBank. I particularly valued this presentation as it improved my understanding of the interrelationship between the Catalogue of Life and ChecklistBank.
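To make the identifier scheme concrete, here is a minimal Python sketch. It assumes each checklist release is held as a mapping from name usage identifiers to name records under that release’s own dataset key, and that a name usage identifier stays stable between releases. It reports which usages were removed, added or changed between two releases. The classes, dataset keys and IDs are hypothetical and do not reflect ChecklistBank’s actual data model or API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class NameUsage:
    usage_id: str  # name usage identifier, assumed stable across releases
    name: str      # scientific name string
    status: str    # e.g. "accepted" or "synonym"


@dataclass
class Release:
    dataset_key: int  # each checklist release has its own dataset key
    usages: dict = field(default_factory=dict)  # usage_id -> NameUsage

    def ref(self, usage_id):
        """A fully qualified reference to a name: (dataset key, name usage ID)."""
        return (self.dataset_key, usage_id)


def compare_releases(old, new):
    """Report name usage IDs removed, added, or changed between two releases."""
    old_ids, new_ids = set(old.usages), set(new.usages)
    return {
        "removed": sorted(old_ids - new_ids),
        "added": sorted(new_ids - old_ids),
        "changed": sorted(i for i in old_ids & new_ids if old.usages[i] != new.usages[i]),
    }


if __name__ == "__main__":
    # Hypothetical releases, dataset keys and names, for illustration only.
    old = Release(1001, {"A1": NameUsage("A1", "Genus alpha", "accepted"),
                         "B2": NameUsage("B2", "Genus beta", "accepted")})
    new = Release(1002, {"A1": NameUsage("A1", "Genus alpha", "accepted"),
                         "B2": NameUsage("B2", "Genus beta", "synonym"),
                         "C3": NameUsage("C3", "Genus gamma", "accepted")})
    print(compare_releases(old, new))
    print(new.ref("A1"))
```

Pairing the usage ID with the dataset key is what lets a citation say exactly which version of a name was meant, while the bare usage ID lets the same name be followed from one release to the next.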

Planetary Knowledge Base: Semantic Transcription Using Graph Neural Networks by Qianqian Hiris Gu, Postdoctoral Researcher, Machine Learning, Natural History Museum

This was the most amazing presentation, and I admit much of it was outside my area of expertise. Hiris has been working with her colleagues to develop a Planetary Knowledge Base https://github.com/NaturalHistoryMuseum/planetary-knowledge-base, a comprehensive graph network comprising data on all specimens, collectors and localities. In the prototype they concentrated on botanical specimens, using all plant taxa and specimens in GBIF https://www.gbif.org/ and combining them with geographic data from GeoNames https://www.geonames.org/ and biographical data from Wikidata, Bionomia https://bionomia.net/, the Harvard Index of Botanists https://kiki.huh.harvard.edu/databases/botanist_index.html, TL2 https://www.sil.si.edu/DigitalCollections/tl-2/ and Tropicos https://www.tropicos.org/home.

Sarah Vincent from our Women Genera research project works with Hiris and knew she was presenting at TDWG. Sarah recommended I attend the presentation, and I’m so pleased she did. I have been working on botanical collectors in Wikidata and Bionomia, and disambiguating and adding collectors from the Harvard Index of Botanists and TL2 to Wikidata, for years. It was extremely gratifying to see others making use of this work. After the presentation I was able to talk to Hiris and suggested she use the BHL creator ID dataset for her knowledge base. I gave her JJ Dearborn’s email (JJ is the BHL data manager) as a contact to reach out to, and I look forward to watching this knowledge base develop.

William Ulate’s memorial

William was Project Coordinator of the World Flora Online. He had also worked as Technical Director of the Biodiversity Heritage Library. I met him at the 2018 TDWG and SPNHC joint conference in Dunedin. He enthusiastically welcomed me into these two communities and I was looking forward to seeing him again in person at TDWG 2023. I was so sad to learn of his death and felt honoured to be able to attend his memorial.

Thursday conversations and engagement

I introduced Vijay Barve to Lucy Schrader to widen Lucy’s network of contacts in case she needs advice or help troubleshooting Te Papa’s future data uploads to GBIF.

I had a detailed discussion with Elspeth Haston from the Royal Botanic Garden Edinburgh about how Te Papa adds Wikidata and Wikipedia links, as well as archive links such as Alexander Turnbull Library links, to its collection management system and the person pages on its website. We then discussed the potential for the Royal Botanic Garden Edinburgh to do the same. I showed Elspeth how BHL adds identifiers to its creator profile pages and how it has imported selected identifiers from Wikidata items into those pages. See https://blog.biodiversitylibrary.org/2023/02/round-tripping-persistent-identifiers-with-wikidata-query-service.html for further information on this process. I subsequently emailed Elspeth the link to that blog post, as I am keen to encourage the Royal Botanic Garden Edinburgh to do something similar.
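The BHL blog post linked above describes BHL’s own process. As a minimal sketch of the same idea, assuming the Wikidata properties P4081 (BHL creator ID), P496 (ORCID iD) and P214 (VIAF ID), the Python below builds a Wikidata Query Service request that finds, for a list of BHL creator IDs, the matching Wikidata items and any ORCID and VIAF identifiers recorded on them. The creator IDs in the example are hypothetical, and the property IDs should be checked before use.

```python
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_query(bhl_creator_ids):
    """Return a SPARQL query mapping BHL creator IDs to Wikidata items and other identifiers."""
    # Only plain ASCII digit strings are accepted, so the values can be inlined safely.
    for cid in bhl_creator_ids:
        if not (cid.isascii() and cid.isdigit()):
            raise ValueError(f"unexpected BHL creator ID: {cid!r}")
    values = " ".join(f'"{cid}"' for cid in bhl_creator_ids)
    return f"""SELECT ?item ?bhl ?orcid ?viaf WHERE {{
  VALUES ?bhl {{ {values} }}
  ?item wdt:P4081 ?bhl .
  OPTIONAL {{ ?item wdt:P496 ?orcid . }}
  OPTIONAL {{ ?item wdt:P214 ?viaf . }}
}}"""


def query_url(bhl_creator_ids):
    """Build a GET URL asking the Wikidata Query Service for JSON results."""
    params = {"query": build_query(bhl_creator_ids), "format": "json"}
    return WDQS_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical BHL creator IDs, for illustration only.
    print(build_query(["123", "456"]))
```

The resulting URL can be fetched with any HTTP client (Wikimedia asks clients to send a descriptive User-Agent), and the identifiers in the JSON response can then be written back onto the corresponding creator pages, which is the "round-tripping" the blog post refers to.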

I had a discussion with David Fichtmueller about us collaborating on an “Unconference” workshop showing attendees how to create a proposal for a Wikidata property for their institutional identifiers. See the Friday Unconference section below for information on how this progressed.

I talked with Nicky Nicolson about Hiris’ presentation and how impressed I was by it. We had a general discussion about both the presentation and the knowledge base.

Later in the evening I attended drinks after the memorial for William and then had dinner with Ely Wallis, Arthur Chapman, Nicole Kearney, Nicky Nicolson, Visotheary Ung and David Iggulden. We shared our memories of William, and during the evening we discussed the conference, the presentations, BHL and other biodiversity data topics.

Friday Sessions and Engagement

Morning

European Taxonomists in Profile: A Data-Driven Approach by Quentin Groom, Researcher, Meise Botanic Garden

A really interesting presentation on research undertaken by Quentin and others into taxonomists, the associations between them, their areas of specialisation, and which areas of that work are being funded by various institutions. The aim of the research is to help inform policy makers about which areas need taxonomists and which areas taxonomists concentrate their work on. One unsurprising result of particular interest to me was that when taxonomists concentrate on threatened species, the number of species allocated to the IUCN Red List (https://www.iucnredlist.org/) increases.

Big Data for Beginners by Pieter Huybrechts, Research Software Engineer, Research Institute for Nature and Forest (INBO)

This was a really insightful presentation in which Pieter pointed out solutions to barriers to the large-scale use of biodiversity occurrence data. He went through several challenges, such as the lack of local storage for large and rapidly changing datasets, the computational power required for processing, unfamiliarity with existing toolsets, and insufficient resources to maintain big data infrastructure. He made recommendations and offered practical solutions to each of these challenges. He encouraged less technically minded researchers to use local workstations with tools such as Apache Arrow https://arrow.apache.org/, or cloud computing services such as Databricks Community Edition https://docs.databricks.com/en/getting-started/community-edition.html. He explained that by using these tools researchers can incorporate larger datasets into their research and uncover new insights as a result.

Poster session

The conference posters can be seen at https://drive.google.com/drive/folders/133nzdE-0pveoHqbkzOcTCPNNAzLnysKr

Of particular interest to me was a poster by Anja Schwarz, Fiona Moehrle and Sabine von Mering titled Collections from Colonial Australia in Berlin's Museum für Naturkunde and the Challenges of Data Accessibility https://doi.org/10.3897/biss.7.111980. This poster outlines the efforts of Berlin's Museum für Naturkunde to digitise its collection so that it is appropriately documented. The Museum aims to share its collections with the countries from which they originate, and possibly to repatriate items whose possession is unethical by today’s standards. The Museum wishes to enter into dialogue and exchange with the communities of origin of those items.

Afternoon

Unconference

David Fichtmueller and I gave an impromptu workshop on how to make a property proposal to the Wikidata community. We showed what a property looks like and what the discussion page for a property looks like, explained the process of proposing a property, gave links to the property proposal documentation, explained who creates a property once it is supported by the community and who is responsible for using properties, and answered questions on editing in general, editing properties in particular, and adding data to Wikidata.

We discussed the use of OpenRefine and the QuickStatements tool and also explained the Mix’n’match tool. We went into further detail about Wikidata items, explaining linking and how statements are added to items. Throughout the workshop we emphasised how important it is for biodiversity institutions to share their identifiers, particularly identifiers for people. This emphasis on people identifiers reflected my own work in this area, but participants quickly realised that institutional identifiers can be added to all sorts of items if the institution has appropriate data.

We discussed how some institutions, e.g. BHL, have multiple identifiers, which has resulted in the creation of multiple Wikidata properties so that links to BHL datasets can be added to different types of items. Finally, we ended the workshop with a demo of Entity Explosion, an extension for Chrome and Firefox (https://www.wikidata.org/wiki/Wikidata:Entity_Explosion).

Keynote by Arthur Chapman (see above in the Plenary Sessions section)

Friday conversations and engagement

I had a discussion with Bob Mesibov about Tim Sherratt’s presentation, and about Trove in particular. He explained that he had found an article about a research expedition that listed all the participants, the dates, and the places the expedition went. We discussed how Trove contains articles from US and UK newspapers that were reprinted in Australian newspapers, which makes it extremely useful for researching expeditions that may have been covered in the press. I promised to point out this resource to the Wikidata WikiProject Research Expeditions group.

I also had a conversation with Ben Scott, https://www.nhm.ac.uk/our-science/departments-and-staff/staff-directory/ben-scott.html, Head of AI & Innovation at the Natural History Museum, about the possibility of the museum donating some collector data and identifier links to Wikidata. He intends to email me the data so we can see whether the datasets he is considering are fit for this use.

Conclusion

This conference was an amazing opportunity to network with attendees, to learn from the community and to advocate for the use of various Wikimedia projects, particularly Wikidata. I felt privileged to have been able to attend, to learn and to contribute to community discussions concerning biodiversity data and data standards. I am extremely grateful to Wikimedia Aotearoa New Zealand for providing me with this opportunity.