
Authors: Kirsty Wallis, Thomas Kaarsted, Simon Worthington, Alisa Martek and Dragana Janković.

Library Infrastructures and Citizen Science
Section Editor Kirsty Wallis
v1.0, 2023
Series: Citizen Science for Research Libraries — A Guide

DOI: 10.25815/tz0x-m353


An activity module intended for inclusion in citizen science projects. This is an activity that researchers can plan into their citizen science project to create a bespoke reader with project participants and to introduce participants to using scientific literature, open access, and discovery services.

By Team semanticClimate

Article DOI: 10.25815/hp6r-bm71

The activity module is designed as a hands-on activity in which members of the public learn to use modern open access infrastructures for finding, using, and sharing scientific literature. semanticClimate has developed open-source software that searches across multiple research literature repositories, such as Europe PMC or bioRxiv (pronounced “bio-archive”), and creates an automated literature survey within minutes. It presents the user with a summary of findings and allows the full-text open access articles to be downloaded.

semanticClimate is an example of an open search framework based on text data mining (TDM). Having ‘open search’ systems is important as search engines are the gateway to scientific knowledge. Search engines can be gamed to bias certain outcomes or be based on faulty algorithms (Kraker 2018). semanticClimate is built using open science methods, with all parts of the system being open and verifiable.

Who is the activity module for?

The module is for researchers who use citizen science in their research projects. It adds an activity that engages participants in conversations about formulating research questions and about finding out what is already known on a topic in the existing scientific literature.

The activity can be used to share results on a public webpage and to update the search as often as is required — say once a week.



Infobox: About semanticClimate open-source software

semanticClimate is a project that aims to develop knowledge resources and tools to help tackle research questions on a global scale, such as climate change and viral epidemics.

As an example, despite over $100 billion being spent on medical research by NIH (2010-2014) (Galkina Cleary et al. 2018), much knowledge is behind publisher paywalls. Moreover, it is usually badly published: in PDFs, which are not an interoperable format, or dispersed without coherent knowledge tools. This particularly disadvantages the Global South. The project aims to use modern tools, especially Wikidata (and Wikipedia), text mining, and semantic tools, to create a modern integrated resource of all current published information on climate change, for example the IPCC Reports. It relies on collaboration and gifts of labour and knowledge. To find out more, see the Getting Started Guide or the How Can I Help section.

semanticClimate practices Open Notebook Science (“Open-Notebook Science” 2008), which means there is no insider knowledge and all work is open and licensed for free reuse.

Figure: Climate Knowledge Hunt video. The video explains how semanticClimate tools can be used for searching IPCC reports.



How semanticClimate works for citizen scientists

Below is an example of how semanticClimate can be used. The process is carried out as a group, either in person, hybrid, or online. Each step can be coached and explained to the participants. The final outcomes of the process are a collection of papers, which can itself be made public for others, and the learning experience for the participants.

In this example, a citizen science project is looking at the topic of ‘zero-carbon plans’ for use in regional environment and climate policies.

semanticClimate carries out two types of search:

  1. Firstly, it searches repositories on the net and retrieves the papers;
  2. Secondly, it then analyses the local full text copies of the papers that have been downloaded. The result is a swift and verifiable literature survey that might have once taken days, weeks, or months to complete if done manually.

The researcher (or citizen) formulates a question they are interested in, e.g.,

What ‘zero-carbon’ plans for tackling the problem of climate change are reliable enough for further adoption in cities or regions around the world, for example in green energy, transport, and housing plans, such as the EU’s ‘A European Green Deal’ (“A European Green Deal” 2020) or the IPCC Climate Mitigation Reports (IPCC 2022)?

Or, a simpler version of the same question could be,

What zero-carbon plans can be used for the future of my local schools, city public transport, or municipal buildings, etc?

From the research questions, a dictionary of terms important to the topic needs to be made. Ten dictionary terms is a good start. Our example dictionary terms for ‘zero-carbon’ plans would be:

rapid decarbonisation; zero-carbon; low-carbon; energy planning; decarbonisation; low energy transport; policy and planning; policy; low energy housing; low energy city planning; low energy schools.

These terms are then entered into semanticClimate in the browser, which downloads the top one hundred research papers from your repository of choice. We use Europe PubMed Central (Europe PMC) as the default because it aggregates many other sources, but many other literature repositories could be used. Wikidata can also be used with the dictionary terms. This allows more advanced semantic queries to be carried out and makes it possible to retrieve multilingual Wikipedia pages for, say, English terms used in papers.
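As a rough illustration of the repository search step, the sketch below turns a few dictionary terms into a single query URL for the Europe PMC REST search service. The helper name `build_search_url`, the choice to join the terms with OR, and the open access filter are assumptions made for this example. It does not describe how semanticClimate builds its queries internally.

```python
from urllib.parse import urlencode

# Europe PMC's public REST search endpoint.
EUROPE_PMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"

# A few of the example dictionary terms for 'zero-carbon' plans.
TERMS = ["rapid decarbonisation", "zero-carbon", "low-carbon", "energy planning"]


def build_search_url(terms, limit=100):
    """Hypothetical helper: combine dictionary terms into one search URL."""
    # Quote each term so multi-word terms are matched as phrases,
    # then match papers that contain any of the terms.
    any_term = " OR ".join(f'"{term}"' for term in terms)
    # Assumption: restrict to open access papers so full text can be downloaded.
    query = f"({any_term}) AND OPEN_ACCESS:y"
    params = {"query": query, "format": "json", "pageSize": limit}
    return f"{EUROPE_PMC_SEARCH}?{urlencode(params)}"


url = build_search_url(TERMS)
print(url)
```

Opening the printed URL in a browser should return a JSON list of matching records. Adding or removing entries in `TERMS` is the programmatic equivalent of updating the dictionary.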

semanticClimate then scans the one hundred downloaded papers and runs a local search on them. The local searches can be focused on paper sections (introductions, findings, etc.) or on content types such as illustrations or tables, depending on which parts of the papers are expected to yield the most.
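The idea of a section-focused local search can be sketched as follows. The `papers` data, the section names, and the `search_sections` helper are all hypothetical stand-ins invented for this example. Real downloaded full text would first need to be parsed into sections, and semanticClimate's own tools may work quite differently.

```python
# Hypothetical stand-in for downloaded papers: each paper maps
# section names to the text of that section.
papers = {
    "paper-1": {
        "introduction": "Rapid decarbonisation of cities depends on energy planning.",
        "findings": "Low-carbon transport reduced emissions in the pilot region.",
    },
    "paper-2": {
        "introduction": "This study reviews housing policy.",
        "findings": "Energy planning was rarely coordinated with housing policy.",
    },
}


def search_sections(papers, terms, sections):
    """Return (paper id, section, term) for every term found in the chosen sections."""
    hits = []
    for paper_id, paper in papers.items():
        for section in sections:
            # Missing sections are treated as empty text.
            text = paper.get(section, "").lower()
            for term in terms:
                # Naive case-insensitive substring match.
                if term.lower() in text:
                    hits.append((paper_id, section, term))
    return hits


# Search only the 'findings' sections for two dictionary terms.
hits = search_sections(papers, ["energy planning", "low-carbon"], ["findings"])
print(hits)
```

Restricting the search to `findings` skips the mention of energy planning in the first paper's introduction, which shows how the choice of section changes what is returned.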

The next step is to refine and repeat, depending on what looks useful. The dictionary of initial search terms should be updated and the local full-text search reviewed. semanticClimate downloads the full text of the papers, as well as PDF copies. It also produces a summary of how frequently the terms occur. The final number of papers can be narrowed down to a workably sized reader.
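A minimal sketch of a term-frequency summary, and of narrowing the collection down to a reader, might look like this. The sample `texts`, the helper names, and the ranking rule (total number of term hits) are assumptions chosen for illustration and are not taken from the semanticClimate software.

```python
from collections import Counter

# Hypothetical stand-in for the full text of downloaded papers.
texts = {
    "paper-1": "Zero-carbon housing and zero-carbon transport need policy support.",
    "paper-2": "Policy for schools is discussed briefly.",
    "paper-3": "This paper covers soil chemistry.",
}
TERMS = ["zero-carbon", "policy", "decarbonisation"]


def term_frequencies(text, terms):
    """Count case-insensitive substring occurrences of each term in one text."""
    lowered = text.lower()
    return Counter({term: lowered.count(term.lower()) for term in terms})


def shortlist(texts, terms, size):
    """Rank papers by total term hits; keep the top `size` with at least one hit."""
    totals = {pid: sum(term_frequencies(text, terms).values())
              for pid, text in texts.items()}
    ranked = sorted(totals, key=totals.get, reverse=True)
    return [pid for pid in ranked if totals[pid] > 0][:size]


# Per-paper frequency summary, then a two-paper reader.
summary = {pid: dict(term_frequencies(text, TERMS)) for pid, text in texts.items()}
reader = shortlist(texts, TERMS, size=2)
print(summary)
print(reader)
```

Substring counting is deliberately simple here. A group could swap in a different ranking rule, for example weighting some terms more heavily, when deciding which papers belong in the final reader.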

The whole search results package can then be published and shared online as a literature collection on the given topic.


“A European Green Deal.” 2020.
Galkina Cleary, Ekaterina, Jennifer M Beierlein, Navleen Surjit Khanuja, and Fred D Ledley. 2018. “Contribution of NIH Funding to New Drug Approvals 2010–2016.” PNAS.
IPCC. 2022. “Climate Change 2022: Mitigation of Climate Change.”
Kraker, Peter. 2018. “Illuminating Dark Knowledge.”
“Open-Notebook Science.” 2008. In Wikipedia.