TROLLing dataset no. 100 is published!


The Tromsø Repository of Language and Linguistics (TROLLing) is happy to announce the publication of the 100th dataset in the repository!

Congrats to Lukas Sönning from the University of Bamberg, the author of this special dataset in TROLLing! His dataset on the acquisition of two English vowel categories by German L2 learners (Sönning 2021) marks a further important milestone in the history of TROLLing. The dataset is part of a series of seven datasets that Lukas Sönning has published in TROLLing to support his dissertation Phonological variation in German Learner English (Sönning 2020), defended at the University of Bamberg.

We are glad that more and more researchers are discovering TROLLing as a curated and trustworthy repository for sharing their data in a way that is aligned with the FAIR Guiding Principles for scientific data management and stewardship (Wilkinson et al. 2016). Based on the Dataverse repository software, TROLLing was launched in 2014 and has been operated ever since by the University Library at UiT The Arctic University of Norway. The establishment of the repository was initiated by professors Laura A. Janda and Tore Nesset, both working in Russian linguistics at UiT. Laura and Tore’s mission was to provide linguists worldwide with a trustworthy, community-driven repository where they could share their research data, statistical code and other linguistic material, and thus promote the transparency and reproducibility of linguistic studies. The repository is managed and curated by Helene N. Andreassen and Philipp Conzett, both Senior Subject Librarians for Linguistics at the UiT Library. TROLLing is a special collection within DataverseNO (https://dataverse.no/), which in 2020 was CoreTrustSeal certified as a sustainable and trustworthy repository.

Though 100 datasets over a period of more than six years may not seem like a lot, the increasingly steady influx of dataset submissions from both established researchers and junior newcomers that we have seen during the last year indicates that the focus on transparency and reproducibility of linguistic research is gaining ground in the international scholarly community.

Ever since the beginning, the curators of TROLLing have sought to learn from and contribute to the international discussion on good research data management. Since 2016 they have been active members of the Research Data Alliance Linguistic Data Interest Group, a community-based network which so far has worked out The Austin Principles of Data Citation in Linguistics (2018) and The Tromsø Recommendations for Citation of Research Data In Linguistics (2019). Several of its members have also contributed to the forthcoming Open Handbook of Linguistic Data Management.

Alignment with international frameworks and best-practice recommendations is a key concern for TROLLing. Since 2018, TROLLing has been recognised as a CLARIN C Centre and has been harvested by the CLARIN Virtual Language Observatory (VLO). Being part of CLARIN/CLARINO, TROLLing is participating in the Social Sciences & Humanities Open Cloud (SSHOC), whose overall objective is to realise the social sciences and humanities’ part of the European Open Science Cloud (EOSC). One of the outcomes of the SSHOC project will be a new domain-specific metadata schema for TROLLing and other Dataverse-based repositories. This will make TROLLing even more compliant with CLARIN and more aligned with the FAIR principles. TROLLing’s participation in the FAIRsFAIR project also aims to improve the FAIRness of the repository. Another important item on our roadmap is to assess the sustainability of TROLLing’s business model and adapt it to future challenges.

We would like to thank all our depositors and contributors and are looking forward to the next 100 datasets to be published in TROLLing!

Some key facts about TROLLing

### THE DEPOSITORS:

  • 82 contributing authors
  • Representing a total of 42 research organizations
  • From 17 countries on 4 continents
(Source: https://douwe.com/projects/visited?region=world)

### THE DATA:

  • 100 datasets containing 2 901 files
  • Keywords: 630 (tokens) / 298 (types)
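  • Top 5 keywords (tied keywords share a rank):

| Rank | Keyword | # |
|---|---|---|
| 1 | aspect | 18 |
| 2 | corpus linguistics | 12 |
| 3 | language change | 11 |
| 4 | corpus | 10 |
| 4 | morphology | 10 |
| 4 | variation | 10 |
| 5 | acquisition | 8 |
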
  • 36 languages represented
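  • Top 10 languages (tied languages share a rank):

| Rank | Language | # |
|---|---|---|
| 1 | Russian | 51 |
| 2 | Norwegian | 21 |
| 3 | English | 12 |
| 4 | French | 9 |
| 4 | North Saami | 9 |
| 5 | Ukrainian | 8 |
| 6 | Spanish | 7 |
| 7 | German | 6 |
| 8 | Czech | 5 |
| 9 | Old Church Slavonic | 4 |
| 9 | Uralic | 4 |
| 10 | Chinese | 3 |
| 10 | Korean | 3 |
| 10 | Latin | 3 |
| 10 | Old French | 3 |
| 10 | Slavic | 3 |
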
  • 33 different file extensions
  • Top 5 file extensions:

| Rank | File extension | # |
|---|---|---|
| 1 | .wav | 1 497 |
| 2 | .mp3 | 585 |
| 3 | .txt | 289 |
| 4 | .csv | 138 |
| 5 | .tab | 83 |

  • Mostly replication data for article and book publications
  • Linked to 64 related journal articles
  • Top 3 journals (tied journals share a rank):

| Rank | Journal | # |
|---|---|---|
| 1 | Cognitive Linguistics | 6 |
| 1 | Diachronica | 6 |
| 1 | Russian Linguistics | 6 |
| 2 | Slavic and East European Journal | 4 |
| 3 | Borealis – An International Journal of Hispanic Linguistics | 3 |

  • Linked to 8 related books
  • Book publishers:

| Book publisher | # |
|---|---|
| Cambridge Scholars Publishing | 2 |
| Language Science Press | 2 |
| Cambridge University Press | 1 |
| ELiPhi | 1 |
| Narr | 1 |
| Slavica Publishers | 1 |

  • In the last year, several datasets have been anonymised and shared with peer reviewers together with a journal or book manuscript. The private URL feature of the Dataverse software allows depositors to share their datasets before publication.
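
For readers who want to compute similar statistics for their own collection, the keyword figures above (630 tokens, 298 types, and shared ranks for tied keywords) can be reproduced with a few lines of Python. This is only a minimal sketch: the sample keyword lists are invented for illustration, the function names `keyword_stats` and `dense_rank` are hypothetical, and the case-insensitive matching is an assumption rather than a description of how the TROLLing figures were actually compiled.

```python
from collections import Counter


def keyword_stats(datasets):
    """Count keyword tokens (all occurrences) and types (distinct keywords).

    `datasets` is a list of keyword lists, one list per dataset.
    Assumption: keywords are compared case-insensitively after trimming.
    """
    counts = Counter(kw.strip().lower() for kws in datasets for kw in kws)
    tokens = sum(counts.values())  # every keyword occurrence
    types = len(counts)            # distinct keywords only
    return tokens, types, counts


def dense_rank(counts, top):
    """Return (rank, keyword, count) rows; keywords with equal counts share a rank."""
    ranked = []
    rank = 0
    previous = None
    # Sort by descending count, then alphabetically within a tie.
    for kw, n in sorted(counts.items(), key=lambda item: (-item[1], item[0])):
        if n != previous:
            rank += 1
            previous = n
        if rank > top:
            break
        ranked.append((rank, kw, n))
    return ranked


# Invented sample data, not taken from TROLLing.
sample = [
    ["aspect", "corpus linguistics"],
    ["Aspect", "morphology"],
    ["variation", "morphology", "aspect"],
]

tokens, types, counts = keyword_stats(sample)
print(f"Keywords: {tokens} (tokens) / {types} (types)")
for rank, kw, n in dense_rank(counts, top=2):
    print(rank, kw, n)
```

Because tied entries share a rank, a "top 5" list built this way can contain more than five rows, as the keyword and language tables do.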

### THE USAGE (as of January 30, 2021):

  • 11 082 downloads of (parts of) datasets
  • On average, 111 downloads per dataset
  • Downloads per year:

Links and references

The TROLLing repository:

Journals referring to TROLLing:

Organizations referring to TROLLing:

Other references:

CoreTrustSeal certification of DataverseNO: https://www.coretrustseal.org/why-certification/certified-repositories/.

LIBER Disciplinary Case Study: The Tromsø Repository of Language and Linguistics (TROLLing). PDF, 2019. https://doi.org/10.5281/zenodo.2668775

Open Handbook of Linguistic Data Management. To appear. https://mitpress.mit.edu/books/open-handbook-linguistic-data-management

Open Data in Linguistics – an interview with Laura A. Janda. Video, 2017. https://www.youtube.com/watch?v=8FLQwJVM-VA

Sönning, Lukas. 2020. Phonological variation in German Learner English. University of Bamberg dissertation, https://doi.org/10.20378/irb-49135.

Sönning, Lukas, 2021, “The TRAP-DRESS contrast in German Learner English: Dataset for chapter 4 in “Phonological variation in German Learner English””, https://doi.org/10.18710/ATIRRV, DataverseNO, V1, UNF:6:InisFzv6Q0s2HMUf96W1KQ== [fileUNF]

The Austin Principles of Data Citation in Linguistics. 2018. https://site.uit.no/linguisticsdatacitation/austinprinciples/

The Tromsø Recommendations for Citation of Research Data In Linguistics. 2019. https://doi.org/10.15497/RDA00040

TROLLing – why linguists need it. Video, 2014. https://www.youtube.com/watch?v=uEf0c0NT9_A

Why TROLLing is the thing to do for linguists. Blog post, 2019. https://www.inthefieldstories.net/why-trolling-is-the-thing-to-do-for-linguists/

Wilkinson, Mark D., Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, et al. 2016. ‘The FAIR Guiding Principles for Scientific Data Management and Stewardship’. Scientific Data 3: 160018. https://doi.org/10.1038/sdata.2016.18.