TROLLing dataset no. 100 is published!

The Tromsø Repository of Language and Linguistics (TROLLing) is happy to announce the publication of the 100th dataset in the repository!

Congrats to Lukas Sönning from the University of Bamberg as the author of this special dataset in TROLLing! His dataset on the acquisition of two English vowel categories by German L2 learners (Sönning 2021) marks a further important milestone in the history of TROLLing. The dataset is part of a series of seven datasets that Lukas Sönning has published in TROLLing to support his dissertation Phonological variation in German Learner English (Sönning 2020), defended at the University of Bamberg.

We are glad that more and more researchers are discovering TROLLing as a curated and trustworthy repository for sharing their data in a way that is aligned with the FAIR Guiding Principles for scientific data management and stewardship (Wilkinson et al. 2016). Based on the Dataverse repository software, TROLLing was launched in 2014 and has been operated ever since by the University Library at UiT The Arctic University of Norway. The establishment of the repository was initiated by professors Laura A. Janda and Tore Nesset, both working in Russian linguistics at UiT. Laura and Tore’s mission was to provide linguists worldwide with a trustworthy, community-driven repository where they could share their research data, statistical code and other linguistic material and thus promote the transparency and reproducibility of linguistic studies. The repository is managed and curated by Helene N. Andreassen and Philipp Conzett, both Senior Subject Librarians for Linguistics at the UiT Library. TROLLing is a special collection within DataverseNO (https://dataverse.no/), which in 2020 was CoreTrustSeal certified as a sustainable and trustworthy repository.

Though 100 datasets over a period of more than six years may not seem like a lot, the increasingly steady influx of dataset submissions from both established researchers and junior newcomers that we have seen during the last year indicates that the focus on transparency and reproducibility of linguistic research is gaining ground in the international scholarly community.

Ever since the beginning, the curators of TROLLing have sought to learn from and contribute to the international discussion on good research data management. Since 2016 they have been active members of the Research Data Alliance Linguistic Data Interest Group, a community-based network which so far has worked out The Austin Principles of Data Citation in Linguistics (2018) and The Tromsø Recommendations for Citation of Research Data In Linguistics (2019). Several of its members have also contributed to the forthcoming Open Handbook of Linguistic Data Management.

Alignment with international frameworks and best-practice recommendations is a key concern for TROLLing. Since 2018, TROLLing has been recognised as a CLARIN C Centre and has been harvested by the CLARIN Virtual Language Observatory (VLO). As part of CLARIN/CLARINO, TROLLing is participating in the Social Sciences & Humanities Open Cloud (SSHOC), whose overall objective is to realise the social sciences and humanities’ part of the European Open Science Cloud (EOSC). One of the outcomes of the SSHOC project will be a new domain-specific metadata schema for TROLLing and other Dataverse-based repositories. This will make TROLLing even more compliant with CLARIN and more aligned with the FAIR principles. TROLLing’s participation in the FAIRsFAIR project also aims to improve the FAIRness of the repository. Another important item on our road map is to assess the sustainability of the business model of TROLLing and adapt it to future challenges.

We would like to thank all our depositors and contributors and are looking forward to the next 100 datasets to be published in TROLLing!

Some key facts about TROLLing

### THE DEPOSITORS:

82 contributing authors
Representing a total of 42 research organizations
From 17 countries in 4 continents

(Source: https://douwe.com/projects/visited?region=world)

### THE DATA:

100 datasets containing 2 901 files
Keywords: 630 (tokens) / 298 (types)
Top 5 keywords:

Rank	Keyword	#
1	aspect	18
2	corpus linguistics	12
3	language change	11
4	corpus	10
	morphology	10
	variation	10
5	acquisition	8

36 languages represented
Top 10 languages:

Rank	Language	#
1	Russian	51
2	Norwegian	21
3	English	12
4	French	9
	North Saami	9
5	Ukrainian	8
6	Spanish	7
7	German	6
8	Czech	5
9	Old Church Slavonic	4
	Uralic	4
10	Chinese	3
	Korean	3
	Latin	3
	Old French	3
	Slavic	3

33 different file extensions
Top 5 file extensions:

Rank	File extension	#
1	.wav	1 497
2	.mp3	585
3	.txt	289
4	.csv	138
5	.tab	83

Mostly replication data for article and book publications
Linked to 64 related journal articles
Top 3 journals:

Rank	Journal	#
1	Cognitive Linguistics	6
	Diachronica	6
	Russian Linguistics	6
2	Slavic and East European Journal	4
3	Borealis – An International Journal of Hispanic Linguistics	3

Linked to 8 related books
Book publishers:

Book Publisher	#
Cambridge Scholars Publishing	2
Language Science Press	2
Cambridge University Press	1
ELiPhi	1
Narr	1
Slavica Publishers	1

In the last year, several datasets have been anonymised and shared with peer reviewers together with a journal or book manuscript. The private URL feature of the Dataverse software allows depositors to share their datasets before publication.

### THE USAGE (as of January 30, 2021):

11 082 downloads of (parts of) datasets
On average, 111 downloads per dataset
[Figure: downloads per year]

Links and references

The TROLLing repository:

Repository homepage: https://trolling.uit.no/
Info site: https://info.trolling.uit.no/
re3data entry: https://doi.org/10.17616/R3834T

Journals referring to TROLLing:

Borealis – An International Journal of Hispanic Linguistics: https://septentrio.uit.no/index.php/borealis/about/submissions
Cognitive Linguistics; see Data Policy: https://www.degruyter.com/view/journals/cogl/cogl-overview.xml
Poljarnyj vestnik – Norwegian Journal of Slavic Studies: https://septentrio.uit.no/index.php/vestnik

Organizations referring to TROLLing:

Brooklyn College: https://libguides.brooklyn.cuny.edu/c.php?g=984874&p=7122659
MIT: https://libguides.mit.edu/linguistics/open_sources
Ohio University: https://libguides.library.ohio.edu/c.php?g=1068784&p=7781359
The International Cognitive Linguistics Association (ICLA): https://www.cognitivelinguistics.org/en/publications-and-resources
The University of Kansas: https://guides.lib.ku.edu/c.php?g=94923&p=1224538
UC San Diego: https://ucsd.libguides.com/c.php?g=91028&p=6202117

Other references:

CoreTrustSeal certification of DataverseNO: https://www.coretrustseal.org/why-certification/certified-repositories/

LIBER Disciplinary Case Study: The Tromsø Repository of Language and Linguistics (TROLLing). PDF, 2019. https://doi.org/10.5281/zenodo.2668775

Open Handbook of Linguistic Data Management. To appear. https://mitpress.mit.edu/books/open-handbook-linguistic-data-management

Open Data in Linguistics – an interview with Laura A. Janda. Video, 2017. https://www.youtube.com/watch?v=8FLQwJVM-VA

Sönning, Lukas. 2020. Phonological variation in German Learner English. University of Bamberg dissertation, https://doi.org/10.20378/irb-49135.

Sönning, Lukas, 2021, “The TRAP-DRESS contrast in German Learner English: Dataset for chapter 4 in “Phonological variation in German Learner English””, https://doi.org/10.18710/ATIRRV, DataverseNO, V1, UNF:6:InisFzv6Q0s2HMUf96W1KQ== [fileUNF]

The Austin Principles of Data Citation in Linguistics. 2018. https://site.uit.no/linguisticsdatacitation/austinprinciples/

The Tromsø Recommendations for Citation of Research Data In Linguistics. 2019. https://doi.org/10.15497/RDA00040

TROLLing – why linguists need it. Video, 2014. https://www.youtube.com/watch?v=uEf0c0NT9_A

Why TROLLing is the thing to do for linguists. Blog post, 2019. https://www.inthefieldstories.net/why-trolling-is-the-thing-to-do-for-linguists/

Wilkinson, Mark D., Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, et al. 2016. ‘The FAIR Guiding Principles for Scientific Data Management and Stewardship’. Scientific Data 3: 160018. https://doi.org/10.1038/sdata.2016.18.