AcqVA Aurora Collection in TROLLing: Deposit guidelines

(Last updated 24 February 2022, by Helene N. Andreassen & Philipp Conzett)

1. How to prepare data for depositing
1.1 Readme file
1.2 Files: Folder structure
2. How to deposit data in the TROLLing repository
2.1 Create a TROLLing user account
2.2 Create a dataset draft
2.3 Fill in metadata
2.3.1 Prefilled metadata fields
2.3.2 Series
2.3.3 Keywords: Order of appearance
2.3.4 Kind of data
2.4 Upload files
2.5 Submit for review/curation
3. How the dataset will be curated before publication
4. How to cite the dataset in your research paper
5. How the dataset draft can be shared with editors/peer reviewers
6. How to get help & more information

1. How to prepare data for depositing

If you are new to dataset archiving, you should have a look at the TROLLing/DataverseNO deposit guidelines: https://site.uit.no/dataverseno/deposit/. The guidance below covers primarily AcqVA-specific issues.

1.1 Readme file

A readme file is mandatory in every TROLLing dataset and must, at a minimum, contain information about the following:

Title of the dataset, DOI, contact information
Methods
Data and file overview
Data-specific information
Terms of Reuse

A readme file template is available on this link.
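
As an illustration only, the minimum contents listed above can be turned into a plain-text skeleton with a few lines of Python. The function names, the upper-case heading layout and the placeholder text are assumptions made for this sketch; the official readme template takes precedence wherever the two differ.

```python
from pathlib import Path

# Minimum readme contents, copied from the list in section 1.1.
REQUIRED_SECTIONS = [
    "Title of the dataset, DOI, contact information",
    "Methods",
    "Data and file overview",
    "Data-specific information",
    "Terms of Reuse",
]


def build_readme_skeleton(sections=REQUIRED_SECTIONS) -> str:
    """Return a skeleton with one upper-case heading and one placeholder per section."""
    blocks = [f"{name.upper()}\n[to be filled in]\n" for name in sections]
    return "\n".join(blocks)


def write_readme(folder, filename="0_ReadMe.txt") -> Path:
    """Write the skeleton into the dataset folder (file name is a hypothetical default)."""
    path = Path(folder) / filename
    path.write_text(build_readme_skeleton(), encoding="utf-8")
    return path
```

Calling write_readme("my_dataset") would create the skeleton in a (hypothetical) existing folder my_dataset; every placeholder must then be replaced with real information before deposit.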

1.2 Files: Folder structure (if applicable)

To keep the folder structure of your dataset, you need to pack the folders and single files into a container file (.zip). Procedure:

(Windows)

Open Windows File Explorer
Open the folder that contains all the folders and files of your dataset
Select all (Ctrl+A)
Right-click, and select 7-Zip followed by Add to “FOLDERNAME.zip”

(Mac)

Open Finder
Open the folder that contains all the folders and files of your dataset
Select all (Command + A)
Right-click, and select Compress

The zip file will be unpacked during upload to TROLLing, and the folders (and sub-folders) in the zip file will be uploaded with their respective folder names.

Note! If your dataset contains very many files (more than about 200), we recommend depositing them as zip files. Each such zip file needs to be zipped once more so that the “inner” zip file is retained during upload. The name of the “outer” zip file does not matter, as it will be unpacked during upload.
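
If you prefer a script to the file manager, the packing steps above can be sketched in Python using only the standard library. This is a minimal sketch, not an official TROLLing tool: the function names and the folder and file names in the usage note are hypothetical.

```python
import zipfile
from pathlib import Path


def zip_folder_contents(source_dir, zip_path) -> Path:
    """Pack everything inside source_dir into zip_path, keeping relative folder paths."""
    source_dir = Path(source_dir)
    zip_path = Path(zip_path)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source_dir.rglob("*")):
            # Skip directories, and skip the archive itself if it lives in source_dir.
            if path.is_file() and path.resolve() != zip_path.resolve():
                archive.write(path, path.relative_to(source_dir).as_posix())
    return zip_path


def wrap_zip(inner_zip, outer_zip) -> Path:
    """Put an existing zip file inside a second zip so the inner one is kept as a zip."""
    inner_zip = Path(inner_zip)
    outer_zip = Path(outer_zip)
    # The inner file is already compressed, so it is simply stored.
    with zipfile.ZipFile(outer_zip, "w", zipfile.ZIP_STORED) as archive:
        archive.write(inner_zip, inner_zip.name)
    return outer_zip
```

For example, zip_folder_contents("my_dataset", "my_dataset.zip") packs the contents of a folder my_dataset with its sub-folders intact. For a dataset with very many files, wrap_zip("my_dataset.zip", "upload.zip") then produces the “outer” zip file to upload.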

2. How to deposit data in the TROLLing repository

2.1 Create a TROLLing user account

Before depositing your first dataset in the AcqVA Collection, you need to create a TROLLing user account.

If you’re an employee (or a student) at a Norwegian research institution, you will use Feide log-in. Follow the steps described on this page in order to get deposit access to TROLLing. In addition, send a short email to support@dataverse.no asking for deposit access to the AcqVA Collection. Once you are granted access, you will be able to create a dataset: Go to the AcqVA Collection (https://dataverse.no/dataverse/acqva), log in using the option Your Institution.

For other users: Go to the TROLLing homepage (https://trolling.uit.no/), and click on Sign Up (in the middle of the page) in order to apply for a TROLLing user account. You will get a request confirmation by email. In addition, send a short email to support@dataverse.no asking for deposit access to the AcqVA Collection. Once you are granted access, you will be able to create a dataset: Go to the AcqVA Collection (https://dataverse.no/dataverse/acqva), log in using Other options – Username/Email.

2.2 Create a dataset draft

After logging in, click the Add Data button on the right-hand side, and select New Dataset to create a dataset draft.

2.3 Fill in metadata

Your first task is to fill in metadata about your dataset. Some metadata fields are repeatable. If you want to add more metadata in such fields, click the plus sign to the right.

Be aware that metadata are entered in two rounds: First, you are asked to enter obligatory metadata. Once you’ve saved these, a more elaborate metadata template will appear, where you can fill in more information.

The following adaptations apply specifically to the AcqVA collection:

2.3.1 Prefilled metadata fields

The metadata template for the AcqVA Aurora Collection contains some fields that are pre-filled with information. If any of the information does not fit with the dataset in question, edit or delete it.

Field	Pre-filled content	Comment
Author – Affiliation	UiT The Arctic University of Norway
Contact – Affiliation	UiT The Arctic University of Norway
Description – Text	[Article abstract:] [Dataset abstract:]	Note: The dataset must contain at least a description of the dataset. If you are not also adding an article abstract, delete the prefilled text (“[Article abstract:]”) and remove the field by clicking the – sign to the right.
Subject	Arts and humanities
Language	English	This is the language used to describe the data. If you have used a language other than English, update this field. The language(s) of the study should be entered as keyword(s).
Producer – Name	UiT The Arctic University of Norway
Contributor	Research Group: UiT Aurora Center for Language Acquisition, Variation & Attrition: The Dynamic Nature of Languages in the Mind; Funder: UiT The Arctic University of Norway; Hosting Institution: UiT The Arctic University of Norway
Grant Agency	UiT The Arctic University of Norway	If the deposited data is from a project with additional funding, duplicate the Grant Information field, and add the additional agency and number.
Grant Number	2062165
Distributor	The Tromsø Repository of Language and Linguistics

2.3.2 Series

If the research is conducted as part of a specific project, the project name should be entered in the field Series.

2.3.3 Keywords: Order of appearance

If you have many keywords, you may want to group them by type and enter the groups in a consistent order, e.g.:

Keywords informing about the languages involved
Keywords informing about the research question(s)
Keywords informing about the methods used
Keywords informing about the type of participants

2.3.4 Kind of data

In the Kind of data field, you should specify the type of content in the dataset. Here is a short list of examples (if you want other suggestions, you may hover over the question mark next to the field heading):

experimental data
clinical data
textual data
stimuli
code

2.4 Upload files

In the Files tab, click Select Files to Add and choose your files from your computer or server space. You may select multiple files using Shift + arrow keys or Ctrl/command + mouse click. To select all files in a given folder, press Ctrl/command + A.
When you have uploaded all the files in your dataset, click Save Dataset. Your dataset draft will be saved.

Note! If your dataset contains a lot of files, it is convenient to have the ReadMe file displayed on top of the file list. To achieve this, you can add an initial zero to the file name, e.g. “0_ReadMe.txt”.

Uploading a zip file: The zip file will be unpacked during upload, and the folders (and sub-folders) in the zip file will be uploaded with their respective folder names.

Note! If you have kept your folder structure by uploading a zip file, you need to select Tree View to be able to see the folder structure: Click the Tree button to the right of Change View on top of the file overview.

2.5 Submit for review/curation

Once your dataset is ready for sharing/publication, click the Submit for review button. A TROLLing curator will then check whether your dataset complies with the deposit guidelines.

3. How the dataset will be curated before publication

When you click “submit for review”, one of the TROLLing curators will go through your dataset (usually within 3 working days), with particular focus on:
- The metadata
- The readme file
- The file format(s)
- The license and any information connected with this
You will receive a curation report listing any suggested (and in some cases required) modifications. When you have revised the dataset, click “submit for review” again to notify the curators.
When the dataset complies with the deposit guidelines and both parties are satisfied with the result, the dataset is ready for publication. You now have three options: 1) publish the dataset, 2) share the unpublished dataset with journal editors using a private URL, 3) ask for an anonymized version of the unpublished dataset and share with journal editors using a private URL (see section 5).

4. How to cite the dataset in your research paper

In addition to the data availability statement, now common in many online journals, you should include a reference to your dataset in the manuscript wherever you see fit (e.g., where you present your data collection or your results). Here’s an example of an in-text citation and the corresponding reference:

A bibliographic reference is automatically created for your dataset in TROLLing, based on the metadata you enter. Make sure you include all of its elements in the bibliographic reference in the research paper. Depending on the guidelines of the publication channel you are using, you may have to adjust the reference style. Here’s an example of a ready-made reference in TROLLing, in the blue box:

If you need more details on how to cite linguistic research data, see the following resources:
- Andreassen, H. N., Berez-Kroeker, A. L., Collister, L., Conzett, P., Cox, C., De Smedt, K., McDonnell, B., & Research Data Alliance Linguistic Data Interest Group. (2019). Tromsø recommendations for citation of research data in linguistics (Version 1). https://doi.org/10.15497/RDA00040
- Conzett, P., & De Smedt, K. (2022). Guidance for Citing Linguistic Data. In A. L. Berez-Kroeker, B. McDonnell, E. Koller, & L. B. Collister (Eds.), The Open Handbook of Linguistic Data Management (pp. 143-155). MIT Press. https://doi.org/10.7551/mitpress/12200.003.0015

5. How the dataset draft can be shared with editors/peer reviewers

If you plan to submit your manuscript to a journal with open or single-blind peer review, you may share the unpublished dataset along with the manuscript. To do so, you need to ask the TROLLing curators to provide you with a so-called private URL for your dataset.
If you plan to submit your manuscript to a journal with double-blind peer review, your dataset needs to be anonymized. This needs to be done manually, by the TROLLing curators.
- Procedure:
  - In the metadata field Related Publication, write that you would like an anonymized version of your dataset.
  - Once the dataset is curated, the TROLLing curators create an anonymized version with a private URL that you may send to the editors alongside your manuscript.
  - Before sharing it with the editors, you should have a look at the anonymized dataset to double-check there is no information that could reveal your identity.
  - The anonymized version of your dataset will be deleted once your manuscript has been accepted for publication, and your original dataset will be published. This means that in your article manuscript you should refer to your original dataset, but in an anonymized way. See example here:
    - NN, 2021, “Replication data for: Salience-simplification strategy to markedness of causal subordinators: The case of “because” and “since” in argumentative essays”, https://doi.org/10.18710/RULYMP, DataverseNO, DRAFT VERSION. An anonymized version of the dataset is available at https://dataverse.no/privateurl.xhtml?token=ccc4b175-ec2f-4488-99c8-c410d5d9260f.
    - Note! The anonymized version has a different DOI.
  - When your manuscript has been accepted for publication, you should update any anonymized in-text citations as well as the bibliographic reference. If the calendar year has changed since you added the reference to the DRAFT version of the dataset, update the publication year in the dataset reference. See the finalized (and published) version of the above example here:
    - Kang, Hui; Xu, Jiajin, 2021, “Replication data for: Salience-simplification strategy to markedness of causal subordinators: The case of “because” and “since” in argumentative essays”, https://doi.org/10.18710/RULYMP, DataverseNO, V1.
  - To publish your original, non-anonymized dataset, you need to inform the TROLLing curators.
  - Note! Until this point, your dataset has the status draft and may be modified or deleted.
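
The two reference forms shown in the procedure above (anonymized draft and published version) follow the same pattern, which can be sketched as a small Python helper. This is only an illustration modelled on the two examples above: the function name and parameters are hypothetical, and the reference that TROLLing generates for your dataset remains the authoritative one.

```python
def format_dataset_reference(authors, year, title, doi, version, anonymized_url=None):
    """Build a dataset reference in the style of the two examples in this section.

    authors: list of "Family name, Given name" strings.
    anonymized_url: if given, the authors are replaced by "NN" and a pointer
    to the anonymized version is appended.
    """
    author_part = "NN" if anonymized_url else "; ".join(authors)
    reference = (
        f"{author_part}, {year}, “{title}”, "
        f"https://doi.org/{doi}, DataverseNO, {version}."
    )
    if anonymized_url:
        reference += (
            f" An anonymized version of the dataset is available at {anonymized_url}."
        )
    return reference


if __name__ == "__main__":
    # Published form, using the author names, year, DOI and version from the example above.
    print(
        format_dataset_reference(
            ["Kang, Hui", "Xu, Jiajin"],
            2021,
            "Replication data for: ...",  # title shortened for this sketch
            "10.18710/RULYMP",
            "V1",
        )
    )
```

For the double-blind stage, the same call with version "DRAFT VERSION" and the private URL passed as anonymized_url yields the “NN” form; after acceptance, only the arguments need to change.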

6. How to get help & more information

AcqVA Aurora Lab –> Resources –> Open Data: https://site.uit.no/acqvalab/resources/#open_data
TROLLing/DataverseNO deposit guidelines: https://site.uit.no/dataverseno/deposit/
Support e-mail: support@dataverse.no
PowerPoint presentations
- Meeting 2021-11-15: Archiving research data: Whys and hows
- Meeting 2021-06-16: TROLLing: The Tromsø Repository of Language and Linguistics