Introduction
Overview
Teaching: 30 min
Exercises: 30 min
Questions
What are the objectives of this workshop?
Objectives
Learn who your fellow students are with an icebreaker
Sign in to the shared editable document and do some jargon busting
Acknowledgement of country
Code of Conduct
Introductions
- Introduce instructors and helpers
- Introduce workshop over two days - where we are going and how we will get there
Protocol for Zoom - Sticky Notes and reactions
- Pop out the chat and reactions windows.
- Use chat to ask questions of each other at any time. This is one of the more awesome benefits of an online workshop. Side conversations are entirely encouraged.
- Chat to Everyone if your question or response is for the whole group or for the instructors.
- Chat to individuals if you want to hold a private conversation.
- If you chat to someone privately, their chat usually changes to private in response. Don’t forget to check who you’re chatting to.
Everyone: say hi in chat! Say hi to someone else, privately!
We will use reactions to see how folk are going. Use the green check to say “I’m done with the challenge! Everything works!” Use the red x to say “could someone contact me to help me?” It is awesome to help each other out, and to ask for help. This is one of the best ways to learn. Use the raise hand icon to ask for our attention. (Remind us in chat if we don’t see your hand after a while.)
Everyone: Put up a big happy green check!
Plan for the workshop
Part 1
- Research Data Management
- File Manipulation
Part 2
- Backups
- Collaboration
- FAIR and the paper of the future!
Shared editable document
Ice-breaker (10 minutes / 5 minutes discussion)
- Get to know your fellow participants (in breakout rooms).
- “What … is your name?”
- “What … is your quest?” (Or, for those who are not Monty Python fans – perhaps instead you can ask “What drives you? Or, what is your current major life goal?”)
- The third question is up to you. If you are stuck, options could include: “What is your favourite colour?”, “What is the airspeed of an unladen swallow?” or “What is the most recent research paper you found interesting?”
- In the shared document, write a two-sentence summary of one of the fellow participants you met during the icebreaker.
Jargon busting
This exercise is an opportunity to begin to ask questions and to get a firmer grasp on the concepts around research data management.
Challenge - Jargon‽ (10 minutes / 5 minutes discussion)
In pairs, look at the list of terms on the shared document.
- research data
- research data management
- upload/download
- data movement
- cloud storage
- cloud computing
- sync
- collaborative document
- collaborative editing
- etherpad
- active data
- sensitive data
- repository
- GitHub
- computational notebooks/Jupyter Notebooks
Are you familiar with these terms in this context? What are the ones that trip you up? Think of a way to remember what that word or term means in this context that might help others understand it better. How could you define a term (or two!) above to make it easier to understand? Write your definitions down in the etherpad - we can add to this list as we go and keep it as a resource for the future.
Key Points
Explore some of the tools and methods for effective active data management.
Research Data Management
Overview
Teaching: 20 min
Exercises: 75 min
Questions
What is Research Data Management?
Why should a researcher care about data management?
Objectives
Discuss research data
Learn about the research data lifecycle
Discover anti-sharing patterns
Research Data
What is research data? What examples can you think of?
Some reading here: https://ardc.edu.au/wp-content/uploads/2020/01/What-is-research-data.pdf
Research Data lifecycle
Research data lifecycle diagrams give an overview of the stages and processes required for the successful management of data. By showing the different phases a dataset goes through, the data lifecycle concept describes each action as a research project moves from being an idea, to making discoveries, and then to sharing those discoveries.
This is an example of a research data lifecycle diagram. Is there something surprising in there for you? Or something missing? Where are you at in this cycle right now?
Image credit: UC Santa Cruz University Library
Research Data Management (aka RDM)
Research data management includes a range of activities, all related to how to maintain sound practices around research data collection, organisation and sharing.
More info here: https://staff.mq.edu.au/research/strategy-priorities-and-initiatives/data-science-and-eresearch/research-data-toolkit
What could possibly go wrong? Why RDM?
We’ve all no doubt heard of some data disasters by now - people losing data in different ways, data being shared inappropriately.
Let’s start with a chat about the kinds of things that happen when researchers wing it.
Challenge - A video on data sharing (30 minutes)
- In breakout rooms, watch this video in small groups: Data Sharing and Management Snafu in 3 Short Acts
- Discuss: Have you run into any of these scenarios? What happened? How should this have gone? Write your room’s conclusions in the etherpad.
- Share: When we get back together, each room should share some of its worst horror stories (with the names changed to protect the innocent).
But rather than being motivated by fear, there are lots of good reasons to be your own best data manager - the one most likely to benefit (and suffer) is you!
Data Management Policies and Plans
Best laid plans
Research data is important! Please take care of it.
Challenge - Responsibilities (5 minutes)
Which of these does NOT count as active research data? Put a +1 in the shared document next to which one you think is right!
- A database
- A research publication
- Field notes
- Audio and video tapes
Solution
A research publication. These are taken care of by publishers. Even when held within an institution, either on open access or for research reporting purposes, these tend to be managed separately from other research data.
Universities around the world have developed a range of policies and procedures outlining their expectations around research data management. Let’s have a look at some universities’ RDM statements.
Practice makes perfect.
Challenge - RDM, status quo (30 minutes)
As a group, choose (at least) one project that one person in your breakout room is running.
Try to answer the following questions in the shared document:
- What backup requirements does your project have? What risks are they protecting against? How confident are you in the backups?
- What data publishing requirements do the funders or universities involved in the project have?
- Follow the “responsibilities of institutions” and “responsibilities of researchers” sections in the NHMRC’s requirements. How well does the project stack up?
Write your answers in the shared document as a team.
OK. Let’s say that you have suddenly become responsible for an entire team’s research data. If anything happens, it’s on you. What can you do?
Challenge - Arguments for Collaboration (15 minutes)
Take a quick read of the Dropbox sharing article.
In breakout rooms, write a single-paragraph argument in the shared document for a team to switch to the cloud for collaboration. Make sure you address why the choice of cloud provider matters.
As a whole group, discuss which paragraphs are the most persuasive.
Key Points
Research Data Management is the planning and application of a plan to manage the full data lifecycle, from creation to collection and analysis and then to publication and archiving.
A good data management plan will result in more consistent data, less time spent cleaning data, and reusable data that can generate better impact.
Active Data and Files
Overview
Teaching: 10 min
Exercises: 40 min
Questions
What do I need to consider when working with active research data?
What are some ways of organising files and directories that will make it easier to collaborate with others?
Objectives
Understand safe and secure ways of sharing and storing data
Learn how to organise, maintain and store your active data
Working with active research data
There are lots of simple ways you can manage your active data so that it is easy to find and safely stored.
You can set up an automatic way to move your data to the cloud so that your data is safe and secure. This is called ‘syncing’. You can also ensure that you are sharing your data securely by knowing what tools are best for this operation (hint: might not be email!).
A little bit of effort now will pay off tenfold in the future.
Challenge - Think, pair, share (15 minutes)
Where is your data now? How do you store, share, sync, protect and back up your files? Could this be done differently?
Important considerations
Whatever system you choose:
- There must be one and only one copy of the “authoritative” version of a file.
- Everyone must be able to access that version at all times.
- A history must be available so that older versions are accessible at need.
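The first consideration can be checked mechanically: when several copies of a file are floating around, a checksum shows which ones have drifted from the authoritative version. This is a minimal Python sketch, assuming the copies are reachable as local paths (for example, a synced folder and a laptop copy); the function names are illustrative only and are not part of CloudStor or any sync client.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def divergent_copies(authoritative: Path, copies: list[Path]) -> list[Path]:
    """Return every copy that is missing or whose contents differ from the authoritative file."""
    reference = sha256_of(authoritative)
    return [c for c in copies if not c.is_file() or sha256_of(c) != reference]
```

Anything this returns is a copy that someone might mistake for the real thing; either replace it from the authoritative version or delete it.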
CloudStor
1. Go to the AARNet website: https://www.aarnet.edu.au/
2. Click on 'Log In and Tools' in the top righthand corner of the page.
3. Select 'CloudStor'.
4. Choose your organisation and click on 'Login at AARNet'.
5. Sign-in with your credentials - user name and password - and click 'Login'.
You are now in CloudStor, which is a cloud storage environment.
Data and CloudStor
Continuously Syncing
Using the ownCloud Sync Client you can keep your computer and your CloudStor folders up to date automatically.
Here is the CloudStor Getting Started Guide.
Pushing and Pulling
Here, we can take advantage of CloudStor’s versioning. Groups can intentionally upload data when they want to preserve or share it at that point in time, and other collaborators can then download it. If needed, we can also see old versions.
Versioning Demo
Here, your instructors will demonstrate uploading and downloading data and retrieving old versions.
FileSender
FileSender is a way to send many files, or very large files, to your colleagues and external collaborators. Transfers can be encrypted, so they are secure.
Challenge - Send yourself a file (10 minutes)
- Learn about FileSender. Then, using FileSender in CloudStor, send a file to yourself and check that it arrives at that same email address. What’s the maximum file size? Are there any limits on file type?
File manipulation
Take a look at your file organisation. How can you do this better?
Let’s now look at the benefits and best practices of gathering your data in one place, using planned folder structures (also known as ‘directories’) and naming conventions for your files.
Image credit: Clare Trowell
Challenge - Organising a directory (15 minutes)
- Open the Cambridge Organising your Data Guide
- Take a look at the sample file directory on CloudStor
- Download that directory: Click on the ... icon, and choose download. Unzip the folder on your computer.
- Using the guide, rename the folders using best practice naming conventions.
- Upload the folder onto CloudStor. Drag and drop the folder onto the white space at the top of the window next to the plus button.
Optional Challenge
Did you know you can update file names automatically?
Windows users
Read this set of instructions then reorganise and name the images so that they are easily found and recognised.
Mac users
Read this set of instructions then reorganise and name the images so that they are easily found and recognised.
Linux users
Read this set of instructions then reorganise and name the images so that they are easily found and recognised.
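The operating-system tools above are one way to rename in bulk; a short script is another, and it works the same on all three systems. This is a minimal Python sketch, assuming your convention is lower case with underscores instead of spaces (hyphens, such as those in ISO dates, are kept). The function names are hypothetical, and it only renames files directly inside one folder. It runs as a dry run by default and refuses to overwrite anything.

```python
from __future__ import annotations

import re
from pathlib import Path


def tidy_name(name: str) -> str:
    """Lower-case a file name and replace spaces and punctuation with underscores."""
    stem, dot, ext = name.rpartition(".")
    if not dot:  # no extension at all
        stem, ext = name, ""
    stem = re.sub(r"[^a-z0-9-]+", "_", stem.lower()).strip("_")
    return f"{stem}.{ext.lower()}" if ext else stem


def rename_all(folder: Path, dry_run: bool = True) -> list[tuple[str, str]]:
    """Plan tidy names for files directly inside folder; rename only if dry_run is False."""
    plan: list[tuple[str, str]] = []
    taken: set[str] = set()
    for path in sorted(folder.iterdir()):
        if not path.is_file():
            continue
        new_name = tidy_name(path.name)
        if new_name == path.name:
            continue
        target = path.with_name(new_name)
        # Refuse to overwrite: another file, or an earlier planned rename, already has this name.
        clash = target.exists() and not target.samefile(path)
        if clash or new_name in taken:
            raise FileExistsError(f"{new_name} is already taken; rename {path.name} by hand")
        taken.add(new_name)
        plan.append((path.name, new_name))
    if not dry_run:  # the whole plan is checked before any file is touched
        for old, new in plan:
            (folder / old).rename(folder / new)
    return plan
```

Run it once with the default dry_run=True, read the planned (old, new) pairs, and only then run it again with dry_run=False.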
Optional Challenge
Set up the CloudStor sync client.
Syncing data
Using the ownCloud Sync Client you can keep your computer and your CloudStor folders up to date automatically.
Sync Password
If you wish to use the optional ownCloud sync clients, you will need to set a separate ownCloud sync password. You can set this password from the “Settings” page.
Note: This password is only for use with the sync clients and is not used to login to CloudStor.
Download the Desktop Sync Client
You can download the CloudStor sync client for Windows, macOS, iOS, Linux and Android here.
Follow the instructions to install the client.
Configuring the ownCloud Desktop Sync Client
- Install and open the ownCloud Desktop Sync Client. When prompted for the server address, enter https://cloudstor.aarnet.edu.au/plus and click “Next”.
- Enter your institutional email address as your username and the sync password you created (not the password for your institutional login), then click “Next”.
- Ensure “Sync Everything” is selected and then click “Connect…” and CloudStor will start synchronising your data.
- Click on the CloudStor icon to expand the folders you have access to from the web client.
- To access a group folder, click on the Shared folder to expand it.
- Click on the folder you need to access.
NOTE: After you save or add documents, the web client can take from a few seconds up to about 30 seconds to update. Large volumes of data can take considerably longer.
Key Points
There must be one and only one copy of the “authoritative” version of a file.
Everyone must be able to access that version at all times.
A history must be available so that older versions are accessible at need.
Files must be consistently named, without spaces.
Think about where you keep your research and how you share it.
Back-ups and recovery
Overview
Teaching: 25 min
Exercises: 50 min
Questions
Why are automated backups important?
How do I make a backup plan?
Objectives
Learn who to ask for help
Know what the backup systems are
The first rule of backups
It is not a backup until you have restored from it. – Brian Ballsun-Stanton
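Taking that rule literally, every backup run can end with a restore. This is a minimal Python sketch, assuming a zip archive is an acceptable backup format for the folder in question; the function names are hypothetical. It archives a folder, unpacks the archive into a throwaway location, and compares a SHA-256 checksum of every file against the original.

```python
from __future__ import annotations

import hashlib
import shutil
import tempfile
from pathlib import Path


def checksums(folder: Path) -> dict[str, str]:
    """Map each file's path (relative to folder) to its SHA-256 digest."""
    return {
        str(p.relative_to(folder)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def backup_and_verify(source: Path, archive_base: Path) -> bool:
    """Zip source, restore the zip into a scratch folder, and check every file matches.

    Keep archive_base outside source, so the archive does not end up inside itself.
    """
    archive = shutil.make_archive(str(archive_base), "zip", root_dir=source)
    with tempfile.TemporaryDirectory() as scratch:
        shutil.unpack_archive(archive, scratch)
        return checksums(Path(scratch)) == checksums(source)
```

A False result means the backup cannot be trusted, which is exactly what you want to learn before you need it.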
What is a sync client?
A sync client is “[a] continuous file synchronization program. It synchronizes files between two or more computers in real time…” (Syncthing 2020, https://syncthing.net/)
- Keeps a local and remote copy up to date
- Designed for easy migration between computers (e.g. work and home)
- Always on; presumes an internet connection
- Does not provide reliable versioning.
What is a Backup client?
- Not designed to share data
- No risk of “overwriting” local data accidentally.
- Can capture all files, or the latest changes in files
- Keeps controllable versions with a specific retention policy. (Are there legal lengths which must be obeyed? Are there risks for keeping longer?)
- Designed to run automatically, but not continuously.
- If it is a cloud service, does it encrypt your data so that the provider cannot access it?
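The difference from a sync client is easiest to see in code. A sync client mirrors changes, so a deletion or overwrite on one computer spreads to the others. A backup only reads the source and keeps dated versions under a retention policy. This is a minimal Python sketch of that second behaviour; the folder layout, timestamp format and function names are assumptions, and a real backup client would add scheduling, incremental copies and encryption.

```python
from __future__ import annotations

import shutil
from datetime import datetime, timedelta
from pathlib import Path

STAMP = "%Y%m%dT%H%M%S"  # snapshot folder names, e.g. 20200301T090000


def take_snapshot(source: Path, backup_root: Path, now: datetime) -> Path:
    """Copy source into a new timestamped folder under backup_root; source is only read."""
    target = backup_root / now.strftime(STAMP)
    shutil.copytree(source, target)  # raises if that snapshot already exists; never overwrites
    return target


def prune_snapshots(backup_root: Path, keep_days: int, now: datetime) -> list[str]:
    """Delete snapshots older than keep_days (the retention policy); return their names."""
    cutoff = now - timedelta(days=keep_days)
    removed = []
    for snap in sorted(backup_root.iterdir()):
        try:
            taken = datetime.strptime(snap.name, STAMP)
        except ValueError:
            continue  # not a snapshot folder; leave it alone
        if snap.is_dir() and taken < cutoff:
            shutil.rmtree(snap)
            removed.append(snap.name)
    return removed
```

Scheduling these two calls with cron or Windows Task Scheduler makes the backup automatic but not continuous, and keep_days is where a legal or institutional retention period would go.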
Challenge: Common clients (10 minutes)
Add comments in the shared document.
- Is Dropbox a backup client?
Solution
No. It can overwrite your local files; it shares file data at its discretion (“Dropbox uses certain trusted third parties (for example, providers of customer support and IT services) to help us provide, improve, protect, and promote our Services.”, Wired, https://www.wired.com/story/dropbox-sharing-data-study-ethics/); it is not liable for data loss; and it keeps versions for up to 30 days only.
- Is OneDrive a backup client?
Solution
This is a trick question. The OneDrive sync client is not a backup, because it can overwrite local files on your computer. While Microsoft advertises OneDrive as a backup client with its “back up your folders” option, this option appears to sync rather than create backups (see Backblaze’s discussion of the differences, and the discussion on Microsoft’s Answers site). In any case, you only have 3-30 days to recover deleted files in standard accounts, which means it is not a backup.
Sync clients and backup clients solve different problems, and with some care, there’s no problem running both.
The 3-2-1 Rule
When storing data, follow the 3-2-1 rule:
Keep 3 copies of your files, in 2 different locations, with 1 copy in a different geographic area.
Master copy: Keep at secure location
Working copy: Keep on a reliable/safe device or location
Back up copy: Keep off-site
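The rule is concrete enough to check against a list of where your copies actually live. This is a minimal Python sketch, assuming you describe each copy by its role, its physical location and a broad geographic region; the class, the field names and the example locations are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    role: str      # e.g. "master", "working", "backup"
    location: str  # where it physically lives, e.g. "office NAS"
    region: str    # broad geographic area, e.g. "Sydney"


def meets_3_2_1(copies: list[Copy], home_region: str) -> list[str]:
    """Return the ways this set of copies breaks the 3-2-1 rule (empty list means it passes)."""
    problems = []
    if len(copies) < 3:
        problems.append(f"only {len(copies)} copies; keep at least 3")
    if len({c.location for c in copies}) < 2:
        problems.append("all copies are in one location; use at least 2")
    if not any(c.region != home_region for c in copies):
        problems.append(f"no copy outside {home_region}; keep 1 in another geographic area")
    return problems


laptop = Copy("working", "laptop", "Sydney")
nas = Copy("master", "office NAS", "Sydney")
cloud = Copy("backup", "cloud backup service", "Melbourne")
print(meets_3_2_1([laptop, nas, cloud], home_region="Sydney"))
print(meets_3_2_1([laptop, nas], home_region="Sydney"))
```

The second call reports two problems: there are too few copies, and none of them is outside Sydney.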
Challenge - Why have a 3-2-1 strategy? (10 minutes)
What could possibly go wrong? Write your answers in the shared document.
How to make a backup plan
- What do you want to protect? Where is it?
- Articulate risks
- Examples:
- Formally:
- What is the cost of replacing all or part of the data?
- What is the cost of a data breach?
- Where is the data stored and what can happen at each place where the data lives?
- Multiply the cost of the above by the odds of failure.
- 1-3% a year for a hard drive failing
- Odds of fire, workplace theft, etc. (This rabbit hole can go as deep as you like.) Mean fire experience probability says “The mean annual probability of having a fire experience whilst an adult was .0125.”
- Your budget for your backups should be lower than the odds of the thing occurring times the cost of fixing it.
- Find automatic services for your 3-2-1 strategy
- The more you have to do manually, the less likely it is that a backup will be performed during stressful or busy times.
- The most important backup is the off-site backup. Find a service or program that can offer versions and deleted files out to the entire history of your backup. Make sure it runs automatically on all of your computers.
- Your backup’s backup, in case your online account is compromised, is your local backup. Figure out what high-reliability media you will be using (a USB flash drive is not reliable; external hard disks are adequate for the purpose) and how you can semi-automate backups.
- Figure out your test strategy.
- Backups which haven’t been tested are not backups
- No, really
- Restoring from a historical version of your backups (“I want last Tuesday’s version of this document”) is a great way to know that the entire system is working.
- Document what you need to do to backup and to restore data, using a checklist. Assume you will be panicked and not remembering anything when you really need to restore data.
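The “formally” step of the plan is a small piece of arithmetic: for each risk, multiply the cost of recovering by the annual probability, add the results up, and compare the total with what backups cost per year. This is a minimal Python sketch using the probabilities quoted above; the $20,000 replacement cost is a hypothetical figure, and treating the fire probability as the chance of losing your data is a deliberate simplification.

```python
from __future__ import annotations


def expected_annual_loss(risks: dict[str, tuple[float, float]]) -> float:
    """Sum cost x annual probability over every risk, giving the expected cost per year."""
    return sum(cost * probability for cost, probability in risks.values())


def backup_is_worth_it(annual_backup_cost: float, risks: dict[str, tuple[float, float]]) -> bool:
    """True when the backup budget is lower than the expected yearly loss it protects against."""
    return annual_backup_cost < expected_annual_loss(risks)


# Hypothetical: $20,000 to recollect the data; probabilities from the figures above.
risks = {
    "hard drive failure": (20_000, 0.03),  # upper end of the 1-3% a year range
    "fire": (20_000, 0.0125),              # mean annual probability of a fire experience
}
print(f"Expected loss per year: ${expected_annual_loss(risks):,.2f}")
```

Here the expected loss is about $850 a year, so a backup setup costing a few hundred dollars a year is easily justified, while one costing $1,000 a year would need another reason (such as a cost for a data breach) to justify it.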
Challenge - Backup Plan (30 minutes)
Group: Make, document, and share a backup plan. Use a variety of resources and share those too. Think, pair, share.
Key Points
Automated backups can reduce the time and money costs of things going badly.
A backup plan should be in response to your data’s sensitivity and difficulty of collection. It should address risks and mitigations of those risks. It should cost less than being hit by the thing you are defending against.
Collaboration
Overview
Teaching: 15 min
Exercises: 45 min
Questions
How do I work on the same documents as others?
How do I share data using the cloud?
How do I build citations with the rest of my team?
Objectives
Use collaborative files
Understand group allocations
Getting and sending (sensitive) data
Why not email?
- Not encrypted
- Cannot control access
- Cannot restrict when access ends
- Limited file size
FileSender is one mechanism for exchanging data.
Challenge - As a breakout room, securely send files around (10 minutes)
Share a dataset from a group drive in CloudStor in three ways: via a CloudStor token (link), by inviting someone to the document, and via FileSender.
Collaboration and Citations
As academics we will collaborate on at least two of the following things:
- Papers
- Analysis
- Citations
Live Collaboration
Challenge: Investigate and Report. (45 minutes)
In your breakout groups, investigate one of the following paper+citation collaboration options. Make sure to see if you can get a group making edits to the same document and adding citations to the same bibliography.
Stuff to investigate with the resources below:
- Make a group on whatever citation system you’re using. A group is defined as some sort of shared space where different members of your breakout room can add citations. (Bonus points for editing.)
- Everyone add 1 citation from a paper they have open. Otherwise search for papers about FAIR. Use the tool’s automatic metadata harvesting if possible.
- Everyone open up a shared document in whatever system you’re trying. A shared document is one where everyone can contribute changes and edits without needing to email the document around.
- Try to add your citation to that shared document, along with a 1 sentence summary of the document (or a copy-paste from its abstract)
- Try to generate a PDF.
- Summarise your outcomes in the HackMD shared document.
Options to investigate:
- (Paper, version control, citations) Manubot and GitHub
- (Collaborative editing, collaborative citations) Google Docs and Paperpile
- (Collaborative editing, typesetting, citations) Overleaf and Zotero
- (R Markdown, paper, code, citations, version control) R Markdown, Zotero, citations from Zotero in R
- (Collaborative editing) Word 365 and Zotero on your system
Key Points
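Email is not encrypted, cannot control or end access, and limits file size; share data with tools such as CloudStor links and FileSender instead.
Shared documents and shared citation libraries let a team write and cite together without emailing files around.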
FAIR data and data publishing
Overview
Teaching: 5 min
Exercises: 10 min
Questions
What is FAIR data?
How do you publish data?
Objectives
Appreciate the FAIR data principles
Share data on OSF
How do you publish data for a paper?
- Find a repository
- Add metadata
- Add a license
- Check for identifiability
- Make peer review links
- Publish
- Mint a DOI
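For tabular data about people, one simple screening check for identifiability is k-anonymity: every combination of indirectly identifying columns (quasi-identifiers such as postcode and age band) should be shared by at least k records. This is a minimal Python sketch of that one check, not a full de-identification assessment; the column names and the choice of k are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def small_groups(rows: list[dict[str, str]], quasi_identifiers: list[str], k: int = 5) -> dict[tuple[str, ...], int]:
    """Return each quasi-identifier combination shared by fewer than k records, with its count."""
    counts = Counter(tuple(row[col] for col in quasi_identifiers) for row in rows)
    return {combo: n for combo, n in counts.items() if n < k}


rows = [
    {"postcode": "2109", "age_band": "30-39", "score": "4"},
    {"postcode": "2109", "age_band": "30-39", "score": "5"},
    {"postcode": "2000", "age_band": "70-79", "score": "3"},
]
print(small_groups(rows, ["postcode", "age_band"], k=2))
```

The single record in postcode 2000 aged 70-79 is flagged: someone who knows those two facts about a person could pick out that person’s row. Rows loaded with Python’s csv.DictReader have exactly this list-of-dicts shape.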
https://journals.sagepub.com/doi/pdf/10.1177/2515245918757689
https://help.osf.io/hc/en-us/articles/360024207633-Sharing-Research-Outputs
What is FAIR?
- https://go-fair.org
- https://cos.io/top/
FAIR and Active Data Management
https://osf.io/prereg/
Optional challenge - add a preregistration on test.osf.io
Take a look at sample preregistrations: https://osf.io/e6auq/wiki/Example%20Preregistrations/?view
Then try to make a preregistration, pretending you are one of those projects: https://help.osf.io/hc/en-us/articles/360019738834-Create-a-Preregistration
Metadata
Challenge - Add a README, metadata for your data, and a license to your folders
Write a short README and choose a license for a dataset you have.
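If you have several datasets, a small script keeps the README and its metadata consistent. This is a minimal Python sketch, assuming a plain-text README.txt plus a metadata.json sidecar file; the field names are hypothetical rather than any repository’s required schema. The license is recorded as an SPDX identifier such as CC-BY-4.0.

```python
from __future__ import annotations

import json
from datetime import date
from pathlib import Path


def write_readme(folder: Path, title: str, creators: list[str], description: str,
                 license_id: str, created: date) -> Path:
    """Write README.txt and a matching metadata.json into folder; return the README path."""
    metadata = {
        "title": title,
        "creators": creators,
        "description": description,
        "license": license_id,  # SPDX identifier, e.g. "CC-BY-4.0"
        "date_created": created.isoformat(),
    }
    lines = [
        title,
        "=" * len(title),
        "",
        description,
        "",
        "Creators: " + ", ".join(creators),
        f"Created: {metadata['date_created']}",
        f"License: {license_id}",
        "",
    ]
    readme = folder / "README.txt"
    readme.write_text("\n".join(lines), encoding="utf-8")
    (folder / "metadata.json").write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return readme
```

Repositories such as OSF will ask for much of this information again when you publish, so keeping it in one place saves retyping.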
Publish your data
Upload test data to test.osf.io
https://journals.sagepub.com/doi/full/10.1177/2515245918757689
FAIR links:
- https://fair-software.nl
- https://librarycarpentry.org/Top-10-FAIR//2018/12/01/research-software/
- https://librarycarpentry.org/Top-10-FAIR//2019/09/06/astronomy/
Key Points
Publishing your data is important; publish it with a clear license and good sharing information.