This lesson is being piloted (Beta version)

R for Research

Data Carpentry’s aim is to teach researchers basic concepts, skills, and tools for working with data so that they can get more done in less time, and with less pain. The lessons below were designed for those interested in working with social sciences data in R.

This is an introduction to R designed for participants with no programming experience. These lessons can be taught in a day (~6 hours). They start with some basic information about R syntax and the RStudio interface, then move through how to import CSV files, the structure of data frames, how to deal with factors, how to add/remove rows and columns, how to calculate summary statistics from a data frame, and a brief introduction to plotting.

Getting Started

Data Carpentry’s teaching is hands-on, so participants are encouraged to use their own computers to ensure the proper setup of tools for an efficient workflow.

These lessons assume no prior knowledge of the skills or tools.

To get started, follow the directions in the “Setup” tab to download data to your computer and follow any installation instructions.


This lesson requires a working copy of R and RStudio.
To use these materials most effectively, please install everything before working through this lesson.

For Instructors

If you are teaching this lesson in a workshop, please see the Instructor notes.


Setup
Download files required for the lesson

00:00 1. R Project Setup
How to find your way around RStudio?
How to set up a version-controlled project?
How to interact with R?

00:30 2. Research Project Organisation
How to organise project folders?
Where and how to save different types of files?
Should I version control my data?

00:45 3. Introduction to Spreadsheets
What are basic principles for using spreadsheets for good data organization?

01:03 4. Formatting data tables in Spreadsheets
What are some common challenges with formatting data in spreadsheets and how can we avoid them?

01:33 5. Formatting problems
What are some common challenges with formatting data in spreadsheets and how can we avoid them?

01:53 6. Dates as data
What are good approaches for handling dates in spreadsheets?

02:13 7. Quality assurance
How can we carry out basic quality assurance in spreadsheets?

02:38 8. Exporting data
How can we export data from spreadsheets in a way that is useful for downstream applications?

02:53 9. Introduction to OpenRefine
What is OpenRefine useful for?

03:03 10. Working with OpenRefine
How can we bring our data into OpenRefine?
How can we sort and summarize our data?
How can we find and correct errors in our raw data?

03:38 11. Filtering and Sorting with OpenRefine
How can we select only a subset of our data to work with?
How can we sort our data?

03:58 12. Examining Numbers in OpenRefine
How can we convert a column from one data type to another?
How can we visualize relationships among columns?

04:18 13. Using scripts
How can we document the data-cleaning steps we’ve applied to our data?
How can we apply these steps to additional data sets?

04:38 14. Exporting and Saving Data from OpenRefine
How can we save and export our cleaned data from OpenRefine?

04:53 15. Other Resources in OpenRefine
What other resources are available for working with OpenRefine?

05:03 16. Hello World - interacting with R
How to interact with R?
How to install packages?

05:43 17. Starting with Coding
What data types are available in R?
What is an object?
How can values be initially assigned to variables of different data types?
What arithmetic and logical operators can be used?
How can subsets be extracted from vectors and data frames?
How does R treat missing values?
How can we deal with missing values in R?

07:03 18. Starting with Data
What is a data.frame?
How can I read a complete CSV file into R?
How can I get basic summary information about my dataset?
How can I change the way R treats strings in my dataset?
Why would I want strings to be treated differently?
How are dates represented in R and how can I change the format?

08:23 19. Introducing dplyr and tidyr
How can I select specific rows and/or columns from a data frame?
How can I combine multiple commands into a single command?
How can I create new columns or remove existing columns from a data frame?
How can I reformat a data frame to meet my needs?

09:43 20. Data visualisation with ggplot2
What are the components of a ggplot?
How do I create scatterplots, boxplots, and barplots?
How can I change the aesthetics (e.g. colour, transparency) of my plot?
How can I create multiple plots at once?

11:38 Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.