Introduction
Overview
Teaching: 30 min
Exercises: 0 min
Questions
How do you prepare and investigate your data?
How do you visualise your data?
How do you choose what test/model to use?
How do you check that your model performed correctly?
How do you interpret the results of your model?
Objectives
Identify different data types
Recognise different types of visualisations
What is statistics?
Statistics is the science of collecting, organizing, and interpreting numerical facts, which we call data
Moore & McCabe (1993)
Statistics is a theory of information with inference making as its objective
Mendenhall, Schaeffer & Wackerly (1986)
Statistics concerns the use of data to obtain information about real-life situations and problems
Griffiths, Stirling & Weldon (1998)
The word “statistics” covers several different things. It can refer to simple summary numbers, such as a count of objects or the mean of different groups. It can also refer to complex visualisation and modelling.
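As a minimal sketch of such simple summary numbers in R, the example below counts the observations in each group and computes a mean per group. The data frame `plants` and its columns `group` and `height` are made up for illustration and do not come from the lesson data.

```r
# Hypothetical data: heights (cm) of plants in two treatment groups
plants <- data.frame(
  group  = c("control", "control", "treated", "treated", "treated"),
  height = c(10, 12, 15, 14, 16)
)

# A count of objects in each group
counts <- table(plants$group)

# The mean height of each group
group_means <- tapply(plants$height, plants$group, mean)

print(counts)
print(group_means)
```

Here `table()` gives the counts and `tapply()` applies `mean()` separately within each group; other approaches would work equally well.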
Statistics should be involved at all stages of a research project, starting from when you first approach your research problem. Before you start performing experiments, there are several stages to consider:
- Design of experiments
- Summarise data
- Visualise data
- Perform tests / models
- Check the model
- Interpret the model
- Data is often collected and organised in other software, e.g. Excel
Types of variables
Quantitative Variable
Any variable which takes numerical values. These values can be discrete (taking only certain values; e.g. number of children) or continuous (taking any value within a specific range; e.g. adult heights).
In R, quantitative variables usually have the numeric class.
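A small sketch of how this looks in practice, using made-up values (the vector names `heights`, `n_children` and `n_children_int` are illustrative only):

```r
# Hypothetical example values
heights    <- c(1.62, 1.75, 1.81)  # continuous: adult heights in metres
n_children <- c(0, 2, 1)           # discrete: number of children

class(heights)      # "numeric"
class(n_children)   # "numeric": R stores plain numbers as doubles by default

# Whole numbers can also be stored explicitly as integers (note the L suffix)
n_children_int <- c(0L, 2L, 1L)
class(n_children_int)       # "integer"
is.numeric(n_children_int)  # TRUE: integers still count as numeric data
```

Note that the class alone does not tell you whether a variable is discrete or continuous; that depends on what was measured.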
Qualitative Variable
A variable which is classified into one of several categories. Qualitative variables are also known as categorical variables.
If there is no ordering of the categories, we talk of having a nominal variable and data collected on the variable as being nominal data. If the categories are ordered, we talk of having an ordinal variable and any data collected on this variable as being ordinal data.
Qualitative variables can be of several different classes in R; however, it can be useful to convert them to a factor. Ordinal variables can use the special type of factor, ordered.
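The following sketch shows one way to create both kinds of factor, using made-up values based on categories mentioned in this lesson. The vector names are illustrative, and the level order for grades is written out by hand as an assumption about how the categories rank.

```r
# Nominal variable: categories with no ordering (hypothetical responses)
smoking <- c("Never smoked", "Currently smokes", "Previously smoked", "Never smoked")
smoking_f <- factor(smoking)
class(smoking_f)   # "factor"
levels(smoking_f)  # levels are sorted alphabetically by default

# Ordinal variable: supply the levels from lowest to highest and set ordered = TRUE
grade <- c("Pass", "Fail", "Credit", "Distinction")
grade_o <- factor(
  grade,
  levels  = c("Fail", "Pass", "Credit", "Distinction", "High Distinction"),
  ordered = TRUE
)
class(grade_o)     # "ordered" "factor"

# Ordered factors support comparisons between categories
grade_o < "Credit" # TRUE TRUE FALSE FALSE
```

Specifying `levels` explicitly matters for ordinal data, because the default alphabetical order rarely matches the intended ranking.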
What types of variables are these?
- Weight
- Smoking status (Never smoked, Previously smoked, Currently smokes)
- Grade (percentage)
- Grade (Fail, Pass, Credit, Distinction, High Distinction)
- Agreement with a statement (Strongly agree, Agree, Neither agree nor disagree, Disagree, Strongly disagree)
Solution
- Weight: quantitative, continuous
- Smoking status: qualitative, nominal
- Grade (percentage): quantitative, continuous
- Grade (category): qualitative, ordinal
- Agreement: qualitative, ordinal
Key Points
Data should have 1 observation per row.