Exercises
Visualizing a dataset
The file students.csv contains information collected from biology students at the University of Lausanne. You can load the file in R using the command:
students <- read.csv("students.csv")
You can view the content of the dataset using the command:
View(students)
Look at the different variables, and think about how you would visualize each of them separately.
As a quick reminder about R commands:
- you can select a variable by using the "$" operator: students$height
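As a small illustration of the "$" operator, here is a sketch on a made-up data frame. The toy object and its values are invented for this example, and the sex column name is an assumption; only height appears in the text above.

```r
# Hypothetical toy data standing in for students.csv (values are invented)
toy <- data.frame(
  height = c(165, 180, 172, 158),
  sex    = c("F", "M", "M", "F")
)

# "$" extracts one column as a vector
heights <- toy$height
mean_height <- mean(heights)

# A logical condition on one column can be used to subset another
female_heights <- toy$height[toy$sex == "F"]
```

The same pattern should work on the real dataset once you have checked its actual column names, for example with names(students).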
Then think about how best to display each variable so that you can see whether it is distributed differently between males and females.
In particular:
- draw a scatterplot of students' height vs weight, specifying a title and using different colours according to sex.
- plot two histograms comparing the distribution of heights for both sexes. Which issues do you have to solve?
You can try this with both base R and ggplot2.
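If you are unsure where to start, the following base R sketch shows one possible approach on simulated data. The column names weight and sex and all the values are assumptions made for this example, so adapt them to what you actually find in students.csv. For the histograms, it uses the same bins and the same y-axis for both sexes, with semi-transparent colours so that the overlap remains visible.

```r
# Simulated stand-in for students.csv; the weight and sex column names
# are assumptions (only height appears in the exercise text)
set.seed(1)
toy <- data.frame(
  sex    = rep(c("F", "M"), each = 50),
  height = c(rnorm(50, mean = 166, sd = 6), rnorm(50, mean = 179, sd = 7)),
  weight = c(rnorm(50, mean = 60, sd = 7), rnorm(50, mean = 75, sd = 9))
)

# One colour per sex, looked up by name
sex_cols <- c(F = "darkorange", M = "steelblue")

# Scatterplot with a title and one colour per sex
plot(toy$height, toy$weight,
     col = sex_cols[as.character(toy$sex)], pch = 19,
     xlab = "Height", ylab = "Weight",
     main = "Height vs weight (simulated data)")
legend("topleft", legend = names(sex_cols), col = sex_cols, pch = 19)

# Histograms: compute both with identical breaks so the bars are comparable
breaks <- pretty(range(toy$height), n = 15)
h_f <- hist(toy$height[toy$sex == "F"], breaks = breaks, plot = FALSE)
h_m <- hist(toy$height[toy$sex == "M"], breaks = breaks, plot = FALSE)

# Common y-axis limit so that neither histogram is cut off
ymax <- max(h_f$counts, h_m$counts)

# Overlay the two histograms with semi-transparent fills
plot(h_f, col = adjustcolor(sex_cols["F"], alpha.f = 0.5),
     ylim = c(0, ymax), xlab = "Height",
     main = "Height by sex (simulated data)")
plot(h_m, col = adjustcolor(sex_cols["M"], alpha.f = 0.5), add = TRUE)
legend("topright", legend = names(sex_cols),
       fill = adjustcolor(sex_cols, alpha.f = 0.5))
```

Overlaying is only one option. Drawing the two histograms in separate panels with a shared x-axis is another, and may be easier to read when the distributions overlap strongly.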
Survey on graphics
The file quiz.csv contains the average scores that you gave for the "utility" and "aesthetics" of the graphs that were shown to you. You can load it in R using the command:
quiz <- read.csv("quiz.csv")
How would you visualize this data?
Note: if you want to use ggplot2 and split the data according to the type of score (utility vs aesthetics, for example), you will need to convert the data to the "long" format (only one value per row, with "type of score" as a separate variable). You can do this with the melt() command from the reshape2 package:
library(reshape2)
quiz_long <- melt(quiz)
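To see what the conversion does, here is a sketch on a small made-up table. The column names graph, utility and aesthetic and the scores are invented for this example; the real columns of quiz.csv may differ.

```r
library(reshape2)

# Hypothetical wide table: one row per graph, one column per type of score
toy_quiz <- data.frame(
  graph     = c("A", "B", "C"),
  utility   = c(3.2, 4.1, 2.5),
  aesthetic = c(4.0, 2.8, 3.6)
)

# Long format: one row per (graph, type of score) combination.
# id.vars names the column(s) that identify a row and are kept as-is.
toy_long <- melt(toy_quiz, id.vars = "graph",
                 variable.name = "score_type", value.name = "score")

print(toy_long)
```

The three-row wide table becomes a six-row long table with columns graph, score_type and score. When id.vars is not given, melt() chooses the identifier columns itself and prints a message saying which ones it used, so it is worth checking that the choice matches what you intended.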
Additional question. On the Moodle website, you will also find a file with the results of the same survey obtained from a different group of people. How would you combine the information from these two files in order to show how differently the two groups of participants answered this survey?
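One possible way to combine two such tables, sketched here on invented data: add a column that records which group each row comes from, then stack the tables with rbind(). The object names, column names and scores below are all hypothetical, and the sketch assumes that both files have the same columns.

```r
# Two hypothetical result tables with identical columns
class_a <- data.frame(graph = c("A", "B"),
                      utility = c(3.2, 4.1), aesthetic = c(4.0, 2.8))
class_b <- data.frame(graph = c("A", "B"),
                      utility = c(2.9, 4.4), aesthetic = c(3.5, 3.1))

# Record the origin of each row before stacking
class_a$group <- "this class"
class_b$group <- "other group"
both <- rbind(class_a, class_b)

# Matrix of utility scores: one row per group, one column per graph
utility_by_group <- tapply(both$utility, list(both$group, both$graph), mean)

# Side-by-side bars make the two groups directly comparable per graph
barplot(utility_by_group, beside = TRUE, legend.text = TRUE,
        xlab = "Graph", ylab = "Average utility score",
        main = "Utility scores by group (invented data)")
```

With the group column in place, the combined table can also be converted to the long format and passed to ggplot2, using group as a colour or facet variable.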
Timecourse experiment
The timecourse.csv file (which can be read in the same way as the previous files) contains information about 10 animals (5 wild-type, 5 knock-out), taken over 5 time points.
How would you represent the data?
How would you represent the data if we are interested in seeing both the individual data points and the average per group?
Note: you will probably need to convert the data to the "long" format, as described above.
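As a starting point, the sketch below draws individual measurements and group averages on the same plot in base R. It uses simulated data that is already in the long format; the column names animal, genotype, time and value are assumptions, and the actual layout of timecourse.csv may be different.

```r
# Simulated long-format data: 10 animals x 5 time points (invented values)
set.seed(42)
toy_tc <- expand.grid(time = 1:5, animal = 1:10)
toy_tc$genotype <- ifelse(toy_tc$animal <= 5, "WT", "KO")
toy_tc$value <- rnorm(nrow(toy_tc),
                      mean = ifelse(toy_tc$genotype == "WT", 10, 8) + toy_tc$time,
                      sd = 1)

# Average per genotype and time point
group_means <- aggregate(value ~ genotype + time, data = toy_tc, FUN = mean)

geno_cols <- c(WT = "steelblue", KO = "darkorange")

# Individual measurements as semi-transparent points
plot(toy_tc$time, toy_tc$value,
     col = adjustcolor(geno_cols[toy_tc$genotype], alpha.f = 0.5), pch = 19,
     xlab = "Time point", ylab = "Measurement",
     main = "Individual values and group means (simulated data)")

# Group averages as thick lines drawn on top of the points
for (g in names(geno_cols)) {
  m <- group_means[group_means$genotype == g, ]
  m <- m[order(m$time), ]
  lines(m$time, m$value, col = geno_cols[g], lwd = 3)
}
legend("topleft", legend = names(geno_cols), col = geno_cols, lwd = 3)
```

Connecting the points of each animal with a thin line is a natural extension if you also want to follow individual trajectories over time.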