Visualizing a dataset

The file students.csv contains information collected from biology students at the University of Lausanne. You can load the file in R using the command:

students <- read.csv("students.csv")

You can view the content of the dataset using the command


Look at the different variables, and think about how you would visualize each of them separately.

As a quick reminder about R commands:

  • you can select a variable by using the "$" operator:   students$height
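One possible starting point, sketched in base R: check the structure of the data frame, then use a histogram or boxplot for a numeric variable and a bar plot of counts for a categorical one. The small data frame below is an invented stand-in for students.csv. The column names weight and sex, the units and all values are assumptions, so check names(students) on the real file first.

```r
# Invented stand-in for students.csv (values and the columns weight/sex are
# assumptions); with the real file, use students <- read.csv("students.csv")
students <- data.frame(
  height = c(165, 178, 172, 160, 185, 169, 174, 181),
  weight = c(58, 75, 68, 52, 82, 61, 70, 79),
  sex    = c("F", "M", "M", "F", "M", "F", "F", "M")
)

str(students)      # name and type of each variable
summary(students)  # ranges and quartiles of the numeric variables

# Numeric variable: show its distribution (height assumed to be in cm)
hist(students$height, main = "Height", xlab = "Height (cm)")
boxplot(students$height, main = "Height", ylab = "Height (cm)")

# Categorical variable: count how many students fall in each level
sex_counts <- table(students$sex)
barplot(sex_counts, main = "Sex", ylab = "Number of students")
```

The same idea applies to the other columns: numeric variables get a distribution plot, categorical variables get a count per level.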

Then, think about how you would best display each variable so that you can see whether it is distributed differently between males and females.

In particular:

  • draw a scatterplot of the students' height vs weight, specifying a title and using different colours according to sex.
  • plot two histograms comparing the distribution of height for the two sexes. Which issues do you have to solve?
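A possible base R sketch of both tasks, again on an invented stand-in for students.csv (column names weight and sex, the "F"/"M" coding and the units are assumptions). For the histograms, it addresses one likely answer to the question above: two separate calls to hist() choose their own bins and axis ranges, so the panels are only comparable if they share the same breaks and y-axis limits.

```r
# Invented stand-in for students.csv (values, column names and coding assumed)
students <- data.frame(
  height = c(165, 178, 172, 160, 185, 169, 174, 181),
  weight = c(58, 75, 68, 52, 82, 61, 70, 79),
  sex    = factor(c("F", "M", "M", "F", "M", "F", "F", "M"))
)

# Scatterplot: one colour per level of sex, plus a title and a legend
cols <- c(F = "tomato", M = "steelblue")
plot(students$weight, students$height,
     col = cols[as.character(students$sex)], pch = 19,
     xlab = "Weight (kg)", ylab = "Height (cm)",
     main = "Height vs weight of students")
legend("topleft", legend = names(cols), col = cols, pch = 19)

# Histograms: common breaks covering all heights, and a common y-axis limit
breaks <- pretty(range(students$height), n = 10)
height_f <- students$height[students$sex == "F"]
height_m <- students$height[students$sex == "M"]
ylim <- c(0, max(hist(height_f, breaks = breaks, plot = FALSE)$counts,
                 hist(height_m, breaks = breaks, plot = FALSE)$counts))

op <- par(mfrow = c(2, 1))  # stack the two histograms vertically
hist(height_f, breaks = breaks, ylim = ylim, col = cols["F"],
     main = "Females", xlab = "Height (cm)")
hist(height_m, breaks = breaks, ylim = ylim, col = cols["M"],
     main = "Males", xlab = "Height (cm)")
par(op)  # restore the previous layout
```

Stacking the panels vertically lines up the height axes, which makes a shift between the two distributions easier to see than side-by-side panels.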

You can try this with both base R and ggplot2.
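For comparison, a ggplot2 sketch of the same two plots, using the same kind of invented stand-in data (column names and values are assumptions). Here the colour legend is added automatically, and facet_wrap() keeps the x and y scales identical across panels by default, which takes care of the comparability issue.

```r
library(ggplot2)

# Invented stand-in for students.csv (values, column names and coding assumed)
students <- data.frame(
  height = c(165, 178, 172, 160, 185, 169, 174, 181),
  weight = c(58, 75, 68, 52, 82, 61, 70, 79),
  sex    = factor(c("F", "M", "M", "F", "M", "F", "F", "M"))
)

# Scatterplot: colour is mapped to sex, so ggplot2 builds the legend itself
p_scatter <- ggplot(students, aes(x = weight, y = height, colour = sex)) +
  geom_point() +
  labs(title = "Height vs weight of students",
       x = "Weight (kg)", y = "Height (cm)")

# Histograms: one panel per sex, stacked, with fixed (shared) scales
p_hist <- ggplot(students, aes(x = height, fill = sex)) +
  geom_histogram(binwidth = 5, boundary = 160) +
  facet_wrap(~ sex, ncol = 1) +
  labs(x = "Height (cm)", y = "Count")

print(p_scatter)
print(p_hist)
```

The bin width of 5 is an arbitrary choice for this toy data; on the real file it is worth trying a few values.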

Survey on graphics

The file quiz.csv contains the average scores that you gave for the "utility" and "aesthetic" of the graphs that were shown to you. You can load it in R using the command:

quiz <- read.csv("quiz.csv")

How would you visualize this data?
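One possible answer, sketched in base R: since each graph has two average scores, each graph can be drawn as one point with utility on the x axis and aesthetic on the y axis. The layout of quiz.csv is not described here, so the columns graph, utility and aesthetic and all values below are invented for illustration.

```r
# Invented stand-in for quiz.csv: one row per graph (column names assumed)
quiz <- data.frame(
  graph     = paste0("graph", 1:6),
  utility   = c(3.2, 4.1, 2.5, 3.8, 1.9, 4.5),
  aesthetic = c(2.8, 4.4, 3.9, 2.1, 2.0, 4.0)
)

# Each graph is one point, labelled with its name
plot(quiz$utility, quiz$aesthetic, pch = 19,
     xlab = "Average utility score", ylab = "Average aesthetic score",
     main = "Survey on graphics")
text(quiz$utility, quiz$aesthetic, labels = quiz$graph, pos = 3, cex = 0.8)

# Diagonal y = x (only meaningful if both scores use the same scale):
# graphs above it were rated higher for aesthetics than for utility
abline(a = 0, b = 1, lty = 2)

# A single number summarizing how the two scores go together
r <- cor(quiz$utility, quiz$aesthetic)
```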

Note: if you want to use ggplot2 and split the data according to the type of score (utility vs aesthetic, for example), you will need to convert the data to the "long" format (only one value per row, with the "type of score" as a separate variable). You can do this with the melt() function from the reshape2 package:

library(reshape2)
quiz_long <- melt(quiz)
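A sketch of the conversion followed by a ggplot2 bar chart with one bar per score type, on invented data (the column names graph, utility and aesthetic are assumptions). Naming id.vars explicitly avoids relying on melt() guessing which columns are identifiers; by default the new columns would be called "variable" and "value", and here they are renamed with variable.name and value.name.

```r
library(reshape2)
library(ggplot2)

# Invented stand-in for quiz.csv (column names and values assumed)
quiz <- data.frame(
  graph     = paste0("graph", 1:4),
  utility   = c(3.2, 4.1, 2.5, 3.8),
  aesthetic = c(2.8, 4.4, 3.9, 2.1)
)

# Wide -> long: 'graph' stays an identifier, the two score columns are stacked
quiz_long <- melt(quiz, id.vars = "graph",
                  variable.name = "score_type", value.name = "score")

# Side-by-side bars: one pair of bars per graph, coloured by score type
p <- ggplot(quiz_long, aes(x = graph, y = score, fill = score_type)) +
  geom_col(position = "dodge") +
  labs(x = NULL, y = "Average score", fill = "Score type")
print(p)
```

Replacing fill = score_type with facet_wrap(~ score_type) would give one panel per score type instead.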

Additional question: on the Moodle website, you will also find a file with the results of the same survey, answered by a different group of people. How would you combine the information from these two files to show how differently the two groups of participants answered the survey?
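One way to do this, sketched under the assumption that both files have the same columns: add a column recording which group each row comes from, stack the two tables with rbind(), convert to long format, and draw one line per graph joining the two groups. The data frames, their values and the group labels below are hypothetical; the name of the second file is not given here.

```r
library(reshape2)
library(ggplot2)

# Hypothetical versions of the two survey files: same columns, invented values
quiz_ours  <- data.frame(graph = paste0("graph", 1:4),
                         utility = c(3.2, 4.1, 2.5, 3.8),
                         aesthetic = c(2.8, 4.4, 3.9, 2.1))
quiz_other <- data.frame(graph = paste0("graph", 1:4),
                         utility = c(2.9, 3.6, 3.1, 4.0),
                         aesthetic = c(3.0, 4.1, 3.2, 2.7))

# Tag each table with its group, then stack the rows
quiz_ours$group  <- "our class"
quiz_other$group <- "other participants"
quiz_all <- rbind(quiz_ours, quiz_other)

# Long format: one score per row, identified by graph and group
quiz_all_long <- melt(quiz_all, id.vars = c("graph", "group"),
                      variable.name = "score_type", value.name = "score")

# One panel per score type; each line joins the two groups' answers for a graph
p <- ggplot(quiz_all_long, aes(x = group, y = score, group = graph)) +
  geom_line(colour = "grey60") +
  geom_point() +
  facet_wrap(~ score_type) +
  labs(x = NULL, y = "Average score")
print(p)
```

Steep lines point to graphs on which the two groups disagree. If the files identify graphs the same way, merge(quiz_ours, quiz_other, by = "graph") is an alternative that puts both groups' scores side by side in a wide table.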

Timecourse experiment

The timecourse.csv file (which can be read in the same way as the previous files) contains measurements from 10 animals (5 wild-type, 5 knock-out), each taken at 5 time points.

How would you represent the data?
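If the file is in "wide" format (one row per animal, one column per time point), base R's matplot() can draw one line per animal directly, without reshaping. The layout assumed below (columns animal, genotype, t1 to t5) is hypothetical, and the values are random numbers used only to make the sketch run.

```r
# Invented stand-in for timecourse.csv, assuming one row per animal with a
# genotype column and one column per time point (t1..t5); values are random
set.seed(1)
measurements <- matrix(round(rnorm(10 * 5, mean = 10, sd = 2), 1), nrow = 10,
                       dimnames = list(NULL, paste0("t", 1:5)))
timecourse <- data.frame(animal   = paste0("animal", 1:10),
                         genotype = rep(c("WT", "KO"), each = 5),
                         measurements)

# matplot() draws one line per column, so transpose:
# rows = time points, columns = animals
values <- t(as.matrix(timecourse[, paste0("t", 1:5)]))
line_cols <- ifelse(timecourse$genotype == "WT", "steelblue", "tomato")
matplot(1:5, values, type = "b", lty = 1, pch = 19, col = line_cols,
        xlab = "Time point", ylab = "Measurement", main = "One line per animal")
legend("topleft", legend = c("WT", "KO"), col = c("steelblue", "tomato"),
       lty = 1, pch = 19)
```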

How would you represent the data if we are interested in seeing both the individual data points and the average of each group?

Note: you will probably need to convert the data to the "long" format, as described above.
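A ggplot2 sketch combining both levels of detail: thin, semi-transparent lines for individual animals and a thick line for the mean of each genotype, computed by stat_summary(). It assumes a hypothetical wide layout (columns animal, genotype, t1 to t5) filled with random values; on the real file the id columns and time labels may differ.

```r
library(reshape2)
library(ggplot2)

# Invented stand-in for timecourse.csv (layout assumed, values random)
set.seed(1)
measurements <- matrix(round(rnorm(10 * 5, mean = 10, sd = 2), 1), nrow = 10,
                       dimnames = list(NULL, paste0("t", 1:5)))
timecourse <- data.frame(animal   = paste0("animal", 1:10),
                         genotype = rep(c("WT", "KO"), each = 5),
                         measurements)

# Wide -> long: one row per animal and time point
tc_long <- melt(timecourse, id.vars = c("animal", "genotype"),
                variable.name = "time", value.name = "value")
tc_long$time <- as.integer(sub("t", "", tc_long$time))  # "t3" -> 3

p <- ggplot(tc_long, aes(x = time, y = value, colour = genotype)) +
  # thin lines and points: individual animals
  geom_line(aes(group = animal), alpha = 0.4) +
  geom_point(alpha = 0.4) +
  # thick line: mean of each genotype at each time point
  stat_summary(aes(group = genotype), fun = mean, geom = "line",
               linewidth = 1.5) +
  labs(x = "Time point", y = "Measurement", colour = "Genotype")
print(p)
```

Converting time to a number keeps the x axis ordered and evenly spaced; if the real time points are not equally spaced, using their actual values on the x axis would be more faithful.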

Country data

The gapminder package contains a data frame, also called gapminder, with information about:

  • the income (GDP per capita) of different countries;
  • the life expectancy in these countries;
  • the population of these countries;
  • the region (continent) each country belongs to (five continents in this package).
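Before plotting, it helps to map the items above to the actual columns: gdpPercap (GDP per capita, used as the income measure), lifeExp, pop and continent, plus country and year. A short sketch for inspecting them and keeping only one year:

```r
library(gapminder)

str(gapminder)                # country, continent, year, lifeExp, pop, gdpPercap
levels(gapminder$continent)   # the regions used by the package
sort(unique(gapminder$year))  # measurement years (every 5 years)

# Keep only the most recent year in the data
gap2007 <- gapminder[gapminder$year == 2007, ]
```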

How would you plot:

  • the relationship between income and life expectancy for a recent year (e.g. 2007), also indicating the population and the region of each country;
  • the changes in income and life expectancy since the beginning of the measurements (1952 in this dataset)?
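A possible ggplot2 sketch for both questions. For 2007, a "bubble chart" maps income to x, life expectancy to y, population to point area and continent to colour; income is shown on a log scale because GDP per capita spans several orders of magnitude. For the changes over time, one option (among others) is to draw each country's path through the income/life expectancy plane, with one panel per continent.

```r
library(gapminder)
library(ggplot2)

gap2007 <- gapminder[gapminder$year == 2007, ]

# 1) One year: x = income (log scale), y = life expectancy,
#    point area = population, colour = continent
p2007 <- ggplot(gap2007, aes(x = gdpPercap, y = lifeExp,
                             size = pop, colour = continent)) +
  geom_point(alpha = 0.6) +
  scale_x_log10() +
  scale_size_area(max_size = 15) +
  labs(title = "Income and life expectancy in 2007",
       x = "GDP per capita (log scale)", y = "Life expectancy (years)",
       size = "Population", colour = "Continent")
print(p2007)

# 2) Changes over time: geom_path() joins points in row order,
#    so sort by country and year first
gap_sorted <- gapminder[order(gapminder$country, gapminder$year), ]
p_time <- ggplot(gap_sorted, aes(x = gdpPercap, y = lifeExp, group = country)) +
  geom_path(alpha = 0.3) +
  geom_point(data = gap2007, size = 1) +  # dot marks the 2007 end of each path
  scale_x_log10() +
  facet_wrap(~ continent) +
  labs(title = "Country trajectories, 1952 to 2007",
       x = "GDP per capita (log scale)", y = "Life expectancy (years)")
print(p_time)
```

A simpler alternative for the second question is to plot lifeExp and gdpPercap separately against year, with one line per country.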

Last modified: Friday, 3 November 2023, 3:59 PM