Exercises

Exercise 1

Suppose that you are given information about the amount of fertilizer used on several parts of a field, in the form of a factor:

fert <- factor( c(10, 20, 50, 30, 10, 20, 10, 45) )

Starting from the factor, how would you calculate the mean amount of fertilizer used?
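
As a hint, the sketch below uses a separate, hypothetical factor f (not the exercise data) to illustrate a common pitfall: converting a factor directly with as.numeric() returns the internal level codes rather than the original values.

```r
# Hypothetical factor, used only for illustration (not the exercise data)
f <- factor(c(100, 200, 100))

# Direct conversion returns the internal integer codes of the levels
codes <- as.numeric(f)

# Converting the labels to character first recovers the original numbers
values <- as.numeric(as.character(f))

print(codes)
print(values)
```

Comparing the two printed vectors shows why the conversion route matters before computing any summary statistic.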

Exercise 2

A

The table() function allows you to count the number of occurrences of each value in a vector; for example:

> set.seed(3)
> data <- sample(1:10, 10, replace=TRUE)

> table(data)
data
2 3 4 6 7 9
2 1 2 1 3 1

However, the table has ‘gaps’ in it. How could we obtain the same result, but including all numbers from 1 to 10, with a count of ‘0’ when the value does not appear in the vector?

Side question: the tabulate() function solves part of our question here, but it is not entirely satisfactory. Why?
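
The following sketch, on a small hypothetical vector x (not the exercise data), compares what table() and tabulate() return. It may help you spot the differences yourself; the variable names are illustrative only.

```r
# Hypothetical vector, used only for illustration
x <- c(2, 2, 5)

# table() only reports the values that actually occur
observed <- table(x)

# Declaring the full set of levels first keeps the empty categories
complete <- table(factor(x, levels = 1:5))

# tabulate() counts the integers 1..nbins
bare <- tabulate(x, nbins = 5)

print(observed)
print(complete)
print(bare)
```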

B

When plotting a boxplot, the order of the groups is generally alphabetical; for example, the boxplot created by the following commands has groups A, AB, ABC, B, BC, C, CA.

a <- runif(100)
groups <- sample( c("A", "AB", "ABC", "B", "BC", "C", "CA"), 100, replace=TRUE)
boxplot(a ~ groups)

How can you easily force a different order (e.g. A, B, C, AB, BC, CA, ABC)?
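
One possible approach, sketched here on hypothetical data (the names values and labels are not part of the exercise), relies on the fact that boxplot() follows the order of the factor levels:

```r
# Hypothetical data, used only for illustration
set.seed(42)
values <- runif(30)
labels <- rep(c("low", "mid", "high"), times = 10)

# By default, factor levels are sorted alphabetically
default_order <- levels(factor(labels))

# Giving the levels explicitly fixes the order used by boxplot()
ordered_labels <- factor(labels, levels = c("low", "mid", "high"))

print(default_order)
print(levels(ordered_labels))
boxplot(values ~ ordered_labels)
```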

C

When conducting the linear regression described below, all groups are compared to the "a" group (used as the intercept, because it is the first group in alphabetical order). How can we force the lm() function to choose group "ref" as the reference?

set.seed(1)
data <- runif(100)
groups <- rep( c("ref", "a", "b", "c"), each=25)
summary(lm(data~groups))
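
As a hint, here is a minimal sketch on hypothetical data (the variables y and treatment are invented for illustration). It assumes that the reference group used by lm() is the first level of the factor, and uses relevel() to change it; other approaches exist.

```r
# Hypothetical data, used only for illustration
set.seed(2)
y <- rnorm(30)
treatment <- factor(rep(c("drug1", "drug2", "placebo"), each = 10))

# relevel() moves the chosen level to the first position
treatment <- relevel(treatment, ref = "placebo")

# The remaining levels are now compared to "placebo"
fit <- lm(y ~ treatment)
print(names(coef(fit)))
```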

Exercise 3: lists

Here is an example of a list:

> mylist <- list(name="Fred", wife="Mary", no.children=3, child.ages=c(4,7,9))

  1. How do you extract the "name" component of the list?
  2. Look at mylist; what is the difference between mylist[1] and mylist[[1]]?
  3. How do you extract the age of Fred's second child?
  4. How can you change Fred's number of children?
  5. How can you add a new element, the family name (e.g. Smith in this case), to mylist?
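
The list operations involved can be sketched on a different, hypothetical list (car is invented for illustration and is not part of the exercise):

```r
# Hypothetical list, used only for illustration
car <- list(brand = "Volvo", doors = 5, service.years = c(2015, 2017))

single <- car[1]        # single brackets: a list of length 1
inner  <- car[[1]]      # double brackets: the element itself
by_name <- car$brand    # access by name, same as car[["brand"]]

second_service <- car$service.years[2]   # index inside a component

car$doors <- 3          # modify an existing component
car$colour <- "red"     # assigning to a new name adds a component

str(car)
```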

Exercise 4: functions

  1. Write a function that calculates (and returns) the standard deviation of a vector.
  2. Add a parameter (e.g. na.rm) that allows the user to specify whether to remove missing values prior to the calculation.
  3. Add a parameter (e.g. biased) that allows the user to select whether the calculated standard deviation should be biased or unbiased (cf. your course notes).
  4. Change the function so that it returns a list containing: 1) the standard deviation, 2) the value of the na.rm parameter.
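
To illustrate the mechanics (default arguments and returning a list) without giving away the exercise, here is a sketch of a hypothetical helper my_mean() that computes a mean instead of a standard deviation:

```r
# Hypothetical helper illustrating default arguments and list return values
my_mean <- function(x, na.rm = FALSE) {
  if (na.rm) {
    x <- x[!is.na(x)]   # drop missing values before computing
  }
  result <- sum(x) / length(x)
  # Return several pieces of information as a named list
  list(mean = result, na.rm = na.rm)
}

out <- my_mean(c(1, 2, NA, 3), na.rm = TRUE)
print(out$mean)
print(out$na.rm)
```

The same structure can be adapted to the standard deviation, with an extra argument controlling the divisor.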

Exercise 5: example of simulation

In R, you can calculate the power of a Student's t-test, or the sample size required to achieve a given power, using the power.t.test() function. However, this function only covers a simple case (both groups of equal size, same variance, etc.).
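
For reference, a typical call to power.t.test() looks like the sketch below; the numerical settings (20 observations per group, a difference of 1, a standard deviation of 1) are arbitrary values chosen for illustration.

```r
# Power for a given sample size per group (arbitrary example values)
res <- power.t.test(n = 20, delta = 1, sd = 1, sig.level = 0.05)
print(res$power)

# Sample size per group needed to reach a given power
n_needed <- power.t.test(power = 0.9, delta = 1, sd = 1, sig.level = 0.05)
print(n_needed$n)
```

Such a call can serve as a baseline against which to check a simulation in the equal-size, equal-variance case.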

Instead, perform simple simulations in order to estimate the power of a t-test under different circumstances.

Hint: you will likely need the functions rnorm() (to generate normally distributed random numbers), for() (to loop over several simulations), and t.test() (to apply t-tests and extract the p-value).
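
A minimal sketch of such a simulation is given below. All settings (group sizes 10 and 30, means 0 and 1, standard deviations 1 and 2, 1000 repetitions, a 5% significance level) are assumptions chosen for illustration; adapt them to the scenario you want to study.

```r
set.seed(123)                 # arbitrary seed, for reproducibility
n_sim <- 1000                 # number of simulated experiments
p_values <- numeric(n_sim)    # pre-allocate storage for the p-values

for (i in seq_len(n_sim)) {
  # Two groups with unequal sizes and unequal variances (assumed values)
  g1 <- rnorm(10, mean = 0, sd = 1)
  g2 <- rnorm(30, mean = 1, sd = 2)
  p_values[i] <- t.test(g1, g2)$p.value
}

# Estimated power: proportion of simulations rejecting H0 at the 5% level
power_estimate <- mean(p_values < 0.05)
print(power_estimate)
```

Note that t.test() applies the Welch correction by default (var.equal = FALSE); the estimate itself varies with the seed and the number of simulations.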



Exercise 6: a useful piece of code

Write a piece of R code that will do the following:

  • It will make a list of all the files available in a given subdirectory (e.g. "data"). Hint: use the list.files() command.
  • It will loop over all these files (hint: use a for loop), and for each of them it will:
    • read the file
    • do some processing (e.g. draw a plot, calculate something, or summarize the content)

As an example of files to process, you can use the data.zip file available on the Moodle website.

When this works, you can turn the code into a function, which will take one argument: the name of the directory to process.
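
The overall pattern can be sketched as follows. Since the format of the files in data.zip is not described here, the example builds its own temporary directory with two small CSV files; the directory name, file names and contents are all hypothetical.

```r
# Build a temporary directory with two small CSV files (hypothetical content)
demo_dir <- file.path(tempdir(), "demo_data")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(x = 1:3), file.path(demo_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(x = 4:8), file.path(demo_dir, "b.csv"), row.names = FALSE)

# List the files; full.names = TRUE keeps the directory in each path
files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

row_counts <- integer(length(files))
for (i in seq_along(files)) {
  content <- read.csv(files[i])          # read the file
  row_counts[i] <- nrow(content)         # some simple processing
  cat(basename(files[i]), ":", row_counts[i], "rows\n")
}
```

Wrapping the listing and the loop in a function that takes the directory as its argument is then a small step.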


Last modified: Friday, 15 November 2019, 8:54 AM