Suppose that you are given information about the amount of fertilizer used on several parts of a field, in the form of a factor:
fert <- factor( c(10, 20, 50, 30, 10, 20, 10, 45) )
Starting from the factor, how would you calculate the mean amount of fertilizer used?
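One possible approach, given as a sketch rather than the only solution: convert the factor back to its original values through its labels. The variable names amounts and mean_fert are illustrative.

```r
fert <- factor(c(10, 20, 50, 30, 10, 20, 10, 45))

# as.numeric(fert) alone would return the internal level codes (1, 2, 3, ...),
# not the fertilizer amounts, so we go through the character labels first.
amounts <- as.numeric(as.character(fert))

mean_fert <- mean(amounts)
mean_fert
```

An equivalent idiom is as.numeric(levels(fert))[fert], which converts each level only once.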
The table() function allows you to count the number of occurrences of each value in a vector; for example:
> set.seed(3)
> data <- sample(1:10, 10, replace=TRUE)
> table(data)
2 3 4 6 7 9
2 1 2 1 3 1
However, the table has ‘gaps’ in it. How could we obtain the same result, but including all numbers from 1 to 10, with a count of ‘0’ when the value does not appear in the vector?
Side question: the tabulate() function solves part of our question here, but it is not entirely satisfactory. Why?
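A minimal sketch of one way to fill the gaps, assuming the values are first turned into a factor whose levels cover the full range. The name counts is illustrative, and the exact numbers drawn by sample() depend on your R version, so no output is shown.

```r
set.seed(3)
data <- sample(1:10, 10, replace = TRUE)

# Declaring all ten levels up front makes table() report the values
# that never occur, with a count of 0.
counts <- table(factor(data, levels = 1:10))
counts
```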
When plotting a boxplot, the order of the groups is generally alphabetical;
for example, the boxplot created by the following commands has groups A,
AB, ABC, B, BC, C, CA.
a <- runif(100)
groups <- sample( c("A", "AB", "ABC", "B", "BC", "C", "CA"), 100, replace=TRUE)
boxplot(a ~ groups)
How can you easily force a different order (e.g. A, B, C, AB, BC, CA, ABC)?
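One way to do this, sketched under the assumption that the grouping variable is converted to a factor with explicitly ordered levels (groups_ordered is an illustrative name):

```r
a <- runif(100)
groups <- sample(c("A", "AB", "ABC", "B", "BC", "C", "CA"), 100, replace = TRUE)

# boxplot() draws the groups in the order of the factor levels,
# so we state that order explicitly instead of relying on the alphabet.
groups_ordered <- factor(groups, levels = c("A", "B", "C", "AB", "BC", "CA", "ABC"))

boxplot(a ~ groups_ordered)
```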
When conducting the linear regression described below, all groups are
compared to the "a" group (used as the intercept, because it is the
first group in alphabetical order). How can we force the lm function to
choose group "ref" as the reference?
data <- runif(100)
groups <- rep( c("ref", "a", "b", "c"), each=25)
lm(data ~ groups)
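A possible sketch, assuming the reference is set with relevel() on a factor before fitting. The names groups_f and fit are illustrative, and the model formula is the obvious one for these two variables.

```r
data <- runif(100)
groups <- rep(c("ref", "a", "b", "c"), each = 25)

# relevel() moves the chosen level to the first position;
# lm() uses the first level of a factor as the baseline.
groups_f <- relevel(factor(groups), ref = "ref")

fit <- lm(data ~ groups_f)
coef(fit)  # the intercept now corresponds to the "ref" group
```

The same effect can be obtained with factor(groups, levels = c("ref", "a", "b", "c")).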
Exercise 3: lists
Here is an example of a list:
> mylist <- list(name="Fred", wife="Mary", no.children=3, child.ages=c(4,7,9))
- How do you extract the "name" component of the list?
- Look at mylist; what is the difference between mylist and mylist[]?
- How do you extract the age of Fred's second child?
- How can you change the number of Fred's children?
- How can you add a new element, the family name (e.g. Smith in this case), to mylist?
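The extraction and modification steps above can be sketched as follows. The new value 4 and the component name family.name are purely illustrative choices, not part of the exercise.

```r
mylist <- list(name = "Fred", wife = "Mary", no.children = 3, child.ages = c(4, 7, 9))

# Extract a component by name ($ and [[ ]] are equivalent here)
fred_name <- mylist$name
same_name <- mylist[["name"]]

# Age of the second child: index into the vector stored in the list
second_age <- mylist$child.ages[2]

# Change an existing component (4 is an arbitrary example value)
mylist$no.children <- 4

# Add a new component simply by assigning to a new name
mylist$family.name <- "Smith"

str(mylist)
```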
Exercise 4: functions
- Write a function that calculates (and returns) the standard deviation of a vector
- Add a parameter (e.g. na.rm) that allows the user to specify whether to remove missing values prior to calculation
- Add a parameter (e.g. biased) that allows the user to select whether the calculated standard deviation should be biased or unbiased (cf. your course notes)
- Change the function so that it returns a list containing: 1) the standard deviation, 2) the value of the na.rm parameter
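A minimal sketch of the final version of such a function. It assumes that "biased" means dividing the sum of squares by n and "unbiased" means dividing by n - 1 (check this against your course notes); the name my_sd is illustrative.

```r
my_sd <- function(x, na.rm = FALSE, biased = FALSE) {
  # Optionally drop missing values before any calculation
  if (na.rm) {
    x <- x[!is.na(x)]
  }
  n <- length(x)

  # Assumption: biased divides by n, unbiased divides by n - 1
  denom <- if (biased) n else n - 1

  s <- sqrt(sum((x - mean(x))^2) / denom)

  # Return both the result and the na.rm setting that was used
  list(sd = s, na.rm = na.rm)
}

my_sd(c(1, 2, 3, 4))
my_sd(c(1, NA, 3), na.rm = TRUE)
```

With the default arguments the result can be checked against R's built-in sd().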
Exercise 5: example of simulation
In R, you can calculate the power of a Student's t-test, or the number of samples required to achieve a given power, using the power.t.test() function. However, this function only handles a simple case (both groups of equal size, same variance, etc.).
Instead, perform simple simulations in order to estimate the power of a t-test under different circumstances.
Hint: you will likely need the functions rnorm() (for generating random normally distributed numbers), for() (for looping over several simulations), and t.test() (for applying t-tests and extracting the p-value).
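One way such a simulation could look, as a sketch only. The function name simulate_power, its parameters, and the example settings (group sizes 20 and 40, standard deviations 1 and 2, mean difference 1) are all illustrative assumptions. Note that t.test() applies the Welch correction by default.

```r
simulate_power <- function(n1, n2, delta, sd1 = 1, sd2 = 1,
                           alpha = 0.05, n_sim = 1000) {
  p_values <- numeric(n_sim)
  for (i in seq_len(n_sim)) {
    # Draw one sample per group; the true difference in means is delta
    x <- rnorm(n1, mean = 0, sd = sd1)
    y <- rnorm(n2, mean = delta, sd = sd2)
    p_values[i] <- t.test(x, y)$p.value
  }
  # Estimated power = proportion of simulations that reject at level alpha
  mean(p_values < alpha)
}

set.seed(1)
est <- simulate_power(n1 = 20, n2 = 40, delta = 1, sd1 = 1, sd2 = 2)
est
```

Increasing n_sim reduces the noise in the estimate, at the cost of run time.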
Exercise 6: a useful piece of code
Write a piece of R code that will do the following:
- It will make a list of all the files available in a given subdirectory (e.g. "data"). Hint: use the list.files() command.
- It will loop over all these files (hint: use a for loop), and for each of them it will:
  - read the file
  - do some processing (e.g. draw a plot, calculate something, or summarize the content)
As an example of files to process, you can use the data.zip file available on the Moodle website.
When this works, you can turn the code into a function, which will take one argument: the name of the directory to process.
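A sketch of what the final function might look like. The file format is not specified here, so the code assumes whitespace-separated text files with a header line, readable by read.table(); adapt the reading step to the actual files. The name process_directory is illustrative.

```r
process_directory <- function(dir) {
  # full.names = TRUE returns paths that can be passed directly to read.table()
  files <- list.files(dir, full.names = TRUE)

  results <- list()
  for (f in files) {
    # Assumption: each file is a whitespace-separated table with a header row
    content <- read.table(f, header = TRUE)

    # Example processing step: store a summary of each file under its name
    results[[basename(f)]] <- summary(content)
  }
  results
}

# Example call, assuming a "data" subdirectory exists in the working directory:
# results <- process_directory("data")
```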