We can extract statistical summary information from our data set using the summary function. This function gives measures such as the mean and median for each column in the data set.
summary(women)
     height         weight
 Min.   :58.0   Min.   :115.0
 1st Qu.:61.5   1st Qu.:124.5
 Median :65.0   Median :135.0
 Mean   :65.0   Mean   :136.7
 3rd Qu.:68.5   3rd Qu.:148.0
 Max.   :72.0   Max.   :164.0
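The individual statistics reported by summary can also be computed one at a time. As a small sketch, using base R functions on the built-in women data set:

```r
# Summarise a single column rather than the whole data set
summary(women$height)

# Compute individual statistics directly
mean(women$height)      # 65
median(women$weight)    # 135
range(women$weight)     # 115 164
```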
Let’s say we want to look at sections of our data. More specifically, we want to look at the heights of the women who weighed more than the average.
We can extract a column by name using the $ symbol, and we use the which function to find the row numbers for which the weight is more than the average, 136.7.
which(women$weight > 136.7)
[1] 9 10 11 12 13 14 15
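It may help to see what which is working on. The comparison by itself returns a logical vector with one TRUE or FALSE per row, and which reports the positions of the TRUE values. A brief sketch (the variable name is_heavier is ours, chosen only for illustration):

```r
# The comparison yields one logical value per row
is_heavier <- women$weight > 136.7
is_heavier

# which() returns the positions of the TRUE values
which(is_heavier)

# sum() counts the TRUE values, i.e. how many rows match
sum(is_heavier)    # 7
```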
We can improve this code by using the R function mean to compute the average, rather than typing in the rounded value, and assigning the result to mean_weight:
mean_weight <- mean(women$weight)
which(women$weight > mean_weight)
[1] 9 10 11 12 13 14 15
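One reason to prefer the computed value is that the 136.7 shown by summary is rounded for display. A quick check on the built-in women data set:

```r
mean_weight <- mean(women$weight)
mean_weight    # 136.7333

# The rounded and exact thresholds select the same rows here,
# but that is not guaranteed for other data
identical(which(women$weight > 136.7), which(women$weight > mean_weight))    # TRUE
```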
Our final step is to assign the row numbers to which_rows and use them to select the matching rows, which show the heights alongside the weights:
which_rows <- which(women$weight > mean_weight)
women[which_rows, ]
   height weight
9      66    139
10     67    142
11     68    146
12     69    150
13     70    154
14     71    159
15     72    164
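Since the goal was the heights, the same row numbers can also be applied to the height column alone. A minimal sketch, repeating the earlier definitions so that it runs on its own:

```r
mean_weight <- mean(women$weight)
which_rows <- which(women$weight > mean_weight)

# Index the height column directly to get a plain vector of heights
women$height[which_rows]

# Equivalent: select the rows and name the column in one step
women[which_rows, "height"]
```

Both forms return a vector rather than a data frame, which is convenient if the heights are passed on to another function such as mean.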