Equations of Disease

Extracting information II

We can extract summary statistics from our data set using the summary function. For each numeric column in the data set, it reports the minimum, first quartile, median, mean, third quartile and maximum.

summary(women)
     height         weight     
 Min.   :58.0   Min.   :115.0  
 1st Qu.:61.5   1st Qu.:124.5  
 Median :65.0   Median :135.0  
 Mean   :65.0   Mean   :136.7  
 3rd Qu.:68.5   3rd Qu.:148.0  
 Max.   :72.0   Max.   :164.0  
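
The same figures can also be computed one at a time, which can be a useful check on what summary reports. The following is a minimal sketch using base R functions on the built-in women data set. The choice of columns is only for illustration.

```r
# Mean and median of a single column
mean(women$height)
median(women$height)

# Quartiles (the "1st Qu." and "3rd Qu." rows of the summary)
quantile(women$weight, probs = c(0.25, 0.75))

# Apply one function to every column at once
sapply(women, mean)
```

Note that sapply(women, mean) shows the mean weight with more digits than the rounded value printed by summary.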

Let’s say we want to look at a section of our data. More specifically, we want to look at the heights of the women who weigh more than the average.

We can extract the weight column using the $ operator, and we use the which function to find the row numbers for which the weight is more than the average, 136.7.

which(women$weight > 136.7)
[1] 9 10 11 12 13 14 15
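
It may help to see what which is working on. The sketch below splits the step in two. The variable name above is a hypothetical one chosen for this illustration.

```r
# The comparison alone returns one TRUE/FALSE value per row
above <- women$weight > 136.7   # 'above' is an illustrative name
above

# which() keeps only the positions of the TRUE values
which(above)

# sum() counts the TRUE values, i.e. the number of matching rows
sum(above)
```

Here the comparison produces a logical vector of length 15, and which turns it into the row numbers 9 to 15.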

Typing 136.7 by hand relies on the rounded value printed by summary. We can improve this code by using the R function mean to compute the average and assigning this value to mean_weight:

mean_weight <- mean(women$weight)
which(women$weight > mean_weight)
[1] 9 10 11 12 13 14 15

Our final step is to assign the row numbers to which_rows and use them to select the matching rows of the data set, which shows the heights alongside the weights:

which_rows <- which(women$weight > mean_weight)
women[which_rows, ]
   height weight
9      66    139
10     67    142
11     68    146
12     69    150
13     70    154
14     71    159
15     72    164
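
If only the heights are wanted, without the weight column, the same row numbers can be used to index the height column alone. This is a sketch of three equivalent ways to do it in base R, reusing the names mean_weight and which_rows from above.

```r
mean_weight <- mean(women$weight)
which_rows <- which(women$weight > mean_weight)

# Select only the height column for the matching rows
women[which_rows, "height"]

# Equivalent: index the height vector directly
women$height[which_rows]

# Logical indexing, without the intermediate which() call
women$height[women$weight > mean_weight]
```

Each of these returns the heights 66 to 72 as a plain vector rather than a data frame.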