Equations of Disease

First steps

After reading in our data set, the first thing we can do is check how many rows and columns it has.

We can do this using the function dim, which stands for dimension. It returns the number of rows followed by the number of columns.

dim(avian_data)
[1] 422   6

Our data set has 422 observations of 6 variables.
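As a small sketch of the same check, the code below builds a made-up data frame (toy_birds and its values are hypothetical, not part of avian_data) and looks at its dimensions. The functions nrow and ncol are not used above, but they return the two numbers separately if we only need one of them.

```r
# Hypothetical toy data frame standing in for avian_data (values are made up)
toy_birds <- data.frame(
  Species = c("sp_a", "sp_b", "sp_c", "sp_d"),
  Mass    = c(5.2, 11.4, 19.4, 35.8),
  Wing    = c(41.7, 60.5, 78.0, 96.4)
)

# dim returns the number of rows, then the number of columns
dims <- dim(toy_birds)
print(dims)

# nrow and ncol return each number on its own
n_rows <- nrow(toy_birds)
n_cols <- ncol(toy_birds)
print(n_rows)
print(n_cols)
```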

Next, to find out some summary statistics for our variables, we can use the function summary.

summary(avian_data)
     Family                   Species_name     Subspecies       Mass      
 Min.   : 17   Acanthiza chrysorrhoa:  2   australis:  4   Min.   : 5.20  
 1st Qu.:116   Acanthiza ewingii    :  2   calandra :  4   1st Qu.:11.43  
 Median :123   Acanthiza iredalei   :  2   dubius   :  4   Median :19.35  
 Mean   :122   Acanthiza lineata    :  2   flava    :  4   Mean   :27.36  
 3rd Qu.:141   Acanthiza nana       :  2   nana     :  4   3rd Qu.:35.75  
 Max.   :146   Acanthiza pusilla    :  2   sordidus :  4   Max.   :98.40  
               (Other)              :410   (Other)  :398                  
      Wing        Sex    
 Min.   : 41.70   F:211  
 1st Qu.: 60.50   M:211  
 Median : 78.02          
 Mean   : 81.34          
 3rd Qu.: 96.42          
 Max.   :148.60          
                         

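This gives us the minimum, median, mean, maximum and 1st and 3rd quartiles of our quantitative variables, and counts for our qualitative variables (e.g. Species_name). For a qualitative variable with many values, only the most frequent are listed and the rest are pooled under (Other).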
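Which kind of summary we get depends on how R has stored each column. One way to check this, sketched here on a made-up data frame (toy_birds is hypothetical, not the real avian_data), is to ask for the class of every column with sapply.

```r
# Hypothetical toy data frame standing in for avian_data (values are made up)
toy_birds <- data.frame(
  Family = c(17, 17, 35, 35),
  Sex    = factor(c("F", "M", "F", "M")),
  Mass   = c(5.2, 5.9, 11.4, 12.1)
)

# Apply class() to each column; numeric columns get quantitative
# summaries, factor columns get counts
column_classes <- sapply(toy_birds, class)
print(column_classes)
```

Note that from R 4.0.0 onwards, read.csv no longer converts text columns to factors by default, so a text column read in that way would show up as character rather than factor, and summary would not report counts for it.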

Notice also that R has treated Family as a quantitative variable and reported its minimum, mean and so on. This doesn’t make sense: although each family is identified by a number, Family is a qualitative variable.

Instead, we might want to use the function table, which tells us the counts for each value of a qualitative variable.

table(avian_data$Family)

 17  35  45 114 115 116 117 122 123 128 138 140 141 145 146 
 10   8  16  20  48  54  24   8  34  24  12  40  20  36  68 
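The sketch below repeats the table call on a made-up data frame (toy_birds is hypothetical) so that it can be run without the original data. It also shows two optional extras that are not part of the steps above: sorting the counts with sort, and converting Family to a factor so that summary reports counts for it rather than a mean.

```r
# Hypothetical toy data frame standing in for avian_data (values are made up)
toy_birds <- data.frame(
  Family = c(17, 17, 35, 35, 35, 45),
  Mass   = c(5.2, 5.9, 11.4, 12.1, 10.8, 19.4)
)

# Count how many rows belong to each family
family_counts <- table(toy_birds$Family)
print(family_counts)

# Optional: order the families from most to least common
sorted_counts <- sort(family_counts, decreasing = TRUE)
print(sorted_counts)

# Optional: store Family as a factor so R treats it as qualitative
toy_birds$Family <- factor(toy_birds$Family)
family_summary <- summary(toy_birds$Family)
print(family_summary)
```

Converting to a factor is one possible design choice here; it changes how later functions treat the column, whereas table only reports the counts and leaves the data as it was.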