Animath
Data Manipulation Exercise

Ensure the tidyverse is installed and loaded with library(tidyverse). Exercises 6 and 7 use gather() and separate() from tidyr.

Exercise 1

The iris dataset ships with base R, in the datasets package, so it is available without loading anything:

head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa

Find out how many observations and variables there are in the dataset. Also, find out what type of variables the dataset contains.

str(iris)
## 'data.frame':    150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
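If only the counts or only the column types are needed, they can be pulled out individually instead of being read off the str() output. This is a minimal sketch in base R; the name col_types is introduced here for illustration only.

```r
# Dimensions: rows (observations) first, then columns (variables).
dim(iris)
nrow(iris)
ncol(iris)

# Class of every column, returned as a named character vector.
col_types <- sapply(iris, class)
col_types
```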

Exercise 2

What are the mean, median and variance of each numeric variable in the dataset?

summary(iris)
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
##        Species  
##  setosa    :50  
##  versicolor:50  
##  virginica :50  
##                 
##                 
## 
var(iris$Sepal.Length)
## [1] 0.6856935
var(iris$Sepal.Width)
## [1] 0.1899794
var(iris$Petal.Length)
## [1] 3.116278
var(iris$Petal.Width)
## [1] 0.5810063
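The four separate var() calls can be folded into a single pass over the numeric columns. The sketch below is one possible way to collect all three statistics in one table; numeric_cols and stats are illustrative names, not part of the exercise.

```r
# Keep only the numeric columns, so Species is skipped automatically.
numeric_cols <- iris[sapply(iris, is.numeric)]

# Build a matrix with one row per statistic and one column per variable.
stats <- sapply(numeric_cols, function(x) {
  c(mean = mean(x), median = median(x), variance = var(x))
})
round(stats, 3)
```

Selecting columns with is.numeric rather than by position means the same code still works if the column order changes.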

Exercise 3

What species of Iris are included in the dataset?

unique(iris$Species)
## [1] setosa     versicolor virginica 
## Levels: setosa versicolor virginica
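Because Species is a factor, its categories can also be read from the factor levels, and table() additionally reports how many observations fall in each. A short sketch, with species_counts as an illustrative name:

```r
# levels() lists every category the factor can take, as character strings.
levels(iris$Species)

# table() counts the observations per species.
species_counts <- table(iris$Species)
species_counts
```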

Exercise 4

Calculate the mean, median and variance for each measurement type for each species of Iris.

setosa_index <- which(iris$Species == "setosa")
versicolor_index <- which(iris$Species == "versicolor")
virginica_index <- which(iris$Species == "virginica")

summary(iris[setosa_index,])
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.300   Min.   :2.300   Min.   :1.000   Min.   :0.100  
##  1st Qu.:4.800   1st Qu.:3.200   1st Qu.:1.400   1st Qu.:0.200  
##  Median :5.000   Median :3.400   Median :1.500   Median :0.200  
##  Mean   :5.006   Mean   :3.428   Mean   :1.462   Mean   :0.246  
##  3rd Qu.:5.200   3rd Qu.:3.675   3rd Qu.:1.575   3rd Qu.:0.300  
##  Max.   :5.800   Max.   :4.400   Max.   :1.900   Max.   :0.600  
##        Species  
##  setosa    :50  
##  versicolor: 0  
##  virginica : 0  
##                 
##                 
## 
summary(iris[versicolor_index,])
##   Sepal.Length    Sepal.Width     Petal.Length   Petal.Width   
##  Min.   :4.900   Min.   :2.000   Min.   :3.00   Min.   :1.000  
##  1st Qu.:5.600   1st Qu.:2.525   1st Qu.:4.00   1st Qu.:1.200  
##  Median :5.900   Median :2.800   Median :4.35   Median :1.300  
##  Mean   :5.936   Mean   :2.770   Mean   :4.26   Mean   :1.326  
##  3rd Qu.:6.300   3rd Qu.:3.000   3rd Qu.:4.60   3rd Qu.:1.500  
##  Max.   :7.000   Max.   :3.400   Max.   :5.10   Max.   :1.800  
##        Species  
##  setosa    : 0  
##  versicolor:50  
##  virginica : 0  
##                 
##                 
## 
summary(iris[virginica_index,])
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.900   Min.   :2.200   Min.   :4.500   Min.   :1.400  
##  1st Qu.:6.225   1st Qu.:2.800   1st Qu.:5.100   1st Qu.:1.800  
##  Median :6.500   Median :3.000   Median :5.550   Median :2.000  
##  Mean   :6.588   Mean   :2.974   Mean   :5.552   Mean   :2.026  
##  3rd Qu.:6.900   3rd Qu.:3.175   3rd Qu.:5.875   3rd Qu.:2.300  
##  Max.   :7.900   Max.   :3.800   Max.   :6.900   Max.   :2.500  
##        Species  
##  setosa    : 0  
##  versicolor: 0  
##  virginica :50  
##                 
##                 
## 
var(iris[setosa_index,"Sepal.Length"])
## [1] 0.124249
var(iris[setosa_index,"Sepal.Width"])
## [1] 0.1436898
# etc. for the remaining measurements and species
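Repeating the index-and-summarise pattern for every species and variable gets long. As an alternative sketch in base R, aggregate() can compute one statistic for all four measurements per species in a single call. The three result names below are illustrative.

```r
# Apply one statistic to every measurement column, split by Species.
# The formula `. ~ Species` means "all other columns, grouped by Species".
species_mean   <- aggregate(. ~ Species, data = iris, FUN = mean)
species_median <- aggregate(. ~ Species, data = iris, FUN = median)
species_var    <- aggregate(. ~ Species, data = iris, FUN = var)

species_var
```

Each result is a data frame with one row per species, so the values for a species can be looked up by filtering on the Species column.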

Exercise 5

Rename the variables as shown below and store this updated dataset as iris.newnames.

newnames <- c("SL", "SW", "PL", "PW", "Species")
iris.newnames <- iris
names(iris.newnames) <- newnames
head(iris.newnames)
##    SL  SW  PL  PW Species
## 1 5.1 3.5 1.4 0.2  setosa
## 2 4.9 3.0 1.4 0.2  setosa
## 3 4.7 3.2 1.3 0.2  setosa
## 4 4.6 3.1 1.5 0.2  setosa
## 5 5.0 3.6 1.4 0.2  setosa
## 6 5.4 3.9 1.7 0.4  setosa
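Assigning to names() depends on the new names being listed in exactly the same order as the columns. A possible alternative, assuming dplyr is available as part of the tidyverse, pairs each new name with the old one explicitly. The name iris_renamed is illustrative.

```r
library(dplyr)

# rename() takes new_name = old_name pairs and leaves other columns untouched.
iris_renamed <- rename(iris,
                       SL = Sepal.Length, SW = Sepal.Width,
                       PL = Petal.Length, PW = Petal.Width)
head(iris_renamed)
```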

Exercise 6

Restructure iris as shown below and save the result as iris2.

iris2 <- gather(iris, key = type, value = value, c(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width))
head(iris2)
##   Species         type value
## 1  setosa Sepal.Length   5.1
## 2  setosa Sepal.Length   4.9
## 3  setosa Sepal.Length   4.7
## 4  setosa Sepal.Length   4.6
## 5  setosa Sepal.Length   5.0
## 6  setosa Sepal.Length   5.4
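The tidyr documentation marks gather() as superseded by pivot_longer(). The sketch below shows how the same reshaping might look with the newer function, assuming tidyr 1.0.0 or later. The name iris2_alt is illustrative.

```r
library(tidyr)

# Stack every column except Species into name/value pairs.
iris2_alt <- pivot_longer(iris, cols = -Species,
                          names_to = "type", values_to = "value")
head(iris2_alt)
```

The result holds the same 600 values but is a tibble, and its rows are ordered by original observation rather than by measurement type, so head() will not match the gather() output line for line.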

Exercise 7

Restructure iris2 as shown below and save the result as iris3.

iris3 <- separate(iris2, col = type, into = c("part", "measure"), sep = "\\.")
head(iris3)
##   Species  part measure value
## 1  setosa Sepal  Length   5.1
## 2  setosa Sepal  Length   4.9
## 3  setosa Sepal  Length   4.7
## 4  setosa Sepal  Length   4.6
## 5  setosa Sepal  Length   5.0
## 6  setosa Sepal  Length   5.4
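The reshape and the split can also be done in one step from the original iris data. This is a hedged sketch, assuming tidyr 1.0.0 or later: pivot_longer() accepts several names_to entries together with a names_sep pattern. The name iris3_alt is illustrative.

```r
library(tidyr)

# Split each column name at the dot while pivoting:
# "Sepal.Length" becomes part = "Sepal", measure = "Length".
iris3_alt <- pivot_longer(iris, cols = -Species,
                          names_to = c("part", "measure"),
                          names_sep = "\\.",
                          values_to = "value")
head(iris3_alt)
```

As with separate(), the dot is escaped because names_sep is treated as a regular expression.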

Exercise 8

Restructure iris as shown below and save the result as iris4. You do not always have to use spread(), gather(), etc. How else could you achieve this?

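# Stack the sepal measurements on top of the petal measurements (2 x 150 = 300 rows).
Length <- c(iris$Sepal.Length, iris$Petal.Length)
Width <- c(iris$Sepal.Width, iris$Petal.Width)
# Label each block of 150 rows; factor() keeps the column type the same in every R version.
Part <- factor(rep(c("Sepal", "Petal"), each = 150))

# Repeat Species explicitly so that it matches the 300 stacked rows instead of being recycled silently.
iris4 <- data.frame(Species = rep(iris$Species, 2), Length, Width, Part)
str(iris4)
## 'data.frame':    300 obs. of  4 variables: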
##  $ Species: Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Length : num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Width  : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Part   : Factor w/ 2 levels "Petal","Sepal": 2 2 2 2 2 2 2 2 2 2 ...
head(iris4)
##   Species Length Width  Part
## 1  setosa    5.1   3.5 Sepal
## 2  setosa    4.9   3.0 Sepal
## 3  setosa    4.7   3.2 Sepal
## 4  setosa    4.6   3.1 Sepal
## 5  setosa    5.0   3.6 Sepal
## 6  setosa    5.4   3.9 Sepal