Data types and classes

Tyler George

Cornell College
DSC 223 - Spring 2024 Block 7

Pivoting data

Suppose we have the following patient data:

patients
# A tibble: 3 × 4
  patient_id pulse_1 pulse_2 pulse_3
  <chr>        <dbl>   <dbl>   <dbl>
1 XYZ             70      85      73
2 ABC             90      95     102
3 DEF            100      80      70

And we want to know:

  • Average pulse rate for each patient.

  • Trends in pulse rates across measurements.

Both questions require a longer format of the data, in which all pulse rates are in a single column and a second column identifies the measurement number.

Pivoting data

patients_longer <- patients |>
  pivot_longer(
    cols = !patient_id,
    names_to = "measurement",
    values_to = "pulse_rate"
  )
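To try the pivot yourself, here is a minimal reproducible sketch. It assumes the tibble and tidyr packages are installed and R 4.1 or later (for the `|>` pipe). The `tribble()` call is a hand-typed reconstruction of the printed `patients` data, not necessarily how the data was originally loaded.

```r
library(tibble)
library(tidyr)

# Rebuild the wide data by hand from the printout (assumption: values copied as shown)
patients <- tribble(
  ~patient_id, ~pulse_1, ~pulse_2, ~pulse_3,
  "XYZ",             70,       85,       73,
  "ABC",             90,       95,      102,
  "DEF",            100,       80,       70
)

# Every column except patient_id is stacked into name/value pairs
patients_longer <- patients |>
  pivot_longer(
    cols = !patient_id,
    names_to = "measurement",
    values_to = "pulse_rate"
  )

dim(patients_longer)  # 9 rows (3 patients x 3 measurements), 3 columns
```

Each patient now contributes one row per measurement, which is the shape `group_by()` and `ggplot()` expect.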

Summarizing pivoted data

patients_longer |>
  group_by(patient_id) |>
  summarize(mean_pulse = mean(pulse_rate))
# A tibble: 3 × 2
  patient_id mean_pulse
  <chr>           <dbl>
1 ABC              95.7
2 DEF              83.3
3 XYZ              76  

Visualizing pivoted data

ggplot(
  patients_longer,
  aes(x = measurement, y = pulse_rate, group = patient_id, color = patient_id)
) +
  geom_line()

Types and classes

  • Type is how an object is stored in memory, e.g.,

    • double: a real number stored in double-precision floating-point format.
    • integer: an integer (positive or negative).
  • Class is metadata about the object that can determine how common functions operate on that object, e.g.,

    • factor
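A small sketch of the distinction, using only base R. The object names (`codes`, `codes_fct`) are made up for illustration. The same integers are stored the same way in both objects, but `summary()` behaves differently once the class is "factor".

```r
codes <- c(1L, 2L, 2L, 1L)   # plain integer vector
codes_fct <- factor(codes)   # same values, now with class "factor"

typeof(codes)        # "integer"
typeof(codes_fct)    # "integer": the storage type is unchanged

class(codes)         # "integer"
class(codes_fct)     # "factor"

summary(codes)       # numeric summary (min, quartiles, mean, max)
summary(codes_fct)   # counts per level: 2 ones and 2 twos
```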

Types of vectors

You’ll commonly encounter:

  • logical
  • integer
  • double
  • character

You’ll less commonly encounter:

  • list
  • NULL
  • complex
  • raw
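To check any of these at the console, `typeof()` reports the storage type. The sketch below uses base R only; the `examples` list is a made-up container holding one value of each type.

```r
# One illustrative value per type (names chosen for readability)
examples <- list(
  logical   = TRUE,
  integer   = 1L,
  double    = 1.5,
  character = "a",
  list      = list(1, "a"),
  null      = NULL,
  complex   = 1i,
  raw       = as.raw(1)
)

# Apply typeof() to each element; the result is a named character vector
types <- vapply(examples, typeof, character(1))
types
```

Each element's type should come back matching its name, except that the `null` entry is reported as "NULL".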

Types of functions

Yes, functions have types too, but you don’t need to worry about the differences in the context of doing data science.

typeof(mean) # regular function
[1] "closure"
typeof(`$`) # internal function
[1] "special"
typeof(sum) # primitive function
[1] "builtin"

Factors

A factor is a vector that can contain only predefined values. It is used to store categorical data.

x <- factor(c("a", "b", "b", "a"))
x
[1] a b b a
Levels: a b
typeof(x)
[1] "integer"
attributes(x)
$levels
[1] "a" "b"

$class
[1] "factor"
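A short sketch of what "stored as integer" and "only predefined values" mean in practice, using base R only. The copy `y` is introduced here just so the original `x` stays intact.

```r
x <- factor(c("a", "b", "b", "a"))

levels(x)       # "a" "b": the predefined values
as.integer(x)   # 1 2 2 1: positions in levels(x), which is what is stored

# Assigning a value that is not a level produces NA (R also emits a warning)
y <- x
y[1] <- "c"
is.na(y[1])     # TRUE
```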

Other classes

Just a couple of examples…

Date:

today <- Sys.Date()
today
[1] "2024-03-26"
typeof(today)
[1] "double"
attributes(today)
$class
[1] "Date"

Date-time:

now <- as.POSIXct("2024-02-08 11:45", tz = "EST")
now
[1] "2024-02-08 11:45:00 EST"
typeof(now)
[1] "double"
attributes(now)
$class
[1] "POSIXct" "POSIXt" 

$tzone
[1] "EST"
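Both classes sit on top of a plain double. As a sketch in base R (variable names are illustrative), stripping the class shows what is stored: days since 1970-01-01 for a Date, and seconds since 1970-01-01 00:00:00 UTC for a POSIXct.

```r
second_day <- as.Date("1970-01-02")
unclass(second_day)        # 1: one day after 1970-01-01

first_minute <- as.POSIXct("1970-01-01 00:01:00", tz = "UTC")
as.numeric(first_minute)   # 60: sixty seconds after the origin

# Because the class is Date, subtraction is interpreted as a difference in days
gap <- as.Date("2024-03-26") - as.Date("2024-02-08")
as.numeric(gap)            # 47
```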

Application exercise

ae-07-population-types

  • Go to GitHub and find the ae-07 repo.

  • Open the file called ae-07-population-types.qmd and render it.