Dates and Times

Written by Gül İnan and Modified by Tyler George

Cornell College
DSC 223 - Spring 2024 Block 7

Goals

  • Types of date and time objects
  • Lubridate!

Setup

library(tidyverse)

Overview

  • Up to now, we have dealt with vectors of type numeric, character, logical, and factor.
  • In data science projects, we may encounter variables that are dates and times.
  • Dates and times can be used to quantify when exactly a certain event occurs and can help us understand chronological relationships.

Date-time functions/classes

  • Luckily, R provides several options for dealing with date and date/time data.
  • Just as numbers and strings have their own class in R, date and date/time objects have their own class too.
  • In base R, the built-in as.Date() function creates objects of class Date, which handle dates without times. The as.POSIXct() and as.POSIXlt() functions create objects of class POSIXct and POSIXlt, which handle dates and times with control over time zones.

Date-time functions/classes 2

Type       Function      Class    Description
date       as.Date()     Date     Represents calendar dates.
date-time  as.POSIXct()  POSIXct  Stores seconds since the epoch (January 1, 1970).
date-time  as.POSIXlt()  POSIXlt  Stores a list of day, month, year, hour, minute, second, etc.
  • Note: POSIX stands for Portable Operating System Interface and it is a family of standards specified by the IEEE Computer Society for maintaining compatibility between different operating systems.
  • The POSIX classes are especially useful when time zone manipulation is important.
  • ct stands for calendar time and lt stands for local time.

Dates for Computers

  • Many computer languages use January 1, 1970 as the epoch, an origin (reference) day that lets dates be converted to numbers. Dates before the origin are stored as negative numbers.
  • Similarly, in R, the Date class stores a date internally as the number of days since January 1, 1970, and the POSIXct class stores a date-time as the number of seconds since that origin.
  • The POSIXlt class instead stores date/time values as a list of components (day, month, year, etc.), making it easy to extract these parts.
  • Unless you need the list nature of the POSIXlt class, the POSIXct class is the usual choice for storing date-times in R.
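  • A minimal sketch of the internal storage described above, using only base R. The object names d, ct, and lt are made up for illustration.

```r
# Date: stored as the number of days since 1970-01-01
d <- as.Date("1970-01-10")
unclass(d)                       # 9

# Dates before the epoch are negative
unclass(as.Date("1969-12-31"))   # -1

# POSIXct: stored as the number of seconds since 1970-01-01 00:00:00 UTC
ct <- as.POSIXct("1970-01-01 01:00:00", tz = "UTC")
as.numeric(ct)                   # 3600

# POSIXlt: stored as a list of components
lt <- as.POSIXlt(ct)
lt$hour                          # 1
lt$mon + 1                       # 1 (months are stored as 0-11)
lt$year + 1900                   # 1970 (years are stored as years since 1900)
```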

Getting Started

  • Let’s get the current system date and time.
# get system date
Sys.Date()
[1] "2024-04-08"
# check the class
class(Sys.Date())
[1] "Date"
# get the system time
Sys.time()
[1] "2024-04-08 20:25:29 CDT"
# check the class
class(Sys.time())
[1] "POSIXct" "POSIXt" 

Dates across the globe

  • There are many ways to write a date and date formats vary across the world.
  • These differing formats can confuse people from different countries.
  • The ISO 8601 standard recommends writing the date as year, then month, then day (YYYY-MM-DD) and uses the 24-hour clock.
  • The default date/time format used in R follows the ISO 8601 standard.
Sys.time()
[1] "2024-04-08 20:25:29 CDT"

Create Date Objects

  • We can create date objects with the as.Date() function in base R.
# check: without quotes this is arithmetic (2021 - 10 - 4), so the class is numeric
class(2021-10-04)
[1] "numeric"
class("2021-10-04")
[1] "character"
# create a date object in base R
school_start_date <- as.Date("2021-10-04")
class(school_start_date)
[1] "Date"
#number of days since 1970-01-01
unclass(school_start_date)
[1] 18904
# How long has it been since school started?
Sys.Date()-school_start_date
Time difference of 917 days
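  • Subtracting two dates gives a difftime object. As a small sketch with fixed dates (so the result does not depend on today's date), it can be converted to a plain number or to other units. The names start, end, and gap are made up for illustration.

```r
start <- as.Date("2021-10-04")
end   <- as.Date("2022-01-01")

gap <- end - start                      # a difftime object
gap                                     # Time difference of 89 days
as.numeric(gap)                         # 89, as a plain number
difftime(end, start, units = "weeks")   # the same gap expressed in weeks

# adding a number to a Date moves it forward by that many days
start + 30                              # "2021-11-03"
```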

Basic Functions

  • Many statistical summary functions, such as mean(), min(), and max(), handle date objects transparently.
important_dates <- as.Date(c("2021-11-01","2021-10-01","2021-09-01"))
important_dates
[1] "2021-11-01" "2021-10-01" "2021-09-01"
min(important_dates)
[1] "2021-09-01"
max(important_dates)
[1] "2021-11-01"

Basic Functions 2

  • The by argument of the seq() function accepts units of time, which makes it easy to generate sequences of dates.
seq(from=as.Date('2021-10-04'), to=as.Date('2022-01-01'), by='10 days') #1 day, 10 days, 2 weeks, 3 months
[1] "2021-10-04" "2021-10-14" "2021-10-24" "2021-11-03" "2021-11-13" "2021-11-23"
[7] "2021-12-03" "2021-12-13" "2021-12-23"
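  • The unit of time in the by argument depends on the class: for Date objects it can be “day”, “week”, “month”, “quarter”, or “year”, optionally preceded by an integer (as in “10 days”); for POSIXct objects it can also be “sec”, “min”, or “hour”.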
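  • A few more seq() calls, as a sketch of other step sizes. The length.out argument sets how many values to generate instead of an end point; the names monthly, biweekly, start_time, and hourly are made up for illustration.

```r
# monthly steps, three values
monthly <- seq(from = as.Date("2021-10-04"), by = "month", length.out = 3)
monthly    # "2021-10-04" "2021-11-04" "2021-12-04"

# two-week steps between two dates
biweekly <- seq(from = as.Date("2021-10-04"), to = as.Date("2021-11-01"), by = "2 weeks")
biweekly   # "2021-10-04" "2021-10-18" "2021-11-01"

# sub-daily steps need a date-time (POSIXct) starting point
start_time <- as.POSIXct("2021-10-04 08:00:00", tz = "UTC")
hourly <- seq(from = start_time, by = "6 hours", length.out = 3)
hourly     # 08:00, 14:00, and 20:00 UTC on 2021-10-04
```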

Date Formatting

  • The as.Date() function allows a variety of input formats through the format input argument.
Code  Value
%d    Day of the month (decimal number)
%m    Month (decimal number)
%b    Month (abbreviated)
%B    Month (full name)
%y    Year (2 digit)
%Y    Year (4 digit)

Date Formatting 2

# convert a day-month-year string into a date object
# the format argument describes the input structure; the output is always YYYY-MM-DD
as.Date('15-11-2021', format = '%d-%m-%Y')
[1] "2021-11-15"
# convert a string with a full month name into a date object
# again, format describes the input; the output is always YYYY-MM-DD
as.Date('November 15, 2021', format='%B %d, %Y')
[1] "2021-11-15"

Format function

  • You can also display a date in another format with the format() function.
Sys.Date()
[1] "2024-04-08"
format(Sys.Date(), '%B %d, %Y')
[1] "April 08, 2024"

Base R date

Use:

  • Date when there is no time component,
  • POSIXct when dealing with time and time zones, and
  • POSIXlt when you want to access/extract the different components.

Lubridate

lubridate package

  • The tidyverse ecosystem includes the lubridate package for dealing with date-times and time-spans.
  • It is fast and user-friendly, and helps with:
    • Parsing of date-time data,
    • Extraction and updating of components of a date-time, and
    • Algebraic manipulation on date-time and time-span objects.
  • First, to be able to use the functionality of lubridate we have to load the package:
# The easiest way to get lubridate is to load the whole tidyverse:
# library(tidyverse) or
# Alternatively, load just lubridate:
library(lubridate)

Parsing dates and times (Converting strings or numbers to date-times)

  • The lubridate package has a number of functions to convert strings to date and date-time objects.
  • Here, we simply try to match the characters to the pattern of the date-time we are trying to parse.
  • The goal is to end up with a date-time in the form: YYYY-MM-DD HH:MM:SS.
  • For parsing the date, in lubridate we can use a combination of the letters ‘d’, ‘m’, ‘y’ (standing for day, month, year).

Functions to parse dates in Lubridate

  • Use the function whose name matches the order of the components in your input.
Function  Description
ymd()     parses year, month, day input into a date (YYYY-MM-DD).
ydm()     parses year, day, month input into a date (YYYY-MM-DD).
mdy()     parses month, day, year input into a date (YYYY-MM-DD).
dmy()     parses day, month, year input into a date (YYYY-MM-DD).
hms()     parses hours, minutes, seconds input into a period.
hm()      parses hours, minutes input into a period.
ms()      parses minutes, seconds input into a period.

Lubridate Example 1

# When is International Women's day?
library(lubridate)
#pattern is year, month, day
ymd("20210308")
[1] "2021-03-08"
#pattern is year, day, month
ydm("20210803")
[1] "2021-03-08"
#pattern is month, day, year
mdy("03-08-2021")
[1] "2021-03-08"
#pattern is day, month, year
dmy("08/03/2021")
[1] "2021-03-08"
#all of them are converted to the YYYY-MM-DD format.
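  • Choosing the right function matters because the same string can be a valid date under more than one order. A small sketch, which also shows that input that cannot be parsed becomes NA (quiet = TRUE only suppresses the warning):

```r
library(lubridate)

# the same string read as month-day-year and as day-month-year
mdy("01/02/2021")   # "2021-01-02"
dmy("01/02/2021")   # "2021-02-01"

# there is no month 13, so parsing fails and returns NA
ymd("2021-13-45", quiet = TRUE)   # NA
```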

Lubridate Example 1 Continued

#if we think that these strings are in year-month-day format,
#then put all of them in a standard year-month-day format.
x <- c(20090101, "2009-01-02", "2009 01 03", "2009-1-4",
       "2009-1, 5", "Created on 2009 1 6", "200901 !!! 07")
ymd(x)
[1] "2009-01-01" "2009-01-02" "2009-01-03" "2009-01-04" "2009-01-05" "2009-01-06"
[7] "2009-01-07"

Lubridate Example 2

  • A very similar technique can be used to parse times:
# there are no functions for other orders, such as smh()
# the other versions are hm() and ms()
hms("20:30:10")
[1] "20H 30M 10S"

Lubridate Example 2 Continued

  • If a variable includes both a date and a time, we can join the relevant function names with an “_” to parse a date-time.
ymd_hms("2006/03/19 20:30:10")
[1] "2006-03-19 20:30:10 UTC"
ydm_hms("2006/19/03 20:30:10")
[1] "2006-03-19 20:30:10 UTC"
mdy_hms("03/19/2006 20:30:10")
[1] "2006-03-19 20:30:10 UTC"
dmy_hms("19/03/2006 20:30:10")
[1] "2006-03-19 20:30:10 UTC"
#all of them are converted to the YYYY-MM-DD hh:mm:ss format.
  • The UTC in the output is the time zone. By default, lubridate parses date-times as Coordinated Universal Time (UTC); pass the tz argument to use another time zone.
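  • A sketch of working with time zones in lubridate. It assumes the time zone name "America/Chicago" is available on your system; the names utc_time and chicago_time are made up for illustration.

```r
library(lubridate)

# parsed as UTC by default
utc_time <- ymd_hms("2006-03-19 20:30:10")
tz(utc_time)                   # "UTC"

# the same clock reading, interpreted in another time zone
chicago_time <- ymd_hms("2006-03-19 20:30:10", tz = "America/Chicago")
tz(chicago_time)               # "America/Chicago"

# these are different instants: Chicago was 6 hours behind UTC on that date
chicago_time - utc_time        # Time difference of 6 hours

# with_tz() keeps the instant and only changes how it is displayed
with_tz(utc_time, tzone = "America/Chicago")
```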

GET components of date-times

  • We can use helper functions to get the relevant component of a date-time object.
Function  Description
date()    Date component
year()    Year component
month()   Month component (label = TRUE returns the month name)
day()     Day of month
week()    Week of the year
wday()    Day of week (label = TRUE returns the day name)
yday()    Day of year
hour()    Hour
minute()  Minute
second()  Second

Lubridate GET Example

womensday <- ymd("2021/03/08")
# the parsed date is stored as 2021-03-08, with "-" as the separator
year(womensday)
[1] 2021
month(womensday)
[1] 3
week(womensday)
[1] 10
# note: by default, Sunday is treated as the first day of the week
wday(womensday)
[1] 2
wday(womensday, label=TRUE)
[1] Mon
Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat
yday(womensday)
[1] 67
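  • A few more ways to use these helpers, as a sketch. The printed month label assumes an English locale, and the name next_year is made up for illustration.

```r
library(lubridate)

womensday <- ymd("2021-03-08")

day(womensday)                    # 8
month(womensday, label = TRUE)    # Mar (in an English locale)

# count Monday as the first day of the week instead of Sunday
wday(womensday, week_start = 1)   # 1

# the same helpers can also update a component
next_year <- womensday
year(next_year) <- 2022
next_year                         # "2022-03-08"
```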

Extract date and times

  • lubridate also has functions to extract hours, minutes, and seconds:
now() %>% hour()
[1] 20
now() %>% minute()
[1] 25
now() %>% second()
[1] 30.25963
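  • Because now() changes every time it is called, a fixed date-time gives reproducible results. A small sketch, where the name fixed_time is made up for illustration:

```r
library(lubridate)

fixed_time <- ymd_hms("2006-03-19 20:30:10")

hour(fixed_time)     # 20
minute(fixed_time)   # 30
second(fixed_time)   # 10

# drop the time and keep only the date
date(fixed_time)     # "2006-03-19"
```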