23 May '22
Pulling apart Zoom attendance csv dumps in tidy R
My team ran some Zoom training last week and today I needed to figure out who actually attended across all the days, and for how long.
Zoom can give you a csv dump of all attendees but doesn’t provide the aggregations I was after, so I hacked up a little script (in tidy R) to do it. If you ever want to do something similar, feel free to use it (MIT Licence).
read_zoom_attendance_csv = function(filename){
read_csv(filename, show_col_types = FALSE) %>%
# I only needed the date, not the actual start time, so I didn't bother parsing
# the full datetime + timezone string that Zoom gives
mutate(date = parse_date(str_sub(`Join Time`, end = 10), format = "%m/%d/%Y")) %>%
# this isn't necessary, but I like shorter column names
rename(name = `Name (Original Name)`, email = `User Email`, duration = `Duration (Minutes)` ) %>%
# I'm only intrested in these columns
select(name, email, date, duration) %>%
# this isn't necessary, but handy if individuals have signed in with
# slightly different names on different days (requires eyeballing the data)
mutate(name = recode(name,
"JS" = "Joanna Smith",
"Louise" = "Louise Jones"
}
Then you can read the Zoom csv file like so:
df = read_zoom_attendance_csv("zoom-call.csv")
And to visualise in ggplot2 (which was my reason for using R in the first place) you could try something like:
df %>%
group_by(name, date) %>%
# this was the key summarisation I was after - total time
# in-call across multiple connects/re-connects
summarise(duration = sum(duration)) %>%
ungroup() %>%
ggplot(aes(name, duration)) +
geom_col() +
coord_flip() +
facet_wrap(~date, nrow = 1) +
labs(
title = "Zoom call attendance",
x = "participant name",
y = "duration on the call (minutes)"
)
Enjoy!