Short version: Philip Guo and I wrote
a paper
about how a team of
data scientists built their own software tools for making, maintaining, and
iterating on an educational program they designed to help adults from
historically marginalized populations to start their data science careers.
The real story of this research project starts when I was sitting in the
audience at Shannon Ellis’
Design at Large talk. She was explaining
Chromebook Data Science
(CBDS), a
small miracle of a program that is offering paid educational opportunities to
adults in East Baltimore for training in data science. East Baltimore is a
historically marginalized neighborhood where the effects of
redlining can still be seen today. The goal of CBDS is to open up
educational and career pathways in a community where such opportunities,
especially in technical fields, have been historically scarce. Currently they
find interested adult students in East Baltimore by working with local
organizations like HEBCAC, an organization that helps
East Baltimore residents to further their education or find jobs. Each student
gets a tutor, they are given a Chromebook, and over the course of twelve weeks
they learn how to use their Chromebook to do data science and cloud computing
based work. After the program the CBDS team help place their students into
local data science jobs.
I got inspired during Shannon’s talk about this new program when I realized
that there were several stories
with research implications that were simultaneously unfolding as part of the
CBDS project. The central and most important story of course is the impact this program is having on the community it is serving, and the effect it will have
on the student’s careers. The team building CBDS is in the process of iterating
on their program while looking at the longitudinal outcomes of their students to
evaluate the extent to which CBDS is effective. I am optimistic that they will
have initial results to report on soon.
However, there is another story here about how the CBDS team was able to create
their program is such a short amount of time (three months), how they are able
to keep the course up to date despite how quickly data science technologies
are changing, and how they were able to quickly adapt course materials based on
feedback from their students. The CBDS team was made up of mostly data
scientists who engage in end user programming, a genre of programming
focused on using programming to achieve a goal, rather than to build a software
system. As data scientists, the CBDS team had years of combined experience
using programming to answer questions about biological and health data. To
build CBDS the team repurposed the tools of data science (and the spirit of
end user programming) to affect social change and to explore new methods of
producing an educational experience.
To investigate how this happened we interviewed everyone on the CBDS team,
asking them about the work they did for the project, and how they collaborated
with each other. It turns out that they already had many of the tools and skills
they needed to build this program from their experiences and expertise that
they had developed doing open and reproducible science. Almost all of the
course materials were written in R Markdown - the most popular computational
notebook for R programmers. R Markdown is a plain text format, therefore
the team could use collaboration and versioning tools like Git and GitHub.
Several members of the CBDS team routinely write academic papers, blogs, and
even interactive dashboards using R Markdown, so it was natural for them to
extend this format to creating courses.
To transform these R Markdown files into the artifacts the team needed for
teaching a course the CBDS team partnered with Leanpub,
an eBook publisher, to launch a new online course platform. This platform can
ingest modified Markdown files to create an online course structure with
individual lessons and assessments. Data science is a rapidly changing field,
so keeping data science courses up to date is usually challenging. Online
course providers like Coursera only allow instructors to change the content of
their course through a web interface, so making lots of changes to course has
been historically difficult. This difficulty is especially frustrating in data
science courses which can depend heavily on a few software libraries. If the
API for one of those libraries changes, then updating the entire course can be
extremely tedious. However, since all of the course materials for CBDS are based
in plain text documents like R Markdown, updating materials to incorporate
changing technologies is significantly easier and closer to the workflows the
CBDS team uses in the rest of their work.
Videos and video lectures are a central component of many online courses, and
the CBDS team confronted many of the same challenges with video content as they
did with the rest of the materials in their course. Whenever the API for a
software library would change, they would need to re-shoot and re-edit the
video, a process that consumed valuable time and resources. In order to keep
their videos up-to-date and decrease the amount of time required to produce each
video they developed Ari, a software
library for R that allowed them to write
R Markdown documents that could be compiled into videos. To create a video
using Ari, a team member would create a series of lecture slides using R
Markdown. They would then write a script that would be read over each slide
using text-to-speech technology that Ari could access. Ari would then stitch
together the narration audio file and the series of slide images into a video.
Since the source material for all of the course videos were plain text files,
changes to videos could be tracked and videos could be created collaboratively
using Git, GitHub, and the rest of the CBDS team’s typical workflow.
To manage and test all of these materials and digital artifacts that were being
created for the course, the CBDS team developed
Didactr, another R library that
tested and packaged all of their materials and videos. The CBDS team was
accustomed to the data processing pipelines common in genomics and
bioinformatics research, and Didactr served as a comparable piece of
pipeline-like software for proofing and publishing their online courses.
The sum of this entire process suggests the emergence of new technologies for
maintaining educational materials which I have been thinking of as “EdDevOps”
or DevOps for education. Traditional DevOps (which is still a relatively
new field) combines software development practices with the operational and
deployment requirements that are necessary for running that software. EdDevOps
is similarly concerned with development, deployment, and maintenance, however
applied to educational materials and specifically materials that address
rapidly changing fields like data science. In the future instead of authors and
instructors needing to painstakingly revise course materials, this process
could be significantly aided by software that is empowered to update materials
automatically.
End user programming means using programming to achieve a specific goal, and in
the case of the CBDS team this goal was to affect social change in their local
community through creating new educational programs and career opportunities.
CBDS is of course not the first effort towards social change driven by end user
programming. Data for Black Lives is one example
of how data sharing and analysis can be used to empower social change. The case
of CBDS is remarkable however because of how many tools and paradigms that were
already available to the CBDS team could be repurposed to meet their new
socially focused goal. This case study opens up questions about who is in the
best position to create opportunities like CBDS, and how end user programming
can be an empowering force to this end.
Short version: Philip Guo and I wrote
a paper
about how a team of
data scientists built their own software tools for making, maintaining, and
iterating on an educational program they designed to help adults from
historically marginalized populations to start their data science careers.
The real story of this research project starts when I was sitting in the
audience at Shannon Ellis’
Design at Large talk. She was explaining
Chromebook Data Science
(CBDS), a
small miracle of a program that is offering paid educational opportunities to
adults in East Baltimore for training in data science. East Baltimore is a
historically marginalized neighborhood where the effects of
redlining can still be seen today. The goal of CBDS is to open up
educational and career pathways in a community where such opportunities,
especially in technical fields, have been historically scarce. Currently they
find interested adult students in East Baltimore by working with local
organizations like HEBCAC, an organization that helps
East Baltimore residents to further their education or find jobs. Each student
gets a tutor, they are given a Chromebook, and over the course of twelve weeks
they learn how to use their Chromebook to do data science and cloud computing
based work. After the program the CBDS team help place their students into
local data science jobs.
I got inspired during Shannon’s talk about this new program when I realized
that there were several stories
with research implications that were simultaneously unfolding as part of the
CBDS project. The central and most important story of course is the impact this program is having on the community it is serving, and the effect it will have
on the student’s careers. The team building CBDS is in the process of iterating
on their program while looking at the longitudinal outcomes of their students to
evaluate the extent to which CBDS is effective. I am optimistic that they will
have initial results to report on soon.
However, there is another story here about how the CBDS team was able to create
their program is such a short amount of time (three months), how they are able
to keep the course up to date despite how quickly data science technologies
are changing, and how they were able to quickly adapt course materials based on
feedback from their students. The CBDS team was made up of mostly data
scientists who engage in end user programming, a genre of programming
focused on using programming to achieve a goal, rather than to build a software
system. As data scientists, the CBDS team had years of combined experience
using programming to answer questions about biological and health data. To
build CBDS the team repurposed the tools of data science (and the spirit of
end user programming) to affect social change and to explore new methods of
producing an educational experience.
To investigate how this happened we interviewed everyone on the CBDS team,
asking them about the work they did for the project, and how they collaborated
with each other. It turns out that they already had many of the tools and skills
they needed to build this program from their experiences and expertise that
they had developed doing open and reproducible science. Almost all of the
course materials were written in R Markdown - the most popular computational
notebook for R programmers. R Markdown is a plain text format, therefore
the team could use collaboration and versioning tools like Git and GitHub.
Several members of the CBDS team routinely write academic papers, blogs, and
even interactive dashboards using R Markdown, so it was natural for them to
extend this format to creating courses.
To transform these R Markdown files into the artifacts the team needed for
teaching a course the CBDS team partnered with Leanpub,
an eBook publisher, to launch a new online course platform. This platform can
ingest modified Markdown files to create an online course structure with
individual lessons and assessments. Data science is a rapidly changing field,
so keeping data science courses up to date is usually challenging. Online
course providers like Coursera only allow instructors to change the content of
their course through a web interface, so making lots of changes to course has
been historically difficult. This difficulty is especially frustrating in data
science courses which can depend heavily on a few software libraries. If the
API for one of those libraries changes, then updating the entire course can be
extremely tedious. However, since all of the course materials for CBDS are based
in plain text documents like R Markdown, updating materials to incorporate
changing technologies is significantly easier and closer to the workflows the
CBDS team uses in the rest of their work.
Videos and video lectures are a central component of many online courses, and
the CBDS team confronted many of the same challenges with video content as they
did with the rest of the materials in their course. Whenever the API for a
software library would change, they would need to re-shoot and re-edit the
video, a process that consumed valuable time and resources. In order to keep
their videos up-to-date and decrease the amount of time required to produce each
video they developed Ari, a software
library for R that allowed them to write
R Markdown documents that could be compiled into videos. To create a video
using Ari, a team member would create a series of lecture slides using R
Markdown. They would then write a script that would be read over each slide
using text-to-speech technology that Ari could access. Ari would then stitch
together the narration audio file and the series of slide images into a video.
Since the source material for all of the course videos were plain text files,
changes to videos could be tracked and videos could be created collaboratively
using Git, GitHub, and the rest of the CBDS team’s typical workflow.
To manage and test all of these materials and digital artifacts that were being
created for the course, the CBDS team developed
Didactr, another R library that
tested and packaged all of their materials and videos. The CBDS team was
accustomed to the data processing pipelines common in genomics and
bioinformatics research, and Didactr served as a comparable piece of
pipeline-like software for proofing and publishing their online courses.
The sum of this entire process suggests the emergence of new technologies for
maintaining educational materials which I have been thinking of as “EdDevOps”
or DevOps for education. Traditional DevOps (which is still a relatively
new field) combines software development practices with the operational and
deployment requirements that are necessary for running that software. EdDevOps
is similarly concerned with development, deployment, and maintenance, however
applied to educational materials and specifically materials that address
rapidly changing fields like data science. In the future instead of authors and
instructors needing to painstakingly revise course materials, this process
could be significantly aided by software that is empowered to update materials
automatically.
End user programming means using programming to achieve a specific goal, and in
the case of the CBDS team this goal was to affect social change in their local
community through creating new educational programs and career opportunities.
CBDS is of course not the first effort towards social change driven by end user
programming. Data for Black Lives is one example
of how data sharing and analysis can be used to empower social change. The case
of CBDS is remarkable however because of how many tools and paradigms that were
already available to the CBDS team could be repurposed to meet their new
socially focused goal. This case study opens up questions about who is in the
best position to create opportunities like CBDS, and how end user programming
can be an empowering force to this end.