A longitudinal dataset of five years of public activity in the Scratch online community

Benjamin Mako Hill, Andrés Monroy-Hernández

Research output: Contribution to journal › Article › peer-review

13 Scopus citations

Abstract

Scratch is a programming environment and an online community where young people can create, share, learn, and communicate. In collaboration with the Scratch Team at MIT, we created a longitudinal dataset of public activity in the Scratch online community during its first five years (2007-2012). The dataset comprises 32 tables with information on more than 1 million Scratch users, nearly 2 million Scratch projects, more than 10 million comments, more than 30 million visits to Scratch projects, and more. To help researchers understand this dataset, and to establish the validity of the data, we also include the source code of every version of the software that operated the website, as well as the software used to generate this dataset. We believe this is the largest and most comprehensive downloadable dataset of youth programming artifacts and communication.

Original language: English (US)
Article number: 170002
Journal: Scientific Data
Volume: 4
State: Published - Jan 31 2017
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Information Systems
  • Education
  • Computer Science Applications
  • Statistics, Probability and Uncertainty
  • Library and Information Sciences
