FanData: Scraping Data from Fan Fiction Archives

15-20 Minute Paper

Abigail De Kosnik
Assistant Professor at the University of California, Berkeley

Andrea Horbinski
Ph.D. student in modern Japanese history at the University of California, Berkeley

Biographies
Abigail De Kosnik is an Assistant Professor at the University of California, Berkeley, in the Berkeley Center for New Media (bcnm.berkeley.edu) and the Department of Theater, Dance & Performance Studies (tdps.berkeley.edu). She has published numerous essays on popular media, with a focus on digital culture.

Andrea Horbinski is a Ph.D. student in modern Japanese history at the University of California, Berkeley. She was a 2007-08 Fulbright Fellow to Japan, where she researched hypernationalist manga at Doshisha University in Kyoto. She has discussed fandom, anime, manga, and Japanese history and folklore at various conventions and conferences.

Abstract
In this paper, we discuss our development of “FanData,” a Python-based metadata web scraper for online archives of user-generated content, specifically fan fiction. Fan fiction is published primarily online, and many fan fiction websites are archives, intended to preserve fan fiction stories and keep them accessible to future readers. Fan fiction archives have existed on many platforms: Usenet newsgroups, mailing lists, discussion boards, custom websites, single-fandom databases, multi-fandom databases, social networks designed to facilitate blogging (such as LiveJournal and Dreamwidth), and social networks designed for microblogging (such as Tumblr and Twitter). We used FanData to analyze sites from each of these platform categories. Fan fiction archives are excellent examples of websites populated primarily by user-generated content, and we hope to continue developing FanData’s functionality so that scholars can use it to scrape data from a wide variety of social media sites.
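
To make the idea of metadata scraping concrete, the following is a minimal sketch in Python, not FanData's actual code. It assumes a hypothetical archive listing page in which each story is an `li` element with class `work`, and each metadata field is a `span` whose class names the field. The sample markup, the field names, and the names `WorkMetadataParser` and `scrape_metadata` are all illustrative assumptions; real archives each use their own page structure, and a real scraper would also have to fetch pages and respect each site's terms of use.

```python
from html.parser import HTMLParser

# Hypothetical listing markup, invented for illustration only.
SAMPLE_PAGE = """
<ul>
  <li class="work">
    <span class="title">A Study in Stars</span>
    <span class="author">example_author</span>
    <span class="fandom">Example Fandom</span>
    <span class="words">1200</span>
  </li>
  <li class="work">
    <span class="title">Second Story</span>
    <span class="author">another_author</span>
    <span class="fandom">Example Fandom</span>
    <span class="words">3400</span>
  </li>
</ul>
"""

# Metadata fields to collect (assumed names, matching the sample markup).
FIELDS = {"title", "author", "fandom", "words"}


class WorkMetadataParser(HTMLParser):
    """Collects one dict of metadata per <li class="work"> element."""

    def __init__(self):
        super().__init__()
        self.works = []
        self._current = None  # dict for the work currently being parsed
        self._field = None    # name of the field whose text we are inside

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "li" and css_class == "work":
            self._current = {}
        elif tag == "span" and self._current is not None and css_class in FIELDS:
            self._field = css_class

    def handle_data(self, data):
        # Only keep text that sits inside a recognised metadata span.
        if self._field is not None:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._current is not None:
            self.works.append(self._current)
            self._current = None


def scrape_metadata(html):
    """Return a list of metadata dicts, with word counts as integers."""
    parser = WorkMetadataParser()
    parser.feed(html)
    parser.close()
    for work in parser.works:
        if "words" in work:
            work["words"] = int(work["words"])
    return parser.works


works = scrape_metadata(SAMPLE_PAGE)
for work in works:
    print(work["title"], "by", work["author"], "-", work["words"], "words")
```

The sketch collects only metadata (title, author, fandom, word count) and never the story text itself, which is the distinction the term "metadata scraper" implies. Supporting a different platform category would, under these assumptions, mean writing a different parser for that platform's markup while keeping the same output format.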