Doom Haikus: Machine Learning, Poetry and Despair

Developing a dataset and data explorer for machine summarization and haiku language models
Screenshot from Doom Haikus data exploration app

Prompt

How might we develop a training dataset for machine summarization (and cope with the worst year in modern human history)?

Background

The project started with a seemingly simple goal: teaching Tensorflow to write haikus about current events. Since poetry is subjective, there aren’t straightforward objective functions to quantify good haikus, so we’d need some actual haikus as training data (i.e. self-supervised is out).

Challenges

  • 2020 had no shortage of news to seed moody poetry, but at the time no such news → haiku datasets existed.
  • There are a number of ways to gather up pre-written text in the shape of a haiku (5,7 and 5 syllables), but these would surely lack the spirit of a real haiku (much less the kireji).

Insights

When you need repetitive and weird things done by humans on the internet, there’s only one place to turn: Mechanical Turk.

Doom Haikus data-exploration app (javascript, react, d3)

Solutions

  • Developed a data pipeline to extract top posts from popular news-related subreddits, scrape the referenced URL to develop a prompt, submit custom “human intelligence tasks” (HITs) to Mechanical Turk, then pull results, clean and store them. (Technology was Google Cloud Run, Firestore, Clojure)
  • Developed a supplemental database of word word syllables that don’t appear in canonical sources like CMU’s Pronunciation Dictionary
  • Automated the review process for the majority of submissions, built tooling to facilitate quick manual review of exceptions
  • Data Product Prototyping. Developed a javascript prototype app to explore haikus, combining instant, client-side search with a custom canvas data visualization, allowing users to read individual haikus but also get a sense for overall news trends. (Technology was React, D3, Canvas)

Results

  • From March - December 2020, ~2,000 people on Mechanica Turk submitted ~2,700 hand-written haikus about the news events of the day.
  • Launched https://doomhaikus.3iap.com in December 2020.
  • Project / dataset were featured on Data Is Plural, Nightingale, Product Hunt (home page), Boing Boing and AVClub.
Doom Haikus was featured in Data Is Plural, Nightingale, Product Hunt, BoingBoing and AVClub

a curious guinea pig

Would you like to be a guinea pig?

Join 3iap’s mailing list for early access to the latest dataviz research, writing, and experiments.

No guinea pigs (or humans) have been harmed in the course of 3iap’s research, writing, or experiments.