Xkcd Data Visualization

Feature Distribution Menu

{{num_features}} Features

Selected Features:

comics containing multiple of the selected features:

Feature Distribution

TFIDF of Top Words in Selected Comics

Picked (1) Selected () All ({{num_comics}})

TFIDF to LSA to TSNE Comic Relations

Pick Comic 1-{{num_comics}}

Picked Comic Info

221: Random Number

Alt Text

Process

scrape comics transcript, title, and alt-text from explainxkcd.com
clean data by removing domain-specific stop-words (e.g. character names), lemminize and stem words (e.g. “chocolates”, “chocolatey”, “choco” all count as the root word, “chocolate”)
represent each of the {{num_comics}} comic as a {{num_features}}-dim text vector of term-frequency x inverse document frequency (tf-idf) scores
reduce the effects of synonymy and polysemy, and reduce the feature space from 7000 unique words to 50 feature by perform latent semantic analysis with truncated svd
create 2d embedding of document relations with t-sne that shows similar comics located closer together
build interactive, reactive data analysis web application with d3.js, bootstrap, and flask

Background Information on Xkcd Comics

xkcd comics are "A webcomic of romance, sarcasm, math, and language" (xkcd slogan). These comics are licensed under a Creative Commons Attribution-NonCommercial 2.5 License, and their transcripts are available on www.explainxkcd.com (xkcd comic's wiki). This web application uses the first {{num_comics}} comics as data source.