
Hack day project idea(s), inspired by the data science session this morning. Look at a random sample of comments across WordPress.com and…

  • Classify their content (e.g. how they’re responding to the post).
  • Do a topical classification of post content and compare against comment word count or frequency.
  • Calculate diversity of commenters for a site as the ratio of unique email addresses to total number of comments.
  • Build a network graph indicating correlation between commenters across different sites.
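
A rough sketch of the last two ideas, just to make them concrete. Everything here is an assumption for illustration: the sample data, the site names, and the helper names `commenter_diversity` and `shared_commenter_edges` are all hypothetical, and the "graph" is one possible reading (sites as nodes, edges weighted by how many commenters two sites share).

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sample: one (site, commenter_email) pair per comment.
comments = [
    ("site-a.example", "ann@example.com"),
    ("site-a.example", "ann@example.com"),
    ("site-a.example", "bob@example.com"),
    ("site-a.example", "cat@example.com"),
    ("site-b.example", "ann@example.com"),
    ("site-b.example", "bob@example.com"),
    ("site-c.example", "dan@example.com"),
]


def commenter_diversity(comments):
    """Per site: unique email addresses divided by total comments."""
    totals = defaultdict(int)
    unique = defaultdict(set)
    for site, email in comments:
        totals[site] += 1
        # Normalize so "Ann@Example.com" and "ann@example.com" count once.
        unique[site].add(email.strip().lower())
    return {site: len(unique[site]) / totals[site] for site in totals}


def shared_commenter_edges(comments):
    """Edge list for a site-to-site graph, weighted by shared commenters."""
    by_site = defaultdict(set)
    for site, email in comments:
        by_site[site].add(email.strip().lower())
    edges = {}
    for a, b in combinations(sorted(by_site), 2):
        shared = len(by_site[a] & by_site[b])
        if shared:  # skip pairs of sites with no commenters in common
            edges[(a, b)] = shared
    return edges


diversity = commenter_diversity(comments)
edges = shared_commenter_edges(comments)
print(diversity)
print(edges)
```

A ratio near 1 means almost every comment comes from a different person; a ratio near 0 means a few people do most of the talking. The edge list could be fed into any graph-drawing tool to eyeball clusters of sites.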

The big takeaway: with any given dataset, play with visualizations before trying to draw conclusions.
