Hack day project idea(s), inspired by the data science session this morning. Look at a random sample of comments across WordPress.com and…
- Classify their content (e.g. how they’re responding to the post).
- Do a topical classification of post content and compare against comment word count or frequency.
- Calculate the diversity of commenters for a site as the ratio of unique email addresses to the number of comments.
- Build a network graph indicating correlation between commenters across different sites.
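The last two ideas are easy to prototype. Here is a rough Python sketch, assuming the sample can be reduced to (site, commenter email) pairs, one per comment. The sample data, the function names, and the choice of "number of shared commenters" as the edge weight between two sites are all my own assumptions for illustration, not an existing WordPress.com API or dataset.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sample: one (site, commenter_email) pair per comment.
comments = [
    ("site-a.example", "ann@example.com"),
    ("site-a.example", "ann@example.com"),
    ("site-a.example", "bob@example.com"),
    ("site-b.example", "ann@example.com"),
    ("site-b.example", "cat@example.com"),
    ("site-c.example", "cat@example.com"),
]


def commenter_diversity(comments):
    """Per site: unique commenter emails divided by total comments."""
    totals = defaultdict(int)
    uniques = defaultdict(set)
    for site, email in comments:
        totals[site] += 1
        uniques[site].add(email)
    return {site: len(uniques[site]) / totals[site] for site in totals}


def shared_commenter_edges(comments):
    """Weighted edges between sites; weight = commenters the two sites share.

    This is one assumed way to read "correlation between commenters".
    """
    sites_by_email = defaultdict(set)
    for site, email in comments:
        sites_by_email[email].add(site)
    edges = defaultdict(int)
    for sites in sites_by_email.values():
        # Sort so each pair of sites always maps to the same edge key.
        for a, b in combinations(sorted(sites), 2):
            edges[(a, b)] += 1
    return dict(edges)


diversity = commenter_diversity(comments)
edges = shared_commenter_edges(comments)
```

A diversity near 1.0 means almost every comment comes from a different person, and a value near 0 means a few people are doing most of the talking. The edge dictionary can be fed into whatever graph or visualization tool is handy.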
The big takeaway: with any given dataset, play with visualizations before trying to draw conclusions.