BCNI Philly: GitHub for news

Greg Linch led a 3 pm session in room 4 on applying GitHub (and Git) to news. Andrew Spittle and I collaborated on live Google Doc notes. Here’s Greg’s previous blog post on the topic.

GitHub is a social coding site. You can host your code in a repository, fork others’ code, merge others’ code back into yours, and get social. GitHub has one of the best interfaces for accessing various facets of a code project.

Version control is similar to Wikipedia in that you can see all of the changes that have been made. Being able to see an article's revision history on Wikipedia makes both the text and the people contributing to it more trustworthy.

“Version control is your safety blanket.” If you screw something up, you can easily revert to a previous version and recover the content.
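To make the “safety blanket” idea concrete, here’s a minimal Python sketch assuming a simple append-only history. The `History` and `Revision` names are hypothetical and don’t come from Git or any CMS. In the spirit of Wikipedia’s restore and `git revert`, reverting here doesn’t erase the mistake; it saves a copy of the earlier text as a new revision, so the full record survives.

```python
from dataclasses import dataclass, field


@dataclass
class Revision:
    text: str
    note: str  # why the change was made


@dataclass
class History:
    """Hypothetical append-only revision history, oldest revision first."""

    revisions: list[Revision] = field(default_factory=list)

    def save(self, text: str, note: str) -> int:
        """Record a new revision and return its index."""
        self.revisions.append(Revision(text, note))
        return len(self.revisions) - 1

    def current(self) -> str:
        """Return the text of the latest revision."""
        return self.revisions[-1].text

    def revert_to(self, index: int) -> int:
        """Restore an earlier revision by saving a copy of it as a new one.

        Nothing is deleted, so the bad edit stays visible in the history.
        """
        old = self.revisions[index]
        return self.save(old.text, f"Revert to revision {index}")
```

For example, saving a good draft, then a botched edit, then calling `revert_to(0)` leaves three revisions on record, with the good draft as the current text.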

Greg presented a few ideas for Git in a news setting. It would be tied into your CMS, have a simple interface, and not necessarily require coding knowledge to use. That opens it up to non-coders as contributors.

Flashbake is “source control for ordinary people.” From Cory Doctorow’s announcement post:

This is a set of Python scripts that check your hot files for changes every 15 minutes, and checks in any changed files to a local git repository. Git is a free “source control” program used by programmers to track changes to source-code, but it works equally well on any text file. If you write in a text-editor like I do, then Flashbake can keep track of your changes for you as you go.

I was prompted to do this after discussions with several digital archivists who complained that, prior to the computerized era, writers produced a series complete drafts on the way to publications, complete with erasures, annotations, and so on. These are archival gold, since they illuminate the creative process in a way that often reveals the hidden stories behind the books we care about. By contrast, many writers produce only a single (or a few) digital files that are modified right up to publication time, without any real systematic records of the interim states between the first bit of composition and the final draft.
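Flashbake’s core loop is simple enough to sketch. The Python below is not Flashbake’s actual code; it’s a minimal approximation assuming the hot files live inside an existing local Git repository and that `git` is on the PATH. The function names and the SHA-1 content check are illustrative choices.

```python
import hashlib
import subprocess
import time
from pathlib import Path

# 15 minutes, the interval Doctorow describes for Flashbake
CHECK_INTERVAL_SECONDS = 15 * 60


def file_digest(path: Path) -> str:
    """Return a SHA-1 hex digest of the file's current contents."""
    return hashlib.sha1(path.read_bytes()).hexdigest()


def find_changed(paths: list[Path], last_seen: dict[Path, str]) -> list[Path]:
    """Return files whose contents differ from the last recorded digest.

    Updates last_seen in place so the next call only reports newer edits.
    Files that don't exist (yet) are skipped.
    """
    changed = []
    for path in paths:
        if not path.exists():
            continue
        digest = file_digest(path)
        if last_seen.get(path) != digest:
            changed.append(path)
            last_seen[path] = digest
    return changed


def commit_changes(repo: Path, files: list[Path]) -> None:
    """Stage the given files and commit them, skipping empty commits."""
    subprocess.run(["git", "add", "--", *map(str, files)], cwd=repo, check=True)
    # `git diff --cached --quiet` exits 0 when nothing is staged
    staged = subprocess.run(["git", "diff", "--cached", "--quiet"], cwd=repo)
    if staged.returncode == 0:
        return
    message = f"Autosave {time.strftime('%Y-%m-%d %H:%M')}"
    subprocess.run(["git", "commit", "-m", message], cwd=repo, check=True)


def watch(repo: Path, hot_files: list[Path]) -> None:
    """Poll the hot files forever, committing whenever one has changed."""
    last_seen: dict[Path, str] = {}
    while True:
        changed = find_changed(hot_files, last_seen)
        if changed:
            commit_changes(repo, changed)
        time.sleep(CHECK_INTERVAL_SECONDS)
```

Comparing content hashes rather than modification times means that re-saving a file without editing it won’t trigger a commit, and the staged-changes check covers the first pass, when files that are already committed still look “new” to the script.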

Copyright and ownership of data become a concern once you introduce ideas like forking and merging stories. Whose story is it? How do certain changes reflect back on the organization that created the story?

“Forking is a Feature” (Anil Dash):

There are several related technical concepts that can answer to the name “fork”, but the one I reference here is the dramatic moment when a software project undergoes a schism on ideological or technical grounds. Instead of merely taking their ball and going home, those who forked were taking a copy of your ball and going to a new playground. And while splitting a community could obviously cause an open source community’s momentum to grind to a halt, even the mere threat of a fork could cause significant problems, by revealing conflicting goals or desires or motivations within a previously-united community.

Starting with a “talk to me like I’m stupid” version puts the basic building blocks in place, so the story can be expanded from there. Different branches and layers of the story can then be overlaid to provide additional context.

Episodic updates to a story can be left as separate branches or merged back into the trunk to tell a more complete story.

On a technical note, Git (together with GitHub) is much nicer than Subversion for managing outside contributions. Submitting a pull request is easier than creating a patch file and then having to email it somewhere, upload it to Trac, etc.

Question from Zach Seward: What type of story works best for this concept?

Albert Sun thinks breaking news might be a good candidate for this approach, as there’s often conflicting information, quick corrections, etc. When the story is changing on a regular basis, readers want to see what has changed and why.
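One way to show readers what changed is a plain diff between two versions of a story. Here’s a minimal sketch using Python’s standard `difflib` module, assuming each line of the text is one paragraph; the `story_changes` name and the sample text are hypothetical.

```python
import difflib


def story_changes(old: str, new: str, old_label: str = "v1", new_label: str = "v2") -> str:
    """Return a unified diff of two story versions, one paragraph per line.

    Returns an empty string when the versions are identical.
    """
    diff = difflib.unified_diff(
        old.splitlines(),
        new.splitlines(),
        fromfile=old_label,
        tofile=new_label,
        lineterm="",  # lines from splitlines() carry no newlines of their own
    )
    return "\n".join(diff)


before = "A fire broke out downtown.\nTwo people were injured."
after = "A fire broke out downtown.\nThree people were injured, officials said."
print(story_changes(before, after))
```

Removed paragraphs show up prefixed with `-` and their replacements with `+`. The diff only answers *what* changed; a newsroom would still need a commit message or correction note to explain *why*.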

Via Greg, Marginalia is a tool for visualizing historical information about a Wikipedia article. It also seems to be an open source, JavaScript-based web annotation system.

There is a WordPress plugin for publicly displaying the revision history of a post. Scott Rosenberg supported its development and wrote about it last August.

Andrew Nacin brings up licensing. The big problem is that media companies are largely protective of, and restrictive with, their content; they want to keep full ownership of it. Users contributing changes back to a news story is different from users contributing back to WordPress, because the licenses are complete opposites: WordPress is released under the GPL, which invites modification and redistribution, while most news content is all rights reserved.
