In the first Carnival of Journalism of the new year, David Cohn asks: How do we increase the role of higher education as hubs of journalistic activity?
First, the why. Educational institutions often have long-standing ties to a local community, both in physical location and in relationships. In New York City, there are families in which multiple generations have attended CUNY. Educational institutions are also in a unique position: they have access to continually fresh human capital. These are the strategic advantages.
As to the how, there are dozens of projects we could embark on. For instance, we could team with computer science students to build a tool that maps a community’s information needs. Or we could offer low-cost multimedia reporting courses to active community members in hopes they will take the initiative to cover their own neighborhoods. Or we could reorient the entire institution to be a working newsroom and task hundreds of students as boots-on-the-ground reporters.
Before going off the deep end, it’s critically important to ground big ideas in today’s reality. Educational institutions are cautious publishers because of the possibility of libel. Workshops require space and security staff to check people in. More often than not, their technology departments are optimized for providing IT support, not innovating with content management systems. Once you have this valuable context, it’s more straightforward to see how you can make legitimate change. We can hack technology and we can hack systems.
How you frame the problem also determines the success of your solution. In my mind, “hubs of journalistic activity” provide a space for communities to: access accurate and impartial information, learn about topics affecting their civic well-being, and make enlightened decisions. These topics range from transportation to education to environment to governance. For many institutions of higher education, the local community is where they can have the most significant impact.
The Locals: East Village and Fort Greene-Clinton Hill
Let’s use data and interviews to paint a picture of university-sponsored hyperlocal journalism as it currently exists in New York City. As examples, look at The Local East Village (LEV) and The Local Fort Greene-Clinton Hill (FGCH) during November 2010. The LEV is run out of NYU, receives content from roughly 35 students in an elective class called “The Hyperlocal Newsroom,” and is edited full-time by Rich Jones. Kim Davis was the community editor in November. The FGCH website has been a CUNY Graduate School of Journalism project since January 2010 and is managed by two adjunct faculty, Annaliese Griffin and Indrani Sen. Two smaller classes participated in creating content for the website.
Both publications held weekly budget meetings last November to discuss story ideas generated from community emails, press releases, other local news operations, and general brainstorming. When staff were not in the same physical space, they coordinated the editorial workflow by email, Google Chat, text, and phone. Breaking news generally received a same-day turnaround, while other pieces like features and multimedia had a longer production cycle.
Just by the numbers, the LEV, in its third month of operation, published 100 posts from 48 authors totaling 46,289 words. Rich Jones feels this is “roughly 70 to 80 percent” of the story ideas they generated. Of those posts, 33 came from 19 community contributors and averaged 369 words each.
While trying to cover everything, Rich explains, the “challenge for students is working around their class schedules for story assignments and filing stories” and the “challenge regarding the community is helping them become accustomed to professional-level standards that some – through no fault of their own – are largely unfamiliar with.” He also says, “feature stories are most likely to be picked up by contributors. The most difficult stories appear to be hard news pieces (fires, crime, etc.) which involve multiple sources and contacting the authorities such as police and fire officials.”
The most discussed LEV stories, based on comments, were “Examining M15 Bus Line Changes” and a profile of John Penley, both of which received 5 comments. On average, posts had 0.8 comments apiece. The Local East Village had a grand total of 80 unique commenters.
To compare, in November 2010, FGCH published 105 posts from 46 authors totaling 46,945 words. Annaliese Griffin estimates this to be “maybe 75 percent” of the story ideas they generated, but explains it “always feels like there’s so much more we’re not covering.” Of those posts, 36 came from 23 community contributors and, surprisingly, averaged 610 words each.
When asked about the successes and challenges of working with students and community contributors, Annaliese explains:
Scheduling is the big problem with students. They have to balance Local assignments with stories for three other classes. You can’t just say, “This event is happening today, you have to cover it.” It’s always a negotiation, which ties your hands as an editor. The successes have been many, though. Seeing students gain story sense and understand not only what needs to be covered, but how best to cover a community has been great. Also, we worked on several crowdsourcing projects that were fun and innovative. Here’s a post outlining my favorite student work from last semester. With community contributors we’ve found that straight news, especially crime and anything dealing with city agencies is a challenge. Reporting is hard work that takes skill that needs to be developed over years, not afternoons. We’ve gotten great features and opinion pieces from community contributors, and our schools coverage is about half reported by students, half community contributors and concerned parents. It’s a nice mix.
She adds “schools, arts, and local going-ons” are good fodder for community contributions, and that they want The Local to be the communication platform for micro concerns like “cracked sidewalks, trash pick up, noise complaints, [and] great art programs in schools.”
The most discussed stories, based on comments, were “IMHO: Don’t Leave District 13, Stay and Help” (71 comments), “Opinion: Let’s Work Together, Parents” (45 comments), and “Moe’s Bar to Close in February” (32 comments). On average, posts had 4.5 comments apiece. The Fort Greene-Clinton Hill website had a grand total of 303 unique commenters.
On the technology side, both sites are on the NY Times’ blog network and running WordPress 2.9.2. Images and photo galleries are managed by Flickr.
Grand ideas and incremental improvement
Innovation is possible; it just takes hard work, persistence, and meta-innovation.
For example, Edit Flow and Assignment Desk are a couple of projects I’ve been helping develop for the last year. They work together. Edit Flow manages your editorial workflow inside of WordPress with custom workflow statuses, editorial commenting only accessible in the admin, and a story budget to view all upcoming content. Assignment Desk allows community members to pitch story ideas, vote and volunteer for them, and then lets the editor assign volunteers to different roles in a story. Both are open source WordPress plugins anyone can install.
Jay, Erik, Matt, and I have been trying to get these plugins deployed on the NY Times servers since the launch of The Local East Village. The project has the hallmarks of collaborative innovation. Jay, Erik, and Matt are at NYU and I’m at CUNY. Both of The Locals would benefit, and other news organizations can adopt future improvements we make based on feedback from Rich, Annaliese, and Indrani. Unfortunately, for a variety of reasons, the plugins still aren’t installed (although I have my fingers crossed that they will be any week now).
Tremendous raw potential exists at educational institutions. I’d personally love to evolve Assignment Desk so that it’s the tool to go to when you need to find someone to cover a story. It could build suggestions based on comment quality, past post topics, or what you’ve filled out in your bio, and integrate with Google Chat and Foursquare to determine your prospect’s availability. The technology can be solved with hard work. We need more people who are persistent and can find new ways to make large elephants dance.
Once I established the most effective approach, collecting the data behind the numbers above was actually quite simple. WordPress usually has a URL structure that supports filtering posts by date. For any such date URL, you can then access an RSS feed of all the posts published on that date by appending “/feed/”.
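To illustrate the URL pattern, here is a minimal sketch in Python (not the language of the tools named in this post). It builds the per-day feed URL for every day of a month, assuming the site uses the common /YYYY/MM/DD/ day-archive permalink structure. The function name `day_feed_urls` is made up for this example.

```python
import calendar
from datetime import date


def day_feed_urls(base_url, year, month):
    """Return one feed URL per day of the given month.

    Assumes day archives live at /YYYY/MM/DD/ and that appending
    "feed/" to a day archive yields its RSS feed.
    """
    base = base_url.rstrip("/")
    days_in_month = calendar.monthrange(year, month)[1]
    urls = []
    for day in range(1, days_in_month + 1):
        d = date(year, month, day)
        urls.append(f"{base}/{d:%Y/%m/%d}/feed/")
    return urls


urls = day_feed_urls("http://eastvillage.thelocal.nytimes.com", 2010, 11)
print(len(urls))   # one URL per day in November
print(urls[27])    # the feed for November 28
```

Sites with a different permalink structure would need a different path template.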
The script I wrote generated day URLs for the entire month (e.g. http://eastvillage.thelocal.nytimes.com/2010/11/28/), downloaded the RSS feed for each day, and parsed out the permalink and post ID for each post. I used MagpieRSS, a simple PHP-based RSS parser, for the job.
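The parsing step might look roughly like the sketch below. It uses Python's standard library in place of MagpieRSS and runs against an inline sample feed rather than a live download. The sample feed and the `parse_feed` helper are hypothetical. The sketch also assumes the post ID can be read from a guid of the form `?p=123`, which is WordPress's usual default but may not hold on every site.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample standing in for a downloaded day feed.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example site</title>
    <item>
      <title>First post</title>
      <link>http://example.com/2010/11/28/first-post/</link>
      <guid isPermaLink="false">http://example.com/?p=101</guid>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/2010/11/28/second-post/</link>
      <guid isPermaLink="false">http://example.com/?p=102</guid>
    </item>
  </channel>
</rss>"""


def parse_feed(xml_text):
    """Extract the permalink and post ID of every <item> in an RSS feed."""
    root = ET.fromstring(xml_text)
    posts = []
    for item in root.iter("item"):
        permalink = item.findtext("link", default="").strip()
        guid = item.findtext("guid", default="")
        # Assumption: the guid carries the post ID as a "p" query parameter.
        match = re.search(r"[?&]p=(\d+)", guid)
        post_id = int(match.group(1)) if match else None
        posts.append({"permalink": permalink, "post_id": post_id})
    return posts


posts = parse_feed(SAMPLE_FEED)
for post in posts:
    print(post["post_id"], post["permalink"])
```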
Using the permalinks identified from the RSS feed, I used another tool called PHP Simple HTML DOM Parser to download the entire HTML page for the post, and then parse out the data points I needed, including author, author type, body content, post category and tags. Once I had the data, I saved it to the database.
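As a rough sketch of this second step, the example below uses Python's built-in `html.parser` in place of PHP Simple HTML DOM Parser. The markup it looks for is an assumption based on common WordPress theme conventions: an `author` class, an `entry-content` class, and `rel="category tag"` or `rel="tag"` links. A real theme may use different markup, and author type is not covered here.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}

# Hypothetical post markup; real themes will differ.
SAMPLE_HTML = """
<div class="post">
  <span class="author">Jane Doe</span>
  <div class="entry-content"><p>Body text with <em>emphasis</em> inside.</p></div>
  <a rel="category tag" href="/category/schools/">Schools</a>
  <a rel="tag" href="/tag/ps-20/">PS 20</a>
</div>
"""


class PostParser(HTMLParser):
    """Collect author, body text, categories, and tags from one post page."""

    def __init__(self):
        super().__init__()
        self.data = {"author": "", "body": "", "categories": [], "tags": []}
        self._field = None   # field currently being captured, if any
        self._depth = 0      # nesting depth inside the captured element
        self._buffer = []

    def _start(self, field):
        self._field, self._depth, self._buffer = field, 1, []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void elements never close, so they must not affect depth
        if self._field is not None:
            self._depth += 1
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        rel = (attrs.get("rel") or "").split()
        if "author" in classes:
            self._start("author")
        elif "entry-content" in classes:
            self._start("body")
        elif tag == "a" and "category" in rel:
            self._start("categories")
        elif tag == "a" and "tag" in rel:
            self._start("tags")

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self._field is None:
            return
        self._depth -= 1
        if self._depth == 0:
            text = " ".join("".join(self._buffer).split())
            if isinstance(self.data[self._field], list):
                self.data[self._field].append(text)
            else:
                self.data[self._field] = text
            self._field = None

    def handle_data(self, data):
        if self._field is not None:
            self._buffer.append(data)


parser = PostParser()
parser.feed(SAMPLE_HTML)
print(parser.data)
```

Tracking nesting depth lets the body capture continue through nested tags until the matching close of the `entry-content` element.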
If the site you’re scraping gives you full body content in the feed, you can probably skip the second step.
The only downside to finding the original permalinks by RSS is that WordPress offers 10 items per feed by default and doesn’t support pagination, and most admins don’t change the default. If more than 10 posts were published in a day, you’d miss data. Similarly, if a post has more than 10 comments, you only get the first 10. Because of this, my data set only has 10 of N comments for several of the FGCH posts.
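One way to at least detect this problem is to flag any feed that comes back with exactly as many items as the limit, since that is the only case where items could be missing. The sketch below assumes the default limit of 10 described above. The counts are made-up sample data, and `possibly_truncated` is a hypothetical helper.

```python
FEED_LIMIT = 10  # assumed default number of items per WordPress feed


def possibly_truncated(item_counts, limit=FEED_LIMIT):
    """Return the keys whose feed hit the item limit and may be missing items."""
    return sorted(key for key, count in item_counts.items() if count >= limit)


# Hypothetical item counts per day feed (not real data).
counts = {"2010-11-01": 4, "2010-11-02": 10, "2010-11-03": 7}
flagged = possibly_truncated(counts)
print(flagged)
```

The same check could be applied to per-post comment feeds. A flagged entry only means the count cannot be trusted, not that items are definitely missing.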
There are a few things I’d like to have tabulated if I had more time (I’ve already spent 20+ hours on this project). First, I’d go through each post and map it to a narrow set of topical categories. These categories would be the types of information needed by the given community (e.g. entertainment, governance, education). Second, it would be amazing to pull out all of the links in the body content, sort them by internal vs. external, and use that information to determine which other publications were most commonly linked to. Lastly, and this might even be a standalone WordPress plugin, I’d love to generate a network diagram of which tags were used on which posts, and use that to draw relationships.
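The link analysis could be sketched along these lines. This is an illustration of the idea in Python, not something that was built for this project. The sample body, the hostnames, and the `classify_links` helper are all hypothetical. Relative links and links to the site's own host are treated as internal.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a fragment of HTML."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def classify_links(body_html, site_host):
    """Split links into internal hrefs and a count of external hosts."""
    collector = LinkCollector()
    collector.feed(body_html)
    internal, external = [], Counter()
    for href in collector.hrefs:
        parsed = urlparse(href)
        if parsed.scheme not in ("", "http", "https"):
            continue  # skip mailto:, javascript:, and similar
        host = parsed.netloc.lower()
        if not host or host == site_host:
            internal.append(href)
        else:
            external[host] += 1
    return internal, external


# Hypothetical post body (not real data).
body = (
    '<p><a href="/2010/11/01/older-post/">earlier</a> '
    '<a href="http://news.example.org/a">a</a> '
    '<a href="http://news.example.org/b">b</a> '
    '<a href="http://blog.example.net/c">c</a> '
    '<a href="mailto:editor@example.com">email</a></p>'
)
internal, external = classify_links(body, "example.com")
print(internal)
print(external.most_common())
```

Summing the external counters across all posts would give a ranking of the publications linked to most often.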