Pipe Dream in numbers. 7,763 articles containing 4,136,279 words written by ~380 authors in the BU Pipe Dream’s export from College Publisher.
Tag Archives: College Publisher
Markup normalization
A small selection of the Vim statements required to normalize every possible variant of shitty markup entered by copy-and-paste online editors into 35,000 articles over the last eight years:
:%s/,|||,/\=nr2char(11)/g
:%s/,|||/\=nr2char(11)/g
:%s/""/\=nr2char(21)/g
:%s/"//g
:exe '%s/' . nr2char(11) . '/","/g'
:exe '%s/' . nr2char(21) . '/\"/g'
:%s/"$//g " add a
:%s/^/"/g
:%s#<br /><p><br />#</p><p>#g
:%s#<p></p><p>#<p>#g
:%s#<BR><br />#<br />#g
:%s#<br /><p></p><p>#<p>#g
:%s#<p><br />#<p>#g
:%s#</p></p><p>#</p><p>#g
:%s#</p><br /><p>#</p><p>#g
:%s#<br /><p>#</p><p>#g
:%s#<p><p>#<p>#g
:%s#<P>#</p><p>#g
:%s#</p></p>#</p>#g
:%s#<br /></p><p>#</p><p>#g
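If you would rather script the markup cleanup than replay it in Vim, here’s a minimal Python sketch that applies the same twelve substitutions, in the same order, to a string. It assumes the Vim commands are meant as plain literal replacements (none of the patterns rely on regex features), and `MARKUP_RULES` and `normalize_markup` are hypothetical names, not part of any existing tool. Order matters: later rules clean up what earlier ones leave behind.

```python
# A minimal sketch: the markup substitutions as ordered literal replacements.
# MARKUP_RULES and normalize_markup are hypothetical names.

MARKUP_RULES = [
    ("<br /><p><br />", "</p><p>"),
    ("<p></p><p>", "<p>"),
    ("<BR><br />", "<br />"),
    ("<br /><p></p><p>", "<p>"),
    ("<p><br />", "<p>"),
    ("</p></p><p>", "</p><p>"),
    ("</p><br /><p>", "</p><p>"),
    ("<br /><p>", "</p><p>"),
    ("<p><p>", "<p>"),
    ("<P>", "</p><p>"),
    ("</p></p>", "</p>"),
    ("<br /></p><p>", "</p><p>"),
]


def normalize_markup(html: str) -> str:
    """Apply each (old, new) pair once, in order, like a run of :%s///g."""
    for old, new in MARKUP_RULES:
        # str.replace, like :s///g, replaces non-overlapping matches left to right.
        html = html.replace(old, new)
    return html


if __name__ == "__main__":
    print(normalize_markup("<p><p>Hello<br /><p>World</p></p>"))
    # prints <p>Hello</p><p>World</p>
```

Because each rule sweeps the whole text before the next one runs, a pattern created by a late rule won’t be caught by an earlier one; running the function a second time is a cheap way to check for leftovers.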
Two more things: 1) Anyone who’s ever tried to tell you to use find and replace in BBEdit for large files is dead wrong. 2) College Publisher, you suck ****. ‘,|||,’ is not a valid delimiter. Quit being malicious.
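The delimiter complaint is easy to demonstrate: Python’s `csv` module only accepts a one-character delimiter, so an export like this has to be rewritten before standard CSV tooling will read it. The sketch below mirrors the delimiter and quote handling in the first eight Vim statements, under assumptions about the export: one record per line, fields separated by `,|||,`, a trailing `,|||` closing a record, bare `"` wrapping fields, and `""` standing for a literal quote. It writes standard CSV with every field quoted rather than reproducing the Vim output byte for byte. `normalize_record` and `to_csv` are hypothetical helpers, and like the Vim version this is a heuristic, not a real parser.

```python
import csv
import io

# Placeholder characters, the same ones the Vim statements use.
FIELD_SEP = chr(11)      # stands in for the ,|||, delimiter
LITERAL_QUOTE = chr(21)  # stands in for an escaped "" quote


def normalize_record(line: str) -> list[str]:
    """Split one ,|||,-delimited record into clean field values (heuristic)."""
    # Same order as the Vim version: full delimiter first, then a trailing ,|||
    line = line.replace(",|||,", FIELD_SEP).replace(",|||", FIELD_SEP)
    fields = []
    for raw in line.split(FIELD_SEP):
        # Protect doubled quotes, drop the wrapping quotes, then restore them.
        value = raw.replace('""', LITERAL_QUOTE).replace('"', "")
        fields.append(value.replace(LITERAL_QUOTE, '"'))
    return fields


def to_csv(lines) -> str:
    """Rewrite ,|||,-delimited lines as standard CSV with every field quoted."""
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for line in lines:
        writer.writerow(normalize_record(line.rstrip("\n")))
    return out.getvalue()


if __name__ == "__main__":
    # csv refuses the multi-character delimiter outright.
    try:
        csv.reader(io.StringIO(""), delimiter=",|||,")
    except (TypeError, ValueError) as err:
        print("csv says:", err)
    print(to_csv(['"a",|||,"say ""hi""",|||']), end="")
```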
Lastly, if I’d thought ahead, I would’ve tracked invalid markup against prevalence and date range. That would’ve made for a fascinating anthropological study.
Daily Emerald relaunches on WordPress
Mad props to Ivar Vong and the rest of his team, who finally pulled it off. They made the switch to “web-first” too. It’s been a long time in the making.
Q&A: CMN’s Rusty Lewis and Jon Beck about new advertising options for College Publisher
Q&A: CMN’s Rusty Lewis and Jon Beck about new advertising options for College Publisher. Publications that want CMN’s advertising software must also use its new managed WordPress offering, which ultimately means CMN still takes a cut of the overall revenue.
Q&A: Rusty Lewis on CMN’s new business model
Q&A: Rusty Lewis on CMN’s new business model. It just hit me: College Publisher inadvertently made it cost-effective to hire a developer and host it yourself. Student publications that don’t, and instead pay $2K/year for a terrible CMS while also donating their advertising revenue to CMN, aren’t long for this world. I can’t believe College Publisher would stick it to 80% of their clients.
Licensing Fees to Begin in 2011
Licensing Fees to Begin in 2011. Awesome timing: College Publisher announces three new services, including managed WordPress hosting for $4,500/year, and says it will charge publications under 25,000 page views a month $2K/year for the basic Polopoly offering. Absolutely fascinating.
Interview with Rusty Lewis on sale of College Media Network
Interview with Rusty Lewis on sale of College Media Network. Bryan Murley gets the details on the recent transfer of ownership to Access Networks. It sounds like time to delivery was a major friction point at MTVu, and the sale should let CMN be more nimble.
Teasing WordPress posts with YQL, jQuery, JSONP and iframes
Teasing WordPress posts with YQL, jQuery, JSONP and iframes. In Ivar Vong’s first post for the Daily Emerald web development blog, he discusses an ingenious workaround for bringing outside content to a College Publisher homepage. Using YQL for external data could be an intriguing user experience experiment. The CUNY J-School homepage currently uses SimplePie to pull at least six RSS feeds and, when the cache is being refreshed, the page load time can spike.
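JSONP is what lets a page pull data like this from another domain: instead of plain JSON, the server returns a bit of script that calls a function the requesting page has named, so a `<script>` tag can load it cross-domain. Here’s a minimal server-side sketch of that wrapping step. `jsonp_response` is a hypothetical helper, not Ivar’s code or how YQL implements it, and the callback-name check is an assumed safeguard so a request can’t inject arbitrary script.

```python
import json
import re

# Accept only (optionally dotted) JavaScript identifiers, e.g. "showPosts".
CALLBACK_RE = re.compile(r"[A-Za-z_$][\w$]*(?:\.[A-Za-z_$][\w$]*)*")


def jsonp_response(payload, callback: str) -> str:
    """Wrap a JSON-serializable payload in a call to the named callback."""
    if not CALLBACK_RE.fullmatch(callback):
        raise ValueError(f"refusing unsafe callback name: {callback!r}")
    return f"{callback}({json.dumps(payload)});"


if __name__ == "__main__":
    posts = [{"title": "Hello world", "url": "http://example.com/hello"}]
    print(jsonp_response(posts, "showPosts"))
```

The page would define `showPosts` before adding the `<script>` tag; the browser runs the returned call as soon as the script loads.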
Internet famous
Money quote on Romenesko today from a Chronicle article covering the techies doing it in-house.
College Publisher to WordPress conversion script is now open source
Alternate title for this post: Let the exodus continue. The Python conversion script CoPress used to migrate over 50 student publications to the glorious free and open-source WordPress is now itself licensed under GPL version 2. It’s optimized for College Publisher 4 and College Publisher 5 databases, but will also work with almost any database you can turn into a flat CSV file. You can fork it on GitHub or download the brand-new 1.0 release.
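Turning a database into “a flat CSV file” is usually one query plus a CSV writer. Here’s a minimal sketch that uses Python’s built-in `sqlite3` as a stand-in for whatever database your archives actually live in; the `articles` table, its columns, and `table_to_csv` are all hypothetical, and the columns you export still have to line up with what the conversion script expects, as described in its README.

```python
import csv
import io
import sqlite3


def table_to_csv(conn: sqlite3.Connection, query: str) -> str:
    """Run a query and return its rows as CSV text, header row first."""
    cursor = conn.execute(query)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # cursor.description has one tuple per column; item 0 is the column name.
    writer.writerow([col[0] for col in cursor.description])
    writer.writerows(cursor)
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical schema, built in memory just for the example.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE articles (id INTEGER, title TEXT, author TEXT)")
    conn.execute("INSERT INTO articles VALUES (1, 'Budget cuts', 'Jane Doe')")
    print(table_to_csv(conn, "SELECT id, title, author FROM articles"), end="")
```

To produce an actual file, write the returned text to something like `open("stories.csv", "w", newline="")`.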
Right off the bat, I’d like to say that the most awesome bit about the conversion script is its ease of use. Granted, you do have to run it on the command line and it does often throw cryptic, unintelligible errors if your data is screwy, but it’s about 100 to 1,000 times easier than what Sean Blanda or Brian Schlansky had to go through. Furthermore, it spits out WordPress eXtended RSS files that WordPress imports natively. Depending on the size of your archives, you could even do the entire migration in under half an hour.
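WordPress eXtended RSS (WXR) is ordinary RSS 2.0 with WordPress namespaces layered on top, which is why the importer can take it as-is. The sketch below builds a one-post file with the standard library just to show the shape. It assumes WXR 1.2 element names (`wp:wxr_version`, `wp:post_date`, `wp:status`, `wp:post_type`), it is far sparser than what the conversion script actually writes, and `build_wxr` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as used in WXR 1.2 exports (an assumption for this sketch).
CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"
WP_NS = "http://wordpress.org/export/1.2/"
ET.register_namespace("content", CONTENT_NS)
ET.register_namespace("wp", WP_NS)


def build_wxr(posts) -> str:
    """Return a minimal WXR document for a list of post dicts."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, f"{{{WP_NS}}}wxr_version").text = "1.2"
    for post in posts:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = post["title"]
        # The story body goes in content:encoded; ElementTree escapes the HTML.
        ET.SubElement(item, f"{{{CONTENT_NS}}}encoded").text = post["body"]
        ET.SubElement(item, f"{{{WP_NS}}}post_date").text = post["date"]
        ET.SubElement(item, f"{{{WP_NS}}}status").text = "publish"
        ET.SubElement(item, f"{{{WP_NS}}}post_type").text = "post"
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    print(build_wxr([{
        "title": "Budget cuts",
        "body": "<p>Story text.</p>",
        "date": "2009-09-01 12:00:00",
    }]))
```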
There are detailed instructions in the README, which I encourage you to read thoroughly, but here’s how you’d migrate your site, in screenshots.

Back up your database using Sequel Pro. This is a critically important step: you’ll definitely want a clean version to revert to if the import goes awry.

Place the conversion script and your archives in a folder you can access from the command line. Both College Publisher 4 and College Publisher 5 migrants should receive an articles file that will need to be renamed “stories.csv.” Publications migrating from the former will have all of their image references stored in a file that will need to be renamed “media.csv.” Navigate to that directory from your terminal prompt and run “python CoPress-Convert.py.”
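If you end up running migrations more than once, the file-shuffling part of this step is easy to script. A minimal sketch, assuming your exports arrive under some other filename (the names in the example are hypothetical); `stage_archives` only copies files into place as `stories.csv` and `media.csv` and does not run the conversion script itself.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def stage_archives(workdir: Path, articles: Path,
                   media: Optional[Path] = None) -> list[Path]:
    """Copy exports into workdir as stories.csv (and media.csv if given)."""
    workdir.mkdir(parents=True, exist_ok=True)
    staged = [Path(shutil.copyfile(articles, workdir / "stories.csv"))]
    if media is not None:
        # College Publisher 4 exports keep image references in a separate file.
        staged.append(Path(shutil.copyfile(media, workdir / "media.csv")))
    return staged


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        export = tmp / "articles_export.csv"  # hypothetical export name
        export.write_text("id,title\n1,Budget cuts\n")
        print(stage_archives(tmp / "convert", export))
```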

Once the script is running, you’ll be asked a series of questions to configure the conversion process. Most options are self-explanatory, and all are explained fully in the README file packaged with the script. The most important thing I’d like to note in this post is that, unless you have fewer than 500 authors in your archives, I’d highly, highly recommend importing your authors as custom fields instead of users. WordPress is not optimized to add a large number of new users through its import process. We learned this the hard way migrating CM Life’s database last summer.
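In WXR terms, storing an author as a custom field means writing the byline into a `wp:postmeta` entry on each item instead of pointing the post at a WordPress user, so the importer never has to create hundreds of accounts. A minimal sketch of one such entry follows; the `cp_author` meta key is a hypothetical choice (the conversion script may well use a different key), and `add_author_meta` is an illustrative helper.

```python
import xml.etree.ElementTree as ET

# WXR 1.2 namespace (an assumption for this sketch).
WP_NS = "http://wordpress.org/export/1.2/"
ET.register_namespace("wp", WP_NS)


def add_author_meta(item: ET.Element, author: str,
                    key: str = "cp_author") -> ET.Element:
    """Attach the byline to a WXR <item> as a wp:postmeta custom field."""
    meta = ET.SubElement(item, f"{{{WP_NS}}}postmeta")
    ET.SubElement(meta, f"{{{WP_NS}}}meta_key").text = key
    ET.SubElement(meta, f"{{{WP_NS}}}meta_value").text = author
    return meta


if __name__ == "__main__":
    item = ET.Element("item")
    ET.SubElement(item, "title").text = "Budget cuts"
    add_author_meta(item, "Jane Doe")
    print(ET.tostring(item, encoding="unicode"))
```

Your theme would then read the byline from that custom field rather than from the post’s WordPress author.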

When the script is done, you’ll have a series of WordPress eXtended RSS files you can easily upload into WordPress.
Mad props go to Miles Skorpen for the long hours he spent on the conversion script, and to Albert Sun, Will Davis, and Max Cutler for their later contributions.
Feel free to send along any suggestions for improvement, bugs, fixes or general comments. I intend to maintain it for the indefinite future (it’s good Python practice when everything else I’m working on is PHP), but code contributions are always welcome. There is a short list of upgrades under consideration at the top of the script.

