Markup normalization

A small selection of the Vim statements required to normalize every possible variant of shitty markup entered through copy-and-paste online editors into 35,000 articles over the last eight years:

:%s/,|||,/\=nr2char(11)/g
:%s/,|||/\=nr2char(11)/g
:%s/""/\=nr2char(21)/g
:%s/"//g
:exe '%s/' . nr2char(11) . '/","/g'
:exe '%s/' . nr2char(21) . '/"/g'
:exe '%s/"$//g' |" add a
:%s/^/"/g
:%s/<br \/><p><br \/>/<\/p><p>/g
:%s/<p><\/p><p>/<p>/g
:%s/<BR><br \/>/<br \/>/g
:%s/<br \/><p><\/p><p>/<p>/g
:%s/<p><br \/>/<p>/g
:%s/<\/p><\/p><p>/<\/p><p>/g
:%s/<\/p><br \/><p>/<\/p><p>/g
:%s/<br \/><p>/<\/p><p>/g
:%s/<p><p>/<p>/g
:%s/<P>/<\/p><p>/g
:%s/<\/p><\/p>/<\/p>/g
:%s/<br \/><\/p><p>/<\/p><p>/g
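For anyone without Vim to hand, the markup half of that list can be sketched in Python. This is only a rough equivalent, not the script that was actually run: the names `RULES` and `normalize_markup` are made up for illustration, matching is case-sensitive (as in Vim without 'ignorecase'), each rule is applied once in the order listed, and the delimiter and quote clean-up is left out.

```python
# Ordered (old, new) pairs mirroring the markup substitutions listed above.
# Plain string replacement is enough because none of the rules needs a regex.
RULES = [
    ("<br /><p><br />", "</p><p>"),
    ("<p></p><p>", "<p>"),
    ("<BR><br />", "<br />"),
    ("<br /><p></p><p>", "<p>"),
    ("<p><br />", "<p>"),
    ("</p></p><p>", "</p><p>"),
    ("</p><br /><p>", "</p><p>"),
    ("<br /><p>", "</p><p>"),
    ("<p><p>", "<p>"),
    ("<P>", "</p><p>"),
    ("</p></p>", "</p>"),
    ("<br /></p><p>", "</p><p>"),
]


def normalize_markup(text: str) -> str:
    """Apply every rule once, in order, across the whole text."""
    for old, new in RULES:
        text = text.replace(old, new)
    return text
```

Because the rules run in a fixed order, one pass can leave behind a pattern that an earlier rule would have caught, so it may be worth running the function again until the output stops changing.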

Two more things: 1) Anyone who’s ever tried to tell you to use find and replace in BBEdit for large files is dead wrong. 2) College Publisher, you suck ****. ‘,|||,’ is not a valid delimiter. Quit being malicious.

Lastly, if I’d thought ahead, I would’ve tracked invalid markup against prevalence and date range. That would’ve made for a fascinating anthropological study.

#wcbos: Advanced Theme Performance Techniques

Frederick Townes is the founder of W3 Edge, CTO at Mashable and author of W3 Total Cache. He’s presenting today on WordPress theme performance best practices. First, he recommends contributing back to the WordPress Codex because everyone in the room thinks it could be improved.

Pay lots of attention to the hierarchy with page templates.

Think about how many files you’re loading into memory, and the overall footprint they end up consuming. You can track this down using xdebug.

Fundamentals:

  • The larger the heap, the greater the execution time.
  • “Graduate” groups of functions to plugins.
  • The fewer files the better.
  • Explore and use microformats for reviews, businesses & organizations, products, and people.
  • Use external services and fail gracefully.
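The last point, failing gracefully, is the easiest one to show in code. The sketch below is in Python rather than the PHP a theme would use, and it is not from the talk: `fetch_url`, `render_widget`, the two-second timeout, and the empty-string fallback are all assumptions chosen for illustration.

```python
import urllib.request
from typing import Callable


def fetch_url(url: str, timeout: float = 2.0) -> str:
    """Fetch a URL with a short timeout so a slow service cannot stall the page."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def render_widget(url: str, fallback: str = "",
                  fetch: Callable[[str], str] = fetch_url) -> str:
    """Return the remote content, or the fallback if the service misbehaves."""
    try:
        return fetch(url)
    except (OSError, ValueError):
        # URLError, HTTPError and socket timeouts are all OSError subclasses;
        # ValueError covers malformed URLs. Either way the page still renders.
        return fallback
```

Passing the fetcher in as a parameter keeps the fallback path easy to exercise without touching the network. In practice the result would also be cached so the external service is not hit on every request.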

W3 Total Cache has a debug mode that will show you what’s being cached on a request and what’s being missed.

Trick to debug on production:

define( 'WP_DEBUG', true );
// log to wp-content/debug.log; useful for testing on production
define( 'WP_DEBUG_LOG', true );
define( 'WP_DEBUG_DISPLAY', false );

#wcbos: Enterprise WordPress Do’s and Don’ts

What does enterprise mean? In the context of this WordPress presentation: sites on a large scale. Sites with a lot of traffic, a lot of content, and a need for high availability.

WordPress evolution from an enterprise perspective:

  • 2.3 – Introduction of custom taxonomies
  • 2.9 – Introduction of custom post types; WordPress matures to a real CMS
  • 3.1 – Network admin and expanded queries
  • 3.2 – Modernization and performance improvements

Conde Nast started migrating a lot of sites from Movable Type to WordPress in 2008-2009, and the total number has only been growing.

Guidelines for using WordPress in enterprise

Hosting infrastructure do’s:

  • Carefully examine your site’s requirements and evaluate service offerings before deciding on a host
  • Give yourself at least 2 weeks for new WordPress VIP setups
    • This lead time requirement can sometimes be a deterrent for clients that want to get a project live on a quick turnaround
  • Give yourself additional time for VIP code and plugin reviews. Plugins that aren’t already in their accepted set can take a while
  • Leverage AMIs for sites on Amazon Web Services
  • Use multiple regions for failover on Amazon Web Services
  • Use a Content Delivery Network (CDN)

Hosting don’t: Host multiple high-traffic sites on the same hardware

Migration do’s:

  • Transfer your SEO juice using 301 redirects
  • Minimize the need for a double-publishing scenario

Migration don’t: Forget your image assets.

Neat trick: If you don’t know whether all of your image assets were copied over, write a script that tails the Apache/Nginx request log, watches for 404s, and pulls each missing image over from the old environment.
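A rough Python sketch of that trick follows. Nothing here comes from the talk: the log path, old hostname, and docroot are placeholder assumptions, the regex assumes the common/combined access log format, and log rotation is not handled.

```python
import re
import time
import urllib.request
from pathlib import Path
from typing import Iterator, Optional

# Assumed values for illustration only; adjust for the real environment.
LOG_PATH = "/var/log/nginx/access.log"
OLD_ORIGIN = "http://old.example.com"
DOCROOT = Path("/var/www/html")

# Matches the request and status fields of the common/combined log format.
LINE_RE = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif")


def missing_image_path(line: str) -> Optional[str]:
    """Return the request path if the line is a 404 for an image, else None."""
    match = LINE_RE.search(line)
    if not match or match.group("status") != "404":
        return None
    path = match.group("path").split("?", 1)[0]  # drop any query string
    if ".." in path or not path.lower().endswith(IMAGE_EXTENSIONS):
        return None
    return path


def follow(log_path: str) -> Iterator[str]:
    """Yield lines appended to the log, roughly like `tail -f`."""
    with open(log_path, "r", errors="replace") as handle:
        handle.seek(0, 2)  # start at the end of the file
        while True:
            line = handle.readline()
            if line:
                yield line
            else:
                time.sleep(0.5)


def pull_from_old_site(path: str) -> None:
    """Copy one missing image from the old environment into the docroot."""
    target = DOCROOT / path.lstrip("/")
    target.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(OLD_ORIGIN + path, timeout=10) as response:
        target.write_bytes(response.read())


def main() -> None:
    """Watch the log forever and backfill images as 404s appear."""
    for line in follow(LOG_PATH):
        path = missing_image_path(line)
        if path is None:
            continue
        try:
            pull_from_old_site(path)
        except OSError as error:
            print(f"could not fetch {path}: {error}")
```

Calling `main()` starts the loop. The first visitor to hit a missing image still gets a 404; the file is in place for the next request.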

Development do’s:

Development don’ts:

  • Modify WordPress core
  • Write your own SQL queries unless absolutely necessary
  • Forget about your admin users — use contextual help and train them

Launch do’s:

  • Lower DNS TTL settings before launch (if updating DNS address)
  • Apply appropriate CDN exceptions for wp-admin pages
  • Remove your robots.txt file to make the site visible to search engines
  • Verify server permissions on files and directories
  • Set up an automated deployment process

Launch don’t: Keep .htaccess writable
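One way to check permissions before launch is a small script that walks the docroot and reports anything left too open. The Python sketch below is an assumption-laden illustration, not something shown in the talk: `world_writable` is a made-up name, and it flags only the world-writable bit. Whether the web server itself can write to .htaccess also depends on file ownership, which this does not inspect.

```python
import os
import stat
from typing import List


def world_writable(root: str) -> List[str]:
    """Return paths under root that any user on the server may write to."""
    flagged = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            mode = os.lstat(path).st_mode
            if stat.S_ISLNK(mode):
                continue  # permission bits on symlinks are not meaningful
            if mode & stat.S_IWOTH:
                flagged.append(path)
    return sorted(flagged)
```

An empty list from `world_writable("/var/www/html")` (path assumed) is the result to aim for; anything it returns deserves a look before the site goes live.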

Resources on hardening WordPress

The case of the mysterious external services

There’s something wrong with this picture. And it has to do with “All External.”

We’re using a few different tools at the J-School to monitor performance and uptime of our webserver. Munin is one, Pingdom is another, a bash script running on cron is a third, and the fancy New Relic is the last. Two weeks ago, just as I’m leaving NYC to ski on the west coast, the bash script, which downloads the homepage of two sites every two minutes and greps the response, starts sending email notifications of response failures.