Securing Input

Any time a user submits data to WordPress, data is ingested from an external feed, or data otherwise comes from an external source, you should make sure it’s safe to handle. There are several reasons to do this; two good ones are to help prevent XSS if the data is improperly escaped on output, and to ensure your code executes the way you expect. You can make sure this data is safe to use by validating and sanitizing it.

If you learn anything from this document, pay attention to how you’re handling those $_GET and $_POST variables!

Validating: Checking User Input

Validating is ensuring the data you’re receiving from a user matches what you expect to receive. Let’s take a look at an example.

Say we have an input area in our form like this:

<input type="text" id="my-zipcode" name="my-zipcode" maxlength="5" />

We’re telling the browser to only allow up to five characters of input, but there’s no limitation on what characters they can input. They could enter “11221” or “eval(“. We want to make sure we’re only processing zip codes from the form.

This is where validation plays a role. When processing the form, we’ll write code to check each field for its proper data type. If it’s not of the proper data type, we’ll discard it. For instance, to check the “my-zipcode” field, we might do something like this:

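// Cast the submitted value to an integer; non-numeric input becomes 0.
$safe_zipcode = intval( $_POST['my-zipcode'] );
if ( ! $safe_zipcode ) {
    $safe_zipcode = '';
}

// maxlength is only enforced by the browser, so trim overly long values here.
if ( strlen( (string) $safe_zipcode ) > 5 ) {
    $safe_zipcode = substr( (string) $safe_zipcode, 0, 5 );
}

update_post_meta( $post->ID, 'my_zipcode', $safe_zipcode );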

Since the maxlength attribute on our input field is only enforced by the browser, we still need to validate the length of the input on the server. If we don’t, an attacker could cleverly submit a form with a longer value.

The intval() function casts user input as an integer, and defaults to zero if the input was a non-numeric value. We then check to see if the value ended up as zero. If it did, we’ll save an empty value to the database. Otherwise, we’ll save the properly validated zipcode.

This style of validation most closely follows WordPress’ whitelist philosophy: only allow the user to input what you’re expecting.
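One caveat with the intval() approach is that casting to an integer drops leading zeros, so a ZIP code such as “02134” would be stored as 2134. The following is a minimal sketch of a stricter check, assuming we only want to accept exactly five digits. The helper name example_validate_zipcode() is hypothetical and not part of WordPress.

```php
<?php
/**
 * Hypothetical helper: return the ZIP code only if it is exactly five digits.
 * Anything else (wrong length, letters, arrays) yields an empty string.
 */
function example_validate_zipcode( $raw ) {
    // Reject non-string input such as arrays sent as my-zipcode[]=...
    if ( ! is_string( $raw ) ) {
        return '';
    }

    $raw = trim( $raw );

    // Five ASCII digits and nothing else; the D modifier stops "$" from
    // matching before a trailing newline.
    return preg_match( '/^[0-9]{5}$/D', $raw ) ? $raw : '';
}

var_dump( example_validate_zipcode( '02134' ) );  // Leading zero survives.
var_dump( example_validate_zipcode( 'eval(' ) );  // Rejected.
var_dump( example_validate_zipcode( '112211' ) ); // Too long: rejected, not truncated.
```

In a form handler, the result could be passed to update_post_meta() just like $safe_zipcode above. Rejecting over-long input instead of truncating it is a design choice; truncation can silently turn a typo into a different, valid-looking ZIP code.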

Sanitizing: Cleaning User Input

Sanitizing is a somewhat more liberal approach to accepting user data. We can fall back to these methods when there’s a range of acceptable input.

For instance, if we had a form field like this:

<input type="text" id="title" name="title" />

We could sanitize the data with the sanitize_text_field() function:

$title = sanitize_text_field( $_POST['title'] );
update_post_meta( $post->ID, 'title', $title );

Behind the scenes, sanitize_text_field() does the following:

  • Checks for invalid UTF-8.
  • Converts single < characters to entities.
  • Strips all tags.
  • Removes line breaks, tabs and extra white space.
  • Strips octets.
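To make a couple of those steps concrete, here is a rough sketch in plain PHP. It is not WordPress’ implementation: the function name example_simplified_sanitize() is hypothetical, and it only covers tag stripping and whitespace cleanup, leaving out the UTF-8 check, entity conversion and octet stripping. In real code, call sanitize_text_field() itself.

```php
<?php
/**
 * Hypothetical, simplified stand-in for illustration only.
 * The real sanitize_text_field() does more than this.
 */
function example_simplified_sanitize( $value ) {
    // Strip all tags.
    $value = strip_tags( (string) $value );

    // Collapse line breaks, tabs and runs of spaces into a single space.
    $value = preg_replace( '/[\r\n\t ]+/', ' ', $value );

    // Remove leading and trailing white space.
    return trim( $value );
}

echo example_simplified_sanitize( "  <b>Hello</b>\n\tworld  " );
```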

The sanitize_*() family of helper functions is super nice for us, as these functions ensure we end up with safe data and require minimal effort on our part:

  • sanitize_email()
  • sanitize_file_name()
  • sanitize_html_class()
  • sanitize_key()
  • sanitize_meta()
  • sanitize_mime_type()
  • sanitize_option()
  • sanitize_sql_orderby()
  • sanitize_text_field()
  • sanitize_title()
  • sanitize_title_for_query()
  • sanitize_title_with_dashes()
  • sanitize_user()
  • esc_url_raw()
  • wp_filter_post_kses()
  • wp_filter_nohtml_kses()

Conclusion

Any time you’re using potentially unsafe data, it never hurts to validate and sanitize it. Validating is confirming the data is what you expect it to be. Sanitization is a more liberal approach to cleaning your data.

Escaping Output

Every time a post title, post meta value, or some other data from the database is rendered to the user, we need to make sure it’s properly escaped. Escaping helps us prevent issues like malformed HTML or the dreaded cross-site scripting attack.

For ease of code review, it’s best to escape as late as possible. WordPress’ escaping functions have low overhead, so there’s no performance penalty to using them as many times as you need to.

Escaping: Securing Output

WordPress thankfully has a few helper functions we can use for most of what we’ll commonly need to do:

esc_html() should be used any time an HTML element encloses a section of data we’re outputting.

<h4><?php echo esc_html( $title ); ?></h4>

esc_url() should be used on all URLs, including those in the ‘src’ and ‘href’ attributes of an HTML element.

<img src="<?php echo esc_url( $great_user_picture_url ); ?>" />

esc_js() is intended for inline JavaScript.

<a href="#" onclick="<?php echo esc_js( $custom_js ); ?>">Click me</a>

esc_attr() can be used on everything else that’s printed into an HTML element’s attribute.

<span class="<?php echo esc_attr( $my_class ); ?>">
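The snippets above can be combined into a small sketch of escaping as late as possible. The helper example_render_heading() is hypothetical. The function_exists() stand-ins are simplifications that only exist so the example also runs outside WordPress; inside WordPress the real esc_html() and esc_attr() are used instead.

```php
<?php
// Simplified stand-ins for running outside WordPress. The real functions
// also check for invalid UTF-8 and apply filters.
if ( ! function_exists( 'esc_html' ) ) {
    function esc_html( $text ) {
        return htmlspecialchars( (string) $text, ENT_QUOTES, 'UTF-8' );
    }
}
if ( ! function_exists( 'esc_attr' ) ) {
    function esc_attr( $text ) {
        return htmlspecialchars( (string) $text, ENT_QUOTES, 'UTF-8' );
    }
}

/**
 * Hypothetical helper: each value is escaped for its context at the point
 * where it is placed into the markup, not earlier.
 */
function example_render_heading( $title, $css_class ) {
    return '<h4 class="' . esc_attr( $css_class ) . '">' . esc_html( $title ) . '</h4>';
}

// Both values are hostile; neither can break out of its context.
echo example_render_heading( '<script>alert(1)</script>', 'post-title" onmouseover="x' );
```

Because each value is escaped right where it is printed, a reviewer can confirm the output is safe by reading a single line, without tracing where $title or $css_class came from.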

It’s important to note that most WordPress functions properly prepare the data for output, and you don’t need to escape again.

<h4><?php the_title(); ?></h4>

Conclusion

Whenever you’re rendering data from the database, you’ll want to make sure it’s properly escaped. Escaping helps prevent issues like cross-site scripting.

Informal VIP client survey: how do you commit to SVN?

I am currently a single point of failure for getting code from our Github repo to WordPress.com VIP SVN. As such, we (Fusion) are exploring a project to auto-deploy our Github repository to VIP SVN through post-CI middleware. But, before we dive into development, we want to make sure we’ve exhausted all lower-effort options.

How does your code get from Github to VIP SVN? Would you potentially want to use our project? Please let me know with a comment. Thanks!

Two proposed sessions for SRCCON 2015

SRCCON was my favorite conference last year, and in the running for favorite conference of all time. I liked it so much I’ve submitted two proposals for this year. You should too! Submissions are open until April 10th.

Continuous Integration for Content

There are lots of little attributes that define the “quality” of a piece of content, just like there are attributes that define code quality. Developers have continuous integration to run automated checks on their code, but journalists have editors, who are prone to human error. It’s easy and quite common to forget to add a photo credit, or spell the SEO title incorrectly. What are some ways we can automate these errors out of existence? Let’s get together, present some real world “quality” problems to work on, prototype, wireframe, and define algorithms, and then share our results.
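As one possible starting point for that discussion, an automated content check might look something like the sketch below. Everything here is an assumption for illustration: the function name example_check_article() and the field names photo_credit and seo_title are hypothetical, and it only detects missing values, not misspellings.

```php
<?php
/**
 * Hypothetical content check, analogous to a lint step for code.
 *
 * @param array $article Associative array describing one piece of content.
 * @return string[] Human-readable problems; empty when every check passes.
 */
function example_check_article( array $article ) {
    $problems = array();

    // A missing or blank photo credit is easy to overlook by eye.
    if ( '' === trim( (string) ( $article['photo_credit'] ?? '' ) ) ) {
        $problems[] = 'Missing photo credit.';
    }

    // Spell-checking the SEO title is out of scope here; only flag its absence.
    if ( '' === trim( (string) ( $article['seo_title'] ?? '' ) ) ) {
        $problems[] = 'Missing SEO title.';
    }

    return $problems;
}

print_r( example_check_article( array( 'photo_credit' => '', 'seo_title' => 'City budget vote' ) ) );
```

A check like this could run whenever an article is saved and block publishing, or simply warn the editor, depending on how strict the newsroom wants to be.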

Code review takes two

Code review is single-handedly the best way to level up your development skills. It’s also really hard! Let’s discuss code review methodologies as a group, and then pair up to practice.

#pdxwp: Code Review Takes Two

We did a second pass at our code review meetup — last night turned out much better than the first. The high point for me: most of the “presentation” was, in fact, discussion. The latter proved to be way more valuable for everyone, as most of the twenty people in the room don’t do code review on a regular basis.

Here’s what we did:

  • Jeremy Ross submitted a section of code he had been working on, along with instructions on what he wanted feedback on.
  • On Saturday, I reviewed the code. I committed it in one commit to a branch in a private Github repo. On the changeset, I did a line-by-line read-through, commenting as I went. To wrap the review up, I created a pull request explaining how I did the review, what I looked for, and how to interpret my feedback.
  • Saturday night, I prepared a reveal.js presentation with an introduction to code review and the contents of what I found in my review. reveal.js is a super slick tool for preparing an HTML/CSS/JS presentation out of content in Markdown.
  • Jeremy read and considered my review, then updated the presentation with his feedback.
  • I facilitated most of the presentation and discussion, and Jeremy talked through how he received my feedback.

reveal.js doesn’t produce great static slide output, and I used its Markdown feature which requires Grunt to serve, so the bulk of what we covered will forever live on in the outline that follows.

Advantages of code review

Advantages of pre-deploy code review, over post-deploy audit:

  • Authors have a strong incentive to craft small, well-formed changes that will be readily understood, to explain them adequately, and to provide appropriate test plans, test coverage and context.
  • Reviewers have a real opportunity to make significant suggestions about architecture or approach in review. These suggestions are less attractive to adopt from audit, and may be much more difficult to adopt if significant time has passed between push and audit.
  • Authors have a strong incentive to fix problems and respond to feedback received during review, because it blocks them. Authors have a much weaker incentive to address problems raised during audit.
  • Authors can ask reviewers to apply and verify fixes before they are pushed.
  • Authors can easily pursue feedback early, and get course corrections on approach or direction.
  • Reviewers are better prepared to support a given change once it is in production, having already had a chance to become familiar with and reason through the code.
  • Reviewers are able to catch problems which automated tests may have difficulty detecting. For example, human reviewers are able to reason about performance problems that tests can easily miss because they run on small datasets and stub out service calls.
  • Communicating about changes before they happen generally leads to better preparation for their effects.