Mean in green
I'm Kevin. I live in Salem, Mass with my wife and little boy and I build software.

Saving those old URLs

Tuesday Nov 18, 2008

Search engine rankings are invaluable to the success of a site. I've managed sites that have received upwards of 100,000 referrals a day from Google alone, and that is before counting all of the external links and bookmarks to your pages. If you are thinking about migrating to Drupal and have not considered the importance of maintaining old URLs, you definitely should. The impact of losing that amount of traffic could be catastrophic.

Taking some careful steps when migrating to Drupal can ensure that you don't lose any of your important visitors. Of course, you could catch all of those old links with a generic "page not found" message, but this is frustrating for users and will inevitably hurt your search engine rankings. Thankfully there are options. With a small amount of configuration, you can still send your visitors to the right place in your new site. Here are a few ways to maintain URLs using Apache's mod_rewrite module and a couple of Drupal modules.

Mapping a previously used ID to a NID

If you are fortunate enough to be able to map a previously used ID number to a Drupal node ID (NID), you can easily use mod_rewrite to remap the URL. Let's say that the old URL looked like http://example.com/index.asp?id=123, but your new Drupal site uses a URL like http://example.com/node/123. The following rules remap the old URL internally, so the visitor never sees the change. Place them before the Drupal rewrite rules in your .htaccess file:

# Match a request for index.asp
RewriteCond %{REQUEST_URI} ^/index\.asp$

# Match a query string like id=[some number] and capture that number
RewriteCond %{QUERY_STRING} ^id=([0-9]+)$

# Rewrite a Drupal friendly URL using the captured number.
# Note that %1 is a backreference from a RewriteCond,
# where $1 is a backreference from a RewriteRule.
RewriteRule ^.*$ index.php?q=node/%1 [L]

For a broader match, you could drop the first RewriteCond, and the rule would then apply to any URL whose query string is "id=[some number]". The convenient thing here is that Apache doesn't need to know what to do with an ASP file extension because these URLs are rewritten as requests to index.php.
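If you want to try the patterns before touching your .htaccess file, the rules above can be mirrored in a few lines of plain PHP. This is only a sketch: map_legacy_query_url() is a hypothetical helper name, not part of Drupal or Apache, and it uses [0-9]+ so that only digits are captured.

```php
<?php
// Hypothetical helper that mirrors the rewrite rules in plain PHP so the
// patterns can be checked from the command line. Not part of Drupal.
function map_legacy_query_url($url) {
  $parts = parse_url($url);
  $path  = isset($parts['path']) ? $parts['path'] : '';
  $query = isset($parts['query']) ? $parts['query'] : '';

  // First RewriteCond: the request must be for /index.asp exactly.
  if (!preg_match('#^/index\.asp$#', $path)) {
    return NULL;
  }

  // Second RewriteCond: the whole query string must be id=<digits>.
  // $m[1] plays the role of the %1 backreference.
  if (!preg_match('#^id=([0-9]+)$#', $query, $m)) {
    return NULL;
  }

  // RewriteRule substitution.
  return 'index.php?q=node/' . $m[1];
}

var_dump(map_legacy_query_url('/index.asp?id=123'));   // matches
var_dump(map_legacy_query_url('/index.asp?id=12abc')); // NULL: not all digits
var_dump(map_legacy_query_url('/other.asp?id=123'));   // NULL: wrong file
```

Only the first call returns a Drupal path. The other two fall through, just as the corresponding requests would fall through to the rest of your rewrite rules.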

I should note that it is outside the scope of this post to explain how to create new nodes with a NID that matches the ID of an imported item from your old site. I'll save that for another day.

Mapping a previously used numeric filename to a NID

Perhaps your former CMS creates static HTML files, but uses an ID predictably in the filename. The following example captures the digits matched inside the parentheses and appends them to index.php?q=node/. For example, requesting file5.html would return the same contents as node/5.

# Map a filename with a predictable number to a Drupal NID
RewriteRule ^file([0-9]+)\.html$ index.php?q=node/$1 [L]
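The same kind of quick check works for the filename rule. The sketch below assumes the "file" prefix from the example above. map_legacy_filename() is a hypothetical name used only for illustration, and the path is given without a leading slash, which is how a RewriteRule in .htaccess sees it.

```php
<?php
// Hypothetical helper, not part of Drupal: applies a digits-only version of
// the filename pattern to a requested path (no leading slash).
function map_legacy_filename($path) {
  // [0-9]+ captures one or more digits; \. matches a literal dot.
  if (preg_match('#^file([0-9]+)\.html$#', $path, $m)) {
    return 'index.php?q=node/' . $m[1]; // $m[1] corresponds to $1
  }
  return NULL;
}

foreach (array('file5.html', 'file250000.html', 'file.html', 'file5xhtml') as $path) {
  $target = map_legacy_filename($path);
  echo $path, ' => ', ($target === NULL ? '(no match)' : $target), "\n";
}
```

The last two paths are there to show what the pattern rejects: a filename with no digits, and one where the dot before "html" is missing.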

The Added Bytes website has a really convenient mod_rewrite cheat sheet if you want to learn more.

Creating individual URL aliases in Drupal

In some cases you might not want to use mod_rewrite to process these URLs. It might be too much overhead for your server, or you might not have a predictable ID as a reference. Using the Path module, which comes with Drupal core, you can add aliases for system paths. You can enter your aliases manually by navigating to Admin -> Site Building -> URL aliases -> Add. The following example would map your old URL 2008/12/12/my-seo-page.html to the Drupal URL node/5.

Add a path alias

Alternatively, you can add these aliases when you edit each node under the heading URL path settings.

A programmatic solution

The above example of using the Path Module is great when you only have a handful of pages to alias. But consider a migration that has 250,000 pages to remap. If you are using a migration script, you can create these aliases on the fly during the import process using Drupal's path_set_alias() function.

<?php

require "includes/bootstrap.inc";
drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);

$node = new stdClass();

// Here you would look up your old article information and populate the node object.
$node->title = $old_title;
// etc.

// Next you would want to create some logic to reconstruct your old URL
// example.com/2008/10/18/my-old-url.html -- use:
$old_url = "2008/10/18/my-old-url.html";

// Save the node in the database
node_save($node);

// Save the new alias based on your old URL
if ($node->nid) {
  path_set_alias('node/' . $node->nid, $old_url, NULL, 'en');
}

?>
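The "logic to reconstruct your old URL" will depend entirely on how your old CMS built its paths. As a rough sketch, assuming the old site used a YYYY/MM/DD/slug.html scheme with a lowercased, hyphenated title as the slug, a helper might look like this. build_legacy_alias() is a hypothetical name and the slug rules are an assumption; adjust both to match what your old site actually produced.

```php
<?php
// Hypothetical helper: rebuilds a legacy path of the form
// YYYY/MM/DD/slug.html from a publish timestamp and a title.
// The slug rules below are an assumption about the old site.
function build_legacy_alias($timestamp, $title) {
  // Lowercase the title and collapse runs of other characters into hyphens.
  $slug = strtolower($title);
  $slug = preg_replace('/[^a-z0-9]+/', '-', $slug);
  $slug = trim($slug, '-');

  // gmdate() avoids depending on the server's timezone setting.
  return gmdate('Y/m/d', $timestamp) . '/' . $slug . '.html';
}

$old_url = build_legacy_alias(gmmktime(0, 0, 0, 10, 18, 2008), 'My Old URL');
echo $old_url, "\n"; // 2008/10/18/my-old-url.html
```

In an import loop, the returned string is what you would pass as the alias argument to path_set_alias(). It is worth spot-checking a sample of generated paths against real URLs from your old site before running the full migration.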

A Drupal module solution

So, what if you want visitors to see your new Drupal-style URLs and you also want search engines to start indexing them? Well, one option would be to use the Global Redirect module.

When the Path module creates an alias, it does not remove access to the original path. So, /node/123 and /your-aliased-url will both show the same content. This can be a detriment to your search engine rankings because search engines frown on duplicate content. The Global Redirect module solves this problem by issuing a 301 (permanent) redirect from /node/123 to /your-aliased-url.

This module can also be used in conjunction with the mod_rewrite rules above. If you rewrite /index.asp?id=123 to /node/123 and create the URL alias of /my-aliased-node for /node/123, Global Redirect will essentially tell visitors that /index.asp?id=123 should be permanently redirected to /my-aliased-node.
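If you would rather have Apache itself send the permanent redirect instead of rewriting internally, a variation like the following could work. This is a sketch and not something from the setup described above: it assumes clean URLs are enabled so that /node/123 resolves, and it adds a hop if Global Redirect then redirects again from /node/123 to the alias.

```apache
# Assumption: an external 301 redirect is wanted instead of an internal rewrite.
RewriteCond %{REQUEST_URI} ^/index\.asp$
RewriteCond %{QUERY_STRING} ^id=([0-9]+)$
# The trailing "?" drops the original query string from the target URL.
RewriteRule ^.*$ /node/%1? [R=301,L]
```

With the R=301 flag the browser's address bar changes to the new URL, whereas the internal rewrite leaves the old URL visible.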

As you can see, there are many options for capturing your old URLs. No matter what solution you choose, be sure that people coming to your site are seeing what they expect. With a little bit of planning, your visitors won't notice a thing and you won't hurt your valuable relationship with the search engines.