Spine Health

Spine-Health.com was started in August 1999 by Peter Ullrich, MD and Stephanie Burke, with a shared goal of providing intelligent, unbiased and highly relevant medical information for people with back pain, neck pain and related conditions. Dr. Ullrich in particular was adamant that patients should have health information that is unbiased, comprehensive and reliable, and that the way to get it was through a formal peer review process (similar to that of a medical journal). Today Spine-Health gets over 850,000 visitors a month, has a community of over 10,000 users, and is growing every day.

Project Goals

  • Convert the existing static HTML site into a dynamic CMS using the Drupal framework.
  • Organize and tag data so that it is easier to manage and, as a result, easier for end users to find what they are looking for.
  • Maintain existing search engine ranking and ultimately improve it.
  • Ensure the site is extremely responsive, redundant, secure and able to scale out.
  • Dramatically improve the user experience and increase traffic.

Data Transformation

Spine-Health.com consisted of more than 5,000 static HTML pages that were loosely categorized and varied slightly in format and layout. Manually importing this data into Drupal was not feasible. After examining the pages, I found patterns in the layouts, but the markup was too inconsistent for the Import HTML module alone to pull in the data reliably. The solution was to first run the pages through HTMLTidy, then extract the data with a custom Perl script I wrote that relied heavily on XPath libraries, and finally write that data to an XML file with the same name as the original page. Once this process was dialed in, the script converted the entire legacy site in about 3 hours. With the data in XML files, it was a matter of using the Import HTML module, after heavily modifying its XSLT, to import certain data segments into CCK fields. Data was imported into Drupal in category chunks so that each chunk could be assigned a proper initial taxonomy term / tag. Developing this solution saved Spine Health countless hours and money.
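
To give a feel for the extraction step, here is a rough sketch in Python rather than the Perl used on the project. It assumes the page has already been made well-formed by HTMLTidy and uses the standard library's ElementTree, which only supports a subset of XPath. The sample page, the "content" div id and the output element names (page, title, body) are all hypothetical, and real tidied XHTML would usually carry a namespace that the paths would need to account for.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of one legacy page after tidying (no XHTML namespace,
# to keep the paths short).
TIDIED_PAGE = """<html>
  <head><title>Lower Back Pain Overview</title></head>
  <body>
    <div id="nav"><a href="/index.html">Home</a></div>
    <div id="content">
      <h1>Lower Back Pain Overview</h1>
      <p>First paragraph of the article.</p>
      <p>Second paragraph of the article.</p>
    </div>
  </body>
</html>"""


def extract_page(xhtml):
    """Pull the title and body paragraphs out of one tidied page."""
    root = ET.fromstring(xhtml)
    title = root.findtext("./head/title", default="").strip()
    # Attribute predicates are part of ElementTree's XPath subset.
    content = root.find(".//div[@id='content']")
    paragraphs = content.findall("./p") if content is not None else []

    # Build the intermediate XML document (hypothetical element names).
    page = ET.Element("page")
    ET.SubElement(page, "title").text = title
    body = ET.SubElement(page, "body")
    for paragraph in paragraphs:
        ET.SubElement(body, "p").text = "".join(paragraph.itertext()).strip()
    return page


page_xml = ET.tostring(extract_page(TIDIED_PAGE), encoding="unicode")
```

Navigation and other page chrome never make it into the output because only the content container is read, which is the main reason for extracting by path instead of importing whole pages.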

As data was imported, Pathauto was used to generate URLs that initially followed the legacy URL structure. Once this was complete, the url_alias table was backed up for later use. That backup made it possible to map each legacy URL to its Drupal path, so we could move forward with converting the URLs into a more search engine friendly format. This was done by setting up new rules in Pathauto and running a bulk update for the entire site.

The next phase was converting the thousands of links inside the node bodies into a tag format so that links would always match the latest URL aliases. To achieve this I wrote a custom module for Spine Health that worked its way through all the nodes and searched for internal links. Once an internal link was found, the code looked it up in the old URL alias table to get the Drupal path, matched that path against the new URL alias table, and created a tag.

For example...

<a href="/Old_LegacyLink.html">Some internal link</a>

Turns into...


[url:2345,type=|node|,content=|Some internal link|]

or

[url:2345,type=|term|,content=|Some internal link to a term page|]
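
The conversion described above can be sketched roughly as follows. This is Python, not the actual Drupal module, and it makes some loud assumptions: the LEGACY_ALIASES dictionary stands in for the backed-up alias table, collapsing the lookup into a single step from legacy URL to type and id, and the regular expression only handles simple anchors with a double-quoted href as their only attribute.

```python
import re

# Hypothetical stand-in for the backed-up alias table: legacy URL -> (type, id).
LEGACY_ALIASES = {
    "/Old_LegacyLink.html": ("node", 2345),
    "/Old_TermPage.html": ("term", 17),
}

# Simple anchors only: <a href="...">text</a> with no other attributes.
LINK_RE = re.compile(r'<a\s+href="([^"]+)"\s*>(.*?)</a>', re.IGNORECASE | re.DOTALL)


def links_to_tags(body, aliases):
    """Replace known internal links with tags; report internal links left alone."""
    unconverted = []

    def replace(match):
        href, text = match.group(1), match.group(2)
        target = aliases.get(href)
        if target is None:
            # Unknown site-relative links are logged for manual review;
            # external links are left untouched and not logged.
            if href.startswith("/"):
                unconverted.append(href)
            return match.group(0)
        kind, ident = target
        return f"[url:{ident},type=|{kind}|,content=|{text}|]"

    return LINK_RE.sub(replace, body), unconverted
```

Returning the list of unconverted links alongside the new body mirrors the idea of keeping a log of links that an admin has to fix by hand.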

Once the links were converted to this tag structure, I wrote a custom content filter that renders each tag as a link using the latest URL alias entry. Bingo! No more broken links... ever! The filter gives Spine Health many attribute options to use in the tag structure. It supports linked images, nofollow, classes, and much more, giving them full control without ever entering a hard-coded link again. There is even an option to render only the path, for use in JavaScript links. To make the tags easier to use, logged-in admins get a small text field next to the view / edit links on each page containing the tag representation of the current page, ready to copy and paste. This saves the admin from looking up the node nid and helps with the tag syntax.
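
The rendering side of the filter might look something like this minimal Python sketch. It only covers the basic tag form shown earlier (no images, nofollow, classes or path-only option), and the CURRENT_ALIASES dictionary and its example paths are hypothetical stand-ins for a lookup against the live alias table. Falling back to the unaliased Drupal system path when no alias exists is my assumption, not something confirmed by the real filter.

```python
import re
from html import escape

# Hypothetical stand-in for the live alias table: (type, id) -> current alias.
CURRENT_ALIASES = {
    ("node", 2345): "/conditions/lower-back-pain/overview",
    ("term", 17): "/topics/sciatica",
}

# Unaliased Drupal system paths, used when no alias is found (assumed fallback).
SYSTEM_PATHS = {"node": "/node/{}", "term": "/taxonomy/term/{}"}

TAG_RE = re.compile(r"\[url:(\d+),type=\|(node|term)\|,content=\|(.*?)\|\]")


def render_tags(text, aliases):
    """Turn [url:...] tags into anchors pointing at the current alias."""

    def replace(match):
        ident, kind, content = int(match.group(1)), match.group(2), match.group(3)
        path = aliases.get((kind, ident), SYSTEM_PATHS[kind].format(ident))
        return f'<a href="{escape(path, quote=True)}">{content}</a>'

    return TAG_RE.sub(replace, text)
```

Because the alias is resolved at render time, renaming a page only changes the alias table and every tag that points at it follows along.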

To ensure legacy links continued to resolve to the new URLs, I wrote a simple SQL statement that inserted each old URL and its new destination URL into the path_redirect table as a 301 redirect. I was able to map these URLs by joining the old URL alias table with the new one. As search engines and other sites continue to update their links, we are able to trim this table considerably.
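
The join behind those redirects is roughly the following, run here against an in-memory SQLite database from Python so it can be tried without a Drupal install. The table schemas are deliberately simplified, the url_alias_legacy name for the backed-up table is hypothetical, and the sample rows are invented. The real statement ran against MySQL and the real tables have more columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified, hypothetical schemas: src is the Drupal path, dst the alias.
    CREATE TABLE url_alias_legacy (src TEXT, dst TEXT);
    CREATE TABLE url_alias (src TEXT, dst TEXT);
    CREATE TABLE path_redirect (path TEXT, redirect TEXT, type INTEGER);

    INSERT INTO url_alias_legacy VALUES ('node/2345', 'Old_LegacyLink.html');
    INSERT INTO url_alias VALUES ('node/2345', 'conditions/lower-back-pain/overview');
    INSERT INTO url_alias_legacy VALUES ('node/2346', 'unchanged-page');
    INSERT INTO url_alias VALUES ('node/2346', 'unchanged-page');
""")

# Join old and new aliases on the shared Drupal path and record a 301
# for every page whose alias actually changed.
conn.execute("""
    INSERT INTO path_redirect (path, redirect, type)
    SELECT old.dst, new.dst, 301
    FROM url_alias_legacy AS old
    JOIN url_alias AS new ON new.src = old.src
    WHERE old.dst <> new.dst
""")

redirects = conn.execute("SELECT path, redirect, type FROM path_redirect").fetchall()
```

Skipping rows where the alias did not change keeps the table free of redirects that would point a URL at itself.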

Performance / Server Architecture

I was also responsible for the hardware and architecture for Spine Health. Spine Health is a very busy site and is getting busier every month, so the key was to have enough initial power while being able to scale out easily. Four Red Hat Linux servers, along with a load balancer and a firewall, handle the site very well at this point. Two are load-balanced front end web servers. The other two are database servers running MySQL in a master-slave configuration. Drupal was modified to support this configuration so that all writes, updates and deletes go to the master and all reads go to the slave. This configuration is also used for www.drupal.org. The result is a type of database load balancing that allows Spine Health to add more slave databases down the road if need be. It works extremely well for the site because most of the requests are reads.
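
The read/write split can be illustrated with a small routing sketch in Python. This is not the Drupal patch itself. The QueryRouter class and the server names are hypothetical, and it makes the conservative assumption that only statements starting with SELECT are safe to send to a slave, with everything else going to the master.

```python
import itertools


class QueryRouter:
    """Pick a database server for a statement: reads to slaves, the rest to master."""

    def __init__(self, master, slaves):
        self.master = master
        # Rotate through the slaves so adding one spreads the read load further.
        self._slaves = itertools.cycle(slaves) if slaves else None

    def target_for(self, sql):
        words = sql.lstrip().split(None, 1)
        verb = words[0].upper() if words else ""
        if verb == "SELECT" and self._slaves is not None:
            return next(self._slaves)
        return self.master


router = QueryRouter("db-master", ["db-slave-1", "db-slave-2"])
```

Adding another slave then only means adding a name to the list. One thing a sketch like this ignores is replication lag, where a read issued right after a write may not see the new data on a slave yet.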

For the development environment, another server was set up with Subversion and Trac to manage changes to the site. Modifications are tested in the dev environment and, once everything checks out, changes are exported from Subversion and promoted to the web servers with rsync over SSH. Admins at Spine Health can do this easily by running a simple script once logged into the dev server via SSH.
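
A promotion script along those lines could be as small as the Python sketch below. It is not the script used on the project, and the repository URL, paths and host names in the usage comment are made up. The function only builds the svn export and rsync command lines, so they can be inspected before anything is run.

```python
def deploy_commands(repo_url, export_dir, web_hosts, docroot):
    """Build the commands for one promotion: a clean export, then one rsync per host."""
    # "svn export" writes a copy of the tree without .svn metadata;
    # --force lets it overwrite a previous export in the same directory.
    commands = [["svn", "export", "--force", repo_url, export_dir]]
    source = export_dir.rstrip("/") + "/"  # trailing slash: copy contents, not the dir
    for host in web_hosts:
        commands.append(["rsync", "-az", "-e", "ssh", source, f"{host}:{docroot}"])
    return commands


# To actually run them (hypothetical values):
#   import subprocess
#   for cmd in deploy_commands("http://svn.example.com/site/trunk",
#                              "/tmp/site-export", ["web1", "web2"], "/var/www/html/"):
#       subprocess.run(cmd, check=True)
```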

In addition to the beefy hardware, the site was integrated with Memcached. The Memcache module was used with the PECL Memcache library and a few core patches. This resulted in a VERY impressive performance kick!
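
The reason this helps so much is the usual cache lookup pattern: check the cache first and only do the expensive work on a miss. The Python sketch below shows the idea with a plain in-memory stand-in instead of a real Memcached client. The InMemoryCache class, the cached helper and the cache key are all hypothetical, and this is not how the Memcache module is implemented internally.

```python
class InMemoryCache:
    """Stand-in for a Memcached client; only get and set are modelled."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


def cached(cache, key, build):
    """Return the cached value for key, building and storing it on a miss."""
    value = cache.get(key)
    if value is None:
        value = build()  # the expensive part, e.g. database queries and rendering
        cache.set(key, value)
    return value


cache = InMemoryCache()
build_calls = []


def build_page():
    build_calls.append(1)
    return "<html>rendered page</html>"


first = cached(cache, "page:node/2345", build_page)
second = cached(cache, "page:node/2345", build_page)  # served from the cache
```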

Custom Modules

Spine Health Link Cleaner
Replaces internal hard links in nodes with the custom tag format used by the Spine Health Link Filter module. Also generates a log file of links that could not be converted so that an admin can fix them manually.

Spine Health Link Filter
Filter for parsing custom tags into node links using the url alias.

Spine Health Site Explorer (coming soon)
A module that queries Yahoo’s Site Explorer API to check for incoming links to the site and generate a report of incoming links that are not resolving.

Spine Health CSS Generator
The site has hundreds and hundreds of graphic-based block titles and page titles. In the markup these are all represented as either h1 or h2 tags respectively. The Spine Health CSS Generator generates page-specific CSS classes based on which blocks are being displayed on the page and what the page title is. This eliminates the need for a massive CSS file, multiple CSS files, or even a PHP-parsed CSS file that would be a nightmare to manage and maintain. Feel free to view source on a page and you will see the CSS generated specifically for that page.
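
As a rough illustration of the idea (not the module's actual output), the Python sketch below builds one CSS rule per title shown on a page. Several things here are my assumptions: that page titles are h1 and block titles are h2, the class naming scheme, the /images/titles directory, and the use of a background image with hidden text to display the graphic.

```python
import re


def css_class(prefix, title):
    """Turn a title into a CSS-safe class name, e.g. 'block-title-related-articles'."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"{prefix}-{slug}"


def page_css(page_title, block_titles, image_dir="/images/titles"):
    """Generate only the rules needed for the titles that appear on this page."""
    titles = [("h1", "page-title", page_title)]
    titles += [("h2", "block-title", title) for title in block_titles]
    rules = []
    for tag, prefix, title in titles:
        name = css_class(prefix, title)
        rules.append(
            f"{tag}.{name} {{ background: url({image_dir}/{name}.png) no-repeat; "
            f"text-indent: -9999px; }}"
        )
    return "\n".join(rules)


css = page_css("Lower Back Pain", ["Related Articles"])
```

Each page then carries a handful of rules for its own titles instead of loading rules for every title on the site.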

Moving Forward

The next big step is to integrate all of the other Spine-Health related sites under the Drupal umbrella. Moving their forum with over 10,000 active users into the Drupal site will allow Spine Health to integrate data and cross promote articles to users in their community more efficiently.
Look for another case study in the next 30 days on the migration and implementation.

Additional Credits

Site graphics by the nice folks at fathead design inc.

Visit site: Spine-health.com
Got a question? Would like more details? Contact me