Apache::Tidy

By anders pearson 21 Nov 2003

making websites validate can be a pain, especially if your authoring software or CMS generates invalid markup. if you can’t change that software but still want your site to validate, you’ll need to find another solution.

one way or another, that solution usually ends up involving <a href="http://tidy.sourceforge.net/">HTML Tidy</a>, a handy command-line tool for automatically cleaning up and fixing messy, invalid markup. it does an amazing job of turning ugly, tag-soup pseudo-HTML into clean, nicely formatted XHTML. the usual pattern is to figure out some way to run tidy over your pages in batch mode.

now you can also do it live, on demand, right as the page is served by apache. i’ve written a little module that wraps HTML Tidy and makes it available to a mod_perl enabled apache 1.x server: <a href="http://thraxil.org/tidy/Apache-Tidy-0.1.tar.gz">Apache::Tidy</a>.
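
to give a rough idea of what switching it on might look like, here is a minimal httpd.conf sketch. it assumes Apache::Tidy installs as an ordinary mod_perl 1.x content handler (the usual SetHandler perl-script plus PerlHandler pattern), and the /tidied location is made up for illustration. the documentation shipped in the tarball is the authority on the actual directives.

```apache
# hypothetical location; point this at the part of the site you want cleaned up
<Location /tidied>
    # standard mod_perl 1.x way of handing the content phase to a perl handler
    SetHandler perl-script
    # assumption: Apache::Tidy is used as a normal PerlHandler
    PerlHandler Apache::Tidy
</Location>
```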

for example, here’s the <a href="http://thraxil.org/tidy/sample_static/source.txt">original static, badly formed, invalid HTML</a> and here’s the <a href="http://thraxil.org/tidy/sample_static/">tidied version</a>.

i also wrote Apache::Tidy to be <a href="http://search.cpan.org/~kwilliams/Apache-Filter-1.022/lib/Apache/Filter.pm">Apache::Filter</a> compliant, so you can use it to automatically correct dynamically generated content too, as long as that content is generated by another Apache::Filter aware perl module. for example, here is <a href="http://thraxil.org/tidy/sample_perl/source.txt">some perl code</a> being handled by <a href="http://search.cpan.org/search?query=Apache%3A%3ARegistryFilter&mode=all">Apache::RegistryFilter</a>, and here is the <a href="http://thraxil.org/tidy/sample_perl/">tidied output</a>.
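
a filter chain like that might be configured roughly as follows. this is a sketch based on the general Apache::Filter convention (turn on the Filter variable, then list the handlers in the order the output should flow), not a copy of the config behind the sample above. the \.pl$ pattern is an assumption, and so is Apache::Tidy going last in the chain.

```apache
# load Apache::Filter once at server startup
PerlModule Apache::Filter

# assumption: the perl scripts to be tidied end in .pl
<Files ~ "\.pl$">
    SetHandler perl-script
    # registry-style handlers normally require ExecCGI to be enabled
    Options +ExecCGI
    # tells Apache::Filter aware handlers to pass output down the chain
    PerlSetVar Filter On
    # handlers run left to right: the script generates, Apache::Tidy cleans up
    PerlHandler Apache::RegistryFilter Apache::Tidy
</Files>
```

the order matters here: whatever comes last in the PerlHandler list is what produces the markup the browser finally sees, so the tidying step belongs at the end.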

Mark Pilgrim

Tags: perl html validation xhtml apache apache:tidy mod_tidy html tidy