A real-world example

You should see TWO images here. Both point to the same image file but use two different URL-resolution methods: purely relative (src='images/img.png') and root-relative (src='/images/img.png').
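To make the difference between the two forms concrete, here is a minimal sketch of how each src resolves against a page address. It uses Python's standard urljoin. The base URL is a made-up placeholder, not the address of the original site.

```python
from urllib.parse import urljoin

# Hypothetical page address, for illustration only.
BASE_URL = "http://example.com/docs/page.html"

def resolve(src, base=BASE_URL):
    """Resolve an image src against the address of the page that contains it."""
    return urljoin(base, src)

# Purely relative: resolved against the directory of the current page.
relative = resolve("images/img.png")

# Root-relative: resolved against the root of the site, ignoring the page directory.
root_relative = resolve("/images/img.png")

print(relative)       # http://example.com/docs/images/img.png
print(root_relative)  # http://example.com/images/img.png
```

The two forms only resolve to the same file when the page itself sits at the site root, which is why an import process has to treat them separately.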

All the CODE in this page is taken verbatim from a live Drupal4 site.

Only the content has been scrambled to protect the innocent.

The structure was left mostly intact to bounce a few different layout challenges at it.

Normal HTML layout stuff includes:

Lists and things
Embedded images
Subsections and subheadings

And often navigation and cross-references

This stand-alone example will not include the referenced files of course, BUT:

When an import process is run
It will rewrite links so that they still find the related pages
Images come along too, although links to them may optionally be rewritten differently.
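The link rewriting described above can be sketched roughly as follows. This is not the actual import process, only a minimal regex-based illustration of rewriting page links and image links with different prefixes. The function name rewrite_links and the prefixes /imported/ and /files/ are hypothetical.

```python
import re

# Matches href="..." or src='...' with either quote style.
ATTR = re.compile(r'\b(href|src)=(["\'])(.*?)\2', re.IGNORECASE)

# Matches URLs that carry their own scheme, e.g. http: or mailto:
SCHEME = re.compile(r'^[a-zA-Z][a-zA-Z0-9+.-]*:')

def rewrite_links(html, link_prefix, image_prefix):
    """Prefix local links; page links (href) and images (src) get different prefixes."""
    def repl(match):
        attr, quote, url = match.group(1), match.group(2), match.group(3)
        # Leave absolute URLs, protocol-relative URLs and in-page anchors alone.
        if SCHEME.match(url) or url.startswith("//") or url.startswith("#"):
            return match.group(0)
        prefix = image_prefix if attr.lower() == "src" else link_prefix
        local = url.lstrip("/")
        return attr + "=" + quote + prefix + local + quote
    return ATTR.sub(repl, html)

page = '<a href="about.html">About</a> <img src=\'/images/img.png\'>'
print(rewrite_links(page, "/imported/", "/files/"))
# <a href="/imported/about.html">About</a> <img src='/files/images/img.png'>
```

A real importer would more likely rewrite these attributes while walking the parsed document rather than with a regular expression; the regex keeps the sketch short.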

The template to use on this input is the generic catch-all html2simplehtml.xsl file supplied in the distribution. This template is fairly complex, with a few alternative switches built in to make the best of whatever is thrown at it. For this reason it's not the best one to learn from at first, although it does illustrate a few ways of solving problems encountered in page parsing.

THIS content was also basically valid

BUT most input from unknown sources needs to be run through tidy before we can trust the XSL process on it.

To be honest - there was one set of invalid tags :(

  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" >

This (and the other metas and rel links in the header) had to be repaired into true XHTML as a self-closing singleton tag.

  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

Damn. Validation is hard.
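The repair shown above, closing meta and link singletons, can be sketched with a small regular expression. This is only an illustration of that one fix, not a substitute for tidy. The function name close_void_tags is made up, and the pattern assumes no attribute value contains a ">" character.

```python
import re

# An opening <meta ...> or <link ...> tag, with or without a trailing slash.
VOID_TAG = re.compile(r'<(meta|link)\b([^<>]*?)\s*/?>', re.IGNORECASE)

def close_void_tags(html):
    """Rewrite meta and link tags as XHTML self-closing singletons."""
    return VOID_TAG.sub(lambda m: "<" + m.group(1) + m.group(2) + " />", html)

broken = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" >'
print(close_void_tags(broken))
# <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
```

Running it on a tag that is already closed leaves the tag unchanged, so it is safe to apply more than once.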

Alternative tags

As this file ALSO includes a marker comment, it would have been possible to use regexps or text tags to find the content. But that's old-school.
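For the curious, the old-school approach might look something like the sketch below, which pulls out whatever sits between two marker comments. The marker texts "begin content" and "end content" are hypothetical, since the actual comment wording is not given here, and extract_between_comments is a made-up helper name.

```python
import re

def extract_between_comments(html, start_marker, end_marker):
    """Return the text between two HTML marker comments, or None if not found."""
    pattern = re.compile(
        r'<!--\s*' + re.escape(start_marker) + r'\s*-->'
        r'(.*?)'
        r'<!--\s*' + re.escape(end_marker) + r'\s*-->',
        re.DOTALL,
    )
    match = pattern.search(html)
    return match.group(1).strip() if match else None

# Hypothetical markers, for illustration only.
page = '<div><!-- begin content --><p>Hello</p><!-- end content --></div>'
print(extract_between_comments(page, "begin content", "end content"))
# <p>Hello</p>
```

This works only as long as the markers stay exactly where the theme put them, which is one reason the XSL route is preferred.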

OK, that's enough random waffle. The page content is now representatively replaced.
