in reply to
detecting the language of a word?
Any pointers/comments about
* some useful software/libraries
* my general approach
I've done a fair amount of accessibility work, so some general pointers:
- I'd seriously consider going for XHTML rather than HTML 4.01. If you're starting from dodgy HTML it won't be that much more work, and having the content in XML will make future site changes and content manipulation easier.
- For your bulk work, take a good look at HTML Tidy before you spend a lot of time coding a custom Perl solution. It will almost certainly do most of what you need.
- You won't be able to completely automate the language-detection work - you'll need a human in the loop. For example, the same word can exist in several languages, sometimes with different meanings.
- How is your final site being audited for WCAG conformance? This cannot be fully automated, since some of the checkpoints rely on human judgement - so make sure you have the audit process sorted out before you start. Otherwise you may find yourself facing impossible goals.
Also, if it's not already in one, check the site into some kind of source control system. You will want a log of the changes at some point during the process.
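
To illustrate the point about the same word turning up in several languages, here is a rough sketch (in Python rather than Perl, purely for illustration) of a dictionary-lookup pass that only decides automatically when exactly one language matches and queues everything else for a human. The word lists, the `candidate_languages` and `classify` names, and the language codes are all made up for the example; a real run would load proper dictionaries.

```python
# Hypothetical, tiny word lists; a real project would load full dictionaries.
WORDLISTS = {
    "en": {"gift", "chat", "pain", "house", "the"},
    "de": {"gift", "haus", "der"},
    "fr": {"chat", "pain", "maison", "le"},
}


def candidate_languages(word, wordlists=WORDLISTS):
    """Return the sorted language codes whose word list contains the word."""
    w = word.lower()
    return sorted(lang for lang, words in wordlists.items() if w in words)


def classify(words, wordlists=WORDLISTS):
    """Split words into automatically resolved ones and ones needing a human."""
    resolved, needs_review = {}, {}
    for word in words:
        langs = candidate_languages(word, wordlists)
        if len(langs) == 1:
            # Exactly one language claims the word: safe to tag automatically.
            resolved[word] = langs[0]
        else:
            # Zero matches or several matches: leave it for a human reviewer.
            needs_review[word] = langs
    return resolved, needs_review


if __name__ == "__main__":
    resolved, needs_review = classify(["house", "Gift", "maison", "pain", "xyzzy"])
    print("resolved:", resolved)
    print("needs review:", needs_review)
```

Here "Gift" (English "present", German "poison") and "pain" (English "ache", French "bread") match two lists each, so they land in the review queue along with the unknown "xyzzy". Surrounding context would help narrow some of these down, but a lookup like this can't settle them on its own.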