Can you explain how you get from "HTML" to an "XML container limited to 1000 characters"?
Of course, there is HTML Tidy (free) to clean up the HTML that Word generates, but I don't know if it's good enough for you.
http://www.w3.org/People/Raggett/tidy/ has more about it. There is still an active community maintaining it.
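If you want to automate it, here is a rough Python sketch of how that could look. It assumes a `tidy` executable is installed and on your PATH, and that the 1000-character limit counts the markup as well as the text (I don't know your container, so that part is a guess). The function names `tidy_word_html` and `fits_container` and the `<container>` wrapper element are made up for illustration.

```python
import shutil
import subprocess
import xml.etree.ElementTree as ET

MAX_CHARS = 1000  # limit from the question; assumed to include markup


def tidy_word_html(html: str) -> str:
    """Run the HTML Tidy command-line tool over Word-generated HTML.

    Assumes a `tidy` executable is available on PATH.
    """
    if shutil.which("tidy") is None:
        raise RuntimeError("tidy executable not found on PATH")
    result = subprocess.run(
        [
            "tidy", "-quiet", "-asxhtml", "-utf8",
            "--word-2000", "yes",         # strip Word-specific markup
            "--show-body-only", "yes",    # emit only the body fragment
            "--numeric-entities", "yes",  # avoid named entities such as &nbsp;
            "--show-warnings", "no",
        ],
        input=html.encode("utf-8"),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # tidy exits with 0 (clean), 1 (warnings) or 2 (errors)
    if result.returncode > 1:
        raise RuntimeError(result.stderr.decode("utf-8", "replace"))
    return result.stdout.decode("utf-8")


def fits_container(fragment: str, limit: int = MAX_CHARS) -> bool:
    """Return True if the fragment is within the limit and well-formed XML."""
    if len(fragment) > limit:
        return False
    try:
        # Wrap in a hypothetical root element so multiple siblings parse.
        ET.fromstring("<container>" + fragment + "</container>")
    except ET.ParseError:
        return False
    return True
```

You would call `fits_container(tidy_word_html(word_html))` and fall back to manual editing whenever it returns False. Whether Tidy's output is clean enough still depends on what Word put in there.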
Typing up to 1000 characters doesn't take that long. Copying and pasting from Word, then adding the formatting by hand, might be even faster.
It all depends on how often you want to do this, and where that RTF comes from.