That seems like a pretty regular structure. If you know that all the documents look like that, you can extract the values with a handful of simple regular expressions.
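As a rough sketch of the regex approach: the sample below is made up, since the actual layout of your documents isn't reproduced here. The `class="label"` / `class="value"` cells, `ROW_RE`, and `extract_pairs` are all hypothetical names for illustration, and the pattern only works if every document really does follow the same layout.

```python
import re

# Hypothetical sample; substitute the real, regular layout of your documents.
SAMPLE = """
<table>
<tr><td class="label">Name</td><td class="value">Widget</td></tr>
<tr><td class="label">Price</td><td class="value">9.99</td></tr>
</table>
"""

# One label cell followed by one value cell, with optional whitespace between.
# [^<]* assumes the cells hold plain text with no nested tags.
ROW_RE = re.compile(
    r'<td class="label">(?P<label>[^<]*)</td>\s*'
    r'<td class="value">(?P<value>[^<]*)</td>',
    re.IGNORECASE,
)


def extract_pairs(html):
    """Return a dict mapping each label cell's text to its value cell's text."""
    return {
        m.group("label").strip(): m.group("value").strip()
        for m in ROW_RE.finditer(html)
    }


if __name__ == "__main__":
    print(extract_pairs(SAMPLE))
```

Compiling the pattern once and reusing it across all 50000 files keeps the per-document cost low, which is the main attraction of this route.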
in reply to how to quickly parse 50000 html documents?
However, if the HTML documents can contain just about anything, including comments and attribute values whose content looks like HTML, you'd need a full parser. You would first parse the HTML, then search the resulting structure for the table that contains your data. This may be hard: a document could contain hundreds of tables, and you'll have to find the right one.
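Here is a minimal sketch of the parser route, using Python's standard `html.parser` module. I'm assuming the right table can be recognized by the text of its first row; `TableCollector`, `find_table`, and the sample headers are hypothetical. It also assumes tables are not nested and that every cell has an explicit closing tag, which messy real-world HTML may not respect.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect each table as a list of rows, each row a list of cell texts.

    Assumes tables are not nested and cells are explicitly closed.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tables = []      # every table seen so far, in document order
        self._table = None    # the table currently open, or None
        self._cell = None     # text fragments of the open cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._table = []
        elif tag == "tr" and self._table is not None:
            self._table.append([])
        elif tag in ("td", "th") and self._table:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            text = " ".join("".join(self._cell).split())
            self._table[-1].append(text)
            self._cell = None
        elif tag == "table" and self._table is not None:
            self.tables.append(self._table)
            self._table = None
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def find_table(html, required_headers):
    """Return the first table whose first row contains all required headers."""
    collector = TableCollector()
    collector.feed(html)
    collector.close()
    wanted = {h.lower() for h in required_headers}
    for table in collector.tables:
        if table and wanted <= {cell.lower() for cell in table[0]}:
            return table
    return None


# Hypothetical document: a commented-out table, a navigation table,
# and the data table, whose header carries an HTML-looking attribute value.
SAMPLE = """
<!-- <table><tr><th>Name</th><th>Price</th></tr></table> -->
<table><tr><td>Home</td><td>About</td></tr></table>
<table>
  <tr><th title="<table>">Name</th><th>Price</th></tr>
  <tr><td>Widget</td><td>9.99</td></tr>
</table>
"""

if __name__ == "__main__":
    print(find_table(SAMPLE, ["Name", "Price"]))
```

The parser skips the commented-out table and the quoted attribute value, which are exactly the cases where a plain regex would be fooled. Matching on header text is just one way to pick the right table; an `id` attribute or the table's position may be more reliable for your documents.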