I just need to extract data within certain "class"es, regardless of the tag.
Have a look at HTML::TreeBuilder::XPath. Once you get to know XPath you'll never look back. This should work for your sample data (slightly modified):
use strict;
use warnings;
use HTML::TreeBuilder::XPath;
my $html = q|<div class="message reply">
<span class="profile fn">Person Name</span>
<span class="time published" title="2012-03-14T21:37:16+0000">March 14, 2012 at 3:37 pm</span>
<abbr class="time published" title="2013-03-17T21:37:16+0000">March 17, 2013 at 3:37 pm</abbr>
<div class="msgbody">Message body here.</div>
</div>|;
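my $tree = HTML::TreeBuilder::XPath->new_from_content($html);
# '*' matches any tag; the predicate compares the whole class attribute string
my @nodes = $tree->findnodes('//*[@class="time published"]');
for my $node ( @nodes ) {
    print $node->attr('title'), "\n";   # machine-readable timestamp
    print $node->as_text, "\n";         # human-readable text
}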
Output:
2012-03-14T21:37:16+0000
March 14, 2012 at 3:37 pm
2013-03-17T21:37:16+0000
March 17, 2013 at 3:37 pm
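
One caveat: `@class="time published"` is an exact string comparison, so it would miss an element written as `class="published time"` or one that carries an extra class. If your real data varies like that, the usual XPath 1.0 idiom is to pad the class attribute with spaces and look for the padded token. Here is a minimal sketch of that idea; `class_xpath` is a hypothetical helper name of my own, not part of the module, and the sample HTML is trimmed down and reordered purely for illustration:

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical helper (not part of HTML::TreeBuilder::XPath): builds an
# XPath expression that matches any element whose class attribute
# contains $class as a whole space-separated token.
sub class_xpath {
    my ($class) = @_;
    return sprintf
        '//*[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class;
}

# Illustrative data: same titles as above, but with the class order
# swapped on the first element.
my $html = q|<div class="message reply">
<span class="published time" title="2012-03-14T21:37:16+0000">March 14, 2012 at 3:37 pm</span>
<abbr class="time published" title="2013-03-17T21:37:16+0000">March 17, 2013 at 3:37 pm</abbr>
</div>|;

my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

# Collect the title attribute of every element carrying the "published" class.
my @titles = map { $_->attr('title') } $tree->findnodes( class_xpath('published') );
print "$_\n" for @titles;

$tree->delete;    # free the parse tree when done
```

This should pick up both elements regardless of class order. If you need two classes at once, you can chain two such predicates on the same `*` step.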