Greetings, monks. I have a bunch of URLs of news articles and need to get the publication dates from them, if available. There is a Python library designed especially for this purpose, and I'm wondering if there is a similar Perl module. I've searched around and the only thing I found was Web::Scraper, which would take quite a bit of rule development to do the job. I'm hoping someone has already done that work.

    I also know of another Python library, article-date-extractor, which has a set of regular expressions.

    I haven't ported it to Perl though.

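    Can you get what you want just from doing a HEAD request on the web page? That would give you the Last-Modified header, I think, if the server sends one. I'm not sure if that's exactly what you want, since it is the date of the last change rather than the publication date.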
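
    A minimal sketch of the HEAD idea, using only core modules (HTTP::Tiny and Time::Piece). The sub names parse_http_date and last_modified are made up for illustration, and the code assumes the server sends the date in the usual "Wed, 21 Oct 2015 07:28:00 GMT" form:

```perl
use strict;
use warnings;
use HTTP::Tiny;
use Time::Piece;

# Parse an HTTP-date such as "Wed, 21 Oct 2015 07:28:00 GMT" into a
# Time::Piece object. Returns undef if the value is missing or malformed.
sub parse_http_date {
    my ($value) = @_;
    return unless defined $value;
    # strptime croaks on input it cannot parse, so trap that with eval.
    my $t = eval { Time::Piece->strptime($value, '%a, %d %b %Y %H:%M:%S GMT') };
    return $t;
}

# Issue a HEAD request and return the Last-Modified time, if the server sent one.
sub last_modified {
    my ($url) = @_;
    my $res = HTTP::Tiny->new(timeout => 10)->head($url);
    return unless $res->{success};
    # HTTP::Tiny lower-cases header names.
    return parse_http_date($res->{headers}{'last-modified'});
}

# Only touch the network when run as a script with URLs as arguments.
unless (caller) {
    for my $url (@ARGV) {
        my $t = last_modified($url);
        printf "%s\t%s\n", $url, $t ? $t->ymd : 'unknown';
    }
}
```

    Run it as a script with the URLs as arguments. Pages generated on the fly often omit the header or report the time of the request, so treat the result as a weak signal.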

      That works for some pages that have known meta fields like pubdate or time, but many web pages don't use them. I think the Python library applies some heuristics in such cases.
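
      For the pages that do carry such fields, a rough regex-based sketch might look like the following. The list of field names is my own guess at common ones, not taken from the Python library, and date_from_html is a hypothetical name. A real HTML parser would be more robust than regexes here:

```perl
use strict;
use warnings;

# Illustrative (not exhaustive) attribute values that often mark a
# publication date in <meta> tags.
my @DATE_KEYS = qw(
    article:published_time pubdate publishdate datePublished
    date dc.date.issued sailthru.date
);

# Return the raw date string from the first matching <meta> tag, or from
# the first <time datetime="..."> element; returns undef if neither exists.
sub date_from_html {
    my ($html) = @_;
    my $keys = join '|', map { quotemeta } @DATE_KEYS;

    # Walk the <meta ...> tags in document order.
    while ($html =~ /<meta\b([^>]*)>/gi) {
        my $attrs = $1;
        next unless $attrs =~ /\b(?:name|property|itemprop)\s*=\s*["'](?:$keys)["']/i;
        return $1 if $attrs =~ /\bcontent\s*=\s*["']([^"']+)["']/i;
    }

    # Fall back to the first <time> element with a datetime attribute.
    return $1 if $html =~ /<time\b[^>]*\bdatetime\s*=\s*["']([^"']+)["']/i;
    return;
}
```

      The sub returns the string exactly as it appears in the page, so it still needs to be parsed and normalized afterwards.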

        "I think the Python library applies some heuristics in such cases"

        You could look at the Python code and implement the same thing in Perl. Let me know if you get stuck.
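
        As a starting point, here is a sketch of one heuristic that is cheap to port: pulling a date out of the URL itself, since many news sites embed /YYYY/MM/DD/ in the path. I haven't checked that this matches what the Python code does, so take the pattern and the name date_from_url as assumptions:

```perl
use strict;
use warnings;

# Return 'YYYY-MM-DD' if the URL contains a date written as
# YYYY/MM/DD or YYYY-MM-DD; returns undef otherwise.
sub date_from_url {
    my ($url) = @_;
    if ($url =~ m{(?<!\d)((?:19|20)\d{2})[/-](\d{1,2})[/-](\d{1,2})(?!\d)}) {
        my ($y, $m, $d) = ($1, $2, $3);
        # Only a range check; it does not verify the day exists in that month.
        return sprintf('%04d-%02d-%02d', $y, $m, $d)
            if $m >= 1 && $m <= 12 && $d >= 1 && $d <= 31;
    }
    return;
}
```

        It only looks at the first date-like run in the URL and will miss sites that use numeric article IDs, so it works best as one step in a chain of fallbacks.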

