Answer: How do I remove a specific keyword from a HTML page

by Foggy Bottoms (Monk)
on Jul 10, 2003 at 15:38 UTC (#273019, categorized answer)

     Hi kvale, you said that a good general strategy is to use HTML::Parser to decompose HTML into its constituent elements and extract the parts you want with event handlers.
     Even though this seems like a good way to handle HTML and retrieve data, I'm not convinced it's sufficient or efficient: I've been wanting to extract the useful information from a web page. By useful information I mean that, when you're reading an article on a newspaper website, you can retrieve the article alone. To do that you need to find where the article's body begins and ends. However, the article itself can contain several HTML tags, and I'm afraid your method would simply split the article apart, turning it into nonsense.
     I haven't found any better way than looking at the HTML source itself to find out whether special tags are used. Newspaper webmasters sometimes use HTML comments as hidden markers (<!-- article start-->), but then I need to come up with a separate template for each newspaper website I analyze.
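     To make the comment-marker idea concrete, here is a minimal sketch of how an event-handler parser could collect only the text between two marker comments while keeping the text inside nested tags in order. It is written with Python's standard html.parser as a stand-in for the event-handler style of HTML::Parser, not with HTML::Parser itself. The closing marker "article end", the ArticleExtractor class, the extract_article helper, and the sample page are all assumptions made up for illustration; a real site would need its own markers.

```python
from html.parser import HTMLParser


class ArticleExtractor(HTMLParser):
    """Collect the text that appears between two marker comments."""

    def __init__(self, start_marker="article start", end_marker="article end"):
        super().__init__(convert_charrefs=True)
        # Marker labels are per-site assumptions ("article end" is hypothetical).
        self.start_marker = start_marker
        self.end_marker = end_marker
        self.inside = False
        self.chunks = []

    def handle_comment(self, data):
        # Called for every <!-- ... --> comment; data excludes the delimiters.
        label = data.strip()
        if label == self.start_marker:
            self.inside = True
        elif label == self.end_marker:
            self.inside = False

    def handle_data(self, data):
        # Text events arrive in document order, so text inside nested tags
        # such as <b> or <a> is appended in the order it is read.
        if self.inside:
            self.chunks.append(data)

    def text(self):
        # Join the pieces and collapse runs of whitespace into single spaces.
        return " ".join("".join(self.chunks).split())


def extract_article(html, start_marker="article start", end_marker="article end"):
    parser = ArticleExtractor(start_marker, end_marker)
    parser.feed(html)
    parser.close()
    return parser.text()


# Hypothetical page used only to exercise the sketch.
page = """<html><body>
<div id="nav">Home | Sports</div>
<!-- article start-->
<p>The <b>mayor</b> opened the <a href="/bridge">new bridge</a> today.</p>
<p>Traffic resumed at noon.</p>
<!-- article end-->
<div id="footer">Copyright</div>
</body></html>"""

print(extract_article(page))
```

     Because the handler only toggles a flag on the marker comments and appends every text event seen in between, the inline tags do not break the article apart. The per-site template then reduces to a pair of marker strings passed to extract_article.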
     Do you have any other ideas? I'd greatly appreciate your comments on this.
