Answer: How do I remove a specific keyword from a HTML page

by Foggy Bottoms (Monk)
on Jul 10, 2003 at 15:38 UTC ( #273019=categorized answer )

     Hi kvale, you said that a good general strategy is to use HTML::Parser to decompose HTML into its constituent elements and extract the parts you want with event handlers.
     Even though this seems like a good way to parse HTML and retrieve data, I'm not convinced it is sufficient or efficient for my case: I want to extract the useful information from a web page. By useful information I mean that, when you are reading an article on a newspaper website, you retrieve the article only. To do that you need to find the beginning and the end of the article's body. However, the article itself can contain several HTML tags, and I'm afraid your method would simply split the article apart, turning it into nonsense.
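     To make the concern concrete, here is a minimal sketch of the event-handler approach. It uses Python's standard html.parser as a stand-in for HTML::Parser (the callbacks play a similar role), and it assumes the article sits inside a <div> with a known id. The id "article", the class name and the function name are all hypothetical. Text events are appended to one buffer while the parser is inside the container, so inline tags such as <b> or <a> do not necessarily break the text apart.

```python
from html.parser import HTMLParser


class ContainerText(HTMLParser):
    """Collect the text found inside a <div> that carries a given id."""

    def __init__(self, container_id):
        super().__init__(convert_charrefs=True)
        self.container_id = container_id  # hypothetical, differs per site
        self.depth = 0    # nesting depth of <div> tags inside the container
        self.chunks = []  # text pieces seen while inside the container

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return  # inline tags (<b>, <a>, <p>, ...) do not change state
        if self.depth:
            self.depth += 1  # nested <div> inside the article
        elif dict(attrs).get("id") == self.container_id:
            self.depth = 1   # entering the article container

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1  # reaches 0 when the container closes

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_container_text(html, container_id):
    """Return the container's text with runs of whitespace collapsed."""
    parser = ContainerText(container_id)
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())
```

     Calling extract_container_text(page, "article") returns the article text as one string. This only works when the container can be identified reliably, which is exactly the per-site knowledge the question is about.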
     I haven't found any better way than to look at the HTML code itself and find out whether special tags are used. Newspaper webmasters sometimes use hidden HTML tags, i.e. comments such as <!-- article start-->, but then I need to come up with a template for each newspaper website I analyze.
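     The template idea can be sketched as a small table of per-site comment markers plus a parser that captures text only between them. This is a sketch in Python's standard html.parser, not HTML::Parser. The site key "example-news" and the end marker "article end" are assumptions for illustration; only the start marker appears above.

```python
from html.parser import HTMLParser

# Hypothetical per-site templates: (start comment, end comment).
# Each newspaper would need its own entry, found by reading its HTML.
TEMPLATES = {
    "example-news": ("article start", "article end"),
}


class MarkerText(HTMLParser):
    """Collect text that appears between two marker comments."""

    def __init__(self, start, end):
        super().__init__(convert_charrefs=True)
        self.start = start
        self.end = end
        self.capturing = False
        self.chunks = []

    def handle_comment(self, data):
        marker = data.strip()  # "<!-- article start-->" gives " article start"
        if marker == self.start:
            self.capturing = True
        elif marker == self.end:
            self.capturing = False

    def handle_data(self, data):
        if self.capturing:
            self.chunks.append(data)


def extract_article(html, site):
    """Return the text between the markers configured for the given site."""
    start, end = TEMPLATES[site]
    parser = MarkerText(start, end)
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())
```

     Tags inside the marked region are skipped while their text is kept, so the article comes back in one piece. The cost is the one already mentioned: every site needs its own entry in the table.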
     Do you have any other ideas? I'd greatly appreciate your comments on this.