
Re^2: how to extract iframes from text

by marto (Chancellor)
on May 01, 2013 at 15:17 UTC (#1031586)


in reply to Re: how to extract iframes from text
in thread how to extract iframes from text

Infinite loop for this test file:

<iframe id="derp" src="http://derp" width=800 hight=800> </iframe>

Output:

<iframe id="derp"</index> <iframe id="derp"</index> <iframe id="derp"</index> <iframe id="derp"</index> ....and so on

This is why it's better to use a parser to deal with markup languages where possible.
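As a rough illustration of the parser approach, here is a minimal sketch using HTML::TokeParser (part of the HTML-Parser distribution on CPAN, so it has to be installed). The helper name `iframe_srcs` is made up for this example and is not code from the thread. It assumes the goal is the value of each iframe's src attribute, and it is only one of several parser modules that could do the job.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: return the src attribute of every iframe
# start tag found in an HTML string.
sub iframe_srcs {
    my ($html) = @_;

    # Passing a scalar reference makes the parser read from the string.
    my $parser = HTML::TokeParser->new(\$html)
        or die "Cannot create parser";

    my @srcs;
    # get_tag('iframe') skips ahead to the next <iframe ...> start tag and
    # returns [ $tag, \%attr, \@attrseq, $text ], or undef at end of input,
    # so a missing </iframe> cannot cause an endless loop here.
    while ( my $tag = $parser->get_tag('iframe') ) {
        my $attr = $tag->[1];
        push @srcs, $attr->{src} if defined $attr->{src};
    }
    return @srcs;
}

# The test input from this node, including its unquoted attributes.
my $html = '<iframe id="derp" src="http://derp" width=800 hight=800> </iframe>';
print "$_\n" for iframe_srcs($html);
```

The loop ends when the parser runs out of input rather than when a closing tag is matched, so truncated or sloppy markup terminates cleanly instead of looping.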

Update: added missing " to input file and output.


Re^3: how to extract iframes from text
by B-Man (Acolyte) on May 01, 2013 at 15:45 UTC

    Fair enough. This is actually easily fixable. You just have to replace a bit of code.

    $string = readline TEST;

    becomes

    while ( <TEST> ) { chomp $_; $string .= $_ ; }

    There. Now you've merged separate lines into a single string to search, and this works again. Happy?

    Edit: Oh, and if you're still missing an ending iframe tag, you can check whether one exists in your while loop condition. Heck, you could probably tell the user where they're missing an iframe tag if you took that idea a little further.

      I was never unhappy. I was simply pointing out that your posted solution doesn't work. You provide no caveats for the input, and you don't cater for all valid HTML. Fundamentally, your code doesn't return what OP wants. They want the value of the src attribute, though you'd have had to read the other responses in the thread to know that.

      According to your response in the CB, I'm:

      "acting like the problem can't be fixed, and that's a load of crap."

      "basically saying my idea can't possibly be tweaked to work, so I should just use a parser, marto. The thing is, it can and was tweaked, and you're not as clever as you think."

      "missing the point. I fixed the error, and as long as there's an ending iframe tag from now on, my code works. Shit, I guess I could make sure there's an ending iframe tag too and that would prevent infinite loops caused by invalid html."

      I still think it's you who is missing the point. At no point did I suggest you couldn't write code to properly parse HTML from scratch. However, it'll take you a very long time to create your own HTML parser which caters for all of its foibles, offers a convenient way to access/select each valid attribute and its associated value, and is well tested with many test cases.

      I think perhaps the scope of the problem wasn't something you'd fully considered when posting, or when jumping to conclusions regarding my response. A vast amount of work goes into creating parsers for HTML/XML/whatever which address their requirements and shortfalls. The solution OP has chosen is well tested, and they now have access to a toolkit which makes it trivial to cater for changes in the input/source data. The goal is to write code which works well and is easy to maintain.

      On a non-technical note, I honestly don't care what you think of me. However, I ask that you take a step back and think before acting when communicating online in places such as this. If you post something that has issues, expect people to tell you about it. If you say something in the chatterbox and someone responds, take the time to try to understand what they're saying. Of course you're free to disagree, but there's no need to be rude and jump to bizarre conclusions as to what others are saying or thinking. Few regulars here will intentionally give you bad advice.
