http://www.perlmonks.org?node_id=1031437

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

When I dump my variable, I get this:

<p>No one's telling the truth anymore, and that makes the numbers suspect.</p>\n<p><iframe width=\"480\" height=\"360\" src=\"http://localhost:8000/embed/static/clips/2012/12/17/28210/test-rush\" allowfullscreen=\"\" frameborder=\"0\" scrolling=\"no\"></iframe></p>\n<p>Instead of addressing the fact that some text</p>\n<p><iframe width=\"480\" height=\"360\" src=\"http://localhost:8000/embed//static/video/2012/09/07/fnc-ff-20120907-doocytaxes\" allowfullscreen=\"\" frameborder=\"0\" scrolling=\"no\"></iframe></p>\n<p>The very first example AP cites was already corrected.some text ....Reacting to recent <a href="/blog/2013/04/17/major-errors-undermine-key-argument-for-austeri">research</a> that has questions.</p>\n<p><iframe width=\"480\" height=\"360\" src=\"http://localhost:8000/embed/static/clips/2013/04/29/29939/fnc-an-20130429-hemmermooredebtgdp\" allowfullscreen=\"\" frameborder=\"0\" scrolling=\"no\"></iframe></p>\n Arriving at such a conclusion requires not only obscuring the importance in pushing global austerity <a href="/static/images/item/gdp-components.jpg">strong measures</a> of too little spending.

What would be the easiest way to extract all the <iframe ..... ></iframe> elements from the text? Is a regex an option here?

Re: how to extract iframes from text
by moritz (Cardinal) on Apr 30, 2013 at 17:55 UTC

      Thanks for the quick answer.

      Here is the line of code: print STDERR (Mojo::DOM->new($args->{$t})->at('iframe'));

      This is what I get:

      $VAR1 = '<p>No one\'s telling the truth anymore, and that makes the numbers suspect.</p>
      <p>***<iframe width="480" height="360" src="http://localhost:8000/embed/static/clips/2012/12/17/28210/test-rush" allowfullscreen="" frameborder="0" scrolling="no"></iframe>***</p>
      <p>\\n</p>
      <p>Instead of addressing the fact that some text</p>
      <p>\\n</p>
      <p>***<iframe width="480" height="360" src="http://localhost:8000/embed//static/video/2012/09/07/fnc-ff-20120907-doocytaxes" allowfullscreen="" frameborder="0" scrolling="\\"no\\""></iframe>***</p>
      <p>\\n</p>
      <p>The very first example AP cites was already corrected.some text ....Reacting to recent <a href="/blog/2013/04/17/major-errors-undermine-key-argument-for-austeri">research</a> that has questions.</p>
      <p>\\n</p>
      <p>***<iframe width="480" height="360" src="http://localhost:8000/embed/static/clips/2013/04/29/29939/fnc-an-20130429-hemmermooredebtgdp" allowfullscreen="" frameborder="0" scrolling="no"></iframe>***</p>
      <p>\\n Arriving at such a conclusion requires not only obscuring the importance in pushing global austerity <a href="/static/images/item/gdp-components.jpg">strong measures</a> of too little spending.</p>';
      <iframe allowfullscreen="" frameborder="0" height="360" scrolling="no" src="http://localhost:8000/embed/static/clips/2012/12/17/28210/test-rush" width="480"></iframe>

      It seems to extract some of the non-iframe content too.

      How do I get rid of the non-iframe part, or put just the iframe parts into an array?
Re: how to extract iframes from text
by hdb (Monsignor) on Apr 30, 2013 at 20:15 UTC

    Against all recommendations:

    use strict;
    use warnings;
    use Data::Dumper;

    my $string = <<'IFRAME';
    <p>No one's telling the truth anymore, and that makes the numbers suspect.</p>\n<p><iframe width=\"480\" height=\"360\" src=\"http://localhost:8000/embed/static/clips/2012/12/17/28210/test-rush\" allowfullscreen=\"\" frameborder=\"0\" scrolling=\"no\"></iframe></p>\n<p>Instead of addressing the fact that some text</p>\n<p><iframe width=\"480\" height=\"360\" src=\"http://localhost:8000/embed//static/video/2012/09/07/fnc-ff-20120907-doocytaxes\" allowfullscreen=\"\" frameborder=\"0\" scrolling=\"no\"></iframe></p>\n<p>The very first example AP cites was already corrected.some text ....Reacting to recent <a href="/blog/2013/04/17/major-errors-undermine-key-argument-for-austeri">research</a> that has questions.</p>\n<p><iframe width=\"480\" height=\"360\" src=\"http://localhost:8000/embed/static/clips/2013/04/29/29939/fnc-an-20130429-hemmermooredebtgdp\" allowfullscreen=\"\" frameborder=\"0\" scrolling=\"no\"></iframe></p>\n Arriving at such a conclusion requires not only obscuring the importance in pushing global austerity <a href="/static/images/item/gdp-components.jpg">strong measures</a> of too little spending.
    IFRAME

    my @iframe;
    for my $found ($string =~ /<iframe (.*?)<\/iframe>/gi ) {
        push @iframe, {};
        while($found =~ /(.*?)=(.*?)( |>)/g) {
            $iframe[-1]->{$1} = $2;
        }
    }
    print Dumper( \@iframe );
      Found what I wanted:

      my $collection = Mojo::DOM->new($args->{'body'})->find('iframe');
      my @links;
      my $cpt = 0;
      foreach (@$collection) {
          $links[$cpt] = $_->{src};
          print STDERR Dumper($_->{src});    # access elements of iframe with ->
          $cpt++;
      }
      Thanks for your help.
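
For anyone comparing the two attempts above: in Mojo::DOM, at() returns only the first element matching the selector, while find() returns a collection of all matches, which is why the earlier at('iframe') call printed a single iframe. The following is a minimal sketch of the same idea, assuming a recent Mojolicious release; the sample HTML and the variable names are made up for illustration and are not the poster's data.

```perl
use strict;
use warnings;
use Mojo::DOM;

# Hypothetical sample input, much shorter than the text in the question.
my $html = <<'HTML';
<p>Some text</p>
<p><iframe width="480" height="360" src="http://localhost:8000/embed/a"></iframe></p>
<p>More text</p>
<p><iframe width="480" height="360" src="http://localhost:8000/embed/b"></iframe></p>
HTML

my $dom = Mojo::DOM->new($html);

# at() gives back just the first matching element (undef if nothing matches).
my $first = $dom->at('iframe');

# find() gives back a Mojo::Collection; each() without a callback
# returns its elements as a plain list.
my @iframes = $dom->find('iframe')->each;

# Collect the src attribute of every iframe.
my @srcs = map { $_->attr('src') } @iframes;

print "$_\n" for @srcs;
```

Stringifying an element (for example "$first") yields its HTML, so @iframes can also be printed directly if the whole tag is wanted rather than just the src.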
Re: how to extract iframes from text
by B-Man (Acolyte) on May 01, 2013 at 14:48 UTC

    I actually developed some code to extract this. I'm not sure if it's the "easiest" way, but you don't need any external modules to make it work!

    Yeah, I put the test string in a text file so I didn't have to worry about escaping anything in a string literal.

      Infinite loop for this test file:

      <iframe id="derp" src="http://derp" width=800 hight=800> </iframe>

      Output:

      <iframe id="derp"</index> <iframe id="derp"</index> <iframe id="derp"</index> <iframe id="derp"</index> ....and so on

      This is why it's better to use a parser to deal with markup languages where possible.

      Update: added missing " to input file and output.

        Fair enough. This is actually easily fixable. You just have to replace a bit of code.

        $string = readline TEST;

        becomes

        while ( <TEST> ) { chomp $_; $string .= $_ ; }

        There. Now you've merged separate lines into a single string to search, and this works again. Happy?

        Edit: Oh, and if you're still missing an ending iframe tag, you can check whether one exists in your while loop condition. Heck, you could probably tell the user where they're missing an iframe tag if you took that idea a little further.
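
The two ideas in this reply (search the whole file as one string, and notice a missing closing tag instead of looping) can be sketched with core Perl only. This is not B-Man's original code, which is not shown in the thread; the helper name extract_iframes and the sample input are hypothetical. It assumes ASCII markup and the exact closing form </iframe>, so it remains a rough illustration, not a substitute for a real HTML parser.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread). Returns two array refs:
# the complete <iframe>...</iframe> blocks, and the offsets of any
# opening tag that has no closing tag after it.
sub extract_iframes {
    my ($text) = @_;
    my ( @found, @unclosed );
    my $lower = lc $text;    # search a lowercased copy; assumes ASCII markup
    my $pos   = 0;

    while ( ( my $open = index( $lower, '<iframe', $pos ) ) >= 0 ) {
        my $close = index( $lower, '</iframe>', $open );
        if ( $close < 0 ) {
            # No closing tag: record where the opener is and stop,
            # rather than searching the same spot forever.
            push @unclosed, $open;
            last;
        }
        my $end = $close + length '</iframe>';
        push @found, substr( $text, $open, $end - $open );
        $pos = $end;         # always move past the block just handled
    }
    return ( \@found, \@unclosed );
}

# Made-up input: one iframe spread over several lines, one never closed.
my $sample = <<'HTML';
<iframe id="derp" src="http://derp"
width=800 height=800>
</iframe>
<p>text</p>
<iframe id="broken" src="http://derp2">
HTML

# Read the whole (in-memory) file at once instead of a single readline.
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my $text = do { local $/; <$fh> };
close $fh;

my ( $found, $unclosed ) = extract_iframes($text);
print "Complete iframes: ", scalar @$found, "\n";
print "Unclosed <iframe> at offset $_\n" for @$unclosed;
```

Because the search position only ever moves forward and the loop exits on the first unclosed tag, the loop cannot spin on the same match, which is the failure reported for the multi-line test file.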