Re: how to extract iframes from text
by moritz (Cardinal) on Apr 30, 2013 at 17:55 UTC
|
Regex is an option, but it usually doesn't work well for nested delimiters (which can happen with iframes).
I personally like Mojo::DOM for this kind of task:
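use feature 'say';   # say() is not enabled by default
use Mojo::DOM;
# at() returns only the first matching element
say Mojo::DOM->new($yourstring)->at('iframe');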
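Since at() stops at the first match, collecting every iframe needs find() instead. A minimal sketch, assuming a reasonably recent Mojolicious is installed; the sample HTML and the variable names are made up for illustration:

```perl
use strict;
use warnings;
use Mojo::DOM;

# Hypothetical sample input, standing in for the real page body
my $html = <<'HTML';
<p>Some text.</p>
<p><iframe width="480" height="360" src="http://localhost:8000/embed/a"></iframe></p>
<p>More text.</p>
<p><iframe width="480" height="360" src="http://localhost:8000/embed/b"></iframe></p>
HTML

my $dom = Mojo::DOM->new($html);

# find() returns a Mojo::Collection of every matching element;
# each() turns it into a plain Perl list
my @iframes = map { $_->to_string } $dom->find('iframe')->each;

# Pull out just the src attribute of each iframe
my @sources = $dom->find('iframe')->map(attr => 'src')->each;

print "$_\n" for @sources;
```

@iframes then holds only the iframe markup, and @sources only the URLs, with none of the surrounding paragraphs.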
|
$VAR1 = '<p>No one\'s telling the truth anymore, and that makes the numbers suspect.</p>
<p>***<iframe width="480" height="360" src="http://localhost:8000/embed/static/clips/2012/12/17/28210/test-rush" allowfullscreen="" frameborder="0" scrolling="no"></iframe>***</p>
<p>\\n</p>
<p>Instead of addressing the fact that some text</p>
<p>\\n</p>
<p>***<iframe width="480" height="360" src="http://localhost:8000/embed//static/video/2012/09/07/fnc-ff-20120907-doocytaxes" allowfullscreen="" frameborder="0" scrolling="\\"no\\""></iframe>***</p>
<p>\\n</p>
<p>The very first example AP cites was already corrected.some text ....Reacting to recent <a href="/blog/2013/04/17/major-errors-undermine-key-argument-for-austeri">research</a> that has questions.</p>
<p>\\n</p>
<p>***<iframe width="480" height="360" src="http://localhost:8000/embed/static/clips/2013/04/29/29939/fnc-an-20130429-hemmermooredebtgdp" allowfullscreen="" frameborder="0" scrolling="no"></iframe>***</p>
<p>\\n Arriving at such a conclusion requires not only obscuring the importance in pushing global austerity <a href="/static/images/item/gdp-components.jpg">strong measures</a> of too little spending.</p>';
<iframe allowfullscreen="" frameborder="0" height="360" scrolling="no" src="http://localhost:8000/embed/static/clips/2012/12/17/28210/test-rush" width="480"></iframe>
It seems to extract some of the non-iframe content too.
How can I get rid of the non-iframe part, or put just the iframes in an array?
|
The output you are showing looks like it was produced by Data::Dumper, yet the line print STDERR (Mojo::DOM->new($args->{$t})->at('iframe')); doesn't contain any references to Data::Dumper.
Could it be that you are confused, and the output is produced by some other code?
|
Re: how to extract iframes from text
by hdb (Monsignor) on Apr 30, 2013 at 20:15 UTC
|
use strict;
use warnings;
use Data::Dumper;

# Single-quoted heredoc: \n and \" stay as literal characters
my $string = <<'IFRAME';
<p>No one's telling the truth anymore, and that makes the numbers suspect.</p>\n<p><iframe width=\"480\" height=\"360\" src=\"http://localhost:8000/embed/static/clips/2012/12/17/28210/test-rush\" allowfullscreen=\"\" frameborder=\"0\" scrolling=\"no\"></iframe></p>\n<p>Instead of addressing the fact that some text</p>\n<p><iframe width=\"480\" height=\"360\" src=\"http://localhost:8000/embed//static/video/2012/09/07/fnc-ff-20120907-doocytaxes\" allowfullscreen=\"\" frameborder=\"0\" scrolling=\"no\"></iframe></p>\n<p>The very first example AP cites was already corrected.some text ....Reacting to recent <a href="/blog/2013/04/17/major-errors-undermine-key-argument-for-austeri">research</a> that has questions.</p>\n<p><iframe width=\"480\" height=\"360\" src=\"http://localhost:8000/embed/static/clips/2013/04/29/29939/fnc-an-20130429-hemmermooredebtgdp\" allowfullscreen=\"\" frameborder=\"0\" scrolling=\"no\"></iframe></p>\n Arriving at such a conclusion requires not only obscuring the importance in pushing global austerity <a href="/static/images/item/gdp-components.jpg">strong measures</a> of too little spending.
IFRAME

my @iframe;
# Capture the attribute text of every iframe on the string
for my $found ($string =~ /<iframe (.*?)<\/iframe>/gi ) {
    push @iframe, {};
    # Split the attribute text into name=value pairs
    while ($found =~ /(.*?)=(.*?)( |>)/g) {
        $iframe[-1]->{$1} = $2;
    }
}
print Dumper( \@iframe );
|
my $collection = Mojo::DOM->new($args->{'body'})->find('iframe');
my @links;
my $cpt = 0;
foreach (@$collection) {
    $links[$cpt] = $_->{src};
    print STDERR Dumper($_->{src}); # access elements of iframe with ->
    $cpt++;
}
Thanks for your help.
Re: how to extract iframes from text
by B-Man (Acolyte) on May 01, 2013 at 14:48 UTC
|
I actually developed some code to extract this. I'm not sure if it's the "easiest" way, but you don't need any external modules to make it work!
Yeah, I put the test string in a text file so I didn't have to worry about escaping anything in a string literal.
|
<iframe id="derp"
src="http://derp"
width=800
hight=800>
</iframe>
Output:
<iframe id="derp"</index>
<iframe id="derp"</index>
<iframe id="derp"</index>
<iframe id="derp"</index>
....and so on
This is why it's better to use a parser to deal with markup languages where possible.
Update: added missing " to input file and output.
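For comparison, here is a rough sketch of how a parser copes with the same multi-line input. It assumes Mojolicious is installed and uses Mojo::DOM, which was suggested earlier in the thread; the variable names are only for illustration:

```perl
use strict;
use warnings;
use Mojo::DOM;

# The same input as above: one iframe tag spread over several lines
my $html = <<'HTML';
<iframe id="derp"
src="http://derp"
width=800
hight=800>
</iframe>
HTML

# at() returns the first matching element (undef if there is none)
my $iframe = Mojo::DOM->new($html)->at('iframe');

# attr() without arguments returns a hash reference of all attributes
my %attr = %{ $iframe->attr };

print "$_ = $attr{$_}\n" for sort keys %attr;
```

The line breaks inside the tag and the unquoted attribute values make no difference to the parser, so no line merging is needed first.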
|
$string = readline TEST;
becomes
while (<TEST>) {
    chomp;
    $string .= $_;
}
There. Now you've merged separate lines into a single string to search, and this works again. Happy?
Edit: Oh, and if you're still missing an ending iframe tag, you can check for that in your while loop condition. Heck, you could probably tell the user where they're missing an iframe tag if you took that idea a little further.
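One way to sketch the same idea with core Perl only: read the whole input in one go and let the /s modifier allow . to match newlines, then compare the number of opening and closing tags. The sample input and the in-memory filehandle are stand-ins for the real test file, and this is still a regex approach with the limits discussed above:

```perl
use strict;
use warnings;

# Hypothetical sample input; an in-memory filehandle stands in for the file
my $input = qq{<p>before</p>\n<iframe id="derp"\nsrc="http://derp"\nwidth=800>\n</iframe>\n<p>after</p>\n};
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";

# Slurp mode: with $/ undefined, one read returns the entire contents
my $string = do { local $/; <$fh> };
close $fh;

# /s lets . match newlines, /g collects every match, /i ignores case
my @iframes = $string =~ m{(<iframe\b.*?</iframe>)}gis;

# Count opening and closing tags to spot an unterminated iframe
my $opens  = () = $string =~ /<iframe\b/gi;
my $closes = () = $string =~ m{</iframe\s*>}gi;
warn "Found $opens opening but only $closes closing iframe tags\n"
    if $opens > $closes;

print scalar(@iframes), " iframe(s) found\n";
```

Unlike the chomp-and-append loop, this keeps the original line breaks inside each captured iframe.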
|