Help with Toke Parser

by StarkRavingCalm (Sexton)
on Oct 27, 2015 at 20:41 UTC (#1146178)
StarkRavingCalm has asked for the wisdom of the Perl Monks concerning the following question:

Good day, monks.

I have a script that will be used in a larger script but this part is giving me some trouble.

My goal for this part of the script is to take the file listing on a webpage and load it into a hash, with the filename as key and the file size as value.

But I have been unable to find a way to get the file size, so I have tried to do it with just an array of filenames.

Here is the code as it currently stands. The issue is that printing the array outside the loop shows only the last element, while printing it inside the loop shows all of them. If anyone has a way to get it to work with a hash as mentioned above, I'd prefer that to messing with the array problem.

#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
use HTML::TokeParser;
use LWP::Simple;
use File::Basename;
use List::Compare;

## POC URL:
my $page=get('http://localhost/images');
my %urlhash;
my @urlfiles;
my @array;
my @newarray;

my $p= HTML::TokeParser->new(\$page);
while (my $token = $p->get_tag("a")) {
    @array = $token->[1]{href} || "-";
    my $text = $p->get_trimmed_text("/a");
    ## Just a few lines of crap cleaner...
    for (@array) {s/test.txt//g};
    for (@array) {s/\///g};
    for (@array) {s/\?C\=N;O\=D//g};
    for (@array) {s/\?C\=M;O\=A//g};
    for (@array) {s/\?C\=S;O\=A//g};
    for (@array) {s/\?C\=D;O\=A//g};
    #print "@array\n";
}
print "@array\n";

The crap cleaner section strips unwanted entries from the listing, such as test.txt and the column-sort links (?C=N;O=D and so on). I have removed them on my POC Apache server, but they exist on the server I will run this against, which I have no control over.

Thanks in advance!

Replies are listed 'Best First'.
Re: Help with Toke Parser
by tangent (Vicar) on Oct 27, 2015 at 21:08 UTC
    As you are already using LWP::Simple you can use that module's head() function to retrieve the size of the file:

    my %hash;
    while ( my $token = $p->get_tag("a") ) {
        if ( my $href = $token->[1]{'href'} ) {
            # may need to prefix domain to $href
            my ($type, $length, $mod, $exp, $server) = head($href);
            $hash{$href} = $length;
        }
    }

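
    A minimal sketch of the same idea, arranged so it can be tried without a network connection. The file_sizes() sub, its $head argument, the base URL and the file names are all made up for illustration; only head() itself comes from LWP::Simple. It assumes every href is a bare file name relative to the listing URL, so the "prefix domain" step is plain string concatenation.

```perl
use strict;
use warnings;

# Build a filename => size hash for a list of hrefs.
# $base  : listing URL the hrefs are relative to (assumed value below)
# $hrefs : array ref of href values pulled from the <a> tags
# $head  : optional code ref returning the same list as LWP::Simple's head();
#          a hypothetical hook so the sub can be exercised offline
sub file_sizes {
    my ($base, $hrefs, $head) = @_;
    $head ||= do {
        require LWP::Simple;
        \&LWP::Simple::head;
    };

    my %size_of;
    for my $href (@$hrefs) {
        # In list context head() returns
        # (content_type, document_length, modified_time, expires, server),
        # or an empty list if the request fails.
        my ($type, $length) = $head->($base . $href);
        next unless defined $length;    # failed request or no Content-Length
        $size_of{$href} = $length;
    }
    return %size_of;
}

# Offline demonstration: a stub stands in for real HEAD requests.
my %fake = (
    'http://localhost/images/a.png' => 1024,
    'http://localhost/images/b.png' => 2048,
);
my $stub = sub {
    my ($url) = @_;
    return exists $fake{$url} ? ('image/png', $fake{$url}) : ();
};

my %size_of = file_sizes('http://localhost/images/', [qw(a.png b.png missing.png)], $stub);
print "$_ => $size_of{$_}\n" for sort keys %size_of;
```

    Leaving out the third argument makes the sub fall back to the real head(). Entries whose request fails, or whose response carries no Content-Length, are skipped rather than stored as undef.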
      Awesome! Thanks. Works great. Would still love to use a hash but this will get me to where I need for now.
Re: Help with Toke Parser
by stevieb (Abbot) on Oct 27, 2015 at 20:52 UTC

    I haven't ever used HTML::TokeParser, so I don't know how to get the file's size, but your array issue looks like it stems from the fact that you're overwriting the array on each pass through the loop. That explains why you only get the last element (actually, the array only ever holds a single element, the one produced in the last iteration of while()):

    @array = $token->[1]{href} || "-";

    I think what you want is this instead (see push):

    push @array, $token->[1]{href} || "-";
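
    The difference can be seen in isolation with a small sketch. The href values here are made up and stand in for what get_tag("a") would yield:

```perl
use strict;
use warnings;

# Made-up href values; undef mimics an <a> tag without an href attribute.
my @hrefs = ('a.png', undef, 'b.png');

my (@assigned, @pushed);
for my $href (@hrefs) {
    @assigned = $href || "-";      # replaces the whole array on every pass
    push @pushed, $href || "-";    # appends one element per pass
}

print scalar(@assigned), " element: @assigned\n";
print scalar(@pushed), " elements: @pushed\n";
```

    After the loop, @assigned holds only the value from the final pass, while @pushed holds one entry per tag.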

    Then it may be best to do the cleanup after the while() loop:

    my $p= HTML::TokeParser->new(\$page);
    while (my $token = $p->get_tag("a")) {
        push @array, $token->[1]{href} || "-";
        my $text = $p->get_trimmed_text("/a");
    }
    for (@array){
        next if /^-$/; # skip if line eq '-'
        s/
            test.txt
            | \/
            | \?C\=N;O\=D
            | \?C\=M;O\=A
            | \?C\=S;O\=A
            | \?C\=D;O\=A
        //xg;
    }

    To understand how I've turned your multiple regexes into a single one with embedded whitespace for clarity, see x modifier in perlre.
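
    As a rough illustration of how that combined regex behaves, here it is run over a made-up list of hrefs resembling an Apache index page. The sample data and the final grep are assumptions added for the demonstration, not part of the original script:

```perl
use strict;
use warnings;

# Made-up hrefs resembling an Apache index page.
my @array = ('?C=N;O=D', '?C=M;O=A', '/', 'test.txt', 'photo1.jpg', '-');

for (@array) {
    next if /^-$/;      # leave the "-" placeholder untouched
    s/
        test.txt        # the unwanted test file (the . matches any character)
      | \/              # any slash
      | \?C\=N;O\=D     # column-sort links
      | \?C\=M;O\=A
      | \?C\=S;O\=A
      | \?C\=D;O\=A
    //xg;
}

# Entries that matched are now empty strings; keep only the real names.
my @files = grep { length($_) && $_ ne '-' } @array;
print "@files\n";
```

    Under x, whitespace and # comments inside the pattern are ignored, so each alternative can sit on its own line. The substitution empties matching entries rather than removing them from the array, which is why a filtering step like the grep is needed if only file names should remain.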
