http://www.perlmonks.org?node_id=1011827

SilverShadow has asked for the wisdom of the Perl Monks concerning the following question:

How can I pass the output of "$token->as_is" to a variable in the following code, so that I can strip out extra spaces before printing it on screen, and also do other things with the output later?

I'd rather not use extra modules, so that the code doesn't get any bigger; I prefer to apply a regex on the fly at the final stage.

The commented-out (#) lines are my earlier attempts, so you can ignore them.

Also, I wonder why code is shown in such a small font on this site. It's very hard to read unless I click the download link, and it's not comfortable to keep clicking to display the code while following the discussion.

Thanks

    use HTML::TokeParser::Simple;

    my $p = HTML::TokeParser::Simple->new(url => 'http://domain.com/?xxxxxxx');
    my $level;

    while (my $tag = $p->get_tag('div')) {
        my $class = $tag->get_attr('id');
        next unless defined($class) and $class eq 'content';

        $level += 1;

        while (my $token = $p->get_token) {
            $level += 1 if $token->is_start_tag('div');
            $level -= 1 if $token->is_end_tag('div');

            #$_ = s/<([\w-\:]+)>(.*?)<\/\1>/$2 /g;
            #print $_;

            next unless $token->is_text;

            #$cleaned = $token->as_is =~ s/\s{2,}/ /gs;   # should remove extra spaces
            #print $cleaned;

            print $token->as_is;

            unless ($level) {
                last;
            }
        }
    }

Re: passing token output to a variable
by frozenwithjoy (Priest) on Jan 06, 2013 at 04:50 UTC
    The way you have it written, $cleaned just gets set to the number of substitutions that occurred. See this example:
    #!/usr/bin/env perl
    use strict;
    use warnings;
    use feature 'say';

    my $string = "This  is   a    string with  variable   numbers    of spaces.";
    say "original: $string";

    my $number_of_substitutions = $string =~ s|\s{2,}| |g;
    say "cleaned: $string";
    say "# of substitutions: $number_of_substitutions";

    __END__
    original: This  is   a    string with  variable   numbers    of spaces.
    cleaned: This is a string with variable numbers of spaces.
    # of substitutions: 6
    One alternative for you would be:
    my $cleaned = $token->as_is;
    $cleaned =~ s/\s{2,}/ /g;    # I took out the /s modifier: on s/// it only lets . match newlines,
                                 # and this pattern has no . (On tr/// the /s flag squeezes repeats, e.g. $cleaned =~ tr/ //s.)
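A minimal, self-contained sketch of the copy-then-substitute idiom above, runnable without HTML::TokeParser::Simple: the hypothetical string $raw stands in for whatever $token->as_is returns. It also contrasts s/\s{2,}/ /g with the tr/ //s squeeze from the comment; tr only squeezes runs of the literal space character, while \s also matches tabs and newlines.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical stand-in for $token->as_is: runs of spaces, a tab and a newline.
my $raw = "Hello   world \t and\n  more";

# Copy first, then substitute on the copy; $raw itself is left untouched.
my $cleaned = $raw;
$cleaned =~ s/\s{2,}/ /g;    # any run of 2+ whitespace characters becomes one space

# For comparison: tr/ //s squeezes runs of the space character only,
# so the tab and the newline survive.
my $squeezed = $raw;
$squeezed =~ tr/ //s;

print "cleaned:  [$cleaned]\n";
print "squeezed: [$squeezed]\n";
```

In practice the first form is usually what you want for text pulled out of HTML, since indentation there often mixes newlines, tabs and spaces.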
    A second alternative, if you are using 5.14+, is non-destructive substitution (with the /r modifier):
    my $cleaned = $token->as_is =~ s/\s{2,}/ /gr;
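A sketch of the /r form, again with a hypothetical $raw standing in for $token->as_is. Because /r returns the modified copy instead of a count, the original is left alone and the results can be chained; the trimming step is an extra assumption about what "strip out extra spaces" might also cover, not part of the answer above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use 5.014;    # the /r modifier needs Perl 5.14 or later; this also enables say

# Hypothetical stand-in for $token->as_is.
my $raw = "  too    many   spaces ";

# /r returns the substituted copy and leaves $raw unchanged.
my $cleaned = $raw =~ s/\s{2,}/ /gr;

# /r results can be chained: collapse runs, then trim both ends (extra step).
my $trimmed = $raw =~ s/\s{2,}/ /gr =~ s/^\s+|\s+$//gr;

say "[$cleaned]";
say "[$trimmed]";
```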
      Thank you frozenwithjoy, your code works perfectly as a standalone script, but when I use it in my code, the script just returns to the command prompt without giving any output, warnings or error messages!
        Never mind, I fixed it; it works now! Kind regards
Re: passing token output to a variable
by Athanasius (Archbishop) on Jan 06, 2013 at 05:41 UTC
      Thank you very much, bro, it works! What about e-mail notifications for node replies? :) I can't find them anywhere in the settings.
Re: passing token output to a variable (parens force assignment before substitution)
by Anonymous Monk on Jan 06, 2013 at 07:57 UTC

    To see your program the way perl sees it, run

    perl -MO=Deparse,-p foo.pl
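The same parenthesised view is available from inside a script through the core B::Deparse module, which is what -MO=Deparse uses. A sketch, assuming a plain string $text in place of $token->as_is so the sub compiles without the HTML module; the exact layout of the printed source can vary between Perl versions.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use B::Deparse;    # core module behind perl -MO=Deparse

# The unparenthesised form from the question, wrapped in a sub so it can be deparsed.
my $code = sub {
    my $text    = "a  b";    # hypothetical stand-in for $token->as_is
    my $cleaned = $text =~ s/\s{2,}/ /g;
    return $cleaned;
};

# '-p' asks for explicit parentheses, like perl -MO=Deparse,-p
my $deparser = B::Deparse->new('-p');
my $source   = $deparser->coderef2text($code);
print "$source\n";
```

The printed source should show the substitution grouped with $text, and the assignment to $cleaned wrapped around that group.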

    Then write that expression as

    ( $cleaned = $token->as_is ) =~ s/\s{2,}/ /gs;

    The parentheses force the assignment to happen before the substitution; without them, =~ binds more tightly than =.
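A small sketch contrasting the two forms, with a hypothetical $raw in place of $token->as_is. The /s flag is dropped here because the pattern contains no dot, so it would have no effect.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical stand-in for $token->as_is.
my $raw = "one  two   three";

# Without parens: =~ binds tighter than =, so $count receives the return
# value of s/// (the number of replacements) while $work is edited in place.
my $work  = $raw;
my $count = $work =~ s/\s{2,}/ /g;

# With parens: the assignment happens first, then s/// edits $cleaned.
( my $cleaned = $raw ) =~ s/\s{2,}/ /g;

print "count:   $count\n";
print "cleaned: $cleaned\n";
print "raw:     $raw\n";
```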

      Thanks, I will try that.