http://www.perlmonks.org?node_id=133876

mvaline has asked for the wisdom of the Perl Monks concerning the following question:

I wrote the following script to extract some information from a web page and print it to the screen. The web page is absolutely basic HTML, so my paltry regexp works fine.

The problem is in the print statements at the end, where I attempt to print out the information I extracted. As shown below, the script only prints out the contents of $avg_size. If I comment out all but any one of the print statements, that variable prints out correctly.

I have tried forcing the screen to autoflush using the STDOUT->autoflush(1) method of the FileHandle module and by setting $| = 1. Neither option had any effect.

#!/usr/bin/perl -w
use strict;
use LWP::Simple;

#### retrieve web page
my $url = "http://www.visionforum.com/admin/avantgo.asp";
my $page_content = get($url);

#### remove extraneous information and HTML markup
(my $plain_text = $page_content) =~ s/<[^>]*>//gs;

#### extract relevant info
my @lines = split /\n/, $plain_text;
my $title = $lines[10];
my $timestamp = $lines[11];
my $shoppers = $lines[13];
my $num_orders = $lines[17];
my $gross_sales = $lines[18];
my $avg_size= $lines[20];

#### output
print $title;
print $timestamp;
print $shoppers;
print $num_orders;
print $gross_sales;
print $avg_size;
exit;

Replies are listed 'Best First'.
Re: Screen buffering mystery
by grep (Monsignor) on Dec 22, 2001 at 01:03 UTC
    You have \r\n's coming in as your line terminators.
    Change
        my @lines = split /\n/, $plain_text;
    to
        my @lines = split /\r\n/, $plain_text;
    HTH
    grep
    grep> cd pub 
    grep> more beer
    
Re: Screen buffering mystery
by derby (Abbot) on Dec 22, 2001 at 01:09 UTC
    mvaline,

    running your code through the debugger we see this

    DB<1> x @lines
    0  "\cM"
    1  "\cM"
    2  "\cM"
    3  "\cM"
    4  "VisionForum.com Stats for AvantGo\cM"
    5  "\cM"
    6  "\cM"
    7  "\cM"
    8  "\cM"
    9  "\cM"
    10  "VisionForum.com Stats\cM"
    11  "2:00:43 PM - December 21, 2001\cM"
    12  "\cM"
    13  "Number of Shoppers: 93\cM"
    14  "\cM"
    15  "\cM"
    16  "\cM"
    17  "Today's Orders: 43\cM"
    18  "Today's Gross Sales: \$4,435.63\cM"
    19  "\cM"
    20  "Average Order Size: \$103.15\cM"
    21  "\cM"
    22  "\cM"
    23  "\cM"
    24  "\cM"

    so you'll need to chop off those Ctrl-M's one by one or change your split to:

    my @lines = split /\r\n/, $plain_text;
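Both routes mentioned above (stripping the Ctrl-M's or changing the split) can be sketched roughly as follows. This is only an illustration: the sample string is a made-up stand-in for the fetched page, the @stripped array is a name introduced here, and the /\r?\n/ pattern is a small variation on the suggested /\r\n/ that should also tolerate input with bare newlines.

```perl
use strict;
use warnings;

# Hypothetical sample standing in for the fetched page text (CRLF line endings).
my $plain_text = "VisionForum.com Stats\r\nNumber of Shoppers: 93\r\n";

# Option 1: split on an optional CR followed by LF, so CRLF and plain LF both work.
my @lines = split /\r?\n/, $plain_text;

# Option 2: keep the original split on LF and strip the trailing CR from each element.
my @stripped = split /\n/, $plain_text;
s/\r\z// for @stripped;

# Either way, add the newline back explicitly when printing.
print "$_\n" for @lines;
```

With either option the elements no longer end in a carriage return, so each print starts on its own line once a newline is appended.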

    -derby

Re: Screen buffering mystery
by Albannach (Monsignor) on Dec 22, 2001 at 01:22 UTC
    As grep says, you are printing everything, but there are no newlines in any of those variables since split removes them. The output variables overwrite each other because each ends in a carriage return character (this can often be seen in the output as the one line you do get sometimes has garbage at the end left over from previous lines). One solution which builds on grep's and adds a somewhat simpler print statement follows:
    my @lines = map{"$_\n"} split /\r\n/, $plain_text;
    print @lines[10,11,13,17,18,20];
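To see the overwriting effect described above without the debugger, one option is to make the control characters visible before printing. The sketch below assumes a small helper named show (hypothetical, not part of the original script) and two made-up values shaped like the elements of @lines.

```perl
use strict;
use warnings;

# Hypothetical values shaped like the elements of @lines in the original script.
my @fields = ("Today's Orders: 43\r", "Average Order Size: \$103.15\r");

# Return a copy of the string with CR and LF rendered as literal \r and \n.
sub show {
    my ($s) = @_;
    (my $visible = $s) =~ s/\r/\\r/g;
    $visible =~ s/\n/\\n/g;
    return $visible;
}

# Each value is printed with its trailing carriage return spelled out.
print show($_), "\n" for @fields;
```

Printed raw and without newlines, each of these strings would return the cursor to column 0, which is why only the last one remains visible on a terminal.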
    If you want to keep the meanings of the various line numbers intact in the code, but avoid creating independent variables for that purpose, you could do something like the following:
    use strict;
    use LWP::Simple;

    my %info_lines = (
        title       => 10,
        timestamp   => 11,
        shoppers    => 13,
        num_orders  => 17,
        gross_sales => 18,
        avg_size    => 20,
    );

    my $url = "http://www.visionforum.com/admin/avantgo.asp";
    my $page_content = get($url);
    (my $plain_text = $page_content) =~ s/<[^>]*>//gs;
    my @lines = map{"$_\n"} split /\r\n/, $plain_text;
    print $lines[$_] for sort values %info_lines;
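One caveat with the hash approach: a bare sort compares the line numbers as strings. That happens to give the right order for the two-digit indices here, but it would not if the indices had different numbers of digits (9 would sort after 10). A possible variation, sketched below with made-up data in place of the fetched page and an @ordered array introduced only for illustration, sorts the field names numerically by line number and prints each name as a label.

```perl
use strict;
use warnings;

my %info_lines = (title => 10, timestamp => 11, shoppers => 13);

# Hypothetical stand-in for the split page; only the indices used above are filled.
my @lines;
@lines[10, 11, 13] = (
    "VisionForum.com Stats",
    "2:00:43 PM - December 21, 2001",
    "Number of Shoppers: 93",
);

# Order the field names by their line number, compared numerically with <=>.
my @ordered = sort { $info_lines{$a} <=> $info_lines{$b} } keys %info_lines;

# Print each field name followed by the line it refers to.
print "$_: $lines[$info_lines{$_}]\n" for @ordered;
```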

    --
    I'd like to be able to assign to an luser