PerlMonks  

Parsing issues

by robertw (Sexton)
on Nov 16, 2012 at 00:57 UTC (#1004093=perlquestion)
robertw has asked for the wisdom of the Perl Monks concerning the following question:

I have a really weird parsing issue. I split on htmlpagemark, a marker I put at the beginning of each saved HTML page, but it does not split there; it seems to split on newlines instead.

    @differenthtml = split(/htmlpagemark/,@lines);
    print "\n$differenthtml[1]\n";
    print "\n$differenthtml[2]\n";
    #@lines = htmlpagemark http://finance.yahoo.com
    #/q/hp?s=%5EDJI&d=10&e=8&f=2012&g=d&a=0&b=2&c=1992&z=66&y=5214
    # <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
    #"http://www.w3.org/TR/html4/strict.dtd">
    # It also consists of a lot of more text before the next
    #"htmlpagemark"
    #sorry if this question reveals to be noobish and simple
    #but i have no idea how to solve this issue the @lines =
    #contents of a read text file

It prints nothing, only empty lines: ""

Re: Parsing issues
by Athanasius (Monsignor) on Nov 16, 2012 at 02:44 UTC

    split treats its second argument as a string (i.e. a scalar), so @lines is evaluated in scalar context and yields its element count rather than its contents. You need something like this:

    my @differenthtml = split /htmlpagemark/, join('', @lines);
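    To make the scalar-context point concrete, here is a minimal sketch. The contents of @lines are a hypothetical stand-in for the lines read from the saved file, and the example.com URLs are placeholders, not the original data.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the lines read from the saved file.
my @lines = (
    "htmlpagemark http://example.com/one\n",
    "first page body\n",
    "htmlpagemark http://example.com/two\n",
    "second page body\n",
);

# An array in scalar context yields its element count, not its contents.
my $count = @lines;
print "scalar(\@lines) is $count\n";

# Joining first gives split the actual text to work on.
my @pages = split /htmlpagemark/, join('', @lines);

# The text starts with the marker, so the first field is an empty string.
shift @pages if @pages && $pages[0] eq '';

for my $i (0 .. $#pages) {
    print "Page $i:$pages[$i]";
}
```

    Because the data begins with the marker, split returns an empty leading field; the sketch drops it so that index 0 holds the first real page.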

    Two additional points (with apologies if they’re unnecessary):

    1. The first element of an array is at index 0, not 1 as suggested by the print statements.

    2. Always — yes, always! — begin your script with:

       use strict;
       use warnings;
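    A small sketch of both points, assuming a hypothetical two-element array in place of the real split result. With warnings enabled, interpolating an element that does not exist reports "Use of uninitialized value" rather than silently printing an empty line, so the loop checks with defined first.

```perl
use strict;
use warnings;

# Hypothetical result of a split; the real data would come from the file.
my @differenthtml = ('page A', 'page B');

print "first: $differenthtml[0]\n";    # index 0 is the first element
print "last:  $differenthtml[-1]\n";   # negative indices count from the end
print "count: ", scalar(@differenthtml), "\n";

# Check that an element exists before printing it.
for my $i (0 .. 2) {
    if (defined $differenthtml[$i]) {
        print "[$i] $differenthtml[$i]\n";
    }
    else {
        print "[$i] no such element\n";
    }
}
```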

    Hope that helps,

    Athanasius <°(((><contra mundum

Re: Parsing issues
by NetWallah (Abbot) on Nov 16, 2012 at 05:23 UTC
    The code below may give you some hints on how to process your data:
    use strict;
    use warnings;
    my @differenthtml;
    $/="htmlpagemark";   # read records delimited by the marker, not by newline
    while (<DATA>){
        chomp;           # strip the trailing marker from the record
        next if $_ eq "htmlpagemark";
        next unless length($_) > 0;   # skip the empty record before the first marker
        push @differenthtml, $_;
    }
    for my $item (0..$#differenthtml){
        print "===Item $item ==\n$differenthtml[$item]\n"
    }
    __DATA__
    htmlpagemark http://finance.yahoo.com
    #/q/hp?s=%5EDJI&d=10&e=8&f=2012&g=d&a=0&b=2&c=1992&z=66&y=5214
    # <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
    #"http://www.w3.org/TR/html4/strict.dtd">
    # It also consists of a lot of more text before the next
    #"h t m l p a g e m a r k"
    htmlpagemark http://another/URL
    More text
    #sorry if this question reveals to be noobish and simple
    #but i have no idea how to solve this issue the=
    #contents of a read text file
    END
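    The same record-separator idea can be wrapped in a subroutine so that the change to $/ does not leak into the rest of the script. This is only a sketch: read_pages is a hypothetical helper name, and an in-memory filehandle with placeholder URLs stands in for the real saved file.

```perl
use strict;
use warnings;

# Hypothetical helper: read marker-delimited records from an open filehandle.
sub read_pages {
    my ($fh, $marker) = @_;
    local $/ = $marker;    # restored automatically when the sub returns
    my @pages;
    while (my $record = <$fh>) {
        chomp $record;                 # removes a trailing marker, if present
        next unless length $record;    # skip the empty record before the first marker
        push @pages, $record;
    }
    return @pages;
}

# In-memory filehandle standing in for the saved file.
my $text = "htmlpagemark http://example.com/one\nbody one\n"
         . "htmlpagemark http://example.com/two\nbody two\n";
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
my @pages = read_pages($fh, 'htmlpagemark');
close $fh;

print scalar(@pages), " pages read\n";
```

    Using local keeps the default newline separator in effect everywhere outside the helper, which matters if the script later reads other files line by line.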

    "By three methods we may learn wisdom: First, by reflection, which is noblest; Second, by imitation, which is easiest; and third by experience, which is the bitterest." -Confucius
