PerlMonks  

Parsing a long string

by Anonymous Monk
on Aug 16, 2013 at 18:06 UTC (id://1049765)

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello there Monks!
I have a long string of over 1,000,000 characters. To explain what I am trying to do, here is some sample code that parses it to extract some values, but I can't get it to print the values I am looking for. If someone has the patience to look at this code and tell me what could be done better so that it prints the right information from the inner ifs, it would be very much appreciated:
    use strict;
    use warnings;

    my $data = '';
    my $input_file = "string.txt";
    open (DATA,"$input_file") || eval { print "Can't open file $input_file: $@"; exit; };

    my ($account,$value,$end_value);
    my @frags;
    my $c = 0;

    while (<DATA>) {
        $data = $_;
        print "\n 1 - ^^$data^^\n";

        my $frag  = substr $data, 0, 6;
        my $frag2 = substr $data, 61, 6;
        my $frag3 = substr $data, 122, 6;
        #print "\n 2 - $c ^^$frag^^$frag2^^$frag3^^\n\n";
        push @frags, $frag, $frag2, $frag3;

        foreach my $fs (@frags) {
            $account = substr $data, 9, 8; # should find: ACCOUNTA - ACCOUNTB - ACCOUNTA
            print "\n 1 - ^$account^\n";
            if ( $account eq 'ACCOUNTA') {
                $value = substr $data, 22, 7;
                print "\n 2 - *$value*\n";
                if ( $value eq 'XXXXXMA') {
                    $end_value = substr $data, 45, 6;
                    print "\n 3 - ^$end_value^\n";
                } else {
                    $end_value = substr $data, 167, 6;
                    print "\n 4 - *$end_value*\n";
                }
            }
        }
        print "\n $account - $value - $end_value\n";
        #1START ACCOUNTA XXXXXMA 12345 XYZ111
        #1START ACCOUNTA XXXXXNY 54321 XYZ131
    }

    #DATA from string.txt:
    #1START ACCOUNTA XXXXXMA 12345 XYZ111 1START ACCOUNTB XXXXXBR 12345 XYZ191 1START ACCOUNTA XXXXXNY 54321 XYZ131

Thanks for looking!
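A minimal sketch of one possible fix, assuming (as the offsets 0, 61 and 122 above suggest) that every record is exactly 61 characters wide and that the field offsets 9, 22 and 45 are relative to the start of each record. In the code above, substr always reads from offset 9 of the whole string, so every pass through the foreach loop looks at the first record; the sketch instead walks the string one record at a time. The names records_for_account and make_record, and the offset of the number field, are hypothetical and serve only to build padded demo data.

```perl
use strict;
use warnings;

# Assumption: fixed-width records of 61 characters; the field offsets
# (9, 22, 45) are the OP's, taken relative to the start of each record.
my $RECLEN = 61;

# Hypothetical helper: return [account, value, end_value] for every
# record whose account field equals $want.
sub records_for_account {
    my ($data, $want) = @_;
    my @found;
    for (my $off = 0; $off + $RECLEN <= length $data; $off += $RECLEN) {
        my $rec     = substr $data, $off, $RECLEN;    # one whole record
        my $account = substr $rec, 9, 8;
        next unless $account eq $want;
        my $value     = substr $rec, 22, 7;
        my $end_value = substr $rec, 45, 6;
        push @found, [ $account, $value, $end_value ];
    }
    return @found;
}

# Hypothetical helper: build one space-padded 61-character record.
sub make_record {
    my ($account, $value, $number, $end) = @_;
    my $rec = ' ' x $RECLEN;
    substr($rec, 0, 6)  = '1START';
    substr($rec, 9, 8)  = $account;
    substr($rec, 22, 7) = $value;
    substr($rec, 32, 5) = $number;    # assumed offset; not parsed above
    substr($rec, 45, 6) = $end;
    return $rec;
}

my $data = join '',
    make_record('ACCOUNTA', 'XXXXXMA', '12345', 'XYZ111'),
    make_record('ACCOUNTB', 'XXXXXBR', '12345', 'XYZ191'),
    make_record('ACCOUNTA', 'XXXXXNY', '54321', 'XYZ131');

my @hits = records_for_account($data, 'ACCOUNTA');
print join(' - ', @$_), "\n" for @hits;
```

With the demo data it prints "ACCOUNTA - XXXXXMA - XYZ111" and "ACCOUNTA - XXXXXNY - XYZ131". Every field is still read with substr, so this stays within the constraint the OP mentions further down the thread.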

Re: Parsing a long string
by BrowserUk (Patriarch) on Aug 16, 2013 at 19:08 UTC

    $s = '1START ACCOUNTA XXXXXMA 12345 XYZ111 1START ...';;

    print $1 while $s =~ m[((?:\S+\s+){5})]g;;
    1START ACCOUNTA XXXXXMA 12345 XYZ111
    1START ACCOUNTB XXXXXBR 12345 XYZ191
    1START ACCOUNTA XXXXXNY 54321 XYZ131

    Or as the records appear to be fixed length:

    [0] Perl> print $1 while $s =~ m[(.{61})]g;;
    1START ACCOUNTA XXXXXMA 12345 XYZ111
    1START ACCOUNTB XXXXXBR 12345 XYZ191
    1START ACCOUNTA XXXXXNY 54321 XYZ131
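A sketch taking the token regex one step further, to the OP's stated goal of keeping only the ACCOUNTA records. A list-context //g match collects every five-token record at once. One change that is not from the reply: the pattern \S+(?:\s+\S+){4} is used instead of (?:\S+\s+){5}, since the latter requires whitespace after the fifth token and so would skip a last record that ends the string with no trailing space. The sample string is the thread's, with the whitespace collapsed.

```perl
use strict;
use warnings;

# Sample data from the thread, with runs of spaces collapsed to one.
my $s = '1START ACCOUNTA XXXXXMA 12345 XYZ111 '
      . '1START ACCOUNTB XXXXXBR 12345 XYZ191 '
      . '1START ACCOUNTA XXXXXNY 54321 XYZ131';

# List-context //g returns one capture per five-token record.
# The fifth token needs no trailing whitespace, so the final record matches.
my @records = $s =~ /(\S+(?:\s+\S+){4})/g;

# Keep records whose second token is the wanted account.
my @wanted = grep { my @f = split ' '; $f[1] eq 'ACCOUNTA' } @records;

print "$_\n" for @wanted;
```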

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      They are one long line of characters. I cannot use fixed length; I have to use "substr" to find the values I need and then apply the logic.
        I cannot use fixed length, I have to use "substr" to find the values I need ...

        I don't understand. All the examples of substr that appear in the OPed code use fixed offsets and lengths (at least, I assume you're the AnonyMonk who posted the OP). Sticking with fixed-width fields, here's another approach that might serve your needs, although I'm not really sure what those needs are! See pack for info on template specifiers; the '@' specifier in an unpack template moves to an absolute position. Note that the 'x' specifier moves forward relative to the current position; if you can work out the relative displacements needed, this will save some absolute back-and-forthing. Also see perlpacktut.

        >perl -wMstrict -lE
        "my $data = 'ACCTAxACCTBxACCTCxxxFOOxxBARxxBAZxxBOFFxxxxx';
         say qq{'$data'};
         ;;
         my ($fr1, $fr2, $fr3, $at20, $at25, $at30, $at35) =
           unpack '@0 a5 @6 a5 @12 a5 @20 a3 @25 a3 @30 a3 @35 a4', $data;
         ;;
         say qq{'$fr1' '$fr2' '$fr3'};
         ;;
         my $account = $fr3;
         ;;
         my $value = ($account eq 'ACCTC') ? $at20 : 'unknown';
         my $end_value = ($value eq 'FOO') ? $at25 : $at30;
         ;;
         say qq{account '$account' value '$value' end value '$end_value'};
        "
        'ACCTAxACCTBxACCTCxxxFOOxxBARxxBAZxxBOFFxxxxx'
        'ACCTA' 'ACCTB' 'ACCTC'
        account 'ACCTC' value 'FOO' end value 'BAR'

        Update: Changed example code to simplify logic.
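A sketch of the 'x' alternative mentioned above, applied to the same sample string: each skip count is just the gap between the end of one field and the next '@' offset in the absolute template, and the two results are compared.

```perl
use strict;
use warnings;

my $data = 'ACCTAxACCTBxACCTCxxxFOOxxBARxxBAZxxBOFFxxxxx';

# Absolute form from the reply: '@' jumps to a fixed offset.
my @abs = unpack '@0 a5 @6 a5 @12 a5 @20 a3 @25 a3 @30 a3 @35 a4', $data;

# Relative form: 'x' skips N bytes forward from the current position
# (e.g. the first a5 ends at offset 5, and x moves on to offset 6).
my @rel = unpack 'a5 x a5 x a5 x3 a3 x2 a3 x2 a3 x2 a4', $data;

print "@rel\n";    # prints: ACCTA ACCTB ACCTC FOO BAR BAZ BOFF
```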

        I have to use "substr" to find the values

        No. You don't. As I demonstrated. But s'your choice ...


Re: Parsing a long string
by toolic (Bishop) on Aug 16, 2013 at 18:17 UTC
    If tokens are whitespace-separated and there are always 5 of them in a set:
    use warnings;
    use strict;

    my $str = '1START ACCOUNTA XXXXXMA 12345 XYZ111 1START ACCOUNTB XXXXXBR 12345 XYZ191 1START ACCOUNTA XXXXXNY 54321 XYZ131';
    my $i = 0;
    while ($str =~ /(\S+)/g) {
        print "$1 ";
        $i++;
        print "\n" if $i % 5 == 0;
    }

    __END__

    1START ACCOUNTA XXXXXMA 12345 XYZ111
    1START ACCOUNTB XXXXXBR 12345 XYZ191
    1START ACCOUNTA XXXXXNY 54321 XYZ131
      I wish it were that simple; the original long string is too complex. The end result from the sample code should only be:
      1START ACCOUNTA XXXXXMA 12345 XYZ111
      1START ACCOUNTA XXXXXNY 54321 XYZ131
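A sketch of one way to get exactly that output from the token approach, assuming, as the reply above does, that every record is five whitespace-separated tokens, and additionally that the account is always the second token. split breaks the string into tokens and splice removes them five at a time.

```perl
use strict;
use warnings;

my $str = '1START ACCOUNTA XXXXXMA 12345 XYZ111 '
        . '1START ACCOUNTB XXXXXBR 12345 XYZ191 '
        . '1START ACCOUNTA XXXXXNY 54321 XYZ131';

my @tokens = split ' ', $str;    # ' ' splits on runs of whitespace
my @out;

# List assignment in boolean context is false once splice returns nothing.
while (my @rec = splice @tokens, 0, 5) {
    # Assumption: a complete record has 5 tokens, account second.
    push @out, join ' ', @rec if @rec == 5 && $rec[1] eq 'ACCOUNTA';
}

print "$_\n" for @out;
```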
Re: Parsing a long string [OT]: strange open
by AnomalousMonk (Archbishop) on Aug 16, 2013 at 20:54 UTC
    open (DATA,"$input_file") || eval { print "Can't open file $input_file: $@"; exit; };

    BTW: Are you aware that this is a very strange error handler for an open statement? As best I can figure, it uses an eval expression to print the value of a previous (Update: No! See below.) eval error ($@ - see perlvar; in particular, Error Variables) as the file open error message. Why not just use a 'standard'
        open my $fh, '<', $input_file or die "opening '$input_file': $!";
    (see $! in perlvar and Error Variables).

    Update: A simple experiment, which I didn't have time to do before, shows that $@ is emptied upon entry to an eval block:

    >perl -wMstrict -le
    "eval { die 'zot!' };  print qq{after first eval: '$@'};
     ;;
     eval { print qq{in second eval: '$@'} };
    "
    after first eval: 'zot! at -e line 1.
    '
    in second eval: ''

    In any event, $@ is not the proper error variable to print after an I/O error.
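A small sketch of the point above: after a failed open, the reason is in $! (its numeric value can be compared with constants from the core Errno module), while $@ is left alone. The file name is hypothetical and is placed in a fresh temporary directory from the core File::Temp module, so it is certain not to exist.

```perl
use strict;
use warnings;
use Errno qw(ENOENT);
use File::Temp qw(tempdir);

# A fresh temporary directory, so the file below certainly does not exist.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/missing.txt";    # hypothetical file name

my $eval_before = $@;
my $ok          = open my $fh, '<', $path;
my $errno       = $! + 0;         # capture the OS error number at once
my $os_error    = "$!";           # its text, e.g. "No such file or directory"
my $eval_after  = $@;             # open() does not touch $@

print "open failed: $os_error\n" unless $ok;
print "ENOENT: ", ($errno == ENOENT ? "yes" : "no"), "\n";
print "\$\@ unchanged by open: ", ($eval_before eq $eval_after ? "yes" : "no"), "\n";
```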

Re: Parsing a long string
by atcroft (Abbot) on Aug 17, 2013 at 05:39 UTC

    The original post was unclear about the intent for the string. That said, it would seem that setting $/ (the input record separator) to "1START" would make the data easier to handle.

    Hope that helps.
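A sketch of that suggestion, using the thread's whitespace-collapsed sample and an in-memory filehandle (open on a reference to a scalar) in place of the OP's string.txt. Since $/ marks where a record ends, each read returns one record's fields followed by the next "1START", which chomp then strips; the very first read is the leading "1START" on its own and yields no fields. The assumption that the account is the first field after "1START" comes from the sample data.

```perl
use strict;
use warnings;

my $data = '1START ACCOUNTA XXXXXMA 12345 XYZ111 '
         . '1START ACCOUNTB XXXXXBR 12345 XYZ191 '
         . '1START ACCOUNTA XXXXXNY 54321 XYZ131';

# Read from the string as if it were a file (in-memory filehandle).
open my $fh, '<', \$data or die "in-memory open: $!";

my @accounta;
{
    local $/ = '1START';    # each read now ends at the next "1START"
    while (my $chunk = <$fh>) {
        chomp $chunk;       # chomp strips a trailing $/ string
        my ($account, $value, $number, $end) = split ' ', $chunk;
        next unless defined $account;    # the first read is just "1START"
        push @accounta, "$value $number $end" if $account eq 'ACCOUNTA';
    }
}
close $fh;

print "$_\n" for @accounta;
```

Because split ' ' splits on runs of whitespace, padding between fields in the real data would not change the result.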

Node Type: perlquestion [id://1049765]
Approved by toolic
Front-paged by toolic