Parsing a long string

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello there Monks!
I have a long string, over 1,000,000 characters. To explain what I am trying to do, here is some sample code that parses the string to pull some values out of it. I just can't get it to print the values I am looking for. If someone has the patience to look at this code and tell me what could be done better so that it prints the right information from the inner IFs, it would be much appreciated:

use strict;
use warnings;

my $data = '';
my $input_file = "string.txt";

  open (DATA,"$input_file") || eval {
    print "Can't open file $input_file: $@";
    exit;
  };

  my ($account,$value,$end_value);
  my @frags;
  my $c = 0;
  
  while (<DATA>) {
    $data = $_;
   
    print "\n 1 - ^^$data^^\n";
   
    my $frag  =  substr $data, 0, 6;
    my $frag2 =  substr $data, 61, 6;
    my $frag3 =  substr $data, 122, 6;

    #print "\n 2 - $c ^^$frag^^$frag2^^$frag3^^\n\n";
    push @frags, $frag, $frag2, $frag3;
    
    foreach my $fs (@frags) {
        
      $account =  substr $data, 9, 8; # should find: ACCOUNTA -  ACCOUNTB - ACCOUNTA
      print "\n 1 - ^$account^\n";
     
      if( $account eq 'ACCOUNTA') {

         $value =  substr $data, 22, 7;
         print "\n 2 - *$value*\n";
         
         if( $value eq 'XXXXXMA') {
         
           $end_value =  substr $data, 45, 6;
           print "\n 3 - ^$end_value^\n";
       
         }else {
            
              $end_value =  substr $data, 167, 6;
              print "\n 4 - *$end_value*\n";
         }
     

     }
     
    }

print "\n  $account - $value - $end_value\n";
#1START     ACCOUNTA     XXXXXMA      12345     XYZ111
#1START     ACCOUNTA     XXXXXNY      54321     XYZ131


    
  }
 
#DATA from string.txt:
#1START   ACCOUNTA     XXXXXMA      12345     XYZ111          1START   ACCOUNTB     XXXXXBR      12345     XYZ191          1START   ACCOUNTA     XXXXXNY      54321     XYZ131

Thanks for looking!


Replies are listed 'Best First'.
Re: Parsing a long string by BrowserUk (Patriarch) on Aug 16, 2013 at 19:08 UTC
    $s = '1START ACCOUNTA XXXXXMA 12345 XYZ111 1START ...';;
    print $1 while $s =~ m[((?:\S+\s+){5})]g;;
    1START ACCOUNTA XXXXXMA 12345 XYZ111
    1START ACCOUNTB XXXXXBR 12345 XYZ191
    1START ACCOUNTA XXXXXNY 54321 XYZ131

Or as the records appear to be fixed length:

    [0] Perl> print $1 while $s =~ m[(.{61})]g;;
    1START ACCOUNTA XXXXXMA 12345 XYZ111
    1START ACCOUNTB XXXXXBR 12345 XYZ191
    1START ACCOUNTA XXXXXNY 54321 XYZ131

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
Re^2: Parsing a long string by Anonymous Monk on Aug 16, 2013 at 19:16 UTC
They are one long line of characters. I cannot use fixed length, I have to use "substr" to find the values I need and do the logic then.
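For what it's worth, here is a minimal sketch of the substr route. It assumes the whole string is made of 61-character records laid out like the sample data; that record length is only inferred from the offsets 0, 61 and 122 in the original code. The sub name account_records and the sprintf-built sample string are made up for illustration. The key point is that each substr offset is added to the start position of the current record, rather than always counting from the start of the string:

```perl
use strict;
use warnings;

# Assumption: fixed 61-character records, inferred from the offsets
# 0, 61 and 122 in the original code.
my $RECLEN = 61;

# Hypothetical helper: returns [account, value, end_value] for every
# record whose account field equals $wanted.
sub account_records {
    my ($data, $wanted) = @_;
    my @found;
    # 51 = 45 + 6, the end of the last field we read in a record.
    for (my $pos = 0; $pos + 51 <= length($data); $pos += $RECLEN) {
        # Field offsets are relative to the start of the current record.
        my $account = substr $data, $pos + 9, 8;
        next unless $account eq $wanted;
        my $value     = substr $data, $pos + 22, 7;
        my $end_value = substr $data, $pos + 45, 6;
        push @found, [ $account, $value, $end_value ];
    }
    return @found;
}

# Rebuild the sample string: columns start at 0, 9, 22, 35 and 45,
# and each record is padded out to 61 characters.
my $data = join '', map { sprintf '%-9s%-13s%-13s%-10s%-16s', @$_ }
    [qw(1START ACCOUNTA XXXXXMA 12345 XYZ111)],
    [qw(1START ACCOUNTB XXXXXBR 12345 XYZ191)],
    [qw(1START ACCOUNTA XXXXXNY 54321 XYZ131)];

print join(' - ', @$_), "\n" for account_records($data, 'ACCOUNTA');
```

If the real records are not all the same length, the step of $RECLEN no longer holds and the record boundaries would have to be found some other way.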
Re^3: Parsing a long string by AnomalousMonk (Archbishop) on Aug 16, 2013 at 20:08 UTC
> I cannot use fixed length, I have to use "substr" to find the values I need ...

I don't understand. All the examples of substr that appear in the OPed code use fixed offsets and lengths (at least, I assume you're the AnonyMonk who posted the OP).

Sticking with fixed-width fields, here's another approach that might serve your needs, although I'm not really sure what those needs are! See pack for info on template specifiers; the `'@'` specifier in an unpack template moves to an absolute position. Note that the `'x'` specifier makes relative forward moves if you can figure out the relative displacements needed; this will save some absolute back-and-forthing. Also see perlpacktut.

    >perl -wMstrict -lE
    "my $data = 'ACCTAxACCTBxACCTCxxxFOOxxBARxxBAZxxBOFFxxxxx';
     say qq{'$data'};
     ;;
     my ($fr1, $fr2, $fr3, $at20, $at25, $at30, $at35) =
         unpack '@0 a5 @6 a5 @12 a5 @20 a3 @25 a3 @30 a3 @35 a4', $data;
     ;;
     say qq{'$fr1' '$fr2' '$fr3'};
     ;;
     my $account = $fr3;
     ;;
     my $value = ($account eq 'ACCTC') ? $at20 : 'unknown';
     my $end_value = ($value eq 'FOO') ? $at25 : $at30;
     ;;
     say qq{account '$account' value '$value' end value '$end_value'};
    "
    'ACCTAxACCTBxACCTCxxxFOOxxBARxxBAZxxBOFFxxxxx'
    'ACCTA' 'ACCTB' 'ACCTC'
    account 'ACCTC' value 'FOO' end value 'BAR'

Update: Changed example code to simplify logic.
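Applied to the layout in the original post, the same unpack idea might look like the sketch below. The 61-character record length and the field offsets (9, 22 and 45) are taken from the OP's substr calls, not from any spec, and the sample records are rebuilt with sprintf purely for illustration:

```perl
use strict;
use warnings;

# Assumption: 61-char records; fields at offsets 9 (8 chars), 22 (7), 45 (6).
my $data = join '', map { sprintf '%-9s%-13s%-13s%-10s%-16s', @$_ }
    [qw(1START ACCOUNTA XXXXXMA 12345 XYZ111)],
    [qw(1START ACCOUNTB XXXXXBR 12345 XYZ191)],
    [qw(1START ACCOUNTA XXXXXNY 54321 XYZ131)];

my @rows;
# '(A61)*' chops the string into 61-char records; 'A' strips trailing spaces.
for my $rec (unpack '(A61)*', $data) {
    # '@N' moves to absolute offset N within this record.
    my ($account, $value, $end_value) = unpack '@9 A8 @22 A7 @45 A6', $rec;
    push @rows, "$account $value $end_value" if $account eq 'ACCOUNTA';
}
print "$_\n" for @rows;
```

One thing to watch: unpack dies if an '@' offset lies beyond the end of a record, so short or truncated records would need a length check first.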
Re^3: Parsing a long string by BrowserUk (Patriarch) on Aug 16, 2013 at 20:05 UTC
> I have to use "substr" to find the values

No. You don't. As I demonstrated. But s'your choice ...
Re: Parsing a long string by toolic (Bishop) on Aug 16, 2013 at 18:17 UTC
If tokens are whitespace-separated and there are always 5 of them in a set:

    use warnings;
    use strict;

    my $str = '1START ACCOUNTA XXXXXMA 12345 XYZ111 1START ACCOUNTB XXXXXBR 12345 XYZ191 1START ACCOUNTA XXXXXNY 54321 XYZ131';
    my $i = 0;
    while ($str =~ /(\S+)/g) {
        print "$1 ";
        $i++;
        print "\n" if $i % 5 == 0;
    }

    __END__
    1START ACCOUNTA XXXXXMA 12345 XYZ111
    1START ACCOUNTB XXXXXBR 12345 XYZ191
    1START ACCOUNTA XXXXXNY 54321 XYZ131
Re^2: Parsing a long string by Anonymous Monk on Aug 16, 2013 at 19:12 UTC
I wish it was that simple, the original long string is too complex. The end result from the sample code should only be:

    1START ACCOUNTA XXXXXMA 12345 XYZ111
    1START ACCOUNTA XXXXXNY 54321 XYZ131
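If the five-token assumption did hold for the real data (the poster suggests it may not), filtering down to just that result could be sketched as follows. The sample string is the one from this thread with whitespace collapsed, and the variable names are made up for illustration:

```perl
use strict;
use warnings;

# Assumption: every set is exactly five whitespace-separated tokens.
my $str = '1START ACCOUNTA XXXXXMA 12345 XYZ111 '
        . '1START ACCOUNTB XXXXXBR 12345 XYZ191 '
        . '1START ACCOUNTA XXXXXNY 54321 XYZ131';

my @tokens = split ' ', $str;
my @kept;
# Take the tokens five at a time until none are left.
while (my @set = splice @tokens, 0, 5) {
    # The account name is the second token of each set.
    push @kept, "@set" if @set == 5 && $set[1] eq 'ACCOUNTA';
}
print "$_\n" for @kept;
```

This falls apart as soon as any field can be empty or can contain a space, which may be what "too complex" refers to.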
Re: Parsing a long string [OT]: strange open by AnomalousMonk (Archbishop) on Aug 16, 2013 at 20:54 UTC
    open (DATA,"$input_file") || eval {
        print "Can't open file $input_file: $@";
        exit;
    };

BTW: Are you aware that this is a very strange error handler for an open statement? As best I can figure, it uses an eval expression to print the value of a previous (Update: No! See below.) `eval` error (`$@` - see perlvar; in particular, Error Variables) as the file open error message. Why not just use a 'standard'

    open my $fh, '<', $input_file or die "opening '$input_file': $!";

(see `$!` in perlvar and Error Variables).

Update: A simple experiment, which I didn't have time to do before, shows that `$@` is emptied upon entry to an `eval` block:

    >perl -wMstrict -le
    "eval { die 'zot!' };
     print qq{after first eval: '$@'};
     ;;
     eval { print qq{in second eval: '$@'} };
    "
    after first eval: 'zot! at -e line 1.
    '
    in second eval: ''

In any event, `$@` is not the proper error variable to print after an I/O error.
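To see what the 'standard' form reports, here is a small sketch that deliberately opens a file that should not exist. The file name is hypothetical, and the eval is there only to catch the die so the script can go on to print the message:

```perl
use strict;
use warnings;

# Hypothetical file name, chosen so that the open is expected to fail.
my $input_file = 'no_such_dir/string.txt';

# eval is used here only to trap the die; in a real script the plain
# "open ... or die ..." line is usually all that is needed.
my $ok = eval {
    open my $fh, '<', $input_file
        or die "opening '$input_file': $!\n";
    close $fh;
    1;
};
my $err = $ok ? '' : $@;
print "caught: $err" unless $ok;
```

Here `$!` carries the operating system's reason for the failed open, and `$@` is only meaningful afterwards because the die happened inside an eval. The exact wording of the `$!` text depends on the system and locale.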
Re: Parsing a long string by atcroft (Abbot) on Aug 17, 2013 at 05:39 UTC
The original post was unclear about what the string is intended for. That being said, it would seem that setting $/ (the input record separator) to "1START" would make the data easier to handle. Hope that helps.
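atcroft's own code is not reproduced in this copy of the thread, so the following is only a guess at the general idea, not the code from that reply. It assumes "1START" occurs only at the start of a record, and it reads the sample data (whitespace collapsed) from an in-memory filehandle so the sketch is self-contained:

```perl
use strict;
use warnings;

# Sample data from the thread with whitespace collapsed (an assumption).
my $text = '1START ACCOUNTA XXXXXMA 12345 XYZ111 '
         . '1START ACCOUNTB XXXXXBR 12345 XYZ191 '
         . '1START ACCOUNTA XXXXXNY 54321 XYZ131';

my @records;
{
    local $/ = '1START';    # each read now ends at the next "1START"
    open my $fh, '<', \$text or die "in-memory open failed: $!";
    while (my $chunk = <$fh>) {
        chomp $chunk;       # chomp removes a trailing "1START", if present
        my @fields = split ' ', $chunk;
        next unless @fields;    # first chunk is empty: data starts with the separator
        push @records, \@fields;
    }
    close $fh;
}

# With the marker consumed as the separator, the account is field 0.
print "@$_\n" for grep { $_->[0] eq 'ACCOUNTA' } @records;
```

Because the marker is used as the separator, it no longer appears in the records themselves. The advantage over line-by-line reading is that a million-character single line never has to be held as one record.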

