Re^3: Match Line And Combine Into One Line

by kcott (Archbishop)
on Jul 21, 2016 at 20:25 UTC


in reply to Re^2: Match Line And Combine Into One Line
in thread Match Line And Combine Into One Line

Firstly, this code won't compile as it contains a syntax error. Do not just post untested code! If you don't understand an error message, post the error you're getting and ask. Here's the offending line:

print NEW if $_, join ' ', @{ $reformat{$_} } for keys %reformat;

Take a look at "perlsyn: Statement Modifiers". The very first sentence starts with:

Any simple statement may optionally be followed by a SINGLE modifier, ...

"SINGLE" is emphasised for a very good reason: you can only use one statement modifier per statement. In the line I've identified, you've used two: if and for. Had you tried to run your code, you would have got a syntax error similar to the one in this example:

$ perl -e 'my @x = qw{a b}; print if $_ for @x'
syntax error at -e line 1, near "$_ for "
Execution of -e aborted due to compilation errors.
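There are a couple of ways around the single-modifier limit. The following is only a sketch: the %reformat data is made up, and the "non-empty key" test is my guess at what the if $_ was meant to check. Either keep for as the one modifier and filter the list with grep, or write a full for block and put the if inside it.

```perl
use strict;
use warnings;

# Hypothetical sample data, for illustration only.
my %reformat = (
    'H1,20151209,' => ['FIRST', 'SECOND'],
    'H2,20151209,' => ['ONLY'],
);

# Option 1: a single "for" modifier; grep does the filtering.
my @via_grep;
push @via_grep, $_ . join(' ', @{ $reformat{$_} })
    for grep { length $_ } sort keys %reformat;

# Option 2: a full "for" block with one "if" modifier inside it.
my @via_block;
for my $key (sort keys %reformat) {
    push @via_block, $key . join(' ', @{ $reformat{$key} }) if length $key;
}

print "$_\n" for @via_grep;
```

Both forms build the same list; which one reads better is largely a matter of taste.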

You have another issue that isn't an error but which would generate warning messages. The problem is that you haven't accounted for the file header line. You can skip this line with the simple expedient of adding this as the first line of your while loop:

next if $. == 1;

$. is a special variable that holds the current line number of the filehandle most recently read. Line 1 is the header line, so next skips it and the loop moves on to line 2. See "perlvar: Variables related to filehandles" for a more detailed description.
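Here's a small, self-contained illustration of that header skip. The data is made up and read from an in-memory filehandle so that no real file is needed; the line number is kept alongside each record just to show what $. contains.

```perl
use strict;
use warnings;

# Made-up sample data, read through an in-memory filehandle.
my $data = "ACCOUNT,DATE,NOTE\nH1,20151209,FIRST\nH1,20151209,SECOND\n";

my @kept;
open my $in_fh, '<', \$data or die "Can't open in-memory file: $!";
while (my $line = <$in_fh>) {
    next if $. == 1;            # $. is 1 for the header line: skip it
    chomp $line;
    push @kept, "$.: $line";    # record the line number with the data
}
close $in_fh;

print "$_\n" for @kept;
```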

It's good that you've used the 3-argument form of open; it's less good that you've chosen global package variables to hold the filehandles and, indeed worse, that you've not chosen meaningful names. Once you get into the habit of using names like FILE, you'll use them often and, in all likelihood, multiple times in the same script or module: this is highly error-prone and can lead to bugs that are hard to track down. Instead, use lexical variables, with meaningful names, in the smallest possible scope; this greatly reduces the chances of errors and, in many cases, means you don't even need to use close as Perl will do this for you.

It's also good that you're checking for I/O errors with "or die 'error message'" code; however, hand-crafting these messages is tedious and it's easy to leave out important information or forget to add them altogether. If you use the autodie pragma, Perl will perform this task for you: less work for you and fewer chances of errors.
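To see what autodie does for you, here's a minimal sketch. The path is hypothetical and assumed not to exist on your system; the eval is only there so the script can show the error rather than stop at it.

```perl
use strict;
use warnings;
use autodie;

# Hypothetical path, assumed not to exist.
my $missing = '/no/such/dir/pm_example_missing.txt';

# No "or die" here: with autodie, a failed open throws an exception.
my $opened = eval {
    open my $fh, '<', $missing;
    1;
};
my $error = $@;    # the exception object; it stringifies to the message

print $opened ? "opened\n" : "open failed: $error\n";
```

The message autodie generates names the file, the mode and the system error, which is the sort of detail that tends to get left out of hand-written messages.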

Putting all that together, along with your additional information, here's a new version of the script. Although not shown, my original script was pm_1168253_reformat_input.pl; this one's called pm_1168253_reformat_input_WITH_FILES.pl.

#!/usr/bin/env perl -l

use strict;
use warnings;
use autodie;

my $input_file = 'pm_1168253_reformat_input_INPUT.txt';
my $output_file = 'pm_1168253_reformat_input_OUTPUT.txt';

my %reformat;
my $re = qr{^(H\d+,\d+,)(.*)$};

{
    open my $in_fh, '<', $input_file;
    while (<$in_fh>) {
        next if $. == 1;
        chomp;
        /$re/;
        push @{ $reformat{$1} }, $2;
    }
}

{
    open my $out_fh, '>', $output_file;
    print $out_fh $_, join ' ', @{ $reformat{$_} } for keys %reformat;
}

Note the anonymous blocks. The filehandles go out of scope once these blocks are exited: their reference counts are reduced to zero and Perl performs an implicit close.
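If you'd like to convince yourself of the implicit close, here's a small sketch. It writes to a scratch file (the directory comes from File::Temp and the file name is made up), never calls close, and then reads the file back from a second block.

```perl
use strict;
use warnings;
use autodie;
use File::Temp qw{tempdir};

# Scratch directory, removed at exit; the file name is made up.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/scope_demo.txt";

{
    open my $out_fh, '>', $file;
    print $out_fh "written inside the block\n";
}   # $out_fh goes out of scope here: output is flushed and the file closed

my $content;
{
    open my $in_fh, '<', $file;
    local $/;               # slurp mode, limited to this block
    $content = <$in_fh>;
}   # $in_fh is closed here in the same way

print $content;
```

If the first handle were still open with unflushed output when the second block ran, the read could come back empty; because the handle is closed at the end of its block, the data is there.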

Here's the input file:

$ cat pm_1168253_reformat_input_INPUT.txt
ACCOUNT,DATE,NOTE
H123456,20151209,THIS IS A TEST
H123456,20151209,TO COMBINE ALL
H123456,20151209,MY MATCHING LINES
H123456,20151209,INTO THE FIRST LINE
H123456,20151209,THAT MATCHES.
H654321,20151209,MATCH LINES FOR THIS
H654321,20151209,ACCT INTO THE
H654321,20151209,TOP LINE OF THE ACCT
H432165,20151209,SINGLE LINE FOR THIS ONE

And here's the output file before and after running the script:

$ cat pm_1168253_reformat_input_OUTPUT.txt
cat: pm_1168253_reformat_input_OUTPUT.txt: No such file or directory
$ pm_1168253_reformat_input_WITH_FILES.pl
$ cat pm_1168253_reformat_input_OUTPUT.txt
H432165,20151209,SINGLE LINE FOR THIS ONE
H123456,20151209,THIS IS A TEST TO COMBINE ALL MY MATCHING LINES INTO THE FIRST LINE THAT MATCHES.
H654321,20151209,MATCH LINES FOR THIS ACCT INTO THE TOP LINE OF THE ACCT

As before, you may need a different ordering for your output but I'm still in the dark as to what you require.
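Since I don't know which ordering you want, the following is only a sketch of two possibilities, using made-up records in the same shape as your input: the order in which each account first appears, and plain sorted order. The @order array and the $combine helper are names I've introduced for illustration.

```perl
use strict;
use warnings;

# Made-up records in the same ACCOUNT,DATE,NOTE shape as the sample input.
my @lines = (
    'H654321,20151209,MATCH LINES',
    'H123456,20151209,THIS IS',
    'H654321,20151209,FOR THIS',
    'H123456,20151209,A TEST',
);

my (%reformat, @order);
for my $line (@lines) {
    my ($key, $note) = $line =~ /^(H\d+,\d+,)(.*)$/ or next;
    push @order, $key unless exists $reformat{$key};   # first sighting only
    push @{ $reformat{$key} }, $note;
}

# Hypothetical helper: join one key with all of its notes.
my $combine = sub {
    my ($key) = @_;
    return $key . join ' ', @{ $reformat{$key} };
};

my @first_seen = map { $combine->($_) } @order;
my @sorted     = map { $combine->($_) } sort keys %reformat;

print "$_\n" for @first_seen;
```

Printing @sorted instead of @first_seen gives the sorted ordering; hash keys on their own come back in no predictable order.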

— Ken

Re^4: Match Line And Combine Into One Line
by jlope043 (Acolyte) on Jul 21, 2016 at 23:45 UTC

    Thank you very much Ken and sorry about all the confusion. This stuff is not as easy as I thought it would be, so trying to explain my goal or outcome is difficult at times. But this worked exactly how I want it to. Thank you again for all your help and explanation.
