We don't bite newbies here... much PerlMonks

Newbie: uses/limits of perl in editing files

by wherethewild (Novice)
on Nov 23, 2007 at 13:37 UTC
wherethewild has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, I'm a total newbie, not just to perl but to any form of scripting.

I'm trying to create a script which will take a text file that I have and run through it, changing bits here and there. For example: on a line that starts with HEADER I want a number appended; I want an entire line inserted just before a line which starts with REMARK; and many other things.

So my question: is perl able to do this, or do I need to seek out another language/tool/something?

like I said...total newbie! Any words of wisdom greatly appreciated!

cheers,
wherethewild

Re: Newbie: uses/limits of perl in editing files
by tirwhan (Abbot) on Nov 23, 2007 at 14:07 UTC

Welcome to the monastery. From the task description I'd say perl is very well suited. I'll give you an example of a program that roughly does what you describe:

#!/usr/bin/perl

use warnings;
use strict;

my $filename = "whateveryourfileiscalled.txt";
my $newfile = "whateveryouwantthechangedfiletobecalled.txt";

open (my $rfh,"<",$filename) or die "Can't open file $filename :$!";
open (my $wfh,">",$newfile) or die "Can't open file $newfile :$!";

while (my $line = <$rfh>) {
    if ($line =~ m/^HEADER/) {
        chomp $line;
        my $number = 42; # change to whatever number you want to use
        $line .= $number."\n";
    }
    if ($line =~ m/^REMARK/) {
        print {$wfh} "Extra line\n" # Change to whatever extra line you want
    }
    print {$wfh} $line;
}
close $rfh or die "Can't close $filename :$!";
close $wfh or die "Can't close $newfile : $!";

Or you could do this in a perl oneliner (which will change the original file):

perl -pi -e 'chomp;s/^(HEADER.*)$/${1}42/;s/^(REMARK.*)$/Extra line\n$1/;$_.="\n"' whateveryourfileiscalled.txt

Caveat: Both of these are for systems where the line ending is "\n" (i.e. not Windows), adjust appropriately for other OSes. Update: the caveat is not actually correct, as pointed out by naikonta and wfsp, except possibly for the case outlined by Sixtease, also fixed in his update. Thanks to all of you.

All dogma is stupid.
Wow, I didn't know the print {$wfh} $line; construct. Does that disambiguate $wfh to be interpreted as a filehandle?

Yes; because the "filehandle argument slot" (for lack of a better name) has to be a simple scalar value or a BLOCK. While it's superfluous in this particular case, the block form is useful if you (for instance) have a hash of filehandles and want to use print {$handles->{$somekey}} "Yadda yadda yadda.\n"; directly rather than pulling it out into a tmp variable. The docs for print cover this.

Yep, I picked that up from TheDamian's Perl Best Practices (a must-read for every Perl programmer IMO). As Fletch rightly points out, it's not necessary in this case, but I just use it wherever I print to a filehandle (easier to do than figure out what's wrong if I ever forget it :-)

All dogma is stupid.

Caveat: Both of these are for systems where the line ending is "\n" (i.e. not Windows), adjust appropriately for other OSes.

I see no caveat in your example regarding \n. This character is just Perl's internal representation of whatever constitutes a line ending. So it will be whatever the underlying OS (the one perl is run on) actually uses to terminate lines. See how newlines are addressed in perlport.

Open source softwares? Share and enjoy. Make profit from them if you can. Yet, share and enjoy!

That's it! Wow, let's see how I go at adjusting it all for everything else I have to do to it. The weekend shall be fun.

One problem though, and I guess it has something to do with the \n point you made at the end. Now I have <cr> appearing at the end of every line. I'm sitting on a Linux workstation running RedHat if that is at all helpful (I DID say I know nothing about this!)

cheers
wherethewild

Here's my guess: Your file has Windows newlines (cr/lf), which your editor/viewer can deal with and shows correctly. Then you add unix newlines (lf) on the lines you edit. Now there are mixed cr/lf and lf newlines, which confuses the editor, so it shows the cr characters.
If I'm correct, then I recommend either preprocessing the file with the dos2unix tool or addressing this in the perl script itself.

Update: The modified while loop could look like this:

while (my $line = <$rfh>) {
    chomp $line;
    if ($line =~ m/^HEADER/) {
        my $number = 42; # change to whatever number you want to use
        $line .= $number;
    }

    if ($line =~ m/^REMARK/) {
        print {$wfh} "Extra line\n" # Change to whatever extra line you want
    }

    print {$wfh} $line, "\n";
}

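If the guess about mixed line endings is right, one way to deal with it inside the script is to strip either kind of line ending explicitly, since chomp on Linux removes only the trailing "\n" and leaves the cr in place. The following is a minimal sketch under that assumption, always writing Unix line endings; edit_line is a hypothetical helper name and 42 is still just a placeholder number.

```perl
use strict;
use warnings;

# Hypothetical helper: takes one raw input line and returns the text to
# write out, always terminated with a Unix newline.
sub edit_line {
    my ($line) = @_;

    # Remove a trailing LF or CR/LF (chomp alone would leave the CR behind).
    $line =~ s/\r?\n\z//;

    my $out = '';
    $out .= "Extra line\n" if $line =~ m/^REMARK/;   # goes before REMARK lines
    $line .= 42            if $line =~ m/^HEADER/;   # placeholder number
    return $out . $line . "\n";
}

# Example with Windows-style input lines:
print edit_line("HEADER test\r\n");   # HEADER test42
print edit_line("REMARK note\r\n");   # Extra line, then REMARK note
```

Inside the while loop this would be used as print {$wfh} edit_line($line); in place of the separate chomp and print steps.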
Re: Newbie: uses/limits of perl in editing files
by Dominus (Parson) on Nov 23, 2007 at 15:06 UTC
Tie::File is nice for stuff like that.

It makes the file look like an array, with one line in each element. Then you modify the array. As you do, the changes appear in the file.
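A rough sketch of how the HEADER/REMARK edits from the question might look with Tie::File. The sub name edit_in_place, the number 42 and the text "Extra line" are placeholders for illustration; Tie::File strips the line ending from each element by default, so none is added back by hand.

```perl
use strict;
use warnings;
use Tie::File;

# Hypothetical helper: edits the named file in place through Tie::File.
sub edit_in_place {
    my ($filename) = @_;

    tie my @lines, 'Tie::File', $filename
        or die "Can't tie $filename: $!";

    # Walk backwards so that inserting an element does not shift the
    # indices of lines we have not looked at yet.
    for my $i (reverse 0 .. $#lines) {
        if ($lines[$i] =~ m/^HEADER/) {
            $lines[$i] .= 42;                    # placeholder number
        }
        elsif ($lines[$i] =~ m/^REMARK/) {
            splice @lines, $i, 0, "Extra line";  # insert before this line
        }
    }

    untie @lines;   # flush changes and release the file
    return;
}
```

Calling edit_in_place("whateveryourfileiscalled.txt") changes that file directly, so trying it on a copy first is the safer route.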

Re: Newbie: uses/limits of perl in editing files
by Sixtease (Friar) on Nov 23, 2007 at 14:08 UTC

You can do this with pretty much any scripting / programming language (if it has input/output capabilities and is Turing-complete). And Perl may be the most comfortable one for this.

The code to do something like you said could look like

perl -pe '/^REMARK/ and print "the line you want to add\n"' < input_file > output_file
Re: Newbie: uses/limits of perl in editing files
by cdarke (Prior) on Nov 23, 2007 at 14:42 UTC
Extending your requirements a little, there is a neat feature that is useful when replacing tokens, like your HEADER and REMARK. You can execute code from within a substitute statement, for example:

$line =~ s/(HEADER|REMARK)/mysub($1)/ge;

That will call the user-written subroutine mysub every time HEADER or REMARK is found in the text. The argument passed is the text matched inside (). Whatever is returned by mysub will replace the token. It probably would not be worth it for the simple substitution you mentioned, but for more complex combinations it can be very powerful.
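To make that concrete, here is a small sketch of what mysub might do. The lookup table and its values are invented purely for illustration, and replace_tokens is a hypothetical wrapper so the substitution can be tried on any string.

```perl
use strict;
use warnings;

# Hypothetical replacement table; the values are made up for illustration.
my %replacement = (
    HEADER => 'HEADER 42',
    REMARK => 'NOTE',
);

# Called once per match; its return value replaces the matched token.
sub mysub {
    my ($token) = @_;
    return $replacement{$token};
}

# Applies the substitution to a copy of the line and returns the result.
sub replace_tokens {
    my ($line) = @_;
    $line =~ s/(HEADER|REMARK)/mysub($1)/ge;
    return $line;
}

print replace_tokens("HEADER and REMARK in one line"), "\n";
# HEADER 42 and NOTE in one line
```

The /e flag makes Perl evaluate the replacement part as code, and /g repeats the substitution for every match on the line.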
I was looking at those s/// but I wasn't sure how I could get it to do some of the things I need, as the text which has to be substituted is different from file to file and it also appears elsewhere in the file, where it's not to be adjusted.

Was that clear?
This is a theoretical line in my file:
BOBBY X66666 A 345 674 A 123 488

The X66666 has to be changed to B22222. But the next file might have U33333 there instead of X66666, or worse still 666D3P, or even worse absolutely nothing at all. And I don't know what it might be unless I open up each of the text files and look what the previous program did to it (something I'm trying to avoid by learning this!). It SHOULD be that the spacing is constant across that line, but that's not guaranteed.

Anyway, that's some of what I'm trying to do. Thanks everyone for the speed and friendliness in helping me out!

BOBBY X66666 A 345 674 A 123 488

Assuming the piece you want to replace is the 2nd token in the line then you can do something like:

s/^(\S+)(\s+)(\S+)(.*)/${1}${2}B22222${4}/;

This reads like:

(NOT WHITESPACE)(WHITESPACE)(NOT WHITESPACE)(EVERYTHING)

That collects the first 3 pieces into variables $1-$3, then the remainder of the line into $4, then reassembles the line with the pieces and the replacement.
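The same idea wrapped in a sub, as a sketch only: replace_second_token is a hypothetical name, and it assumes the piece to change really is the second whitespace-separated token. If that field is missing altogether, whatever comes second on the line would be replaced instead, so that case needs a check of its own.

```perl
use strict;
use warnings;

# Hypothetical helper: replace the second whitespace-separated token,
# keeping the original spacing around it.
sub replace_second_token {
    my ($line, $new) = @_;
    $line =~ s/^(\S+\s+)\S+/${1}$new/;
    return $line;
}

print replace_second_token("BOBBY X66666 A 345 674 A 123 488", "B22222"), "\n";
# BOBBY B22222 A 345 674 A 123 488
```

Because the pattern only describes the shape of the line, it works whether the old value is X66666, U33333 or 666D3P, and a line consisting of a single token is left unchanged.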

Re: Newbie: uses/limits of perl in editing files
by dwm042 (Priest) on Nov 23, 2007 at 14:57 UTC
Perl is as close to a Swiss Army knife of a scripting language as exists. If you can't write the code yourself, you can, in most circumstances, find a solution written for you on CPAN.

Having said that, at this stage, you probably need a beginning text on writing Perl. Something like Learning Perl would be appropriate.
