PerlMonks  

Modify a txt file

by la (Novice)
on Oct 20, 2011 at 00:47 UTC (#932533)
la has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks!

I need your advice once again. I have a .txt file that has over 200,000 rows in it. It is therefore too big to modify by hand. I am trying to write a script that I can apply to it that will do the following:

Take the original format (short example shown here):

0       ASDF
        ASEE
        ASEE

13      DERG
        DREG

28      QWER
        QWER

42      WERT
        WERT
        WERT

55      QWEASD
        QWEASD
        QWEASD
        QWEASD

And change the format of this file into this:

0       0
        0
        0

13      13
        13

28      28
        28

42      42
        42
        42

55      55
        55
        55
        55

Can anyone suggest how to do this in Perl? Your help is greatly appreciated. Thanks in advance.

Re: Modify a txt file
by GrandFather (Cardinal) on Oct 20, 2011 at 00:54 UTC

    You tell me what the real application for such a transformation is and show me the code you have tried and I'll help you get it right. I'm not about to write what looks like a homework answer for you however. And if it's not homework then you've probably simplified the problem to the point of meaninglessness.

    Update: Oh, I see this probably relates to something you have been working on for a few days and most likely isn't Perl homework. Maybe it's time you filled us in on the bigger picture so we can help with a cohesive overall solution rather than drip feeding little parts of a solution as you run into trouble?

    True laziness is hard work
Re: Modify a txt file
by Eliya (Vicar) on Oct 20, 2011 at 01:04 UTC

    One way to do it:

    #!/usr/bin/perl -w
    use strict;

    local $/ = "";   # paragraph mode

    while (<DATA>) {
        my ($num) = /^(\d+)/;
        s/\w+/$num/g;
        print;
    }

    __DATA__
    0       ASDF
            ASEE
            ASEE

    13      DERG
            DREG

    28      QWER
            QWER

    42      WERT
            WERT
            WERT

    55      QWEASD
            QWEASD
            QWEASD
            QWEASD

    (note that $/ = "" is not the same as $/ = undef, see perlvar)
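    To make the note above concrete, here is a minimal sketch that counts how many records a read loop sees under three settings of $/. The helper name count_records and the sample text are assumptions made up for this illustration; the input is read from an in-memory filehandle so the example is self-contained.

```perl
use strict;
use warnings;

# count_records is a hypothetical helper for this illustration only.
# It reads $text through an in-memory filehandle with $/ set to
# $separator and returns how many records the read loop produced.
sub count_records {
    my ($text, $separator) = @_;
    local $/ = $separator;    # restored automatically when the sub returns
    open my $fh, '<', \$text or die "open on scalar failed: $!";
    my $count = 0;
    while (defined(my $record = <$fh>)) {
        $count++;
    }
    close $fh;
    return $count;
}

# Two groups separated by three blank lines (four consecutive newlines).
my $text = "0 ASDF\n\n\n\n13 DERG\n";

printf "paragraph mode (empty string): %d\n", count_records($text, "");
printf "literal two newlines:          %d\n", count_records($text, "\n\n");
printf "slurp mode (undef):            %d\n", count_records($text, undef);
```

    Paragraph mode treats any run of blank lines as a single separator, the literal "\n\n" setting splits on every pair of newlines (so extra blank lines produce extra records), and undef reads the whole input as one record.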

Re: Modify a txt file
by Marshall (Prior) on Oct 20, 2011 at 03:18 UTC
    Given that the input file has a very large number of lines, I would try to just process the file line by line (no slurping the file into a single $file_content or @lines).

    Here is one way:

    #!/usr/bin/perl -w
    use strict;

    my $cur_num;
    while (<DATA>) {
        $cur_num = $1 if (/^(\d+)/);   # new $cur_num if line starts
                                       # with digits
        s/(\S+)$/$cur_num/;            # substitute the non-spaces at the end
                                       # of the line with the cur_num
        print;                         # a blank line is not modified
    }

    =Prints
    0       0
            0
            0

    13      13
            13

    28      28
            28

    42      42
            42
            42

    55      55
            55
            55
            55
    =cut

    __DATA__
    0       ASDF
            ASEE
            ASEE

    13      DERG
            DREG

    28      QWER
            QWER

    42      WERT
            WERT
            WERT

    55      QWEASD
            QWEASD
            QWEASD
            QWEASD
Re: Modify a txt file
by davido (Archbishop) on Oct 20, 2011 at 05:08 UTC

    If you look at each grouping as a record, and set the input record separator to "\n\n" (which is what appears to separate the records in your example), it gets really easy:

    use strict;
    use warnings;

    $/ = "\n\n";

    while ( <DATA> ) {
        if( my( $number ) = m/(\d+)/ ) {
            s/\b\p{Alpha}+\b/$number/g;
        }
        else {
            warn "Malformed record in input line $.\n$_\nContinuing.\n";
        }
        print;
    }

    __DATA__
    0       ASDF
            ASEE
            ASEE

    13      DERG
            DREG

    28      QWER
            QWER

    42      WERT
            WERT
            WERT

    55      QWEASD
            QWEASD
            QWEASD
            QWEASD

    Here is the output from your test data:

    0       0
            0
            0

    13      13
            13

    28      28
            28

    42      42
            42
            42

    55      55
            55
            55
            55

    Dave

      Wow! Thanks everyone for all of the help! I will work on it and let you know how it goes. Thanks again :)

      Hey Dave, Thanks for the advice. Although I am using your code, I am not getting the correct output...

      Code I am using:

      #!/usr/bin/perl
      use warnings;
      use strict;

      if($#ARGV<0){
          die "Usage: $0 <*.txt>\n";
      }
      open(IN,$ARGV[0]) ;
      $/ = "\n\n";
      while ( <IN> ) {
          if( my( $number ) = m/(\d+)/ ) {
              s/\b\p{Alpha}+\b/$number/g;
          }
          else {
              warn "Malformed record in input line $.\n$_\nContinuing.\n";
          }
          print;
      }

      Input:

      0       ASDF
              ASEE
              ASEE

      13      DERG
              DREG

      28      QWER
              QWER

      42      WERT
              WERT
              WERT

      55      QWEASD
              QWEASD
              QWEASD
              QWEASD

      Getting this output:

      0       0
              0
              0

      13      0
              0

      28      0
              0

      42      0
              0
              0

      55      0
              0
              0
              0

      Desired Output:

      0       0
              0
              0

      13      13
              13

      28      28
              28

      42      42
              42
              42

      55      55
              55
              55
              55

      I see that it is in this line s/\b\p{Alpha}+\b/$number/g; where the substitution is being made. Is the code referring back to the original 0 in the top left hand column perhaps?

        Why do you suppose this is happening? Have you taken any steps besides posting to figure out why the solution isn't working for you?

        When debugging it's often helpful to check the state of the program's logic at one or more points. An easy way to do this is with "print" statements that give you clues as to where you are within the program's control flow.

        For example, if you added a print "Record: $.\n"; statement as the first line of the block of your while() loop you would see each time the loop iterates over a new record. And if you added print "New match: $number\n"; as the first line of the if() block, you would see each time a new number is matched and captured into $number. After running the script with those two debugging aids you would probably see that the file is being read in as one big record, rather than as multiple records.

        That seems impossible if your input data matches the data you showed us, and if you're executing the code you say you are. Either you've got $/ = ''; in your code, or you have data that isn't separated by two newlines like it appears in your post. At least those are my best guesses without seeing exact cut&pastes of the first few records of your data, and of the script exactly as it's being run.

        For what it's worth, I copied and pasted the exact data you posted here and used that as the sample run for my solution. I also copied and pasted the exact data that you posted in your followup, and it also produced the correct results.

        Is it possible that you're re-typing the data rather than copy/pasting it, and that the blank line between records actually contains some space characters that we can't see, and that you didn't paste into your example data?
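        The "invisible whitespace" hypothesis can be checked with a small sketch like the one below. It is not taken from the thread: the helper names normalize_blank_lines and fill_numbers are hypothetical, and the sample input assumes a "blank" line that actually holds a single space. The substitution itself is the one from the solution above.

```perl
use strict;
use warnings;

# Hypothetical helper: remove trailing spaces, tabs and carriage returns
# from every line, so a line that only looks empty really is empty.
sub normalize_blank_lines {
    my ($text) = @_;
    $text =~ s/[ \t\r]+$//mg;
    return $text;
}

# Hypothetical helper: apply the record-wise substitution with
# $/ = "\n\n" and return the result as a string instead of printing it.
sub fill_numbers {
    my ($text) = @_;
    local $/ = "\n\n";
    open my $fh, '<', \$text or die "open on scalar failed: $!";
    my $out = '';
    while (my $record = <$fh>) {
        if (my ($number) = $record =~ m/(\d+)/) {
            $record =~ s/\b\p{Alpha}+\b/$number/g;
        }
        $out .= $record;
    }
    close $fh;
    return $out;
}

# Assumed sample: the "blank" line between the groups holds one space.
my $raw = "0\tASDF\n\tASEE\n \n13\tDERG\n\tDREG\n";

print "Without normalizing:\n", fill_numbers($raw);
print "After normalizing:\n",   fill_numbers(normalize_blank_lines($raw));
```

        With the stray space in place the input contains no "\n\n" at all, so everything is read as one record and every word becomes the first number, which matches the symptom reported above. Windows line endings ("\r\n") would have the same effect, and the same normalization handles them.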

        By the way: This isn't contributing to your problem, but it is a darn good idea anyway: Put use autodie; right after the use warnings; line at the top of your script. That will alert you if a file fails to open (among other things).
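        As a rough illustration of what autodie changes, the sketch below opens a file with the three-argument form and a lexical filehandle, and relies on autodie to raise the error. The helper name try_open and the file path are made up for the example.

```perl
use strict;
use warnings;
use autodie;    # open, close and friends now die on failure

# Hypothetical helper: try to open and close $path for reading.
# Returns '' on success, or the error text raised by autodie on failure.
sub try_open {
    my ($path) = @_;
    my $ok = eval {
        open my $in, '<', $path;    # no explicit "or die" needed
        close $in;
        1;
    };
    return $ok ? '' : "$@";
}

# Assumed path that does not exist, used only to trigger the error.
my $error = try_open('no-such-dir-932533/missing.txt');
print $error ne '' ? "open failed: $error\n" : "open succeeded\n";
```

        Without autodie (or an explicit "or die"), a failed open returns false silently and the read loop simply sees no input.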


        Dave

        The code you posted works as advertised by davido. Consider:

        use strict;
        use warnings;

        my $data = <<DATA;
        0       ASDF
                ASEE
                ASEE

        13      DERG
                DREG

        28      QWER
                QWER

        42      WERT
                WERT
                WERT

        55      QWEASD
                QWEASD
                QWEASD
                QWEASD
        DATA

        open my $inFile, '<', \$data;
        $/ = "\n\n";
        while (<$inFile>) {
            if (my ($number) = m/(\d+)/) {
                s/\b\p{Alpha}+\b/$number/g;
            }
            else {
                warn "Malformed record in input line $.\n$_\nContinuing.\n";
            }
            print;
        }

        Prints:

        0       0
                0
                0

        13      13
                13

        28      28
                28

        42      42
                42
                42

        55      55
                55
                55
                55

        so there is a mismatch between what you are telling us and what you are actually doing. If you really want help you need to really tell us what you are doing and show us (as per the sample code above) how things are going wrong. We can't fix what ain't broke!

        True laziness is hard work
Re: Modify a txt file
by mrstlee (Beadle) on Oct 20, 2011 at 09:36 UTC
    This one uses the new(ish) treat-strings-as-file-handles feature (See Effective Perl Programming)
    my $data = q(0       ASDF
            ASEE
            ASEE

    13      DERG
            DREG

    28      QWER
            QWER

    42      WERT
            WERT
            WERT

    55      QWEASD
            QWEASD
            QWEASD
            QWEASD
    );
    open $hndl , "<", \(my $s = $data);
    open $out , ">", \(my $formatted = '');
    my $substitute_field;
    READ_FILE:
    while (my $line = <$hndl>) {
        chomp $line;
        ## Ignore any lines that don't contain relevant text
        $line =~ /\S+/ or next READ_FILE;
        $line =~ /^\s*(\d+)(\s+)([A-Z]+)/ and do {
            $substitute_field = $1;
            print $out $substitute_field, $2 ,$substitute_field,"\n";
            next READ_FILE;
        };

        ## If we don't have a field to substitute move on
        defined $substitute_field or next READ_FILE;
        ## Must have valid line
        $line =~ s/^(\s*)([A-Z]+)/$1$substitute_field/;
        print $out $line,"\n";
    }
    print $formatted;
    close $hndl;
    close $out;
    In:
    0       ASDF
            ASEE
            ASEE
    
    13      DERG
            DREG
    
    28      QWER
            QWER
    
    42      WERT
            WERT
            WERT
    
    55      QWEASD
            QWEASD
            QWEASD
            QWEASD
    
    out:
    0       0
            0
            0
    13      13
            13
    28      28
            28
    42      42
            42
            42
    55      55
            55
            55
            55
    

      You can dispense with the need for an input scalar by opening a filehandle on a reference to a HEREDOC. Also, there's no need to initialise your output scalar to the empty string as any previous content will be clobbered by the '>' in the open.

      knoppix@Microknoppix:~$ perl -e '
      > open my $inFH, q{<}, \ <<EOD or die qq{open: << HEREDOC: > $!\n};
      > line 1
      > line 2
      > line 3
      > EOD
      >
      > my $out = qq{some rubbish here\n};
      > open $outFH, q{>}, \ $out or die qq{open: > scalar: $!\n};
      >
      > while ( <$inFH> )
      > {
      >     print $outFH uc;
      > }
      >
      > print $out;'
      LINE 1
      LINE 2
      LINE 3
      knoppix@Microknoppix:~$

      I hope this is of interest.

      Cheers,

      JohnGG
