PerlMonks  

match digit, followed by hyphen, followed again by digit - then add whitespaces and replace in file

by fasoli (Beadle)
on Aug 17, 2017 at 12:48 UTC ( [id://1197553]=perlquestion )

fasoli has asked for the wisdom of the Perl Monks concerning the following question:

Hi Wise Monks!

I've been really confused about a problem I'm having with a file. It's a text file with 4 columns and a few thousand lines. The contents are numbers that look like this:

1.234 5.6789 -1.235

These files are output by a piece of software. The problem is that in some cases the contents look like this:

1.234 5.6789-12.235

*Notice the last column: because there are now 2 digits before the decimal point, the number is stuck to the second column.

Naturally, I'm now having trouble plotting this file. So I'm trying to match strings where there is a digit, followed by a hyphen, followed by another digit, and then I want to replace this (hopefully correctly) with an added space so that the numbers are correct.

I'm trying this, and the regex match works; it does print the problematic bits:

#!/usr/bin/perl
use warnings;
use strict;

my $test;
open my $INPUT, '<', "file.txt" or die $!;
while (<$INPUT>) {
    chomp $_;
    if ($_=~/(\d)(-)(\d)/) {
        print "$1$2$3 \n";
    }
}

But now I'm stuck: how do I complete the replace action? And how do I print the new contents of the file? I haven't succeeded in anything more than compilation errors. In terms of replacing, I've tried this

   if ($_=~s/(\d)(-)(\d)/(\d)    (-)(\d)/) {

(supposedly telling the script to add spaces between the digit before the hyphen and the hyphen itself)

but I get this error

Unrecognized escape \d passed through

Then I tried it with the $1$2$3 but again it was wrong. Can you give me any hints about how to make the replace function work?? Thank you so much!


Replies are listed 'Best First'.
Re: match digit, followed by hyphen, followed again by digit - then add whitespaces and replace in file
by choroba (Cardinal) on Aug 17, 2017 at 12:55 UTC
    If you just want to insert a space between a digit and a minus sign:
    while (<>) { s/(\d)-/$1 -/g; print; }

    Are you sure the output isn't in a fixed-width format?
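On the error in the original question: the right-hand side of s/// is an interpolated string, not a pattern, so \d means nothing there (hence "Unrecognized escape \d passed through"). Captured text is referred to as $1, $2 and so on. Below is a minimal sketch of the digit-hyphen-digit case, assuming the only damage is a minus sign glued to the preceding digit. fix_line is a made-up helper name, and the sample lines are the ones from the question.

```perl
use strict;
use warnings;

# fix_line is a hypothetical helper name, not something from the thread.
sub fix_line {
    my ($line) = @_;
    # Zero-width lookbehind/lookahead: add a space only at positions where
    # a digit is directly followed by "-digit". Nothing is captured or consumed.
    $line =~ s/(?<=\d)(?=-\d)/ /g;
    return $line;
}

print fix_line($_), "\n" for '1.234 5.6789 -1.235', '1.234 5.6789-12.235';
```

A capture-based form such as s/(\d)(-\d)/$1 $2/g should behave the same on this data; the lookaround version just avoids having to put the matched characters back.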

    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
      No, it's not fixed width: these are supposed to be atomic coordinates, and sometimes they're just printed wrong. I hope I understood your question, although I'm not sure... Brilliant, this was so easy! Well, it seems to work, but I need to use printf to be able to read the file and make sure, as it's very long. What am I doing wrong with printf here? It doesn't seem to make any difference. Am I using $_ wrong?

      #!/usr/bin/perl
      use warnings;
      use strict;

      my $test;
      my $output;
      open my $INPUT, '<', "file.txt" or die $!;
      while (<$INPUT>) {
          chomp $_;
          s/(\d)-/$1 -/g;
          open $output, '>>', "test";
          printf $output ("%10s \n", "$_");
      }
        > need to use printf

        Why do you need it?

        Also, don't open the output file in each iteration of the loop. Just open it once before the loop starts.

        Moreover, $_ and "$_" make no difference in this case. Drop the useless quotes.
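A minimal sketch of that advice, with the output handle opened once before the loop and $_-style quoting dropped. fix_stream is a hypothetical helper name, and in-memory filehandles stand in for "file.txt" and "test" so the sketch runs without any files on disk.

```perl
use strict;
use warnings;

# fix_stream is a hypothetical helper: it copies $in to $out, adding a
# space before any "-" that directly follows a digit.
sub fix_stream {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        $line =~ s/(\d)-/$1 -/g;
        print {$out} $line;    # no chomp, so the newline is still on $line
    }
    return;
}

# Sample data from the thread; for real use, open 'file.txt' for reading
# and the output file for writing ('>') here instead.
my $data   = "1.234 5.6789 -1.235\n1.234 5.6789-12.235\n";
my $result = '';
open my $in,  '<', \$data   or die "Cannot read sample data: $!";
open my $out, '>', \$result or die "Cannot open output buffer: $!";   # opened once, before the loop
fix_stream($in, $out);
close $out or die "Cannot close output: $!";

print $result;
```

Opening with '>>' inside the loop, as in the posted script, reopens the file on every line and keeps appending across runs; a single '>' open before the loop avoids both.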

        ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
Re: match digit, followed by hyphen, followed again by digit - then add whitespaces and replace in file (updated x2)
by haukex (Archbishop) on Aug 17, 2017 at 13:11 UTC

    When I think "I need to match a number" I think of Regexp::Common::number, for example:

    use warnings;
    use strict;
    use Regexp::Common qw/number/;
    use Data::Dump; #Debug

    while (<DATA>) {
        /^($RE{num}{decimal}\s*){4}$/ or die;
        my @cols = /$RE{num}{decimal}/g;
        dd @cols; #Debug
    }
    __DATA__
    1.234 5.6789 -1.235-4
    1.234 5.6789-12.235-4

    Which outputs:

    (1.234, 5.6789, -1.235, -4)
    (1.234, 5.6789, -12.235, -4)

    For an approach that is probably overkill for parsing something like this, see Re: How to split a non even number of string elements into a hash [RESOLVED].

    Update: If you want to just rewrite the file, you can use the following oneliner, but note this will discard anything that doesn't match the regex! If you want to add validity checking, you could add /^($RE{num}{decimal}\s*){4}$/ or die; (which I have now added to the above example to make it more robust, although it's a little less efficient now since it matches each line twice).

    $ perl -MRegexp::Common=number -nle '$,=" ";print/$RE{num}{decimal}/g' input.txt >output.txt

    Update 2: TIMTOWTDI, this inserts a space between two numbers that previously did not have a space between them (and it's a bit safer because inserting spaces should be all it does, instead of rewriting the entire lines):

    $ perl -MRegexp::Common=number -pe 's/(?>$RE{num}{decimal})\K(?=$RE{num}{decimal})/ /g' input.txt >output.txt

    Sorry, this post also had a few ninja edits. OP msg'd.

      This is brilliant! However the output now needs further formatting as it's printed with brackets etc. I tried adding a print statement but dd still prints everything. A bit confusing :(
        > However the output now needs further formatting as it's printed with brackets etc. I tried adding a print statement but dd still prints everything.

        You can print @cols any way you want, using a regular print or printf, instead of Data::Dump, which is just a debugging aid. If you could show what code you are trying and what output format you expect, we could help better (How do I post a question effectively?). Also, note I updated my node right around the time you replied, check out the update too.
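As one possible illustration of printing the columns without Data::Dump, here is a minimal sketch using sprintf. format_cols is a hypothetical helper name, the 10-character field width is an arbitrary choice, and the plain regex is only a simplified stand-in for $RE{num}{decimal} (optional sign, digits, optional fraction, no exponents) so that the sketch needs no CPAN modules.

```perl
use strict;
use warnings;

# format_cols is a hypothetical helper. The pattern below is a simplified
# stand-in for $RE{num}{decimal}; it does not handle exponents.
sub format_cols {
    my ($line) = @_;
    my @cols = $line =~ /[-+]?\d+(?:\.\d+)?/g;
    # One right-aligned field of width 10 per column, separated by a space.
    return join ' ', map { sprintf '%10s', $_ } @cols;
}

print format_cols($_), "\n" for '1.234 5.6789 -1.235-4', '1.234 5.6789-12.235-4';
```

With right-aligned fields the decimal points will not necessarily line up; a numeric format such as %10.4f would align them, at the cost of rewriting the number of decimals.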

Re: match digit, followed by hyphen, followed again by digit - then add whitespaces and replace in file
by Marshall (Canon) on Aug 18, 2017 at 02:24 UTC
    Sometimes just modifying the input line with a very simple expression works out fine. In this case, the minus sign is apparently sometimes not preceded by a space. A simple idea is to fix that in the input line (if it occurs). The code below does not modify the input line unless there is a negative number which is not preceded by a space.

    Once a simple space separated token rule has been enforced, then a simple split statement suffices. I don't know if this is true in this case, but often running a couple of simple regex/split statements can execute faster than a single complicated regex.

    use warnings;
    use strict;

    while (my $line = <DATA>) {
        chomp $line;
        print "Input Line: '$line'\n";
        $line =~ s/(\S)-/$1 -/g;   # add space before minus if needed
                                   # otherwise don't add a space
        print "Modified Line: '$line'\n";
        my @numbers = split ' ', $line;
        print "Numbers are @numbers\n\n";
    }

    =prints:
    Input Line: '1.234 5.6789 -1.235-4'
    Modified Line: '1.234 5.6789 -1.235 -4'
    Numbers are 1.234 5.6789 -1.235 -4

    Input Line: '1.234 5.6789-12.235-4'
    Modified Line: '1.234 5.6789 -12.235 -4'
    Numbers are 1.234 5.6789 -12.235 -4
    =cut

    __DATA__
    1.234 5.6789 -1.235-4
    1.234 5.6789-12.235-4
Re: match digit, followed by hyphen, followed again by digit - then add whitespaces and replace in file
by kcott (Archbishop) on Aug 18, 2017 at 09:51 UTC

    G'day fasoli,

    TMTOWTDI

    An array variable in a quoted (interpolated quotes) string will, by default, add spaces between the elements when interpolated. See 'perlvar: $"' for details.

    $ perl -E 'my @x = (1,2,3); say ">@x<"'
    >1 2 3<

    If you get your captures into an array, i.e. ($1,$2,$3), they too will output with spaces separating the elements.

    You can do it like this:

    $ perl -E 'my @x = "1.234 5.6789 -1.235" =~ /(-?\d+(?:\.\d+)?)/g; say "@x"'
    1.234 5.6789 -1.235
    $ perl -E 'my @x = "1.234 5.6789-12.235" =~ /(-?\d+(?:\.\d+)?)/g; say "@x"'
    1.234 5.6789 -12.235

    Perl v5.26 introduced some new special variables to handle this sort of thing for you (see "perldelta: @{^CAPTURE}, %{^CAPTURE}, and %{^CAPTURE_ALL}"). Using @{^CAPTURE} removes the need for an intermediate variable. Here's a couple of quick examples with your posted data:

    $ perl -E '"1.234 5.6789 -1.235" =~ /^(-?\d+?\.?\d+)\s*(-?\d+?\.?\d+)\s*(-?\d+?\.?\d+)$/; say "@{^CAPTURE}"'
    1.234 5.6789 -1.235
    $ perl -E '"1.234 5.6789-12.235" =~ /^(-?\d+?\.?\d+)\s*(-?\d+?\.?\d+)\s*(-?\d+?\.?\d+)$/; say "@{^CAPTURE}"'
    1.234 5.6789 -12.235
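On a perl older than v5.26, where @{^CAPTURE} is not available, a match in list context returns the captures directly. A minimal sketch, assuming three columns as in the sample data; split_cols is a hypothetical helper name, the number pattern is a simplified one (no exponents), and the changed list separator is only there to show $" in action.

```perl
use strict;
use warnings;

# split_cols is a hypothetical helper. In list context a successful match
# returns its captures, so no special variable is needed.
sub split_cols {
    my ($line) = @_;
    my $num = qr/-?\d+(?:\.\d+)?/;    # simplified number pattern, no exponents
    my @cols = $line =~ /^($num)\s*($num)\s*($num)$/;
    return @cols;                     # empty list if the line does not match
}

{
    local $" = ', ';                  # list separator used when an array is interpolated
    my @cols = split_cols('1.234 5.6789-12.235');
    print "@cols\n";
}
```

Because the pattern is anchored at both ends, a line that does not consist of exactly three numbers yields an empty list, which makes malformed lines easy to detect.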

    — Ken

Front-paged by Corion