fasoli has asked for the wisdom of the Perl Monks concerning the following question:
Hi Wise Monks!
I've been really confused by a problem I'm having with a file. It's a text file with 4 columns and a few thousand lines. The contents are numbers that look like this:
1.234 5.6789 -1.235
These files are the output of a piece of software. The problem is that in some cases the contents look like this:
1.234 5.6789-12.235
Notice the last column: because the number now has two digits before the decimal point, it runs into the number in the second column.
Naturally, I'm now having trouble plotting this file. So I'm trying to match strings where a digit is followed by a hyphen and then another digit, and then replace them (hopefully correctly) with a version that has an added space, so that the numbers are separated properly.
I'm trying this, and the regex match works; it does print the problematic bits:
#!/usr/bin/perl
use warnings;
use strict;
my $test;
open my $INPUT, '<', "file.txt" or die $!;
while (<$INPUT>) {
    chomp $_;
    if ($_=~/(\d)(-)(\d)/) {
        print "$1$2$3 \n";
    }
}
But now I'm stuck: how do I complete the replacement? And how do I print the new contents of the file? I haven't gotten anything beyond compilation errors. For the replacement, I tried this:
if ($_=~s/(\d)(-)(\d)/(\d) (-)(\d)/) {
(supposedly telling the script to add a space between the digit before the hyphen and the hyphen itself)
but I get this error:
Unrecognized escape \d passed through
Then I tried it with $1$2$3, but again it was wrong. Can you give me any hints about how to make the replacement work? Thank you so much!
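For reference: in s///, only the left-hand side is a pattern. The replacement is treated like an ordinary double-quoted string, so \d there is just an unrecognized escape (and the parentheses become literal characters). The captured text is available as $1, $2 and $3. A minimal sketch of the intended substitution, keeping all three captures; add_space is a hypothetical helper name:

```perl
use strict;
use warnings;

# The pattern side uses \d; the replacement side is a plain string,
# so it refers to the captures as $1, $2, $3 and adds the space itself.
sub add_space {
    my ($line) = @_;
    $line =~ s/(\d)(-)(\d)/$1 $2$3/g;   # /g fixes every occurrence on the line
    return $line;
}

print add_space('1.234 5.6789-12.235'), "\n";
print add_space('1.234 5.6789 -1.235'), "\n";   # already fine, left unchanged
```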
Re: match digit, followed by hyphen, followed again by digit - then add whitespaces and replace in file
by choroba (Cardinal) on Aug 17, 2017 at 12:55 UTC
If you just want to insert a space between a digit and a minus sign:
while (<>) {
s/(\d)-/$1 -/g;
print;
}
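If the goal is to fix file.txt itself rather than print to the terminal, the same loop can run as an in-place edit: $^I is the variable behind perl's -i switch. A sketch, assuming a temporary file stands in for file.txt and with an arbitrary '.bak' backup suffix:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Demo setup (assumption): a temporary file stands in for file.txt.
my ($tmp, $file) = tempfile(UNLINK => 1);
print {$tmp} "1.234 5.6789 -1.235\n1.234 5.6789-12.235\n";
close $tmp or die "Cannot close $file: $!";

# In-place edit, like `perl -i.bak -pe '...' file`: while $^I is set,
# <> reads the files named in @ARGV and print writes their replacement.
{
    local ($^I, @ARGV) = ('.bak', $file);
    while (<>) {
        s/(\d)-/$1 -/g;    # the substitution from the answer above
        print;
    }
}
unlink "$file.bak";        # remove the backup copy created by $^I

# Read the edited file back to show the result.
open my $check, '<', $file or die "Cannot read $file: $!";
my @fixed = <$check>;
close $check;
print @fixed;
```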
Are you sure the output isn't in a fixed-width format?
($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord
}map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
No, it's not fixed width: these are supposed to be atomic coordinates, and sometimes they're just printed wrong. I hope I understood your question, although I'm not sure...
Brilliant, this was so easy! It seems to work, but I need to use printf so I can read the output file and make sure, since it's very long.
What am I doing wrong with printf here? It doesn't seem to make any difference. Am I using $_ wrong?
#!/usr/bin/perl
use warnings;
use strict;
my $test;
my $output;
open my $INPUT, '<', "file.txt" or die $!;
while (<$INPUT>) {
    chomp $_;
    s/(\d)-/$1 -/g;
    open $output, '>>', "test";
    printf $output ("%10s \n", "$_");
}
> need to use printf
Why do you need it?
Also, don't open the output file in each iteration of the loop. Just open it once before the loop starts.
Moreover, $_ and "$_" make no difference in this case. Drop the useless quotes.
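Putting that advice together: open the output once, don't chomp, and a plain print keeps each line's newline. In this sketch, in-memory filehandles stand in for file.txt and test so it runs anywhere; fix_lines is a hypothetical helper name:

```perl
use strict;
use warnings;

# Copy lines from $in to $out, inserting a space between a digit and a
# following minus sign.
sub fix_lines {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        $line =~ s/(\d)-/$1 -/g;
        print {$out} $line;    # no chomp, so each line keeps its newline
    }
}

# For real files: open my $in, '<', 'file.txt' or die "file.txt: $!";
# and: open my $out, '>', 'test' or die "test: $!";
my $input  = "1.234 5.6789 -1.235\n1.234 5.6789-12.235\n";
my $result = '';
open my $in,  '<', \$input  or die "Cannot open input: $!";
open my $out, '>', \$result or die "Cannot open output: $!";  # opened once, before the loop
fix_lines($in, $out);
close $out;
print $result;
```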
Re: match digit, followed by hyphen, followed again by digit - then add whitespaces and replace in file (updated x2)
by haukex (Archbishop) on Aug 17, 2017 at 13:11 UTC
use warnings;
use strict;
use Regexp::Common qw/number/;
use Data::Dump; #Debug
while (<DATA>) {
    /^($RE{num}{decimal}\s*){4}$/ or die;
    my @cols = /$RE{num}{decimal}/g;
    dd @cols; #Debug
}
__DATA__
1.234 5.6789 -1.235-4
1.234 5.6789-12.235-4
Which outputs:
(1.234, 5.6789, -1.235, -4)
(1.234, 5.6789, -12.235, -4)
For an approach that is probably overkill for parsing something like this, see Re: How to split a non even number of string elements into a hash [RESOLVED].
Update: If you just want to rewrite the file, you can use the following one-liner, but note that it will discard anything that doesn't match the regex! To add validity checking, you could add /^($RE{num}{decimal}\s*){4}$/ or die; (which I have now added to the example above to make it more robust, although it's a little less efficient now since it matches each line twice).
$ perl -MRegexp::Common=number -nle '$,=" ";print/$RE{num}{decimal}/g' input.txt >output.txt
Update 2: TIMTOWTDI, this inserts a space between two numbers that previously did not have a space between them (and it's a bit safer because inserting spaces should be all it does, instead of rewriting the entire lines):
$ perl -MRegexp::Common=number -pe 's/(?>$RE{num}{decimal})\K(?=$RE{num}{decimal})/ /g' input.txt >output.txt
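The same space-insertion technique works without Regexp::Common if you supply your own number pattern. In this sketch a simplified pattern (an assumption: no '+' signs or exponents) stands in for $RE{num}{decimal}, and separate_numbers is a hypothetical helper name. The atomic group (?>...) takes the longest number and never gives part of it back, \K keeps that number out of the replaced text, and the lookahead requires another number to start immediately, so the substitution only ever inserts a space:

```perl
use strict;
use warnings;

# Simplified stand-in for $RE{num}{decimal}: optional minus, digits,
# optional fractional part.
my $num = qr/-?\d+(?:\.\d+)?/;

# Insert a space wherever one number is directly followed by another.
sub separate_numbers {
    my ($line) = @_;
    $line =~ s/(?>$num)\K(?=$num)/ /g;
    return $line;
}

print separate_numbers($_), "\n"
    for '1.234 5.6789 -1.235-4', '1.234 5.6789-12.235-4';
```

Without the atomic group, the engine could backtrack to match "1.23" and then see "4" as the next number, splitting a valid value.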
Sorry, this post also had a few ninja edits. OP msg'd.
This is brilliant! However, the output now needs further formatting, as it's printed with brackets etc. I tried adding a print statement, but dd still prints everything. A bit confusing :(
However the output now needs further formatting as it's printed with brackets etc. I tried adding a print statement but dd still prints everything.
You can print @cols any way you want, using a regular print or printf instead of Data::Dump, which is just a debugging aid. If you could show what code you are trying and what output format you expect, we could help better (How do I post a question effectively?). Also, note that I updated my node right around the time you replied; check out the update too.
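For example, a sketch that prints each line as right-aligned, 10-character columns with sprintf. It swaps $RE{num}{decimal} for a simplified hand-written pattern (an assumption that ignores '+' signs and exponents) so that it runs without Regexp::Common; format_line is a hypothetical helper name:

```perl
use strict;
use warnings;

# Simplified stand-in for $RE{num}{decimal}.
my $num = qr/-?\d+(?:\.\d+)?/;

# Extract the numbers from one line and return them as fixed-width columns.
sub format_line {
    my ($line) = @_;
    my @cols = $line =~ /$num/g;            # list of all numbers on the line
    my $fmt  = '%10s' x scalar @cols;       # one %10s per column
    return sprintf "$fmt\n", @cols;
}

print format_line($_) for '1.234 5.6789 -1.235-4', '1.234 5.6789-12.235-4';
```

For plain space-separated output, print join(' ', @cols), "\n" is enough.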
Re: match digit, followed by hyphen, followed again by digit - then add whitespaces and replace in file
by Marshall (Canon) on Aug 18, 2017 at 02:24 UTC
Sometimes just modifying the input line with a very simple expression works out fine. In this case, the "-" (minus sign) is apparently sometimes not preceded by a space. A simple idea is to fix that in the input line when it occurs. The code below does not modify the input line unless there is a negative number that is not preceded by a space.
Once a simple space-separated token rule has been enforced, a simple split statement suffices. I don't know whether it's true in this case, but running a couple of simple regex/split statements can often execute faster than a single complicated regex.
use warnings;
use strict;
while (my $line = <DATA>)
{
chomp $line;
print "Input Line: '$line'\n";
$line =~ s/(\S)-/$1 -/g; # add space before minus if needed
# otherwise don't add a space
print "Modified Line: '$line'\n";
my @numbers = split ' ', $line;
print "Numbers are @numbers\n\n";
}
=prints:
Input Line: '1.234 5.6789 -1.235-4'
Modified Line: '1.234 5.6789 -1.235 -4'
Numbers are 1.234 5.6789 -1.235 -4
Input Line: '1.234 5.6789-12.235-4'
Modified Line: '1.234 5.6789 -12.235 -4'
Numbers are 1.234 5.6789 -12.235 -4
=cut
__DATA__
1.234 5.6789 -1.235-4
1.234 5.6789-12.235-4
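Whether two simple steps beat one regex depends on the data and the machine, so it is worth measuring. A sketch using the core Benchmark module to compare the fix-then-split approach with a single extraction regex; the sub names and the simplified number pattern (no '+' signs or exponents) are assumptions, and no timings are shown because they vary:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $line = '1.234 5.6789-12.235-4';

# Approach 1: add the missing space, then split on whitespace.
sub two_step {
    my ($s) = @_;
    $s =~ s/(\S)-/$1 -/g;
    return split ' ', $s;
}

# Approach 2: one regex that extracts every number directly.
sub one_regex {
    my ($s) = @_;
    return $s =~ /-?\d+(?:\.\d+)?/g;
}

# Make sure both give the same answer before timing them.
die "approaches disagree\n" unless "@{[two_step($line)]}" eq "@{[one_regex($line)]}";

# Run each for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    two_step  => sub { my @n = two_step($line) },
    one_regex => sub { my @n = one_regex($line) },
});
```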
Re: match digit, followed by hyphen, followed again by digit - then add whitespaces and replace in file
by kcott (Archbishop) on Aug 18, 2017 at 09:51 UTC
G'day fasoli,
TMTOWTDI
An array variable in an interpolating (double-quoted) string will, by default, have spaces added between its elements when interpolated. See 'perlvar: $"' for details.
$ perl -E 'my @x = (1,2,3); say ">@x<"'
>1 2 3<
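The separator is the special variable $", a single space by default, so localizing it changes how arrays interpolate without touching the data. A small sketch, with a tab chosen only as an example separator:

```perl
use strict;
use warnings;

my @x = ('1.234', '5.6789', '-12.235');

# "@x" joins the elements with $", which is a single space by default.
my $spaced = "@x";

# local changes $" only until the end of the enclosing block.
my $tabbed = do { local $" = "\t"; "@x" };

print "$spaced\n$tabbed\n";
```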
If you get your captures into an array, i.e. ($1,$2,$3), they too will be output with spaces separating the elements. You can do it like this:
$ perl -E 'my @x = "1.234 5.6789 -1.235" =~ /(-?\d+?\.?\d+)/g; say "@x"'
1.234 5.6789 -1.235
$ perl -E 'my @x = "1.234 5.6789-12.235" =~ /(-?\d+?\.?\d+)/g; say "@x"'
1.234 5.6789 -12 235
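Note that the second one-liner splits -12.235 into -12 and 235. Because \.? is optional, the engine is already satisfied with -12 (lazy \d+? takes 1, \.? matches nothing, \d+ takes 2) and then finds 235 as a separate match. A sketch of an unanchored alternative that takes the integer digits greedily and makes the whole fractional part optional as one unit (assuming no '+' signs or exponents); numbers_in is a hypothetical helper name:

```perl
use strict;
use warnings;

# Integer part taken greedily, fractional part optional as a whole unit,
# so "-12.235" cannot be cut into "-12" and "235".
sub numbers_in {
    my ($line) = @_;
    return $line =~ /(-?\d+(?:\.\d+)?)/g;
}

for my $line ('1.234 5.6789 -1.235', '1.234 5.6789-12.235') {
    my @x = numbers_in($line);
    print "@x\n";
}
```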
Perl v5.26 introduced some new special variables to handle this sort of thing for you (see "perldelta: @{^CAPTURE}, %{^CAPTURE}, and %{^CAPTURE_ALL}"). Using @{^CAPTURE} removes the need for an intermediate variable.
Here are a couple of quick examples with your posted data:
$ perl -E '"1.234 5.6789 -1.235" =~ /^(-?\d+?\.?\d+)\s*(-?\d+?\.?\d+)\s*(-?\d+?\.?\d+)$/; say "@{^CAPTURE}"'
1.234 5.6789 -1.235
$ perl -E '"1.234 5.6789-12.235" =~ /^(-?\d+?\.?\d+)\s*(-?\d+?\.?\d+)\s*(-?\d+?\.?\d+)$/; say "@{^CAPTURE}"'
1.234 5.6789 -12.235
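In a script that processes a whole file, it's worth checking that the anchored match succeeded before using @{^CAPTURE}, because a failed match does not reset the capture variables. A sketch applying the same pattern line by line; the third sample line is hypothetical and is only there to exercise the failure branch:

```perl
use v5.26;      # @{^CAPTURE} needs Perl 5.26+; this also enables strict and say
use warnings;

my $num = qr/-?\d+?\.?\d+/;   # the same number pattern, without its own capture group

my @fixed;
for my $line ('1.234 5.6789 -1.235', '1.234 5.6789-12.235', 'not a data line') {
    # Only use @{^CAPTURE} when this match succeeded.
    if ($line =~ /^($num)\s*($num)\s*($num)$/) {
        push @fixed, "@{^CAPTURE}";
    }
    else {
        warn "Skipping unexpected line: $line\n";
    }
}
say for @fixed;
```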