rickoy has asked for the wisdom of the Perl Monks concerning the following question:
I have a string:
2 0 1 2 - 7 - 2 7  9 : 3 7 : 3 1
As you can see, each character in the string is separated by 1 space, and the date and time are separated by 2 spaces. I would like to reduce the spaces so that wherever there are 2 spaces between characters they become 1 space, and wherever there is 1 space between characters it is removed.
Re: Removing extra spaces
by davido (Cardinal) on Jul 31, 2012 at 02:26 UTC
s/\s(\s?)/$1/g
Match a single space, and optionally a second space. Capture that second space if it exists. Replace with the capture, which will be either nothing, or the second space.
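As a quick check, here is a minimal sketch that wraps the substitution in a helper and runs it on the string from the question. The sub name squeeze_spaces is made up for this illustration, and the sample assumes the date and time are separated by exactly two spaces.

```perl
use strict;
use warnings;

# Hypothetical helper (the name is not from the thread): remove a whitespace
# character and, if another one follows immediately, keep that one.
sub squeeze_spaces {
    my ($str) = @_;
    $str =~ s/\s(\s?)/$1/g;
    return $str;
}

# Built with join so the double space between date and time is explicit.
my $input = join '  ', '2 0 1 2 - 7 - 2 7', '9 : 3 7 : 3 1';
print squeeze_spaces($input), "\n";    # prints 2012-7-27 9:37:31
```

Because the capture group always takes part in the match, $1 is an empty string rather than undef when there is no second space, so this runs cleanly under warnings.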
Re: Removing extra spaces
by Rudolf (Pilgrim) on Jul 31, 2012 at 01:56 UTC
Being lazy, I would abuse the power of regexes and say:
my $string = '2 0 1 2 - 7 - 2 7  9 : 3 7 : 3 1';
$string =~ s/ {2}/x/g;   # mark each double space with a placeholder
$string =~ s/ //g;
$string =~ s/x/ /g;
print $string;
I just did it in steps: since you want to remove all the single spaces, I put a placeholder ('x') wherever there is a double space, removed the remaining spaces, and then replaced each 'x' with ' '. Perhaps give tr/// a look as well; it swaps sets of characters, but I'm not sure how to use it on the spaces here.
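One caveat with the placeholder approach is that a literal 'x' already in the data would be turned into a space. Below is a minimal sketch of the same three steps using a NUL character as the marker, on the assumption that the input never contains one. The sub name collapse_with_marker is invented for this example, and tr/ //d is used for the middle step since it simply deletes every space.

```perl
use strict;
use warnings;

# Hypothetical helper: same three steps, but with "\0" as the placeholder.
# Assumes the input never contains a NUL character.
sub collapse_with_marker {
    my ($str) = @_;
    $str =~ s/ {2}/\0/g;    # 1. mark every pair of spaces
    $str =~ tr/ //d;        # 2. delete the remaining single spaces
    $str =~ s/\0/ /g;       # 3. turn each marker back into one space
    return $str;
}

# The sample deliberately contains the letter x.
print collapse_with_marker(join '  ', 'e x t r a', 't e x t'), "\n";   # prints extra text
```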
Re: Removing extra spaces
by johngg (Canon) on Jul 31, 2012 at 09:16 UTC
You could use a negative look-ahead to replace any space that is not followed by a space with nothing. This will break down if there are more than two spaces though.
knoppix@Microknoppix:~$ perl -E '
> $dateStr = q{ 2 0 1 2 - 7 - 2 7  9 : 3 7 : 3 1 };
> $dateStr =~ s{\s(?!\s)}{}g;
> say $dateStr;'
2012-7-27 9:37:31
knoppix@Microknoppix:~$
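To illustrate the caveat about more than two spaces, here is a small sketch. The sub names are invented for this example, and collapse_runs is only one assumed way of treating each run of whitespace as a unit, not something proposed in the post above.

```perl
use strict;
use warnings;

# The look-ahead substitution from the post above, wrapped in a sub.
sub strip_last_space {
    my ($str) = @_;
    $str =~ s{\s(?!\s)}{}g;    # removes only the last space of each run
    return $str;
}

# Assumed variant: a run of one disappears, any longer run becomes one space.
sub collapse_runs {
    my ($str) = @_;
    $str =~ s{(\s+)}{ length($1) == 1 ? '' : ' ' }ge;
    return $str;
}

my $three = 'a' . (' ' x 3) . 'b';           # three spaces between a and b
printf "[%s]\n", strip_last_space($three);   # two spaces remain between a and b
printf "[%s]\n", collapse_runs($three);      # prints [a b]
```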
Re: Removing extra spaces
by NetWallah (Canon) on Jul 31, 2012 at 01:28 UTC
s/\s\s?(\S)/$1/g
Update: See the correction below. Thanks Anonymonk and davido.
I hope life isn't a big joke, because I don't get it.
-SNL
$ perl -E '$s="2 0 1 2 - 7 - 2 7  9 : 3 7 : 3 1"; $s =~ s/\s\s?(\S)/$1/g; say $s'
2012-7-279:37:31
$ perl -E '$s="2 0 1 2 - 7 - 2 7  9 : 3 7 : 3 1"; $s =~ s/\s(\S)/$1/g; say $s'
2012-7-27 9:37:31
Re: Removing extra spaces
by Athanasius (Archbishop) on Jul 31, 2012 at 02:09 UTC
Update: rickoy, welcome to the Monastery!
The specification is a little unclear, but assuming you want to (a) remove all single spaces, and (b) squash all sequences of 2 or more spaces down to a single space:
#! perl
use strict;
use warnings;
my $string = ' 2 0 1 2 - 7 - 2 7  9 : 3 7 : 3 1 ';
# NB: 2 spaces here             ^^
# (a) Remove single spaces
1 while $string =~ s/(^|[^ ])[ ]([^ ]|$)/$1$2/g;
# (b) Squash multiple spaces down to one
$string =~ s/[ ]{2,}/ /g;
print "'", $string, "'\n";
Outputs:
'2012-7-27 9:37:31'
HTH,
Athanasius <°(((>< contra mundum
Re: Removing extra spaces
by GrandFather (Saint) on Aug 02, 2012 at 01:47 UTC
Where did your string come from? Strangeness of that sort looks like 16-bit Unicode strings or some such imported in some odd fashion into Perl, where the high 0 byte (for an ASCII character) has been replaced by a space. It might be better to get the conversion right, if possible, rather than try to fix it up later.
True laziness is hard work
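If the data really does start life as 16-bit text, decoding it once when it is read avoids the clean-up altogether. The following is only a sketch under that assumption: the thread never says where the string came from, the UTF-16LE encoding is a guess, and the raw bytes are simulated here with encode.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Simulated raw bytes (assumption: the real source delivers UTF-16LE).
my $raw = encode('UTF-16LE', '2012-7-27 9:37:31');

# Decoding at the boundary yields the clean string, so nothing needs
# to be stripped out afterwards.
my $text = decode('UTF-16LE', $raw);
print "$text\n";    # prints 2012-7-27 9:37:31
```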
Re: Removing extra spaces
by harangzsolt33 (Chaplain) on Aug 25, 2019 at 05:32 UTC
I know this question was asked more than 7 years ago, but I would like to post a sub I wrote that does exactly what you want:
sub CollapseWhitespace{@_ or return'';my$T=shift;defined$T
or return'';my$L=length($T);$L or return'';my$c;my$N=0;my$P
=0;my$U=1;for(my$i=0;$i<$L;$i++){$c=vec($T,$i,8);if($c<33){
$U=0;if($N++==1){vec($T,$P++,8)=32;}}else{$N=0;$U or vec($T
,$P,8)=$c;$P++;}}return$U?$T:substr($T,0,$P);}
^^ This looks a bit obfuscated, so here is a nicer expanded version:
##############################################################
#
# This function removes single instances of whitespace and
# converts multiple adjacent whitespace characters to a single
# space. In this function, "whitespace" is defined as a character
# whose ASCII value is less than 33. (This includes many special
# characters such as new line characters, nul, bel, etc.)
#
# Usage: STRING = CollapseWhitespace(STRING)
#
# Example:
# CollapseWhitespace("\n\t abc 123  xxx\n") --> " abc123 xxx"
#
sub CollapseWhitespace
{
@_ or return '';
my $T = shift;
defined $T or return '';
my $L = length($T);
$L or return '';
my $c;
my $N = 0; # consecutive whitespace counter
my $P = 0; # target pointer to overwrite original str $T
my $U = 1; # string length will be left unchanged
for (my $i = 0; $i < $L; $i++)
{
$c = vec($T, $i, 8);
if ($c < 33)
{
$U = 0;
if ($N++ == 1) { vec($T, $P++, 8) = 32; }
}
else
{
$N = 0;
$U or vec($T, $P, 8) = $c;
$P++;
}
}
return $U ? $T : substr($T, 0, $P);
}
c:\@Work\Perl\monks>perl -wMstrict -le
"use warnings;
use strict;
;;
use Test::More 'no_plan';
use Test::NoWarnings;
;;
use Data::Dump qw(pp);
;;
note qq{perl version: $]};
;;
my @TESTS = (
[ undef , qq{} ],
[ qq{} , qq{} ],
[ qq{ } , qq{} ],
[ qq{\n} , qq{} ],
[ qq{\n\t} , qq{ } ],
[ qq{\n\t\x00} , qq{ } ],
[ qq{\n\t \x00} , qq{ } ],
[ qq{\n\t abc 123  xxx\n} , qq{ abc123 xxx} ],
[ qq{\nabc 123\a\b\fxxx\n\t }, qq{abc123 xxx } ],
[ qq{abc 123\n\r xxx} , qq{abc123 xxx} ],
);
;;
note 'special case';
is CollapseWhitespace(), '', 'no arguments';
;;
note 'general cases';
VECTOR:
for my $ar_vector (@TESTS) {
if (not ref $ar_vector) {
note $ar_vector;
next VECTOR;
}
;;
my ($str, $expected) = @$ar_vector;
;;
is CollapseWhitespace($str), $expected,
pp($str) . ' -> ' . pp($expected)
;
}
;;
done_testing;
;;
exit;
;;
sub CollapseWhitespace {
my $s = shift;
return '' unless defined $s;
$s =~ s{ [\x00-\x20]+ }{ $+[0] - $-[0] == 1 ? '' : ' ' }xmsge;
return $s;
}
"
# perl version: 5.008009
# special case
ok 1 - no arguments
# general cases
ok 2 - undef -> ""
ok 3 - "" -> ""
ok 4 - " " -> ""
ok 5 - "\n" -> ""
ok 6 - "\n\t" -> " "
ok 7 - "\n\t\0" -> " "
ok 8 - "\n\t \0" -> " "
ok 9 - "\n\t abc 123  xxx\n" -> " abc123 xxx"
ok 10 - "\nabc 123\a\b\fxxx\n\t " -> "abc123 xxx "
ok 11 - "abc 123\n\r xxx" -> "abc123 xxx"
1..11
ok 12 - no warnings
1..12
If you have Perl version 5.14+, a slightly more concise variation is:
sub CollapseWhitespace {
my $s = shift;
return defined $s
? $s =~ s{ [\x00-\x20]+ }{ $+[0] - $-[0] == 1 ? '' : ' ' }xmsger
: ''
;
}
See the s/// /r modifier in perlop. I leave it to you to Benchmark whether the s///e version is actually faster than the for-loop version.
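For anyone who wants to run that comparison, here is a rough sketch using cmpthese from the core Benchmark module. The sub names, the compact rewrite of the for-loop version, and the sample data are all assumptions made for this illustration; timings will vary with the input and the Perl version, so no results are shown.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Compact restatement of the vec()-based for-loop approach.
sub collapse_loop {
    my ($t) = @_;
    return '' unless defined $t && length $t;
    my ($run, $out, $unchanged) = (0, 0, 1);
    for my $i (0 .. length($t) - 1) {
        my $c = vec($t, $i, 8);
        if ($c < 33) {                               # "whitespace" byte
            $unchanged = 0;
            vec($t, $out++, 8) = 32 if $run++ == 1;  # 2nd in a run: emit one space
        }
        else {
            $run = 0;
            vec($t, $out, 8) = $c unless $unchanged;
            $out++;
        }
    }
    return $unchanged ? $t : substr($t, 0, $out);
}

# The s///e approach (works on Perls older than 5.14 as well).
sub collapse_regex {
    my ($s) = @_;
    return '' unless defined $s;
    $s =~ s{ [\x00-\x20]+ }{ $+[0] - $-[0] == 1 ? '' : ' ' }xmsge;
    return $s;
}

# Hypothetical test data: many short groups separated by double spaces.
my $sample = join '  ', ('a b c') x 200;

# Make sure both versions agree before timing them.
die "results differ" unless collapse_loop($sample) eq collapse_regex($sample);

# Run each version for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    for_loop => sub { collapse_loop($sample) },
    regex    => sub { collapse_regex($sample) },
});
```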
Give a man a fish: <%-{-{-{-<