abhy has asked for the wisdom of the Perl Monks concerning the following question:

I have a code snippet which works as desired but it does not look elegant to me. Can someone please suggest a simpler and shorter way of doing it? I am not bothered about performance as this is a one time task on a relatively small data set.

Requirement: Original String => converted String Basically what I am trying to do is strings that have the format of alphabet followed by space (multiple occurrences of this pattern) followed by an optional string should be converted to the format where in the group of alphabets at the begining should be stringified. If the alphabet pattern group occurs at the end this should not happen. Here is my code snippet:
use strict; use warnings; my @str = ("j k l foobar", "foobar", "jkl foobar", "1 2 3", + "jk l foobar", "foobar j k l", "foobar j kl", " ", " ", "j + jk foobar", "j k jk foobar", "j k l"); my @sanitisedNames = (); for(@str) { $_ =~ s/\s+/ /g; if ($_ =~ /^\s$/) { next; } my $boundary = &sanitise($_); my $sanitisedName; if ($boundary == 0) { $sanitisedName = $_; } elsif ($boundary == length($_)) { $_ =~ s/\s+//g; $sanitisedName = $_; } else { my $firstPart = substr($_, 0, $boundary); $firstPart =~ s/\s+//g; my $secondPart = substr($_, $boundary); $sanitisedName = $firstPart.' '.$secondPart; } push(@sanitisedNames, $sanitisedName); } print $_, "\n" for (@sanitisedNames); sub sanitise { my $str = shift; my @chars = split('', $str); my $count = 0; my $len = length($str); while ($count < $len) { if ($chars[$count++] ne ' ' && $count < $len && $chars[$count+ ++] eq ' ') { } else { if ($count == $len) { return $len; } if ($count > 3) { $count = $count - 3; return $count; } else { return 0; } } } }
Also can this be done in a single line with a regex? I could not come up with one. So I am coming to the abode of the monks for wisdom :).

Replies are listed 'Best First'.
Re: String manipulation
by GrandFather (Saint) on Jun 24, 2009 at 10:59 UTC

    Something like:

    use strict; use warnings; my @strs = ( "j k l foobar", "foobar", "jkl foobar", "1 2 3", "jk l foobar", "foobar j k l", "foobar j kl", " ", " ", "j jk foobar", "j k jk foobar", "j k l" ); my @sanitisedNames = (); for my $str (@strs) { my $oStr = $str; $str =~ s/\G([a-z])\s+(?!\w{2})/$1/gi; $str =~ s/\s+/ /g; push @sanitisedNames, [$oStr, $str]; } printf "%-25s >%s<\n", @$_ for (@sanitisedNames);

    Prints:

    j k l foobar >jkl foobar< foobar >foobar< jkl foobar >jkl foobar< 1 2 3 >1 2 3< jk l foobar >jk l foobar< foobar j k l >foobar j k l< foobar j kl >foobar j kl< > < > < j jk foobar >j jk foobar< j k jk foobar >jk jk foobar< j k l >jkl<

    True laziness is hard work
Re: String manipulation
by citromatik (Curate) on Jun 24, 2009 at 10:08 UTC

    Something like this may do the job:

    use strict; use warnings; use feature qw/ :5.10 /; my @str = ("j k l foobar", "foobar", "jkl foobar", "1 2 3", + "jk l foobar", "foobar j k l", "foobar j kl", " ", " ", "j + jk foobar", "j k jk foobar", "j k l"); my $rx = qr/^((?:\w(?:\s+|\z))*)(.*)/; for my $str (@str){ my ($i,$s) = $str =~ /$rx/; $i =~ s/\s+//g; say $i ? join " ",$i,$s : $s; }

    Outputs:

    jkl foobar foobar jkl foobar 123 jk l foobar foobar j k l foobar j kl j jk foobar jk jk foobar jkl

    citromatik

      If I am not asking for too much, can you please expand the regex with comments?

        Sure:

        my $rx = qr/ ^ # At the beginning of the string ( # Capture in $1 (?: # Paren for grouping (don't capture in $2) \w # match a "wordy" character (?: # paren for grouping, \s+|\z # match at least one space or the end of the + string ) )* # zero or more times ) # end of capture $1 (.*) # and match what comes after storing it in $ +2 /x # allow these comments ;

        The rationale is to match at the beginning all the one-character strings, storing them in $1. The rest of the input will be catched in $2

        citromatik

        abhy,
        If you don't know about YAPE::Regex::Explain, it is worth a look. I don't think it has been updated to take into consideration the new regex doodads in Perl 5.10 but it should be helpful regardless.

        Cheers - L~R