Re: regex to remove space

by Don Coyote (Hermit)
on Dec 16, 2015 at 08:58 UTC ( [id://1150470]=note )


in reply to regex to remove space

Hello perlUser345

Update: the question is unclear. On re-reading, it is ambiguous which spaces the OP wants to keep: the spaces at the start of a line or sentence, or the spaces within lines that start with a space. I should have listened to the first responder.

All sentences start with a space. Did you mean each new-line?

In British English a sentence ends with a full-stop (a period in US English). This is important, as a sentence may continue through a new-line. In formal English the full-stop is followed by a double space and then a starting upper-case letter, though informally, and more usually in modern print, only a single space is used. Also, a sentence at the start of a paragraph invariably begins with an indent or tab.

Your example suggests you meant a non-printing character at the start of a new line. Giving examples, together with the code you have tried and why it is not working, helps greatly with clarification.

Firstly, you want to determine whether a sentence/new-line starts with a space or another non-printing character.

A non-printing character at the start of a new-line

    my $n = 1;    # choose your regex number: 1, 2 or 3

    # skip a new-line beginning with a literal space.
    my $rgx1 = qr/^ /;

    # skip a new-line beginning with any whitespace character:
    # \s matches space, tab (\t), carriage-return (\r) and new-line (\n)
    my $rgx2 = qr/^\s/;
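To see how the two patterns differ, here is a minimal sketch. The sample lines and the helper name matching_patterns are made up for illustration and are not taken from the original question.

```perl
use strict;
use warnings;

my $rgx1 = qr/^ /;     # literal space only
my $rgx2 = qr/^\s/;    # any whitespace character

# Hypothetical helper: returns a string such as "1 2" naming
# which of the two patterns match the given line.
sub matching_patterns {
    my ($line) = @_;
    my @hits;
    push @hits, 1 if $line =~ $rgx1;
    push @hits, 2 if $line =~ $rgx2;
    return "@hits";
}

# Hypothetical sample lines.
my %sample = (
    space => " leading space",
    tab   => "\tleading tab",
    plain => "no leading whitespace",
);

for my $name ( sort keys %sample ) {
    printf "%-5s matches: [%s]\n", $name, matching_patterns( $sample{$name} );
}
```

A line starting with a tab is caught only by the second pattern, so the choice matters if the input mixes tabs and spaces.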

A new sentence beginning with a non-printing character. This applies, for example, if you want to reduce double-spaced sentence starts to single spaces.

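    # $rgx3 is untested, but provided for example.
    # see note below
    # No \G anchor here: it would pin the first match to position 0,
    # where the lookbehind can never succeed.
    my $rgx3 = qr/
        (?<=\.\ )     # pattern for start of sentence
        (             # capture 1: the whole match
          (\ +)?      # capture 2: extra leading spaces, if any
          (.*?\.)     # capture 3: the sentence
        )             # end of capture 1
    /x;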
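The simpler case mentioned above, reducing double-spaced sentence starts to single spaces, can be sketched on its own. This is an assumption about what "start of sentence" means (a full-stop followed by two or more spaces), and the sub name single_space_sentences is hypothetical.

```perl
use strict;
use warnings;

# Collapse two or more spaces that follow a full-stop into one space.
# The lookbehind keeps the full-stop itself out of the match.
sub single_space_sentences {
    my ($text) = @_;
    ( my $out = $text ) =~ s/(?<=\.) {2,}/ /g;
    return $out;
}

print single_space_sentences("First sentence.  Second one.   Third."), "\n";
```

Spaces inside a sentence are untouched, because the pattern only fires directly after a full-stop.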

In the first construct we only need to discard lines starting with spaces, and apply the comma substitution to the remaining lines.

    die 'regex number choice fail' unless $n == 1 or $n == 2 or $n == 3;

    my $regex = $n == 1 ? $rgx1
              : $n == 2 ? $rgx2
              :           $rgx3;

    unless ( $n == 3 ) {
        while (<INPUT>) {
            next if /$regex/;   # skip lines the chosen regex matches
            tr* *,*;            # replace all spaces with a comma
            # tr* *,*s;         # alternative: replace each run of spaces with one comma
            print;
        }
    }
    else {
        # slurp in input for multi-line parse
        my $input = do { local $/ = undef; <INPUT> };

        # apply substitution operator with regex to string:
        # each sentence is substituted; if it starts with extra leading
        # spaces, its spaces become commas, else the sentence is kept as is.
        # tr///r (Perl 5.14+) returns the changed copy; capture 3 is the sentence.
        # untested. thx to Cristoforo for idea of tr in e
        $input =~ s{$rgx3}{ $1 =~ /^\s/ ? $3 =~ tr* *,*r : $3 }gme;
    }
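The per-line approach can be tried without a real file by reading from an in-memory filehandle. This is only a sketch under the assumption that "starts with a space" means a leading space or tab; the sub name and sample text are hypothetical.

```perl
use strict;
use warnings;

# Returns the lines that do not start with a space or tab,
# with every space in them turned into a comma.
sub commas_for_unindented_lines {
    my ($text) = @_;
    open my $in, '<', \$text or die "cannot open in-memory handle: $!";
    my @kept;
    while ( my $line = <$in> ) {
        next if $line =~ /^[ \t]/;    # discard lines starting with space or tab
        $line =~ tr/ /,/;             # every remaining space becomes a comma
        push @kept, $line;
    }
    close $in;
    return join '', @kept;
}

print commas_for_unindented_lines("a b c\n indented line\nd e\n");
```

The character class [ \t] is used here instead of \s so that empty lines, which consist of a lone new-line, are not discarded as well.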

Otherwise, we need to apply the substitution throughout. The trick here is to add the multi-line flag (/m) to the substitution operator, and then find every start of sentence with the global flag (/g). Thanks to Cristoforo for the idea of using an evaluated tr in the substitution (the /e flag).
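How /m, /g and an evaluated tr combine can be shown on a slurped string. The sketch below works per line rather than per sentence, which is an assumption made to keep it small, and it needs Perl 5.14 or later for tr///r. The sub name is hypothetical.

```perl
use 5.014;    # tr///r needs Perl 5.14 or later
use strict;
use warnings;

# In a multi-line string, turn spaces into commas on every line
# that starts with a non-whitespace character; leave other lines alone.
sub commas_in_unindented_lines {
    my ($text) = @_;
    # /m lets ^ and $ match at each line, /g repeats over all lines,
    # /e evaluates the replacement as code. tr///r returns the changed
    # copy instead of modifying the read-only capture.
    ( my $out = $text ) =~ s/^(\S.*)$/ $1 =~ tr{ }{,}r /gme;
    return $out;
}

print commas_in_unindented_lines("a b\n c d\ne f\n");
```

Without the r flag, tr would try to modify the capture in place and would return a count rather than the new text.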

However, now we need to globally substitute on the sentences which start with a double space. The biggest problem facing us here is to define correctly which cases constitute an end of sentence, as a full-stop can be used for more things than just ending a sentence...
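A small sketch of the problem, using a deliberately naive rule (a sentence ends at a full-stop followed by whitespace). The rule, the sub name and the sample text are all assumptions for illustration.

```perl
use strict;
use warnings;

# Naive rule: split wherever a full-stop is followed by whitespace.
sub naive_sentences {
    my ($text) = @_;
    return split /(?<=\.)\s+/, $text;
}

my @parts = naive_sentences("Dr. Smith paid 3.50 for it. He left.");
print scalar(@parts), " pieces:\n";
print "  [$_]\n" for @parts;
```

The decimal point in 3.50 survives because no whitespace follows it, but the abbreviation "Dr." is wrongly treated as a sentence of its own, so two sentences come back as three pieces.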

Note: see update, re: unclear question.

While the regex by Cristoforo comes close, the example output, as you can see, still substitutes commas for spaces whether the line begins with a space or not. Somehow we need to get the end-of-sentence marker. Taking a simplistic view of a sentence, starting with a space and ending with a full-stop, the regex here should be close to working correctly. Untested though. I will review later.

At this point, detailed scrutiny is required. I might start looking for a module to handle such matters, and/or delve into regex anchors such as \G and \K. For now I will presume you meant per-line substitution, and that the first implementation is what you are after. (And that is why you need to provide examples, to get a desirable answer!)

Untested code. Of importance is structure and approach.


my $Don_Coyote = select(undef,undef,undef,BigInt);
