PerlMonks  

global whitespace delete

by physi (Friar)
on Jul 31, 2001 at 07:16 UTC


physi has asked for the wisdom of the Perl Monks concerning the following question:

I currently have a problem with substituting whitespace.
On a single line I want to reduce every run of more than one whitespace character to exactly one.
 s/\s+/ /g does this, but there is a little trap :)
The substitution should not be made if the whitespace is inside a ' ' or " " block.
Can anybody help with that, or is this not possible in a single substitution?
Any help is welcome.
----------------------------------- --the good, the bad and the physi-- -----------------------------------

Replies are listed 'Best First'.
Re: Global Whitespace Delete
by tadman (Prior) on Jul 31, 2001 at 07:41 UTC
    You can do this, but the issues stem from how you parse the quotes, and how they are delimited, especially with respect to embedded quotes. In some cases you have formats which encode a quote like:
    " \" "
    " "" "
    Some DB scripting languages have a truly horrific way of doing it, but the basics are the same. In terms of a regex, you are looking for a quote, zero or more non-quote or delimited quote characters, and the terminating quote. You can easily change the delimited quote character bit to suit your fancy.

    Here is my rather unruly specimen:
    s/((?:"(?:\\"|[^"])*?")|(?:'(?:\\'|[^'])*?'))|(\s+)/$2?" ":$1/ge;
    Here is what it did to my test data:
    A language by "any other \"name\"", would it smell as sweet?
    A language by "any other \"name\"", would it smell as sweet?
    If you weren't concerned about delimited quotes, as HTML has no such thing, really, then you could use a simplified version of same:
    s/((?:"[^"]*?")|(?:'[^']*?'))|(\s+)/$2?" ":$1/ge;
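    The one-liners above are dense, so here is a commented sketch of the same idea written with /x. It is a variant, not the regex as posted: it uses \\. so that any backslash-escaped character (including an escaped backslash) is skipped inside quotes, and it tests defined $2 instead of the truth of $2. The helper name collapse_ws and the sample string are made up for illustration.

```perl
use strict;
use warnings;

# Collapse runs of whitespace to one space, leaving quoted strings alone.
# collapse_ws is a hypothetical helper name, not something from the thread.
sub collapse_ws {
    my ($line) = @_;
    $line =~ s{
        (                              # group 1: a complete quoted string
            " (?: \\. | [^"\\] )* "    #   double-quoted, backslash escapes next char
          | ' (?: \\. | [^'\\] )* '    #   single-quoted, same rule
        )
      | ( \s+ )                        # group 2: whitespace outside any quotes
    }{ defined $2 ? ' ' : $1 }gex;
    return $line;
}

# Sample input is invented; the quoted part keeps its double spaces.
print collapse_ws(q{A  language   by "any  other \"name\"",  would it   smell?}), "\n";
```

    Because the quoted-string alternative comes first, the regex engine consumes a whole quoted string before it ever gets a chance to see the whitespace inside it. An unterminated quote simply fails to match that alternative, so the whitespace after it is collapsed as usual.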
      That's really fantastic, many thanks. I have modified it to:
      s/((?:"[^"]*?")|(?:'[^']*?'))|(\s+)/$2?$2 ne"\n"?" ":"\n":$1/ge;
      so that the "\n" on the end of the line isn't changed to ' '.
      And now I will read more about ?: and try to understand it. :-)
      BTW, it wasn't homework, as BrentDax suspects; it's just a single line in a conversion script for comfiche jobs.
      Thanks
      ----------------------------------- --the good, the bad and the physi-- -----------------------------------
        I'm not sure where BrentDax got the homework idea. A little quick to judge, is all, I suppose.

        Your comparison is peculiar. You might want to specify a set instead of \s+, such as:
        s/((?:"[^"]*?")|(?:'[^']*?'))|([ \t]+)/$2?" ":$1/ge;
        The set of space and tab is probably more efficient than asking for more than you want, and then discarding the extras. \s matches tab, space, and newline (as well as carriage return and form feed). Since you have no use for newline, just don't ask for it.
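        A small sketch of the difference, using the [ \t]+ substitution from this reply unchanged. The sample line is invented; the point is only that the trailing newline survives with the explicit set and does not with a plain \s+.

```perl
use strict;
use warnings;

# Invented sample line: mixed spaces and tabs, one quoted block, trailing newline.
my $line = qq{a  \t b "c   d"   e\n};

# Quote-aware version with an explicit space/tab set (regex as posted above).
(my $with_set = $line) =~ s/((?:"[^"]*?")|(?:'[^']*?'))|([ \t]+)/$2?" ":$1/ge;

# Plain \s+ for comparison: not quote-aware, and it swallows the newline too.
(my $with_s = $line) =~ s/\s+/ /g;

print $with_set;
print $with_s, "\n";
```

        With the set, nothing special is needed to protect the newline, which is why the extra $2 ne "\n" test becomes unnecessary.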
Re: global whitespace delete
by BrentDax (Hermit) on Jul 31, 2001 at 07:46 UTC
    This smells like homework.

    Nevertheless, I'll point you in the right direction. Try recognizing the quoted strings *before* you reduce the whitespace. (If you can use modules, Text::Balanced will come in handy; if not, just do it yourself, ya lazy bum.) I'm not aware of any way to do this in one pass; however, it's nearly 1am here, so cleverness is in short supply. :^)

    =cut
    --Brent Dax
    There is no sig.
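    The "recognize the quoted strings first" idea can be sketched without any module by splitting on quoted strings and only touching the pieces in between. This is one possible do-it-yourself reading of the suggestion, not Text::Balanced, and it assumes quotes are not escaped inside quoted strings. The helper name collapse_outside_quotes is hypothetical.

```perl
use strict;
use warnings;

# collapse_outside_quotes is a hypothetical helper name.
sub collapse_outside_quotes {
    my ($line) = @_;

    # A capturing group in split keeps the separators in the result, so the
    # list alternates: even indexes are unquoted text, odd indexes are the
    # quoted strings themselves.
    my @parts = split /("[^"]*"|'[^']*')/, $line;

    # Collapse spaces and tabs only in the unquoted pieces.
    for my $i (grep { $_ % 2 == 0 } 0 .. $#parts) {
        $parts[$i] =~ s/[ \t]+/ /g;
    }
    return join '', @parts;
}

print collapse_outside_quotes(q{a   "b   c"   d}), "\n";
```

    If a quote is never closed, the split pattern does not match it, so the whole remainder is treated as unquoted text and collapsed.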

Re: global whitespace delete
by bwana147 (Pilgrim) on Jul 31, 2001 at 08:25 UTC

    How about Text::ParseWords?

    I haven't tested it but you may use this module to split your strings on whitespace, while still ignoring those that are quoted. Then join the words together:

    use Text::ParseWords;
    @words = quotewords('\s+', 1, $text);
    $text = join ' ' => @words;

    --bwana147

    Update: finally, I tested it and it works!
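    For anyone who wants to try this, a self-contained version with invented sample data might look like the following. The second argument to quotewords is the keep flag; a true value keeps the quote characters in the returned words.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

# Invented sample text with both kinds of quotes.
my $text = q{a   "b   c"  'd  e'   f};

# Split on whitespace outside quotes; keep = 1 retains the quote characters.
my @words = quotewords('\s+', 1, $text);

# Rejoin with exactly one space between words.
my $out = join ' ', @words;
print "$out\n";
```

    Two things to check against your own data: the delimiter \s+ also matches a newline, so a trailing newline would likely be dropped and need to be added back, and it is worth testing how lines with unbalanced quotes come out before relying on this.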
