Regular Expression Doubt

by Anonymous Monk
on Apr 05, 2005 at 11:30 UTC
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I have a variable that contains something like:

$text=' 1 &plus; 2 <maths> &plus; dfdf</maths> ';

I have to replace &plus; with &thinsp;&plus;&thinsp;, except inside <maths> tags.

Output should be:

1&thinsp;&plus;&thinsp;2 <maths> &plus; dfdf</maths>

"Question" vs "Doubt"
by merlyn (Sage) on Apr 05, 2005 at 14:44 UTC
    I suspect you are not a native English speaker. I'm pretty sure you don't mean "doubt" there, which means "I understand, but I do not agree". You almost certainly want "question", which means "I do not understand".

    I see this mistake frequently. I suspect it is because we can sometimes use "question" in place of "doubt", as in "I question the integrity of that bridge". But the opposite is never true.

    Just putting this node here so I can refer to it in the future. Thanks for the opportunity.

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.

      No; the essential concept of doubt is uncertainty, not disagreement. See doubt @
        jdporter wrote on May 17, 2005
        No; the essential concept of doubt is uncertainty, not disagreement.

        I question whether this is accurate. I see that there are two components to the "idea" of doubt, even going only by the referenced definition(s):

        • uncertainty
        • belief

        The examples given at clearly indicate a strong tendency for doubt to be used with regards to matters of belief, judgement, speculation or opinion. What's a "matter of belief, judgement, speculation, or opinion"? I'll try to give two examples to illustrate.

        Hank: "Hey Ernest, did you hear that they've replaced that traffic signal at 5th and Main with a traffic-directing trained Indian elephant!?"

        Ernest: "I strongly question whether that's true, Hank."

        Ponder that versus this:

        Valerie: "Oh my, my! What's going to happen when every fact about everything in the world is available on the Internet?! We teachers will be out of work!"

        Yves: "I doubt that will happen in our lifetime."

        Does the difference emerge clearly into view? The traffic signal vs. elephant is rather easily verified by direct examination of the location or even by third-hand reportage by a reliable party (say, for the sake of argument, the evening news broadcast).

        The second case -- "everything anyone will want to know about anything will be published somewhere on the Internet (in the near future)" is far more conjectural, subjective, and vague. It's a fitting subject for doubt. Matters that pertain to developments in the future are inherently viewed by different people through a complex process that relates to many indeterminate factors like beliefs about the fate of humanity, the objective reality of a destined "progress" in human development over time, etc.

        Questions about a programming technology are matters of absence of understanding, of lack of possession of facts, of incomplete digestion of information, insufficient experience, etc. When one says "I doubt this or that" in matters of engineering or science, they generally mean that they don't accept as true a broad theoretical framework or hypothesis. It doesn't make sense to "doubt" that binary logic circuits work: it's demonstrated sufficiently (for any normal person) that they do. It's another thing to "doubt" the hypothesis that the present physical Universe began in a "Big Bang" -- one may have reasons to doubt that pertain to philosophical teachings about the nature of reality or to some competing scientific hypothesis; whatever the case, the matter is sufficiently large, far away and difficult that "doubt" is not an unreasonable thing.

        When non-native English speakers come to Perlmonks and state that they "have doubts about Perl" (and this continues to happen often), almost all native English speakers who don't have extensive experience with non-native "mistakes" in English are confused, and may interpret the meaning as something other than "I question". When we talk about "doubt" in terms of this Perl technology and the community built around it, "doubt" takes on a very negative connotation that implies something like "I don't think Perl works" or "I think Perl is poor technology" or "I think Perl may be headed for technological extinction". Saying these things in a forum like this, especially when not accompanied by a hell of a good supporting argument, is the classic, sordid, pedestrian trolling that earns those who do it the dislike, distrust, anger, and rejection of its intended targets. That's why it is important to clarify why using this term is "incorrect" in this context, even if native English speakers are also often not especially correct in saying doubt instead of question in other contexts.

            Soren A / somian / perlspinr / Intrepid

        Words can be slippery, so consider who speaks as well as what is said; know as much as you can about the total context of the speaker's participation in a forum over time, before deciding that you fully comprehend the intention behind those words. If in doubt, ask for clarification before you 'flame'.

      Nice catch. Although I feel it's worth pointing out that there are a few rare cases where you can substitute doubt for question: "I have doubts that need to be resolved" vs "I have questions that need to be resolved". See definition 6.


      I suspect you are not a native English speaker. I'm pretty sure you don't mean "doubt" there, which means "I understand, but I do not agree". You almost certainly want "question", which means "I do not understand".
      Hehe, you're right: I'm not a native English speaker. And thank you for pointing out the difference about these two terms.

      However, there's quite a difference between the literal translation of "question" and the homologous term to "doubt" in my native language (i.e. Italian), so I wanted to stress the nature of my "question" as being that of the latter term...

      In any case now that I know, I will stick to "question".

      I see this mistake frequently. I suspect it is because we can sometimes use "question" in place of "doubt", as in "I question the integrity of that bridge". But the opposite is never true.
      Curious, in Italian one can say the same of the opposite. Well, mostly.
      Just putting this node here so I can refer to it in the future. Thanks for the opportunity.
      Oh, it was just so easy for me! I bet I have more opportunities to give...

      I agree that choosing words that are mutually understandable is very important, but I think we might be overlooking a core issue: whose dialect of English is normative: UK, US, Indian, African, other.

      I grew up in the US and went to college there. I went to grad school in the UK and currently have family there. Growing up, about half my friends were Indian or Pakistani. My father did work for AID. My step-mother worked at World Bank and UN. My high school was a favorite for the children of foreign diplomats. I was born in Uganda. Though I left as an infant, Africans tend to appreciate that detail, so it has become the basis for friendships with people from various parts of Africa. I spent childhood summers in Southeast Asia (Korea, Thailand). And here in Israel, the "Anglo" community has people from places ranging from Australia to South Africa to the farthest reaches of Alaska. Getting used to all of these different versions (and accents) of English has been hard work for me, but I am loath to say that any of them are "wrong" - they all have large communities of mutual understanding.

      If, as a site, we want to publish a convention that Perl Monks uses the American (or UK, or whatever) dialect of English, that is fine by me. That's reasonable for the sake of mutual understandability.

      But telling anyone that they aren't speaking English correctly because their dialect isn't ours strikes me as a bit, well, um, arrogant? English grew and developed in the British Isles, so it has a claim to being the authoritative "source" for English. Yet I doubt many North Americans (myself included) would take kindly to a Brit telling them how to speak English: that "jelly" is the wrong word for the thing they eat with peanut butter; that "while" is a corruption of "whilst"; that they are being inconsistent because they say "in the hospital", but "in school".

      If Perl Monks has a standard dialect that is great, but perhaps we could be a bit more respectful to others if we called it a norm for our site rather than the "one right way"?

      Best, beth

        In an abstract sense virtually any dialect of English can be said to be correct. Yet virtually all of us accept that our colloquial dialects are sometimes wrong, and we accept that there is a "correct" way to say it even if we don't speak that way.

        Why? Because educated people are taught to speak "correctly". So speaking that way makes you sound more educated and intelligent. Which makes people respond better to you. (For instance they are more likely to give you a good job.)

        Admittedly there are actually multiple dialects associated with education. However as far as most of the world is concerned, only two really count. Those two are standard American English (as spoken on most American TV), and the Queen's English (as spoken on the BBC). Those have an undue impact on the speech at the top English speaking universities (who have taught more than their share of world leaders), news organizations and markets. Therefore those are the dialects of international affairs and business.

        Therefore it is reasonable to call something incorrect if it is incorrect according to both of those dialects. Because worldwide people will agree that it makes you sound uneducated. This is true no matter how common or well-established that speech pattern may be somewhere in the world.

        So in an international forum like this, using "doubt" where you mean "question" will cause people to think that you don't know English very well. Perhaps you live in India and everyone you know speaks that way. You still created a suboptimal impression. And this is not just true for this forum. This is going to be true in general.

        That said, I personally respect the fact that we have people here from all over the world, including people for whom English is a second or third language. If I believe effort was put out and I can understand what is meant, I will respond. We're here to talk about Perl, not English. However I still notice it. And I guarantee that others do as well.

        I am loathe to say

        You mean loath. ;-)

      It's a common Indianism, and synonymous with "question".

      "Randal, I have a doubt. What happened to your other 'l'?"

      Marvin Humphrey
      Rectangular Research
        indianism ???
Re: Regular Expression Doubt
by bart (Canon) on Apr 05, 2005 at 21:03 UTC
    My favourite idiom for tackling this kind of problem is "replace stuff that shouldn't change by itself": that's an easy way to simply skip past it. In general, it looks like this:
    s/($NOCHANGE)|$TOCHANGE/defined $1 ? $1 : $REPLACE /ge;

    If matching values for $1 cannot ever be false, this can also be written as

    s/($NOCHANGE)|$TOCHANGE/$1 || $REPLACE /ge;

    For your particular example, this becomes:

    $text =~ s<(<maths>.*?</maths>)|&plus;>{ $1 || '&thinsp;&plus;&thinsp;' }ge;
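    As a hedged illustration of the idiom above, here it is run against the original poster's sample string. The sub name space_plus is made up for this sketch and does not come from the thread. It uses the "defined" form, and it adds /s so that . can cross newlines inside a <maths> block, which is an assumption about what the real input looks like:

```perl
use strict;
use warnings;

# Hypothetical helper (name invented for this sketch): pad &plus; with
# &thinsp; everywhere except inside <maths>...</maths>.
sub space_plus {
    my ($text) = @_;
    # Alternative 1 captures a whole <maths> block and puts it back unchanged;
    # alternative 2 matches a bare &plus; and replaces it.
    $text =~ s{(<maths>.*?</maths>)|&plus;}
              {defined $1 ? $1 : '&thinsp;&plus;&thinsp;'}ges;
    return $text;
}

my $text = ' 1 &plus; 2 <maths> &plus; dfdf</maths> ';
print space_plus($text), "\n";
```

    Note that this keeps the original spaces around &plus;. Matching \s*&plus;\s* in the second alternative instead should also swallow them, which is closer to the output the question asks for.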
Re: Regular Expression Doubt
by Jaap (Curate) on Apr 05, 2005 at 11:42 UTC
    What have you tried?
    Here's a possible solution:
    • Replace each <maths>.+?</maths> block with a placeholder such as AAAAAAAA, saving the block's contents
    • Substitute the &plus;
    • Replace each AAAAAAAA placeholder with its saved maths block
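    The three steps above could be sketched roughly as follows. The sub name, the @saved array and the NUL-delimited numbered placeholder are all assumptions made for this illustration (a numbered placeholder is used instead of a fixed AAAAAAAA so that several blocks can be restored in order), and the sketch assumes the input never contains NUL bytes:

```perl
use strict;
use warnings;

# Hypothetical helper illustrating the three steps; the placeholder format
# ("\0MATHS<n>\0") is an assumption, not something from the thread.
sub space_plus_via_placeholders {
    my ($text) = @_;
    my @saved;

    # Step 1: swap each <maths> block for a numbered placeholder, saving the block.
    $text =~ s{(<maths>.*?</maths>)}{
        push @saved, $1;
        "\0MATHS" . $#saved . "\0";
    }ges;

    # Step 2: pad every remaining &plus;.
    $text =~ s/&plus;/&thinsp;&plus;&thinsp;/g;

    # Step 3: put the saved blocks back in place of their placeholders.
    $text =~ s/\0MATHS(\d+)\0/$saved[$1]/g;

    return $text;
}

print space_plus_via_placeholders(' 1 &plus; 2 <maths> &plus; dfdf</maths> '), "\n";
```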
Re: Regular Expression Doubt
by borisz (Canon) on Apr 05, 2005 at 11:47 UTC
    my $text = ' 1 &plus; 2 <maths> &plus; dfdf</maths> ';
    my @t;
    my $x = -1;
    local *_ = \$text;
    s!(<maths>.*?</maths>)!$x++;push@t, $1;"##XAHJASH $x##"!egs;
    s/&plus;/&thinsp;&plus;&thinsp;/g;
    s/##XAHJASH (\d+)##/$t[$1]/ge;
    print;
Re: Regular Expression Doubt
by Anonymous Monk on Apr 05, 2005 at 14:25 UTC
    $text =~ s {(                  # Capture in $1
                  <maths>          # Tag
                  [^<]*            # Not a <, zero or more times
                  (?:              # Group, non-capture.
                     <             # <,
                     (?!/maths>)   # not followed by /maths>
                     [^<]*         # Not a <, zero or more times
                  )*               # Repeat.
                  </maths>         # End tag.
                )                  # End capture $1.
                |                  # Or
                &plus;             # Target.
               }
               {$1 || "&thinsp;&plus;&thinsp;"}exg;
Re: Regular Expression Doubt
by sh1tn (Priest) on Apr 05, 2005 at 12:26 UTC
    $text = '1 &plus; 2 <maths> &plus; dfdf</maths>';
    for( split '\s', $text ){
        if(/<maths>/.../<\/maths>/){ next }
        else{
            if(/&plus;/){
                $text =~ s/$_/&thinsp;&plus;&thinsp;/
            }
        }
    }
    #STDOUT: 1 &thinsp;&plus;&thinsp; 2 <maths> &plus; dfdf</maths>

      You seem to be arbitrarily splitting on spaces. This will break unless there's a clear space either side of the +. A better option would be as follows.
      my $text = '1&plus;2<maths> &plus; dfdf</maths>';
      my $output = "";
      while ($text =~ m/((<maths>.*?<\/maths>)|([^<]*))/gs) {
          if ($2) {
              $output .= $2;
          }
          else {
              my $segment = $3;
              $segment =~ s/&plus;/&thinsp;&plus;&thinsp;/g;
              $output .= $segment;
          }
      }
      print $output . "\n";
      The basic premise is to scoop up anything in <maths> and pass it through untouched (push it onto the output), or otherwise scoop up as much text as can easily be determined not to be in <maths> (i.e. text with no angle brackets) and process that before putting it on the output.
        my $text = '1 &plus; 2 <br> <maths> &plus; dfdf</maths>';
        my $output = "";
        while ($text =~ m/((<maths>.*?<\/maths>)|([^<]*))/gs) {
            if ($2) {
                $output .= $2;
            }
            else {
                my $segment = $3;
                $segment =~ s/&plus;/&thinsp;&plus;&thinsp;/g;
                $output .= $segment;
            }
        }
        print $output . "\n";
        __END__
        1 &thinsp;&plus;&thinsp; 2 br> <maths> &plus; dfdf</maths>
        You lost a < here.
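        One possible way to keep that < is sketched below; it is an assumption about how the loop above might be repaired, not a fix posted in this thread. The pattern is anchored with \G and given a third alternative that consumes a lone <, so every character of the input falls into exactly one branch. The sub name is invented for the sketch:

```perl
use strict;
use warnings;

# Hypothetical rewrite of the loop above; the \G anchor and the third
# alternative (a lone <) are the additions.
sub space_plus_tokenized {
    my ($text) = @_;
    my $output = '';
    while ($text =~ m{\G(?:(<maths>.*?</maths>)|([^<]+)|(<))}gs) {
        if (defined $1) {
            $output .= $1;          # <maths> block: copy through untouched
        }
        elsif (defined $2) {
            # Plain text with no angle brackets: pad each &plus;.
            (my $segment = $2) =~ s/&plus;/&thinsp;&plus;&thinsp;/g;
            $output .= $segment;
        }
        else {
            $output .= $3;          # a < that does not open a <maths> block
        }
    }
    return $output;
}

print space_plus_tokenized('1 &plus; 2 <br> <maths> &plus; dfdf</maths>'), "\n";
```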
        # in rare cases with no spaces:
        ...
        $text =~ s/(?!\s)(<maths>)/ $1/g;
        ...

Node Type: perlquestion [id://444937]
Approved by Corion