PerlMonks  

A few random questions from Learning Perl 3

by sulfericacid (Deacon)
on Jan 06, 2003 at 04:33 UTC ( #224537=perlquestion )
sulfericacid has asked for the wisdom of the Perl Monks concerning the following question:

I decided to reread Learning Perl 3 for the 2nd time. I am still new to Perl, but having worked with it before rereading the book makes it a lot easier to understand. Most of the first 10 chapters are pretty self-explanatory now, yay!! But I came across a few misc. questions:
1) What is the purpose of a naked block? All it says is it's like while in some way but I don't see the real purpose of having one.
2) I read a post a few days ago about someone saying something like "shame on you for using regex like that". I just finally understand what regex is and that statement doesn't make much sense to me. Regex is like patterns and substitutions, are there specific times when you're not supposed to use them or there is something better to use?
3) next brings things to the bottom iteration of a loop, right? Does that mean you can't call next more than once in any given loop? If not, what happens if you need to?

Thanks for your help everyone!

"Age is nothing more than an inaccurate number bestowed upon us at birth as just another means for others to judge and classify us"

sulfericacid

Re: A few random questions from Learning Perl 3
by Anonymous Monk on Jan 06, 2003 at 04:44 UTC
    I read a post a few days ago about someone saying something like "shame on you for using regex like that". I just finally understand what regex is and that statement doesn't make much sense to me. Regex is like patterns and substitutions, are there specific times when you're not supposed to use them or there is something better to use?

    Yeah, you hear this crap sometimes, especially when you are trying to parse HTML. Basically, there are modules to handle the parsing, and these modules (e.g. HTML::Parse) tend to be very reliable and are an easy way to get the job done. HOWEVER, there is nothing wrong with using a regex. After all, Perl is optimized for regex. Many times using a regex will result in faster code than using a module. HOWEVER, most people do not fully understand regex, so they often overlook something. Basically, I love regex. Use them! Even for parsing HTML! Just make sure you know what you are doing! And have fun! But if your job is on the line, go ahead and use a module.

      It might be useful to read up a bit on the theory of formal languages. You'll see that there's a whole family of languages, each described by a certain mathematical formalism. Regular languages are an example, and as you can guess they're described by regular expressions. Unfortunately, HTML is not a regular language and hence can not be described by regular expressions since they're just not powerful enough.

      By way of example, consider <em>hello beautiful HTML <em>world</em></em>: easy to write a regular expression to get the inner "world", isn't it? Now consider <em>hello <em>beautiful<em>HTML world</em></em></em>, if you want to match something, again you can write a regular expression... as long as you know the maximum number of times the <em>...</em> tags will be embedded.

      HTML allows unbounded nesting of tags, so this means that you can't write a general regular expression that describes every possible nesting situation. Regular expressions are simply not powerful enough; you'll need at least context-free languages, hence a tool such as HTML::Parser or, for general cases, something like Parse::RecDescent.
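To make the nesting problem concrete, here is a minimal sketch. The sample string and the helper name max_em_depth are made up for illustration. A non-greedy pattern pairs the first <em> with the first </em>, which is the wrong partner once tags nest, while a plain depth counter (the thing a true regular expression cannot keep) follows the nesting correctly.

```perl
use strict;
use warnings;

my $html = '<em>hello <em>beautiful <em>HTML</em></em> world</em>';

# Naive attempt: non-greedy match from the first <em> to the first </em>.
# The capture is 'hello <em>beautiful <em>HTML', which is not balanced.
my ($naive) = $html =~ m{<em>(.*?)</em>};

# Hypothetical helper: walk the <em> tags and keep a depth counter.
sub max_em_depth {
    my ($text) = @_;
    my ($depth, $max) = (0, 0);
    while ($text =~ m{<(/?)em>}g) {
        $depth += $1 ? -1 : 1;          # "/" means a closing tag
        $max = $depth if $depth > $max;
    }
    return $max;
}

print "naive capture: $naive\n";
print "max depth: ", max_em_depth($html), "\n";   # 3
```

The counter only knows about <em> and ignores attributes, comments and malformed markup, which is exactly the kind of detail a parsing module takes care of.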

      Now you can argue:

      1. yeah right, but real world HTML is not that complicated, or
      2. you can fiddle with embedded code and cuts in regular expressions.
      As to the first argument: you don't always know this in advance if you don't control the HTML generation yourself, people are bound to do weird things, mostly not even on purpose.
      As to the second argument: true, but these are still experimental features (as the docs specify for 5.6.1) and they're not at all obvious to use, even up to the point that it is easier to use a more powerful tool than get the particular regular expression right. (Note from a formal language theory point of view: embedded code, cuts and the like increase Perl "regular expressions" beyond regular languages.)

      Given this story, your claim that one can deal with all HTML problems by using regular expressions shows some unwarranted optimism on your part. Obviously there's no reason to believe me, so I'll suggest a number of references on the subject:

      And who knows, maybe our own mstone will write a MOPT on the subject one of these days? (Hint, hint ;-)

      Just my 2 cents, -gjb-

      Update: Thanks TheHobbit for reiterating the points I actually mention in my text if you bother to read it carefully. (?{...}) and /e are called code embedding.

        Hi,
        I'll add some considerations which look needed. This will also be an answer to the 'Anonymous' below, who thinks he or she can hide and insult people without even bothering to register in the community...

        Strictly speaking, Perl regexes are really much more powerful than those described in the wonderful books you refer to. To understand regexes as they are used in Perl (but also in other languages & tools) I'd rather refer to

        A basic thing that one always sees written about regexes is that they cannot count, meaning that you must know the maximum number of times the <em>...</em> will be embedded.

        While this is true of 'standard' regexes, it is not true for Perl regexes. By using a careful combination of the /e modifier and the (?{}) programmatic pattern you can do, using regexes, everything a parser will do.

        IMHO, using a regex or another approach is a matter of taste, and a carefully crafted and optimized regex will be more efficient than a sloppily written recursive descent parser.
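As an illustration of Perl patterns reaching beyond regular languages, here is a minimal sketch. It assumes Perl 5.10 or later, which is newer than the 5.6.1 discussed in this thread, and uses recursive subpatterns instead of (?{...}); the sample string is made up, and the pattern only copes with bare <em> tags.

```perl
use strict;
use warnings;
use 5.010;

# (?<em> ... ) names the group and (?&em) recurses into it, so the
# pattern matches <em>...</em> pairs nested to any depth.
my $balanced = qr{
    (?<em>
        <em>
        (?: [^<]++ | (?&em) )*
        </em>
    )
}x;

my $html = 'x <em>hello <em>beautiful <em>HTML</em></em> world</em> y';
my ($outer) = $html =~ /($balanced)/;
say $outer;
```

Any other tag inside the <em> makes the match fail, so this shows what the regex engine can do rather than how to parse real HTML.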

        Just my 5 (euro) cents.

        Cheers


        Leo TheHobbit

        You're right, and you're wrong... I'm fairly certain that ordinary regular expressions aren't up to parsing HTML, even on a theoretical basis, but Perl regular expressions are a whole 'nother breed. Matching regular expressions with backreferences is NP-complete; it's been proven at least twice. (Well, three times, but one of them is buggy.) I suspect I'm missing something here... if anybody knows what (other than my mind), I'd love to hear it.


        Warning: Unless otherwise stated, code is untested. Do not use without understanding. Code is posted in the hopes it is useful, but without warranty. All copyrights are relinquished into the public domain unless otherwise stated. I am not an angel. I am capable of error, and err on a fairly regular basis. If I made a mistake, please let me know (such as by replying to this node).

      If you still think that regular expressions are appropriate for parsing HTML then I guarantee that you don't understand regular expressions as well as you think that you do, and you have written some really crappy parsing code.
Re: A few random questions from Learning Perl 3
by pfaut (Priest) on Jan 06, 2003 at 04:48 UTC

    By a naked block, I think you mean a block of code surrounded by braces that stands by itself and isn't part of an if, while, do, etc. You use these to limit scope. You can declare lexical variables inside that block. Once you reach the end of the block, those variables go away. You can exit the block with next.

    There are certain things to use regexes for and certain things that are easier in other ways. There are also some things that you should avoid within regexes, like .* (see Death to Dot Star! for the gory details).
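A minimal sketch of why .* bites, using a made-up sample string: the greedy version runs to the end of the string and backtracks to the last closing tag, while a negated character class cannot run past the next tag.

```perl
use strict;
use warnings;

my $text = '<b>one</b> and <b>two</b>';

# Greedy .* grabs as much as it can, so it stops at the LAST </b>.
my ($greedy) = $text =~ m{<b>(.*)</b>};

# [^<]* can never cross a "<", so it stops at the first </b>.
my ($tight) = $text =~ m{<b>([^<]*)</b>};

print "greedy: $greedy\n";   # one</b> and <b>two
print "tight:  $tight\n";    # one
```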

    next brings you to the bottom of the current iteration of the loop. It does not bring you to the bottom (last) iteration of the loop. It kind of jumps to the closing brace. Actually, it goes to the continue block of the loop, if there is one. You can have more than one next in your loop, but only one of them can execute on any given iteration.
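A minimal sketch of next together with a continue block (the variable names are made up): next skips the rest of the body, but the continue block still runs, which is what keeps this loop from spinning forever.

```perl
use strict;
use warnings;

my $i = 0;
my @kept;
while ($i < 6) {
    next if $i % 2;      # odd: skip the rest of the body
    push @kept, $i;
}
continue {
    $i++;                # still runs after next, so the loop terminates
}
print "@kept\n";         # 0 2 4
```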

    --- print map { my ($m)=1<<hex($_)&11?' ':''; $m.=substr('AHJPacehklnorstu',hex($_),1) } split //,'2fde0abe76c36c914586c';
Re: A few random questions from Learning Perl 3
by Paladin (Priest) on Jan 06, 2003 at 04:48 UTC
    Congrats on your continued progress to Sainthood. :)

    1) A naked block can be used for various things. One of the more common is to limit the scope of variables, i.e.:

        {
            my $tempvar = somefunc();
            # do something with $tempvar
        }
        # $tempvar no longer exists here.

    or it's used to limit the scope of local():

        {
            local(*INPUT, $/);
            open (INPUT, $file) || die "can't open $file: $!";
            $var = <INPUT>;
        }
        # $/ is set back to its original value here.

    2) Without knowing what you were doing with the regex, it is hard to say how it may have been wrong.

    3) next goes to the next iteration of the loop, not the last one. In effect it skips the rest of the current iteration (although it does execute the continue block if you have one). One common use is for skipping certain lines while reading a file.

        while (<FH>) {
            next if /^#/;   # Skip comments
            # now do something with $_
        }

    Update: Fix explanation of next
Re: A few random questions from Learning Perl 3
by blokhead (Monsignor) on Jan 06, 2003 at 04:52 UTC
    There are many uses for naked blocks, but one common way is for restricting access to lexical variables. Declare them inside the block and they are inaccessible (by name) outside the block. Their values can still exist in memory if there are references to them.

        {
            my $counter = 0;
            sub inc_counter { $counter++ };   # $counter is visible here
            sub get_counter { $counter };     # and here
        }

    The $counter variable has gone out of scope at the end of the block; therefore, you can never modify its value directly. However, since the two subroutines still refer to the variable, its value still exists. It can only be accessed through these subs, so no other code can (maliciously) say $counter = -100, for example. This type of construction is called a closure, BTW.
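A minimal usage sketch of the counter above, with explicit return statements added for clarity: the only way to touch the value from outside the block is through the two subs.

```perl
use strict;
use warnings;

{
    my $counter = 0;                        # private to this block
    sub inc_counter { return $counter++ }   # closes over $counter
    sub get_counter { return $counter }     # so does this one
}

inc_counter() for 1 .. 3;
print get_counter(), "\n";   # 3

# Under "use strict", mentioning $counter out here would be a
# compile-time error, because the name is no longer in scope.
```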

    Naked blocks are also convenient for setting temporary values to globals:

        $/ = "\n";
        my $line = <STDIN>;   # <> operator reads just one line
        my $whole_file;
        {
            local($/) = undef;
            $whole_file = <F>;   # <> operator reads in all of the data from filehandle F
        }
        $line = <STDIN>;   # reads just one line again, since $/ returns to "\n"

    As for the next keyword, sure it can be used twice or more inside a looping block:

        for (0 .. 100) {
            next unless $_ % 2;   # skip multiples of 2
            next unless $_ % 5;   # skip multiples of 5
            print "$_\n";
        }

    This prints all numbers that aren't divisible by 2 or 5. You can code next many times in your block, but of course only one of them can be followed per iteration. next does not skip you automatically to the 100th iteration of this loop, just the next one. Try it out!

    blokhead

      Just as a sidenote, this

          my $whole_file;
          {
              local($/) = undef;
              $whole_file = <F>;   # <> operator reads in all of the data from filehandle F
          }

      could also be written more concisely like this:
      my $whole_file = do { local $/; <F> };
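A minimal runnable sketch of the do-block idiom. It assumes Perl 5.8 or later so that an in-memory filehandle (here $fh, reading from the made-up string $data) can stand in for F.

```perl
use strict;
use warnings;

my $data = "line one\nline two\nline three\n";

# In-memory filehandle standing in for F.
open my $fh, '<', \$data or die "can't open in-memory handle: $!";

my $first = <$fh>;                    # $/ is "\n": reads one line
my $rest  = do { local $/; <$fh> };   # $/ is undef: reads everything left
my $state = defined $/ ? 'restored' : 'still undef';
close $fh;

print "first: $first";
print "rest:  $rest";
print "\$/ is $state\n";              # restored
```

The local takes effect only until the do block ends, so the record separator is back to its previous value on the very next statement.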

      -- Hofmator

Re: A few random questions from Learning Perl 3
by fredopalus (Friar) on Jan 06, 2003 at 04:53 UTC
Re: A few random questions from Learning Perl 3
by theorbtwo (Prior) on Jan 06, 2003 at 04:56 UTC

    1: Naked blocks. You can use naked blocks for quite a number of purposes. They can be used to scope (lexical/my) variables, which is what I normally use them for. You can also use the loop control operators (next, redo, last...) on them, and build your own loop that operates exactly how /you/ want it to.

    2: I can think of several reasons. One is using a m// where an eq would do -- regexes are /expensive/. Also, interpolating a variable into a regex will treat any regex metachars in the interpolated variable as such, which is normally not what you want.
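A minimal sketch of both points, with a made-up input string: eq does the exact comparison, an interpolated variable lets its metacharacters act as a pattern, and \Q...\E (the inline form of quotemeta) switches that off.

```perl
use strict;
use warnings;

my $input = 'a.c';

# Exact comparison: no regex needed.
my $exact = ('abc' eq $input) ? 1 : 0;                 # 0

# Interpolated as a pattern, the "." matches any character.
my $loose = ('abc' =~ m{^$input\z}) ? 1 : 0;           # 1

# \Q...\E escapes the metacharacters before matching.
my $quoted = ('abc' =~ m{^\Q$input\E\z}) ? 1 : 0;      # 0

print "exact=$exact loose=$loose quoted=$quoted\n";
```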

    3: next takes you directly to the next iteration of the loop. You can call it only once per iteration (obviously), but in as many places as you want... for example:

        while (1) {
            print "a";
            next;
            print "b";
            next;
            exit;
        }

    will keep printing "a" over and over, and never get to the second next. It's still perfectly valid code... just not very useful.



Re: A few random questions from Learning Perl 3
by jpfarmer (Pilgrim) on Jan 06, 2003 at 04:59 UTC
    3) next brings things to the bottom iteration of a loop, right? Does that mean you can't call next more than once in any given loop? If not, what happens if you need to?

    Actually, it bypasses the remaining statements in the loop and goes directly to the next iteration. You can put more than one next in a loop, but once the first one is executed, the rest will be irrelevant.

    However, you could do something like this:

        while($runLoop){
            next if ($x == $y);
            $x++;
            next if ($x == $y);
            print "x and y were never equal";
        }

    Then you could have more than one next and it would make sense.

Re: A few random questions from Learning Perl 3
by pg (Canon) on Jan 06, 2003 at 05:52 UTC
    For the completeness of the "next" topic, we have to introduce the two brothers of "next": "last" and "redo".

    1. next

      The "next" statement allows you to jump to the next iteration of the loop without executing the remaining statements of the current iteration.

          foreach (1..10) {
              next if ($_ == 9);
              print("$_ ");
          }

      Here's what it looks like:

      1 2 3 4 5 6 7 8 10

      As you can see, 9 is missing - this is because when the value of $_ hits 9, Perl uses the "next" statement to skip to the next iteration of the loop, and so 9 never gets printed.

    2. last

      The "last" statement is used to exit the loop completely.

          foreach (1..10) {
              last if ($_ == 9);
              print("$_ ");
          }

      And here's the output:

      1 2 3 4 5 6 7 8

    3. redo

      And finally, the redo statement lets you restart a particular iteration of the loop:

          foreach (1..10) {
              print("$_ ");
              if ($_ == 9 && $flag != 1) {
                  $flag=1;
                  redo;
              }
          }

      In this case, here's what you'll see:

      1 2 3 4 5 6 7 8 9 9 10

      Excellent answer, with one small adjustment to the last example to do with redo ..

          my $flag = 0;
          foreach (1..10) {
              print("$_ ");
              if ($_ == 9 && $flag != 1) {
                  $flag=1;
                  redo;
              }
          }

      I have initialized $flag to zero before the loop. This was, of course, understood :) by all readers, but for the newcomer, it might be helpful.

      --t. alex
      Life is short: get busy!
Re: A few random questions from Learning Perl 3
by davis (Vicar) on Jan 06, 2003 at 09:49 UTC
    Because nobody's said it already:

    1)Naked Blocks can be used to turn off warnings for the scope of the block - useful if you're doing some work that will encounter a lot of "uninitialized variable" warnings.

        # Some code up here, with warnings on.
        {
            no warnings;
            print $foo{$bar};   # Causes lots of "uninitialized variable" warnings, but in this case it's ok to ignore them.
        }

    Whether or not you should do this is up to you.
    Woo. 100th post. Verily, feh.
    davis
    Is this going out live?
    No, Homer, very few cartoons are broadcast live - it's a terrible strain on the animator's wrist

      1)Naked Blocks can be used to turn off warnings for the scope of the block - useful if you're doing some work that will encounter a lot of "uninitialized variable" warnings.

      Blocks are used to create scope. What you do with that scope is up to you.

      This kind of scope is 'lexical'. You can do a lot with it:

      • Use a lexical pragma (like strict, warnings or utf8)
      • Use a lexical variable (declared with my)
      • Declare a global variable lexically (with our)
      • Temporarily give a package global another value (using local)
      Each of these uses the lexical scope the block creates.

      You don't create a block to turn off warnings. You use no warnings to turn them off. The block only limits the effect of that statement. It's important to know that the block itself has nothing to do with warnings (well, you can get warnings regarding the block of course).
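A minimal sketch of that distinction. The hash, the key and the warning handler are made up for illustration, and only the 'uninitialized' category is switched off: the pragma does the silencing, while the block merely decides where the silence ends.

```perl
use strict;
use warnings;

# Collect warnings instead of printing them, so they can be counted.
my @caught;
$SIG{__WARN__} = sub { push @caught, $_[0] };

my %foo;
my $bar = 'missing';

{
    no warnings 'uninitialized';
    my $quiet = "value: $foo{$bar}";   # silent: pragma is in effect here
}
my $noisy = "value: $foo{$bar}";       # warns: the block has ended

print scalar(@caught), " warning(s) caught\n";   # 1
```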

      This lexical stuff goes for all code blocks, bare or belonging to something like if or while.

      Bare blocks (in this thread called 'naked blocks'; haven't seen them called that anywhere else) act like loops that run once. In the code blocks for while, until and for, you can use the loop control operators next, redo and last. These can also be used with bare blocks: redo goes to the beginning of the block, last goes to the end.

      You cannot use loop control operators with non-loop constructs like do, if and eval.


          if (...) {
              ...;
              last if ...;   # cannot use C<last> with C<if>
              ...;
          }

      You can put a bare block in if's block to create the loop you want:

          if (...) {
              {
                  ...;
                  last if ...;   # can use C<last> with a bare block
                  ...;
              }
          }

      Bare blocks are used like this everywhere, but often disguised: double curlies are used to make the code look nicer.

          if (...) {{
              ...;
              last if ...;   # same
              ...;
          }}

      So if you see doubled curly brackets, the extra block is probably only there to make breaking out of it easy.
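A minimal runnable sketch of the doubled-curlies idiom, with a made-up value and log array: last leaves the inner bare block, and execution carries on after the if.

```perl
use strict;
use warnings;

my @log;
my $value = 7;

if ($value > 0) {{
    push @log, 'checked sign';
    last if $value % 2;          # odd: leave the inner bare block
    push @log, 'halved';         # skipped for odd values
}}
push @log, 'done';

print join(', ', @log), "\n";    # checked sign, done
```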

      - Yes, I reinvent wheels.
      - Spam: Visit eurotraQ.
      

Re: A few random questions from Learning Perl 3
by OM_Zen (Scribe) on Jan 06, 2003 at 16:56 UTC
    Hi ,

    A naked block of code is any set of Perl statements enclosed within braces.

    next; actually goes to the end of the current iteration of the loop. last; is what you described: it takes you past the end of all the iterations, in effect exiting the loop itself.
Re: A few random questions from Learning Perl 3
by mowgli (Friar) on Jan 06, 2003 at 19:55 UTC
    With regard to your second question, no matter what anyone else might say, using regular expressions to solve a given problem is never a bad thing, and surely nothing one should be ashamed of, either; regular expressions are just another tool that perl gives you to help you solve your problems, and there is nothing wrong with using it, even when other solutions are available - There Is More Than One Way To Do It.

    --
    mowgli

      Geeee! at last I read all the way down to the bottom until I saw it. Indeed, TIMTOWTDI is the answer to the regex question. Shame on some of you other folk, who while accumulating perl wisdom seem to have forgotten the first lesson.

        There is more than one way to do it right, but there are lots of ways to do it wrong.

        Parsing HTML with regexes falls into the latter category.

        Using a regex can be one of the right ways to do it if you are parsing a pseudo-HTML format which is not allowed to have nesting beyond a very shallow depth in the portions you actually want to parse.

Node Type: perlquestion [id://224537]
Approved by diotalevi
Front-paged by FamousLongAgo