PerlMonks  

Getting data out of __DATA__ and __END__

by George_Sherston (Vicar)
on Sep 10, 2001 at 18:33 UTC ( #111465=perlquestion )

George_Sherston has asked for the wisdom of the Perl Monks concerning the following question:

If I have a string of data items after __END__ or __DATA__, separated by a standard delimiter, and I want to put them all into an array - what's the neatest / least resource-intensive way to do this? And in particular, why doesn't this work?:
    my @ary = split (" ",$_) while (<DATA>);
    print join(" / ", @ary);

    __DATA__
    boop noop boop crep

This prints "", whereas I would have expected it to print "boop / noop / boop / crep".
With humble thanks to all monks great and greater,

George Sherston

Replies are listed 'Best First'.
How about this?
by Rhose (Priest) on Sep 10, 2001 at 18:46 UTC
    I don't know if this is the "least resource-intensive" method, but it does seem to do what you requested. Please note that blank lines are not added to the array.

        use strict;

        #-- Change the end of record character to read the whole file.
        undef $/;

        #-- Create the array
        my @ary = split (/\s+/,<DATA>);
        print join(" / ", @ary);

        __DATA__
        boop noop boop crep
        a1 a2 a3 a4 a5
        b1 b2
        c1 c2 c3 c4
        d1 d2 d3

    Is this what you wanted?

      Oh bravo! Me and my big mouth. Yours is much better:

      my @ary = split (/\s+/,<DATA>);
      But isn't
      my @ary = split (" ",<DATA>);
      better still? :-)

      Yves
      --
      You are not ready to use symrefs unless you already know why they are bad. -- tadmc (CLPM)
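On the question of split " " versus split /\s+/: the two differ when the text starts with whitespace. A minimal sketch, assuming a made-up sample string ($text is hypothetical and stands in for the slurped <DATA>):

```perl
use strict;
use warnings;

# Hypothetical sample text with leading whitespace, standing in for slurped <DATA>.
my $text = "  boop noop\nboop crep\n";

# A /\s+/ pattern treats the leading whitespace as a separator,
# so the first field comes back as an empty string.
my @by_regex = split /\s+/, $text;

# The special " " pattern skips leading whitespace before splitting (awk-style).
my @by_space = split " ", $text;

print scalar(@by_regex), " fields with /\\s+/\n";
print scalar(@by_space), " fields with \" \"\n";
```

With data that never starts with whitespace the two forms give the same list, so the choice only matters for untidy input.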

Re: Getting data out of __DATA__ and __END__
by demerphq (Chancellor) on Sep 10, 2001 at 19:18 UTC
    Hey George.

    At first I thought this was a no-brainer and wondered why you posted it. Then I played with it a bit and realized it's a bit more subtle than it looks at first glance, and is a worthy post :-).

    As far as I understand, modifiers do not create a local scope, so in the code below the local() call under the first (modifier) if affects the whole sub, while the one inside the second if only affects the scope of that if block.

        sub test {
            local $\="NEWLINE" if 1;
            if (1) {
                local $\;
            }
            print "text:";
        }

    So at first glance this suggests that your code should work. But I believe that the my declaration happens over and over. This sounds confused (cause I am a bit confused), but consider what happens if you have multiple lines in your __DATA__ block: which of the @ary variables that were declared should be used? Now you see the angle I am getting at. Personally I suspect that this is a bug, insofar as perl's behaviour with regard to modifiers and my() doesn't seem quite right.

    Consider the following:

        # This works.
        my @ary;
        @ary = split (" ",$_) while (<DATA>);
        print join(" / ", @ary);

        # As does this
        my ($str,@ary);
        @ary = split (" ",$str) while ($str=<DATA>);
        print join(" / ", @ary);

        # This doesn't work. (as you pointed out)
        my @ary = split (" ",$_) while (<DATA>);
        print join(" / ", @ary);

        # Nor does this
        my (@ary);
        @ary = split (" ",$str) while (my $str=<DATA>);
        print join(" / ", @ary);

    So it looks like my doesn't work as part of a while modifier, which is like the foreach modifier, but unlike with foreach NO ERROR is raised. That to me makes it a bug. Incidentally the normal while() does not suffer from this problem.

    Anyway, if as you say you just want an array of all of the words in your __DATA__ segment, then this is probably the smallest way to do it.. (uh oh did I say that? Shoot, now the obfus crew are gunna turn it into a bunch of line noise.. :-)

        my @words = map { split " ",$_ } <DATA>;
        print "@words\n";

    Of course you could also mess with $/ like so:

        $/=" ";
        my @words=<DATA>;

    But personally I wouldn't, cause you'll just end up cleaning off the newlines and spaces anyways.

    Yves
    --
    You are not ready to use symrefs unless you already know why they are bad. -- tadmc (CLPM)
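To make the point about cleanup concrete, here is a rough sketch of what the $/ approach leaves behind compared with the map/split version. It assumes an in-memory filehandle ($fh over a made-up $text) in place of the real __DATA__ section, purely so the snippet stands alone:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the __DATA__ section: an in-memory filehandle.
my $text = "boop noop boop crep\n";
open my $fh, '<', \$text or die "cannot open in-memory handle: $!";

my @raw;
{
    local $/ = " ";   # records now end at each single space
    @raw = <$fh>;     # ("boop ", "noop ", "boop ", "crep\n")
    chomp @raw;       # chomp strips $/ (the space), not the final newline
}
close $fh;

# The last element still carries its newline, so more cleanup is needed.
s/\s+\z// for @raw;

# Splitting each line on whitespace avoids the cleanup altogether.
open $fh, '<', \$text or die "cannot open in-memory handle: $!";
my @words = map { split " ", $_ } <$fh>;
close $fh;

print join(" / ", @words), "\n";   # boop / noop / boop / crep
```

Both routes end up with the same four words here; the difference is only in how much tidying the $/ version needs afterwards.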

Re: Getting data out of __DATA__ and __END__
by jj808 (Hermit) on Sep 10, 2001 at 18:52 UTC
    By writing my @ary= as part of the loop, the array will be cleared out on each iteration, with the effect that if your script finishes with a blank line the array will be replaced with split " ","" on the last iteration. Try using push instead:
        while (<DATA>) {
            push @ary,split;
        }
        print join " / ",@ary;
        __DATA__
        one two three four
        boop noop boop crep

    This will work as expected.

    Update: You could also write it as:

        push @ary,split while (<DATA>);
        print join " / ",@ary;

    but personally I think the first version is more readable.

    JJ

      By writing my @ary= as part of the loop, the array will be cleared out on each iteration, with the effect that if your script finishes with a blank line the array will be replaced with split " ","" on the last iteration.

      Sorry but as you said earlier I beg to differ. :-)

      There is something deeper and funkier going on than what you have stated. I don't claim to know what it is, but it's definitely not the 'if the last line is blank' problem. You might want to take a look at my other reply in this thread for a couple of examples, but try this on for size:

          my @ary=split(" ",$_) while (<DATA>);
          print "@ary";
          __DATA__
          These are words

      Now this DOESN'T work, even though there is no blank line. (Use an editor that can show you newlines to make sure.) Now make one tiny change and see what happens:

          my @ary;
          @ary=split(" ",$_) while (<DATA>);
          print "@ary";
          __DATA__
          These are words

      Voila, it works! So now we KNOW that it isn't anything to do with those darn sneaky hidden blank lines.

      Quite frankly, until somebody at the wizard/god level tells me this isn't a bug and explains exactly what is going on, my money is on the cockroaches...

      Yves
      --
      You are not ready to use symrefs unless you already know why they are bad. -- tadmc (CLPM)

        Well, I'm definitely not at the wizard/god level, but I think I might know what's really going on. It's up to others to call it a bug or not.

        I threw this in a debugger to see what exactly happens where. With this code..

            my @ary=split(" ",$_) while (<DATA>);
            print "@ary";
            __DATA__
            These are words
            These are more words

        I set a breakpoint upon reaching the suspect line. When it hits the line, the my variable @ary is created. It's empty. This is done before the evaluation of the expression. Unfortunately the debugger I was using won't show the reiteration of that line. But needless to say, when I stepped off that line, @ary was still empty.

        Next, I used

            my @ary;
            @ary=split(" ",$_) while (<DATA>);
            print "@ary";
            __DATA__
            These are words
            These are more words

        I set the breakpoint on the my declaration. As before, when I hit that line, @ary was created. When I left the line, it contained the last line of text, each word in an array element.

        With that information, here's my opinion:

        Each time the line in question is accessed, @ary is redeclared (blank). This even happens when the line is executed just for <DATA> to return false. So the last iteration redeclares @ary, there's no more DATA and the program moves on.

        Is that a bug? I'll let someone else answer that :)

        Rich
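For what it's worth, the current perlsyn documentation describes the behaviour of a my that is governed by a statement modifier (as in my $x if ... or my @ary = ... while ...) as undefined, which fits the debugger observations above. A minimal sketch of a form that keeps the declaration out of the modified statement follows; collect_words() and the in-memory filehandle are hypothetical, chosen only so the snippet is self-contained, and \*DATA could be passed instead:

```perl
use strict;
use warnings;

# collect_words() is a hypothetical helper; pass \*DATA to read a __DATA__ section.
sub collect_words {
    my ($fh) = @_;
    my @words;                                # declared once, in its own statement
    push @words, split " ", $_ while <$fh>;   # the modifier only governs the push
    return @words;
}

# Made-up sample input read through an in-memory filehandle.
my $text = "These are words\nThese are more words\n";
open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
my @ary = collect_words($fh);
close $fh;

print "@ary\n";   # These are words These are more words
```

Because push appends rather than assigns, this version also keeps the words from every line, not just the last one.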

(tye)Re: Getting data out of __DATA__ and __END__
by tye (Sage) on Sep 10, 2001 at 20:39 UTC
    my @list= split ' ', do { local $/; <DATA> };
            - tye (but my friends call me "Tye")
Re: Getting data out of __DATA__ and __END__
by chromatic (Archbishop) on Sep 10, 2001 at 20:27 UTC
    How about a one-liner?

    print join(' / ', split(" ", $_) ), "\n" while <DATA>;

      Hmmm. /me scratches head. Sorry if this is getting OT, but earlier in this thread John M. Dlugosz pointed out that my original formulation, my @ary = split (" ",$_) while (<DATA>) doesn't work because the magical assignment to $_ only happens in a regular while loop, not in a statement modifier. And of course, he's quite correct. But why, then, does your version work? What's the difference? Do I need to change my aftershave?

      Update: Looks like that wasn't the problem with my original formulation. Well, the problem's solved, for which again thanks. But I'd still be very interested to know why my version didn't work.

      George Sherston
        I hope this doesn't come out wrong but...
        Where are people getting this "$_ only happens in a regular while loop" stuff? I would like to see a reference. I routinely use one-liners like print while (<>). I think this might be confusion with the fact that the $_ magic assignment only happens in a while loop when using the <> operator. Can anyone correct me on this? BTW I really liked demerphq's
        my @words = map { split " ",$_ } <DATA>;
        solution. Ugh, Zog say map gooood.

        I also subscribe to the unsubstantiated opinion that a statement like

        my @ary = split (" ", $_) while (<DATA>);
        is exactly like
        while (<DATA>) { my @ary = split (" ", $_); }
        which would kinda explain this behavior.

        again, I would love to be corrected on this and shown the errors of my ways.

        I'm a sinner! Ira punishes his wicked flesh, bad, bad flesh. After running some code samples I have no answers but only questions.

        So clearly the my declaration of array is *not* local to some implicit scope because use strict doesn't complain when you access it later in the file. And so I used the Dump procedure from Devel::Peek and the plot thickened...

            use strict;
            use warnings;
            use Devel::Peek;

            my @array = split (" ", $_) while (<DATA>);
            print Dump(\@array);

            __DATA__
            ksg gae agdg ekau eg
            Gke geo g ep ge

        Gave me...

            SV = RV(0x1a97584) at 0x1a7f0bc
              REFCNT = 1
              FLAGS = (TEMP,ROK)
              RV = 0x1a72ffc
              SV = PVAV(0x1a7dd3c) at 0x1a72ffc
                REFCNT = 2
                FLAGS = (PADBUSY,PADMY)
                IV = 0
                NV = 0
                ARRAY = 0x1a7012c
                FILL = -1
                MAX = 4
                ARYLEN = 0x0
                FLAGS = (REAL)

        Note that MAX is 4? When I do this on an empty, never accessed array MAX is -1. So I think that somewhere along the way @array got 4 elements assigned to it. But that's also weird because there should have been 5.

        Any hubris I had is now confusion. All is lost... all is lost...

        Ira.

Re: Getting data out of __DATA__ and __END__
by John M. Dlugosz (Monsignor) on Sep 10, 2001 at 18:48 UTC
    Because the magical assignment of <handle> to $_ only happens in a regular while loop, not in a statement modifier. Update: that's not correct.

    Write while (<DATA>) { @ary = split (" ",$_) }; and split will find something in $_ to operate on. Of course, you're still only going to print the result of the last line of data. If that's a blank line...

    —John

      I beg to differ.

      Try this code:

          print while (<DATA>);
          __DATA__
          one two three four
          five six seven eight

      $_ is assigned when using while as a statement modifier.

      But as you said, if $_ is a blank line...

      JJ
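The same point can be checked without printing anything. This is only a sketch, assuming an in-memory filehandle ($fh over a made-up $text) in place of <DATA>; it collects each $_ that the statement-modifier while assigns:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the __DATA__ section.
my $text = "one two three four\nfive six seven eight\n";
open my $fh, '<', \$text or die "cannot open in-memory handle: $!";

# As a statement modifier, "while <$fh>" still means "while (defined($_ = <$fh>))".
my @lines;
push @lines, $_ while <$fh>;
close $fh;

print scalar(@lines), " lines read\n";   # 2 lines read
```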

Re: Getting data out of __DATA__ and __END__
by George_Sherston (Vicar) on Sep 10, 2001 at 19:11 UTC
    Thanks all. Problem solved. + learnt a couple of other interesting things.

    George Sherston
Re: Getting data out of __DATA__ and __END__
by mirod (Canon) on Sep 10, 2001 at 22:24 UTC

    If all of the __DATA__ section is on one line, then the easiest seems to be:

        #!/bin/perl -w
        use strict;

        {
            local $/=' ';   # here's your fixed string!
            my @ary = <DATA>;
            print join(" / ", @ary);
        }

        __DATA__
        boop noop boop crep
