PerlMonks  

golf anyone? (taking first field)

by John M. Dlugosz (Monsignor)
on Jan 07, 2003 at 06:34 UTC (#224874=perlquestion)

John M. Dlugosz has asked for the wisdom of the Perl Monks concerning the following question:

I read a file into an array, and want to take just the first thing on each line and return a list of those. I was overwhelmed with TIMTOWTDI. I typed and backspaced 3 times already. Why? Because there should be a nicer way to write it. But it turned into a distraction. If there were only one obvious way to do it, I would have just done it and been done with it. But Perl can be "fun" besides...

So, I thought I'd throw it out as a challenge.

Given: variable @list contains a bunch of lines of the form

xxxxx : yyyyy blah blah
Where the :stuff is optional, and the line may be blank, in which case it should be ignored. Return a list containing just the xxxxx parts. Specifically, remove blank lines, truncate each line at the first : (if present), and get rid of trailing whitespace after the xxxxx part (including the possible "\n").
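For reference, an un-golfed reading of this spec might look like the sketch below. The sample lines in @list and the name @fields are made up for illustration, and leading whitespace is left alone because the spec does not mention it.

```perl
use strict;
use warnings;

# Hypothetical sample input, shaped like the lines described above.
my @list = (
    "alpha : one two\n",
    "\n",
    "beta\n",
    "two words : rest\n",
    "gamma   \n",
);

my @fields;
for my $line (@list) {
    my $field = $line;
    $field =~ s/:.*//s;      # truncate at the first colon, if there is one
    $field =~ s/\s+\z//;     # drop trailing whitespace, including "\n"
    next if $field eq '';    # skip lines that end up blank
    push @fields, $field;
}

print "$_\n" for @fields;
```

The golfed answers below compress these three steps (truncate, trim, skip) into a single map expression.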

Any takers?

—John

Replies are listed 'Best First'.
Re: golf anyone? (taking first field)
by Arien (Pilgrim) on Jan 07, 2003 at 07:56 UTC

    For 26 strokes:

    #        1         2
    #2345678901234567890123456
    map/(.+?)\s*(?>:|$)/,@list

    Update: Also eliminating lines that contain just a colon and optional whitespace. 28 strokes:

    #        1         2
    #234567890123456789012345678
    map/(.+?)\b\s*(?>:|$)/,@list

    — Arien

      Ah, so the \b makes the match fail, and thus nothing gets emitted to the result list in the map? Is that Kosher? That is, the pattern in list context returns all the captured things, but normally $1 etc. are improper to use after a match fails; they sometimes have garbage even. But using the result as list context doesn't have the same problem, because it "knows" it iterated zero times. Very interesting!
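The list-context behaviour described here can be checked with a small sketch. The sample lines and the name @got are assumptions for illustration; the regex is Arien's 28-stroke version.

```perl
use strict;
use warnings;

# Hypothetical sample lines; the third holds only a colon and whitespace.
my @list = ("abc : def\n", "xyz\n", " : \n", "\n");

# In list context a successful match returns its captures, and a failed
# match returns the empty list, so map adds nothing for that line.
my @got = map { /(.+?)\b\s*(?>:|$)/ } @list;

print scalar(@got), " fields: @got\n";
```

Because the result comes from the match's return value and not from reading $1 afterwards, a stale $1 left over from an earlier successful match never gets a chance to leak into the output.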

      This is why golf is so interesting. Many thanks for the masterful demonstration.

      —John

Re: golf anyone? (taking first field)
by jmcnamara (Monsignor) on Jan 07, 2003 at 09:16 UTC

    The following should do what you require:

    @a = map{(split)[0]}@a;

    As an un-golfed one-liner I would write it like this:

    perl -lane 'print $F[0] if @F' file

    Update: I see that you added a specific requirement for lines that begin with ":" in one of your follow ups. In Golf that's not fair since it means that we would have to have read all of the other posts before posting a reply.

    Anyone interested in posting a Golf challenge should read Tips on Writing a Golf Challenge.

    The following handles the special case (the ^ is probably optional):

    @a = grep!/^:/,map+(split)[0],@a;

    --
    John.

Re: golf anyone? (taking first field)
by Aristotle (Chancellor) on Jan 07, 2003 at 10:44 UTC

    Update: 'tis broken. See sauoq's reply.

    Do you really just want the first word, even if there are several before the colon? In that case, 21.

    #        1         2
    #23456789012345678901
    map/^([^:\s]+)/,@list

    If you want to allow optional whitespace at the start of a line, 24.

    #        1         2
    #23456789012345678901234
    map/^\s*([^:\s]+)/,@list

    Makeshifts last the longest.

Re: golf anyone? (taking first field)
by sauoq (Abbot) on Jan 08, 2003 at 02:09 UTC

    28 but it works.

    #         1         2
    #1234567890123456789012345678
     map/(^\s*[^:]*[^:\s])/,@list

    Specifically, remove blank lines, and truncate each line at the first : (if present) and get rid of trailing whitespace after the xxxxx part (including the possible "\n").

    Of all the solutions that were submitted, only this one and blokhead's (36) took into account that the 'xxxxxx' portion might itself contain whitespace. (Well, to be fair, busunsl's did too, but it didn't properly remove newlines. Update: So did Arien's as he kindly pointed out to me but his breaks by being too liberal in what it accepts/returns.)

    This is what I used to test:

    This is the output:

    -sauoq
    "My two cents aren't worth a dime.";
    
      Very impressive!

      It's amazing how people make their own additional constraints and then blame someone for not writing a clear specification.

      Your solution

      map/(^\s*[^:]*[^:\s])/,@list
      is beautifully clear and simple, not just short because of fancy tricks. Read all the leading whitespace, read up to the first : or to the end, finally back off any whitespace. (Actually, it will read over consecutive :'s, not the first. But it's the first occurrence.)

      In general, the x*y regex idiom, where y is x with the characters of w excluded, will take internal w but not trailing w. If the greedy x* happens to end in w, backtracking literally "backs off" until it no longer does.
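A small sketch of that idiom, using sauoq's expression. Here x is [^:], w is whitespace, and y is [^:\s]; the sample lines and the name @got are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical inputs: internal spaces should survive, trailing ones should not.
my @list = ("foo bar   : rest\n", "baz  \n", "   \n", ": only a tail\n");

# x = [^:] (anything but a colon), y = [^:\s] (x with whitespace excluded).
# x* first grabs everything up to the colon, then backtracks until the
# character it leaves for y is not whitespace.
my @got = map { /(^\s*[^:]*[^:\s])/ } @list;

print "[$_]\n" for @got;
```

The whitespace-only line and the line that starts with a colon never reach a character that y accepts, so the match fails and they drop out without any special-case code.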

      Furthermore, it works with only the most common regex features, not using rare backslash chars or extensions. It can be grokked by anyone with a little regex experience.

      The fact that the blank lines and blank-after-truncating lines are purged without special case logic means that the algorithm to "find the interesting part" matches neatly what the defining characteristic of that interesting part is. The desired behavior of blanks and empty lines is a natural consequence of that fundamental idea, not arbitrary rules designed to make it harder (like a putt-putt course's windmill?).

      Well done. I think you hit the sweet spot on that one.

      —John

Re: golf anyone? (taking first field)
by blokhead (Monsignor) on Jan 07, 2003 at 06:51 UTC
    Here's my shot (37 chars):
    map{/([^:]*?)(\s*\n|\s*:)/&&$1}@list;
    # silly me ... distributive property! update: 34 chars:
    map{/([^:]*?)\s*(\n|:)/&&$1}@list;
    I tested it with the following data and it seemed to work correctly:
    __DATA__
    123 : oain:b:okfbd
    456
    foo bar : df a dsaf asdf
    111 :
    It doesn't work when the input line contains only whitespace, but it does work for blank (other than the \n) lines. It returns empty string when the colon is the first non-whitespace. I don't know if this is correct behavior or not, though! This is my first golf attempt, so I may have missed some regex shortening tricks.

    Update: Here's a slightly different approach that's the same 34 chars, although it behaves differently. This one does ignore lines of complete whitespace. It still gives empty string when a colon is the first non-whitespace in the line.

    map{(split/\s*(:|\n|$)/)[0]}@list;

    blokhead

      If the colon begins the line, it should be the same as an empty line. That is, remove the '' from the output list.
        In that case,
        map{/([^:]*?)\s*(:|$)/;$1||()}@list;
        Back to 36 chars. ;)

        blokhead

Re: golf anyone? (taking first field)
by busunsl (Vicar) on Jan 07, 2003 at 07:33 UTC
    36 chars and handles blank lines, empty lines and lines starting with a colon:
    @list = ( "123 : oain:b:okfbd", "", "456", "foo bar : df a dsaf asdf",
              "111 :", ": adf", " " );
    $, = "\n";
    print map{/(.*?)(\s*:|\s*$)/;$1||()}@list;
    Chokes if the value in front of the colon is '0'.
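The '0' problem can be reproduced with the sketch below. The sample lines and the names @lossy and @kept are assumptions for illustration, and the length-based variant is just one possible workaround, not part of the golfed entry.

```perl
use strict;
use warnings;

# Hypothetical input that includes a bare "0" field.
my @list = ("0 : zero\n", "7 : seven\n", "\n");

# The 36-char expression: "0" is false in Perl, so $1||() discards it.
my @lossy = map { /(.*?)(\s*:|\s*$)/; $1 || () } @list;

# A longer alternative (an assumption, not from the thread): test the
# length of the capture instead of its truth value.
my @kept = map { /(.*?)(\s*:|\s*$)/; length($1) ? $1 : () } @list;

print "lossy: @lossy\n";
print "kept:  @kept\n";
```

The length test costs extra strokes, which is presumably why the golfed version accepts the limitation.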

    Update: 33 chars (destructive):

    map{s/\s*:.*|\s*$//;$_||()}@list;
    Update: 32 chars (destructive):
    map{s/\s*(:.*|$)//;$_||()}@list;
Re: golf anyone? (taking first field)
by CountZero (Bishop) on Jan 07, 2003 at 07:23 UTC

    Here's my shot at it:

    use strict;
    my @list=('xxxxxa : yyyyy blah blah','xxxxxb : yyyyy blah blah',' ',
              'xxxxxc : yyyyy blah blah',': this is strange','xxxxxd : yyyyy blah blah',
              "\n",'xxxxxe : yyyyy blah blah');
    my @result = map {m/(.*?)\s+:/} @list;
    print join("\n",@result);

    It throws out blank lines, lines with only a trailing newline and lines which start with a colon.

    The operative part counts 24 characters.

    CountZero

    Update: Added ? to make the match non-greedy (see blokhead's remarks)

    Update 2: The operative part could be written as
    map/(.*?)\s+:/,@list
    and is then only 20 characters long.

    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

      On the input " : space before colon\n" it returns the spaces (minus one), and on "xxxxx: no space between data and colon" it returns an empty list.

      blokhead

        By adding a ? to the regex code
        my @result = map {m/(.*?)\s+:/} @list;
        I think it now works:

        • the delimiter is space before colon, hence "xxxxx: no space between data and colon" returns nothing as there is no valid delimiter
        • and on " : space before colon\n" it now returns an empty string, as there is a valid delimiter and all spaces are stripped from the result, so nothing remains. Please note that this is different from having an empty line to start with, which is to be dropped.
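The two cases above can be checked with a short sketch. The sample lines are modelled on the inputs quoted in this sub-thread, and the bracketed printing is only there to make an empty string visible.

```perl
use strict;
use warnings;

# Sample lines modelled on the inputs discussed above (assumed spacing).
my @list = (
    "xxxxx : yyyyy\n",
    "xxxxx: no space between data and colon\n",
    "   : space before colon\n",
);

# The non-greedy version: the delimiter is whitespace followed by a colon.
my @result = map { m/(.*?)\s+:/ } @list;

print scalar(@result), " results\n";
print "[$_]\n" for @result;
```

The second line has no whitespace-then-colon delimiter, so the match fails and contributes nothing. The third line does match, with an empty capture, so it contributes an empty string to the result.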

        CountZero

        "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law
