PerlMonks  

regexp's

by djw (Vicar)
on Oct 03, 2000 at 05:57 UTC (#35037=perlquestion)

djw has asked for the wisdom of the Perl Monks concerning the following question:

I have lines in an array that look like this:

1 time 02:11:05 djw
5 time 04:20:03 bert
2 time 00:01:39 chris

I would love to be able to get the initial # on the line
as the value in a hash, and the name at the end as the
key. What is the best way to handle this?

I'm stuck argh.

Thanks,
djw

Replies are listed 'Best First'.
(jcwren) RE: regexp's
by jcwren (Prior) on Oct 03, 2000 at 06:24 UTC
    Or, in a few less lines...
    #!/usr/local/bin/perl -w
    use strict;

    {
        my %hash = ();
        while (<DATA>) {
            my @items = split;
            $hash{pop @items} = shift @items;
        }
        print "key=$_, val=$hash{$_}\n" foreach sort keys (%hash);
    }

    __DATA__
    1 time 02:11:05 djw
    5 time 04:20:03 bert
    2 time 00:01:39 chris

    [jcw@linux fs]$ perl q.pl
    key=bert, val=5
    key=chris, val=2
    key=djw, val=1
    [jcw@linux fs]$


    --Chris

    e-mail jcwren

      A benchmark vs. Ovid's regex. =) Tho you all knew split beats regex, here are the numbers.

      Benchmark: timing 50000 iterations of Ovid, jcwren...
            Ovid:  4 wallclock secs ( 3.32 usr +  0.00 sys =  3.32 CPU) @ 15060.24/s (n=50000)
          jcwren:  2 wallclock secs ( 2.63 usr +  0.00 sys =  2.63 CPU) @ 19011.41/s (n=50000)
                Rate   Ovid jcwren
      Ovid   15060/s     --   -21%
      jcwren 19011/s    26%     --

      --
      $you = new YOU;
      honk() if $you->love(perl)
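The benchmark script itself is not shown in the post above. A minimal sketch of how such a comparison might be set up with the core Benchmark module follows. The sub names `via_regex` and `via_split` and the in-memory `@lines` array are assumptions made for illustration, not the poster's actual code, and timings will differ from machine to machine.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Sample records from the question, held in memory so no file I/O is timed
my @lines = (
    "1 time 02:11:05 djw",
    "5 time 04:20:03 bert",
    "2 time 00:01:39 chris",
);

# Regex approach, in the style of Ovid's reply (hypothetical wrapper)
sub via_regex {
    my %hash;
    for (@lines) {
        if (/^(\d+)\s+[^\s]+\s+[^\s]+\s+([a-zA-Z]+)$/) {
            $hash{$2} = $1;
        }
    }
    return \%hash;
}

# split approach, in the style of jcwren's reply (hypothetical wrapper)
sub via_split {
    my %hash;
    for (@lines) {
        my @items = split;
        $hash{pop @items} = shift @items;
    }
    return \%hash;
}

# A negative count asks Benchmark to run each sub for at least that many CPU seconds
cmpthese(-1, { Ovid => \&via_regex, jcwren => \&via_split });
```

Passing a positive count such as 50000 instead of -1 runs a fixed number of iterations, which is what the output quoted above reports.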

      My GOD, that is elegant. Too bad I can only ++ you one time on that one. This is one of the most elegant things I have EVER seen, save for something merlyn did in Effective Perl Programming:
      ($_ & ~$_) eq 0
      to determine if a scalar is a number (I may be slightly off on the actual expression).
(Ovid) Re: regexp's
by Ovid (Cardinal) on Oct 03, 2000 at 06:20 UTC
    Assuming that the data is in a file called "data.txt", I might try something like the following (untested):
    #!/usr/bin/perl -w
    use strict;

    my %somehash;
    my $file = 'data.txt';
    open FILE, "<$file" or die "Can't open $file for reading: $!";
    while (<FILE>) {
        # The name (last field) is the key; the leading number is the value
        if (/^(\d+)\s+[^\s]+\s+[^\s]+\s+([a-zA-Z]+)$/) {
            $somehash{$2} = $1;
        }
    }
    close FILE;
    The regex breaks out as follows:
    /^              # Anchor to beginning of string
      (             # Capture to $1
        \d+         #   one or more digits
      )             #
      \s+           # One or more whitespace
      [^\s]+        # One or more non-whitespace
      \s+           # One or more whitespace
      [^\s]+        # One or more non-whitespace
      \s+           # One or more whitespace
      (             # Capture to $2
        [a-zA-Z]+   #   One or more letters
      )             #
    $/x;            # Anchor to end of string
    For more information about why I did not use a simpler regex like /^(\d+).*\b(\w+)$/, you may want to read Death to Dot Star!.

    Simpler, however, would be to use a split (also untested):

    while (<FILE>) {
        chomp;
        my ($value, $key) = (split /\s/, $_)[0,3];
        $somehash{$key} = $value;
    }

    Cheers,
    Ovid

    Update: I would just like to say that I have no frickin' idea why I wrote that regex. Yes, it works. So what? I saw regex in the title and got carried away.

    Use the split;

    UpdateII: Yup. I have the key value backwards. It's fixed now. Sigh.

    Join the Perlmonks Setiathome Group or just go to the link and check out our stats.

RE: regexp's
by vladdrak (Monk) on Oct 03, 2000 at 06:26 UTC
    How about:
    use strict;

    my %hash=();
    while (<>) {
        my ($num,$name)=(split/\s/,$_)[0,3];
        $hash{$name}=$num;
    }
    foreach (keys %hash) {
        print "Key: $_\n";
        print "Data: $hash{$_}\n";
    }
Re: regexp's
by Cybercosis (Monk) on Oct 03, 2000 at 10:37 UTC
    Well, if it's in a file, I'd do this:
    @ARGV = ('filename.txt');
    while (<>) {
        # The name (last field) is the key; the leading number is the value
        /^(\d+)\s+time\s+\d{2}\:\d{2}\:\d{2}\s+(\w+)/;
        $hash{$2} = $1;
    }

    Update, as per merlyn's advice (godz, it's hard not to try to make Spaceballs jokes...):

    while (<>) {
        if (/^(\d+)\s+time\s+\d{2}\:\d{2}\:\d{2}\s+(\w+)/) {
            $hash{$2} = $1;
        }
    }

    Or didn't I read something about this possibly being acceptable:

    while (<>) {
        {
            /^(\d+)\s+time\s+\d{2}\:\d{2}\:\d{2}\s+(\w+)/;
            $hash{$2} = $1;
        }
    }

    since the enclosing {} puts it in a separate block? I fully expect to be waaaay off on this.
      Do not use $1 except after you've verified that the match was successful, else you will get the previous $1. That would not have been fatal here, but if you'd been accumulating rather than setting, you'd get some odd stuff.

      -- Randal L. Schwartz, Perl hacker
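To make the advice above concrete, here is a small sketch of the "accumulating" case, assuming two hypothetical input strings (`$good` and `$bad`) that are matched one after the other in the same block. A failed match leaves `$1` holding the value from the last successful match, so the unchecked version counts the earlier number twice.

```perl
use strict;
use warnings;

my $good = "5 time 04:20:03 bert";   # well-formed record
my $bad  = "malformed line";         # no leading number, so the match fails

my $total_unchecked = 0;
my $total_checked   = 0;

# Unchecked: $1 is used whether or not the match succeeded
$good =~ /^(\d+)/;
$total_unchecked += $1;
$bad =~ /^(\d+)/;                    # fails; $1 still holds 5 from the match above
$total_unchecked += $1;

# Checked: $1 is only used when the match succeeded
$total_checked += $1 if $good =~ /^(\d+)/;
$total_checked += $1 if $bad  =~ /^(\d+)/;

print "unchecked=$total_unchecked checked=$total_checked\n";
# prints "unchecked=10 checked=5"
```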

Re: regexp's
by ChOas (Curate) on Oct 03, 2000 at 19:08 UTC
    Don't hate me for this:
    #!/usr/bin/perl -w
    use strict;

    my %Hash;
    $Hash{substr($_,rindex($_," ")+1,-1)} = substr($_,0,index($_," ")), while(<>);
    print "key=$_, val=$Hash{$_}\n" foreach sort keys (%Hash);

    GrtZ! ;)))
(Dermot) Re: regexp's
by Dermot (Scribe) on Oct 03, 2000 at 17:54 UTC
    Untested but should be enough to give you the idea:
    foreach (@array) {
        # \b keeps the greedy .* from swallowing all but the last character of the name
        if (/^(\d+).*\b(\w+)$/) {
            $hash{$2} = $1;
        }
    }
    - For each element in @array
    
    - Start of line
    - Capture digits to $1
    - Don't capture stuff between $1 and $2
    - Capture alphanumerics to $2
    - End of line
    
    - Use values captured in $1 and $2 to populate hash
    
    Update: Long live Dot Star. The problem is simple, I believe the solution should be too.

    Update II: Ovid, just a nitpick (don't hate me for this), but he wanted the last field as the key and the first field as the value, not the other way around. jcwren, that is a beautiful solution.

      AAAAAAARRRRRRRGGGGGGGGGHHHHHHHHH! I had them backwards!!!! I hate it when I do that!!!

      Yes, I'm anal about the .* thing. I'm also anal about use strict, -w, checking that my open actually opened something, etc. I do see your point and I acknowledge that your regex is much simpler to read. However, iteration combined with dot star is begging for issues. In this case, though, since the backtracking appears to be small (just 4 characters, max) -- assuming that this is not just an over-simplified subset of data -- it's probably not that much of an issue.

      Cheers,
      Ovid

      Join the Perlmonks Setiathome Group or just go to the link and check out our stats.
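One way to see the dot star concern in this sub-thread is to compare what the final capture picks up with and without something pinning down where the name starts. The sketch below uses one sample line from the question; the variable names are illustrative only.

```perl
use strict;
use warnings;

my $line = "1 time 02:11:05 djw";

# Bare greedy .* before the capture: .* takes everything it can and gives
# back only the single character that (\w+) needs in order to match
my ($greedy_name) = $line =~ /^\d+.*(\w+)$/;

# A word boundary forces the capture to start at the beginning of the name
my ($anchored_name) = $line =~ /^\d+.*\b(\w+)$/;

# Spelling out each field avoids .* altogether
my ($explicit_name) = $line =~ /^\d+\s+\S+\s+\S+\s+(\w+)$/;

print "greedy=$greedy_name anchored=$anchored_name explicit=$explicit_name\n";
# prints "greedy=w anchored=djw explicit=djw"
```

With `\b` in place, the amount of backtracking is bounded by the length of the name, which is why it stays small for data like this.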

RE: regexp's
by Anonymous Monk on Oct 03, 2000 at 20:25 UTC
    Well, this is one way:

    foreach (@array) {
        my ($key,$val);
        /\w+$/ and $key = $&;
        /^\d+/ and $val = $&;
        $hash{$key} = $val;
    }

    This will slow down all your pattern matches because of using $&. The alternative, if you don't use $& or similar variables elsewhere, would be $key = $_; $val = $_; and then some kind of substitution. It also assumes that numbers are always, er, numbers and the names are always one word. If not, you could use \b for boundary-matching.

    dave
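Another way to avoid `$&` is to use capturing parentheses and take the match result in list context. A minimal sketch, assuming the three sample lines from the question are already in `@array`:

```perl
use strict;
use warnings;

my @array = (
    "1 time 02:11:05 djw",
    "5 time 04:20:03 bert",
    "2 time 00:01:39 chris",
);

my %hash;
for my $line (@array) {
    # In list context a match returns its captures, so $& is never touched
    my ($val) = $line =~ /^(\d+)/;
    my ($key) = $line =~ /(\w+)$/;

    # Skip lines where either pattern failed to match
    next unless defined $key && defined $val;
    $hash{$key} = $val;
}

print "key=$_, val=$hash{$_}\n" for sort keys %hash;
```

As far as I know, the perlvar documentation says the `$&` penalty no longer applies from Perl 5.20 onward, but the capture form reads more clearly either way.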
RE: regexp's
by Anonymous Monk on Oct 03, 2000 at 21:49 UTC
    # you might have to refine the regexp, but that should
    # match your array
    foreach $element (@array) {
        if ($element=~/(\d+).*\s+(\w+)/) {
            $hash{$2}=$1;
        }
    }
    # probably not the best, but it should work!
