PerlMonks  

reading a delimited file and selecting values from it

by Conal (Beadle)
on Mar 12, 2008 at 01:39 UTC ( [id://673653]=perlquestion )

Conal has asked for the wisdom of the Perl Monks concerning the following question:

hi..

I am using another script to create a load of comma-delimited files, each containing quotes and their times. They are created every half hour. An example of one file, '11-03-08-2pm.txt':

1.53311 ,1:59:52
1.53311 ,1:59:5220
1.53311 ,1:59:52
1.53311 ,1:59:52hi
1.53311 ,2:00:00
1.53306 ,2:00:03
1.53307 ,2:00:06

What I want to do is grab the quote at 00 seconds, or the one closest to but not after the half-hour mark. E.g. 1:29:54 could be used if I have no quote for 1:30:00; I wouldn't use 1:30:01 even though it's closer.

There can be some vagaries in the readings file, e.g. 1:59:52hi; I can dismiss any data after the seconds reading.

I will then save only that selected quote back to its file.
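
The selection rule described above (the latest reading at or before the mark, never one after it) can be sketched roughly as follows. This is only an illustration: `to_seconds` and `pick_quote` are hypothetical names, the mark is passed in as an "h:mm:ss" string, and the sketch assumes all readings and the mark fall within the same day on the same clock, so it does not handle a wrap such as 12:59:58 before 1:00:00.

```perl
use strict;
use warnings;

# Hypothetical helper: convert "h:mm:ss" to seconds, ignoring anything
# after the seconds digits. Returns nothing for a malformed time.
sub to_seconds {
    my ($time) = @_;
    my ( $h, $m, $s ) = $time =~ /\A\s*(\d{1,2}):(\d{2}):(\d{2})/
        or return;
    return $h * 3600 + $m * 60 + $s;
}

# Hypothetical helper: return [ quote, time ] for the latest reading at or
# before $mark, or undef when no reading qualifies.
sub pick_quote {
    my ( $mark, @lines ) = @_;
    my $mark_secs = to_seconds($mark);
    return undef unless defined $mark_secs;

    my $best;
    for my $line (@lines) {
        my ( $quote, $time ) =
            $line =~ /\A\s*([^,\s]+)\s*,\s*(\d{1,2}:\d{2}:\d{2})/
            or next;                      # skip lines that do not parse
        my $secs = to_seconds($time);
        next if $secs > $mark_secs;       # never use a reading after the mark
        $best = [ $quote, $time, $secs ]
            if !$best || $secs >= $best->[2];
    }
    return $best ? [ @{$best}[ 0, 1 ] ] : undef;
}

my $picked = pick_quote( '2:00:00',
    '1.53311 ,1:59:52', '1.53311 ,1:59:52hi', '1.53306 ,2:00:03' );
print "picked $picked->[0] at $picked->[1]\n" if $picked;
```

With no reading at 2:00:00 in that list, the sketch falls back to the 1:59:52 reading and ignores 2:00:03.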

Does anyone care to nudge me in the right direction here, please? It's been a while since I have had a need for any Perl, and my skills are a little rusty to say the very least, so I am drawing somewhat of a blank.

thanks kindly for any input ..

conal.

Replies are listed 'Best First'.
Re: reading a delimited file and selecting values from it
by kyle (Abbot) on Mar 12, 2008 at 02:42 UTC

    Here's a start. This is actually pretty close to the algorithm that nefigah described (though I wrote it before reading that).

    use strict;
    use warnings;
    use Data::Dumper;

    my $last_hour = 23;
    my %best_of;

    while (<DATA>) {
        my ( $quote, $time ) = m{
            \A              # beginning of line
            ( [^,]+ )       # non-commas
            \s* , \s*       # comma with optional spaces
            (               # open capture
                \d\d?       # hours
                : \d\d      # minutes
                : \d\d      # seconds
            )
        }xms;

        my ( $hour, $min, $sec ) = split /:/, $time;

        if ( '00' eq $sec && '00' eq $min && -1 == --$hour ) {
            $hour = $last_hour;
        }

        my $seconds_past = $min * 60 + $sec;

        if ( ! $seconds_past
            || $best_of{ $hour }{second} < $seconds_past )
        {
            $best_of{ $hour } = {
                second => $seconds_past,
                time   => $time,
                quote  => $quote,
            };
        }
    }

    print Dumper \%best_of;

    __DATA__
    1.53311 ,1:59:52
    1.53311 ,1:59:5220
    1.53311 ,1:59:52
    1.53311 ,1:59:52hi
    1.53311 ,2:00:00
    1.53306 ,2:00:03
    1.53307 ,2:00:06

    Here's the output:

    $VAR1 = {
              '1' => {
                       'quote' => '1.53311 ',
                       'time' => '2:00:00',
                       'second' => 0
                     },
              '2' => {
                       'quote' => '1.53307 ',
                       'time' => '2:00:06',
                       'second' => 6
                     }
            };

    This pops out a couple of warnings ("Use of uninitialized value in numeric lt (<)") in the last condition because it's comparing $seconds_past to an undef that gets autovivified in %best_of.

    Anyway, what you end up with is a hash with each hour seen as a key. The values are hash refs that contain the data you're interested in.
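
One possible way to avoid the "uninitialized value" warnings mentioned above is to check whether an entry exists for the hour before comparing against it. The following is a minimal sketch, not the reply author's code: `should_replace` is a hypothetical helper name, and it simply mirrors the condition used in the loop.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the original reply: decide whether a new
# reading should replace the one stored for $hour, without comparing
# against (or autovivifying) an entry that does not exist yet.
sub should_replace {
    my ( $best_of, $hour, $seconds_past ) = @_;
    return 1 if !$seconds_past;               # a reading exactly on the hour
    return 1 if !exists $best_of->{$hour};    # nothing stored for this hour yet
    return $best_of->{$hour}{second} < $seconds_past ? 1 : 0;
}

my %best_of;
if ( should_replace( \%best_of, 1, 3592 ) ) {
    $best_of{1} = { second => 3592, time => '1:59:52', quote => '1.53311' };
}
print "stored: $best_of{1}{time}\n";
```

In the loop, the condition of the last `if` would then become a call to this helper with `\%best_of`, `$hour`, and `$seconds_past`.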

      Great, that's fantastic, Kyle.

      The data will be a lot more useful to have in some kind of array, because I plan to manipulate it further later. Your code will be invaluable to me as a base structure.

      Can I just ask how I get the time variable to deal with phantom extra characters, e.g. 1:59:52hi?

      How do I get it to disregard the extraneous data at the end? Thanks again.

        The pattern I used already does that, as written. The pattern matches everything you want, up to the extraneous data, and that's where it stops.

        The problem with it (if you consider this a problem) is that the loop doesn't notice if there's a non-match. If you have some bogus line in the file, it's going to try to use it anyway. This will probably manifest as an undef quote at midnight. That's part of why I said it's a start.
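
A minimal sketch of how the loop could notice a non-match and skip the line instead of using it. `parse_line` is a hypothetical helper, and its pattern is a slight variation on the one above (it also leaves surrounding whitespace out of the quote), so treat it as an illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: returns ( $quote, $time ), or an empty list when the
# line does not look like "quote ,h:mm:ss".
sub parse_line {
    my ($line) = @_;
    my ( $quote, $time ) = $line =~ m{
        \A \s*
        ( [^,\s]+ )                 # quote: no commas or whitespace
        \s* , \s*                   # comma with optional spaces
        ( \d\d? : \d\d : \d\d )     # h:mm:ss; anything after is ignored
    }xms;
    return defined $time ? ( $quote, $time ) : ();
}

for my $line ( '1.53311 ,1:59:52hi', 'bogus line', '1.53306 ,2:00:03' ) {
    my ( $quote, $time ) = parse_line($line)
        or next;    # skip lines that do not match
    print "$quote at $time\n";
}
```

The list assignment evaluates to the number of values returned, so an empty list from `parse_line` makes the `or next` fire and the bogus line never reaches the hash.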

Re: reading a delimited file and selecting values from it
by nefigah (Monk) on Mar 12, 2008 at 02:28 UTC

    It's possible there are some tools out there that would make this easier, but to tackle it yourself, here are some ideas to get you started. If you need specific help on a certain part of this, just ask.

    Create a hash or something to hold the data you want to keep. One idea might be to have the keys be like '1', '1.5', '2', '2.5' etc. (i.e. a key for the hour and a key for the half hour for each of the 24 hours).

    Open the file and go through it line by line, using a regex to separate each line into separate chunks for Quote, Hour, Minute, and Second. This should be relatively painless because the lines seem to be delimited pretty clearly.

    Look at the Hours and Minutes returned from the current match to find out where that line should be placed in your hash. (i.e. if $hour = 12 and $minute = 37, it would go in $hash{'12.5'}) and place the quote there. Assuming the file is in chronological order, this should do it for ya. If it's not, you may have to keep track of the seconds and do a comparison to determine if the new value is "higher".

    Hopefully that gets you going in the right direction.
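
The bucketing idea in this reply could look something like the sketch below. `bucket_key` is a hypothetical name, the sample lines are taken from the question, and the sketch assumes the file is in chronological order so that a later reading simply overwrites an earlier one in the same bucket.

```perl
use strict;
use warnings;

# Hypothetical helper: map an hour and minute to a key such as '12' or '12.5'.
sub bucket_key {
    my ( $hour, $minute ) = @_;
    my $h = $hour + 0;                        # drop any leading zero
    return $minute >= 30 ? "$h.5" : "$h";
}

my %latest;
for my $line ( '1.53311 ,1:59:52', '1.53311 ,2:00:00', '1.53307 ,2:00:06' ) {
    my ( $quote, $hour, $minute, $second ) =
        $line =~ /\A\s*([^,\s]+)\s*,\s*(\d\d?):(\d\d):(\d\d)/
        or next;                              # skip malformed lines
    # Assumes chronological order: the last reading seen in a bucket wins.
    $latest{ bucket_key( $hour, $minute ) } = $quote;
}

for my $key ( sort { $a <=> $b } keys %latest ) {
    print "$key => $latest{$key}\n";
}
```

Note that this keeps the last reading inside each half-hour bucket, as the reply describes; picking the reading closest to but not after a mark would need the extra seconds comparison the reply mentions.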


    I'm a peripheral visionary... I can see into the future, but just way off to the side.

Re: reading a delimited file and selecting values from it
by Conal (Beadle) on Mar 12, 2008 at 02:42 UTC
    Ok, thanks for the input, but the thing is I have a file for each half-hour interval, with up to 10 separate readings.

    I have put this ugly-looking thing together, which does seem to work:

    use strict;

    my $input = "quotes.txt";
    my ( $quote, $time );
    my $actualquote;
    my $actualtime;
    my $hour;
    my $minute;
    my $second;
    my $picksecond;

    # $! holds the reason the open failed
    open(DATAFILE, "$input") || die("Can't open $input: $!\n");

    $picksecond = 1;
    while (<DATAFILE>) {
        chomp $_;
        ($quote, $time) = split(",", $_);
        ($hour, $minute, $second) = split(":", $time);

        # a reading at 00 seconds wins outright; 99 blocks later readings
        if ($second == 0) {
            $actualtime  = $time;
            $actualquote = $quote;
            $picksecond  = 99;
        }

        # otherwise keep the reading with the highest seconds value so far
        if ($second > $picksecond) {
            $picksecond  = $second;
            $actualtime  = $time;
            $actualquote = $quote;
        }

        print "$quote\t$time\t$second\n";
    }

    print "$picksecond\t$actualtime\t$actualquote\n";

    Does anyone care to comment on my code..??

    I have another outstanding issue.

    When I have a messed-up time field, e.g. 17:30:07fg,

    how do I get the split command to ignore any characters after the two seconds digits? Thanks.

    conal.
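
One option for the trailing-characters question is to capture the three fields with a regex instead of calling split, so anything after the two seconds digits is never captured. A small sketch, where `split_time` is a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: return ( hour, minute, second ) from a time string,
# ignoring whatever follows the two seconds digits. Returns an empty list
# when the string does not start with something like h:mm:ss.
sub split_time {
    my ($time) = @_;
    return $time =~ /\A\s*(\d{1,2}):(\d{2}):(\d{2})/;
}

for my $time ( '17:30:07fg', '1:59:5220', 'junk' ) {
    my ( $hour, $minute, $second ) = split_time($time)
        or do { print "skipping '$time'\n"; next };
    print "$hour $minute $second\n";
}
```

Because the pattern is not anchored at the end, '17:30:07fg' and '1:59:5220' both yield clean two-digit seconds, while a field with no recognizable time is reported and skipped.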
Re: reading a delimited file and selecting values from it
by Anonymous Monk on Mar 12, 2008 at 02:06 UTC
    7-27-setII.pm
    could be if 1:29:54 you would subtract from the half or the zero

Approved by pc88mxer