PerlMonks  

parsing with two files

by perlthirst (Scribe)
on Feb 20, 2009 at 11:24 UTC (#745335=perlquestion)
perlthirst has asked for the wisdom of the Perl Monks concerning the following question:

hi,

I have two files.

file1
------
12131
34234
53435
46566
34522

file2
------
some content
some content
some content
some content
some content
blah. blah.
Code id -46566-
some content
some content
some content
some content
Event id -445778441211-
some content
some content
some content
some content
some content
some content
some content
Code id -12131-
some content
some content
some content
some content
some content
some content
some content
some content
some content
Event id -123443111131-
Code id -12342-
some content
some content
some content
some content
some content
some content
some content
some content
Event id -445987432141-

file1 contains the list of codes. file2 contains each code id and, some lines later, the corresponding event id for that code id.

My output should list each code id together with its corresponding event id.

I have written the following code to achieve this:

    use strict;
    use warnings;

    open (FH1, "file1");
    open (FH2, "file2");

    my ($code, $event, $line);
    my %data;

    while (<FH1>) {
        chomp;
        # assigning code.
        $code = $_;
        while ( <FH2> ) {
            chomp;
            if ( $_ =~ /\-$code\-/ ) {
                $line = $_;
                while ( $line !~ /Event id (.*)/ ) {
                    $line = <FH2>;
                    chomp;
                }
                if ( $line =~ /Event id (.*)/ ) {
                    $data{$code} = $1;
                }
            }
        }
        seek(FH2,0,0);
    }
    print "OUTPUT";
    print %data;

It works, but I would like a simpler and more efficient way to do this.

Re: parsing with two files
by targetsmart (Curate) on Feb 20, 2009 at 12:18 UTC
    This assumes that file1 contains only code ids and is not huge.

        use strict;
        use warnings;
        use Data::Dumper;

        my $filename = "codes.txt";
        open my $fh1, '<', $filename or die "can't open $filename : $!\n";
        my $filename2 = "./context.txt";
        open my $fh2, '<', $filename2 or die "can't open $filename2 : $!\n";

        chomp(my @Ids = <$fh1>);
        my %data;

        LABEL: while (defined(my $line = <$fh2>)) {
            chomp $line;
            foreach my $codeid (@Ids) {
                if ($line =~ /-$codeid-/) {
                    # read ahead to the next Event line and record its id
                    while (<$fh2>) {
                        if (/Event id -(\d+)-/) {
                            $data{$codeid} = $1;
                            next LABEL;
                        }
                    }
                }
            }
        }
        print Dumper \%data;

    The code shown should work and may be more efficient than yours (I say "may" because I have not benchmarked it).

    Vivek
    -- In accordance with the prarabdha of each, the One whose function it is to ordain makes each to act. What will not happen will never happen, whatever effort one may put forth. And what will happen will not fail to happen, however much one may seek to prevent it. This is certain. The part of wisdom therefore is to stay quiet.
Re: parsing with two files
by Bloodnok (Vicar) on Feb 20, 2009 at 12:39 UTC
    How about something along the lines of the following (untested :-))...

        use warnings;
        use strict;
        use autodie qw/open/;

        open my $fh, '<', 'file1';
        my %results = map { chomp; ($_ => []) } <$fh>;
        close $fh;

        my $code;    # most recent code id seen; must persist across lines
        open $fh, '<', 'file2';
        while (<$fh>) {
            next unless /^(Event|Code) id -(\d+)-/;
            if ($1 eq 'Code') { $code = $2; next; }
            # record the event only for codes that were listed in file1
            push @{ $results{$code} }, $2
                if defined $code && exists $results{$code};
        }
        close $fh;
    Obviously, you'd need to put filters in place to handle whitespace e.g. blank lines in file1...
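One way such a filter might look, as a minimal sketch: the helper name `load_codes` and the in-memory sample standing in for file1 are illustrative assumptions, not part of the code above.

```perl
use strict;
use warnings;

# Hypothetical helper: read code ids from a filehandle, ignoring blank
# lines and surrounding whitespace. Returns a hash of code => [].
sub load_codes {
    my ($fh) = @_;
    my %results;
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;     # trim whitespace, including the newline
        next unless length $line;    # skip lines that are now empty
        $results{$line} = [];
    }
    return %results;
}

# In-memory stand-in for file1, with a blank line and stray spaces.
my $file1 = "12131\n\n  34234  \n46566\n";
open my $fh, '<', \$file1 or die "cannot open in-memory file: $!";
my %results = load_codes($fh);
close $fh;

print join(',', sort keys %results), "\n";
```

With a real file, the same helper would be called on a handle opened on file1 in place of the in-memory string.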

    A user level that continues to overstate my experience :-))
Re: parsing with two files
by puudeli (Pilgrim) on Feb 20, 2009 at 12:55 UTC

    Your problem description was not exact. Neither the description nor the example data says whether multiple event ids can follow one code id. You used the plural "event ids" in one sentence and the singular in another.

    If only one event id can follow each code id, I would avoid reading file2 multiple times. Instead, I would parse the valid code ids in advance into a lookup table and read file2 in a single pass.

        #! /usr/bin/perl

        use strict;
        use warnings;

        use Data::Dumper;

        # State machine
        use constant CODE  => 'CODE';
        use constant EVENT => 'EVENT';

        # Codes pre-parsed from file1
        my %codes = (
            12131 => 1,
            34234 => 1,
            53435 => 1,
            46566 => 1,
            34522 => 1,
        );

        my %results;
        my ($code_id, $event_id);
        my $state = 'CODE';

        my $re_code  = qr{Code \s+ id \s+ -(\d+)- }x;
        my $re_event = qr{Event \s+ id \s+ (-\d+-) }x;

        LINE:
        while( <DATA> ) {
            chomp();
            if( ($state eq CODE) && /$re_code/ ) {
                # Check that code is valid
                if( $codes{$1} ) {
                    $code_id = $1;
                    $state = EVENT;
                } else {
                    next LINE;
                }
            } else {
                if( ($state eq EVENT) && /$re_event/ ) {
                    $results{$code_id} = $1;
                    $state = CODE;
                } else {
                    next LINE;
                }
            }
        }

        print Dumper(\%results);

        1;

        __DATA__
        some content
        some content
        some content
        some content
        some content
        blah. blah.
        Code id -46566-
        some content
        some content
        some content
        some content
        Event id -445778441211-
        some content
        some content
        some content
        some content
        some content
        some content
        some content
        Code id -12131-
        some content
        some content
        some content
        some content
        some content
        some content
        some content
        some content
        some content
        Event id -123443111131-
        Code id -12342-
        some content
        some content
        some content
        some content
        some content
        some content
        some content
        some content
        Event id -445987432141-

    Output:

        $VAR1 = {
                  '12131' => '-123443111131-',
                  '46566' => '-445778441211-'
                };

    Update: removed the odd __DATA2__ tag from the end..
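If several event ids can follow one code id (the ambiguity raised at the top of this reply), a variant might stay on the current code until the next "Code id" line and collect every event it sees. This is only a sketch under that assumption: `collect_events` and the sample data are made up for illustration, and unlike the code above it captures the event ids without the surrounding dashes.

```perl
use strict;
use warnings;

# Hypothetical helper: collect every event id that follows a wanted code id,
# up to the next "Code id" line. $codes is a hashref of wanted code ids.
sub collect_events {
    my ($fh, $codes) = @_;
    my (%results, $current);
    while (my $line = <$fh>) {
        if ($line =~ /Code \s+ id \s+ -(\d+)-/x) {
            # a new code section starts; track it only if the code is wanted
            $current = $codes->{$1} ? $1 : undef;
        }
        elsif (defined $current && $line =~ /Event \s+ id \s+ -(\d+)-/x) {
            push @{ $results{$current} }, $1;
        }
    }
    return \%results;
}

# Made-up sample: two events for a wanted code, one for an unwanted code.
my $sample = join "\n",
    'Code id -46566-', 'some content', 'Event id -111-', 'Event id -222-',
    'Code id -12342-', 'Event id -333-', '';
open my $fh, '<', \$sample or die "cannot open in-memory file: $!";
my $results = collect_events($fh, { 46566 => 1, 12131 => 1 });
close $fh;

print "$_: @{ $results->{$_} }\n" for sort keys %$results;
```

Each value in the result is an array reference, so a code with a single event simply ends up with a one-element list.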

    --
    seek $her, $from, $everywhere if exists $true{love};
Re: parsing with two files
by repellent (Priest) on Feb 20, 2009 at 17:45 UTC
        use warnings;
        use strict;
        use Data::Dumper;

        # store codes in hash for fast lookup
        my %code;
        {
            open(my $FH1, "<", "file1") or die($!);
            chomp() and ++$code{$_} while <$FH1>;
        }

        # read in chunks that end with 'Code id -'
        my %data;
        {
            local $/ = "Code id -";
            open(my $FH2, "<", "file2") or die($!);

            while (<$FH2>) {
                /^(\d+)-.*\nEvent id -(\d+)-/s or next;
                $data{$1} = $2 if $code{$1};
            }
        }

        print Dumper(\%data);

        __END__
        $VAR1 = {
                  '12131' => '123443111131',
                  '46566' => '445778441211'
                };
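To see what the chunked read produces, here is a small sketch that sets `$/` the same way on an in-memory string. The helper name `read_chunks` and the shortened sample text are assumptions made for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: read a filehandle in records that end with $sep
# instead of a newline.
sub read_chunks {
    my ($fh, $sep) = @_;
    local $/ = $sep;        # input record separator, restored on return
    my @chunks = <$fh>;
    return @chunks;
}

# Shortened stand-in for file2.
my $text = "intro\n"
         . "Code id -46566-\nsome content\nEvent id -445778441211-\n"
         . "Code id -12131-\nsome content\nEvent id -123443111131-\n";

open my $fh, '<', \$text or die "cannot open in-memory file: $!";
my @chunks = read_chunks($fh, "Code id -");
close $fh;

for my $chunk (@chunks) {
    (my $shown = $chunk) =~ s/\n/\\n/g;   # make newlines visible
    print "[$shown]\n";
}
```

The first chunk holds whatever precedes the first code id; every later chunk begins with the digits of a code id, which is why the pattern above can anchor on `^(\d+)-`.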
