PerlMonks  

Condition on multiple lines and big file

by epimenidecretese (Acolyte)
on Dec 06, 2013 at 10:17 UTC (id://1065950)

epimenidecretese has asked for the wisdom of the Perl Monks concerning the following question:

Hi guys,

I'm trying to solve the following problem, but I'm not sure Perl is the right tool for it. My data look like this:

Buenos - SPN Buenos B-GPE
Aires - SPN Aires I-GPE
Afghanistan - SPN Afghanistan B-GPE
Europa - SPN Europa B-GPE
UE - I UE B-GPE
Italia - SPN Italia B-GPE
Provincia - SS Provincia B-GPE
di - E di I-GPE
Lucca - SPN Lucca I-GPE
...

As you can see, an I-GPE tag means the word has to be joined to the name on the line before it (e.g. Buenos Aires); when a B-GPE line is followed by another B-GPE line, they are two different names.

The problem is that the file is very big, so I can't slurp it all into memory at once.

I would like output like the following:

Buenos Aires
Afghanistan
Italia
...
Provincia di Lucca

Does anyone have an idea?

One of Crete's own prophets has said it: 'Cretans are always liars, evil brutes, lazy gluttons'.
He has surely told the truth.

Re: Condition on multiple lines and big file
by kcott (Archbishop) on Dec 06, 2013 at 10:33 UTC

    G'day epimenidecretese,

    "I'm not sure Perl is the right tool for it."

    Perl would be an admirable tool for this task.

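    #!/usr/bin/env perl

    use strict;
    use warnings;

    # Read one record at a time; the whole file is never held in memory.
    while (<DATA>) {
        # The word is the first field, the GPE tag the last one.
        my ($name, $code) = (split)[0,-1];
        # I-GPE continues the current name on the same output line;
        # any other tag starts a new line (except on the first input line).
        print $code eq 'I-GPE' ? " $name" : $. == 1 ? $name : "\n$name";
    }
    print "\n";

    __DATA__
    Buenos - SPN Buenos B-GPE
    Aires - SPN Aires I-GPE
    Afghanistan - SPN Afghanistan B-GPE
    Europa - SPN Europa B-GPE
    UE - I UE B-GPE
    Italia - SPN Italia B-GPE
    Provincia - SS Provincia B-GPE
    di - E di I-GPE
    Lucca - SPN Lucca I-GPE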

    Output:

    Buenos Aires
    Afghanistan
    Europa
    UE
    Italia
    Provincia di Lucca

    -- Ken

      Your solution is beautiful. I admire its simplicity.
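A possible refinement of kcott's script, sketched under the same assumption about the record layout (whitespace-separated fields, word first, B-GPE/I-GPE tag last): the loop is wrapped in a sub that reads from any filehandle, so it can be pointed at the big file directly instead of `__DATA__`. The name `merge_gpe` and the command-line wrapper are hypothetical, not from the thread. A `$started` flag replaces the `$. == 1` test, so a blank first line does not produce a leading empty output line.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# merge_gpe($in, $out): hypothetical helper name, not from the thread.
# Same logic as the script above, wrapped in a sub so it can be used with
# any pair of filehandles. Assumes whitespace-separated records with the
# word in the first field and the B-GPE/I-GPE tag in the last field.
sub merge_gpe {
    my ($in, $out) = @_;
    my $started = 0;                       # true once a name has been printed
    while (my $line = <$in>) {             # one line at a time: no slurping
        my @fields = split ' ', $line;
        next unless @fields;               # skip blank lines
        my ($word, $tag) = @fields[0, -1];
        if ($tag eq 'I-GPE') {
            print {$out} " $word";         # continue the current name
        }
        else {
            print {$out} "\n" if $started; # close the previous name
            print {$out} $word;            # start a new one
            $started = 1;
        }
    }
    print {$out} "\n" if $started;         # terminate the last name
    return;
}

# Command-line use (file names are placeholders):
#   perl merge_gpe.pl big_file.txt > names.txt
if (@ARGV) {
    my $file = $ARGV[0];
    open my $in, '<', $file or die "Cannot open '$file': $!\n";
    merge_gpe($in, \*STDOUT);
    close $in;
}
```

Only the current line and one flag are kept in memory, so the size of the input file does not matter.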

Re: Condition on multiple lines and big file
by choroba (Cardinal) on Dec 06, 2013 at 10:28 UTC
    No need to slurp the file. Process it line by line, remembering just the name:
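    my $name = q();
    while (<>) {
        my @cols = split;
        if ('I-GPE' eq $cols[-1]) {
            $name .= $cols[0] . ' ';
        } else {
            # chop removes the trailing space; it returns an empty (false)
            # string only when $name is empty, i.e. before the first name.
            print $name, "\n" if chop $name;
            $name = $cols[0] . ' ';
        }
    }
    # Do not forget to output the last name.
    chop $name;
    print "$name\n";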
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
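Choroba's loop can also be turned around so that each complete name is handed to a callback instead of being printed, which lets the caller count, filter or store names without touching the parsing. This is only a sketch: `for_each_gpe_name` is a hypothetical name, and instead of appending a space and `chop`ping it off again, it keeps the words of the current name in an array and `join`s them when the name is complete.

```perl
use strict;
use warnings;

# for_each_gpe_name($fh, $callback): hypothetical helper, not from the thread.
# Reads $fh one line at a time and calls $callback once per complete name.
# As in the loop above, only the name currently being built is kept in
# memory, here as a list of words in @parts instead of a space-padded string.
sub for_each_gpe_name {
    my ($fh, $callback) = @_;
    my @parts;                                  # words of the current name
    while (my $line = <$fh>) {
        my @cols = split ' ', $line;
        next unless @cols;                      # skip blank lines
        if ($cols[-1] eq 'I-GPE') {
            push @parts, $cols[0];              # continue the current name
        }
        else {
            # B-GPE (or any other tag) starts a new name, so the
            # previous one is complete: hand it to the caller.
            $callback->(join ' ', @parts) if @parts;
            @parts = ($cols[0]);
        }
    }
    $callback->(join ' ', @parts) if @parts;    # do not forget the last name
    return;
}

# Example (placeholder file name):
#   open my $fh, '<', 'data.txt' or die "Cannot open data.txt: $!";
#   for_each_gpe_name($fh, sub { print "$_[0]\n" });
```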
Re: Condition on multiple lines and big file
by Anonymous Monk on Dec 06, 2013 at 13:33 UTC
    And, for simpler situations of the same basic problem, also consider awk. This is the original tool from which, in one sense, the earliest Perl interpreters grew. It's still great for text-file munging.

Front-paged by Corion