PerlMonks  

Remove line above matching criteria

by Anonymous Monk
on Nov 20, 2006 at 17:14 UTC ([id://585093])

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

What is the best way to remove the line that matches a criterion as well as the line above it? For example, I have a text file that contains user names and addresses. I need to extract the user names and IDs for a confidential project:

Doe, John
<id:>123456789
<street:>1234 Broadway Circ.
<city:> Metropolis
<state:>XX
<phone:>000-000-0000

I need to remove the ID and name for each entry in the file. I would just print the four lines below the ID, but some of the data is inconsistent, such as a missing phone number or missing street info. Thanks for your help, you guys are great!

Re: Remove line above matching criteria
by idsfa (Vicar) on Nov 20, 2006 at 17:28 UTC

    You haven't shown us what you have tried so far, nor what your criteria for "best" are (fastest, least memory, simplest, etc). We cannot easily help you if you don't do your homework first.

    From what you've given us, I'd egrep -v the data from the command line and skip perl altogether ...


    The intelligent reader will judge for himself. Without examining the facts fully and fairly, there is no way of knowing whether vox populi is really vox dei, or merely vox asinorum. — Cyrus H. Gordon
      If you're going to go that route, you can also (with GNU grep, at least) grep -B 1 '<id:>' to get the ID line together with the previous line.
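      If GNU grep isn't available, or you'd rather stay in Perl, here is a minimal sketch of the same idea: collect each line matching a pattern together with the line just before it. The sub name `with_previous_line` is made up for this example, the `^<id:>` pattern is an assumption based on the sample data, and unlike GNU grep it does not print `--` between groups.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return each line matching $re together with the line just before it,
# in the spirit of `grep -B 1`. A line is never returned twice, even if
# it is both a match and the line above the next match.
sub with_previous_line {
    my ($re, @lines) = @_;
    my @out;
    my $prev;              # the line seen just before the current one
    my $prev_kept = 0;     # true if $prev is already in @out
    for my $line (@lines) {
        if ($line =~ $re) {
            push @out, $prev if defined $prev && !$prev_kept;
            push @out, $line;
            $prev_kept = 1;
        }
        else {
            $prev_kept = 0;
        }
        $prev = $line;
    }
    return @out;
}

# Example: one record from the question.
my @sample = (
    "Doe, John\n",
    "<id:>123456789\n",
    "<street:>1234 Broadway Circ.\n",
    "<city:> Metropolis\n",
);
print with_previous_line(qr/^<id:>/, @sample);
```

      To run it on real input, something like `print with_previous_line(qr/^<id:>/, <>);` would read the files named on the command line (or STDIN). That slurps the whole input into memory, which is probably fine for an address list but worth keeping in mind for very large files.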

      ---
      It's all fine and dandy until someone has to look at the code.
Re: Remove line above matching criteria
by jbert (Priest) on Nov 20, 2006 at 18:05 UTC
    One way to do this (discard N lines before a match) is to keep a buffer of seen lines that is at least N lines long. Lines that overflow out of the buffer are printed normally; on a match, the buffer is simply discarded. Lastly, remember to print anything left in the buffer at the end.

    In code:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # We don't really need an array for one line, but it seems
    # conceptually nicer (and generalises more easily)
    my $max_buffer_size = 1;
    my @buffer;
    my $line;
    while ($line = <ARGV>) {
        push @buffer, $line;
        # Replace the pattern match with your criterion if this
        # isn't right.
        @buffer = () if $line =~ /^.id/;
        if (scalar @buffer > $max_buffer_size) {
            print shift @buffer;
        }
    }
    print @buffer;
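    Since the comment above notes that the array version generalises, here is a hedged sketch of that generalisation as a subroutine that drops each matching line plus up to $n lines before it. The name `drop_with_preceding` is hypothetical, and the `^<id:>` pattern in the example is an assumption based on the sample data (the script above uses `/^.id/`).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Drop every line matching $re together with up to $n lines before it.
# Lines wait in @buffer (at most $n of them); anything pushed out of the
# buffer can no longer be "the line above" a match, so it is kept. A
# match empties the buffer, discarding itself and the lines ahead of it.
sub drop_with_preceding {
    my ($re, $n, @lines) = @_;
    my (@kept, @buffer);
    for my $line (@lines) {
        push @buffer, $line;
        if ($line =~ $re) {
            @buffer = ();
            next;
        }
        push @kept, shift @buffer while @buffer > $n;
    }
    push @kept, @buffer;    # flush whatever is still waiting at the end
    return @kept;
}

# Example: one record from the question, dropping the ID and the name above it.
my @record = (
    "Doe, John\n",
    "<id:>123456789\n",
    "<street:>1234 Broadway Circ.\n",
    "<city:> Metropolis\n",
);
print drop_with_preceding(qr/^<id:>/, 1, @record);
```

    With $n set to 0 it removes only the matching lines; with larger values it reaches further back, which is the case the buffer was designed for.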
Re: Remove line above matching criteria
by swampyankee (Parson) on Nov 20, 2006 at 19:17 UTC

    I can, off the top of my head, think of at least three ways, which I view as distinct:

    • Use Tie::File, which lets you treat a file (more or less) as an array.
    • Use File::ReadBackwards, which (duh!) reads a file backwards.
    • Read the file a record at a time, but keep track of the contents of the current and previous record, and print them as needed.
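    As a rough sketch of the first option, assuming the core Tie::File module and that every ID line starts with `<id:>`: the sub below removes each ID line and the line above it by editing the file in place, so try it on a copy first. `remove_ids_in_place` is a made-up name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Tie::File;

# Remove every line starting with <id:> plus the line directly above it,
# editing $file in place through a tied array (one element per line).
sub remove_ids_in_place {
    my ($file) = @_;
    tie my @lines, 'Tie::File', $file or die "Cannot tie $file: $!";
    # Walk from the end so that splicing lines out never shifts the
    # indices of lines we have not examined yet.
    for (my $i = $#lines; $i >= 0; $i--) {
        next unless $lines[$i] =~ /^<id:>/;
        if ($i > 0) {
            splice @lines, $i - 1, 2;    # the name line and the ID line
            $i--;                        # the name line is gone; step past it
        }
        else {
            splice @lines, 0, 1;         # ID on the very first line
        }
    }
    untie @lines;                        # flush changes back to the file
}
```

    Usage would be something like `remove_ids_in_place('addresses.txt');` (file name hypothetical). Tie::File avoids reading the whole file into memory, but splicing lines out of the middle can mean rewriting the rest of the file, so a single streaming pass may well be faster on large inputs.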

    If your data are as shown (a name, followed by lines starting with labels, such as <id:>, <city:>, etc), something like this may work:

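    #!perl
    use strict;
    use warnings;

    my $infile = shift @ARGV or die "Usage: $0 infile\n";
    open(my $fh, "<", $infile) or die "Could not open $infile because $!\n";
    while (<$fh>) {
        # skip the name line (starts with a letter) and the <id:> line
        next if /^<id:>|^[A-Za-z]/;
        print;
    }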

    Now, if my regex brain is turned on, this regex should skip lines that start with <id:> or start with a letter. Incidentally, how is this anonymizing data if you're leaving addresses and phone numbers?


    Update

    Having noticed ww's comment in a message, I may have misread or confused the title ("Remove line above matching criteria") and "extract name and id". The regex I put in the sample above (unless I screwed it up) should skip the name and id; to skip everything else one could change the "if" to "unless".
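    To make the `if`/`unless` point concrete, here is a small sketch that applies the same regex both ways to a list of lines. The sub name `split_name_and_id` is hypothetical, and it assumes, as the sample data suggests, that names start with a letter while every other line starts with `<`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split lines into (name and ID lines, everything else) using the regex
# from the reply: a line counts as "name or ID" if it starts with <id:>
# or with a letter. The first list is what "next unless" would print
# (extract); the second is what "next if" would print (remove).
sub split_name_and_id {
    my @lines = @_;
    my $is_name_or_id = qr/^<id:>|^[A-Za-z]/;
    my @name_and_id = grep { $_ =~ $is_name_or_id } @lines;
    my @rest        = grep { $_ !~ $is_name_or_id } @lines;
    return (\@name_and_id, \@rest);
}
```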

    emc

    If it's not foggy out, I need new glasses.

Re: Remove line above matching criteria
by madbombX (Hermit) on Nov 20, 2006 at 17:56 UTC
    To reiterate what idsfa said, we need to know what you have tried, what the record separators are, etc, etc. Just generally more information.

    That being said, you could always load each record into a variable (a hash or an array), then pass that variable to a function that checks the data for inconsistencies, or whatever you are looking for, removes whatever needs to be removed (manipulating a hash or array is simple if the structure doesn't change), and returns the new variable to the main program. Obviously this can also be done with an OO approach (likely preferable, since this sounds like repetitive data).
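    A minimal sketch of the record-in-a-hash idea, assuming every record begins with a name line and each remaining line looks like `<label:>value`, as in the sample. Both sub names are hypothetical. Because only the labels actually present get stored, a record with no phone number or street simply lacks that key, which sidesteps the inconsistent-data problem.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Group lines into one hash per person. A line that does not look like
# "<label:>value" is taken to start a new record (the name line); this
# assumes every record starts with a name, as in the sample data.
sub parse_records {
    my @records;
    for my $line (@_) {
        chomp(my $l = $line);
        next if $l =~ /^\s*$/;                    # skip blank lines
        if ($l =~ /^<(\w+):>\s*(.*)$/) {
            $records[-1]{$1} = $2 if @records;    # labelled field
        }
        else {
            push @records, { name => $l };        # new record
        }
    }
    return @records;
}

# Return a copy of a record without the confidential fields; the
# original hash is left untouched.
sub without_name_and_id {
    my %copy = %{ $_[0] };
    delete @copy{qw(name id)};
    return \%copy;
}
```

    A driver might then loop over `map { without_name_and_id($_) } parse_records(<>)` and print each field with `print "<$_:>$r->{$_}\n" for sort keys %$r;`. Hash keys come back unordered, hence the `sort`; if the original field order matters, storing an array of label/value pairs instead would keep it.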

Re: Remove line above matching criteria
by swampyankee (Parson) on Nov 21, 2006 at 16:21 UTC

    Your first paragraph ("extract user names and ID's…") and your final paragraph ("remove the ID and name for each entry") are contradictory.

    "Extract" usually means "copy from a file (or database, etc.) for use elsewhere". "Remove" usually means "erase"; they are not synonyms in the English version of computerese, although they are in everyday English (having "a tooth extracted" and "a tooth removed" both leave one fewer tooth in one's mouth).

    emc

    At that time [1909] the chief engineer was almost always the chief test pilot as well. That had the fortunate result of eliminating poor engineering early in aviation.

    —Igor Sikorsky, reported in AOPA Pilot magazine February 2003.

Approved by andyford