PerlMonks  

doubt in storing a data of 2 lines in an array.

by heidi (Sexton)
on Oct 30, 2006 at 13:22 UTC ([id://581281])

heidi has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, I need a small clarification about the following program. I have a data file like this:
    ENTRY           CCHU #type complete
    TITLE           cytochrome c [validated] - human
                    Homo sapiens
    ORGANISM        #formal_name Homo sapiens #common_name man
    ACCESSIONS      A31764; A05676; I55192; A00001
    MGDVEKGKKIFIMKCSQCHTVEMGDVEKGGKHKTGPNLHGMIYARAJLFGRKTSEKGQAPGYSYTAANKNKGIIWGEDTLMEYLENPKKYIP
    ENTRY           CCCZ #type complete
    TITLE           cytochrome c - chimpanzee (tentative sequence)
    ORGANISM        #formal_name Pan troglodytes #common_name chimpanzee
    ACCESSIONS      A00002
    GDVEKGKKIFIMKCSQCHTSEKVEKGSSSKHKSSSTGPNLHGLMIYARAJFGRKTGSEKQAPGYSYTAANKNKGIIWGED
    ENTRY           CCMQR #type complete
    TITLE           cytochrome c - rhesus macaque (tentative sequence)
                    Macaca mulatta
    ORGANISM        #formal_name Macaca mulatta #common_name rhesus macaque
    ACCESSIONS      A00003
    GDVEKGKKIFIMKCSQSEKCHTVEKGGSSSSKHKTGPNLHGSSEKEMIYARAJKSEKLFGAAAAAAAARKTGQAPGYSYTAANKSSSSNKGITWGEDTLMEYLENPKKYIPGTKMIFVGIKKKEE
    ENTRY           CCMKP #type complete
    TITLE           cytochrome c - spider monkey
    ORGANISM        #formal_name Ateles sp. #common_name spider monkey
    ACCESSIONS      A00004
    GDVFKGKRIFIMKCSQCHTVESSSSKGGKHKTGPNLHGLMIYARAJSEKFGSSSSSSSSSSR
I have written a program to save each and every line in a separate array. This is the program:
    open (PIR,'/home/guest/sampir.txt');
    my @arr = ();
    while (<PIR>) {
        chomp;
        if ( /^ENTRY/ ) {
            $entry = $_;
        }
        elsif ( /^(TITLE)\s+(\S.*)/ ) {
            $title = "$1\n\t $2";
        }
        elsif ( /^(ORGANISM)\s+(\S.*)/ ) {
            $org = "$1\n\t $2";
        }
        elsif ( /^ACCESSIONS/ ) {
            $acc = $_;
        }
        else {
            push @se, $_;
        }
    }
But the TITLE field is not getting the second line of its data; it gives only the first line. For example, when I print the title of the first entry, it prints only "cytochrome c [validated] - human" and not the second line, "Homo sapiens". How do I get the second line onto the same line as the first? Please help. Thanks.

Re: doubt in storing a data of 2 lines in an array.
by davorg (Chancellor) on Oct 30, 2006 at 13:42 UTC

    You are reading the file a line at a time. So when you process the TITLE line, $_ contains only the line that begins with TITLE, and that is all that ends up in $title. The rest of the TITLE data is processed on the next iteration of the loop (and probably ends up as the first element of @se).

    There are a couple of ways to solve this:

    1. Write a more complex parser. Keep a note of the _previous_ line's tag and, if the current line starts with whitespace, append its data to the end of the previous tag's value.
    2. Read the whole file into memory and parse it using more complex regular expressions.
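    A minimal sketch of approach 1 (not davorg's code). It assumes the only tags are ENTRY, TITLE, ORGANISM and ACCESSIONS and that continuation lines begin with whitespace; each ENTRY starts a new hash, and an indented line is appended to whichever field came last. The sub name parse_records and the SEQUENCE key for the untagged sequence line are hypothetical.

```perl
use strict;
use warnings;

# Sketch of approach 1: remember the tag of the previous field and append
# any indented (continuation) line to it.
# Assumptions: the only tags are ENTRY, TITLE, ORGANISM and ACCESSIONS;
# continuation lines start with whitespace; the untagged sequence line goes
# under the hypothetical key SEQUENCE.
sub parse_records {
    my ($fh) = @_;
    my @records;     # one hash reference per ENTRY
    my $last_tag;    # tag of the most recent field, if any

    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /\S/;                   # skip blank lines
        push @records, {} if $line =~ /^ENTRY\s/;    # ENTRY starts a new record
        die "Line $. comes before the first ENTRY\n" unless @records;

        if ($line =~ /^(ENTRY|TITLE|ORGANISM|ACCESSIONS)\s+(.*)$/) {
            my ($tag, $value) = ($1, $2);
            $records[-1]{$tag} = $value;             # start a new field
            $last_tag = $tag;
        }
        elsif ($line =~ /^\s+(\S.*)$/ && defined $last_tag) {
            $records[-1]{$last_tag} .= " $1";        # continuation: append to previous field
        }
        else {
            $records[-1]{SEQUENCE} .= $line;         # anything else is sequence data
            $last_tag = undef;                       # the tagged fields are finished
        }
    }
    return \@records;
}

my $text = <<'END';
ENTRY           CCHU #type complete
TITLE           cytochrome c [validated] - human
                Homo sapiens
ORGANISM        #formal_name Homo sapiens #common_name man
ACCESSIONS      A31764; A05676; I55192; A00001
MGDVEKGKKIFIMKCSQCHTVEMGDVEKGGKHKTGPNLHGMIYARAJLFGRKTSEKGQAPGYSYTAANKNKGIIWGEDTLMEYLENPKKYIP
ENTRY           CCCZ #type complete
TITLE           cytochrome c - chimpanzee (tentative sequence)
ORGANISM        #formal_name Pan troglodytes #common_name chimpanzee
ACCESSIONS      A00002
GDVEKGKKIFIMKCSQCHTSEKVEKGSSSKHKSSSTGPNLHGLMIYARAJFGRKTGSEKQAPGYSYTAANKNKGIIWGED
END

# An in-memory filehandle keeps the sketch self-contained.
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my $records = parse_records($fh);
close $fh;

print "$_->{TITLE}\n" for @$records;
# cytochrome c [validated] - human Homo sapiens
# cytochrome c - chimpanzee (tentative sequence)
```

    With a real file you would pass parse_records a handle opened on /home/guest/sampir.txt instead of the in-memory one.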
    --
    <http://dave.org.uk>

    "The first rule of Perl club is you do not talk about Perl club."
    -- Chip Salzenberg

Re: doubt in storing a data of 2 lines in an array.
by johngg (Canon) on Oct 30, 2006 at 14:30 UTC
    davorg's reply suggested that one way to approach the problem would be to read the whole file into memory and then parse it with regular expressions. The following script shows one possible way of doing this in two stages: the first breaks the text into records and the second breaks each record into fields. Here it is:

        use strict;
        use warnings;

        my $rxRecord = qr{(?xs)
            (ENTRY.*?\n)
            (?=ENTRY|\z)
        };

        my $rxFieldHdrs = qr{(?:ENTRY|TITLE|ORGANISM|ACCESSIONS)};

        my $rxField = qr{(?xs)
            ($rxFieldHdrs.*?\n)
            (?=$rxFieldHdrs|\z)
        };

        my $fileText;
        {
            local $/;
            $fileText = <DATA>;
        }

        my @records = $fileText =~ m{$rxRecord}g;

        foreach my $record (@records)
        {
            print qq{$record}, q{+} x 50, qq{\n};
            my @fields = $record =~ m{$rxField}g;
            foreach my $field (@fields)
            {
                print qq{$field}, q{-} x 50, qq{\n};
            }
            print q{*} x 50, qq{\n};
        }

        __END__
        ENTRY           CCHU #type complete
        TITLE           cytochrome c [validated] - human
                        Homo sapiens
        ORGANISM        #formal_name Homo sapiens #common_name man
        ACCESSIONS      A31764; A05676; I55192; A00001
        MGDVEKGKKIFIMKCSQCHTVEMGDVEKGGKHKTGPNLHGMIYARAJLFGRKTSEKGQAPGYSYTAANKNKGIIWGEDTLMEYLENPKKYIP
        ENTRY           CCCZ #type complete
        TITLE           cytochrome c - chimpanzee (tentative sequence)
        ORGANISM        #formal_name Pan troglodytes #common_name chimpanzee
        ACCESSIONS      A00002
        GDVEKGKKIFIMKCSQCHTSEKVEKGSSSKHKSSSTGPNLHGLMIYARAJFGRKTGSEKQAPGYSYTAANKNKGIIWGED
        ENTRY           CCMQR #type complete
        TITLE           cytochrome c - rhesus macaque (tentative sequence)
                        Macaca mulatta
        ORGANISM        #formal_name Macaca mulatta #common_name rhesus macaque
        ACCESSIONS      A00003
        GDVEKGKKIFIMKCSQSEKCHTVEKGGSSSSKHKTGPNLHGSSEKEMIYARAJKSEKLFGAAAAAAAARKTGQAPGYSYTAANKSSSSNKGITWGEDTLMEYLENPKKYIPGTKMIFVGIKKKEE
        ENTRY           CCMKP #type complete
        TITLE           cytochrome c - spider monkey
        ORGANISM        #formal_name Ateles sp. #common_name spider monkey
        ACCESSIONS      A00004
        GDVFKGKRIFIMKCSQCHTVESSSSKGGKHKTGPNLHGLMIYARAJSEKFGSSSSSSSSSSR

    and here is the output, showing for each record the whole record and then each individual field. As you can see, your two-line title is preserved.

    I hope this is of use.

    Cheers,

    JohnGG

      Hi John, thank you for the reply. The program works very well, and the code is really smart. I just need to clarify one last doubt: I am not able to print the TITLE's content on a single line; it prints on two lines whatever I do. Please reply. Thank you once again.
        You could either do a global substitution, something like $field =~ s{\n}{ }g;, to replace every newline with a space, or you could achieve the same thing with split and join, something like $field = join q{ }, split m{\n}, $field;. In either case you will have to deal with a big gap in the line caused by the indentation of the second line of the title. However, this post should give you enough clues about s{this}{the other} to solve that yourself. Big hint: \s+ means one or more whitespace characters.

        Best of luck,

        JohnGG
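    Following johngg's hint, one possible sketch trims the ends of a field and then replaces every run of whitespace, newline and indentation alike, with a single space. The helper name one_line is hypothetical, and the sample field assumes the TITLE layout shown in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: put a multi-line field on a single line.
sub one_line {
    my ($field) = @_;
    $field =~ s{\A\s+|\s+\z}{}g;   # trim leading and trailing whitespace (incl. the final newline)
    $field =~ s{\s+}{ }g;          # each remaining run of whitespace -> one space
    return $field;
}

# A TITLE field as captured by $rxField: two lines, the second one indented.
my $field = "TITLE           cytochrome c [validated] - human\n"
          . "                Homo sapiens\n";

print one_line($field), "\n";
# TITLE cytochrome c [validated] - human Homo sapiens
```

    This also collapses the padding after the tag name. To keep that padding, trim first and then substitute only the whitespace runs that contain a newline, e.g. s{\s*\n\s*}{ }g.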

Re: doubt in storing a data of 2 lines in an array.
by Hofmator (Curate) on Oct 30, 2006 at 13:57 UTC
    heidi, as in your previous posts, your program is a bit confusing; your description and your program do not really fit together ...

    A couple of pointers; maybe then you can clarify what you want to achieve:

    • Your program declares a @arr variable but never uses it. Please try to post the smallest code possible that exhibits your problem.
    • You are saying that you are saving each line of the file in a separate array. That is not the case. You are overwriting scalar variables (like $entry or $title) each time you encounter a matching line. So e.g. at the end of your loop, you have the last ORGANISM line in the scalar variable $org.
    • You are reading through your file line by line. How do you expect the 2nd line of a title to end up in the $title variable? None of your regular expressions match this 2nd line, so the else branch is executed and the line is pushed onto the array @se.
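    To see the last two points in action, here is a cut-down copy of the original loop run over a small in-memory sample. Only the ENTRY, TITLE and else branches are kept, and the variables are declared so that it runs under strict; it is an illustration, not a fix.

```perl
use strict;
use warnings;

# Cut-down copy of the original loop; the sample text is inlined so the
# sketch is self-contained.
my $text = <<'END';
ENTRY           CCHU #type complete
TITLE           cytochrome c [validated] - human
                Homo sapiens
ENTRY           CCCZ #type complete
TITLE           cytochrome c - chimpanzee (tentative sequence)
END

my ($entry, $title, @se);
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
while (<$fh>) {
    chomp;
    if    (/^ENTRY/)            { $entry = $_ }            # overwritten by each ENTRY
    elsif (/^(TITLE)\s+(\S.*)/) { $title = "$1\n\t $2" }   # overwritten by each TITLE
    else                        { push @se, $_ }           # the 2nd title line lands here
}
close $fh;

print "entry: $entry\n";        # only the last ENTRY line is left
print "title: $title\n";        # only the last TITLE is left
print "se:    [$_]\n" for @se;  # the indented "Homo sapiens" line
```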

    -- Hofmator

    Code written by Hofmator and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.

Re: doubt in storing a data of 2 lines in an array.
by Fletch (Bishop) on Oct 30, 2006 at 14:08 UTC

    Not to mention that, if this is a common format, BioPerl may already have an off-the-shelf interface for reading it.

      Good point! This page contains the available formats ...

      -- Hofmator

Re: doubt in storing a data of 2 lines in an array.
by shmem (Chancellor) on Oct 30, 2006 at 15:10 UTC
    I have written a program to save each and every line in a separate array. This is the program

    which does not meet its purpose, since you are saving every line of your data file that doesn't begin with ENTRY, TITLE, ORGANISM or ACCESSIONS into a single array, @se.

    Let's look at your data file. It seems to be composed of multi-line records in which each field begins on a separate line. Each field has an identifier up front (except the last field of each record, which is just a sequence of characters with no blanks in it), and some fields appear to span multiple lines as well.

    Since there is no record separator, you can only tell that a record is complete once its last field has been read. Since your records appear to have a fixed field order, I assume that happens when the single-word sequence line appears. All fields are stored in an anonymous array, which is pushed onto @arr once the record is complete. After each record is stored, a new anonymous array is initialized for the next one:

        my $file = '/home/guest/sampir.txt';
        open (PIR, '<', $file) or die "Can't read '$file': $!\n";
        my @arr = ();
        my $se = [];                 # anonymous record array
        while (<PIR>) {
            chomp;
            if (/^(\w+)\s+/) {       # new field identifier, followed by blanks
                push @$se, $_;
            }
            elsif (s/^\s+/ /) {      # if we can strip leading blanks,
                                     # it's a continuation line
                $se->[-1] .= $_;     # append to last field of this record
            }
            elsif (/^\w+$/) {        # must be the last field of the record
                push @$se, $_;       # save the last field
                push @arr, $se;      # save the record array reference
                $se = [];            # and make a new array reference for the next record
            }
            else {
                die "Unknown line type at line $. of '$file'\n";
            }
        }

    Now you have all records in an array of arrays. See perldsc.
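    As a sketch of working with the resulting array of arrays, the two records below are typed in by hand in the shape the loop above produces (TITLE continuation already appended, sequence as the last element). The helper record_to_hash and its SEQUENCE key are hypothetical, not part of shmem's code.

```perl
use strict;
use warnings;

# Two records typed in by hand, in the shape the loop above builds.
my @arr = (
    [ 'ENTRY           CCHU #type complete',
      'TITLE           cytochrome c [validated] - human Homo sapiens',
      'ORGANISM        #formal_name Homo sapiens #common_name man',
      'ACCESSIONS      A31764; A05676; I55192; A00001',
      'MGDVEKGKKIFIMKCSQCHTVEMGDVEKGGKHKTGPNLHGMIYARAJLFGRKTSEKGQAPGYSYTAANKNKGIIWGEDTLMEYLENPKKYIP',
    ],
    [ 'ENTRY           CCCZ #type complete',
      'TITLE           cytochrome c - chimpanzee (tentative sequence)',
      'ORGANISM        #formal_name Pan troglodytes #common_name chimpanzee',
      'ACCESSIONS      A00002',
      'GDVEKGKKIFIMKCSQCHTSEKVEKGSSSKHKSSSTGPNLHGLMIYARAJFGRKTGSEKQAPGYSYTAANKNKGIIWGED',
    ],
);

# Hypothetical helper: turn one record (an array reference) into a hash keyed
# by field name; the untagged last field is stored under SEQUENCE.
sub record_to_hash {
    my ($record) = @_;
    my %field;
    for my $f (@$record) {
        if ($f =~ /^(\w+)\s+(.*)$/) { $field{$1} = $2 }
        else                        { $field{SEQUENCE} = $f }
    }
    return \%field;
}

for my $record (@arr) {              # each $record is an array reference
    my $h = record_to_hash($record);
    print "$h->{TITLE}\n";
}
# cytochrome c [validated] - human Homo sapiens
# cytochrome c - chimpanzee (tentative sequence)
```

    Individual fields can also be reached directly; for example, $arr[0][1] is the first record's whole TITLE string.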

    --shmem

    _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                  /\_¯/(q    /
    ----------------------------  \__(m.====·.(_("always off the crowd"))."·
    ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
      Thank you very much... I learnt how to do it; I can manage such problems myself from now on. Thanks again.

Node Type: perlquestion [id://581281]
Approved by Corion