PerlMonks  

newbie need help with a simple code

by perlmonk007 (Novice)
on Mar 05, 2013 at 01:16 UTC (#1021730)
perlmonk007 has asked for the wisdom of the Perl Monks concerning the following question:

I have a subroutine that outputs an array like this:

ATOM 1276 CB ASP B 63 52.957 58.788 67.683 1.00 44.59 C

ATOM 1280 N GLY B 64 52.075 55.496 66.522 1.00 35.29 N

ATOM 1281 CA GLY B 64 51.005 54.534 66.322 1.00 32.64 C

ATOM 1282 C GLY B 64 49.693 55.103 65.854 1.00 30.87 C

ATOM 1283 O GLY B 64 48.630 54.545 66.092 1.00 27.93 O

ATOM 1284 N LYS B 65 49.739 56.255 65.189 1.00 31.10 N

ATOM 1285 CA LYS B 65 48.527 56.912 64.720 1.00 33.91 C

ATOM 1286 C LYS B 65 48.629 57.147 63.216 1.00 32.04 C

ATOM 1287 O LYS B 65 49.675 57.571 62.721 1.00 31.87 O

Here each line is an element of an array, but I need only certain columns of this output. For example, the first row is all ATOM, but I do not need any of that. Could anyone please tell me how to extract the required rows? Simple pseudocode would help me a lot.

UPDATE #1

OK, I used the regular expression code that was suggested, but I am facing a problem with the output: the array in the output is empty. Could someone tell me where I am going wrong?

The code below runs after I get the array from the subroutine. The array holds output in the format shown above, and every line of the output is a single element of the array. If I type $line1 I get the first line.

my @opin  = &attributes();
my @lines = @opin;
my ($selected, @selections);
for my $lines (@lines) {
    if ( $lines =~ /ATOM\s+
                    \d+\s+
                    ([A-Z]{1,2})\s+
                    ([A-Z]{3})\s+
                    [AB]\s+
                    \d+\s+
                    \d+\.\d+\s+
                    \d+\.\d+\s+
                    \d+\.\d+\s+
                    \d+\.\d+\s+
                    \d+\.\d+\s+
                    [CNO]$/x )
    {
        $selected = $1 . " | " . $2;
        push @selections, $selected;
    }
}
print O "@selections";
close (O);
close (I);

Any help would be appreciated.

Re: newbie need help with a simple code
by LanX (Canon) on Mar 05, 2013 at 01:27 UTC
    unclear, do you need certain columns or certain rows?

    using split should help in both cases

    DB<108> $line='ATOM 1287 O LYS B 65 49.675 57.571 62.721 1.00 31.87 O'
     => "ATOM 1287 O LYS B 65 49.675 57.571 62.721 1.00 31.87 O"

    DB<109> @col =split / /,$line
     => (
      "ATOM", 1287, "O", "LYS", "B", 65,
      "49.675", "57.571", "62.721", "1.00", "31.87", "O",
    )

    DB<110> @col[1..3,5]
     => (1287, "O", "LYS", 65)

    Cheers Rolf
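
The debugger session above can also be written as a small standalone script. This is only a sketch: the variable names are illustrative, and it assumes the fields are separated by whitespace. It uses `split ' '` (a string holding a single space) rather than `/ /`, because that form also copes with runs of spaces and leading whitespace.

```perl
use strict;
use warnings;

my $line = 'ATOM 1287 O LYS B 65 49.675 57.571 62.721 1.00 31.87 O';

# split ' ' splits on any run of whitespace and ignores leading whitespace
my @col = split ' ', $line;

# Array slice: indexes are 0-based, so 1..3 and 5 pick fields 2-4 and 6
my @wanted = @col[ 1 .. 3, 5 ];

print "@wanted\n";    # prints: 1287 O LYS 65
```

The same slice works on each element of the array when wrapped in a loop.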

Re: newbie need help with a simple code
by ww (Bishop) on Mar 05, 2013 at 03:14 UTC

    You could use a regex, but as you see, that can get clumsy (it need not be quite this clumsy; this is babytalk for clarity). Also, it assumes that each line of data you show is actually "an element of an array" as stated, and is NOT itself an array from an AoA:

    #!/usr/bin/perl
    use 5.016;
    use warnings;
    use strict;
    use Data::Dumper;

    # 1021730

    my @arr = ("ATOM 1276 CB ASP B 63 52.957 58.788 67.683 1.00 44.59 C",
               "ATOM 1280 N GLY B 64 52.075 55.496 66.522 1.00 35.29 N",
               "ATOM 1281 CA GLY B 64 51.005 54.534 66.322 1.00 32.64 C",
               "ATOM 1282 C GLY B 64 49.693 55.103 65.854 1.00 30.87 C",
               "ATOM 1283 O GLY B 64 48.630 54.545 66.092 1.00 27.93 O",
               "ATOM 1284 N LYS B 65 49.739 56.255 65.189 1.00 31.10 N",
               "ATOM 1285 CA LYS B 65 48.527 56.912 64.720 1.00 33.91 C",
               "ATOM 1286 C LYS B 65 48.629 57.147 63.216 1.00 32.04 C",
               "ATOM 1287 O LYS B 65 49.675 57.571 62.721 1.00 31.87 O"
               # Col1 2 3 4 5 6 7 8 9 10 11 12
              );

    # So, let's say columns 3, 6, 7, 11 and 12 are the ones you want to select:

    my ($selected, @selections);
    for my $line (@arr) {
        if ( $line =~ /ATOM\s           # leading "ATOM" followed by a space -- e.g. Column 1
                       \d{4}\s          # 4 decimal digits followed by a space (FBAS)
                       ([A-Z]{1,2})\s   # CAPTURE to $1: One or Two UC letters FBAS from Col 3
                       [A-Z]{3}\s       # Three UC letters
                       B\s              # "B" -- constant in example data at Col 5
                       (\d\d)\s         # CAPTURE TO $2 exactly 2 digits from Col 6
                       (\d+\.\d+)\s     # CAPTURE to $3: digit(s), decimal pt, digit(s) Col 7
                       \d+\.\d+\s       # 1 or more digits, decimal pt, digits Col 8:
                       \d+\.\d+\s       # Col 9
                       \d+\.\d+\s       # Col 10
                       (\d\d\.\d\d)\s   # CAPTURE to $4 exactly 2 digits, decimal, 2 digits Col 11
                       ([CNO]$)/x )     # CAPTURE TO $5: one UC "C", "N", or "O" adjacent to EOL
        {
            $selected = $1 . " | " . $2 . " | " . $3 . " | " . $4. " | " . $5;
            say $selected;     # for DEMO and DEBUG only
            push @selections, $selected;
        }
        # .... and do whatever sleight-of-hand you need with @selections...
    }
    say Dumper @selections;
      Thank you so much for the prompt reply. Could you please tell me what this line does? $selected = $1 . " | " . $2 . " | " . $3 . " | " . $4. "
        This uses the regex captures to build a string. See perldoc perlre.
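
To illustrate the point about captures: after a successful match, each parenthesised group is available in $1, $2, and so on, and the `.` operator concatenates them with the literal " | " separator. A small sketch with a deliberately shortened pattern (not the full regex from the post above):

```perl
use strict;
use warnings;

my $line = 'ATOM 1276 CB ASP B 63 52.957 58.788 67.683 1.00 44.59 C';
my $selected;

# Two capture groups: the atom name goes to $1, the residue name to $2
if ( $line =~ /^ATOM\s+\d+\s+([A-Z]{1,2})\s+([A-Z]{3})\s/ ) {
    $selected = $1 . " | " . $2;
    print "$selected\n";    # prints: CB | ASP
}
```

The capture variables are only meaningful after a successful match, which is why they are read inside the `if` block.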

      help anyone????

Re: newbie need help with a simple code
by kielstirling (Scribe) on Mar 05, 2013 at 03:55 UTC
    I would use split.
    #!/usr/bin/perl
    use Modern::Perl;
    use Data::Dumper;

    my @lines = (
        "ATOM 1276 CB ASP B 63 52.957 58.788 67.683 1.00 44.59 C",
        "ATOM 1280 N GLY B 64 52.075 55.496 66.522 1.00 35.29 N",
        "ATOM 1281 CA GLY B 64 51.005 54.534 66.322 1.00 32.64 C",
        "ATOM 1282 C GLY B 64 49.693 55.103 65.854 1.00 30.87 C",
        "ATOM 1283 O GLY B 64 48.630 54.545 66.092 1.00 27.93 O",
        "ATOM 1284 N LYS B 65 49.739 56.255 65.189 1.00 31.10 N",
        "ATOM 1285 CA LYS B 65 48.527 56.912 64.720 1.00 33.91 C",
        "ATOM 1286 C LYS B 65 48.629 57.147 63.216 1.00 32.04 C",
        "ATOM 1287 O LYS B 65 49.675 57.571 62.721 1.00 31.87 O",
    );

    my @wants;
    for my $line (@lines) {
        push @wants, join ' | ', (split " ", $line)[3,6,7,10,11];
    }
    print Dumper(\@wants);
      I tried this, but I did not understand what the pipeline does. Also, it is not giving the desired result; I will try modifying it a little.
        It is a simple example of how you could use split to manage a line of text like the one you supplied. It will not work out of the box. You will in fact have to learn what the code does and customize it to your needs ... or even refactor it in a way that suits you better.
        ... I did not understand what the pipeline does ...

        The | (pipe) character merely serves as a visual demarcation in the code example to allow the result of the operations to be better seen. Same thing as
            $selected = $1 . " | " . $2 . " | " . $3 . " | " . $4. " | " . $5;
        here.
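
To make that concrete: the separator is an ordinary string, so swapping it changes only how the output looks, not which fields are selected. A minimal sketch, where the field choice and the tab separator are illustrative assumptions rather than anything from the posts above:

```perl
use strict;
use warnings;

my $line = 'ATOM 1285 CA LYS B 65 48.527 56.912 64.720 1.00 33.91 C';

# 0-based indexes 2, 3 and 5: atom name, residue name, residue number
my @fields = ( split ' ', $line )[ 2, 3, 5 ];

my $piped  = join ' | ', @fields;    # "CA | LYS | 65"
my $tabbed = join "\t",  @fields;    # same fields, tab-separated

print "$piped\n$tabbed\n";
```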

Node Type: perlquestion [id://1021730]
Approved by Corion