PerlMonks  

reading file

by aseee (Novice)
on Nov 29, 2012 at 10:11 UTC
aseee has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have a file in which the following pattern repeats itself hundreds of times:

    ID   AaaI
    ET   R2
    AC   RB00001;
    OS   Acetobacter aceti ss aceti
    PT   XmaIII
    RS   CGGCCG, 1;
    CR   .
    RN   [1]
    RA   Tagami H., Tayama K., Tohyama T., Fukaya M., Okumura H., Kawamura Y.,
    RA   Horinouchi S., Beppu T.;
    RL   FEMS Microbiol. Lett. 56:161-166(1988).
    //

What I want is to store AaaI in the Id column, R2 in the ET column, RB00001 in the AC column, Acetobacter aceti ss aceti in the OS column, XmaIII in the PT column and CGGCCG in the RS column of a database table. I know it can be done with regular expressions, but I am unable to grasp regular expressions. Please also give some basic and advanced links to tutorials on regular expressions.
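Since the question asks specifically about regular expressions, here is a minimal sketch of a per-line pattern. It assumes every field line starts with a two-letter code followed by whitespace, and that only the part of the value before the first comma or semicolon is wanted, which holds for the ID, ET, AC, OS, PT and RS lines in the sample above. `parse_line` is a hypothetical helper name, not part of any module.

```perl
use strict;
use warnings;

# Hypothetical helper: split one "XX   value" line into its two-letter
# code and the part of the value before the first ',' or ';'.
sub parse_line {
    my ($line) = @_;
    # ^([A-Z]{2})  two capital letters at the start of the line (the field code)
    # \s+          the whitespace separating code and value
    # ([^,;]*)     everything up to, but not including, the first ',' or ';'
    my ($code, $value) = $line =~ /^([A-Z]{2})\s+([^,;]*)/
        or return;                 # not a field line, e.g. the "//" separator
    $value =~ s/\s+$//;            # trim trailing whitespace
    return ($code, $value);
}

for my $line ('AC   RB00001;', 'OS   Acetobacter aceti ss aceti',
              'RS   CGGCCG, 1;', '//') {
    my ($code, $value) = parse_line($line);
    print defined $code ? "$code => '$value'\n" : "skipped: $line\n";
}
```

Multi-value fields such as RA would be cut at the first comma by this pattern, so they need different handling.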

Re: reading file
by Anonymous Monk on Nov 29, 2012 at 10:37 UTC

    I know it can be done with regular expressions, but I am unable to grasp regular expressions. Please also give some basic and advanced links to tutorials on regular expressions.

    Tutorials, perlintro, perlrequick

Re: reading file
by tobyink (Abbot) on Nov 29, 2012 at 10:37 UTC

    This is SwissProt format, right? A number of SwissProt tools already exist on CPAN. Have you investigated any of them? If they are not sufficient for your needs, you could peek at their source code to see how they handle parsing.

    perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
Re: reading file
by marto (Chancellor) on Nov 29, 2012 at 10:39 UTC
Re: reading file
by tobyink (Abbot) on Nov 29, 2012 at 11:15 UTC

    Here's a fun parsing example though...

    use MooX::Struct
        Record => [qw( $id $et @ac $os $pt @rs @ra )],
        Person => [qw( $surname $initials )],
    ;
    use Data::Dumper;

    my %IS_PERSON = ( ra => 1, );
    my %IS_LIST   = ( ac => 1, rs => 1, ra => 1, );

    my %record;
    my @records;
    while (<DATA>) {
        chomp;
        my ($field, $value) = /^(..)\s*(.+)$/;
        $field = lc $field;
        if ($field eq 'id' and keys %record) {
            push @records, Record->new(%record);
            %record = ();  # start new record
        }
        if ($IS_LIST{$field}) {
            push @{$record{$field}},
                map { $IS_PERSON{$field} ? Person[split] : $_ }
                split m{,\s*}, $value;
        }
        else {
            $record{$field} = $IS_PERSON{$field} ? Person[split / /, $value] : $value;
        }
    }
    # EOF, push last record
    push @records, Record->new(%record);

    print $records[1]->ra->[0]->surname;
    __DATA__
    ID AaaI
    ET R2
    AC RB00001;
    OS Acetobacter aceti ss aceti
    PT XmaIII
    RS CGGCCG, 1;
    RA Tagami H., Tayama K., Tohyama T., Fukaya M., Okumura H., Kawamura Y.,
    RA Horinouchi S., Beppu T.;
    ID AaaII
    ET R2
    AC RB00001;
    OS Acetobacter aceti ss aceti
    PT XmaIII
    RS CGGCCG, 1;
    RA Horinouchi S., Beppu T.;
    ID AaaIII
    ET R2
    AC RB00001;
    OS Acetobacter aceti ss aceti
    PT XmaIII
    RA Horinouchi S., Beppu T.;
    RS CGGCCG, 1;
Re: reading file
by bart (Canon) on Nov 29, 2012 at 12:31 UTC
    I don't see why you need a regular expression for this, unless your problem is more complex than what you described here. Here is basically what I'd do:

    my %row;
    while (<>) {
        chomp;
        my ($key, $value) = split ' ', $_, 2
            or next;
        $row{$key} = $value;
    }

    To test, store your data in a text file and use the file name as the argument for the test script.

    Now all data are in a hash. You can see what's in there:

    use Data::Dumper;
    print Dumper \%row;

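    The loop above fills a single hash, so with hundreds of records only the last one survives, and a repeated key such as RA overwrites its earlier value. One way to extend it (a sketch under assumptions, not bart's code) is to start a new hash at each `//` terminator line and collect the hashes in an array, keeping only the fields the question asks for and trimming each value at the first comma or semicolon. `read_records` is a hypothetical name; the two sample records come from the question and from tobyink's test data.

```perl
use strict;
use warnings;

# Hypothetical helper: read records whose fields are terminated by a "//"
# line and return one hashref per record, keeping only the wanted codes.
sub read_records {
    my ($fh, @wanted) = @_;
    my %want = map { $_ => 1 } @wanted;
    my (@records, %row);
    while (my $line = <$fh>) {
        chomp $line;
        # Same split as above: field code, then the rest of the line.
        my ($key, $value) = split ' ', $line, 2
            or next;                          # skip blank lines
        if ($key eq '//') {                   # end of the current record
            push @records, {%row} if %row;
            %row = ();
            next;
        }
        next unless $want{$key};              # ignore CR, RN, RA, RL, ...
        $value = '' unless defined $value;
        $value =~ s/\s*[,;].*//;              # "CGGCCG, 1;" -> "CGGCCG"
        $row{$key} = $value;
    }
    push @records, {%row} if %row;            # last record if "//" is missing
    return @records;
}

my $data = <<'END';
ID   AaaI
ET   R2
AC   RB00001;
OS   Acetobacter aceti ss aceti
PT   XmaIII
RS   CGGCCG, 1;
CR   .
RN   [1]
RA   Tagami H., Tayama K., Tohyama T., Fukaya M., Okumura H., Kawamura Y.,
RA   Horinouchi S., Beppu T.;
RL   FEMS Microbiol. Lett. 56:161-166(1988).
//
ID   AaaII
ET   R2
AC   RB00001;
OS   Acetobacter aceti ss aceti
PT   XmaIII
RS   CGGCCG, 1;
RA   Horinouchi S., Beppu T.;
//
END

open my $fh, '<', \$data or die "Cannot open in-memory data: $!";
my @records = read_records($fh, qw(ID ET AC OS PT RS));
for my $r (@records) {
    print join(' | ', map { "$_=$r->{$_}" } sort keys %$r), "\n";
}
```

    Reading from an in-memory string keeps the sketch self-contained; with a real file you would pass a handle from `open my $fh, '<', $filename` instead. Each element of `@records` can then be inserted as one table row.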
    To put it in an SQL database, I prefer to use DBIx::Simple with SQL::Abstract support, for which the code could simply be:

    # $db is the DBIx::Simple database connection handle object
    $db->insert($table, \%row);
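    If DBIx::Simple and SQL::Abstract are not available, plain DBI with placeholders can do the same job, and placeholders avoid any quoting problems with values such as "Acetobacter aceti ss aceti". The sketch below only builds the SQL text and the bind values, so it runs without a database; the table name `enzymes`, the lowercase column names and the helper `insert_sql` are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical mapping from the two-letter field codes to column names;
# adjust the column names (and the table name below) to your own schema.
my @columns = (
    [ID => 'id'], [ET => 'et'], [AC => 'ac'],
    [OS => 'os'], [PT => 'pt'], [RS => 'rs'],
);

# Build an INSERT statement with one "?" placeholder per column, plus the
# bind values in the same order, from one record hashref.
# The table name is interpolated directly, so it must come from trusted code.
sub insert_sql {
    my ($table, $row) = @_;
    my @names = map { $_->[1] } @columns;
    my @bind  = map { $row->{ $_->[0] } } @columns;   # missing field -> undef -> NULL
    my $sql   = sprintf 'INSERT INTO %s (%s) VALUES (%s)',
        $table, join(', ', @names), join(', ', ('?') x @names);
    return ($sql, @bind);
}

my %row = (ID => 'AaaI', ET => 'R2', AC => 'RB00001',
           OS => 'Acetobacter aceti ss aceti', PT => 'XmaIII', RS => 'CGGCCG');
my ($sql, @bind) = insert_sql('enzymes', \%row);
print "$sql\n@bind\n";

# With a real DBI handle ($dbh from DBI->connect) you would then run:
#   $dbh->do($sql, undef, @bind);
```

    For hundreds of records it is cheaper to `prepare` the statement once and call `execute(@bind)` for each record, since the SQL text is the same every time.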

    p.s. The article that got me on my way with regular expressions is Tom Christiansen's newsgroup post "Irregular Expressions", which has been republished on the net and even on CPAN under the name "FMTEYEWTK (Far More Than Everything You Ever Wanted To Know) about regexes". You can find a copy here.

    It's ancient (duh) and contains some obsolete remarks, but it's still excellent.

Node Type: perlquestion [id://1006220]
Approved by marto