
parse problem

by Anonymous Monk
on Apr 20, 2003 at 01:35 UTC (#251753=perlquestion)
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I want to get a number from an HTML source file. I want to parse data like gi|12345678|ref|NP_001234.1| and get 12345678. I did as below:
$data = 'gi|12345678|ref|NP_001234.1|';
@data = split ('gi|',$data);
@data1 = split ('|ref',$data[1]);
$number = $data1[0];

I got e, g, ..., some weird letters. Then I changed the code to the following:
$data = 'gi|12345678|ref|NP_001234.1|';
@data = split ('gi',$data);
@data1 = split ('ref',$data[1]);
$number = $data1[0];
I got |12345678|. I tried to use a regular expression to remove the |:
$number =~ m/[0-9]*/;

I got the same thing, which still has |12345678|. What can I do? Please help, and thanks in advance!

Re: parse problem
by dpuu (Chaplain) on Apr 20, 2003 at 01:46 UTC
    Your problem may be that the first argument to split is a regular expression, and in a regular expression the vertical bar is the alternation operator. In your patterns it has an empty expression on one side, which can always match. If you only want the one number you show, then you could use:
    $data =~ /gi\|(\d+)\|ref/ and $number = $1;
    Note that the vertical bar is escaped using the backslash. --Dave
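    To make the difference concrete, here is a small sketch (the variable names are illustrative and not from the original post). It compares the unescaped and escaped split patterns, and it also shows that a bare match only tests a string, while a capture extracts the digits:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $data = 'gi|12345678|ref|NP_001234.1|';

# Unescaped: 'gi|' is the regex /gi|/, i.e. "gi" OR the empty pattern.
# The empty alternative matches between characters, so the string is
# chopped into tiny pieces instead of being split on the literal "gi|".
my @unescaped = split 'gi|', $data;

# Escaped: \| is a literal vertical bar, so the delimiter is "gi|".
my @escaped = split /gi\|/, $data;

# A bare match such as m/[0-9]*/ only tests the string; it never modifies it.
my $padded = '|12345678|';
$padded =~ m/[0-9]*/;    # $padded is still '|12345678|'

# Capturing the digits is what actually extracts the number.
my ($number) = $data =~ /gi\|(\d+)\|ref/;

print scalar(@unescaped), " fields without escaping\n";
print "after escaping: $escaped[1]\n";
print "number: $number\n";
```

    If the delimiter comes from a variable rather than a literal, wrapping it in \Q...\E (or calling quotemeta) escapes the vertical bar for you.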
Re: parse problem
by DrManhattan (Chaplain) on Apr 20, 2003 at 02:05 UTC
    The first argument to split() needs to be a regular expression matching the string that delimits the fields in your data. In your case, the fields in your line are separated by a '|', so the code could look like this:
    #!/usr/bin/perl
    use strict;
    my $data = 'gi|12345678|ref|NP_001234.1|';
    my @data = split /\|/, $data;
    my $number = $data[1];
    Or more concisely:
    #!/usr/bin/perl
    use strict;
    my $data = 'gi|12345678|ref|NP_001234.1|';
    my $number = (split(/\|/, $data))[1];


Re: parse problem
by artist (Parson) on Apr 20, 2003 at 04:50 UTC
    You have already received good solutions.
    Your algorithm should be:
    A. split the data on the delimiter pattern (the pipe symbol in your case)
    B. get the second item from the result of the above split.
    Learn more about split.
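    Steps A and B can be wrapped in a small subroutine. This is only a sketch: the name second_field_number and the digits-only check are assumptions added for illustration, not something the thread specifies.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the second
# pipe-delimited field, or undef if it is missing or not all digits.
sub second_field_number {
    my ($line) = @_;
    my @fields    = split /\|/, $line;    # step A: split on a literal pipe
    my $candidate = $fields[1];           # step B: take the second item
    return unless defined $candidate && $candidate =~ /\A\d+\z/;
    return $candidate;
}

my $number = second_field_number('gi|12345678|ref|NP_001234.1|');
print defined $number ? "$number\n" : "no number found\n";
```

    Returning undef for a line that does not fit the expected shape lets the caller skip it instead of silently using a wrong field.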

