
parse problem

by Anonymous Monk
on Apr 20, 2003 at 01:35 UTC ( #251753=perlquestion )
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I want to extract a number from an HTML source file. I want to parse data like:
gi|12345678|ref|NP_001234.1|
and get 12345678. I did it as below:
$data = 'gi|12345678|ref|NP_001234.1|';
@data = split ('gi|',$data);
@data1 = split ('|ref',$data[1]);
$number = $data1[0];

I got e, g, ..., some weird letters. Then I changed the code to the following:
$data = 'gi|12345678|ref|NP_001234.1|';
@data = split ('gi',$data);
@data1 = split ('ref',$data[1]);
$number = $data1[0];
I got |12345678|. I tried to use a regular expression to remove the |:
$number =~ m/[0-9]*/;

I got the same thing, which still has |12345678|. What can I do? Please help, and thanks in advance!

Re: parse problem
by dpuu (Chaplain) on Apr 20, 2003 at 01:46 UTC
    Your problem may be that the first argument to split is a regular expression, and the vertical bar is the alternation operator. With an empty expression on one side of it, the pattern can always match. If you only want the one number you show, then you could use:
    $data =~ /gi\|(\d+)\|ref/ and $number = $1;
    Note that the vertical bar is escaped using the backslash. --Dave
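
To make the point above concrete, here is a minimal sketch comparing the unescaped and escaped patterns on the sample string from the question. The variable names `@unescaped` and `@escaped` are illustrative only. It also shows why the bare match in the question left `$number` unchanged: a match by itself does not modify the string, so the digits have to be captured with parentheses.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $data = 'gi|12345678|ref|NP_001234.1|';

# Unescaped: /gi|/ means "gi" OR the empty string, so after the
# leading "gi" the string is broken up between every character.
my @unescaped = split /gi|/, $data;

# Escaped: /gi\|/ matches only the literal text "gi|".
my @escaped = split /gi\|/, $data;

print "unescaped gives ", scalar(@unescaped), " fields\n";
print "escaped second field: $escaped[1]\n";

# A match does not change $data; the parentheses capture the digits,
# and list assignment puts the capture into $number.
my ($number) = $data =~ /gi\|(\d+)\|ref/;
print "number: $number\n";
```

With the unescaped pattern, the second field is just the single character `|`, which is the kind of stray one-character result the question describes.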
Re: parse problem
by DrManhattan (Chaplain) on Apr 20, 2003 at 02:05 UTC
    The first argument to split() needs to be a regular expression matching the string that delimits the fields in your data. In your case, the fields in your line are separated by a '|', so the code could look like this:
    #!/usr/bin/perl
    use strict;
    my $data = 'gi|12345678|ref|NP_001234.1|';
    my @data = split /\|/, $data;
    my $number = $data[1];
    Or more concisely:
    #!/usr/bin/perl
    use strict;
    my $data = 'gi|12345678|ref|NP_001234.1|';
    my $number = (split(/\|/, $data))[1];
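
As a possible variation on the code above, the fields can be named in a single list assignment, and a delimiter held in a variable can be quoted with `\Q...\E` so that it is treated literally. This is only a sketch: the names `$tag`, `$gi_number`, `$db`, `$accession` and `$delim` are assumptions made for illustration and do not come from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $data = 'gi|12345678|ref|NP_001234.1|';

# Name each field directly (the names are illustrative assumptions).
# The trailing empty field after the last "|" is dropped by split.
my ($tag, $gi_number, $db, $accession) = split /\|/, $data;
print "$tag=$gi_number, $db=$accession\n";

# If the delimiter is stored in a variable, \Q...\E escapes any
# regex metacharacters in it, so "|" is matched literally.
my $delim  = '|';
my @fields = split /\Q$delim\E/, $data;
print "second field: $fields[1]\n";
```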


Re: parse problem
by artist (Parson) on Apr 20, 2003 at 04:50 UTC
    You have already received good solutions.
    Your algorithm should be:
    A. Split the data on the delimiter pattern (the pipe symbol in your case).
    B. Take the second item from the result of that split.
    Learn more about split.
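
Steps A and B can be wrapped in a small helper that also checks the result. This is a sketch under assumptions: the sub name `second_field_number` is hypothetical, and the digits-only check is an addition that the thread does not ask for.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the second pipe-delimited field if it
# consists only of digits, or undef otherwise.
sub second_field_number {
    my ($line) = @_;
    my $field = (split /\|/, $line)[1];    # steps A and B
    return unless defined $field && $field =~ /\A\d+\z/;
    return $field;
}

for my $line ('gi|12345678|ref|NP_001234.1|', 'no delimiters here') {
    my $n = second_field_number($line);
    print defined $n ? "$n\n" : "no number found\n";
}
```

Returning undef for lines that do not fit the expected layout lets the caller skip unrelated lines in an HTML file instead of picking up stray text.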


Node Type: perlquestion [id://251753]
Approved by vek