PerlMonks  

This regex will work as well:
/molecule_idref="([^"]+)/
This matches everything between the double quotes after molecule_idref=. If you are sure your ids contain only digits, you are indeed better off matching digits (\d+), as was already suggested.
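
To make the difference concrete, here is a minimal sketch comparing the two patterns. The sample string and its ids are made up for illustration; only the attribute name comes from the question.

```perl
use strict;
use warnings;

# Hypothetical sample input, not taken from the original poster's file.
my $text = '<ComplexComponent molecule_idref="200"/> '
         . '<ComplexComponent molecule_idref="abc_7"/>';

# Any run of non-quote characters after molecule_idref="
my @any_ids = $text =~ /molecule_idref="([^"]+)/g;

# Digits only, up to the closing quote: non-numeric ids are skipped.
my @numeric_ids = $text =~ /molecule_idref="(\d+)"/g;

print "any:     @any_ids\n";        # any:     200 abc_7
print "numeric: @numeric_ids\n";    # numeric: 200
```

In list context with /g, the match returns every captured group, so each array ends up holding all ids that pattern accepts.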

Note that when parsing XML files, there is no guarantee that whitespace, line endings and so on will be where you expect them. Reading such files line by line, or relying on "start of line" anchors, can therefore cause subtle errors. What would you have done if your tags did not start at the beginning of the line, or if a tag was broken over several lines?
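
A small sketch of how this goes wrong, using made-up input in which one tag is indented and another is split over two lines. The element name ComplexComponent is only an assumption modelled on the sample data further down.

```perl
use strict;
use warnings;

# Hypothetical input: second tag is indented, third is broken over two lines.
my $xml = <<'END_XML';
<ComplexComponent molecule_idref="1"/>
   <ComplexComponent molecule_idref="2"/>
<ComplexComponent
    molecule_idref="3"/>
END_XML

# Line by line with a start-of-line anchor: only the first tag is found.
my @by_line;
for my $line (split /\n/, $xml) {
    push @by_line, $1
        if $line =~ /^<ComplexComponent molecule_idref="([^"]+)/;
}

# Whole string, no anchor; \s+ also matches newlines between name and attribute.
my @whole = $xml =~ /<ComplexComponent\s+molecule_idref\s*=\s*"([^"]+)"/g;

print "line by line: @by_line\n";   # line by line: 1
print "whole string: @whole\n";     # whole string: 1 2 3
```

Even the whitespace-tolerant pattern is still only a regex: it knows nothing about comments, CDATA sections, single-quoted attributes or attribute order, which is why a real parser is the safer choice.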

Consider using an XML parser, such as XML::Simple, which will turn your XML into a nice Perl data structure.

For example:

use strict;
use warnings;
use XML::Simple;
use Data::Dumper;

my $xml;
{
    local $/;    # slurp mode: read all of DATA in one go
    $xml = <DATA>;
}

my $xs  = XML::Simple->new();
my $ref = $xs->XMLin($xml);
print Dumper($ref);

__DATA__
<xml><ComplexComponent1 molecule_idref="1"/>
<ComplexComponent2 molecule_idref="2"/><ComplexComponent3 molecule_idref="3"/>
<ComplexComponent4 molecule_idref="4"/><ComplexComponent5 molecule_idref="5"/>
</xml>
Will turn the mess in the __DATA__ section into:
$VAR1 = {
          'ComplexComponent3' => {'molecule_idref' => '3'},
          'ComplexComponent5' => {'molecule_idref' => '5'},
          'ComplexComponent1' => {'molecule_idref' => '1'},
          'ComplexComponent4' => {'molecule_idref' => '4'},
          'ComplexComponent2' => {'molecule_idref' => '2'}
        };
a nice hash-of-hashes which you can access in any "Perlish" way.
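
As a rough illustration of such access, the sketch below uses a literal copy of the dumped structure as a stand-in for the $ref returned by XMLin, so it runs without XML::Simple installed.

```perl
use strict;
use warnings;

# Stand-in for the structure returned by XMLin, copied from the dump above.
my $ref = {
    ComplexComponent1 => { molecule_idref => '1' },
    ComplexComponent2 => { molecule_idref => '2' },
    ComplexComponent3 => { molecule_idref => '3' },
    ComplexComponent4 => { molecule_idref => '4' },
    ComplexComponent5 => { molecule_idref => '5' },
};

# A single lookup
print $ref->{ComplexComponent3}{molecule_idref}, "\n";    # 3

# All ids, ordered by element name
my @ids = map { $ref->{$_}{molecule_idref} } sort keys %$ref;
print "@ids\n";                                           # 1 2 3 4 5
```

This shape depends on every element having a distinct name, as in the sample. When several elements share a name, XML::Simple folds them differently (typically into an array reference), so check the Dumper output for your real file before writing the access code.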

CountZero

"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James


In reply to Re: extract ids by CountZero
in thread extract ids by snape
