Using a Regex to parse into keys

by Kage (Scribe)
on Oct 07, 2002 at 00:57 UTC (#203228=perlquestion)
Kage has asked for the wisdom of the Perl Monks concerning the following question:

OK, I have strings like 4~<font color="#FF0000"><b>,</b></font> and I want to parse them so that the digit preceding the tilde is the key and the font hex value is that key's value. For example, $foo{4} would have the value #FF0000. Any ideas how to do this without a lot of split()s and foreach()s and stuff?

Replies are listed 'Best First'.
Re: Using a Regex to parse into keys
by jarich (Curate) on Oct 07, 2002 at 02:30 UTC
    Okay, putting aside the arguments that HTML is rarely well formed and that regexps aren't the answer when you can't guarantee the purity of your data (i.e. use HTML::Parser), an answer to your question is as follows:
    $_ = qq{<html><foo>4~<font color="#FF0000"><b>,</b></font><fush> 5~ <fish>#555555<>6~ <font #444444>};
    print "$_\n";
    my %colours = (m!(\d+)~\s*<\s*font\s+[^#]*(#[A-F0-9]{6})!igs);
    print $colours{4}; # will print #FF0000

    This code will handle (some cases where) '#' appears elsewhere and will ignore any case where the colour does not contain 6 hex digits. It's case insensitive, and allows spaces all over the place. I still hope that you have some idea of your expected data, as there are many cases that this can still break.
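For readers who find the one-line pattern hard to follow, here is the same regex spread out with the /x flag, as a sketch rather than a replacement. The variable name $html is my own; the sample string is the one from the reply. With this input the 5~ entry is skipped because its tag is <fish> rather than <font>, so only keys 4 and 6 end up in the hash.

```perl
use strict;
use warnings;

# Sample input copied from the reply above ($html is an illustrative name).
my $html = qq{<html><foo>4~<font color="#FF0000"><b>,</b></font><fush> 5~ <fish>#555555<>6~ <font #444444>};

# Same pattern, spread out with /x so each piece can carry a comment.
# A literal '#' is written as \# because a bare # starts a comment under /x.
my %colours = $html =~ m{
    (\d+) ~            # key: the digits just before the tilde
    \s* < \s* font     # an opening font tag, optional whitespace allowed
    \s+ [^\#]*         # skip everything up to the first hash sign
    (\#[A-F0-9]{6})    # value: hash sign plus six hex digits
}xgis;

# Print the pairs in numeric key order.
print "$_ => $colours{$_}\n" for sort { $a <=> $b } keys %colours;
```

In list context with /g, the match returns every captured pair in order, which is why assigning it straight to a hash works.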

    Hope it helps.


    Update: mdillon pointed out a copy error. Was @colours, needed to be %colours.

Re: Using a Regex to parse into keys
by krusty (Hermit) on Oct 07, 2002 at 02:06 UTC
    The following example assumes these strings are sitting in a file somewhere,
    that every instance of this search pattern has the possibility of more than one digit preceding the tilde,
    and that there won't ever be an empty key.

    {
        local $/ = undef;
        $string = <FILE>;    # where FILE has been opened previously
        %colors = $string =~ /(\d+)~<font color="(.*?)"/ig;
    }
    If your strings are sitting in an array, you could join the array: $string = join '', @array;
    If your data doesn't precisely match, it might be necessary to make the search pattern a little more general, but I hope this helps for a start.
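A minimal sketch of the array variant, assuming the strings look like the one in the question. The contents of @array are made up for illustration. It joins with a newline instead of the empty string, which is my own change: it keeps digits at the end of one element from running into the key at the start of the next.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the real array.
my @array = (
    '4~<font color="#FF0000"><b>,</b></font>',
    '12~<font color="#0000FF"><b>.</b></font>',
);

# Join with a newline (an assumption, not the original '') so that
# adjacent elements cannot merge into one longer run of digits.
my $string = join "\n", @array;

# In list context, /g returns all (key, value) captures at once.
my %colors = $string =~ /(\d+)~<font color="(.*?)"/ig;

print "$_ => $colors{$_}\n" for sort { $a <=> $b } keys %colors;
```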


    Update: I didn't refresh to catch your replies before I hit submit, but ironically, this addresses your other concern, so it all works out in the end. This method is more efficient than a foreach loop.
Re: Using a Regex to parse into keys
by Felonious (Chaplain) on Oct 07, 2002 at 02:16 UTC
    To do exactly what you asked for, in a non-flexible way...
    #!/usr/bin/perl
    use Data::Dumper;

    my $text = join('', <DATA>);
    my %hash = ($text =~ /(\d+)\~\<font color="(#[0-9A-Fa-f]{6})"\>/g);
    print Dumper(\%hash);

    __DATA__
    4~<font color="#FF0000"><b>,</b></font>
    Some junk
    5~<font color="#FF0200"><b>,</b></font>
    more junk
    6~<font color="#FF0300"><b>,</b></font>sd
    7~<font color="#FF0400"><b>,</b></font>
    last junk 99
    8~<font color="#FF0500"><b>,</b></font>

    [ shh]$ su real
Re: Using a Regex to parse into keys
by kelan (Deacon) on Oct 07, 2002 at 01:46 UTC
    Well, if the font tag always immediately follows the tilde, you could do this:
    $foo{$1} = $2 if (/(\d+)~<font\s+color="(#[[:xdigit:]]+)">/);



      The only problem with that is how do I get it to do that for each occurrence of the string? Each string is held in one element of @eachcolor, so what are some possible methods? I would suppose using a foreach?
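One possible shape for that loop, as a sketch only. It reuses the pattern from the reply above; the contents of @eachcolor here are invented sample data, and elements that do not match are simply skipped.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the real @eachcolor.
my @eachcolor = (
    '4~<font color="#FF0000"><b>,</b></font>',
    '5~<font color="#00FF00"><b>;</b></font>',
    'no colour here',
);

my %foo;
for my $entry (@eachcolor) {
    # Store the pair only when this element matches the pattern.
    if ($entry =~ /(\d+)~<font\s+color="(#[[:xdigit:]]+)">/) {
        $foo{$1} = $2;
    }
}

print "$_ => $foo{$_}\n" for sort { $a <=> $b } keys %foo;
```

Testing the match inside the if matters: $1 and $2 keep their old values after a failed match, so an unconditional assignment could store stale data.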
