PerlMonks  

Using a Regex to parse into keys

by Kage (Scribe)
on Oct 07, 2002 at 00:57 UTC (#203228, perlquestion)
Kage has asked for the wisdom of the Perl Monks concerning the following question:

OK, I have strings like 4~<font color="#FF0000"><b>,</b></font> and I want to parse them so that the digit preceding the tilde is the key and the font's hex value is that key's value. For example, $foo{4} would hold #FF0000. Any ideas how to do this without a lot of split()s and foreach()s and stuff?
Buzza

Re: Using a Regex to parse into keys
by kelan (Deacon) on Oct 07, 2002 at 01:46 UTC
    Well, if the font tag always immediately follows the tilde, you could do this:
    $foo{$1} = $2 if (/(\d+)~<font\s+color="(#[[:xdigit:]]+)">/);

    kelan



      The only problem with that is how to get it to do that for each occurrence of the string. Each string is held in one element of @eachcolor, so what are some possible methods? I suppose a foreach?
      Whoo
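For the follow-up about @eachcolor, a minimal sketch of the foreach approach might look like the following. It reuses the pattern from kelan's reply unchanged; the sample strings and the loop variable $entry are made up for illustration, since the thread never shows the real contents of @eachcolor.

```perl
use strict;
use warnings;

# Hypothetical sample data; the thread only says each string lives in @eachcolor.
my @eachcolor = (
    '4~<font color="#FF0000"><b>,</b></font>',
    '5~<font color="#00FF00"><b>,</b></font>',
    'no key or colour here',
);

my %foo;
foreach my $entry (@eachcolor) {
    # Same pattern as in kelan's reply, applied to one element at a time.
    # Elements that do not match are silently skipped.
    if ( $entry =~ /(\d+)~<font\s+color="(#[[:xdigit:]]+)">/ ) {
        $foo{$1} = $2;
    }
}

# Print the collected pairs in numeric key order.
print "$_ => $foo{$_}\n" for sort { $a <=> $b } keys %foo;
```

Testing the match inside an if keeps stale $1 and $2 values from a previous element out of the hash when an element fails to match.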
Re: Using a Regex to parse into keys
by krusty (Hermit) on Oct 07, 2002 at 02:06 UTC
    The following example assumes that these strings are sitting in a file somewhere,
    that the key preceding the tilde may have more than one digit,
    and that there won't ever be an empty key.

    {
        local $/ = undef;
        $string = <FILE>;    # where FILE has been opened previously
        %colors = $string =~ /(\d+)~<font color="(.*?)"/ig;
    }
    If your strings are sitting in an array, you could join the array first: $string = join '', @array;
    If your data doesn't precisely match, it might be necessary to make the search pattern a little more general, but I hope this helps for a start.
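As a rough sketch of the array variant, assuming made-up sample data in @array: the strings are joined into one scalar, and a /g match in list context then returns every captured key and value as a flat list that can be assigned straight to a hash. One deliberate difference from the reply: this sketch joins with "\n" rather than '', so that an element ending in a digit cannot run into the key of the next element.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the poster's array.
my @array = (
    '4~<font color="#FF0000"><b>,</b></font>',
    '12~<font color="#0000FF"><b>;</b></font>',
);

# Join with a newline (an assumption of this sketch, not the reply's '')
# so that adjacent elements stay separated.
my $string = join "\n", @array;

# In list context, a /g match returns all captures from all matches:
# (key, value, key, value, ...), which fills the hash directly.
my %colors = $string =~ /(\d+)~<font color="(.*?)"/ig;

print "$_ => $colors{$_}\n" for sort { $a <=> $b } keys %colors;
```

If the same key appears more than once, the later value overwrites the earlier one, as with any list assigned to a hash.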

    Cheers,
    Kristina

    Update: I didn't refresh to catch your replies before I hit submit, but ironically, this addresses your other concern, so it all works out in the end. This method is more efficient than a foreach loop.
Re: Using a Regex to parse into keys
by Felonious (Chaplain) on Oct 07, 2002 at 02:16 UTC
    To do exactly what you asked for, in a non-flexible way...
    #!/usr/bin/perl
    use Data::Dumper;

    my $text = join('', <DATA>);
    my %hash = ($text =~ /(\d+)\~\<font color="(#[0-9A-Fa-f]{6})"\>/g);
    print Dumper(\%hash);

    __DATA__
    4~<font color="#FF0000"><b>,</b></font>
    Some junk
    5~<font color="#FF0200"><b>,</b></font> more junk
    6~<font color="#FF0300"><b>,</b></font>sd
    7~<font color="#FF0400"><b>,</b></font>
    last junk 99
    8~<font color="#FF0500"><b>,</b></font>

    [TINPC@perlcabal.com shh]$ su real
Re: Using a Regex to parse into keys
by jarich (Curate) on Oct 07, 2002 at 02:30 UTC
    Okay, putting aside the arguments that HTML is rarely well formed and that regexps aren't the answer when you can't guarantee the purity of your data (i.e. use HTML::Parser instead), an answer to your question is as follows:
    $_ = qq{<html><foo>4~<font color="#FF0000"><b>,</b></font><fush> 5~ <fish>#555555<>6~ <font #444444>};
    print "$_\n";
    my %colours = (m!(\d+)~\s*<\s*font\s+[^#]*(#[A-F0-9]{6})!igs);
    print $colours{4};    # will print #FF0000

    This code will handle (some cases where) '#' appears elsewhere and will ignore any case where the colour does not contain 6 hex digits. It's case insensitive, and allows spaces all over the place. I still hope that you have some idea of your expected data, as there are many cases that this can still break.
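To make it easier to see which part of the pattern tolerates what, here is a sketch of the same regex spread out with the /x modifier and commented piece by piece. The variable name $html is an assumption of this sketch (the reply matches against $_), and the '#' characters are escaped because an unescaped '#' starts a comment under /x.

```perl
use strict;
use warnings;

# Same sample string as in the reply above.
my $html = qq{<html><foo>4~<font color="#FF0000"><b>,</b></font><fush> 5~ <fish>#555555<>6~ <font #444444>};

my %colours = $html =~ m{
    (\d+) ~ \s*          # key: one or more digits, then the tilde
    < \s* font \s+       # an opening font tag, optional space after the bracket
    [^\#]*               # anything that is not a hash sign (attribute name, quote)
    ( \# [A-F0-9]{6} )   # value: a hash sign plus exactly six hex digits
}igsx;

# The "5~" entry is followed by a fish tag, not a font tag, so it is skipped.
print "$_ => $colours{$_}\n" for sort { $a <=> $b } keys %colours;
```

On this sample, only the keys 4 and 6 end up in the hash.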

    Hope it helps.

    jarich

    Update: mdillon pointed out a copy error. Was @colours, needed to be %colours.
