
Re^7: Parsing HTML/XML with Regular Expressions (regex)

by RonW (Parson)
on Oct 20, 2017 at 21:29 UTC


in reply to Re^6: Parsing HTML/XML with Regular Expressions (regex)
in thread Parsing HTML/XML with Regular Expressions

I ran your version of my code and got the same output you did.

Since I had already discovered the embedded newlines in the elements list, I added tr/\n//d; at the top of the for loop:

for (@elements) { tr/\n//d;
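
For anyone following along, here is a minimal, self-contained sketch of what that line does. The contents of @elements below are made up for illustration; the real list comes from the parsing code earlier in the thread.

```perl
use strict;
use warnings;

# Hypothetical sample data: one element has a newline embedded in it.
my @elements = ("id=\"Six\">Satur\nday", "id=\"One\">Monday");

for (@elements) {
    # $_ is an alias for the array element, so this edits @elements in place.
    tr/\n//d;    # delete every newline character from the current element
}

print join(", ", @elements), "\n";
```

Because the loop variable aliases each element, no assignment back into the array is needed.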

After doing that, the id for Saturday was picked up correctly. Also, out of curiosity, I removed the s/\W+//g; you added. The result was:

Zero=, One=Monday, Two=Tuesday, Three=Wednesday, Four=Thursday, Five=Friday, Six=Saturday, Foo= Sundaybbbdddeeeggg

So, Saturday is cleaned up.

I know why the id for Sunday is Foo, but I'm still not sure why the "bbbdddeeeggg" is picked up. I will have to step through the code to see what's happening.

As for the &nbsp;, that's encoding dependent. I'm not sure why it would be excluded other than by explicitly filtering out non-ASCII characters.

The encoded y is the y in Sunday. It just requires entity decoding.
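
A rough sketch of that decoding step, using only core Perl. The helper name decode_basic_entities and the sample input are my own inventions for illustration, and it only handles numeric character references and &nbsp;. In real code, decode_entities from HTML::Entities on CPAN is probably the better choice, since it knows the full set of named entities.

```perl
use strict;
use warnings;

# Hypothetical helper: decodes numeric character references and &nbsp; only.
sub decode_basic_entities {
    my ($text) = @_;
    $text =~ s/&#x([0-9A-Fa-f]+);/chr(hex($1))/ge;    # hex form, e.g. &#x79;
    $text =~ s/&#(\d+);/chr($1)/ge;                   # decimal form, e.g. &#121;
    $text =~ s/&nbsp;/\x{A0}/g;                       # non-breaking space
    return $text;
}

# Made-up input resembling the Sunday entry, not the actual test data.
my $raw     = '&nbsp;Sunda&#121;';
my $decoded = decode_basic_entities($raw);

# Strip leading whitespace, naming the non-breaking space explicitly,
# because whether \s matches it depends on the string's encoding.
(my $trimmed = $decoded) =~ s/^[\s\x{A0}]+//;

print "$trimmed\n";
```

Trimming the non-breaking space explicitly, rather than filtering out everything non-ASCII, keeps any legitimate accented characters in the text.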
