
Re^7: Parsing HTML/XML with Regular Expressions (regex)

by RonW (Parson)
on Oct 20, 2017 at 21:29 UTC

in reply to Re^6: Parsing HTML/XML with Regular Expressions (regex)
in thread Parsing HTML/XML with Regular Expressions

I ran your version of my code and got the same output you did.

Since I had already discovered the embedded newlines in the @elements list, I added tr/\n//d; at the top of the for loop:

for (@elements) {
    tr/\n//d;    # strip embedded newlines from each element
    # ... rest of the loop body as before
}
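A minimal sketch of why this works, using made-up element strings rather than the actual input from this thread: inside a foreach loop, $_ is an alias for each array element, so tr/\n//d deletes the newlines from @elements itself, not from a copy.

```perl
use strict;
use warnings;

# Hypothetical sample data (not the thread's actual input): element
# strings that picked up newlines from line breaks in the source markup.
my @elements = ("Satur\nday", "\nFriday\n");

for (@elements) {
    tr/\n//d;    # $_ aliases each element, so this edits @elements in place
}

print join(", ", @elements), "\n";    # prints "Saturday, Friday"
```

If the loop iterated over a copy (for example `for my $e (map { $_ } @elements)`), the array would be left unchanged.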

After doing that, the id for Saturday was picked up correctly. Also, out of curiosity, I removed the s/\W+//g; you added. The result was:

Zero=, One=Monday, Two=Tuesday, Three=Wednesday, Four=Thursday, Five=Friday, Six=Saturday, Foo= Sundaybbbdddeeeggg

So, Saturday is cleaned up.

I know why the id for Sunday is Foo, but I'm still not sure why "bbbdddeeeggg" is picked up. I will have to step through the code to see what's happening.

As for the &nbsp; (non-breaking space), that's encoding dependent. I'm not sure why it would be excluded, other than by explicitly filtering out non-ASCII characters.
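A small illustration, with a made-up input value, of why a non-breaking space is encoding dependent. As undecoded UTF-8 it is the two bytes C2 A0; after decoding it is the single character U+00A0. Either form disappears if non-ASCII characters are explicitly deleted. The tr/\x00-\x7F//cd filter below is just one example of such filtering and is an assumption, not something taken from the code in this thread.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Illustrative value: "Sunday" preceded by a non-breaking space, as the raw
# UTF-8 bytes a parser would see if the input is never decoded.
my $raw     = "\xC2\xA0Sunday";         # 2 bytes for the NBSP + 6 letters
my $decoded = decode('UTF-8', $raw);    # 1 character (U+00A0) + 6 letters

# One way the NBSP could vanish: deleting every character outside 0x00-0x7F.
(my $raw_ascii     = $raw)     =~ tr/\x00-\x7F//cd;
(my $decoded_ascii = $decoded) =~ tr/\x00-\x7F//cd;

printf "raw:     length %d, ASCII-only '%s'\n", length $raw,     $raw_ascii;
printf "decoded: length %d, ASCII-only '%s'\n", length $decoded, $decoded_ascii;
```

Without such a filter, the space survives in one form or the other, which matches the leading space before Sunday in the output above.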

The y is the y in Sunday; it just requires entity decoding.
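A minimal sketch of that decoding step, assuming for illustration that the y appears as the numeric entity &#121; (the input string and the helper decode_basic_entities are hypothetical). Real code would normally call decode_entities() from HTML::Entities, which also covers the full set of named entities.

```perl
use strict;
use warnings;

# Hypothetical helper: decodes numeric character references plus &nbsp; only.
sub decode_basic_entities {
    my ($s) = @_;
    $s =~ s/&#x([0-9A-Fa-f]+);/chr(hex $1)/ge;    # hexadecimal, e.g. &#x79;
    $s =~ s/&#([0-9]+);/chr($1)/ge;               # decimal, e.g. &#121;
    $s =~ s/&nbsp;/\x{A0}/g;                      # the one named entity handled
    return $s;
}

my $text = decode_basic_entities('&nbsp;Sunda&#121;');    # made-up input

# Print character codes so the NBSP (160) is visible: 160.83.117.110.100.97.121
printf "%vd\n", $text;
```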
