Re^7: Parsing HTML/XML with Regular Expressions (regex)
by RonW (Parson) on Oct 20, 2017 at 21:29 UTC
I ran your version of my code and got the same output you did.
Since I already discovered the embedded newlines in the elements list, I added tr/\n//d; at the top of the for loop:
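The listing itself is not reproduced here. As a rough sketch of the idea only, assuming a hypothetical @elements list whose entries contain embedded newlines (the names and sample data are made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical sample data; the real elements list is not shown in this post.
my @elements = ("<td id=\"Satur\nday\">", "<td id=\"Sunday\">\n");

my @cleaned;
for (@elements) {
    tr/\n//d;            # delete every newline in $_ before any matching is done
    push @cleaned, $_;
}

print "$_\n" for @cleaned;
```

Because $_ is aliased to each array element inside the for loop, tr/\n//d also changes @elements in place, so any later code in the loop body sees the newline-free string.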
After doing that, the id for Saturday picked up correctly. Also, out of curiosity, I removed the s/\W+//g; you added. The result was:
So, Saturday is cleaned up.
I know why the id for Sunday is Foo, but still not sure why the "bbbdddeeeggg" is picked up. I will have to step through the code to see what's happening.
As for the &nbsp;, that's encoding dependent. Not sure why it would get excluded other than by explicitly filtering out non-ASCII characters.
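To illustrate what such explicit filtering might look like, here is a small sketch. It assumes the character in question is a non-breaking space (U+00A0) that has already been decoded into the string; strip_non_ascii is a hypothetical helper, not something from the code under discussion:

```perl
use strict;
use warnings;

# Hypothetical helper: remove every character outside the 7-bit ASCII range.
sub strip_non_ascii {
    my ($text) = @_;
    $text =~ s/[^\x00-\x7F]//g;
    return $text;
}

# "\x{A0}" is a non-breaking space; an ordinary space (0x20) would survive.
my $with_nbsp = "Sunday\x{A0}Foo";
print strip_non_ascii($with_nbsp), "\n";
```

A filter like this drops the non-breaking space silently, which is one way it could vanish from the output without any entity handling being involved.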
The y is the y in Sunday. Just requires entity decoding.
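As a minimal sketch of that decoding step, assuming the entity is a numeric character reference (for example &#121; or &#x79;, both of which are the letter y), something like the following would do. decode_numeric_entities is a hypothetical helper name; for real work, decode_entities from the CPAN module HTML::Entities handles named entities as well:

```perl
use strict;
use warnings;

# Decode decimal (&#121;) and hexadecimal (&#x79;) character references only.
# Named entities such as &nbsp; are deliberately left untouched in this sketch.
sub decode_numeric_entities {
    my ($text) = @_;
    $text =~ s/&#x([0-9a-fA-F]+);/chr(hex($1))/ge;   # hexadecimal form
    $text =~ s/&#(\d+);/chr($1)/ge;                  # decimal form
    return $text;
}

print decode_numeric_entities('Sunda&#121;'), "\n";
print decode_numeric_entities('Sunda&#x79;'), "\n";
```

Both lines print Sunday, since code point 121 (hex 79) is the letter y.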