I am looking to extract URL patterns from a given site.
is a valid question-answer node.
Whereas http://www.perlmonks.org/index.pl?node=Recently%20Active%20Threads
is not. There is a pattern here: a URL containing node_id=\d+ is a valid question-answer node. Extracting this kind of pattern from a given site would help me determine the nature of each link. I would like to do this site-wide, automatically.
Hopefully, I am making sense here.
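To make the idea concrete, here is a rough Python sketch of one possible approach, not a finished solution. It reduces each URL to a coarse pattern by replacing all-digit query values with \d+, then counts how often each pattern occurs so that frequent ones such as node_id=\d+ stand out. The function names (url_pattern, pattern_counts, is_qa_node) and the example.org URLs are made up for illustration, and the crawling step that collects the links is assumed to happen elsewhere.

```python
import re
from collections import Counter
from urllib.parse import urlsplit, parse_qsl


def url_pattern(url):
    """Reduce a URL to its path plus query keys, generalizing numeric values."""
    parts = urlsplit(url)
    pieces = []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if re.fullmatch(r"\d+", value):
            # All-digit values are generalized to the regex \d+.
            pieces.append(key + r"=\d+")
        else:
            # Other values are kept literally (already percent-decoded).
            pieces.append(key + "=" + value)
    if not pieces:
        return parts.path
    return parts.path + "?" + "&".join(pieces)


def pattern_counts(urls):
    """Count how many URLs share each generalized pattern."""
    return Counter(url_pattern(u) for u in urls)


def is_qa_node(url):
    """True if the query string has a node_id parameter made only of digits."""
    query = parse_qsl(urlsplit(url).query)
    return any(k == "node_id" and re.fullmatch(r"\d+", v) for k, v in query)


if __name__ == "__main__":
    # Hypothetical links; a real run would use links gathered by a crawler.
    links = [
        "http://example.org/?node_id=101",
        "http://example.org/?node_id=202",
        "http://www.perlmonks.org/index.pl?node=Recently%20Active%20Threads",
    ]
    for pattern, count in pattern_counts(links).most_common():
        print(count, pattern)
```

Patterns that cover many URLs are candidates for a "type" of page. Deciding which type a pattern corresponds to (question-answer node or something else) would still need a manual check or some further heuristic.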