PerlMonks  

Re^5: Reg Exp to handle variations in the matched pattern

by bitingduck (Chaplain)
on Feb 23, 2012 at 06:33 UTC ( [id://955662] )


in reply to Re^4: Reg Exp to handle variations in the matched pattern
in thread Reg Exp to handle variations in the matched pattern

I think this does what you want, except that I've put in "<stuff>" where you had "$$". When I use Perl to tag text I tend to put in HTML- or XML-like tags and then use an XML or HTML parser to extract a data structure to stick into a database or whatever.

    open(my $in, '<', 'sdnew02.txt') or die "Can't open sdnew02.txt: $!";
    while (<$in>) {
        s/(\s-)$/$1<stuff>/;   # line ends in " -"
        s/(:)$/$1<stuff>/;     # line ends in ":"
        print;
    }

You seem to have gotten hung up on worrying about the returns or newlines, when what you needed was the end-of-line anchor ($), which matches just before the newline at the end of each line. If you want to make the replacement more robust you could add matches for arbitrary amounts of whitespace before and after the "-" or ":", but before the $ anchor.
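One way the whitespace-tolerant version might look, as a rough sketch rather than anything tested against your file. The sub name tag_line is made up for illustration, and I'm assuming the stray whitespace is only spaces or tabs. Note the character class [ \t]* instead of \s*: \s would also swallow the newline, and the replacement would then delete it.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): append <stuff> to a
# line ending in " -" or ":", tolerating trailing spaces or tabs.
sub tag_line {
    my ($line) = @_;
    # [ \t]* absorbs blanks after the marker; $ still matches before "\n".
    $line =~ s/(\s-)[ \t]*$/$1<stuff>/;
    $line =~ s/(:)[ \t]*$/$1<stuff>/;
    return $line;
}

print tag_line("Title -  \n");     # prints "Title -<stuff>"
print tag_line("Section:\t\n");    # prints "Section:<stuff>"
```

Any amount of whitespace before the "-" already matches, because (\s-) only needs the last whitespace character in front of it. The trailing blanks are dropped from the output, which may or may not be what you want.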

From what you describe, Perl would probably do all the text munging you need. Databases are great for randomly accessing data based on whatever relationships you want to select on, but Perl is hard to beat for dismantling text. Most of what I use Perl for is taking apart text and sticking it into databases for other purposes. Friedl's book "Mastering Regular Expressions" is still a great place to start. There are probably free tutorials floating around the web, but MRE gives clear explanations and gets you up to speed fast.

Re^6: Reg Exp to handle variations in the matched pattern
by markjrouse (Initiate) on Feb 23, 2012 at 10:48 UTC
    Thanks for this. This is a great help. Do you happen to have an example of code that you would use to tag a text file? I like the idea of tagging with HTML/XML-style tags, but I don't have time to build something, so maybe I'll use Perl to convert this text file to a delimited file and use a db to extract text.
      That pretty much was code to tag a text file:
      open(my $in, '<', 'sdnew02.txt') or die "Can't open sdnew02.txt: $!";
      while (<$in>) {
          s/(\s-)$/<tag>$1<\/tag>/g;   # wrap a trailing " -" in tags
          s/(:)$/<tag>$1<\/tag>/g;     # wrap a trailing ":" in tags
          print;
      }
      All I've done is wrap tags around the matched text and add the global modifier (/g). Replace the search regex and tags with whatever you want to tag. It won't quite work around your line breaks, but you can start from there.
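      If the line breaks are the sticking point, one possible starting point (a sketch only, assuming the whole text fits in memory as one string) is to run the same substitutions over the entire text with the /m modifier, so that $ matches before every newline rather than only at the end of the string. The sub name tag_text is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: tag every line-ending " -" or ":" in a
# multi-line string in one pass.
sub tag_text {
    my ($text) = @_;
    # /m makes $ match before each newline; /g repeats for every line.
    $text =~ s/(\s-)$/<tag>$1<\/tag>/mg;
    $text =~ s/(:)$/<tag>$1<\/tag>/mg;
    return $text;
}

print tag_text("Heading:\nbody text\nItem -\n");
```

      Here the /g actually earns its keep, since there can be many matches in one string. To run this over a file you would first read the whole file into a single scalar, for example with my $text = do { local $/; <$in> };.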
