Re: Reg Exp to handle variations in the matched pattern

Re^2: Reg Exp to handle variations in the matched pattern by markjrouse (Initiate) on Feb 22, 2012 at 13:23 UTC
Essentially, it should match any text where there is: a space, followed by a dash, followed by a carriage return OR a colon, followed by a carriage return BUT NOT a colon, followed by a carriage return, followed by a digit or a letter. One of the text files is actually located here: http://www.treasury.gov/resource-center/sanctions/SDN-List/Documents/sdnew02.txt I'm not interested in the text before the colon, as I want to search and replace, but I'm having problems getting the regexp just right.
Re^3: Reg Exp to handle variations in the matched pattern by moritz (Cardinal) on Feb 22, 2012 at 14:19 UTC
"a space, followed by a dash, followed by a carriage return OR a colon, followed by a carriage return"

So far that's simple: `/ -[:\r]\r/`

"BUT NOT a colon, followed by carriage return"

If you're looking for two carriage returns in a row, then you'll never find something where the first carriage return is followed by a colon (because then it's not two carriage returns in a row, d'oh), so I don't see why you emphasize it like that.

"followed by carriage return, followed by a digit, or a letter."

`\r\w`

"One of the text files is actually located here: http://www.treasury.gov/resource-center/sanctions/SDN-List/Documents/sdnew02.txt"

The pattern you describe matches nowhere in that file; in fact I can't find a single occurrence of a carriage return in that file. If you describe what information you want to extract from that file, we might be able to help you. But right now it seems that you don't have a clear mental image yourself, so it's pretty hard to help you.

Perl 6 - second systems done right
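Both points above are easy to check directly: count the carriage returns in the data, then try the pattern on it. What follows is only a minimal sketch, assuming the text is already held in a string and that the character class is meant as `[:\r]`. The sub names `count_cr` and `matches_pattern` and the sample strings are made up for illustration and are not taken from the real file.

```perl
use strict;
use warnings;

# Count carriage returns (0x0D) in a string.
sub count_cr {
    my ($text) = @_;
    my $count = () = $text =~ /\r/g;   # list-assignment idiom to count matches
    return $count;
}

# True if the text contains: space, dash, (colon or CR), CR, word character.
sub matches_pattern {
    my ($text) = @_;
    return $text =~ / -[:\r]\r\w/ ? 1 : 0;
}

# Hypothetical sample data, one with carriage returns and one without.
my $with_cr    = "[SDGTs] -\r\rAL-QAIDA";
my $without_cr = "[SDGTs] -\n\nAL-QAIDA";

printf "%d CRs, match=%d\n", count_cr($with_cr),    matches_pattern($with_cr);
printf "%d CRs, match=%d\n", count_cr($without_cr), matches_pattern($without_cr);
```

If the count comes back as zero for the real file, no pattern that contains `\r` can match, whatever the rest of it looks like.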
Re^4: Reg Exp to handle variations in the matched pattern by markjrouse (Initiate) on Feb 22, 2012 at 16:32 UTC
Ultimately, I'm looking to ascertain how Perl could parse this file to turn it into a structured format. My first thought is to update the file using Perl to break out the elements that would then constitute a line, then import these lines into a database to extract the various fields. Unless Perl is able to do this better and more efficiently. I'm still learning Perl, so I'm sure there is a better way of doing it in Perl.

If you look at that file, from line 27: `Licensing at 202/622-2480. The following changes have occurred with respect to the Office of Foreign Assets Control Listing of Specially Designated Nationals and Blocked Persons since January 1, 2002: 01/09/02: The following have been named as "Specially Designated Global Terrorists" [SDGTs] -`

There are two distinct patterns that I'm trying to match here, hence my original regexp `(\s-\r)\|(:\r)`.

After the "January 1, 2002:" text there is a carriage return and line feed, twice: hex values 0D 0A 0D 0A. I'm looking to insert a string between the ":" and the carriage return. So the first pattern is `/(:)\r\n\r\n/`. Therefore, my substitution code is `s/(:)\r\n\r\n/\1\$\$\n/g`, but of course this insertion is not working. It may be my hex/text editor, but it tells me there are lots of carriage returns in this data.

The second pattern comes after the `01/09/02: The following have been named as "Specially Designated Global Terrorists" [SDGTs] -` text, where the dash at the end is preceded by a space and followed by a carriage return and line feed, twice, so my match regexp is `/(\s-)\r\n\r\n/`. Therefore, my substitution code is `s/(\s-)\r\n\r\n/\1\$\$\n/g`, but of course this insertion is not working either.

The subsequent result would be: `Licensing at 202/622-2480. 
The following changes have occurred with respect to the Office of Foreign Assets Control Listing of Specially Designated Nationals and Blocked Persons since January 1, 2002:$$ 01/09/02: The following have been named as "Specially Designated Global Terrorists" [SDGTs] -$$`

Sorry for not being much clearer. It's a bit difficult to explain.
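For what it's worth, the two substitutions described above can be folded into one. This is only a minimal sketch, assuming the goal is exactly the output shown: append `$$` after a trailing `:` or ` -` and collapse the blank line that follows. The sub name `mark_breaks` and the sample text are hypothetical. It uses `$1` rather than `\1` on the replacement side, and `\r?\n` so that it behaves the same whether or not the data really contains carriage returns.

```perl
use strict;
use warnings;

sub mark_breaks {
    my ($text) = @_;
    # Capture a trailing ":" or whitespace-plus-dash, consume two line
    # endings (CRLF or bare LF), then put back the capture, a literal
    # "$$", and a single newline.
    $text =~ s/(:|\s-)\r?\n\r?\n/$1\$\$\n/g;
    return $text;
}

# Hypothetical sample, shaped like the excerpt quoted in the post.
my $sample = "since January 1, 2002:\r\n\r\n01/09/02: ... [SDGTs] -\r\n\r\nNAME";
print mark_breaks($sample), "\n";
```

Making the carriage return optional means the same substitution still applies if a fresh download of the file turns out to use bare line feeds.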
Re^5: Reg Exp to handle variations in the matched pattern by bitingduck (Chaplain) on Feb 23, 2012 at 06:33 UTC
Re^6: Reg Exp to handle variations in the matched pattern by markjrouse (Initiate) on Feb 23, 2012 at 10:48 UTC
Some notes below your chosen depth have not been shown here
Re^4: Reg Exp to handle variations in the matched pattern by markjrouse (Initiate) on Feb 22, 2012 at 17:02 UTC
Hi Moritz, Yes, you're right. I've just re-downloaded the file and there are no carriage returns. I'll try this again.

