Ultimately, I want to work out how Perl could parse this file into a structured format. My first thought is to use Perl to rewrite the file so that each element ends up on its own line, then import those lines into a database and extract the various fields there. That is, unless Perl can do the whole job better and more efficiently by itself. I'm still learning Perl, so I'm sure there is a better way of doing this.
If you look at that file, from line 27:
Licensing at 202/622-2480. The following changes have occurred
with respect to the Office of Foreign Assets Control Listing of
Specially Designated Nationals and Blocked Persons since January 1,
2002:
01/09/02: The following have been named as "Specially
Designated Global Terrorists" [SDGTs] -
There are two distinct patterns that I'm trying to match here, hence my original regexp (\s-\r)|(:\r). After the "January 1, 2002:" text is a carriage return, line feed x2 (hex values 0D 0A 0D 0A). I'm looking to insert a string between the ":" and the carriage return, so the first pattern is /(:)\r\n\r\n/ and my substitution code is s/(:)\r\n\r\n/\1\$\$\n/g, but this insertion is not working.
It may be my hex/text editor, but it tells me there are lots of carriage returns in this data.
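One way to confirm what the editor reports is to count the line endings from Perl itself. The sketch below is only an illustration: `count_crlf` is a made-up helper name and the sample string stands in for the real file. For the actual file, the data would need to be read with `binmode` applied to the filehandle, because on Windows Perl's default text layer may translate CR LF to a bare LF before the regexp ever sees it.

```perl
use strict;
use warnings;

# Hypothetical helper: count CR LF pairs and CR LF CR LF runs in a string.
sub count_crlf {
    my ($data) = @_;
    my $pairs = () = $data =~ /\r\n/g;        # every 0D 0A
    my $gaps  = () = $data =~ /\r\n\r\n/g;    # every 0D 0A 0D 0A (blank line)
    return ($pairs, $gaps);
}

# Made-up sample standing in for the file contents. For the real file,
# open it, call binmode($fh), and read the contents into $data instead.
my $data = "2002:\r\n\r\n01/09/02: named as -\r\n\r\n";
my ($pairs, $gaps) = count_crlf($data);
print "CR LF pairs: $pairs, blank-line gaps: $gaps\n";
```

If the second count comes back as zero for the real file, the \r\n\r\n in the patterns has nothing to match.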
The second pattern comes after the text "01/09/02: The following have been named as "Specially
Designated Global Terrorists" [SDGTs] -", where the dash at the end is preceded by a space and followed by a carriage return, line feed x2, so my match regexp is /(\s-)\r\n\r\n/ and my substitution code is s/(\s-)\r\n\r\n/\1\$\$\n/g, but this insertion is not working either.
The result I'm after would be:
Licensing at 202/622-2480. The following changes have occurred
with respect to the Office of Foreign Assets Control Listing of
Specially Designated Nationals and Blocked Persons since January 1,
2002:$$
01/09/02: The following have been named as "Specially
Designated Global Terrorists" [SDGTs] -$$
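A minimal sketch of the two substitutions combined into one, under some assumptions: the whole file is held in a single string (if it is read line by line, no single line can contain two consecutive line endings, so a pattern like \r\n\r\n can never match), and the line endings may be either CR LF or bare LF. `mark_headers` and the sample text are made up for illustration, and `$1` is used in place of `\1` in the replacement, which is the form Perl recommends.

```perl
use strict;
use warnings;

# Hypothetical helper: append "$$" to a trailing ":" or " -" that is
# followed by a blank line, collapsing the blank line into one "\n".
sub mark_headers {
    my ($text) = @_;
    # \r?\n accepts CR LF as well as bare LF line endings.
    # The replacement writes a bare "\n", as in the original attempt.
    $text =~ s/(:|\s-)\r?\n\r?\n/$1\$\$\n/g;
    return $text;
}

# Made-up sample standing in for the file contents. For the real file,
# one option is to slurp it:  my $text = do { local $/; <$fh> };
my $sample = "Blocked Persons since January 1,\r\n2002:\r\n\r\n"
           . "01/09/02: The following have been named as \"Specially\r\n"
           . "Designated Global Terrorists\" [SDGTs] -\r\n\r\n"
           . "(entries follow)\r\n";

print mark_headers($sample);
```

The ":" in "01/09/02:" is left alone because it is followed by a space rather than by two line endings.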
Sorry for not being much clearer. It's a bit difficult to explain.