PerlMonks  

Re^6: raw data formatting

by teamassociated (Novice)
on Nov 16, 2012 at 13:25 UTC (#1004183)


in reply to Re^5: raw data formatting
in thread raw data formatting

Here is the entire file. I need to skip the first 88 indexes, and I do not need any line that does not start with a space then a digit. This seemed to work:

next if ($lines[0] !~ /\A\s+\d+/);
Thank you again!
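A minimal sketch of the two steps described above (dropping the header lines, then keeping only lines that start with whitespace and a digit), assuming the whole report has already been read into an array. The helper name `data_lines` and the sample lines are hypothetical, and the 88-line header count from the post becomes a parameter:

```perl
use strict;
use warnings;

# Hypothetical helper: drop the first $skip lines, then keep only the
# lines that begin with whitespace followed by at least one digit.
sub data_lines {
    my ( $skip, @lines ) = @_;
    splice @lines, 0, $skip;               # discard the header lines
    return grep { /\A\s+\d+/ } @lines;     # same test as the post, inverted
}

# Made-up sample: two header lines, two data lines, one trailer line.
my @sample = (
    "PAGE HEADER\n",
    "COLUMN TITLES\n",
    "    1001 8 Z DOE ,JANE\n",
    " 2002 7 Z ROE ,RICHARD\n",
    "NOV\n",
);

print data_lines( 2, @sample );
```

With the real file, the call would presumably be `data_lines( 88, @all_lines )`, but that depends on the header really being 88 lines long on every run.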
root@facsxxxx [/var/adm/scripts] # cat 20121106_C1SA_174204.tmp 12.311 17.40.21 000001 RPT=*$BACCS1 DATE 11/06/12 TIME 05:36 P.M. PAGE 1 REPORT *$BACCS1 xxxxxxx HO +SPITAL A80 *A080* SMSLIST OPTIONS PAGE LENGTH 055 MARGINS 001 00132 LIMIT 01000 LINE SPACING 0 SUPPRESS DATE YYYY REQUEST TITLE 'xxxxx HOSPITAL CCS BAD DEBT ACCOUNTS ALPHA A-Z' COLUMNS BEGINCOL HEADING 'PT NO' 'GUAR NO' 'GUAR PHONE' 'GUAR SSA' +'PT SSA' SPACING 00 , "PT NO WOSCD" GRAND TOTALS CNT CHANGE TOTALS CNT , "CURR GUAR NO" , "GUAR SHORT PHONE1 NO" , "GUAR SSA NO" , "PT SSA NO" , ENDCOL , "PT NO SCD" HEADING ' ' SPACING 00 , BEGINCOL HEADING 'FC' 'PT' 'RC' 'I1' 'I2' SPACING 01 , "FC" , "PT TYPE" , "RESP CD" , "INS1 CLASF" , "INS2 CLASF" , ENDCOL , BEGINCOL HEADING 'PT NAME' 'GUAR NAME' 'GUAR ADDR LINE 1' 'GUAR ADDR LINE 2' 'GUAR ADDR LINE 3' SPACING 01 , "PT NAME" GRAND TOTALS CNT CHANGE TOTALS CNT , "GUAR SHORT NAME" , "GUAR SHORT ADDR1" , "GUAR SHORT ADDR2" , "GUAR SHORT CITY STATE" , ENDCOL , BEGINCOL HEADING 'ACCT BAL' 'INS BAL' 'CTRCT AMT' 'INS1 BL$$' 'I +NS2 BL$$' JUSTIFY R SPACING 01 , ( "BAL ACCT BAL" POS -02 LEN 008 ) GRAND TO +TALS SUM CHANGE TOTALS SUM , ( "BAL TOT INS BAL" POS -02 LEN 008 ) , ( "CTRCT AMT" POS -02 LEN 008 ) , ( ( ( "INS1 TOT BAL" - ( "INS1 PAY AMT" + "INS1 ADJ AMT" ) +) ) POS -02 LEN 008 ) , ( ( ( "INS2 TOT BAL" - ( "INS2 PAY AMT" + "INS2 ADJ AMT" ) +) ) POS -02 LEN 008 ) , ENDCOL , BEGINCOL HEADING 'ADM DT' 'D.O.B.' 
'LAST SVC' 'DSCH DT' 'PT PAYD +T' SPACING 01 , "ADM DATE" , "BIRTH DATE" , "LAST SVC DATE" , "DSCH DATE" , "LAST PT PAY DATE" , ENDCOL , BEGINCOL HEADING 'INS1 BL' 'INS2 BL' 'PT BL DT' 'CTRCT DT' 'GUAR + ZIP' SPACING 01 , "LAST REAL INS1 BL DATE" , "LAST REAL INS2 BL DATE" , "LAST NON ACTV PT BL DATE" , "NEXT CTRCT CALC DATE" , "GUAR ZIP CD" , ENDCOL , BEGINCOL HEADING 'TOT CHG$' 'INS1 POL' 'INS2 POL' 'INS1 GRP' 'IN +S2 GRP' JUSTIFY R SPACING 01 , "BAL TOT CHG AMT" , "INS1 POL NO" , "INS2 POL NO" , 12.311 17.40.21 000002 RPT=*$BACCS1 DATE 11/06/12 TIME 05:36 P.M. PAGE 2 REPORT *$BACCS1 xxxxxxx HO +SPITAL A80 *A080* "INS1 GRP NO" , "INS2 GRP NO" , ENDCOL , BEGINCOL HEADING 'INS1 GRP' 'INS2 GRP' 'SUB1 GRP' 'SUB2 GRP' 'EMPLYR NAME' SPACING 01 , "SUBSCR1 INS GROUP NAME" , "SUBSCR2 INS GROUP NAME" , "SUBSCR1 INS GROUP ID" , "SUBSCR2 INS GROUP ID" , ( "EMPR SHORT NAME" POS 001 LEN 020 ) , ENDCOL , BEGINCOL WIDTH 10 HEADING 'V1' 'V2' 'MM' 'DD' 'YYYY' SPACING 01 + , ( "INS1 PLAN NO" POS 001 LEN 001 ) , ( "INS2 PLAN NO" POS 001 LEN 001 ) , (MONTH OF "ACCT BD XFR DATE" ) , (DAY OF "ACCT BD XFR DATE" ) , (YEAR OF "ACCT BD XFR DATE" ) , ENDCOL ORDERED BY "PT NAME" WHERE ( "BAL ACCT BAL" EX (00 THRU 2.99) AND "ACCT BD XFR DATE" +IN ( 10/31/12 THRU 11/06/12) AND "FC" IN ('X','W','Z') AND "PATIENT" H +AS NO "USER CMPNT ID" EQ ('2xxxxxx') ) 12.311 17.40.21 000003 RPT=*$BACCS1 DATE 11/06/12 TIME 05:36 P.M. PAGE 3 REPORT *$BACCS1 xxxxxx HOS +PITAL A80 *A080* xxxxxx HOSPITAL CCS BAD DEBT ACC +OUNTS ALPHA A-Z PT NO FC PT NAME ACCT BAL ADM DT I +NS1 BL TOT CHG$ INS1 GRP V1 GUAR NO PT GUAR NAME INS BAL D.O.B. 
I +NS2 BL INS1 POL INS2 GRP V2 GUAR PHONE RC GUAR ADDR LINE 1 CTRCT AMT LAST SVC P +T BL DT INS2 POL SUB1 GRP MM GUAR SSA I1 GUAR ADDR LINE 2 INS1 BL$$ DSCH DT C +TRCT DT INS1 GRP SUB2 GRP DD PT SSA I2 GUAR ADDR LINE 3 INS2 BL$$ PT PAYDT G +UAR ZIP INS2 GRP EMPLYR NAME YYYY 50###### 8 Z Axxxx ,xxxxxxx 19793.43 xx/xx/xxx2 + 32989.05 0######### S JOYCE xxxxx .00 xx/xx/xxxx + 0 0 7 1471 xxxxxx .00 xx/xx/xxxx xx/xx/xxx +x NOV 0 U MILWAUKEE, xx ##### 32989.05 xx/xx/xxxx + 3 3######## MILWAUKEE xx .00 5 +3214 12 5######## 8 Z ALxxx,xxxx 75.00 05/27/2012 06 +/05/2012 1845.00 0######### E xxxxx ALxxx .00 09/xx/1xxx 06 +/05/2012 MYT###### 73######## S ##### xxxxx xxxx .00 06/06/2012 0 +6/14/2012 MYT9###### NOV 0 C TAYLOR,xx 48180 1845.00 05/27/2012 + 3 3 37####### S TAYLOR xx .00 4 +8180 3 12 5####### 9 Z ALVAxxxx xxxx ,xxxxx 75.00 05/05/2012 05/1 +5/2012 2045.00 0000000000 E SUSE asbdsdfs xxxx .00 03/18/1989 05/ +15/2012 AJT9##### 313####### 440 PARK AVE .00 05/16/2012 0 +6/15/2012 AJT####### NOV 0 C LINCOLN PARK,xx ##### 2045.00 05/05/2012 + 1 3 0 S LINCOLN PARK xx .00 4 +8146 1 12 50###### 7 Z ANDREWS ,xxxxx 321.81 05/24/2012 +05/31/2012 1209.00 CALIFORNIA 03####### R xxxxx ANDREWS .00 xx/xx/xxxx 0 +5/31/2012 YDP###### 73####### 24549xxxxx xxx .00 06/01/2012 06/ +19/2012 YDP840####1 NO NOV 0 C FLAT ROCK,xx ##### 1209.00 05/24/2012 + 1 NO 3 12.311 17.40.21 000025 RPT=*$BACCS1 DATE 11/06/12 TIME 05:36 P.M. 
PAGE 25 REPORT *$BACCS1 xxxxxx HOS +PITAL A80 *A080* 36###### S FLAT ROCK xx .00 48 +134 1 UNEMPLOYED 12 50#####2 4 Z BACCExxx ,xxx xxxx 31.12 05/29/2012 +06/05/2012 175.00 0000000000 Z xxxx x BACCELLA .00 03/xx/xxxx 06 +/05/2012 OMP89##### 31#####1 xxxx ROSEDxxx xxxx .00 06/06/2012 06/ +14/2012 OMP89##### NOV 0 C ALLEN PARK,xx 48101 175.00 05/29/2012 + 9 3 0 S ALLEN PARK xx .00 4 +8101 9 12 5####### 5 Z xxxxx ,AMBER 100.00 xx/xx/xxxx + 3672.00 0000000000 E AMBER xxxxx .00 xx/xx/xxxx 04 +/25/2012 XYQ9###### 73####### 18480 xxxx x x .00 04/26/2012 05/ +15/2012 XYQ911##### NOV


Re^7: raw data formatting
by marto (Chancellor) on Nov 16, 2012 at 13:40 UTC

    Use <code> tags for data please. This data (unlike your initial post) seemingly contains real names (as well as things titled PHONE, BAD_DEBT, State/Area/zip codes and what could be social security information). Are you sure you should be posting this anywhere?

      Yeah, you're right... OK, I scrubbed the file and made it shorter. There is no Social Security info in this file.

        You're posting (what could be) other people's personal data on the internet (good luck getting that out of search engine caches). Is that a better idea than spending a few minutes of effort? Why do you think we need this to be real data? If you were having problems with a database script, would you provide valid login credentials? Why do you think anyone would need so much data in order to help with your code? Hopefully you're starting to get the point.

        The sensible thing would have been to craft a fake file, only a few records long, using data you'd made up. Posting it formatted correctly also helps, because it makes it easier for people to help you.

        Update: Prior to nuking, the post this was in reply to said:

        "well what can I do...scrub the whole file???"

        Please mark significant updates, rather than replace content, so replies still have context.

        Yes--quickly remove the above, and then scrub all that you want to post here as a sample to work on.
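One rough way to scrub a sample mechanically is to mask every letter and digit while leaving spacing and punctuation alone, so the column layout (and a test such as "starts with whitespace then a digit") still holds. This is only a sketch: the helper name `mask_line` and the sample record are made up, and masked output still needs a check by eye, since field lengths and punctuation survive.

```perl
use strict;
use warnings;

# Hypothetical helper: turn every letter into 'x' and every digit into '9',
# keeping whitespace and punctuation so the report layout is preserved.
sub mask_line {
    my ($line) = @_;
    ( my $masked = $line ) =~ tr/A-Za-z/x/;   # short replacement list repeats 'x'
    $masked =~ tr/0-9/9/;                     # likewise, every digit becomes '9'
    return $masked;
}

# Made-up record, not taken from the posted file.
print mask_line("    1234567 8 Z DOE ,JANE 75.00\n");
```

Hand-written fake records of only a few lines, as suggested earlier in the thread, remain the safer option.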

        "Yeah, you're right... OK, I scrubbed the file and made it shorter."

        Please update your post one more time to purge the secret perlmonks cache

        FWIW/AFAIK, you're supposed to treat an SSN about the same as a CCN, and you should notify the people whose SSNs you exposed so they can monitor their credit for identity theft.

Re^7: raw data formatting
by Kenosis (Priest) on Nov 16, 2012 at 14:15 UTC

    A grep on the data to check for four spaces followed by a digit at the beginning of each line works:

    use strict;
    use warnings;
    use List::MoreUtils qw/natatime/;

    my $it = natatime 5, grep /\A\s{4}\d/, <DATA>;

    while ( chomp( my @lines = $it->() ) ) {
        my $letter  = 'A';
        my $acctNum = do { $lines[0] =~ /\s+(\d+)\s+(\d+)/; $1 . $2 };
        push @lines, " acctnum=$acctNum";
        print for map { s/\s+/$letter++ . ' '/e; "$_\n" } @lines;
    }

    __DATA__
    Place your data here...
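For readers without List::MoreUtils installed, the same idea can be sketched with core Perl only, swapping `natatime` for a plain `splice` loop that takes five lines at a time. The sub name `format_records` and the sample record are made up for illustration, and the sketch assumes, as the code above does, that every record is exactly five lines that each start with four whitespace characters and a digit:

```perl
use strict;
use warnings;

# Hypothetical core-only variant: filter, group in fives, label A, B, C, ...
sub format_records {
    my @keep = grep { /\A\s{4}\d/ } @_;    # only lines shaped like data
    my @out;
    while ( my @lines = splice @keep, 0, 5 ) {
        chomp @lines;
        my $letter = 'A';
        # Account number: the first two digit runs of the first line, joined.
        my $acct_num = $lines[0] =~ /\s+(\d+)\s+(\d+)/ ? $1 . $2 : '';
        push @lines, " acctnum=$acct_num";
        for my $line (@lines) {
            # Replace the first whitespace run with the next letter and a space.
            $line =~ s/\s+/$letter++ . ' '/e;
            push @out, "$line\n";
        }
    }
    return @out;
}

# Made-up record: one heading line, five data lines, one trailer line.
my @raw = (
    "PT NO FC PT NAME ACCT BAL\n",
    "    1234567 8 Z DOE ,JANE 75.00\n",
    "    0000000000 E JANE DOE .00\n",
    "    5550100 1 MAIN ST .00\n",
    "    0 C ANYTOWN,XX 00000 75.00\n",
    "    3 S ANYTOWN XX .00\n",
    "NOV\n",
);

print format_records(@raw);
```

The `splice` loop consumes its working copy five elements at a time and ends when nothing is left, which is roughly what the `natatime` iterator does internally.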
      Hey, thanks for posting the real data again. Good luck notifying the owners of this security breach.

        I don't know that you're correct about "posting the real data again". The re-posted data contained text like "ADAMS ,xxxxxxx," so it clearly appeared that all sensitive info had been redacted--after I requested that it be scrubbed.

        Have removed all data, just in case...

      The solution was twofold. Kenosis provided this solution, thank you!

      1. I needed to take out the /r because my Perl is 5.8.8, so I replaced
      print for map { s/\s+/$letter++ . ' '/er} @lines;
      with
      print for map { s/\s+/$letter++ . ' '/e; "$_\n" } @lines;

      2. I replaced
      my $it = natatime 5, <DATA>;
      with
      my $it = natatime 5, grep /\A\s{4}\d/, <DATA>;
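Some background on the first fix: the `/r` (non-destructive substitution) modifier was added in Perl 5.14, so it is a syntax error on 5.8.8. The replacement without `/r` works, but because `$_` inside `map` aliases the array elements, it also changes `@lines` in place. Where that matters, a copy can be substituted instead. A small sketch that runs on old and new Perls alike; the helper name `label_lines` and the sample lines are made up:

```perl
use strict;
use warnings;

# Hypothetical helper: emulate s///r on Perl 5.8 by substituting on a copy,
# so the caller's lines stay untouched.
sub label_lines {
    my @lines  = @_;
    my $letter = 'A';
    return map {
        my $copy = $_;                        # work on a copy, not the alias
        $copy =~ s/\s+/$letter++ . ' '/e;     # first whitespace run -> "A ", "B ", ...
        "$copy\n";
    } @lines;
}

my @lines = ( "    100 first", "    200 second" );
print label_lines(@lines);
```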
