Problem with larger files (and s/)
by Cloudster (Novice)
on Jun 24, 2008 at 21:43 UTC
Cloudster has asked for the wisdom of the Perl Monks concerning the following question:
A boon I crave, oh wise monks! Be merciful upon this newb supplicant!
I wrote my first Perl program a couple of weeks ago and am quite happy with it (now I just have to figure out how to link it into a web page and my PHP chat program, but I'll figure that out eventually). My job is database administration and development. Yesterday I sat down to do something that I do several times a week: reformat a script generated by SQL Server 2000's Enterprise Manager. I realized that this was another opportunity to learn more Perl and save quite a bit of time in the future, so I set forth, and have fallen flat on my face. (I wrote most of this post this morning; I've since made most of it work.)
Here's what a SQL Server script looks like; this would be saved in a file:
What I want to do is remove the square brackets, remove the COLLATE blah blah blah, insert three tabs in front of the data type, insert a couple of tabs in front of the null option, et cetera. The tabs wouldn't align perfectly in the end because of variance in the field name length, but that's OK; TextPad is excellent for making that easy.
I've got most of that working. I have two final problems, both of which are beyond my skill. First, I can't get the NULL/NOT NULL to parse correctly: I end up with NOT\t\tNULL.
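The per-line rewrite described above could be sketched roughly as follows. This is not the program from this post: the `reformat_line` name, the `@types` list, and the sample line are all made up for illustration, and the patterns assume one column definition per line. The point of interest is the last substitution, which treats an optional NOT plus NULL as a single unit, so the tabs go in front of the whole null option rather than in front of NULL alone.

```perl
use strict;
use warnings;

# Hypothetical list of data types; extend it to match the real scripts.
my @types   = qw(bigint smallint tinyint int decimal numeric varchar char datetime);
my $type_re = join '|', @types;

# Reformat one column-definition line (assumes one column per line).
sub reformat_line {
    my ($line) = @_;
    $line =~ s/[\[\]]//g;                        # remove square brackets
    $line =~ s/\s+COLLATE\s+\w+//i;              # remove the COLLATE clause
    $line =~ s/\s+\b($type_re)\b/\t\t\t$1/i;     # three tabs before the data type
    $line =~ s/\s+((?:NOT\s+)?NULL)\b/\t\t$1/i;  # two tabs before NULL or NOT NULL as a unit
    return $line;
}

# A made-up line in the style of an Enterprise Manager script.
print reformat_line(
    "\t[ErrorDesc] [varchar] (40) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,\n"
);
```

Because `(?:NOT\s+)?` is optional, the same substitution handles both plain NULL and NOT NULL, which is one possible way around the NOT\t\tNULL result.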
Second, and most critical, is file size. If I'm dealing with a dinky little file like the above (300 bytes), it runs just fine. But if I feed it a 9k script file with 200 lines of code, I get a really weird result. The output file displays as if it has an additional space between every character, there are lots of non-ASCII values in the file, and nothing has matched and been reformatted.
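Symptoms like these can be checked at the byte level. The following is only a diagnostic sketch, not part of the program in this post: `first_bytes_hex` is a made-up helper, and the file name is taken from the command line. It reads the first few bytes of a file in raw mode and prints them in hex, which would show whether the input starts with a byte order mark (for example FF FE for UTF-16LE) or has 00 bytes between the visible characters.

```perl
use strict;
use warnings;

# Return the first $count bytes of a file as space-separated hex pairs.
sub first_bytes_hex {
    my ($path, $count) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $buf = '';
    my $got = read $fh, $buf, $count;
    defined $got or die "Cannot read $path: $!";
    close $fh;
    return join ' ', map { sprintf '%02X', ord($_) } split //, $buf;
}

# Usage (hypothetical script name): perl peek.pl script.sql
my $file = shift @ARGV;
print first_bytes_hex($file, 16), "\n" if defined $file;
```

Plain ASCII text would show up as one byte per character, with no leading FF FE or FE FF and no 00 bytes.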
I have no idea what's going on with my program. My intent was that a line would be read, tested to see if it contains a space and a data type name, and if it did, replace that space with three tabs. My result file is very badly mangled, and no longer ASCII. Obviously I'm missing something.
This is what the first part of the file looks like when it's done processing:
CREATE TABLE 搀戀漀 .䌀甀猀琀漀洀攀爀䈀愀氀愀渀挀攀猀 ( ऀ arbh_acct int 一伀吀 一唀䰀䰀 Ⰰഀഀ 愀爀戀栀开瀀爀漀瀀开挀漀搀攀 瘀愀爀挀栀愀爀 (30) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL , ऀ arbh_unpd_bal decimal⠀ Ⰰ ㈀⤀ 一伀吀 一唀䰀䰀 Ⰰഀഀ 爀䌀漀甀渀琀 琀椀渀礀椀渀琀 NOT NULL , ऀ arbh_ar_cat tinyint 一伀吀 一唀䰀䰀 ഀഀ ) ON 倀刀䤀䴀䄀刀夀 䜀伀ഀഀ 䌀刀䔀䄀吀䔀 吀䄀䈀䰀䔀 dbo⸀ EmailText ⠀ഀഀ 䴀攀猀猀愀最攀䈀漀搀礀 瘀愀爀挀栀愀爀 (255) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL , ऀ InsertSequence bigint 䤀䐀䔀一吀䤀吀夀 ⠀Ⰰ ⤀ 一伀吀 一唀䰀䰀 ഀഀ ) ON 倀刀䤀䴀䄀刀夀 䜀伀ഀഀ 䌀刀䔀䄀吀䔀 吀䄀䈀䰀䔀 dbo⸀ ErrorCodes ⠀ഀഀ 䔀爀爀漀爀䌀漀搀攀 猀洀愀氀氀椀渀琀 NOT NULL , ऀ ErrorDesc varchar ⠀㐀 ⤀ 䌀伀䰀䰀䄀吀䔀 匀儀䰀开䰀愀琀椀渀开䜀攀渀攀爀愀氀开䌀倀开䌀䤀开䄀匀 一伀吀 一唀䰀䰀 Ⰰഀഀ 䔀爀爀漀爀䰀漀挀愀琀椀漀渀 挀栀愀爀 (1) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ⤀ 伀一 PRIMARYഀഀ GO ഀഀ
I've looked at the source file, and it's definitely ASCII text, not Unicode. Here's my program. I was trying to have all of the data types in one array and work it from that angle, but that's beyond my skill right now:
Any suggestions would be most welcome. Yes, it's a rather brute-force approach, but I'm new to Perl and it does what I want it to (mostly).