Beefy Boxes and Bandwidth Generously Provided by pair Networks
go ahead... be a heretic
 
PerlMonks  

comment on

( #3333=superdoc: print w/replies, xml ) Need Help??
A boon I crave, oh wise monks! Be merciful upon this newb supplicant!

I wrote my first Perl program a couple of weeks ago and am quite happy with it (now I just have to figure out how to link it in to a web page and my PHP chat program, but Iíll figure that out eventually.) My job is database administration and development. Yesterday I sat down to do something that I do several times a week, reformat a script generated by SQL Server 2000ís Enterprise Manager when I realized that this was another opportunity to learn more of Perl and save quite a bit of time in the future, so I set forth and have fallen flat on my face. (I wrote most of this post this morning, Iíve since made most of it work)

Hereís what a SQL Server script looks like, this would be saved in a file:

CREATE TABLE [dbo].[CustomerBalances] ( [arbh_acct] [int] NOT NULL , [arbh_prop_code] [varchar] (30) COLLATE SQL_Latin1_General_CP1_CI_ +AS NOT NULL , [arbh_unpd_bal] [decimal](10, 2) NOT NULL , [rCount] [tinyint] NOT NULL , [arbh_ar_cat] [tinyint] NULL ) ON [PRIMARY] GO

What I want to do is remove the square brackets, remove the COLLATE blah blah blah, insert three tabs in front of the data type, and insert a couple of tabs in front of the null option, etcetera. The tabs wouldnít align perfectly in the end because of variance in the field name length, but thatís ok, TextPad is excellent for making that easy.

Iíve got most of that working. I have two final problems, both of which are beyond my skill. First, I canít get the NULL/NOT NULL to parse correctly. I end up with NOT\t\tNULL.

Second, and most critical, is file size. If Iím dealing with a dinky little file like the above (300 bytes), it runs just fine. But if I feed it a 9k script file with 200 lines of code, I get a really weird result. The output file displays like it has an additional space between every character, there are lots of non-ASCII values in the file, and nothing has matched and been reformatted.

I have no idea whatís going on with my program. My intent was that a line would be read, tested to see if it contains a space and a data type name, and if it did, replace that space with three tabs. My result file is very badly mangled, and no longer ASCII. Obviously Iím missing something.

This is what the first part of the file looks like when itís done processing:

CREATE TABLE 搀戀漀 .䌀甀猀琀漀洀攀爀䈀愀氀愀渀挀攀猀 ( ਍ऀ arbh_acct  int 一伀吀 一唀䰀䰀 Ⰰഀഀ 愀爀戀栀开瀀爀漀瀀开挀漀搀攀 瘀愀爀挀栀愀爀 (30) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL , ਍ऀ arbh_unpd_bal  decimal⠀㄀ Ⰰ ㈀⤀ 一伀吀 一唀䰀䰀 Ⰰഀഀ 爀䌀漀甀渀琀 琀椀渀礀椀渀琀 NOT NULL , ਍ऀ arbh_ar_cat  tinyint 一伀吀 一唀䰀䰀 ഀഀ ) ON 倀刀䤀䴀䄀刀夀 ਍䜀伀ഀഀ ਍䌀刀䔀䄀吀䔀 吀䄀䈀䰀䔀  dbo⸀ EmailText ⠀ഀഀ 䴀攀猀猀愀最攀䈀漀搀礀 瘀愀爀挀栀愀爀 (255) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL , ਍ऀ InsertSequence  bigint 䤀䐀䔀一吀䤀吀夀 ⠀㄀Ⰰ ㄀⤀ 一伀吀 一唀䰀䰀 ഀഀ ) ON 倀刀䤀䴀䄀刀夀 ਍䜀伀ഀഀ ਍䌀刀䔀䄀吀䔀 吀䄀䈀䰀䔀  dbo⸀ ErrorCodes ⠀ഀഀ 䔀爀爀漀爀䌀漀搀攀 猀洀愀氀氀椀渀琀 NOT NULL , ਍ऀ ErrorDesc  varchar ⠀㐀 ⤀ 䌀伀䰀䰀䄀吀䔀 匀儀䰀开䰀愀琀椀渀㄀开䜀攀渀攀爀愀氀开䌀倀㄀开䌀䤀开䄀匀 一伀吀 一唀䰀䰀 Ⰰഀഀ 䔀爀爀漀爀䰀漀挀愀琀椀漀渀 挀栀愀爀 (1) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ਍⤀ 伀一  PRIMARYഀഀ GO ਍ഀഀ

Iíve looked at the source file, and itís definitely ASCII text, not Unicode. Hereís my program, I was trying to have all of the data types in one array and work it from that angle, but thatís beyond my skill right now:

print "\nThis program reformats scripts produced by SQL Server 2000 En +terprise Manager\n"; print "to remove brackets and tab out data types and null settings.\n\ +n"; print "You provide a file name, this program reads it and produces a n +ew file\n"; print "with a .out extension.\n\n"; print "File name to process? (<enter> to end program.) "; chomp($sqlfile = <stdin>); $outfile = $sqlfile . ".out"; $datatypes = " binary/ bigint/ bit/ char/ datetime/ " . " decimal/ float/ image/ int/ money/ " . " nchar/ ntext/ nvarchar/ numeric/ real/ " . " smalldatetime/ smallint/ smallmoney/ sql_variant/ " . " sysname/ text/ timestamp/ tinyint/ varbinary/ varchar/ " . " uniqueidentifier"; open(IN, $sqlfile) || die "cannot open $sqlfile for input: $!"; open(OUT, ">$outfile") || die "cannot open $outfile for output: $!"; while (<IN>) { chomp; #delete or comment out the two following lines #if you need brackets around your object names. s/\[//g; s/\]//g; s/ \(/\(/g; s/ \,/\,/g; s/COLLATE SQL_Latin1_General_CP1_CI_AS//g; s/ON PRIMARY//g; s/\sbinary/\t\t\tbinary/g; s/\sbigint/\t\t\tbigint/g; s/\sbit/\t\t\tbit/g; s/\schar/\t\t\tchar/g; s/\sdatetime/\t\t\tdatetime/g; s/\sdecimal/\t\t\tdecimal/g; s/\sfloat/\t\t\tfloat/g; s/\simage/\t\t\timage/g; s/\sint/\t\t\tint/g; s/\smoney/\t\t\tmoney/g; s/\snchar/\t\t\tnchar/g; s/\sntext/\t\t\tntext/g; s/\snvarchar/\t\t\tnvarchar/g; s/\snumeric/\t\t\tnumeric/g; s/\sreal/\t\t\treal/g; s/\ssmalldatetime/\t\t\tsmalldatetime/g; s/\ssmallint/\t\t\tsmallint/g; s/\ssmallmoney/\t\t\tsmallmoney/g; s/\ssql_variant/\t\t\tsql_variant/g; s/\ssysname/\t\t\tsysname/g; s/\stext/\t\t\ttext/g; s/\stimestamp/\t\t\ttimestamp/g; s/\stinyint/\t\t\ttinyint/g; s/\svarbinary/\t\t\tvarbinary/g; s/\svarchar/\t\t\tvarchar/g; s/\suniqueidentifier/\t\t\tuniqueidentifier/g; s/NOT NULL/\t\tNOT NULL/s; s/NULL/\t\tNULL/s; print "$_\n"; print OUT "$_\n"; } close(OUT) || die "problem closing new $outfile: $!"; close(IN) || die "problem closing original $sqlfile: $!";
Any suggestions would be most welcome. Yes, it's a rather brute-force approach, but I'm new to Perl and it does what I want it to (mostly).

In reply to Problem with larger files (and s/) by Cloudster

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?
    Username:
    Password:

    What's my password?
    Create A New User
    Chatterbox?
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others examining the Monastery: (6)
    As of 2019-08-23 14:04 GMT
    Sections?
    Information?
    Find Nodes?
    Leftovers?
      Voting Booth?

      No recent polls found

      Notices?