http://www.perlmonks.org?node_id=693841


in reply to Problem with larger files (and s/)

Hi, your code did not output any strange characters when I ran it. I suspect something is wrong with the input file, but I cannot tell without further information.

I would like to focus on your other problem (inserting the tabs in some places but not in others). My suggestion is below; for completeness I include large parts of what you have already written. It's late, so the code is not completely polished: for example, the first two substitutions should be written with inline comments, and there may be more elegant ways to write some of the regexes. But here it is:

    print "\nThis program reformats scripts produced by SQL Server 2000 Enterprise Manager\n";
    print "to remove brackets and tab out data types and null settings.\n\n";
    print "You provide a file name, this program reads it and produces a new file\n";
    print "with a .out extension.\n\n";
    print "File name to process? (<enter> to end program.) ";
    chomp($sqlfile = <stdin>);
    $outfile = $sqlfile . ".out";
    open(IN, $sqlfile) || die "cannot open $sqlfile for input: $!";
    open(OUT, ">$outfile") || die "cannot open $outfile for output: $!";

    while (<IN>) {
        #remove square brackets
        s/(\[|\])//g;
        #remove whitespace before round brackets...
        s/\s+((\(|\)))/$1/g;
        #...and commas
        s/\s+,/,/g;
        #remove some keywords
        s/COLLATE SQL_Latin1_General_CP1_CI_AS//g;
        s/ON PRIMARY//g;
        #remove duplicate whitespace
        s/\s+(\s)/$1/g;
        #THE MOST INTERESTING PART:
        #Only for lines that do not start with a non-whitespace character
        #(lines that do start with one should hopefully be just the first
        #line; otherwise you have to track the line number or analyze
        #keywords):
        #replace the (single) whitespace character before a word by three
        #tabs in case the following expression is neither "NULL"
        #nor "NOT NULL"
        s/\s+(?!(?:NOT )?NULL)([a-zA-Z]\w*)/\t\t\t$1/g if !/^\S/;
        print;
        print OUT;
    }

    END {
        close(OUT) || die "problem closing new $outfile: $!";
        close(IN) || die "problem closing original $sqlfile: $!";
    }

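Here is a rough sketch of what I mean by inline comments for the first two substitutions, using the /x modifier. The sub name strip_brackets and the sample line are made up for illustration, and the character classes [\[\]] and [()] are my shorthand for the alternations above; they should match the same characters, but treat this as untested against your real files.

```perl
use strict;
use warnings;

# Hypothetical helper; the name is an assumption made for this sketch.
sub strip_brackets {
    my ($line) = @_;

    # Remove square brackets.
    $line =~ s/
        [\[\]]      # one opening or closing square bracket
    //gx;

    # Remove whitespace before round brackets.
    $line =~ s/
        \s+         # one or more whitespace characters
        ( [()] )    # a round bracket, kept as capture group 1
    /$1/gx;

    return $line;
}

# Invented sample line, not taken from a real script.
print strip_brackets("CREATE TABLE [dbo].[Orders] (\n");
```

With /x, whitespace and #-comments inside the pattern are ignored, so each piece of the regex can carry its own explanation. Just avoid a slash or a variable such as $1 inside those comments, since Perl still scans the pattern for the delimiter and interpolates variables there.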
Some comments:
1. The most interesting part is the negative lookahead used for inserting the three tabs in the places described above.
2. You do not need to chomp the input lines; if you did, you would have to add the newlines back before printing.
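To see the negative lookahead in isolation, here is a minimal sketch that wraps the tab-inserting substitution in a small sub. The name tab_out and the sample lines are invented for this example; the regex itself is the one from the script.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the tab-inserting substitution.
sub tab_out {
    my ($line) = @_;

    # Leave lines alone if they start with a non-whitespace character.
    return $line if $line =~ /^\S/;

    # Replace whitespace before a word by three tabs, unless the
    # whitespace is directly followed by "NULL" or "NOT NULL".
    $line =~ s/\s+(?!(?:NOT )?NULL)([a-zA-Z]\w*)/\t\t\t$1/g;

    return $line;
}

# Invented sample input, already cleaned of brackets and extra spaces.
# The newline is kept on purpose, so no chomp is needed before printing.
print tab_out(" CustomerName varchar(50) NOT NULL,\n");
print tab_out(" Notes text NULL\n");
print tab_out("CREATE TABLE dbo.Customers(\n");
```

The first two lines get three tabs before the column name and before the data type, while " NOT NULL" and " NULL" keep their single space because the lookahead rejects the match at those positions. The third line starts with a non-whitespace character and passes through unchanged. One caveat: a word that merely begins with NULL would also be skipped by the lookahead.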

Hope this helps a bit and gave you some new ideas.

Re^2: Problem with larger files (and s/)
by Cloudster (Novice) on Jun 25, 2008 at 14:57 UTC
    Thank you! Very interesting code, I look forward to studying and dissecting it.