Beefy Boxes and Bandwidth Generously Provided by pair Networks
Just another Perl shrine
 
PerlMonks  

Replacing a character in a file

by Broggy69 (Novice)
on Jan 03, 2018 at 16:32 UTC ( #1206611=perlquestion: print w/replies, xml ) Need Help??
Broggy69 has asked for the wisdom of the Perl Monks concerning the following question:

I am not understanding regex and getting confused with it. I am new and read a whole bunch of stuff on it but I am no closer to answering what should be simple.

My Perl program exports data from an MS-Access file. But that file has some characters that need to be removed. I am not sure where these characters are coming from as they are not in the original file used to create the MS-Access table, nor in the MS-Access table. One thing to note that whether I create the file from Visual Basic .net or Perl the special character are there. I am just learning perl. I know how I can replace a character in Visual Basic .net without regex, but I can't seem to replace characters in perl with or with perl. Below is my code and in the file Plants.txt I want to replace a special character with a space.

use Win32::ODBC; use DBI; use Win32; use Config; use warnings; use Text::CSV; $HOSTNAME = `hostname`; my $name = Win32::LoginName; my @Files = ("schema.ini", "Log.log", "Plants.txt"); my @list; open(my $filelog, '>', 'log.log'); print $filelog "Computer Name: $HOSTNAME"; print $filelog "Username $name\n"; print $filelog "$Config{archname}\n\n"; if (-e $Files[0]) { unlink $Files[0]; print $filelog "Deleted file: $Files[0]\n"; } if (-e $Files[1]) { unlink $Files[1]; print $filelog "Deleted file: $Files[0]\n"; } if (-e $Files[2]) { unlink $Files[2]; print $filelog "Deleted file: $Files[0]\n"; } my $filename = 'C:/GBowl/strawberry32/Scripts/ccname.txt'; my $db_file = 'C:/GBowl/strawberry32/Scripts/Hello.accdb'; print $filelog "Will try to connect to ms-access database\n"; my $dbh = DBI->connect( 'dbi:ADO:Provider=Microsoft.ACE.OLEDB.12.0;Data Source='.$db_file, ) or die $DBI::errstr; print $filelog "Connected to ms-access database\n"; #prepare and execute SQL statement $Export = $dbh->prepare( 'SELECT Company_Code, Name INTO [Text;HDR=No;DATABASE=C:\GBowl\str +awberry32\Scripts].[Plants.txt] FROM Plants ORDER BY Company_Code'); $Export->execute || die "Could not execute SQL statement ... maybe invalid?"; print $filelog "New $Files[2] file created\n"; print "$Files[0]\n"; print "$Files[1]\n"; print "$Files[2]\n"; ### Section to get plant ID's ### my $file = "$Files[2]"; open(my $csv, '>>', $file) or die "Could not open '$file' $!"; if (open(my $fh1, '<:encoding(iso-8859-1)', $Files[2])) { while (my $row = <$fh1>) { chomp $row; print "$row\n"; } } else { warn "Could not open file '$filename' $!"; } open(my $PlantFile, '>', 'Plants.txt'); #Here is where I need code to replace character &#65533; with nothing close $PlantFile; close $filelog;

Replies are listed 'Best First'.
Re: Replacing a character in a file
by 1nickt (Monsignor) on Jan 03, 2018 at 20:20 UTC

    Hi, you don't want to just throw that away! That character is produced by an encoding failure. You need to know what is the encoding of the data in your database, then decode from that to Perl when you fetch the data, then encode to the character set your output is using. That way you will get the special characters as they were originally.

    (This assumes that the data was encoded correctly begin with.) Since you are on Windows your data might be encoded in Windows-1252.

    Disclaimer: I am not a windows user but I saw your question had been here for a while with no replies.

    To remove a character with a Perl regexp: perl -E '$str = "fxoox"; $str =~ s/x//g; say $str;' (output: foo).

    See perlretut and perlrequick. But that is not the right solution here.

    Hope this helps!


    The way forward always starts with a minimal test.
Re: Replacing a character in a file
by poj (Monsignor) on Jan 03, 2018 at 21:49 UTC

    I don't have ADO to test your code but here is a Win32::OLE alternative.

    #!perl use strict; use Win32::OLE; use Win32::OLE::Const 'Microsoft Access'; my $db_file = "c:/temp/Hello.accdb"; my $app = Win32::OLE->new('Access.Application', 'Quit') or die "Couldn't start new Access instance: $!"; my $db = $app->OpenCurrentDatabase($db_file) ; my $db = $app->CurrentDb; my $rs = $db->OpenRecordSet(' SELECT Company_Code,Name FROM Plants ORDER BY Company_Code' ); my $Plantfile = 'c:/temp/Plants.txt'; open OUT,'>',$Plantfile or die "Could not open $Plantfile : $!"; my $count = 0; my @fields = ('Company_Code','Name'); while ( ! $rs->Eof ) { my @cols = map { $rs->Fields($_)->Value }@fields; print OUT join "\t",@cols; print OUT "\n"; $rs->MoveNext; ++$count; } close OUT; print "$Plantfile created with $count records\n";
    poj
Re: Replacing a character in a file
by Laurent_R (Canon) on Jan 03, 2018 at 20:31 UTC
    If you really want to get rid of some characters in your data (and I agree with 1nickt++ that perhaps you shouldn't), then it would probably be better to store the result of your query into an array in memory, modify the array to clean it up and then only write the data to a file, rather than writing immediately to a file and then trying to change the file.

    Also, it seems that your code is opening the Plants.txt file in write mode, this will clutter the existing file, probably not what you want.

Re: Replacing a character in a file
by Anonymous Monk on Jan 03, 2018 at 21:22 UTC
    I think that we could help you much more if you could please post an actual, non-confidential (sanitized) example of one of the data-file lines that you are speaking of. If you "don't know where the characters are coming from," the first order of business is to find out! It is unfortunately very easy to write a program that, in a sincere effort to "avoid crashing," actually conceals an underlying problem with the data (source). Let's get to the bottom of this. Let us help you.

      Lots of replies about the special character. The original data is as follows: (Copied before file was imported in MS-Access) I will also add this isn't for production, just wanting to learn, so in this case I do not care about the character. My guess is the file sent to me came from Excel which is nortorious for adding unwanted characters. Or replacing a "-" with a character.

      My text file before import, in MS-Access, and after export looks like

      101,Thomasville 104,Tuscaloosa 105,Opelika 108,Tucker 110,Jacksonville 111,Lakeland 112,Miami 113,Villa Rica 114,Lafayette 115,New Orleans 118,Houston 120,Jamestown 124,Lynchburg Bakery 126,Tyler 127,Phoenix 128,Tolleson 129,Fontana 130,Lenexa 133,Modesto 134,Henderson 136,Knoxville 140,Bradenton 147,DKB Bakery 149,Baton Rouge 152,El Paso Bakery 161,Oxford 165,Lepage - Park St 166,Lepage - Cedar St 167,Lepage - Brattleboro 170,San Antonio 180,Bailey Street 181,Goldsboro 184,Norfolk 187,Batesville 188,Denton 190,Leeland 191,Newton 192,Savannah 193,Bardstown 321,Alpine 355,Montgomery 423,Lynchburg DC 430,Bridgeton DC 440,Reynolds DC 447,DKB DC 452,El Paso DC 456,Sacramento 457,Hope Mills 488,Memphis

      I will give the regex a try.

      2018-01-05 Athanasius removed paragraph tags from data, and added code tags

        Please use <code> and </code> tags for your data samples. Otherwise it is almost impossible to read.

        Try creating the file directly with both ADO and ODBC connections.

        #!perl use strict; use DBI; use Text::CSV; use Devel::Peek; my $dir = 'C:/GBowl/strawberry32/Scripts/'; my $db_file = $dir.'Hello.accdb'; my $outfile = $dir.'Plantfile.txt'; my $dsn; $dsn = 'dbi:ADO:Provider=Microsoft.ACE.OLEDB.12.0;Data Source='; #$dsn = 'dbi:ODBC:driver=Microsoft Access Driver (*.mdb, *.accdb);dbq= +'; my $dbh = DBI->connect("$dsn$db_file", { RaiseError => 1 } ) or die $DBI::errstr; my $sql = ' SELECT Company_Code,Name FROM Plants ORDER BY Company_Code'; my $rs = $dbh->selectall_arrayref($sql); open my $out,'>',$outfile or die "Could not open $outfile : $!"; my $csv = Text::CSV->new ( { binary => 1 } ) or die Text::CSV->error_diag(); $csv->eol ("\n"); for (@$rs){ $csv->print($out,$_); my $line = join "\t",@$_; # check non-ascii if ($line =~ /[^\x09\x20-\x7E]/){ Dump $line; } } close $out;
        poj

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://1206611]
Approved by Corion
help
Chatterbox?
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others scrutinizing the Monastery: (5)
As of 2018-06-17 23:06 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    Should cpanminus be part of the standard Perl release?



    Results (107 votes). Check out past polls.

    Notices?