PerlMonks  

Issue manipulating data - problem with data file format

by maccas17 (Initiate)
on Jan 28, 2008 at 18:47 UTC ( [id://664745]=perlquestion )

maccas17 has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I really struggled and failed to create a meaningful title! Sorry.

I have written a crude Perl script to search through all the files in a directory for certain text strings and then collate these strings into one text file. The aim is to collect configuration information from a number of flat files.

If I create test files and put strings in them that will match, it works fine. I have successfully used it on one set of config files; however, another set does not work properly. I have added code to print out each of the lines from my array and the format of the data looks a bit weird. Hard to describe, but it looks like it is in a different font and with additional spaces. This is even though these are exactly the same type of config files.

I'm guessing that there is something wrong with these particular files. They were copied straight off the server via the network share (same as the set that works), so it's not as if I've FTP'd them as binary.

Can anyone shed any light as to why these files might be behaving like this?

Many Thanks.

Replies are listed 'Best First'.
Re: Issue manipulating data - problem with data file format
by samizdat (Vicar) on Jan 28, 2008 at 19:26 UTC
    Without seeing samples, it's a little difficult to do much for you. I'd suggest that you open them up in a binary editor which shows binary on one side and ASCII on the other. BVI and BED are Linux examples, but there are freeware and shareware ones available for That Other Nameless OS(TM) as well. What is probably happening is that some of your files are encoded in 16-bit Unicode characters (UTF-16), or maybe in Microsoft Rich Text Format, where formatting codes are injected into the text. Unless you specifically include the appropriate modules in your program and enable the switches in your regex matchers, you'll have this problem.

    So, first, determine what your encoding is, and then you'll be able to learn the next step or ask a more detailed question.
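One way to check for the 16-bit Unicode case is to look for a byte-order mark (BOM) at the start of the file. The following is a minimal sketch, not a complete detector: the helper names `guess_bom_encoding` and `sniff_file` are made up for illustration, and a file saved without a BOM simply comes back as undefined.

```perl
use strict;
use warnings;

# Hypothetical helper: guess an encoding from a leading byte-order mark.
# Returns undef when no BOM is present (plain ASCII, or BOM-less Unicode).
# Note: a UTF-32LE BOM (FF FE 00 00) would be reported as UTF-16LE here.
sub guess_bom_encoding {
    my ($bytes) = @_;
    return 'UTF-8'    if $bytes =~ /\A\xEF\xBB\xBF/;
    return 'UTF-16LE' if $bytes =~ /\A\xFF\xFE/;
    return 'UTF-16BE' if $bytes =~ /\A\xFE\xFF/;
    return undef;
}

# Hypothetical helper: read the first few raw bytes of a file and sniff them.
sub sniff_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;    # raw bytes; no CRLF translation on Windows
    my $head = '';
    read $fh, $head, 4;
    close $fh;
    return guess_bom_encoding($head);
}
```

If this reports UTF-16LE or UTF-16BE, reading the file through a matching PerlIO layer such as `:encoding(UTF-16)` should give the regexes ordinary characters to match against.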

    Don Wilde
    "There's more than one level to any answer."
      I really appreciate the responses from all three of you, especially given I didn't provide much info. What you are suggesting sounds plausible to me, and I will try to test it out when I have a chance. I have managed to bodge my way round it a bit with another very crude (I have much to learn!) script that reads all the files (using the type command) and then outputs each to a .new version. When I run my original script against the .new versions, it works fine.

      I do hope to get to the bottom of this in a more intelligent fashion, so I will try out the suggestions.
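The same kind of conversion can be sketched in Perl itself with the core Encode module. This assumes the problem files really are UTF-16 with a byte-order mark, which has not been confirmed in this thread; the helper names `utf16_to_utf8` and `convert_file` are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: decode UTF-16 bytes (a BOM is required so Encode can
# pick the byte order) and re-encode the text as UTF-8.
sub utf16_to_utf8 {
    my ($bytes) = @_;
    return encode('UTF-8', decode('UTF-16', $bytes));
}

# Hypothetical helper: convert one file, writing the result to "$path.new".
sub convert_file {
    my ($path) = @_;
    open my $in, '<', $path or die "Cannot open $path: $!";
    binmode $in;    # raw bytes in
    my $bytes = do { local $/; <$in> };    # slurp the whole file
    close $in;

    open my $out, '>', "$path.new" or die "Cannot write $path.new: $!";
    binmode $out;   # raw bytes out
    print {$out} utf16_to_utf8($bytes);
    close $out or die "Cannot close $path.new: $!";
    return "$path.new";
}
```

For files that contain only ASCII characters, the UTF-8 output is byte-for-byte plain ASCII, so the original search script should be able to read the .new files unchanged.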

      OS: Win XP
      Perl v5.8.8

        We all learn from being better able to explain. Glad to help. :D

        Don Wilde
        "There's more than one level to any answer."
Re: Issue manipulating data - problem with data file format
by samtregar (Abbot) on Jan 28, 2008 at 19:25 UTC
    Hard to describe, but it looks like it is in a different font and with additional spaces.

    You're not giving us a lot to go on, but this sounds like what some terminals do when they're presented with non-ASCII data. I'd look at the data in a hex editor to confirm (I like Emacs hexl-mode personally).
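If no hex editor is to hand, a few lines of Perl can show the first bytes of a file in hex. This is only a rough sketch; `hex_pairs` and `dump_head` are hypothetical names chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: render raw bytes as space-separated hex pairs.
sub hex_pairs {
    my ($bytes) = @_;
    return join ' ', unpack('(H2)*', $bytes);
}

# Hypothetical helper: hex pairs for the first $count bytes of a file.
sub dump_head {
    my ($path, $count) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;    # raw bytes; no CRLF translation on Windows
    my $head = '';
    read $fh, $head, $count;
    close $fh;
    return hex_pairs($head);
}
```

For example, `hex_pairs("hi")` gives `68 69`, while the same text in little-endian 16-bit Unicode with a byte-order mark would show as `ff fe 68 00 69 00`. Interleaved `00` bytes like these are the sort of thing that tends to show up as extra spaces on screen.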

    If you want a better answer you'll need to show us some code and some sample data. Not too much, please, just enough to demonstrate the problem. Also, some info on your environment wouldn't hurt - OS, Perl version, etc.

    -sam

Re: Issue manipulating data - problem with data file format
by WoodyWeaver (Monk) on Jan 28, 2008 at 19:20 UTC
    Not light but a clue perhaps. UTF8?
