PerlMonks  

-T changed behavior

by tothestars (Initiate)
on May 22, 2018 at 06:06 UTC ( [id://1215024]=perlquestion )

tothestars has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I am not a Perl programmer, but I have a Perl issue that needs to be resolved and would greatly appreciate your help.

Here’s the case:

perl -le 'print "Version $] : ", -T "spk_pre_180208_180715_180419_bpo12_p.bsp" ? "text" : "not text"'

Linux:
Version 5.010001 : not text
Version 5.014002 : not text
Version 5.016003 : not text
Version 5.026001 : text

Somewhere (between version 5.20.3 and 5.22.4) there is a Perl patch causing a change in -T's behavior. A file that was previously evaluated as Binary is now evaluated as text.

Any insight?

Thanks so much for your help!

2018-05-22 Athanasius added code and paragraph tags

Replies are listed 'Best First'.
Re: -T changed behavior
by kcott (Archbishop) on May 22, 2018 at 07:15 UTC

    G'day tothestars,

    Welcome to the Monastery.

    [Please put code, data, program output, and similar text within <code>...</code> tags. This has a number of benefits: your posts will be easier to read; you won't need to escape special HTML characters (e.g. turn '<' into '&lt;'); it provides a download facility so we can get a verbatim copy of your code.]

    You can find the Perl deltas in the Miscellaneous section of perldoc. Looking in perl5220delta: Selected Bug Fixes, there's a reference to '-T' that may be what you're looking for. There's another in the Performance Enhancements section but that doesn't look relevant to me. If those don't help, try other deltas in the version range you specified.

    You might also want to try:

    $ file spk_pre_180208_180715_180419_bpo12_p.bsp

    How does the output from that compare with what '-T' is telling you?

    I don't have any of those Perl versions handy so I can't do any direct testing for you.

    — Ken

      I think you might be right about that entry in perl5220delta. Just for reference, I think that the corresponding commit is f13c8ddbf, which references this P5P thread.

      Of course, we really need tothestars to show an SSCCE; then it'd even be possible to run a git bisect to narrow it down to the exact commit. But using the above information, it might be something related to misdetection of UTF-8.

      tothestars: Note that -T is only a heuristic test. For example, the documentation has always said (emphasis mine): "The first block or so of the file is examined". If you want any kind of certainty, you should use a more reliable method; what's best here depends on what you're trying to achieve with this test.
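
A self-contained test case along the lines requested above might look like the following. It is only a sketch: `probe_bytes` is a hypothetical helper, and the sample byte strings are placeholders that would need to be replaced with bytes copied from the start of the real file. Running the same script under each Perl version shows whether the result differs for a given input.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write the given raw bytes to a temporary file and report what -T says.
# probe_bytes is a hypothetical helper name, not part of any module.
sub probe_bytes {
    my ($bytes) = @_;
    my ($fh, $path) = tempfile(UNLINK => 1);
    binmode $fh;                       # write the bytes exactly as given
    print {$fh} $bytes;
    close $fh or die "close failed: $!";
    my $is_text = -T $path;
    return $is_text ? 'text' : 'not text';
}

# Placeholder inputs (assumptions); substitute bytes from the real file.
my @samples = (
    [ 'plain ASCII'      => "hello world\n" x 20 ],
    [ 'contains NUL'     => "hello\0world\n" x 20 ],
    [ 'UTF-8, non-ASCII' => "caf\xC3\xA9\n" x 20 ],
);

printf "Version %s\n", $];
printf "%-18s : %s\n", $_->[0], probe_bytes($_->[1]) for @samples;
```

Posting the script together with its output from both Perl versions gives others something they can run and bisect against.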

Re: -T changed behavior
by LanX (Saint) on May 22, 2018 at 12:56 UTC
    For those wondering, here are the docs for -T:

    • -T File is an ASCII or UTF-8 text file (heuristic guess).

    • -B File is a "binary" file (opposite of -T).

    ...

    The -T and -B tests work as follows. The first block or so of the file is examined to see if it is valid UTF-8 that includes non-ASCII characters. If so, it's a -T file. Otherwise, that same portion of the file is examined for odd characters such as strange control codes or characters with the high bit set. If more than a third of the characters are strange, it's a -B file; otherwise it's a -T file. Also, any file containing a zero byte in the examined portion is considered a binary file. (If executed within the scope of a use locale which includes LC_CTYPE, odd characters are anything that isn't a printable nor space in the current locale.) If -T or -B is used on a filehandle, the current IO buffer is examined rather than the first block. Both -T and -B return true on an empty file, or a file at EOF when testing a filehandle. Because you have to read a file to do the -T test, on most occasions you want to use a -f against the file first, as in next unless -f $file && -T $file.
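
Read literally, the rules quoted above can be approximated in pure Perl for experimenting with sample bytes. This is only a sketch and not perl's internal implementation: `looks_like_text` is a hypothetical helper, the set of control characters treated as harmless is an assumption, and the locale case and multi-byte characters cut off at the end of the buffer are ignored.

```perl
use strict;
use warnings;

# Approximate the documented -T rules for a string of raw bytes.
# Returns 1 for "text" and 0 for "binary". Assumption-laden sketch only.
sub looks_like_text {
    my ($buf) = @_;
    return 1 if $buf eq '';              # empty input counts as text
    return 0 if $buf =~ /\0/;            # any zero byte means binary

    # Valid UTF-8 that includes non-ASCII characters counts as text.
    if ($buf =~ /[^\x00-\x7F]/) {
        my $copy = $buf;                 # utf8::decode works in place
        return 1 if utf8::decode($copy); # false if not well-formed UTF-8
    }

    # Count "odd" bytes: anything other than printable ASCII, common
    # whitespace, backspace, and escape (this exact set is an assumption).
    my $odd = () = $buf =~ /[^\t\n\r\f\x08\x1B\x20-\x7E]/g;

    # More than a third odd bytes means binary.
    return $odd * 3 > length($buf) ? 0 : 1;
}

for my $sample ("plain text\n", "bin\0ary", "caf\xC3\xA9\n") {
    printf "%-6s %s\n",
        (looks_like_text($sample) ? 'text' : 'binary'),
        unpack('H*', $sample);
}
```

Feeding it the first block of the problem file and comparing the answer with what each Perl version reports may hint at which rule is responsible for the difference.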

    Consequently, I would check whether the new Perl installation has a different locale setting.
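
One way to compare locale settings between the two installations is to print them from each perl. This is a minimal sketch; `locale_report` is a hypothetical helper name. Note that the documentation quoted above says the locale only affects -T inside the scope of `use locale`, so this mainly helps to rule the locale in or out.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);

# Collect the settings that could influence locale-aware character classes.
sub locale_report {
    # With no second argument, setlocale only queries the current setting.
    my $current = setlocale(LC_CTYPE);
    my %report = (
        perl_version => $],
        lc_ctype     => defined $current ? $current : '(unknown)',
    );
    # Environment variables that normally select the LC_CTYPE locale.
    for my $name (qw(LC_ALL LC_CTYPE LANG)) {
        $report{"env_$name"} = defined $ENV{$name} ? $ENV{$name} : '(unset)';
    }
    return \%report;
}

my $report = locale_report();
printf "%-13s : %s\n", $_, $report->{$_} for sort keys %$report;
```

Running this with each perl binary on the same machine and diffing the output shows whether the environments actually differ.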

    Like others said, give us an example input to reproduce the problem.

    Cheers Rolf
    (addicted to the Perl Programming Language and ☆☆☆☆ :)
    Wikisyntax for the Monastery

Re: -T changed behavior
by Marshall (Canon) on May 22, 2018 at 10:07 UTC
    Can you show the first 1K bytes of:
    Version 5.016003 : not text
    Version 5.026001 : text
    Update: The reason for this is that the -T "text" file test is a heuristic ("experimentally derived rule of thumb"). The first part of the file is read and a determination is made from that.
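
To share the first 1K bytes of a binary file in a forum post, a hex dump is usually the safest form. The following is a minimal sketch; `head_hexdump` is a hypothetical helper, and the 1024-byte default simply mirrors the request above rather than the exact amount that -T examines.

```perl
use strict;
use warnings;

# Return a hex dump (16 bytes per row) of the first $limit bytes of $path.
sub head_hexdump {
    my ($path, $limit) = @_;
    $limit = 1024 unless defined $limit;

    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $got = read($fh, my $buf, $limit);
    defined $got or die "Cannot read $path: $!";
    close $fh;

    my @rows;
    for (my $off = 0; $off < length $buf; $off += 16) {
        my $chunk = substr $buf, $off, 16;
        my $hex   = join ' ', unpack '(H2)*', $chunk;   # two hex digits per byte
        (my $ascii = $chunk) =~ tr/\x20-\x7E/./c;       # mask non-printable bytes
        push @rows, sprintf '%08x  %-47s  |%s|', $off, $hex, $ascii;
    }
    return join "\n", @rows;
}

# Usage: perl headdump.pl FILE
print head_hexdump($ARGV[0]), "\n" if @ARGV;
```

The output can be pasted inside code tags, which lets others rebuild the exact bytes and rerun the -T test themselves.
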
Re: -T changed behavior
by Anonymous Monk on May 22, 2018 at 08:02 UTC
    show an example file?
