PerlMonks  

Of course, Windows applications make a lot of assumptions about files based on their filenames (especially their extensions).
Yep, that's pretty dumb.
You may want to check file type by investigating the contents, not the filename.
That makes some sense, but not a lot, because it's pretty hard to do.
The first two or four bytes of most files are often a very good clue as to the file's type. These bytes are usually referred to as a "magic number."
<rant>
Well, guessing the type of a file's content from its first two bytes (or rather, the first couple of bytes; /etc/magic allows for variable formats) is not much smarter than using the file name. Sure, you are free to choose your filename, but who takes advantage of that? No one stores GIF images in files ending in .pl, and if you put your C program in a file called "fuddly-bumps.html", chances are your compiler will not take you seriously and will refuse to compile your program. Not to mention that the classical Unix build program, make, depends entirely on filenames to build its targets.
And yes, another advantage of magic numbers is that you don't need a filename to make a guess. But the disadvantage is that the magic number gets in the way. That's not much of a problem for binary formats that are processed purely by programs, but it's annoying and error-prone for anything edited by humans. Furthermore, it is still uncontrolled guesswork (just like file extensions). Anyone can invent a magic number, whether it's already in use or not; there's no official way of keeping track, of making sure there are no collisions, etc. Here's a small example of the dumbness of magic numbers:
The Netpbm project uses several (related) file formats. The magic numbers are "P1", "P2", "P3", "P4", "P5" and "P6". Looks simple. Looks extensible as well, doesn't it? If more formats are needed, just continue the numbering: "P7", "P8", "P9", "P10". Right? No. If you start a file with a P followed by a 1, then regardless of what follows, file thinks it's a "Netpbm PBM image text", even if it's a simple text file that starts with the sentence "P100s of Samsung are really cool phones".
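To make the collision concrete, here is a minimal Python sketch of a two-byte magic lookup. It is not how file is actually implemented. The table, the type labels other than "Netpbm PBM image text", and the names guess_type and guess_type_strict are all made up for illustration. The strict variant assumes that a Netpbm magic number must be followed by whitespace, which is one possible way to tighten the guess.

```python
# Hypothetical magic table for illustration only. The six prefixes come
# from the post; every label except the P1 one is a made-up description.
NETPBM_MAGIC = {
    b"P1": "Netpbm PBM image text",
    b"P2": "Netpbm PGM image text",
    b"P3": "Netpbm PPM image text",
    b"P4": "Netpbm PBM image raw",
    b"P5": "Netpbm PGM image raw",
    b"P6": "Netpbm PPM image raw",
}


def guess_type(data: bytes) -> str:
    """Guess from the first two bytes only, ignoring whatever follows."""
    return NETPBM_MAGIC.get(data[:2], "unknown")


def guess_type_strict(data: bytes) -> str:
    """Same lookup, but also require whitespace right after the magic.

    Assumption: the format wants whitespace after the magic number.
    This narrows the guess; it does not make it certain.
    """
    if data[2:3].isspace():
        return NETPBM_MAGIC.get(data[:2], "unknown")
    return "unknown"


if __name__ == "__main__":
    text = b"P100s of Samsung are really cool phones"
    image = b"P1\n2 2\n0 1\n1 0\n"
    for sample in (text, image):
        print(sample[:12], "|", guess_type(sample), "|", guess_type_strict(sample))
```

The naive lookup labels the Samsung sentence as a PBM image, and a future "P10" format could never be told apart from "P1". The stricter check rejects that sentence, but a text file starting with "P1 is my favourite" would still fool it, so it remains guesswork.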
My point is that magic numbers suck as badly as file extensions. Both magic numbers and file extensions work reasonably well in practice because people follow de facto standards. Windows uses file extensions almost exclusively. Unix (and by that, I mostly mean Unix tools) relies on both. Some tools use magic numbers. Some tools use file extensions. Some use both.
</rant>

Abigail


In reply to Re: Getting File Type using Regular Expressions by Abigail-II
in thread Getting File Type using Regular Expressions by bkiahg
