A bad guy can embed malicious code into ANY file format, even plain text. The real question is: Will it be executed? If the bad guy can make the victim use an application that does not properly check the file it reads, the malicious code will probably be executed. Microsoft has been taught this lesson several times, and still there are exploits based on maliciously modified files. This problem is not limited to Microsoft, but Microsoft's software is often the largest target.
So, what can you do to prevent this from happening to your code?
Simple: DO NOT TRUST YOUR INPUT.
Validate all input. Treat it as malicious until you can mathematically prove that every single byte of input is correct and not malicious. Refuse to work with input that does not pass the validation. Do not try to auto-correct invalid input. Perl's taint mode can be helpful here, but it is only an automatic tool. It can't prevent all attacks, simply because it is limited to a few critical functions. This is better than what many other languages offer, but it is NOT FOOLPROOF.
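To illustrate "refuse, do not auto-correct", here is a minimal sketch in Python. The rule itself (a user name of 3 to 16 lowercase ASCII letters, digits or underscores) and the names `USERNAME_RE`, `InvalidInput` and `validate_username` are assumptions made up for this example, not anything from a real library. The point is only that the whole input must match an explicit whitelist, and anything else is rejected unchanged rather than trimmed, lowercased or otherwise "repaired".

```python
import re

# Hypothetical rule for this sketch: 3 to 16 characters,
# each one a lowercase ASCII letter, a digit, or an underscore.
USERNAME_RE = re.compile(r"[a-z0-9_]{3,16}")


class InvalidInput(ValueError):
    """Raised when input fails validation. The input is never repaired."""


def validate_username(raw):
    # fullmatch() requires the ENTIRE string to match the whitelist,
    # so a trailing newline or leading space is a rejection, not a trim.
    if not isinstance(raw, str) or USERNAME_RE.fullmatch(raw) is None:
        raise InvalidInput("input rejected")
    return raw


if __name__ == "__main__":
    for candidate in ["alice_01", "Alice", " alice", "alice\n"]:
        try:
            validate_username(candidate)
            print(repr(candidate), "accepted")
        except InvalidInput:
            print(repr(candidate), "rejected")
```

Note that the function never tries to guess what the sender "really meant". Real formats need far more than one regular expression, but the accept-or-refuse structure stays the same.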
Passing malicious, unvalidated input to another tool (Image::Magick or any other module or external application) to "make it safe" does not work unless the tool is EXPLICITLY designed to validate the input as described above.
A simple file type detection tool (like file(1)) may be a first step towards validation, but you need to be aware that those tools only test a few bytes of the input to detect the file type. They DO NOT VALIDATE the entire file. For example, the test for the GIF file format just reads the first six bytes and compares them with "GIF87a" and "GIF89a".
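A small Python sketch of why a magic-number check is not validation. `looks_like_gif` is a hypothetical helper that mimics the kind of six-byte comparison described above; it is not the actual implementation of file(1). The "payload" is just harmless stand-in text.

```python
# The two GIF signatures mentioned above.
GIF_MAGICS = (b"GIF87a", b"GIF89a")


def looks_like_gif(data):
    # Only the first six bytes are inspected; the rest is never read.
    return data[:6] in GIF_MAGICS


# Six correct bytes followed by something that is clearly not image data.
fake_gif = b"GIF89a" + b"this is not image data at all"

if __name__ == "__main__":
    print(looks_like_gif(fake_gif))       # the detector is satisfied
    print(looks_like_gif(b"not a gif"))   # wrong signature
```

The detector answers "is this probably meant to be a GIF?", not "is this a well-formed GIF?". Anything after the signature passes unseen.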
What can you do to prevent attacks on the computers of your users?
You cannot control them. You don't even know what software they will use. And it is not your responsibility to protect them.
If we talk about a controlled corporate network, things are a little bit different, and your main job should be to educate your users -- the usual drill: Do not open unknown attachments, do not open unrequested attachments, do not open or execute files of unknown origin, and so on. You should also make sure that all software on all systems is regularly updated. You should remove software whose author repeatedly fails to fix security bugs within a short time. Even if the author is Microsoft or Apple.
Firewalls are of little use against this kind of attack. They are great for separating the malicious internet from a protected network. But you need to open the firewall to access servers on the other side of the firewall, and there the problem begins: If the firewall inspects the content, it needs to know the exact data format to validate it. All firewalls I've seen just scan for known attack patterns, like a virus scanner. That obviously cannot work for new attacks. And it is not the job of the firewall. The application needs to validate its input, not the firewall. There are even attacks that target the content scanner of the firewall.
Virus scanners do not help, for the same reasons. They can only test for known patterns and known behaviours. So, they MUST FAIL for new attacks with new patterns or new behaviours.
I've bypassed several content filters (combined with virus scanners) simply by zipping a problematic file and sending a hex dump of the ZIP file as plain text instead of the original file. I've even prefixed the hex dump with a small Perl script that automatically converted the hex dump back into the ZIP file. The last incarnation was a Perl script that automatically generated the self-unpacking hex-dump-Perl-script text file. I was even prepared to change the hex dump into a format that looked like plain English text.
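The hex dump trick is easy to reproduce. The following Python sketch is not the original Perl script. It assumes a deliberately naive filter (`naive_filter_blocks`, a made-up function) that only looks for the ZIP local file header signature, and it uses a harmless text file as content. Real filters are more elaborate, but the principle of matching known patterns is the same.

```python
import binascii
import io
import zipfile

# Signature at the start of a ZIP local file header.
ZIP_SIGNATURE = b"PK\x03\x04"


def make_zip(name, content):
    # Build a small ZIP archive entirely in memory.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(name, content)
    return buf.getvalue()


def naive_filter_blocks(data):
    # Hypothetical pattern-based filter: blocks anything containing the signature.
    return ZIP_SIGNATURE in data


original = make_zip("report.txt", b"harmless stand-in content")
hex_text = binascii.hexlify(original)    # only the characters 0-9 and a-f
restored = binascii.unhexlify(hex_text)  # what the receiving script would do

if __name__ == "__main__":
    print("filter blocks ZIP:     ", naive_filter_blocks(original))
    print("filter blocks hex dump:", naive_filter_blocks(hex_text))
    print("round trip intact:     ", restored == original)
```

The hex dump contains nothing but digits and the letters a to f, so the signature bytes simply are not there to be found, yet the receiver gets back exactly the same archive.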
I've seen malware coming through content filters and virus scanners while at the same time harmless and useful files were blocked. So, no, I do not trust content filters and virus scanners.
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)