PerlMonks  

Re^4: Using Net::SMTP to send email attachments

by shmem (Chancellor)
on May 01, 2017 at 13:29 UTC


in reply to Re^3: Using Net::SMTP to send email attachments
in thread Using Net::SMTP to send email attachments

The second attachment is empty, right? Remove the last boundary. It is followed by nothing, and thus produces an empty attachment. update: Instead, output the boundary with "--" (two hyphens) attached, as per afoken's advice below.

perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'
Re^5: Using Net::SMTP to send email attachments
by afoken (Chancellor) on May 01, 2017 at 14:36 UTC
    The second attachment is empty, right? Remove the last boundary. It is followed by nothing, and thus produces an empty attachment.

    Wrong way. When avoiding perfectly working modules, one should at least read the relevant RFCs, in this case RFC 2046. It clearly states:

    The boundary delimiter line following the last body part is a distinguished delimiter that indicates that no further body parts will follow. Such a delimiter line is identical to the previous delimiter lines, with the addition of two more hyphens after the boundary parameter value.

    (Chapter 5.1.1. "Common Syntax", page 20)

    So, the code line must be changed from

    $smtp->datasend("--$boundary\n");

    to

    $smtp->datasend("--$boundary--\n");
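A minimal sketch of how the delimiter lines fit together, assuming each part is already a complete MIME entity (headers, blank line, body). The helper name join_parts and the sample boundary are made up for illustration and are not taken from the original script:

```perl
use strict;
use warnings;

# Hypothetical helper: wrap already-formatted MIME parts in boundary lines.
# Every part gets an opening "--boundary" line; the closing delimiter with
# the two extra hyphens is written exactly once, after the last part.
sub join_parts {
    my ($boundary, @parts) = @_;
    my $out = '';
    for my $part (@parts) {
        $part .= "\n" unless $part =~ /\n\z/;   # a delimiter must start a new line
        $out .= "--$boundary\n" . $part;
    }
    return $out . "--$boundary--\n";
}

my $body = join_parts(
    'example=_boundary_0001',
    "Content-Type: text/plain\n\nHello\n",
    "Content-Type: text/plain\n\nWorld\n",
);
print $body;
```

The resulting string could then be handed to $smtp->datasend($body) in one go. With this layout there is no trailing "--$boundary" line followed by nothing, so no empty extra part appears.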

    Also, the boundary string should not appear anywhere else in the mail body. RFC 2046 states:

    NOTE: Because boundary delimiters must not appear in the body parts being encapsulated, a user agent must exercise care to choose a unique boundary parameter value. The boundary parameter value in the example above could have been the result of an algorithm designed to produce boundary delimiters with a very low probability of already existing in the data to be encapsulated without having to prescan the data. Alternate algorithms might result in more "readable" boundary delimiters for a recipient with an old user agent, but would require more attention to the possibility that the boundary delimiter might appear at the beginning of some line in the encapsulated part. The simplest boundary delimiter line possible is something like "---", with a closing boundary delimiter line of "-----".

    And RFC 2045 adds:

    Since the hyphen character ("-") may be represented as itself in the Quoted-Printable encoding, care must be taken, when encapsulating a quoted-printable encoded body inside one or more multipart entities, to ensure that the boundary delimiter does not appear anywhere in the encoded body. (A good strategy is to choose a boundary that includes a character sequence such as "=_" which can never appear in a quoted-printable body. See the definition of multipart messages in RFC 2046.)

    So, a "good" boundary string contains pseudo-random or hashed data and is not a single word.
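One way to follow both RFC hints could look like the sketch below. It assumes only core modules (Digest::MD5, Time::HiRes); the function names make_boundary and boundary_collides and the "----=_Part_" prefix are invented for this example:

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use Time::HiRes qw(gettimeofday);

my $counter = 0;

# Hypothetical generator: a fixed prefix containing "=_" plus a hash of
# time, process ID, a per-process counter and a random number.
sub make_boundary {
    my ($sec, $usec) = gettimeofday();
    return '----=_Part_' . md5_hex(join ':', $sec, $usec, $$, $counter++, rand());
}

# Returns the number of bodies in which a delimiter line for $boundary
# would start some line (0 means the boundary is safe to use).
sub boundary_collides {
    my ($boundary, @bodies) = @_;
    my $delim = quotemeta("--$boundary");
    my $hits  = grep { /^$delim/m } @bodies;
    return $hits;
}

my $boundary = make_boundary();
$boundary = make_boundary() while boundary_collides($boundary, "some body text\n");
print "$boundary\n";
```

The prescan is optional with a hashed boundary, as the RFC note says, but it is cheap. Keep in mind that RFC 2046 limits the boundary to 70 characters, and that a value containing "=" has to be quoted in the Content-Type header.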

    Some boundaries found in my inbox:

    Boundary string | User Agent | Comment
    ----=_NextPart_000_0013_01D2C28B.E3F62DD0 | Microsoft Outlook 14.0 | Several hyphens, an equal sign, and some hashed data
    ----=_Part_5917814_1894675906.1493199820931 | unknown, used by Amazon | Very similar to the above one
    ------------F7B57802990546C5ABB340ED | Thunderbird 45.8.0 on Windows | Several hyphens and a single hash
    B_3576043155_12903 | Microsoft-MacOutlook/14.7.3.170325 | No hyphens, decimal hash values
    b1_1513f22a95bb051912f3d082319cd009 | PHPMailer 5.2.14 | Again no hyphens, long hex hash value
    --_com.samsung.android.email_24350088718877150 | Unknown mail program running on a Samsung Android smartphone | "-_" (close to the "=_" recommended in RFC 2045), class name of the main program, and a decimal hash
    _005_17997d6269704c37af1f86b072d23dc6pollinexchangepollindel_ | Unknown, with traces of MS Exchange | Long hex hash value, plus the name of the outgoing mail server with all non-alphanumeric characters removed
    sA6u0I68pY2erg76iB=_hTGmQiLde4Zv1ORog | Mailer | Looks like two concatenated base64 strings, or perhaps just randomly chosen characters
    b1_f1e8d96630926a3eef6252dc28dfcf72 | Elaine 5.11 | Looks very similar to PHPMailer above
    Apple-Mail=_48151E9E-26D6-4966-B9BC-C015767CB661 | Apple Mail 2.3124 | Same idea as Samsung's Android app: application name and some random hex values

    MIME::Lite constructs a boundary string like this:

    # Generate a new boundary to use.
    # The unsupported $VANILLA is for test purposes only.
    sub gen_boundary {
        return ( "_----------=_" . ( $VANILLA ? '' : int(time) . $$ ) . $BCount++ );
    }

    (Ignore $VANILLA.) What you get is a fixed string, plus a few concatenated numbers (timestamp, process ID, and a simple counter). Overall, it should be quite unique, and it is unlikely to occur elsewhere in a message. No, this is not exciting, but it works.

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
      Wrong way. When avoiding perfectly working modules, one should at least read the relevant RFCs, in this case RFC2046. It clearly states:

      Right. Sometimes memory betrays me; but it is not me who is "avoiding perfectly working modules"...

      So, a "good" boundary string contains pseudo-random or hashed data and is not a single word.

      Elsewhere in this thread: when assembling a multipart mail "avoiding perfectly working modules", if only for the sake of providing an example, I construct the boundary as '==' . encode_base64( join('',gettimeofday), '') which doesn't qualify as a single word either, and should be fairly unique, too.
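Written out as a complete script, that construction might look like the sketch below. It assumes gettimeofday comes from Time::HiRes and encode_base64 from MIME::Base64; the Content-Type line is added here purely for illustration:

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);
use Time::HiRes qw(gettimeofday);

# In list context gettimeofday() returns (seconds, microseconds); joining
# them gives a digit string. The empty second argument to encode_base64
# suppresses the trailing newline.
my $boundary = '==' . encode_base64(join('', gettimeofday()), '');

# "=" may not appear unquoted in a header parameter, so quote the value.
my $header = qq{Content-Type: multipart/mixed; boundary="$boundary"};
print "$header\n";
```

Two boundaries generated within the same microsecond would be identical, so this is "fairly unique" rather than strictly unique, which is enough for a MIME delimiter.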

      perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'
        but it is not me who is "avoiding perfectly working modules"...

        Never said that, or at least I didn't intend to do so.

        if only for the sake of providing an example, I construct the boundary as '==' . encode_base64( join('',gettimeofday), '')

        Just for fun, I looked up PHPMailer and found https://github.com/PHPMailer/PHPMailer/blob/master/class.phpmailer.php. In createBody(), the boundary strings (PHPMailer seems to use up to three different ones) are generated like this:

        $this->uniqueid = $this->generateId();
        $this->boundary[1] = 'b1_' . $this->uniqueid;
        $this->boundary[2] = 'b2_' . $this->uniqueid;
        $this->boundary[3] = 'b3_' . $this->uniqueid;

        And generateId() is this:

        return md5(uniqid(time()));

        I've learned that PHP functions often have surprising behaviour and/or badly chosen names, so I looked up all three of them (md5, uniqid, and time).

        So, what happens is that time() returns the current time as an integer. That is used as a prefix for the uniqid() function. And of its return value, an MD5 hash is created and returned as a hex string. So, the three boundary strings used by PHPMailer are identical MD5 hashes except for the "b1_", "b2_", and "b3_" prefixes.
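For comparison, the same scheme can be approximated in Perl. This is only a sketch: uniqid_like is a made-up stand-in, and its output format (prefix, then seconds and microseconds in hex) is an assumption about what PHP's uniqid() produces, not something taken from the PHPMailer source:

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use Time::HiRes qw(gettimeofday);

# Rough stand-in for PHP's uniqid($prefix): the prefix followed by the
# current seconds and microseconds in hex. The exact format is an assumption.
sub uniqid_like {
    my ($prefix) = @_;
    my ($sec, $usec) = gettimeofday();
    return sprintf '%s%08x%05x', $prefix, $sec, $usec;
}

# Mirrors md5(uniqid(time())): one hash, shared by all three boundaries.
my $unique_id = md5_hex(uniqid_like(time));
my %boundary  = map { ($_ => "b${_}_$unique_id") } 1 .. 3;

print "$boundary{$_}\n" for 1 .. 3;
```

The three printed lines differ only in their "b1_", "b2_", "b3_" prefixes, matching the pattern seen in the PHPMailer and Elaine boundaries listed earlier in the thread.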


        I can't resist a little bit of documentation bashing, though.

        time() is documented as returning the current time measured in the number of seconds since the Unix Epoch (January 1 1970 00:00:00 GMT). First problem: Unix time is defined in UTC, not GMT. Second problem: not a word about leap seconds. Unix time completely ignores leap seconds; it is defined as the "number of non-leap seconds since the epoch". PHP does not mention ignoring leap seconds, so I could conclude that they are in fact counted. In that case I would expect PHP's time() function to return a number ending in 7 when called at any full minute, because 27 leap seconds have been inserted since the Unix epoch, as of today. Let's see:

        alex@wiki pts/0 13:42:00 /home/alex>php -r "echo time();"
        1493898120

        So, no, PHP's time() does NOT return the "number of seconds since the Unix Epoch (January 1 1970 00:00:00 GMT)". Instead, it returns what PHP documentation authors call the "current Unix timestamp", i.e. Unix time, NOT counting leap seconds.
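The same check can be done in Perl on the value printed by the php command above. A small sketch using only the core POSIX module:

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Timestamp printed by the php command above.
my $ts = 1493898120;

# Unix time treats every day as exactly 86400 seconds, so a timestamp
# taken at a full minute is always a multiple of 60.
printf "seconds past the minute: %d\n", $ts % 60;

# The same value as broken-down UTC time.
print strftime('%Y-%m-%d %H:%M:%S UTC', gmtime($ts)), "\n";
```

This prints 0 seconds past the minute and 2017-05-04 11:42:00 UTC. That would match the 13:42:00 in the shell prompt if that machine's clock runs two hours ahead of UTC, which is an assumption here.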

        This is the PHP implementation used on 64-bit Slackware 14.2, identifying itself as follows:

        PHP 5.6.30 (cli) (built: Feb 8 2017 21:28:32)
        Copyright (c) 1997-2016 The PHP Group
        Zend Engine v2.6.0, Copyright (c) 1998-2016 Zend Technologies
            with Zend OPcache v7.0.6-dev, Copyright (c) 1999-2016, by Zend Technologies

        uniqid is supposed to "Generate a unique ID". Its return value is documented to be a "timestamp based unique identifier as a string". Interesting. Someone seems to have solved the problem of creating really unique identifiers. Or so it seems. The documentation contains a warning:

        This function tries to create unique identifier, but it does not guarantee 100% uniqueness of return value.

        So, is the return value unique or not? A simple yes-or-no question, like "are you pregnant?" There is no "I'm 42.3% pregnant". You are, or you aren't. And uniqid does NOT return a unique ID.

        Another warning explains that adjusting the clock may be a problem, too:

        This function does not guarantee uniqueness of return value. Since most systems adjust system clock by NTP or like, system time is changed constantly. Therefore, it is possible that this function does not return unique ID for the process/thread.

        Why are process and thread mentioned in that warning? If a function returns a unique value, that value should not depend on the process or thread. Of course, if all that function does is mixing time, process ID and/or thread ID, the return value won't be unique when time is adjusted. But then, why would one call that function uniqid?
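The effect of a clock adjustment is easy to simulate. In the sketch below, time_based_id is a hypothetical function that mixes nothing but a clock reading and a process ID; the clock values are passed in explicitly so that a step backwards can be replayed:

```perl
use strict;
use warnings;

# Hypothetical ID built only from a clock reading and a process ID.
# The clock is a parameter so that an adjustment can be simulated.
sub time_based_id {
    my ($sec, $usec, $pid) = @_;
    return sprintf '%08x%05x-%d', $sec, $usec, $pid;
}

my $before = time_based_id(1493898120, 250_000, 4242);

# The clock is stepped back, and later the same reading occurs again
# in the same process.
my $after = time_based_id(1493898120, 250_000, 4242);

my $verdict = $before eq $after ? "duplicate: $before" : 'distinct';
print "$verdict\n";
```

Any ID that is a pure function of time and process or thread ID repeats as soon as its inputs repeat, which is exactly the situation the warning describes.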

        Then, parameters seem to have evolved over time. The first one is a prefix string. Why on earth would anyone do that? String concatenation can easily be done after calling the function, as in $id = $prefix . uniqid();. The intention is clear: Choose a unique prefix per machine, combine that with some magic function that returns unique IDs per machine, and you have a globally unique ID. Unfortunately, that does not work if the magic function is not magic at all but returns some garbage based on time and probably process ID and / or thread ID.

        The next parameter is a boolean called more_entropy that enables uniqid() to add additional entropy (using the combined linear congruential generator) at the end of the return value, which increases the likelihood that the result will be unique. It is the recommended way to fix the problem of adjusting time:

        Use more_entropy to increase likelihood of uniqueness.

        A CLCG is a pseudo-random number generator, which generates a fixed set of numbers that will repeat. No, it won't make the result unique. The result will be less likely to be non-unique, but it DOES NOT make the result unique. How much less likely depends on the CLCG implementation and its input. It might be good enough for non-crypto purposes.

        It might be fatal to use uniqid, at least without more_entropy set to true, to create a session ID, for the same reasons explained in Re^4: Randomness encountered with CGI Session and in Re^6: Randomness encountered with CGI Session. And, as explained in the latter posting, UUIDs and GUIDs are not guaranteed to be unique, but they are only very likely to be unique.


        PHPMailer passes only a prefix (the current time in seconds) to uniqid and leaves more_entropy off, so md5 is run on the current time in seconds concatenated with the current time in microseconds, possibly mixed with process and/or thread ID. It's not perfectly random, it can be predictable, and it does not matter at all. It's some more or less unique garbage that is unlikely to be contained in any MIME message.

        Alexander

        --
        Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
