http://www.perlmonks.org?node_id=482488


in reply to Re: Algorithm for cancelling common factors between two lists of multiplicands
in thread Algorithm for cancelling common factors between two lists of multiplicands

It doesn't produce the output you wanted ...

As I mentioned in 482012, the exact reduction isn't important. The results of your 3 manual steps (33,5)(2,4,23) and my 5 manual steps (33,5)(8,23) are completely equivalent.

Adding a dependancy upon Inline::C is heavier than Math::Pari, though GCD could be written in perl of course.

Your algorithm certainly works. However, using GCD requires multiple, O(m*n) passes over the datasets. hv's prime factors approach eliminates the multiple passes and seems, on the small sets I've tried, to get the results more quickly, even with pure perl factoring code.

Coming up with an efficient, pure perl implementation of prime factorisation (limiting myself to integers 0 .. 2**32 ) is my current fun challenge :)


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.

Replies are listed 'Best First'.
Re^3: Algorithm for cancelling common factors between two lists of multiplicands
by Limbic~Region (Chancellor) on Aug 10, 2005 at 12:30 UTC
    BrowserUk,
    I misinterpreted "Other reductions possible" to read "Further reductions possible" given the example and where the commentary appeared. Admittedly I haven't read the entire thread and was under the incorrect impression for any given input there was only 1 acceptable output. I was also under the incorrect impression that you were not interested in prime factorization but what would be the result if the problem of "cancelling out" had been attacked by hand.

    I believe the approach I presented could be made more efficient though I doubt it would be able to compete with a good prime factorization algorithm. I am familiar with several, some may have even been used in this thread, such as the quadratic sieve - but I don't currently have the time to write a perl implementation. IIRC numbers smaller than a certain size are best done with 1 algorithm and larger than a different number with a different algorithm with the numbers falling in the middle being a toss up.

    If I do come up with anything I will be sure to let you know. Thanks for the fun problem.

    Cheers - L~R

Re^3: Algorithm for cancelling common factors between two lists of multiplicands
by sk (Curate) on Aug 10, 2005 at 07:56 UTC
    BrowserUk,

    I would be curious to see the results if you ever benchmark the GCD approach with the prime factor approach on the problem size you were mentioning

    The GCD Approach is O(m*n)*O(GCDalgo) but I am not sure how the complexity reduces for the prime factorization approach.

    You need to calculate the prime factor each of the product value right? i.e. if you have to do 100*101*102*103*104/57*58*59 then you need factor for each of the values correct?

    The GCD approach will loop (5 * 3)*O(GCDalgo) times. But prime factor approach needs to calc factors for 8 values and that could potentially loop a lot more per value (max upto the value) and the complexity will be more than the GCD approach.

    I ran the GCD approach and took about 9 minutes (not a formal benchmark) just to cancel out terms. Don't have any math package to do big multiplications/divisions.

    I even tried to reduce the inner loop by switching @a and @b but did not see a huge impact

    Also, i am confused whether the Haskell code actually computed the example (40000+!) because the execution speed was just unbelievable. Did it loose any precision when you tested it? I guess the approach was similar in canceling out terms. It makes me wonder why Perl canceling out takes so much longer than Haskell implementation

    will be happy to hear on how your final implementation look.

    Thanks,

    SK

      These are the results I have currently for the inputs [989 9400 43300 2400]:

      tmoertel's haskell implementation of the cancelling code ( in 2.6 seconds):

      [11:20:43.54] C:\ghc\ghc-6.4\code>FET < ex1.dat +8.070604647867604097576877675243109941729476950993498765160880e-7030 [11:20:46.34] C:\ghc\ghc-6.4\code>

      Using Math::Pari (in 26 ms):

      P:\test>MP-FET.pl 8.070604647867604097576877668E-7030 1 trial of _default ( 24.817ms total), 24.817ms/trial

      Using Math::BigInt (Yes. That 4 hrs 38 minutes!)

      P:\test>MBF-FET.pl 8.070604647867604097576877675243109941729e-7030 1 trial of _default ( 16,699s total), 16,699s/trial

      The whole point of the cancelling code is that you do not have to calculate the huge factorials in infinite precison (slow) math, because you reduce the factorials to products of sets of much smaller, prime factors and then cancel most of those.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
      "Science is about questioning the status quo. Questioning authority".
      The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.
        Thanks for the update

        I cannot read Haskell so I was not sure how the factoring was done. Going through the example it seems like tmoertel was just factoring like terms. If the code does further factoring (school days division) then I am surprised that it can loop through that fast!

        If you take a look at my scratchpad I tried ikegami's logic on gcd. I added a condition to check for values == 1 to save an extra GCD. I also did not have the Math module he was using so picked up another one.

        This one takes about 9 minutes. It would be interesting to see what will happen if we apply BigInt from here onwards!

        I am also curious to see the runtime of the prime factor approach. I shall try hv's code to see how fast it runs. If I remember correctly Euclidean algo for GCD is O(n+log n) and prime factorization is NP-complete. For small values, prime fact is a breeze but the complexity does not change so i shall let you know when i get around testing the runtime for it!

        cheers SK

        Can you post your Math::BigInt code and test case? I would like to play around with some other ideas.

        Update: Ooops, I see it now, thanks.

        -QM
        --
        Quantum Mechanics: The dreams stuff is made of

        The last two digits of the Math::Pari implementation's result seem to be off:
        +8.070604647867604097576877675243109941729476950993498765160880e-7030 8.070604647867604097576877675243109941729e-7030 8.070604647867604097576877668E-7030 <-- Math::Pari ^^
        Can you increase the implementation's precision? Or is that the limit?

        Cheers,
        Tom

      Looking at my copy of The Art of Computer Programming: Seminumerical Methods, 3E, Knuth gives the worst case as:
      O(GCD) ~= 4.785*log(N) + 0.6723
      where N is the smaller argument to Euclid's algorithm. The average is much better:
      Avg(GCD) ~= 0.1775*N
      Update: hv pointed out that the approximation for the mean didn't look right. I went back to the source, and see that I pulled out an intermediate result (I have to read more closely in the future). So it's not the mean.

      The mean is given as

      Avg(GCD) ~= 1.9405*log10(N)
      where N is the smaller argument to Euclid's algorithm, and ignoring a subtracted correction term based on the divisors of N.

      -QM
      --
      Quantum Mechanics: The dreams stuff is made of

Re^3: Algorithm for cancelling common factors between two lists of multiplicands
by QM (Parson) on Aug 10, 2005 at 17:51 UTC
    Coming up with an efficient, pure perl implementation of prime factorisation (limiting myself to integers 0 .. 2**32 ) is my current fun challenge :)
    You might have a look at Math::Big::Factors, which I found useful, and is pure Perl. (Note that the docs have a typo, and mention factor_wheel where the function is actually named factors_wheel.)

    -QM
    --
    Quantum Mechanics: The dreams stuff is made of

      Without actually trying it, Math::Big::Factors says:

      For very small numbers the computation of the wheel of order 7 may actually take longer than the factorization, but anything that has more than 10 digits will usually benefit from order 7.

      As I'm limiting myself to 2**32, it probably won't help?


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
      "Science is about questioning the status quo. Questioning authority".
      The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.
        Here's a benchmark for Math::Big::Factors wheels of orders 5-7:
        Rate 7 6 5 7 34.1/s -- -95% -100% 6 627/s 1739% -- -94% 5 9770/s 28548% 1458% --
        and for factors_wheel of numbers near 32 bits, using an order 7 wheel:
        s/iter 4294967291 4294967293 4294967295 4294967291 4.45 -- -61% -71% 4294967293 1.74 156% -- -25% 4294967295 1.31 241% 33% --
        Note that 4294967291 is prime, 4294967293 has 2 large factors, and 4294967295 has 5 factors, the largest of which is 2**16+1.

        For 4294967291, and wheel orders 5-7:

        s/iter 7 5 6 7 4.44 -- -16% -21% 5 3.71 19% -- -6% 6 3.49 27% 6% --
        In all fairness, I should probably mention that I've customized my own version of Math::Big::Factors to Memoize results where possible, and to reduce the calls to Math::BigInt::new() for constants.

        Note that factors_wheel creates a new wheel every time (what a shame), instead of requiring a wheel reference be passed in. Caching wheel goes a little way in correcting this, without changing the interface.

        You might also consider avoiding the use of Math::BigInt where possible, as you don't need numbers that big.

        Now, back to the question at hand. For numbers near 2**32, factoring a prime number seems to take 150 times longer than creating the order 7 wheel. Creating the order 6 wheel is considerably faster, and so is the factoring based on that wheel.

        I'm not sure where the breakpoint is for order 7 wheels. A casual search hints that it is very large.

        -QM
        --
        Quantum Mechanics: The dreams stuff is made of