devnul has asked for the
wisdom of the Perl Monks concerning the following question:
Hello,
I find myself with a very tough problem involving lease calculations and was wondering if anyone might be able to help me figure out a solution here.
Basically I am getting a dump of data from an "old big clunky EBCDIC system". I've obviously converted this to something more ASCIIlike and have removed all the nonfloatingpoint pieces of data from each record.
The reason I've done this is because the system it comes from can only return "payoff information" for single leases entered by an operator and my client needs to know the payoff amount for every lease in the system (well over 200,000  which would take an operator forever to accomplish), and this dump may provide the information to determine this. Also, being a proprietary system, this is about the only financial data I am able to extract from the system.
The issue I have is that I don't know fields are what  only that by adding and substracting some combination of them, with a particular lease, I will arrive at a figure of $843.24.
I can't quite get my head around how to go about reversing engineering this formula, or even if this mountain can be moved.
Any help would, as always, be greatly appreciate.
 Devnul
Re: Reverse engineering a formula... by Skeeve (Vicar) on Jun 13, 2006 at 10:21 UTC 
 [reply] [d/l] [select] 

I was trying to keep my question "theoretical". But here's the numbers if you really want them:
$VAR1 = [
'18.930167',
'17.967469',
'0.008720',
'0.008720',
'122.640000',
'12493.320000',
'359.520000',
'288.700000',
'359.520000',
'89.880000',
'32.960000',
'56.920000',
'13.470000',
'1231.360000',
'20587.440000',
'359.520000',
'792.170000',
'629.160000',
'972.290000',
];
 Devnul  [reply] [d/l] 

 [reply] [d/l] [select] 





An initial price of 12,493.32
A first payment of 1,231.36
+ 23 monthly payements of 843.24
=====================================
Total cost
( tax relief) 20,587.44
Daily interest rate 0.008720
Annual Equivalent rate 17.967469
Annual effective rate
(inc. one off payments) 18.930167
Other numbers include oneoff admin charges and daily/weekly effective costs?
Examine what is said, not who speaks  Silence betokens consent  Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco.  Rule 1 has a caveat!  Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
 [reply] [d/l] 

Well, look at it this way:
 You have 19 floats
 Each float may:
 Need to be added (multiply by 1)
 Need to be subtracted (multiply by 1)
 Need to be ignored (multiply by 0)
This means there are 3 to the power 19 different ways to combine these numbers... That's a lot of possible combinations to try, but it can be done I suppose... You'd probably want to write a recursive algorithm to try all these permutations, and see if the rounded total matches the number you specified...
Update: Possible solution code posted below
 [reply] 
Re: Reverse engineering a formula... by BrowserUk (Pope) on Jun 13, 2006 at 10:50 UTC 
Do you only have the one set of known data? As with any form of decoding, the more examples you have, the easier (and more reliable), it becomes.
Examine what is said, not who speaks  Silence betokens consent  Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco.  Rule 1 has a caveat!  Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
 [reply] [d/l] 

I can lookup other leases, if I am able to find a formula that works with this sequence of numbers.
 Devnul
 [reply] 

 [reply] 
Re: Reverse engineering a formula... by Moron (Curate) on Jun 13, 2006 at 11:57 UTC 
I remember when at school thinking that the intelligence tests of the form "what is the next number in the sequence" only served to prove that the questioners weren't intelligent enough to be testing other people's intelligence.
The reason is that for example, the next number in the sequence 1,2,3,4,5, ? can be anything you like (even 7 x phi is as good an answer as any). Worse still, there are an uncountably infinite number of different formulas that can "justify" the sixth number being 7 x phi. You also need an infinite number of examples to overcome that problem  even if you knew all 200000 answers, they wouldn't help you with the 200001th for the same reason.
Your only recourse is to interview whoever has the business knowledge behind the formula.
 [reply] 

I can argue that, given more precise question, a sort of tasks about sequences like "1,2,3,4,5,?" makes perfect sence.
If you're referring to set of functions such that
f(1) = 1
f(2) = 2
f(3) = 3
f(4) = 4
f(5) = 5
and then calculate f(6) then indeed there surely exists infinite number of different f's conforming these conditions and yet f(6) equals to 7 x phi.
But when you limit answers to functions having only natural numbers, of special kind (let that be polynoms with integral coefficients, with the largest power *no* more than three, then one and only one function fulfills the condition, and this is f(x)=x, and the correct answer is 6.)
Like everywhere else in mathematics, very many things depend on exact phrasing.
addition but I agree that OP has very badly phrased task
update 2+1 small typos  [reply] [d/l] 

Consider the case of y = x + r*cos(pi*x) where r is any real, which produces a sine wiggle around a straight line, where the deviation from the straight line is maximum for halfvalues of y, but for integer values produces the illusion of y=x.
Update oic, you want to limit the types of functions  but without the business knowledge how can you do that? We know it's leasing, but without knowing the business, how do you know it doesn't have a risk component linked to a normal distribution based on the customer's age? That wouldn't be polynomial.
More update: yes fletch is right, s/cos/sin/  my brain did the equivalent of a double negative when I was imagining this.
 [reply] 




With apologies for the thread drift:
Worse still, there are an uncountably infinite number of different formulas that can "justify" the sixth number being 7 x phi.
Surely a "number of different formulas" is an integer, and therefore countable?
 [reply] 

You have a countable and finite number of intersections between a candidate function and the sample data, but the candidates may vary across real space in between the intersections, making them uncountable in number.
 [reply] 

What you say is certainly true in the abstract, but real systems, whether they be physical or financial, tend to follow relatively simple (albeit noisy) models. But I don't think the OP's system is even a real system; I think it's likely to be a very simple model of a system, probably involving exponential growth.
But all of this is guesswork. As others have said, the OP would probably get a great deal more help if only he would supply more data.
 [reply] 
Re: Reverse engineering a formula... by roboticus (Chancellor) on Jun 13, 2006 at 12:09 UTC 
devnul:
I'm not a financial analyst, but it seems to me that if you google payoff, amortization, net present value, etc., you'll get some standard formulae you can use as a starting point. Then you can use the hack and slash method, by playing with proportionality constants, etc., until you can match up the numbers, or you can also google for "curve fitting" and give it a try.
That could be a lot of work, however. Have you considered writing a program to simulate the operator and making your perl script do the keyentry to get the data you need out of the existing machine?
roboticus  [reply] 

roboticus speaks wisely.
Many, many years ago, in the era of Windows 3.1 and Token Ring networks, I had a temp job at the local power company. I was given the task to transfer data from their mainframe application into a database running on the Windows network. As with the original question here, there were several thousand records to transfer and the mainframe application would only display one at a time.
I have no idea how many hours (days!) they expected me to spend doing data entry on this task, nor how they intended to handle the errors that would surely have resulted, but it didn't prove that easy for them to keep me busy... I quickly taught the Windows macro recorder a sequence of keystrokes to copy and paste the data, then read a magazine for about an hour (getting many dirty looks from passersby) and reported the task complete.
And on that day, I learned both the wisdom of simulating a human operator to carry out repetitive tasks and the joy of suffering the reactions of others to one who truly "works smarter, not harder".
 [reply] 
Re: Reverse engineering a formula... by devnul (Monk) on Jun 13, 2006 at 12:38 UTC 
Thanks everyone!
The answers (and, indeed, the questions you've asked) have helped direct my thinking in the right area on this, which is exactly what I was looking for.
Thank you, again....
 Devnul  [reply] 
Re: Reverse engineering a formula... by Gilimanjaro (Hermit) on Jun 13, 2006 at 12:54 UTC 
The following should work; I haven't had the guts to run it on your real data yet, but my minitestset seems to result in an answer...
#!/usr/bin/perl
=cut
my @t = (
'18.930167',
'17.967469',
'0.008720',
'0.008720',
'122.640000',
'12493.320000',
'359.520000',
'288.700000',
'359.520000',
'89.880000',
'32.960000',
'56.920000',
'13.470000',
'1231.360000',
'20587.440000',
'359.520000',
'792.170000',
'629.160000',
'972.290000',
);
my $target = 843.24;
=cut
my @t = (100,2,32,4,65);
my $target = 7;
my @terms = _test($target,@t);
for my $n (0..$#terms) {
next unless $terms[$n];
print (($terms[$n]<0)?'subtract':'add');
print " value number ",$n+1," (",$t[$n],")";
print "\n";
}
print "to obtain total of $target\n";
sub _test {
my ($target,$number,@rest) = @_;
for my $term (0,$number,$number) {
if(@rest==0) {
next unless $target == sprintf('%.2f',$sum+$te
+rm);
return ($term/$number)
}
my @terms = _test(sprintf('%.2f',$target$term),@rest)
+;
next unless @terms;
return (($term/$number),@terms);
}
return;
}
It's a recursive algorithm that will eventually try every possibility until it finds a working set of terms... It will probably take a long time with the real dataset, so you'll want to eliminate as many terms as possible beforehand.
 [reply] [d/l] 
Re: Reverse engineering a formula... by samizdat (Vicar) on Jun 13, 2006 at 13:51 UTC 
 [reply] 
Re: Reverse engineering a formula... by salva (Abbot) on Jun 13, 2006 at 15:34 UTC 
for a completelly different kind of solution you can try using a neuralnetwork package.
First you would need to generate some sample data, feeding some randomly distributed values into your "blackbox" and getting the output.
Then use the obtained set of input/outputs to train the neuralnetwork and once the error goes below your acceptable limit, you could use it to replace the original blackbox.
 [reply] 
Re: Reverse engineering a formula... by Scott7477 (Chaplain) on Jun 13, 2006 at 18:33 UTC 
If you could provide the values for each of the fields that are associated with the $843.24 payoff amount, that would be helpful. The values that you would see in a lease calculation
would typically include the original price of the item, the residual value, the interest rate, the lease term, and perhaps the monthly payment. The range of values for each of those value types can be constrained rather easily. Obviously, the interest rate is going to be a number somewhere between 5 and maybe 30 percent at the outside. The lease term will be either in months or years most likely, so you know that that is going to be a small integer.
Your problem here is one of a type that always makes my blood boil when I hear about them. Regardless of whether a system is proprietary or not, a user ought to be able to retrieve in a simple fashion every piece of data that has been put into the systems. So many of the big name ERP systems make it really easy to put a lot of data in, but then they want to charge you big bucks for "reporting tools" just so you can see what's actually in the system. Every single table should be able to be dumped as a text file or CSV file or something. That's my rant for the day:)  [reply] 
System of Equations by kbrint (Sexton) on Jun 13, 2006 at 20:40 UTC 
Is the equation the same for all data records? If so, then you can just take 19 separate entries and create a system of 19 variables and 19 equations. For example, the first equation would be:
a*18.930167 + b* 17.967469 + c* 0.008720+...+s* 972.290000 = 843.24
After you have 19 or so of those equations, you should be able to solve for a,b,c,d,...
Based on the info you provided, if you get numbers other than {1,0,1} then you know the equation isn't the same for all entries.  [reply] [d/l] [select] 

you are assuming a linear equation that even if it is what is described on the OP, it's very unusual for a lease formula and probably wrong.
 [reply] 
Re: Reverse engineering a formula... by pboin (Deacon) on Jun 13, 2006 at 20:52 UTC 
First of all, 'clunky old EBCDIC' systems still run most of the financial world. Companies like BankofAmerica, Amex, and Visa seem to think they're pretty OK.
You're very likely fighting the system, and not using it the way it's designed. OS/390 applications are very batchcentric, sort of similar to the way Unixy things are pipecentric. I can practically guarantee you that there is a batch interface to what you want to do. Gen one job, or one step per lease and let 'em rip. Google for 'IBM JCL' to get started.
Finally, make sure your managment (and you) understand that you're just making up the lease calc if you go that route. This is money you're dealing with, and money mistakes tend to piss people off in a way that nothing else does. Be *very* careful to have your work audited by a beancounter who knows what they're doing.
Good Luck.
 [reply] 
Re: Reverse engineering a formula... by bobf (Monsignor) on Jun 14, 2006 at 02:58 UTC 
Instead of trying to determine the formula that was used for a (hopefully) complete set of numbers with unknown meaning, why don't you turn the problem around?
Ask the client create a new dummy lease with known input data. Tweak one (and only one) of the parameters and create another new lease. Repeat...
Given enough dummy data (known inputs and outputs) you should very quickly be able to determine which numbers are meaningless, which are significant, how each number affects the final value, and which numbers in the dumped data you need. Add a little algebra to get the formula, and you're done. This approach could be quite a bit simpler than trying to bruteforce it. :)
 [reply] 

