Please provide a hint for me to continue with the rest of my program
by pooyan (Initiate) on Apr 23, 2013 at 20:37 UTC
pooyan has asked for the wisdom of the Perl Monks concerning the following question:
Hello Perl Monks,
I hope you are doing very well.
This is a typical Perl question that recruiters pass to candidates, and I still cannot answer it despite having seen it 3 times. At this point I am determined to solve the problem correctly and learn from it. Please give me some hints, just hints, as to how best to approach the solution.
I am going to post what I know and what I don't know first, and then post the test instructions and the associated DB.pm module; I think this order works better:
I know how to use the subroutines from the Perl module DB.pm, and I know that I need to connect to the database with dbConnect and run queries with query. I have created my own MySQL database in XAMPP with a sample table based on the code. It is a one-column table holding email addresses, and the task is to use a Perl program to update another table that holds a count per domain.
How do I process and output a count without using complex SQL queries/subroutines? (The test instructions posted below say to use Perl for the processing.) I can use a select * to get an array of all e-mails, go through each record and extract the domain to get an array of domains, and then sort that array. I think I can then go through each record, save it in a temp variable (say temp = "yahoo.com"), and put it in a hash whose key is the domain and whose value is the count. On the next pass, I check whether the next record is equal to temp (so if it is yahoo.com again, I update the hash by incrementing the value stored under that key, which holds the count). Finally, I would use another loop with insert statements to write the hash into the domain-count table in the DB. How correct is this approach?
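A minimal sketch of the counting step described above, with assumptions stated up front: the list of addresses is hard-coded sample data standing in for whatever the select returns, and count_domains is a hypothetical helper name, not something provided by DB.pm. Lower-casing the domain and skipping rows without an "@" are also assumptions about how the data should be treated.

```perl
use strict;
use warnings;

# Hypothetical sample rows standing in for the result of the select.
my @emails = (
    'a@yahoo.com', 'b@gmail.com', 'c@Yahoo.com',
    'not-an-address', 'd@gmail.com', 'e@yahoo.com',
);

# count_domains is a hypothetical helper (not part of DB.pm).
# It returns a hash mapping each domain to the number of addresses seen.
sub count_domains {
    my @addresses = @_;
    my %count;
    for my $address (@addresses) {
        # Capture everything after the last '@'; skip rows that have none.
        next unless $address =~ /\@([^\@\s]+)\s*$/;
        # Assumption: domains are compared case-insensitively.
        $count{ lc $1 }++;
    }
    return %count;
}

my %count = count_domains(@emails);
print "$_: $count{$_}\n" for sort keys %count;
```

In this sketch the hash does the grouping in a single pass, so the list does not have to be sorted first and no temp variable is needed to compare neighbouring records.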
Another problem I have is that it says to find the daily count. Since there is no date column in the original table, I am not sure how to bring up only the emails added on a particular day.
For the top 50, I would just sort my hash by count in descending order, restrict my loop to 50 passes, and print the values.
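A rough sketch of the top-50 step, assuming the counts already sit in a hash keyed by domain. The sample counts and the helper name top_domains are hypothetical, and breaking ties alphabetically is an assumption made here only to keep the output stable.

```perl
use strict;
use warnings;

# Hypothetical counts, as a counting pass might have produced them.
my %count = ('yahoo.com' => 3, 'gmail.com' => 5, 'example.org' => 1);

# top_domains is a hypothetical helper: it returns up to $limit domain
# names, highest count first, with ties broken alphabetically.
sub top_domains {
    my ($limit, %count) = @_;
    my @sorted = sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count;
    # Never slice past the end when there are fewer domains than $limit.
    $limit = @sorted if $limit > @sorted;
    return @sorted[0 .. $limit - 1];
}

printf "%-20s %d\n", $_, $count{$_} for top_domains(50, %count);
```

Sorting the keys by their values and slicing the result avoids a hand-written loop counter, and it still behaves sensibly when the table holds fewer than 50 distinct domains.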
Here is the DB.pm