Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?
 
PerlMonks  

Waiting ...

by PerlingTheUK (Hermit)
on May 31, 2005 at 18:54 UTC ( [id://462168]=perlmeditation: print w/replies, xml ) Need Help??

I am one of those who have never written a meditation. I have never thought about a reason to write one.
Right now, I feel like I really should ask why other people meditate.
I for example have spent all day messing with Excel. Someone gave me a 66MByte Excel file with almost 200 sheets, seriously asking me to look up values copy and paste some of these tables into other tables, where he had some surely clever but nevertheless completely non-understandable formulas, then take some other XML-File - preferably copy it into Excel, then manually search some entries, okay I guess you got the point.
Maybe you are not wondering anymore why I learned Perl. However, my first thought was to automate this copying and pasting. But I did not make my homework and expecting that I could easily open a 66 Megabyte file in Spreadsheet::ParseExcel or just edit files and not create new ones proved to be a bit naive.
So I found myself rushing home at lunch, getting my old VBA Excel book, which until today I considered the biggest waste in money during all my university years and got myself into OLE.
I did not believe, it worked. Just let me mention that even now my VBA Excel book is the biggest waste of money. I did find one or two question as to how to format strings in there but I realized I googled that sort of result up much faster anyway.
So now I have mastered this little script, ripping an Excel file into something useful, say a textfile. But I should have expected that if a human somewhere in some dark office of some dark company building in the middle of London, has a system that gets data out of a database into Excel, which surely is wrong anyway, this person does not trust the data. Therefore reviews it. Okay okay, that sounds harmless, but it is not. Some random values bottom line, top line, somewhere between those two were manually reformatted.
Sure, by checking if a time is still a time after I converted it to text, or controlling dates that way I can find out if this date is right. And thank you my dear unnamed friend also for formatting the grid of 5 of these two hundred odd sheets, at least that does not cause me any problems. More so however does it cause me problems that 5 sheets do not have a title row - Did I mention that same values where in different columns in different sheets, to make things not too simple? Such an easy assumption, match the title against a pattern, get a your_column my_column hash and so one, but no.
Okay that is where I got up two: I can now identify 4 different time and 5 different date formats, I am looking up to 70 rows deep into the worksheets to identify of a time is Arrival or delay, because headers and many entries are missing,...
And yet I still expect to find some new formats that get my files to be unusable.
And this is where Excel is doing the best job. to get Each value in this file, takes 2 hours, so parallel to writing my text files. I have some functions that will alert me as soon as values seem to be wrong. Hopefully this time, I can fix them afterwards in the text file.
And this is where this circle closes. I am writing this Meditation because I have nothing else to do but a deadline to keep and have to make sure everything was converted right and start matching tomorrow.
I am curious to hear if other people have similar experiences? It is sometimes frustrating but at least then there is Perl ;) and the Monks.

Cheers,
PerlingTheUK

Replies are listed 'Best First'.
Be wary of the dark side
by mugwumpjism (Hermit) on May 31, 2005 at 21:28 UTC

    A Jedi relies not only on his lightsaber, but can quickly adapt to strange and unusual fighting situations with weapons at hand that are more specialised for the task.

    Normally it is recommended that young jedi accolytes meditate deeply on the concept of computing, clearing their mind of software that is of commercial nature. So, naturally you will find fewer solutions to problems that are entrenched deep in the dark side of the software world.

    Your large spreadsheet may be easily solved via VB and OLE for now, but perhaps if you considered making nice ways to import new spreadsheets to the database, and write new spreadsheets from information in the database, then the next problem that you encounter manipulating this data will be more easily tackled.

    On the other hand, sometimes it is a better idea to simply use these tools to achieve small tasks, and put them down though you know that they are not as flexible and maintainable as a fully Free Software engineered solution. If the company you are working for has a culture of passing such things around, then you might be better off making sure that you make as good relations with your collegues on an interpersonal level as you can, use Perl for the small places it fits in such an enterprise, then wait for the right time, and move on to a company that fits your clear love of Free Software more fully.

    Fighting for Free Software is good, but the conditions and culture must be present for it to flourish. You have learned a valuable lesson, young jedi.

    $h=$ENV{HOME};my@q=split/\n\n/,`cat $h/.quotes`;$s="$h/." ."signature";$t=`cat $s`;print$t,"\n",$q[rand($#q)],"\n";
Re: Waiting ...
by rinceWind (Monsignor) on May 31, 2005 at 23:17 UTC

    Whilst using Spreadsheet::ParseExcel is an option, there are other options, such as opening the input data with Excel and saving the original spreadsheet as a .csv and parsing that.

    A text editor can help with basic, simple changes, such as extra heading rows and superfluous rows at the bottom. But you can use the power of Perl to really munge the data; Perl is fast at reading large text files, building hashes, etc.

    I hope I haven't missed the point of what you are doing

    --
    Missquoting Janis Joplin:


    Oh Lord, won’t you burn me a Knoppix CD ?
    My friends all rate Windows, I must disagree.
    Your powers of persuasion will set them all free,
    So oh Lord, won’t you burn me a Knoppix CD ?

      Splitting out to .csv files is a good idea, but it does have some issues.  First, a .csv can only save out one sheet at a time; manually saving out 200+ sheets would be mind-numbingly painful (hopefully VB would/should be able to automate this task).  Second, the resulting 200+ files are coming close to the maximum number of open file handles a 32 bit Perl can handle (depending on how many derivative files are created it could be uncomfortably close).  The number of file handles becomes an issue because it sounds like the OP is doing some massive cross-referencing between sheets (meaning most, if not all, of the .csv files would need to be open).  Seconding a previous post, this file should have been entered into a database long ago...

        That is exactly what I did. All sheets had one of two distinctive formats and I read them and joined them in two csv files.
        Manually doing this however was not an option as this manual data processing is exactly what is so mind numbing and what I hate.

        Cheers,
        PerlingTheUK
      You exactly got the point. I had this Excel spread sheet and had to do some clever data matching. Just reading excel is to slow so I wanted to get it into a scv format. Only 2Gig of memory are not sufficient to open a 66MByte Excel File in Spredsheet::ParseExcel, So i did this directly via Win32::OLE, which opens Excel to communicate directly. However, communicating and formatting 12MByte (Actual Text data) took 2 long hours to parse.
      It somehow just strikes me that is quite slow and in CSV file data rates equals 13.6kbps. Why do I think Office is a big pile of s+$%^?

      Cheers,
      PerlingTheUK
        It somehow just strikes me that is quite slow and in CSV file data rates equals 13.6kbps. Why do I think Office is a big pile of s+$%^?

        I wonder if OpenOffice Calc would fare any better, as it can read and write .xls format.

        --

        Oh Lord, won’t you burn me a Knoppix CD ?
        My friends all rate Windows, I must disagree.
        Your powers of persuasion will set them all free,
        So oh Lord, won’t you burn me a Knoppix CD ?
        (Missquoting Janis Joplin)

Re: Waiting ...
by jonadab (Parson) on Jun 01, 2005 at 11:23 UTC
    Someone gave me a 66MByte Excel file with almost 200 sheets, seriously asking me to look up values copy and paste some of these tables into other tables, where he had some surely clever but nevertheless completely non-understandable formulas, then take some other XML-File - preferably copy it into Excel, then manually search some entries, okay I guess you got the point.

    Are you really a data-entry lackey, or are you getting paid a programmer's wages to copy and paste stuff from one place in a spreadsheet to another? (I guess I shouldn't ask why the formulas couldn't refer to the correct cells in the first place...)

    I read a story once (in Reader's Digest, I think) about a finish carpenter in the cabinetry business, who showed up at a home where he was supposed to resurface the kitchen counter, but when he got there, nobody was home and the counter was covered in dirty dishes. So he washed the dishes, dried them, and put them away. A coworker asked him why he did that... "That's not in your job description. You could have just left and gone back another day." But he said, "Hey, if she wants to pay $60/hour to have her dishes washed, that's fine with me."

      Don't worry my wage is quite reasonable. But all this sort of things comes with the tremendous freedom of working a tiny company. Basically when it comes to how to do software design decisions, you get to do ut your way. If you understand it as a challenge and actually build something yourself it is a damn reward to see it grow.
      Another Advantage of a small company is that you get to do all sorts of work and as there is no one to do the real monkey work you truely get to got on site just to take photos and are still paid good pounds an hour.

      Cheers,
      PerlingTheUK
Re: Waiting ...
by abhinavvaid (Acolyte) on Jun 01, 2005 at 07:52 UTC
    I have been using perl along with OLE for quite some time now. And trust me, you are on the right track. The only trick in saving the time is getting the right object name/accessing its methods/properties etc. Though Microsoft does provide OLEviewer/registry. But I do not find it quite helpful. As you need to search and end up spending a lot of time. But I did find "winspy" and "windowse" quite useful for the said task. Not only excel, it works quite well in all the Office appls that include word/PPT/MSOutlook etc. Not only MS, I have used it in automating other tasks as well(like buiding and testing installers)etc.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlmeditation [id://462168]
Approved by gmax
Front-paged by hsmyers
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others lurking in the Monastery: (3)
As of 2024-04-19 21:19 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found