PerlMonks  

I need help Badly....I'm very overwhelmed

by JPG73
on Oct 31, 2010 at 20:56 UTC (#868635=perlquestion)
JPG73 has asked for the wisdom of the Perl Monks concerning the following question:

I need to figure out a way to complete this coding and I am very lost. Please help. Here are the parameters.

As usual, a day in the life of a bioinformatician calls for making the large task small. We have data from the Unigene database for six different mammals, and the company you are working for needs data about tissue expression for a given gene "on Demand". Of course one could open the directory and look for the gene file, and then go through it to view the tissue expression, but this would become tedious very quickly. You are to write a program with command line flags. If you need to see how to use command line flags, see the solutions for assignment #2 and assignment #3.

The data for these six organisms can be found here: (DO NOT COPY THESE FILES!, DO NOT COPY THESE FILES). So if you are reading this do not copy these files, b/c they take up too much room on the server. Instead, create a variable and set it to the directory:

    my $uniGene = '/data/PROGRAMMING/assignment4';

Now you can access the data using this convention:

    my $infile = "$uniGene/$org/$gene"; # $org is the name of the organism we will search, $gene is the gene file.

Since we left off the trailing / of $uniGene, we can separate the two variables with a /. I always declare my directories in this fashion. Believe me, it will save you headaches later on. Take a look at one of the files (note the ending, which looks like something that should be coded in a variable):

    less /data/PROGRAMMING/assignment4/Homo_sapiens/TIMM9.unigene

Your program should have two command line flags. Name the FIRST FLAG 'host'; it tells the program which directory to look in for data. One thing this program should do is take common names as well as scientific names, so:

    Homo_sapiens or Homo sapiens or Human or Humans
    Bos_tarus or Bos tarus or Cow or Cows
    Equus_caballus or Equus caballus or Horse or Horses
    Mus_musculus or Mus musculus or Mouse or mice
    Ovis_aries or Ovis aries or Sheep or Sheeps
    Rattus_norvegicus or Rattus norvegicus or Rat or Rats

This will allow a little flexibility in the flag, but if the directory does not exist, the user should be warned (see subroutine 3, below) and shown the directories which do exist. Tell the user that the search is case sensitive. We will learn later how to do case-insensitive searches.

Name the SECOND FLAG 'gene'. It will take a gene name like 'PWRN1', 'ESF1', 'PVRL1', etc. This flag will be used to see if the gene exists in the given host directory; if it does, it will be used for the data; if not, tell the user it does not exist and exit (see subroutine 4).
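For orientation, here is a minimal sketch of how the two flags and the name lookup described above might be handled. It assumes Getopt::Long (the assignment does not say which module the earlier solutions used), and the names %names, %hostAlias, normalizeHost and parseFlags are hypothetical, not part of the assignment. The directory names and spellings are copied from the list above as given.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical table: directory name => other accepted spellings
# (copied from the assignment text; the lookup is case sensitive).
my %names = (
    Homo_sapiens      => [ 'Homo sapiens',      'Human', 'Humans' ],
    Bos_tarus         => [ 'Bos tarus',         'Cow',   'Cows'   ],
    Equus_caballus    => [ 'Equus caballus',    'Horse', 'Horses' ],
    Mus_musculus      => [ 'Mus musculus',      'Mouse', 'mice'   ],
    Ovis_aries        => [ 'Ovis aries',        'Sheep', 'Sheeps' ],
    Rattus_norvegicus => [ 'Rattus norvegicus', 'Rat',   'Rats'   ],
);

# Flatten into alias => directory so each lookup is a single hash access.
my %hostAlias;
for my $dir ( keys %names ) {
    $hostAlias{$_} = $dir for ( $dir, @{ $names{$dir} } );
}

# Hypothetical helper: returns the directory name, or undef if the
# given spelling is not one of the accepted names.
sub normalizeHost {
    my ($host) = @_;
    return $hostAlias{$host};
}

# Hypothetical helper: pulls --host and --gene out of an argument list.
sub parseFlags {
    my @args = @_;
    my ( $host, $gene );
    GetOptionsFromArray( \@args, 'host=s' => \$host, 'gene=s' => \$gene )
        or die "Usage: $0 --host <name> --gene <name>\n";
    return ( $host, $gene );
}

# Only act when flags were actually supplied on the command line.
if (@ARGV) {
    my ( $host, $gene ) = parseFlags(@ARGV);
    my $dir = defined $host ? normalizeHost($host) : undef;
    if ( defined $dir ) {
        print "Would search $dir for ", ( $gene // '(no gene given)' ), "\n";
    }
    else {
        print "Unknown host; the search is case sensitive\n";
    }
}
```

Called as, say, `perl script.pl --host Human --gene TGM1`, this would resolve Human to the Homo_sapiens directory. The warning and directory listing for an unknown host (subroutine 3) are left out here.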
We will begin to modularize code, so if I tell you to write a subroutine, and you do not write a subroutine for part of the code, you will lose 5 points each time you fail to write a subroutine! Also, name the subroutine exactly how I name it. Finally, conform to the way I have you write the subroutine; follow the outline for the subroutines. You should have a total of four subroutines for this assignment. Feel free to write additional subroutines if you feel it will help.

Subroutines

1). Write a subroutine (call it getGeneData, called in scalar context) that receives two arguments: 1). A gene name. 2). A host name. This subroutine opens the file for the host and gene, extracts the list of tissues in which this gene is expressed and returns a reference to a sorted array of the tissues. Remember at this point the directory has been checked to make sure it exists, so you don't have to worry about it failing at this point, but you should still use the proper file opening check! Process the file line by line. Hint: In order to get the tissue(s), use this:

    if(/^EXPRESS\s+(.*)/){
        my $tissues = $1;
    }

Don't worry right now about what's happening with this code. It's a regular expression, and we capture what's in parentheses, which then gets placed in $1. Do understand that the scalar $tissues now contains all the tissues. You should know how to get those into an array and then subsequently sort the array in alphabetical order.

2). Write another subroutine (call it printOutput, called in void context) which receives three arguments: 1). An array reference which was returned from getGeneData. 2). The gene name searched. 3). The host name given at the CLI. This subroutine should print the tissue expression data for the gene. The output should have the format seen below (OUTPUT FORMAT).

3). Write another subroutine (call it directoriesWhichExist, called in void context) which receives 0 arguments. If the user asks for a directory that does not exist, this subroutine is called, and prints out the directories which do exist, like we see above. If this subroutine is called, it exits the program.

4). Write the last subroutine (call it isValidGeneName, called in void context) which receives two arguments. 1). A gene name. 2). A host name. This subroutine will check to make sure the given gene name exists; if it does it returns a 1, else it returns a 0. You should then use this subroutine as follows:

    if ( isValidGeneName($geneName, $host) ){
        print "Found Gene Name for $host\n";
    }
    else{
        print "This Gene Name does not exists for $host, exiting now\n";
        exit;
    }

This is a very useful programming convention b/c the subroutine can be used in decision statements, like we did above.

OUTPUT FORMAT: (example is for the Human TGM1 gene):

    In Homo sapiens, There are 41 tissues that TGM1 is expressed in:
    1. adipose tissue
    2. adult
    3. bladder
    4. bladder carcinoma
    5. brain
    6. breast (mammary gland) tumor
    7. cervical tumor
    8. cervix
    9. colorectal tumor
    10. embryoid body
    11. embryonic tissue
    12. esophageal tumor
    13. esophagus
    14. eye
    15. fetus
    16. germ cell tumor
    17. head and neck tumor
    18. intestine
    19. kidney
    20. kidney tumor
    21. larynx
    22. lung
    23. mammary gland
    24. mouth
    25. muscle
    26. neonate
    27. non-neoplasia
    28. normal
    29. ovarian tumor
    30. ovary
    31. pancreas
    32. pancreatic tumor
    33. pharynx
    34. placenta
    35. skin
    36. skin tumor
    37. thymus
    38. trachea
    39. umbilical cord
    40. uterine tumor
    41. uterus

Any help would be amazing and please, I know this is a lot, but I am staring at a blank black page with no idea how to start. If you reply in a private message that would be helpful too. Thank you, Joe
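As a starting point only, here is a minimal sketch of subroutines 1 and 2 from the assignment text above. It rests on assumptions the post does not confirm: that tissues on the EXPRESS line are separated by '|', that every gene file ends in .unigene (as in the TIMM9 example path), and that the output prints one numbered tissue per line. The $suffix variable is a hypothetical name; $uniGene, getGeneData and printOutput come from the assignment.

```perl
use strict;
use warnings;

our $uniGene = '/data/PROGRAMMING/assignment4';   # path given in the assignment
my  $suffix  = '.unigene';                        # assumed file ending

# Returns a reference to a sorted array of tissues for the gene/host pair.
sub getGeneData {
    my ( $gene, $host ) = @_;
    my $infile = "$uniGene/$host/$gene$suffix";
    open my $fh, '<', $infile or die "Cannot open $infile: $!\n";

    my @tissues;
    while ( my $line = <$fh> ) {
        if ( $line =~ /^EXPRESS\s+(.*)/ ) {
            my $tissues = $1;
            # Assumption: '|' separates tissues; trim the padding around each.
            for my $t ( split /\|/, $tissues ) {
                $t =~ s/^\s+|\s+$//g;
                push @tissues, $t if length $t;
            }
        }
    }
    close $fh;
    return [ sort @tissues ];    # default sort is plain string order
}

# Prints the header line and a numbered list of tissues.
sub printOutput {
    my ( $tissues, $gene, $host ) = @_;
    ( my $name = $host ) =~ tr/_/ /;    # Homo_sapiens -> Homo sapiens
    printf "In %s, There are %d tissues that %s is expressed in:\n",
        $name, scalar @$tissues, $gene;
    my $n = 1;
    printf "%d. %s\n", $n++, $_ for @$tissues;
}
```

A call would then look like `printOutput( getGeneData( 'TGM1', 'Homo_sapiens' ), 'TGM1', 'Homo_sapiens' );`. The default sort compares strings byte by byte, which matches alphabetical order only while the tissue names share the same letter case.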

Re: I need help Badly....I'm very overwhelmed
by talexb (Canon) on Oct 31, 2010 at 21:24 UTC

      I need to figure out a way to complete this coding and I am very lost. Please help. Here are the parameters.

      As usual, a day in the life of a bioinformatician calls for making the large task small. We have data from the Unigene database for six different mammals, and the company you are working for needs data about tissue expression for a given gene "on Demand". Of course one could open the directory and look for the gene file, and then go through it to view the tissue expression, but this would become tedious very quickly.

      You are to write a program with command line flags. If you need to see how to use command line flags, see the solutions for assignment #2 and assignment #3. The data for these six organisms can be found here: (DO NOT COPY THESE FILES!, DO NOT COPY THESE FILES). So if you are reading this do not copy these files, b/c they take up too much room on the server.

    Hmm .. this looks a lot like a homework assignment.

    How far have you got on this? We'll be glad to help you if you're stuck on an issue with Perl syntax, but we can't and won't do your homework for you.

    Alex / talexb / Toronto

    "Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds

Re: I need help Badly....I'm very overwhelmed
by toolic (Chancellor) on Oct 31, 2010 at 21:31 UTC
      I am a bioinformatician and that does look like a homework assignment to me too. I'd love to be given a brief that specific. I too am more than happy to help but I'm not altruistic enough to try and wade my way through that giant post. If you would care to summarise it for me I'll help.
        Joe, you need to learn the etiquette of forum use. People in forums won't do your homework and they certainly won't read giant pieces of text like that. Could you not have taken the trouble to edit it? For example, we do not have to know the 41 different types of tissue to understand the problem.
Re: I need help Badly....I'm very overwhelmed
by Anonymous Monk on Nov 01, 2010 at 01:03 UTC
    The reason you and my wife are "overwhelmed" is that you TALK TOO MUCH!!! If I wasted my time reading your dense post I would be overwhelmed too. --brevity is the soul of wit
      And the ability to break a problem into solvable parts is the soul of engineering. The writer of the original post obviously completely lacks this skill, and that is the reason such HOMEWORK assignments are assigned. To JPG73: find another line of study ... journalism maybe ... plagiarism and lack of critical thinking ability seem to be acceptable in that field, if not encouraged. :P
Re: I need help Badly....I'm very overwhelmed
by mjscott2702 (Pilgrim) on Nov 01, 2010 at 09:11 UTC
    Hmm, I echo the sentiments of the other responders here. Might I suggest you check out: http://www.catb.org/~esr/faqs/smart-questions.html - little bit lengthy (ironically), but gives some excellent advice on how to ask a question, and maybe even get somebody to do your homework for you.
