 
PerlMonks  

please reply

by an (Initiate)
on Jan 06, 2013 at 09:58 UTC ( #1011866=perlquestion )
an has asked for the wisdom of the Perl Monks concerning the following question:

explain this code

    @bigrams=();
    while(<>){
        chomp;
        push @words,split(/\s+/,$_);
    }
    for($i=0;$i<$#words;$i++){
        $bigram=$words[$i].$words[$i+1];
        $found=-1;
        for($index=0;$index<=$#bigrams;$index++){
            {
                if($bigrams[$index] eq $bigram){
                    $found=$index;
                }
            }
        }
        if($found>-1){
            $bigramfrequency[$found]++;
        }
        else{
            push(@bigrams,$bigram);
            $bigramfrequency[$#bigrams]++;
        }
    }
    print"Bigrams\n";
    for($index=0;$index<@bigrams;$index++){
        print"$bigrams[$index] : $bigramfrequency[$index]\n";
    }

Re: please reply
by LanX (Canon) on Jan 06, 2013 at 10:02 UTC

      lolol That really was ripe for the picking. It would have been irresponsible not to go for it.

Re: please reply
by Anonymous Monk on Jan 06, 2013 at 10:17 UTC

      wanted to know why while loop is ended after splitting and why found is used

        For lack of the input data this program expects, it is impossible to say what this program is trying to achieve.

        It runs, but it does not seem to output anything meaningful, so it is quite possible that its logic is wrong.

        CountZero

        "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

        My blog: Imperial Deltronics
        I second CountZero's objection to the lack of input data.

        However, when I run your code, supplying a test file (multi-char strings, some of which satisfy either of two contradictory definitions of bigrams* and some non-bigram strings such as nobigram, each separated by 2 newlines), your code returns " : 1" for pairs of words -- each concatenated with the following word (except in the case of "nobigram" which is first concatenated with the preceding word and then with the following word).

        # output:
        Bigrams
        aabaaacc : 1
        ccnobigram : 1
        nobigramabcda : 1
        abcdabcdaaaa : 1
        bcdaaaanobigram : 1
        nobigrambbhbbbb : 1
        bbhbbbbbbb : 1
        bbbaba : 1
        abaaaaabbb : 1
        aaaabbbcccccccccc : 1
        cccccccccccbbbaaaccc : 1
        cbbbaaacccccbbaaccc : 1

        I cannot reproduce your "loop is ended after splitting" EXCEPT (and then not perfectly) by removing the doubled curly braces at Lns 9-10 and 22-23, in which case only a single instance of the final two words of sample data is returned (along with the " : 1"). (Update: That's not a correct count for either definition cited for a bigram)

        As to "why found is used," it seems possible, in the overly limited context you've provided, that it's intended to be a counter -- a variable in which to stash the number of bigrams found. I realize that seems excessively obvious, but, IMO, it's the only obvious possible answer.
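Reading the posted code itself, $found looks less like a counter and more like a search result: it starts at -1 and is overwritten with the position of a matching entry in @bigrams, after which the program either increments the count at that position or appends a new entry. The sketch below isolates that inner loop in a helper named find_index, which is a hypothetical name and not part of the original post.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the posted code): return the index of $wanted
# in @$list, or -1 if it is absent. This mirrors the inner for-loop of the post.
sub find_index {
    my ($list, $wanted) = @_;
    my $found = -1;                          # -1 means "not seen yet"
    for my $index (0 .. $#{$list}) {
        $found = $index if $list->[$index] eq $wanted;
    }
    return $found;
}

my @bigrams = ('the cat', 'cat sat');
print find_index(\@bigrams, 'cat sat'), "\n";   # prints 1: already stored
print find_index(\@bigrams, 'sat on'), "\n";    # prints -1: would be appended
```

A result greater than -1 corresponds to the branch that runs $bigramfrequency[$found]++ in the post; -1 corresponds to the else branch that pushes a new bigram.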

        In any case, if counting is your intent, please see http://search.cpan.org/~emorgan/Lingua-EN-Bigram-0.01/lib/Lingua/EN/Bigram.pm (or some fork for the language in which your interests lie).


        *Definitions vary:

        • Wikipedia says "A bigram or digram is every sequence of two adjacent elements in a string of tokens, which are typically letters, syllables, or words; they are n-grams for n=2."
           
          while
           
        • The Free OnLine Dictionary defines a bigram as a two-letter word (FOL is NOT, IMO, a reliable source, but Merriam-Webster and others define bigram only for those using paid access or their (one-shot) free trial).
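To make the first definition concrete, here is a minimal sketch that lists every pair of adjacent elements in a token list, once with words as tokens and once with letters. The helper name adjacent_pairs and the sample strings are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: every pair of adjacent elements in a token list,
# each pair returned as a two-element array reference.
sub adjacent_pairs {
    my @tokens = @_;
    return map { [ $tokens[$_], $tokens[$_ + 1] ] } 0 .. $#tokens - 1;
}

my @word_bigrams   = adjacent_pairs(split ' ', 'the cat sat');
my @letter_bigrams = adjacent_pairs(split //, 'cat');

print join(', ', map { join ' ', @$_ } @word_bigrams), "\n";    # the cat, cat sat
print join(', ', map { join '', @$_ } @letter_bigrams), "\n";   # ca, at
```

Under the second definition, a bigram would simply be a word of length two, which the posted code does not test for.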

        For clarity, here is the content (verbatim) of the text file:

        aabaaa cc nobigram abcda bcdaaaa nobigram bbhbbbb bbb aba aaaabbb cccccccccc cbbbaaaccc ccbbaaccc

        wanted to know why while loop is ended after splitting and why found is used

        :) Curious, because this isn't what you asked for :)

        :) but you can find out by understanding what each part of the program does, and inserting print/Data::Dumper parts at various points in the program

        If you limit the input while Dumper-ing it should be easier to follow
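One way to follow that advice, as a rough sketch: dump @words right after the read loop on a tiny input. In the posted code the while loop only collects words, and the bigram counting happens afterwards on the complete list, which is why the loop closes right after the split. The sub split_lines is a hypothetical stand-in for the while(<>) loop so that the example runs without anything on STDIN.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Indent = 1;    # compact, fixed indentation

# Hypothetical stand-in for the while(<>) loop of the post:
# chomp each line and split it on whitespace into one flat word list.
sub split_lines {
    my @lines = @_;
    my @words;
    for my $line (@lines) {
        chomp $line;
        push @words, split /\s+/, $line;
    }
    return @words;
}

# Keep the input small so the dump stays readable.
my @words = split_lines("aabaaa cc\n", "nobigram\n");
print Dumper(\@words);
```

The same print Dumper(...) line can be moved inside the for loop of the original program to watch @bigrams and @bigramfrequency grow.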

Re: please reply
by karlgoethebier (Curate) on Jan 06, 2013 at 13:20 UTC
    "explain this code"

    No.

    Best regards, Karl

    Update:... doesn't this look better...?

    Update2:... and your data...?

    Please see the recommendations above how to continue...

    «The Crux of the Biscuit is the Apostrophe»

Re: please reply
by CountZero (Bishop) on Jan 06, 2013 at 19:15 UTC
    Could this be related to exercise 4.4 of exercise solutions (part of a Perl course taught at CNTS - Computational Linguistics at the University of Antwerp)?

    The problem to solve is:

    Write a program that reads text from standard input until end-of-file, and then prints the frequency of each bigram that occurs in the text. And this without hashes.
    If that is indeed the "solution" to this problem, it is not working.
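For comparison, here is a minimal sketch of how the stated exercise could be approached without hashes, using two parallel arrays. It is not the course's official solution; the sub name count_bigrams and the sample words are assumptions. Unlike the posted code, it joins the two words with a space so that different word pairs cannot collapse into the same string.

```perl
use strict;
use warnings;

# Count word bigrams with two parallel arrays instead of a hash.
# Returns two array refs: the bigrams in first-seen order, and their counts.
sub count_bigrams {
    my @words = @_;
    my (@bigrams, @frequency);
    for my $i (0 .. $#words - 1) {
        my $bigram = "$words[$i] $words[$i + 1]";   # space keeps the words apart
        my $found  = -1;
        for my $index (0 .. $#bigrams) {
            if ($bigrams[$index] eq $bigram) {
                $found = $index;
                last;                               # stop at the first match
            }
        }
        if ($found > -1) {
            $frequency[$found]++;
        }
        else {
            push @bigrams,   $bigram;
            push @frequency, 1;
        }
    }
    return (\@bigrams, \@frequency);
}

my ($bigrams, $frequency) = count_bigrams(qw(a b a b c));
print "$bigrams->[$_] : $frequency->[$_]\n" for 0 .. $#{$bigrams};
# prints:
# a b : 2
# b a : 1
# b c : 1
```

To read standard input until end-of-file as the exercise asks, the word list could be built with map { split ' ', $_ } <STDIN> and passed to count_bigrams.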

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics
