Re^3: please reply


I second CountZero's objection to the lack of input data.

However, when I run your code against a test file, it returns " : 1" for pairs of words, each word concatenated with the following word (except in the case of "nobigram", which is first concatenated with the preceding word and then with the following word). The test file contains multi-char strings, some of which satisfy one or the other of two contradictory definitions of bigrams^*, plus some non-bigram strings such as nobigram, each separated by 2 newlines.

# output:
Bigrams
aabaaacc : 1
ccnobigram : 1
nobigramabcda : 1
abcdabcdaaaa : 1
bcdaaaanobigram : 1
nobigrambbhbbbb : 1
bbhbbbbbbb : 1
bbbaba : 1
abaaaaabbb : 1
aaaabbbcccccccccc : 1
cccccccccccbbbaaaccc : 1
cbbbaaacccccbbaaccc : 1
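
For comparison, output of this shape is what word-level bigram counting would produce if each pair of adjacent words were joined with no separator. The following is only a minimal sketch under that assumption, not the code from the parent node; count_word_bigrams is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: count each pair of adjacent words,
# joined with no separator (an assumption based on the output above).
sub count_word_bigrams {
    my @words = @_;
    my %count;
    for my $i ( 0 .. $#words - 1 ) {
        $count{ $words[$i] . $words[ $i + 1 ] }++;
    }
    return \%count;
}

# First four words of the test file
my $count = count_word_bigrams(qw(aabaaa cc nobigram abcda));
print "$_ : $count->{$_}\n" for sort keys %$count;
```

Every interior word lands in two pairs, once with the word before it and once with the word after it, so a count of 1 per pair is what one would expect whenever no pair of adjacent words repeats.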

I cannot reproduce your "loop is ended after splitting" EXCEPT (and then not perfectly) by removing the doubled curly braces at lines 9-10 and 22-23, in which case only a single instance of the final two words of the sample data is returned (along with the " : 1"). (Update: That's not a correct count under either of the definitions of a bigram cited below.)

As to "why found is used," it seems possible, in the overly limited context you've provided, that it's intended to be a counter: a variable in which to stash the number of bigrams found. I realize that seems excessively obvious, but, IMO, it's the only obvious possible answer.

In any case, if counting is your intent, please see http://search.cpan.org/~emorgan/Lingua-EN-Bigram-0.01/lib/Lingua/EN/Bigram.pm (or some fork for the language in which your interests lie).

^*Definitions vary:

Wikipedia says "A bigram or digram is every sequence of two adjacent elements in a string of tokens, which are typically letters, syllables, or words; they are n-grams for n=2."

while
The Free OnLine Dictionary defines a bigram as a two-letter word (FOL is NOT, IMO, a reliable source, but Merriam-Webster and others define bigram only for those using paid access or their (one-shot) free trial).
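
To make the difference concrete, here is a minimal sketch that applies each definition to a single string. The sub names are hypothetical, and treating the "elements" of the Wikipedia definition as letters is an assumption made for illustration.

```perl
use strict;
use warnings;

# Wikipedia-style reading: every pair of adjacent letters is a bigram.
# Hypothetical helper; returns a hash ref of pair => count.
sub letter_bigrams {
    my ($string) = @_;
    my %count;
    for my $i ( 0 .. length($string) - 2 ) {
        $count{ substr( $string, $i, 2 ) }++;
    }
    return \%count;
}

# Dictionary-style reading: the string itself is a two-letter word.
sub is_two_letter_word {
    my ($string) = @_;
    return $string =~ /\A[[:alpha:]]{2}\z/ ? 1 : 0;
}

my $pairs = letter_bigrams('aabaaa');
print "$_ : $pairs->{$_}\n" for sort keys %$pairs;
print "cc is a two-letter word\n" if is_two_letter_word('cc');
```

Under the first reading, "aabaaa" contains aa three times and ab and ba once each; under the second reading, only "cc" in the test file qualifies as a bigram.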

For clarity, here is the content (verbatim) of the text file:

aabaaa 

cc 

nobigram 

abcda 

bcdaaaa 

nobigram 

bbhbbbb 

bbb 

aba 

aaaabbb 

cccccccccc 

cbbbaaaccc 

ccbbaaccc