's objection to lack of input data.
However, when I run your code on a test file (multi-character strings, some of which satisfy one or the other of two contradictory definitions of bigrams*, plus some non-bigram strings such as nobigram, each separated by two newlines), your code returns " : 1" for pairs of words, each word concatenated with the word that follows it (except for "nobigram", which is first concatenated with the preceding word and then with the following word):
aabaaacc : 1
ccnobigram : 1
nobigramabcda : 1
abcdabcdaaaa : 1
bcdaaaanobigram : 1
nobigrambbhbbbb : 1
bbhbbbbbbb : 1
bbbaba : 1
abaaaaabbb : 1
aaaabbbcccccccccc : 1
cccccccccccbbbaaaccc : 1
cbbbaaacccccbbaaccc : 1
I cannot reproduce your "loop is ended after splitting" behavior EXCEPT (and then not perfectly) by removing the doubled curly braces at lines 9-10 and 22-23. In that case only a single instance of the final two words of the sample data is returned (along with the " : 1"). (Update: that is not a correct count under either of the bigram definitions cited below.)
As to "why found is used," it seems possible, in the overly limited context you've provided, that it's intended to be a counter: a variable in which to stash the number of bigrams found. I realize that seems excessively obvious, but, IMO, it's the only obvious possible answer.
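If `found` is indeed meant as a counter, here is a minimal sketch of that idea. Your code isn't shown here, so this is not a fix to it. It assumes that a bigram means two adjacent words and that tokens are separated by any whitespace, including the blank lines in the test file. The sub name `count_word_bigrams` and the sample tokens are hypothetical. Joining each pair with a space keeps the two words distinguishable instead of gluing them together as in the output above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original code): count adjacent-word
# bigrams in a string. Assumes tokens are separated by any whitespace,
# including the blank lines used between entries in the test file.
sub count_word_bigrams {
    my ($text) = @_;
    my @words = split ' ', $text;    # awk-style split: ignores leading whitespace
    my %count;                       # bigram => number of occurrences
    my $found = 0;                   # total number of bigrams seen (the "counter")
    for my $i (0 .. $#words - 1) {
        # Join with a space so the two words stay distinguishable.
        $count{"$words[$i] $words[$i + 1]"}++;
        $found++;
    }
    return (\%count, $found);
}

my ($count, $found) = count_word_bigrams("ab\n\ncd\n\nab\n\ncd\n");
print "$_ : $count->{$_}\n" for sort keys %$count;
print "total: $found\n";
```

With the hypothetical input above this prints `ab cd : 2`, `cd ab : 1` and `total: 3`.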
In any case, if counting is your intent, please see http://search.cpan.org/~emorgan/Lingua-EN-Bigram-0.01/lib/Lingua/EN/Bigram.pm (or some fork for the language in which your interests lie).
- Wikipedia says: "A bigram or digram is every sequence of two adjacent elements in a string of tokens, which are typically letters, syllables, or words; they are n-grams for n=2."
- The Free OnLine Dictionary defines a bigram as a two-letter word. (FOL is NOT, IMO, a reliable source, but Merriam-Webster and others offer their definition of bigram only to paid subscribers or during a one-shot free trial.)
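The two definitions above lead to quite different tests. Here is a rough Perl sketch of each. The sub names `char_bigrams` and `is_two_letter_word` are hypothetical and are not taken from any module. The first applies the Wikipedia reading with letters as the tokens. The second applies the two-letter-word reading.

```perl
use strict;
use warnings;

# Definition 1 (Wikipedia, with letters as tokens): every pair of
# adjacent characters. "abcd" yields "ab", "bc", "cd".
# Strings shorter than 2 characters yield an empty list.
sub char_bigrams {
    my ($s) = @_;
    return map { substr($s, $_, 2) } 0 .. length($s) - 2;
}

# Definition 2 (Free OnLine Dictionary): a bigram is a two-letter word.
# Returns 1 for exactly two alphabetic characters, else 0.
sub is_two_letter_word {
    my ($w) = @_;
    return $w =~ /\A[[:alpha:]]{2}\z/ ? 1 : 0;
}

my @pairs = char_bigrams("abcd");
print join(",", @pairs), "\n";                # ab,bc,cd
print is_two_letter_word("ab"), "\n";         # 1
print is_two_letter_word("nobigram"), "\n";   # 0
```

Under the first definition every string of two or more letters contains bigrams. Under the second, almost none of the sample strings qualify, which is why a single count cannot be correct for both.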
For clarity, here is the content (verbatim) of the text file: