Our Solution

A Higher-order Unit Approach--the Best Approach to Date!

Teaching reading with a whole-word approach is likely to end in memory overload and in labels such as "dyslexia" and "learning disability", unless the learners discover on their own how to "break the code". Phonic generalizations are not very helpful, since most rules are too complex and inconsistent to be effective for reading instruction. The "find the little word in the big word" approach and the typical word-family approaches have often failed because the little words and "linguistic elements" they employ are not pronounced consistently. (This is not to say that a word-family approach could not be helpful in teaching beginning reading.)

A critical question for reading instruction is, "What are the manageable parts of words?", that is, "What parts of words work in the decoding task?" Nearly 30 years ago, we wrote a computer program to analyze our language in search of parts of words that would be useful in reading instruction. As input to the computer we used the words that occurred three or more times per million words of running text, taken from a list produced at Brown University by Henry Kucera and W. Nelson Francis. This gave us approximately 18,000 of the most frequently occurring words in the language. Since then, thanks to microcomputers and inexpensive storage, we have incorporated the entire Francis-Kucera list of over 44,000 words into our analyses. Kucera and Francis sampled a wide variety of types of literature to produce their word list. Thus the parts of words we have discovered are useful for virtually any reading material, not just for children's books or books created by educators for teaching reading.
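The selection rule described above (keep a word if it occurs three or more times per million words of running text) can be sketched in Python as follows. This is a minimal illustration, not the original program; the function name `select_input_words`, the `corpus_size` parameter, and the made-up rare word are assumptions for the example.

```python
def select_input_words(counts: dict[str, int],
                       corpus_size: int = 1_000_000,
                       min_per_million: float = 3.0) -> list[str]:
    """Return, sorted, the words whose rate per million running words meets the threshold."""
    selected = []
    for word, count in counts.items():
        # Normalize the raw count to a rate per million words of running text.
        per_million = count * 1_000_000 / corpus_size
        if per_million >= min_per_million:
            selected.append(word)
    return sorted(selected)


# "which", "niche", and "horseback" use frequencies from Tables 2 and 3;
# "exampleword" is a made-up rare word included only to show the cutoff.
counts = {"which": 3562, "niche": 3, "horseback": 3, "exampleword": 2}
print(select_input_words(counts))  # ['horseback', 'niche', 'which']
```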

The computer was programmed to find every letter combination from two to seven letters long. Over 80,000 such units were found. Which of these tens of thousands of units should be used in reading instruction? To help answer this question, the computer was also programmed to list every word in which each unit occurred, along with that word's frequency in running text. This information enabled us to determine both the frequency and the consistency of pronunciation of any unit.
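One way to read "all letter combinations" is as every run of consecutive letters in a word. Under that assumption, the sketch below enumerates the 2- to 7-letter units of each word and indexes every unit to the words that contain it, each paired with its text frequency and listed most frequent first, which is roughly the shape of the output in the tables that follow. The function names and the lowercasing of words are illustrative choices, not details of the original program.

```python
from collections import defaultdict


def letter_units(word: str, min_len: int = 2, max_len: int = 7) -> set[str]:
    """Every run of consecutive letters in `word` that is min_len..max_len letters long."""
    w = word.lower()
    return {
        w[start:start + n]
        for n in range(min_len, max_len + 1)
        for start in range(len(w) - n + 1)  # empty range when the word is shorter than n
    }


def build_unit_index(counts: dict[str, int]) -> dict[str, list[tuple[str, int]]]:
    """Map each unit to (word, frequency) pairs for the words containing it, most frequent first."""
    index: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for word, freq in counts.items():
        for unit in letter_units(word):  # a set, so each word is listed once per unit
            index[unit].append((word, freq))
    for entries in index.values():
        entries.sort(key=lambda pair: (-pair[1], pair[0]))
    return dict(index)


# Frequencies for these four words are taken from Table 2.
counts = {"back": 967, "black": 203, "lack": 110, "attack": 105}
index = build_unit_index(counts)
print(index["ack"])  # [('back', 967), ('black', 203), ('lack', 110), ('attack', 105)]
print(index["at"])   # [('attack', 105)]
```

Applied to a full word list, the same index yields both figures the text relies on: how many words contain a unit, and how often those words occur.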

The answer to the question of which parts of words to teach seems rather obvious: those that occur in many words and are pronounced consistently. Table 1 shows a small portion of the first page of the output for the unit "at". The bigram "at" occurs in 1665 of the 18,000 words we analyzed, i.e., in more than 9% of the words.
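The percentage in the paragraph above is a count of list words that contain the unit, divided by the size of the list. A minimal Python sketch, with hypothetical helper names, using a toy list and then the figures given for "at":

```python
def word_frequency(unit: str, word_list: list[str]) -> int:
    """Count the distinct words in the list that contain `unit` at least once."""
    return sum(1 for word in set(word_list) if unit in word.lower())


def percent_of_list(count: int, list_size: int) -> float:
    """Express a count of words as a percentage of the whole list."""
    return 100 * count / list_size


# A toy list: only "great", "sat", and "cat" contain "at".
print(word_frequency("at", ["great", "sat", "cat", "sing", "ring"]))  # 3
# The figures reported for "at" in the 18,000-word list.
print(f"{percent_of_list(1665, 18_000):.2f}%")  # 9.25%
```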

Table 1
A Sample of the Computer Output
The Bigram "at": An Inconsistent Unit

Frequency*  Word
665         great
150         sat
127         somewhat
97          heat
72          boat
68          beat
60          fat
56          hat
54          seat
51          throat
45          meat
43          coat
31          defeat
27          combat
26          repeat
26          treat
23          cat
23          sweat
*The number before each word is the number of times that word occurs in approximately 1,000,000 words of running text as found in the Kucera and Francis corpus of words.

It is clear from Table 1, however, that "at" has several pronunciations. For this reason it is not a useful unit, even though it has been, and continues to be, used in various reading materials to teach beginners such words as "cat", "rat", "sat", "pat", etc. In fact, "at" is pronounced as in "cat" in only 13% of the words in which it occurs. Thus those who teach a learner that "at" is a word part pronounced as in "cat" are teaching that learner to be wrong in about 87% of the words containing "at". Obviously there are many word parts or units that should not be presented in beginning reading instruction. The bigram "in" is another example of a unit that is often presented very early in reading programs yet is highly inconsistent; most occurrences of "in" are in the trigram "ing". To teach such units as word parts is to teach the learner to respond incorrectly more often than correctly, just as the "two-vowel" rule does.
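The tally behind such a percentage can be sketched on the 18 words shown in Table 1. The grouping of each word by how its "at" sounds is our own labeling for illustration, not part of the original output, and the names `TABLE_1` and `word_consistency` are hypothetical. Because Table 1 is only a small sample, it gives 5 of 18 words (about 28%) rather than the 13% found across all words containing "at".

```python
# Table 1 sample: word -> (frequency per ~1,000,000 words, sound of "at" in that word).
# The sound labels are our own illustrative grouping, not part of the original output.
TABLE_1 = {
    "great": (665, "as in great"), "sat": (150, "as in cat"),
    "somewhat": (127, "as in what"), "heat": (97, "as in eat"),
    "boat": (72, "as in oat"), "beat": (68, "as in eat"),
    "fat": (60, "as in cat"), "hat": (56, "as in cat"),
    "seat": (54, "as in eat"), "throat": (51, "as in oat"),
    "meat": (45, "as in eat"), "coat": (43, "as in oat"),
    "defeat": (31, "as in eat"), "combat": (27, "as in cat"),
    "repeat": (26, "as in eat"), "treat": (26, "as in eat"),
    "cat": (23, "as in cat"), "sweat": (23, "as in sweat"),
}


def word_consistency(table: dict[str, tuple[int, str]], sound: str) -> tuple[int, int, float]:
    """Share of listed words (ignoring their text frequency) that have the given sound."""
    matches = sum(1 for _, label in table.values() if label == sound)
    return matches, len(table), 100 * matches / len(table)


matches, total, pct = word_consistency(TABLE_1, "as in cat")
print(f'"at" as in "cat": {matches} of {total} words ({pct:.1f}%)')
# "at" as in "cat": 5 of 18 words (27.8%)
print(f"wrong for the other {100 - pct:.1f}% of these words")
# wrong for the other 72.2% of these words
```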

As indicated previously, the computer also yielded the same information for the three-letter combinations (trigrams). The trigram "ing", for example, occurs in 1554 of the 18,000 words examined. In all but 17 of these 1554 words, "ing" is pronounced as in "sing". Thus one who learns to pronounce "ing" as in "sing" will be correct 99% of the time. Obviously "ing" is a word part that works and can help the beginning reader. Another useful trigram is "ack", illustrated in Table 2. It is highly consistent and occurs in many words.
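The "ing" figure follows from the two counts in the paragraph above: 1554 words, 17 of them exceptions. A short Python check (the helper name `percent_consistent` is hypothetical) gives just under 99%:

```python
def percent_consistent(total: int, exceptions: int) -> float:
    """Percentage of cases that follow the predominant pronunciation."""
    return 100 * (total - exceptions) / total


ing = percent_consistent(1554, 17)  # 1554 words containing "ing", 17 exceptions
print(f'"ing": {ing:.2f}% consistent')  # "ing": 98.91% consistent
```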

Table 2
The Trigram "ack": A Consistent Unit

Frequency*  Word
967         back
203         black
110         lack
105         attack
92          jack
38          track
10          halfback
9           rack
9           slack
9           stack
8           sack
6           smack
4           snack
4           knack
3           Cossack
3           counterattack
3           feedback
3           horseback
*The number before each word is the number of times that word occurs in approximately 1,000,000 words of running text as found in the Kucera and Francis corpus of words.

At this point it will be useful to distinguish between two kinds of frequency: "word frequency" and "sheer frequency". Thus far we have been discussing word frequency, i.e., the number of words in the 18,000-word list that contain the unit; "ing" occurs in over 1500 words and "in" in over 3000 words. By contrast, sheer frequency takes into account how often each of those words occurs in running text. The unit "ich" is a good illustration of the difference, since it occurs in only 19 different words among the 18,000 words originally examined (see Table 3).
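The two measures can be sketched with a word-to-frequency mapping: word frequency counts the entries that contain a unit, while sheer frequency adds up their running-text frequencies. The example uses the 18 words listed in Table 2, so its results describe that sample only, not the full list; the function names are hypothetical, and this sketch counts a word once even if the unit appears in it twice.

```python
# Table 2 sample: word -> frequency per ~1,000,000 words of running text.
TABLE_2 = {
    "back": 967, "black": 203, "lack": 110, "attack": 105, "jack": 92,
    "track": 38, "halfback": 10, "rack": 9, "slack": 9, "stack": 9,
    "sack": 8, "smack": 6, "snack": 4, "knack": 4, "Cossack": 3,
    "counterattack": 3, "feedback": 3, "horseback": 3,
}


def word_frequency(unit: str, counts: dict[str, int]) -> int:
    """Number of different words that contain the unit."""
    return sum(1 for word in counts if unit in word.lower())


def sheer_frequency(unit: str, counts: dict[str, int]) -> int:
    """Occurrences in running text: the summed frequencies of the words containing the unit."""
    return sum(freq for word, freq in counts.items() if unit in word.lower())


for unit in ("ack", "att"):
    print(unit, word_frequency(unit, TABLE_2), sheer_frequency(unit, TABLE_2))
# ack 18 1586
# att 2 108
```

A unit found in only a couple of words can still carry real weight in running text when one of those words is common, which is exactly the situation "ich" illustrates.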

Table 3
"Word Frequency" and "Sheer Frequency"
An Illustration Using "ich"

Frequency   Word
3562        which
74          rich
27          Greenwich
21          Michigan*
12          Richmond
10          sandwich
7           riches
6           Munich*
6           Reich*
6           richness
6           sandwiches
6           whichever
6           cliche*
5           cliches*
5           enrich
5           richer
5           richly
4           enrichment
3           niche
*Pronunciation other than the predominant one.

If we consider word frequency, "ich" is not a very consistent unit. It is pronounced as in "which" in only 14 of the 19 words in which it occurs (the words not marked with an asterisk in Table 3). This means that it is less than 75% consistent (14 divided by 19, multiplied by 100, equals 73.68%). But when we consider sheer frequency, i.e., the frequency of "ich" in running text, a very different picture emerges. By summing the frequencies of all of the words containing "ich", i.e., 3562 for "which", 74 for "rich", 27 for "Greenwich", etc., we find that "ich" occurs 3770 times in approximately 1,000,000 words of running text. This total is what we call "sheer frequency". Summing only the frequencies of the words in which "ich" is pronounced as in "which", we get a total of 3726. Thus "ich" is pronounced as in "which" in 3726 of its 3770 occurrences in running English text, which is over 98% of its occurrences (3726 divided by 3770, multiplied by 100, equals 98.83%).

In other words, a reader who encounters the trigram "ich" and pronounces it as in "which" will be correct over 98% of the time. This percent consistency, computed from sheer frequency, expresses the probability that a particular pronunciation of a unit is correct in our language. In selecting units for the lessons of the CRP, sheer frequency is always used.

In fairness, please note that when I was decrying the lack of consistency of the unit "at" (about 13%), I cited its word frequency, because it is easier to understand. Actually the bigram "at" is not quite that "bad". In part because "at" is itself such a high-frequency whole word, its consistency rises to about 48% when calculated using sheer frequency. Not as bad, but still poor!
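The two consistency calculations for "ich" can be sketched in Python. Which five words are exceptions is our assumption: we take them to be Michigan, Munich, Reich, cliche, and cliches, whose "ich" is clearly not pronounced as in "which" and whose frequencies add up to 44, exactly the gap between 3770 and 3726. The name `TABLE_3` and the `consistency` helper are hypothetical. Summing the frequencies exactly as printed in Table 3 gives 3776 occurrences, 3732 of them regular, rather than 3770 and 3726; the percentage comes to 98.83% either way.

```python
# Table 3: word -> (frequency per ~1,000,000 words, True if "ich" sounds as in "which").
# Which five words are exceptions (False) is our assumption, inferred from the text's totals.
TABLE_3 = {
    "which": (3562, True), "rich": (74, True), "Greenwich": (27, True),
    "Michigan": (21, False), "Richmond": (12, True), "sandwich": (10, True),
    "riches": (7, True), "Munich": (6, False), "Reich": (6, False),
    "richness": (6, True), "sandwiches": (6, True), "whichever": (6, True),
    "cliche": (6, False), "cliches": (5, False), "enrich": (5, True),
    "richer": (5, True), "richly": (5, True), "enrichment": (4, True),
    "niche": (3, True),
}


def consistency(table: dict[str, tuple[int, bool]]) -> dict[str, tuple[int, int, float]]:
    """Percent consistency by word frequency and by sheer frequency."""
    regular = [freq for freq, is_regular in table.values() if is_regular]
    all_freqs = [freq for freq, _ in table.values()]
    by_word = (len(regular), len(all_freqs))    # count words
    by_sheer = (sum(regular), sum(all_freqs))   # weight each word by its text frequency
    return {
        name: (hits, total, 100 * hits / total)
        for name, (hits, total) in (("word", by_word), ("sheer", by_sheer))
    }


for name, (hits, total, pct) in consistency(TABLE_3).items():
    print(f"{name}: {hits} of {total} = {pct:.2f}%")
# word: 14 of 19 = 73.68%
# sheer: 3732 of 3776 = 98.83%
print(f"text's figures: {100 * 3726 / 3770:.2f}%")  # text's figures: 98.83%
```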
