








A Higher-order Unit Approach--the Best Approach to Date!
Teaching reading with a whole word approach is
likely to end in memory overload and labels of "dyslexia" and
"learning disability", unless the learners discover on their own
how to "break the code". Phonic generalizations are not very helpful,
since most rules are too complex and inconsistent to be effective for
reading instruction. The "find the little word in the big word" approach and the typical
word family approaches have often failed because the little words and
"linguistic elements" employed are not pronounced consistently. (This is not
to say that a "word family" approach could not be helpful in teaching
beginning reading.)
A critical question for reading instruction is,
"What are the manageable parts of words?", that is, "What parts
of words work in the decoding task?" Nearly 30 years ago, we wrote a
computer program to analyze our language in search of parts of words that would
be useful in reading instruction. As input to the computer, we used those words
that occurred three times or more per million words of running text, taken from
a list produced at Brown University by Henry Kucera and W. Nelson Francis. Our
list consisted of approximately 18,000 of the most frequently occurring words in
the language. Since that time, because of the introduction of microcomputers
and inexpensive storage, we have incorporated the entire Francis-Kucera
list of over 44,000 words into our analyses. Kucera and Francis sampled a wide
variety of types of literature to produce their word list. Thus the parts of
words we have discovered are useful for virtually any reading materials, not
just for children's books or books created by educators for use in teaching
reading.
The computer was programmed to find all letter
combinations from two to seven letters in length. Over 80,000 such units were
found. Which of these thousands of units should be used in reading
instruction? To help answer this question, the computer was also programmed
to list all the words in which each unit occurred and the frequency of each word
in running text. This information enabled us to determine the frequency and
consistency of pronunciation of any unit.
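
The kind of analysis described above can be sketched in a few lines of Python. This is only an illustration, not the original program: the function name index_units and the tiny sample list are assumptions, and the sketch counts a unit once per word even if it appears twice in that word. The frequencies in the sample are taken from Tables 1 and 2.

```python
from collections import defaultdict


def index_units(word_freqs, min_len=2, max_len=7):
    """Map each letter combination (2 to 7 letters by default) to the
    words that contain it, keeping each word's running-text frequency."""
    index = defaultdict(dict)
    for word, freq in word_freqs.items():
        lowered = word.lower()
        for size in range(min_len, max_len + 1):
            # Slide a window of `size` letters across the word.
            for start in range(len(lowered) - size + 1):
                unit = lowered[start:start + size]
                index[unit][word] = freq
    return index


# A tiny illustrative sample, not the full 18,000-word list.
sample = {"back": 967, "black": 203, "attack": 105, "sat": 150, "great": 665}
index = index_units(sample)

# List the words containing "ack", most frequent first.
for word, freq in sorted(index["ack"].items(), key=lambda item: -item[1]):
    print(freq, word)
```

With a listing like this for each unit, one can count how many words contain the unit and then check by hand how each of those words is pronounced.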
The answer to the question of what parts of
words to teach seems rather obvious: those that occur in many words and are
pronounced consistently. Table 1 is a small portion of the first page of the
output for the unit "at." The bigram "at" occurs 1665 times
in the 18,000 words we analyzed, i.e., in more than 9% of the words.

Table 1
A Sample of the Computer Output
The Bigram "at", An Inconsistent Unit

665* great       68 beat       45 meat       26 treat
150  sat         60 fat        43 coat       23 cat
127  somewhat    56 hat        31 defeat     23 sweat
 97  heat        54 seat       27 combat
 72  boat        51 throat     26 repeat

*The number before each word is the number of times that word occurs in
approximately 1,000,000 words of running text as found in the Kucera and
Francis corpus of words.

It is clear
from Table 1, however, that "at" has several pronunciations. For this
reason it is not useful, even though it has been and continues to be used in
various reading materials to teach beginners such words as "cat",
"rat", "sat", "pat", etc. In fact "at"
is pronounced as in "cat" in only 13% of the words in which it occurs.
Thus those who teach a learner "at", as a word part or unit that is
pronounced as in "cat", are teaching that learner to be wrong in over
85% of the words containing "at". Obviously there are many word parts
or units that should not be presented in beginning reading instruction.
The bigram "in" is another example of a unit often presented very
early in reading programs that is also highly inconsistent. Most
occurrences of "in" are in the trigram "ing". To teach such
units as word parts is to teach the learner to respond incorrectly more often
than correctly, just as the "two-vowel" rule does.
As indicated previously, the computer also
yielded the same information for the three-letter combinations (trigrams). The
trigram "ing", for example, occurs 1554 times in the 18,000 words
examined. In all but 17 of these 1554 occurrences, "ing" is pronounced
as in "sing." Thus one who learns to pronounce "ing" as in
"sing" will be correct 99% of the time. Obviously "ing" is a
word part that works and can be used to help the beginning reader. Another
useful trigram is "ack", illustrated in Table 2. It is highly
consistent and occurs in many words.

Table 2
The Trigram "ack", A Consistent Unit

967* back        38 track       8 sack       3 counterattack
203  black       10 halfback    6 smack      3 feedback
110  lack         9 rack        4 snack      3 horseback
105  attack       9 slack       4 knack
 92  jack         9 stack       3 Cossack

*The number before each word is the number of times that word occurs in
approximately 1,000,000 words of running text as found in the Kucera and
Francis corpus of words.

At this point it will be useful to distinguish
between two kinds of frequencies, "word frequency" and "sheer
frequency". Thus far we have been discussing word frequency, i.e.,
the number of times the unit occurs in the 18,000 word list; "ing"
occurs in over 1500 words and "in" in over 3000 words. By contrast, sheer
frequency takes into consideration the frequency of occurrence in running text
of the words in which the unit occurs. The unit "ich" is a good
illustration of these two kinds of frequencies, since it occurs in only 19
different words in the 18,000 words originally examined. (See Table 3.)

Table 3
"Word Frequency" and "Sheer Frequency"
An Illustration using "ich"

3562 which        10 sandwich     6 sandwiches    5 richer
  74 rich          7 riches       6 whichever     5 richly
  27 Greenwich     6 Munich       6 cliche        4 enrichment
  21 Michigan      6 Reich        5 cliches       3 niche
  12 Richmond      6 richness     5 enrich

*Pronunciation other than the predominant one.

If we consider word frequency, "ich"
is not a very consistent unit. It is pronounced as in "which" in only
14 of the 19 words in which it occurs (the underlined words). This means that it
is less than 75% consistent (14 divided by 19 multiplied by 100 = 73.68%). But
when we consider sheer frequency, i.e., the frequency of "ich"
in running text, a very different picture emerges. By summing the frequencies of
all of the words containing "ich", i.e., 3562 for
"which", 74 for "rich", 27 for "Greenwich", etc.,
we find that "ich" occurs 3770 times in approximately 1,000,000 words
of running text. We call this "sheer frequency". Summing only the
frequencies of the underlined words, in which "ich" is pronounced as in
"which", we get a total of 3726. Thus "ich" is pronounced as
in "which" in 3726 out of its 3770 occurrences in running English
text, which is over 98% of its occurrences (3726 divided by 3770 multiplied by
100 = 98.83%). In other words, when reading our language, if one encounters the
trigram "ich" and responds with the pronunciation as in
"which", that response will be correct over 98% of the time. This
percent consistency, based on sheer frequency, is then an expression of the
probability that a particular pronunciation of a unit will be correct in our
language. In selecting units for the lessons of the CRP, sheer frequency is
always used. In fairness, please note that when I was decrying the lack of
consistency of the unit "at", about 13%, I cited its word frequency, because
it is easier to understand. Actually the bigram "at" is not quite that
"bad". In part because the word "at" is such a high-frequency
whole word, when its consistency is calculated using sheer frequency,
it goes up to about 48%. Not as bad but still poor!
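
The two calculations for "ich" can be reproduced with a short Python sketch. The word frequencies are those listed in Table 3. The set of five exception words is an assumption made for this illustration, since the markings that identify them are not reproduced here. The function names are likewise only illustrative. Note also that the frequencies as listed in Table 3 add up to 3776 rather than the 3770 given in the text; the resulting percentages agree to two decimal places.

```python
# Running-text frequencies for words containing "ich" (from Table 3).
ICH_WORDS = {
    "which": 3562, "rich": 74, "Greenwich": 27, "Michigan": 21,
    "Richmond": 12, "sandwich": 10, "riches": 7, "Munich": 6,
    "Reich": 6, "richness": 6, "sandwiches": 6, "whichever": 6,
    "cliche": 6, "cliches": 5, "enrich": 5, "richer": 5,
    "richly": 5, "enrichment": 4, "niche": 3,
}

# ASSUMPTION: the words in which "ich" is not pronounced as in "which".
ASSUMED_EXCEPTIONS = {"Michigan", "Munich", "Reich", "cliche", "cliches"}


def word_consistency(words, exceptions):
    """Percent of distinct words with the predominant pronunciation."""
    regular = [word for word in words if word not in exceptions]
    return 100 * len(regular) / len(words)


def sheer_consistency(words, exceptions):
    """Percent of running-text occurrences with the predominant pronunciation."""
    total = sum(words.values())
    regular = sum(freq for word, freq in words.items() if word not in exceptions)
    return 100 * regular / total


print(f"word frequency:  {word_consistency(ICH_WORDS, ASSUMED_EXCEPTIONS):.2f}%")
print(f"sheer frequency: {sheer_consistency(ICH_WORDS, ASSUMED_EXCEPTIONS):.2f}%")
```

Under these assumptions the sketch prints 73.68% for word frequency and 98.83% for sheer frequency. The gap comes almost entirely from "which", whose 3562 occurrences dominate the running-text total.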

