Number of Letters:
- LEN: This is simply the number of letters within the string.
Frequency of Orthographic Form:
- FREQ: Frequency is a measure of how often a wordform is encountered
in 1,000,000 presentations of text. It is the raw frequency of the
string in the modified CELEX database (see the main page), converted
to a frequency per million. If a string has a frequency of 0, then it
is not in the CELEX database.
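The conversion to frequency per million can be sketched as follows. This is a minimal illustration only: the function name and the corpus size used in the example are hypothetical, and the actual token count of the modified CELEX database is not stated here.

```python
def freq_per_million(raw_count, corpus_size):
    """Scale a raw corpus count to occurrences per 1,000,000 tokens."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return raw_count * 1_000_000 / corpus_size


# Hypothetical numbers: a string seen 358 times in a 17,900,000-token corpus.
print(freq_per_million(358, 17_900_000))  # 20.0
# A string that never occurs has a frequency of 0.
print(freq_per_million(0, 17_900_000))    # 0.0
```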
Orthographic Neighborhood Statistics (Coltheart's N):
- Orth: This is the number of orthographic neighbors that a string has. An orthographic neighbor is defined as a word of the same length that differs from the original string by only one letter. For example, given the word 'cat', the words 'bat', 'fat', 'mat', 'cab', etc. are considered orthographic neighbors.
- Orth_F: This is the averaged frequency (per million) of the orthographic neighbors.
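As a rough sketch of the two neighborhood measures, the code below finds the orthographic neighbors of a string in a toy lexicon and averages their frequencies. The lexicon, its frequency values, and the function names are made up for illustration. Returning None when a string has no neighbors is also an assumption, since that case is not described here.

```python
def orthographic_neighbors(target, lexicon):
    """Words of the same length as `target` that differ in exactly one letter."""
    return sorted(
        word for word in lexicon
        if len(word) == len(target)
        and sum(a != b for a, b in zip(word, target)) == 1
    )


def orth_stats(target, lexicon):
    """Return (neighbor count, mean neighbor frequency per million)."""
    neighbors = orthographic_neighbors(target, lexicon)
    if not neighbors:
        return 0, None  # assumption: no average is defined without neighbors
    mean_freq = sum(lexicon[w] for w in neighbors) / len(neighbors)
    return len(neighbors), mean_freq


# Toy lexicon: wordform -> frequency per million (invented values).
TOY = {"cat": 20.0, "bat": 10.0, "fat": 30.0, "mat": 5.0,
       "cab": 15.0, "cot": 2.0, "act": 40.0, "catch": 25.0}

print(orthographic_neighbors("cat", TOY))  # ['bat', 'cab', 'cot', 'fat', 'mat']
print(orth_stats("cat", TOY))              # (5, 12.4)
```

Note that 'act' is excluded because it differs from 'cat' in more than one position, and 'catch' because it has a different length.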
Constrained Unigram Statistics:
- N1_F: This is the averaged frequency (per million) of the constrained unigrams for the wordform. A constrained unigram is defined as a specific letter in a specific position, in a specific length of word. That is, the 'c' in 'cat' is considered the same 'c' as in 'cot', but not the same as the 'c' in 'act' (different position) or in 'catch' (different word length).
- N1_C: This is a count of the number of wordforms that share the same constrained unigrams (analogous to a Coltheart N measure).
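One way to picture a constrained unigram is as a key made of the letter, its position, and the length of the word. The sketch below uses that representation, which is an illustrative choice (with zero-based positions) and not necessarily how the database stores it.

```python
def constrained_unigrams(word):
    """One (letter, position, word_length) key per letter of `word`."""
    return [(letter, pos, len(word)) for pos, letter in enumerate(word)]


c_in_cat = constrained_unigrams("cat")[0]
print(c_in_cat)                                   # ('c', 0, 3)
print(c_in_cat in constrained_unigrams("cot"))    # True: same letter, position, length
print(c_in_cat in constrained_unigrams("act"))    # False: 'c' is in another position
print(c_in_cat in constrained_unigrams("catch"))  # False: the word length differs
```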
Constrained Bigram Statistics:
- N2_F: This is the averaged frequency (per
million) of the constrained bigrams for the wordform. A constrained
bigram is defined as a specific two letter combination (bigram) in a
specific position, in a specific length of word. That is, the 'ca' in
'cat' is considered the same 'ca' as in 'can', but not in 'act' ('ac'
is different than 'ca'), or 'catch'. Note that single letters do not
have bigram statistics, and therefore are listed as NA for this measure.
- N2_C: This is a count of the number of wordforms that share the same constrained bigrams (analogous to a Coltheart N measure).
Constrained Trigram Statistics:
- N3_F: This is the averaged frequency (per
million) of the constrained trigrams for the wordform. A constrained
trigram is defined as a specific three letter combination (trigram) in a
specific position, in a specific length of word. That is, the 'sta' in
'stage' is considered the same 'sta' as in 'staff', but is different from
the trigram in 'stay' (different word length). Note that single letters and
two-letter strings do not have trigram statistics, and therefore are listed
as NA for this measure.
- N3_C: This is a count of the number of wordforms
that share the same constrained trigrams (analogous to a Coltheart N measure).
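The constrained measures for unigrams, bigrams, and trigrams can be sketched with a single function parameterized by n. The aggregation here is an assumption: for each constrained n-gram of the target, the frequencies of all wordforms containing it are summed (and the wordforms counted, including the target itself), and these values are then averaged over the target's n-grams. The exact procedure used to build the database may differ. The lexicon and names are hypothetical, and None stands in for NA.

```python
from collections import defaultdict


def constrained_ngrams(word, n):
    """Keys of the form (ngram, start_position, word_length)."""
    return [(word[i:i + n], i, len(word)) for i in range(len(word) - n + 1)]


def constrained_stats(target, lexicon, n):
    """Return (mean wordform count, mean summed frequency), or None for NA."""
    if len(target) < n:
        return None  # the string is too short to contain an n-gram of this size
    count = defaultdict(int)
    freq = defaultdict(float)
    for word, per_million in lexicon.items():
        for key in constrained_ngrams(word, n):
            count[key] += 1
            freq[key] += per_million
    keys = constrained_ngrams(target, n)
    # Assumed aggregation: average the per-n-gram totals over the target's n-grams.
    return (sum(count[k] for k in keys) / len(keys),
            sum(freq[k] for k in keys) / len(keys))


# Toy lexicon: wordform -> frequency per million (invented values).
LEX = {"stage": 10.0, "staff": 20.0, "stay": 30.0, "state": 40.0,
       "can": 5.0, "cat": 15.0}

print(constrained_stats("cat", LEX, 2))  # (1.5, 17.5)
print(constrained_stats("a", LEX, 2))    # None
```

In this toy lexicon 'stay' contributes nothing to the trigram statistics of 'stage', because its 'sta' belongs to a four-letter word.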
Unconstrained Unigram Statistics:
- UN1_F: This is the averaged frequency (per million) of the unconstrained unigrams for the wordform. An unconstrained unigram is defined as a specific letter within a word, regardless of its position, or the wordlength.
- UN1_C: This is a count of the number of wordforms that share the same unigrams (analogous to a Coltheart N measure).
Unconstrained Bigram Statistics:
- UN2_F: This is the averaged frequency (per
million) of the unconstrained bigrams for the wordform. An unconstrained
bigram is defined as a specific two letter combination (bigram) within
a word, regardless of its position, or the wordlength. For example,
the 'ba' in 'bat' is considered the same as the 'ba' in 'tabasco'.
Note that single letters do not have bigram statistics, and therefore
are listed as NA for this measure.
- UN2_C: This is a count of the number of wordforms that share the same bigrams (analogous to a Coltheart N measure).
Unconstrained Trigram Statistics:
- UN3_F: This is the averaged frequency (per
million) of the unconstrained trigrams for the wordform. An
unconstrained trigram is defined as a specific three letter
combination (trigram) within a word, regardless of its position, or the
wordlength. For example, the trigram 'sta' in 'stand' is considered the
same as the trigram 'sta' in 'standard' and in 'estate'. Note that single
letters and two-letter strings do not have trigram statistics, and
therefore are listed as NA for this measure.
- UN3_C: This is a count of the number of wordforms
that share the same trigrams (analogous to a Coltheart N measure).
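For the unconstrained measures, matching ignores both position and word length, so an n-gram can be treated as a plain substring. The sketch below illustrates only that matching rule. The helper names and the word list are hypothetical, and how repeated n-grams within one word are counted in the database is not specified here.

```python
def unconstrained_ngrams(word, n):
    """Distinct n-grams of `word`, ignoring position and word length."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def wordforms_sharing(ngram, wordforms):
    """Wordforms that contain `ngram` anywhere."""
    return sorted(w for w in wordforms if ngram in w)


print("ba" in unconstrained_ngrams("bat", 2))      # True
print("ba" in unconstrained_ngrams("tabasco", 2))  # True
print(unconstrained_ngrams("a", 2))                # set(): single letters have no bigrams

WORDS = ["stand", "standard", "estate", "stay", "cat"]
print(wordforms_sharing("sta", WORDS))  # ['estate', 'stand', 'standard', 'stay']
```

Unlike the constrained case, 'stay' and 'estate' both count as sharing 'sta' with 'stand', since neither position nor length matters.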