MCWord Definitions

Number of Letters:

The number of letters in the string.

Frequency of Orthographic Form:

FREQ: How often a wordform occurs per 1,000,000 words of text. It is based on the raw frequency of the string in the modified CELEX database (see the main page), converted to a frequency per million. A frequency of 0 means that the string does not occur in the CELEX database.
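Converting to a per-million rate amounts to scaling the raw count by the size of the corpus. The following is a minimal Python sketch. `per_million` is a hypothetical helper, and the corpus size and counts are invented for illustration. They are not CELEX values.

```python
def per_million(raw_count: int, corpus_size: int) -> float:
    """Convert a raw corpus count to a frequency per million words."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return raw_count * 1_000_000 / corpus_size


# Hypothetical corpus size and counts, for illustration only.
CORPUS_SIZE = 20_000_000
raw_counts = {"cat": 2_000, "zzyzx": 0}

for word, raw in raw_counts.items():
    freq = per_million(raw, CORPUS_SIZE)
    # A count of 0 means the string is absent from the database.
    status = "absent" if raw == 0 else "attested"
    print(f"{word}: FREQ = {freq:.2f} per million ({status})")
```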

Orthographic Neighborhood Statistics (Coltheart's N):

Orth: The number of orthographic neighbors of a string. An orthographic neighbor is a word of the same length that differs from the original string by exactly one letter substitution. For example, the orthographic neighbors of 'cat' include 'bat', 'fat', 'mat', and 'cab'.
Orth_F: The average frequency (per million) of the string's orthographic neighbors.
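The following is a minimal Python sketch of Orth and Orth_F over a toy lexicon that maps wordforms to per-million frequencies. The function names and the lexicon are hypothetical. Returning None for Orth_F when a string has no neighbors is an assumption and is not documented behavior.

```python
from statistics import mean
from typing import Optional


def orth_neighbors(word: str, lexicon: dict[str, float]) -> list[str]:
    """Same-length words that differ from `word` in exactly one letter position."""
    return [
        other
        for other in lexicon
        if len(other) == len(word)
        and sum(a != b for a, b in zip(word, other)) == 1
    ]


def orth_stats(word: str, lexicon: dict[str, float]) -> tuple[int, Optional[float]]:
    """Return (Orth, Orth_F); Orth_F is None when there are no neighbors (assumption)."""
    neighbors = orth_neighbors(word, lexicon)
    orth_f = mean(lexicon[n] for n in neighbors) if neighbors else None
    return len(neighbors), orth_f


# Toy per-million frequencies, invented for illustration.
lexicon = {"cat": 50.0, "bat": 10.0, "fat": 20.0, "mat": 6.0,
           "cab": 4.0, "cot": 2.0, "act": 30.0}
print(orth_stats("cat", lexicon))
```

'act' is not counted as a neighbor of 'cat' because it differs in two positions, even though it uses the same letters.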

Constrained Unigram Statistics:

N1_F: The average frequency (per million) of the constrained unigrams of the wordform. A constrained unigram is a specific letter in a specific position in a word of a specific length. That is, the 'c' in 'cat' is considered the same 'c' as in 'cot', but not the same as the 'c' in 'act' (different position) or in 'catch' (different length).
N1_C: The number of wordforms that share the same constrained unigrams (analogous to Coltheart's N).
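One way to make the constrained-unigram definition concrete is to key every letter by the triple (letter, position, word length). Under this scheme, two letters match only if all three parts agree. This Python sketch assumes zero-based positions, and `constrained_unigrams` is a hypothetical name.

```python
def constrained_unigrams(word: str) -> set[tuple[str, int, int]]:
    """Key each letter by (letter, zero-based position, word length)."""
    return {(letter, pos, len(word)) for pos, letter in enumerate(word)}


c_in_cat = ("c", 0, 3)  # the 'c' of 'cat'
for other in ["cot", "act", "catch"]:
    shared = c_in_cat in constrained_unigrams(other)
    print(f"'c' of 'cat' shared with '{other}': {shared}")
```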

Constrained Bigram Statistics:

N2_F: The average frequency (per million) of the constrained bigrams of the wordform. A constrained bigram is a specific two-letter combination (bigram) in a specific position in a word of a specific length. That is, the 'ca' in 'cat' is considered the same 'ca' as in 'can', but not the same as the initial bigram of 'act' ('ac' is different from 'ca') or the 'ca' in 'catch' (different length). Single letters have no bigrams, so this measure is listed as NA for them.
N2_C: The number of wordforms that share the same constrained bigrams (analogous to Coltheart's N).
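The definitions above do not spell out how per-bigram frequencies are combined. The following Python sketch of N2_F and N2_C therefore rests on three stated assumptions:

- A bigram's frequency is the summed per-million frequency of the other wordforms that carry it at the same position and length.
- N2_F is the mean of those sums over the word's bigrams.
- N2_C counts the distinct other wordforms that share at least one bigram.

The names and the toy lexicon are hypothetical.

```python
from statistics import mean
from typing import Optional


def constrained_bigrams(word: str) -> list[tuple[str, int, int]]:
    """Each bigram keyed by (letters, zero-based start position, word length)."""
    return [(word[i:i + 2], i, len(word)) for i in range(len(word) - 1)]


def n2_stats(word: str, lexicon: dict[str, float]) -> Optional[tuple[float, int]]:
    """Return (N2_F, N2_C) under the assumptions above, or None (NA) for one letter."""
    keys = constrained_bigrams(word)
    if not keys:
        return None  # single letters have no bigrams
    others = {w: f for w, f in lexicon.items() if w != word}
    # For each bigram of `word`, the other wordforms carrying the same key.
    carriers = [[w for w in others if k in constrained_bigrams(w)] for k in keys]
    n2_f = mean(sum(others[w] for w in ws) for ws in carriers)
    n2_c = len({w for ws in carriers for w in ws})
    return n2_f, n2_c


# Toy per-million frequencies, invented for illustration.
lexicon = {"cat": 50.0, "can": 40.0, "cab": 4.0, "bat": 10.0,
           "act": 30.0, "catch": 15.0}
print(n2_stats("cat", lexicon))
print(n2_stats("a", lexicon))
```

Here 'can' and 'cab' share 'ca' and 'bat' shares 'at'. 'act' and 'catch' contribute nothing, because their matching letters are in the wrong position or in a word of the wrong length.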

Constrained Trigram Statistics:

N3_F: The average frequency (per million) of the constrained trigrams of the wordform. A constrained trigram is a specific three-letter combination (trigram) in a specific position in a word of a specific length. That is, the 'sta' in 'stage' is considered the same 'sta' as in 'staff', but not the same as the 'sta' in 'stay' (different length). Single letters and two-letter strings have no trigrams, so this measure is listed as NA for them.
N3_C: The number of wordforms that share the same constrained trigrams (analogous to Coltheart's N).
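In the 'stage' example, the length constraint is what separates 'staff' from 'stay'. The following small Python sketch assumes that trigrams are keyed by (letters, zero-based start position, word length) and that NA is represented as None. `constrained_trigrams` is a hypothetical name.

```python
from typing import Optional


def constrained_trigrams(word: str) -> Optional[set[tuple[str, int, int]]]:
    """Trigrams keyed by (letters, start position, word length); None (NA) if len < 3."""
    if len(word) < 3:
        return None
    return {(word[i:i + 3], i, len(word)) for i in range(len(word) - 2)}


sta_in_stage = ("sta", 0, 5)
print(sta_in_stage in constrained_trigrams("staff"))  # True: same position and length
print(sta_in_stage in constrained_trigrams("stay"))   # False: different length
print(constrained_trigrams("at"))                     # None: two-letter string
```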

Unconstrained Unigram Statistics:

UN1_F: The average frequency (per million) of the unconstrained unigrams of the wordform. An unconstrained unigram is a specific letter within a word, regardless of its position or the word length.
UN1_C: The number of wordforms that share the same unconstrained unigrams (analogous to Coltheart's N).
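For unconstrained unigrams, only the identity of each letter matters. The following Python sketch computes UN1_F and UN1_C under two assumed aggregation rules:

- Each distinct letter's frequency is the summed frequency of the other wordforms containing it, and UN1_F averages these over the word's distinct letters.
- UN1_C counts the distinct other wordforms that share at least one letter.

The database's actual formula may differ. The names and the toy lexicon are hypothetical.

```python
from statistics import mean


def un1_stats(word: str, lexicon: dict[str, float]) -> tuple[float, int]:
    """Return (UN1_F, UN1_C), ignoring letter position and word length."""
    letters = sorted(set(word))  # distinct letters only (assumption)
    others = {w: f for w, f in lexicon.items() if w != word}
    # For each letter, the other wordforms that contain it anywhere.
    carriers = [[w for w in others if letter in w] for letter in letters]
    un1_f = mean(sum(others[w] for w in ws) for ws in carriers)
    un1_c = len({w for ws in carriers for w in ws})
    return un1_f, un1_c


# Toy per-million frequencies, invented for illustration.
lexicon = {"cat": 50.0, "act": 30.0, "catch": 15.0, "dog": 8.0, "tip": 2.0}
print(un1_stats("cat", lexicon))
```

Unlike the constrained measures, 'act' and 'catch' both count here, because position and length are ignored.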

Unconstrained Bigram Statistics:

UN2_F: The average frequency (per million) of the unconstrained bigrams of the wordform. An unconstrained bigram is a specific two-letter combination (bigram) within a word, regardless of its position or the word length. For example, the 'ba' in 'bat' is considered the same as the 'ba' in 'tabasco'. Single letters have no bigrams, so this measure is listed as NA for them.
UN2_C: The number of wordforms that share the same unconstrained bigrams (analogous to Coltheart's N).
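Dropping position and length from the key turns a word's bigrams into a plain set of substrings. In the following Python sketch, a set intersection recovers the 'ba' shared by 'bat' and 'tabasco'. The sketch uses a hypothetical `unconstrained_bigrams` helper and represents NA as None.

```python
from typing import Optional


def unconstrained_bigrams(word: str) -> Optional[set[str]]:
    """Adjacent letter pairs, ignoring position and length; None (NA) for one letter."""
    if len(word) < 2:
        return None
    return {word[i:i + 2] for i in range(len(word) - 1)}


print(unconstrained_bigrams("bat") & unconstrained_bigrams("tabasco"))  # {'ba'}
print(unconstrained_bigrams("a"))  # None
```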

Unconstrained Trigram Statistics:

UN3_F: The average frequency (per million) of the unconstrained trigrams of the wordform. An unconstrained trigram is a specific three-letter combination (trigram) within a word, regardless of its position or the word length. For example, the trigram 'sta' in 'stand' is considered the same as the trigram 'sta' in 'standard' and in 'estate'. Single letters and two-letter strings have no trigrams, so this measure is listed as NA for them.
UN3_C: The number of wordforms that share the same unconstrained trigrams (analogous to Coltheart's N).
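The following is a Python sketch of UN3_C on a toy word list. It assumes the count covers distinct other wordforms that share at least one trigram anywhere, regardless of length. `un3_count` and `unconstrained_ngrams` are hypothetical names, and NA is represented as None.

```python
from typing import Optional


def unconstrained_ngrams(word: str, n: int) -> set[str]:
    """All length-n substrings of `word`; empty if the word is shorter than n."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def un3_count(word: str, words: list[str]) -> Optional[int]:
    """Distinct other wordforms sharing at least one trigram; None (NA) if len < 3."""
    if len(word) < 3:
        return None
    grams = unconstrained_ngrams(word, 3)
    return sum(1 for w in words if w != word and grams & unconstrained_ngrams(w, 3))


words = ["stand", "standard", "estate", "state", "tan", "at"]
print("sta" in unconstrained_ngrams("estate", 3))  # True
print(un3_count("stand", words))                   # 4
print(un3_count("at", words))                      # None
```

'tan' counts toward 'stand' because it contains the trigram 'tan', even though it is much shorter. Under the constrained definition, it would not.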