Presented at the 38th Annual Conference of the International Military Testing Association (IMTA), 12 - 14 November 1996, Gunter Sheraton Hotel, San Antonio, Texas; co-hosted by the Air Force Personnel Center, Armstrong Laboratory/Human Resources Directorate, and the Air Force Occupational Measurement Squadron.

Validation of a Procedure for Clustering Expressions of Frequency and Amount

William J. Phalen
Institute for Job & Occupational Analysis

Robert M. Yadrick
Technical Training Research Division
Armstrong Laboratory Human Resources Directorate

Walter G. Albert
Manpower and Personnel Research Division
Armstrong Laboratory Human Resources Directorate

This paper is the third in a series of reports concerned with the scaling of words and phrases expressing qualitative levels of "frequency" and "amount." The initial paper by Yadrick, et al. (1993) replicated a study by Bass, Cascio, and O'Connor (1974), which used a magnitude estimation procedure to scale 39 expressions of frequency and 44 expressions of amount. The replication, however, used Air Force basic trainees to provide the estimates. The resultant scaling of expressions was quite similar to that of Bass, et al., even though the rater populations were quite different. This suggested that there are commonly shared perceptions regarding the weightiness of such expressions, independent of the samples which provided the estimates. Nine expressions of amount were selected to constitute an equal interval scale, based on their magnitude-estimated weights. These were tested in 40 computer-administered occupational surveys given to 160 cases and were found to produce more valid and reliable results than the traditional nine-point relative time spent scale, as reported in Albert, et al. (1994).

The next study by Yadrick, et al. (1994) described the application of a new, univariate procedure for clustering the weighted expressions into groups of equivalent (or synonymous) expressions. It appeared to the researchers that the expressions within each group were sufficiently synonymous with one another that a single expression might be picked from each group to represent an equal interval scale with the optimal number of points. It remained to be determined whether the divisions or cut points suggested by the groups in the cluster solution had captured the "psychologically real" levels of a frequency or amount scale embedded in the ordered lists of magnitude-estimated expressions. Therefore, a validation study was undertaken to determine the degree of correspondence between mathematically defined cluster groups and perceptually defined, or psychologically real, groups of expressions. This paper will report on the development and application of the criterion measure used to validate the univariate clustering procedure.

The Clustering Procedure

A special measure of "equivalence" was designed in the previous study to describe the similarity of expressions and to cluster them into homogeneous groups along a single dimension ("frequency" or "amount"). Since the dimensional values for each expression were derived by a ratio measurement technique (magnitude estimation), equivalence was computed as a ratio-based measure, with exponential magnification of ratio differences to accentuate the dividing lines between adjacent sets of equivalent expressions. In the basic equation, the pairwise ratios are converted to logarithms and the logarithms are summed algebraically, so that positive logs (ratios > 1.0) and negative logs (ratios < 1.0) representing equal ratio differences from log 1.0 = 0 cancel each other (compensatory effect). Thus, two raters who disagree as to which of two expressions is greater or less will negate each other's estimates. This feature contrasts with standard similarity or overlap measures, which treat all differences as representing dissimilarity (noncompensatory effect). A detailed description of the "equivalence" equation with example computations can be obtained from the senior author.
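The compensatory effect described above can be illustrated with a minimal Python sketch. It is not the equivalence equation itself, which is not reproduced in this paper; it only contrasts an algebraic (signed) sum of log-ratios with a sum of absolute log-ratios. The function names and the two ratios are hypothetical.

```python
import math

def compensatory_log_sum(ratios):
    """Sum signed log-ratios, so equal and opposite disagreements cancel."""
    return sum(math.log(r) for r in ratios)

def noncompensatory_log_sum(ratios):
    """Sum absolute log-ratios, so every difference counts as dissimilarity."""
    return sum(abs(math.log(r)) for r in ratios)

# Hypothetical data: one rater judges expression X twice as weighty as Y,
# a second rater judges X half as weighty as Y.
ratios = [2.0, 0.5]
signed = compensatory_log_sum(ratios)       # the two logs cancel
unsigned = noncompensatory_log_sum(ratios)  # the two logs accumulate
print(signed, unsigned)
```

Under the signed sum the two raters negate each other, whereas the absolute sum treats both judgments as evidence of dissimilarity.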

In this study, the equivalence equation was applied first to determine the equivalence of the top two expressions in the list, i.e., expressions "1" and "2" on the lefthand side of Table 1 for "frequency" and likewise on the lefthand side of Table 2 for "amount". If the equivalence value exceeded 80.00%, then the equivalence of expressions "1" and "3," then "1" and "4," etc. was computed, with expression "1" repeatedly being used as the target expression, until the equivalence value fell below 80.00%. At this point, the set of expressions falling within a minimum linkage of 80.00% was selected as the first group of expressions representing the highest level of the scale. Thus, in Table 1 (lefthand side) expressions "1" through "3" formed the first group of equivalent frequency expressions with a minimum equivalence (between expressions "1" and "3") of 83.81%. The next target was expression "4." Expressions "4" and "5" were compared, "4" and "6," etc. until "4" and "10" yielded the lowest acceptable equivalence of 80.66%. The remaining groups were formed in a similar manner, resulting in 12 "frequency" groups (or scale levels) and 13 "amount" groups (or scale levels). The minimum equivalence values are reported to the right of each group.
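The target-based grouping just described might be sketched as follows. This is an illustration under stated assumptions: `stand_in_equivalence` is a hypothetical placeholder (the smaller weight as a percentage of the larger), not the authors' equivalence equation, and the weights are invented. An expression joins the current group as long as its equivalence with the target is at least the threshold.

```python
def stand_in_equivalence(w_a, w_b):
    """Placeholder, NOT the paper's equation: smaller weight as % of larger."""
    lo, hi = sorted((w_a, w_b))
    return 100.0 if hi == 0 else 100.0 * lo / hi

def cluster_ordered(weights, equivalence=stand_in_equivalence, threshold=80.0):
    """Group an ordered list of weights by comparing each expression to a target.

    Returns (first, last, min_equivalence) triples using 1-based positions.
    min_equivalence is None for a singleton group.
    """
    groups = []
    start = 0
    n = len(weights)
    while start < n:
        end = start + 1
        min_eq = None
        while end < n:
            eq = equivalence(weights[start], weights[end])
            if eq < threshold:
                break  # this expression becomes the next target
            min_eq = eq if min_eq is None else min(min_eq, eq)
            end += 1
        groups.append((start + 1, end, min_eq))
        start = end
    return groups

# Hypothetical weights, ordered from high to low.
weights = [10.0, 9.0, 8.5, 6.0, 5.5, 2.0]
groups = cluster_ordered(weights)
for first, last, min_eq in groups:
    print(first, last, min_eq)
```

With these invented weights the sketch yields groups spanning positions 1 to 3, 4 to 5, and a singleton at position 6, mirroring how singleton groups arise next to sharp changes in weight.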

Column 1 of Tables 1 and 2 also shows the magnitude-estimated weights for each expression, as derived by Bass, et al. Although the groups of expressions generated by this clustering procedure seemed to be very reasonable in the judgment of the authors, it could not be assumed that the groupings represented psychologically real divisions without comparing them against a criterion of psychologically derived groups of expressions.

The Criterion

A survey was developed that contained the ordered lists of frequency and amount expressions (with their weights deleted) in four different versions. The versions presented frequency expressions first, followed by amount expressions, or vice versa, and presented the expressions in high to low order ("Always" to "Never," and "All" to "None"), or vice versa. The instructions asked respondents to follow a procedure analogous to the clustering procedure, but substituting their own psychological or perceptual estimates of equivalence for our mathematical calculations of equivalence. More specifically, each respondent was asked to begin with expression "1" on the list, which was already circled, as the first target expression, and to compare expression "2" with it. If expression "2" appeared to be pretty much equivalent in meaning to expression "1," the respondent would proceed to compare expression "3" to expression "1," then "4" to "1" and so forth, until reaching an expression that did not appear to be reasonably synonymous or equivalent to expression "1." The respondent would then circle this expression as the next target expression against which to compare the expressions following it. This procedure would continue until the entire list was evaluated. The result would be a set of psychologically derived groups of expressions based on the perceptions of that respondent. In a completed survey, each group of expressions could be identified as beginning with a circled expression and ending with the expression immediately preceding the next circled expression.
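The rule for reading groups off a completed survey can be sketched in a few lines of Python. The function name and the example positions are hypothetical, and the sketch assumes the circled positions are numbered in the order in which the list was presented to the respondent.

```python
def groups_from_circles(circled, n_expressions):
    """Turn 1-based positions of circled (target) expressions into groups.

    Each group runs from a circled expression to the expression
    immediately preceding the next circled expression.
    """
    starts = sorted(set(circled))
    if not starts or starts[0] != 1:
        raise ValueError("expression 1 is pre-circled and must be a target")
    if starts[-1] > n_expressions:
        raise ValueError("circled position lies beyond the end of the list")
    ends = [s - 1 for s in starts[1:]] + [n_expressions]
    return list(zip(starts, ends))

# Hypothetical respondent who circled expressions 1, 4, and 11 in a 12-item list.
print(groups_from_circles([1, 4, 11], 12))  # [(1, 3), (4, 10), (11, 12)]
```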

A sample of 42 respondents, consisting of behavioral scientists and clerical workers at the Armstrong Laboratory and at three contractor offices, provided responses. Approximately equal numbers of each version of the survey were completed to provide the desired counterbalancing.

The groups identified by the 42 respondents were consolidated into a matrix whose rows and columns indicated the number of times each expression was selected as a "beginning" or "ending" expression, respectively. Evaluation of the row and column totals made it possible to select the set of groups which provided the best overall fit of the individual respondent data. The resultant sets of psychologically defined groups based on the perceptual judgments of 42 respondents are shown in the righthand portion of Table 1 for the "frequency" expressions and in the righthand portion of Table 2 for the "amount" expressions. The minimum equivalence values for the psychologically defined groups are also reported as additional points of comparison with the mathematically defined groups.
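One way to picture the consolidation step is as a tally of (beginning, ending) pairs with row and column totals, as in the Python sketch below. The data are hypothetical and the function name is ours. The rule used to select the best-fitting set of groups from these totals is not specified here, so the sketch stops at the tallies.

```python
from collections import Counter

def tally_groups(all_respondent_groups):
    """Count how often each expression begins or ends a group.

    all_respondent_groups: one list of (begin, end) pairs per respondent.
    Returns (cells, begin_totals, end_totals); cells[(b, e)] is the matrix
    entry for row b (beginning) and column e (ending).
    """
    cells = Counter()
    begin_totals = Counter()
    end_totals = Counter()
    for groups in all_respondent_groups:
        for begin, end in groups:
            cells[(begin, end)] += 1
            begin_totals[begin] += 1  # row total
            end_totals[end] += 1      # column total
    return cells, begin_totals, end_totals

# Hypothetical groupings from three respondents over a 6-item list.
responses = [
    [(1, 3), (4, 6)],
    [(1, 3), (4, 6)],
    [(1, 2), (3, 6)],
]
cells, begin_totals, end_totals = tally_groups(responses)
print(cells[(1, 3)], begin_totals[4], end_totals[6])  # 2 2 3
```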

Frequency Results

Table 1 shows a fairly high degree of correspondence between the mathematical and psychological clusterings of expressions of frequency. Three groups are identical, and three additional groups share one boundary in common. Although the mathematical clustering defined 12 groups versus 8 groups for the psychological clustering, three of the mathematical groups are single expressions which abutted sharp changes in the magnitude-estimated weights.
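The kind of comparison summarized above (identical groups and groups sharing one boundary) can be sketched in Python. The groupings below are hypothetical and are not the contents of Table 1. The sketch also assumes that "sharing one boundary" means exactly one of a group's two end points coincides with a boundary in the other solution.

```python
def compare_groupings(groups_a, groups_b):
    """Compare two lists of (first, last) groups over the same ordered list.

    Returns (identical, one_shared): groups of A found unchanged in B, and
    groups of A with exactly one end point matching a boundary of B.
    """
    set_b = set(groups_b)
    starts_b = {first for first, _ in groups_b}
    ends_b = {last for _, last in groups_b}
    identical = [g for g in groups_a if g in set_b]
    one_shared = [
        g for g in groups_a
        if g not in set_b and ((g[0] in starts_b) != (g[1] in ends_b))
    ]
    return identical, one_shared

# Hypothetical solutions over a 15-item list.
mathematical = [(1, 3), (4, 10), (11, 11), (12, 15)]
psychological = [(1, 3), (4, 8), (9, 15)]
identical, one_shared = compare_groupings(mathematical, psychological)
print(identical)   # [(1, 3)]
print(one_shared)  # [(4, 10), (12, 15)]
```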

It might be argued that the expressions constituting the singleton groups should be dropped, especially "seldom," which is clearly out of place. This would reduce the number of mathematically defined groups to nine, while the psychologically defined groups would remain at eight. This would also increase the correspondence between the two clusterings. A major point of difference involves expressions "28" through "35" (if "seldom" is eliminated). The mathematical clustering separated this set of expressions into three groups, while the psychological clustering considered the set to be one group. In this case, the psychological grouping makes more sense. However, it should be noted that the minimum equivalence for this group (expressions "28" through "35") is only 0.19%, which is to say that there is an extremely large ratio difference between a weight of 4.72 for "very seldom" and a weight of .33 for "seldom." If "seldom" is dropped, the minimum equivalence for the group would jump to 48.84%, which is still low. Overall, the correspondence between the mathematical and the psychological clustering is reasonably good, especially considering the questionable weighting of some expressions.

Amount Results

As is evident from examining Table 2, the correspondence between the mathematical and the psychological clustering is not as clear for expressions of amount. The greatest difference is at the very top, where the first four expressions in the psychological clustering constitute four singleton groups. It appears that the respondents felt that "all" was significantly more inclusive than "an exhaustive amount." However, "almost entirely" is certainly not "exhaustive" and so had to be separated out. Then comes "completely," which seems suspiciously like "all" (if adverbs can be like adjectives), and so "completely" had to be separated from "almost entirely." The problem encountered here is one of context. The raters who made the magnitude estimates rated each expression separately, without seeing how the expressions would ultimately be ordered when listed together, whereas the respondents in our study had no choice but to follow the order in which the expressions were listed when they defined the group boundaries for equivalent expressions. As with the frequency expressions, the mathematical clustering defined more groups in the mid- and low-ranges than the psychological clustering. The psychological clustering seems a bit stretched in putting "a lot" in the same group as "a moderate amount." It is harder to quibble with the psychological group that includes expressions "34" through "43."

In the mathematical clustering, the singleton group containing "a limited amount" should be dropped. It is another one of those fuzzy expressions that is hard to define clearly, since all amounts other than "all" are "limited" amounts. Where the mathematical and psychological groups do not correspond, it would appear that the mathematical grouping at the upper end of the scaled list is superior, but the psychological grouping at the lower end is superior.


Conclusion

There appears to be sufficient evidence that the mathematical clustering procedure used in this study does a reasonably good job of clustering expressions of frequency and amount into groups of equivalent expressions representing psychologically real levels of frequency and amount. As time permits, a more precise validation study is planned to perform a statistical test of correspondence between the mathematical and the psychological clustering solutions. This test will consist of a t-test between the mean correlations (as represented by Fisher Z's) of the 42 respondents' individual groupings with the mathematically defined groups and with the psychologically defined groups. Our hypothesis is that the mean correlations for the mathematical groups will be lower, but not significantly lower, than the mean correlations for the psychological groups. The study may also be replicated after removing all ambiguous and controversially weighted expressions from the two lists.
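The planned test might take a form like the following Python sketch. It rests on assumptions not stated in this paper: that each respondent contributes one correlation with the mathematical groups and one with the psychological groups, and that the t-test is paired across respondents. The correlations shown are invented, and the index used to correlate two groupings is left unspecified.

```python
import math
from statistics import mean, stdev

def fisher_z(r):
    """Fisher's r-to-Z transformation, equal to atanh(r)."""
    return math.atanh(r)

def paired_t(r_math, r_psych):
    """Paired t statistic on Fisher Z's (assumed design, see lead-in).

    Returns (t, degrees_of_freedom). A negative t means the correlations
    with the mathematical groups are lower on average.
    """
    diffs = [fisher_z(a) - fisher_z(b) for a, b in zip(r_math, r_psych)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Hypothetical correlations for four respondents (the study has 42).
r_math = [0.60, 0.70, 0.65, 0.55]
r_psych = [0.70, 0.75, 0.80, 0.60]
t, df = paired_t(r_math, r_psych)
print(t, df)
```

The resulting t would be compared against the critical value for the given degrees of freedom to judge whether the difference is significant.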


References

Albert, W.G., Phalen, W.J., Selander, D.M., Dittmar, M.J., Tucker, D.L., Hand, D.K., Weissmuller, J.J., & Rouse, I.F. (1994, October). Large-scale laboratory test of occupational survey software and scaling procedures. In the symposium, Bennett, W. Jr., Chair, Training needs assessment and occupational measurement: Advances from recent research. Proceedings of the 36th Annual Conference of the International Military Testing Association. Rotterdam, The Netherlands: European Members of the IMTA.

Bass, B.M., Cascio, W.F., & O'Connor, E.J. (1974). Magnitude estimations of expressions of frequency and amount. Journal of Applied Psychology, 59, 313-320.

Yadrick, R.M., Phalen, W.J., Albert, W.G., Dittmar, M.J., Weissmuller, J.J., & Hand, D.K. (1994, October). Clustering of magnitude estimations of frequency and amount of time. Proceedings of the 36th Annual Conference of the International Military Testing Association. Rotterdam, The Netherlands: European Members of the IMTA.

Yadrick, R.M., Phalen, W.J., Albert, W.G., Dittmar, M.J., Weissmuller, J.J., & Hand, D.K. (1993, November). Magnitude estimations of frequency and amount of time. Paper presented at the 35th Annual Conference of the International Military Testing Association. Williamsburg, VA: U.S. Coast Guard.
