Presentation to the 35th Annual Conference of the Military Testing Association, Williamsburg, VA, November 1993

New Approaches for Increasing Information Value of Individual Response Data

William J. Phalen, San Antonio, Texas

Jimmy L. Mitchell, McDonnell Douglas Aerospace


Response data elicited from job incumbents and subject-matter experts (SMEs) involve comparative or absolute judgments of stimuli. The weakness of comparative judgments is that the number, order, and contextual relationships of the stimuli affect respondents differently as the scaling process progresses, resulting in an overall reduction in the information value of the response data. Computer-based techniques permit tailoring the scaling to the individual respondent in terms of the presentation of stimuli and the feedback of reconfigured or interpreted response data. As for absolute judgments of task stimuli, errors of judgment are much more apparent than for comparative judgments. Although viewed by some as a serious credibility problem, it is probably a plus in a computer-based survey environment. Potential errors of judgment are readily detectable and immediate feedback mechanisms have been devised to alert and assist respondents in identifying and correcting these errors. There are processes for inferring micro-level relationships from macro-level response data (e.g., linking knowledges, skills, and abilities [KSAs] to tasks when this information was provided only at the job level). Inferential networking processes are also available for deriving multiple layers of indirect similarity among weapon systems with reference to the tasks performed upon them, as well as a short-hand, categorical procedure for simultaneous comparison of multiple weapon systems at the task level.


The responses of individual job incumbents are the lifeblood of occupational analysis and research. Military analysts and researchers have heretofore been blessed with the luxury of large samples from which to draw data, and the law of large numbers has mercifully protected them from the need to extract as much information as possible from every job incumbent or subject-matter expert surveyed. Noisy data resulting from job inventories of varying size and quality, suboptimal scaling procedures, and uncontrolled conditions of self-administration have, nevertheless, yielded meaningful results when the data are aggregated into large groups of respondents. At the individual respondent level, it has been shown that there is considerable difference in the responses of job incumbents on two occasions two to three weeks apart, both in terms of tasks selected and relative time spent ratings of tasks selected on both occasions (Albert & Phalen, 1991; Cragun & McCormick, 1967), and that the relative time spent scale seems to yield little or no practically usable information over and above the incumbent's selection of tasks performed (Pass & Robertson, 1978). At the aggregate level, although job descriptions for large groups of job incumbents have displayed split-half reliabilities in the .90s (Christal, 1971) and good stability of large job types and clusters across two surveys several years apart (Driskill & Bower, 1978), it was demonstrated in one study that job types of fewer than 20 cases were not replicable when two equivalent samples of 1600 cases of accounting and finance specialists were analyzed and many cases were excluded from the set of identifiable job types (Watson, 1974). The luxury of large samples may become a thing of the past now that downsizing of the military is taking place, defense funds are being cut, and the workforce of analysts and researchers is being forced to do more with fewer resources.
Now is an opportune time, indeed, a critical time, to reassess how we analysts and researchers do business. We must find ways to get more bang for the buck from our data, so that smaller amounts of data will serve us better in the future than larger amounts have in the past, if we hope to provide answers to the big MPT questions that continue to confront us.

This paper will suggest a two-pronged approach: (1) how to get better data from individual job incumbents, which involves new PC-based computer-adaptive scaling technologies; and (2) how to extract more information from the data that is currently being collected by developing more sophisticated data linkage technologies such that second, third, and higher order linkages can be inferred from the first-order linkages existing in the original data set. Some of these ideas were previously outlined at last year's MTA conference (Mitchell, Phalen, & Hand, 1992) and at the recent Eighth International Occupational Analysts Workshop (Phalen & Mitchell, 1993). Today, we will first address the issue of new PC-based survey technologies. We will discuss the potential advantages of the process as discernible at this point in time, without getting into the details of the research being conducted at the Air Force's Armstrong Laboratory. Specific aspects of this research are being reported in another paper at this conference (Albert et al., 1993), as well as in an associated demonstration of the software.

PC-Based Occupational Survey Technology

The first major advantage of the computer-based survey approach is that it saves a great deal of time and money by eliminating the printing and mailing out of survey booklets, the return mailing of the booklets, and the transferral of response data from job inventories to an electronic medium. Job inventories can be transmitted electronically to PCs at survey locations, and responses are entered on the PC by job incumbents, transmitted with virtually no delay from the PC to a base computer, and from the base computer to a computer at the location where the analysis is performed. This kind of setup also facilitates spot surveys to conduct focused analyses of specific occupational problems or changes occurring at selected locations, thus extending the time before another full-scale job analysis will be needed.

The second major advantage of the PC-based survey approach is that it can tailor the presentation of tasks to the respondent. This entails using discriminant information about the known co-performance of tasks and other items, such as weapon systems, equipment, and tasks, so as to limit the presentation of tasks to those tasks that have some likelihood of being performed by a respondent, given his or her responses to previously presented tasks, and to prioritize the presentation of tasks according to their likelihood of being performed so as to ensure maximum attention before mental fatigue sets in. Thus, much larger survey instruments containing thousands of tasks can be developed and loaded on the computer without exceeding the capacity of any one individual to respond effectively.
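The idea of limiting and prioritizing tasks by likelihood of performance can be illustrated with a minimal Python sketch. It stands in a simple conditional frequency for the discriminant information mentioned above; the function name, the 0.1 cutoff, and the sample data are assumptions made for illustration and do not describe the Armstrong Laboratory software.

```python
def prioritize_tasks(history, confirmed, candidates, min_likelihood=0.1):
    """Rank not-yet-presented tasks by how often they co-occur with confirmed tasks.

    history    -- list of sets, each holding the tasks one past respondent performed
    confirmed  -- set of tasks the current respondent has already said he or she performs
    candidates -- tasks that have not yet been presented
    """
    # Past respondents who share at least one confirmed task with this respondent.
    peers = [tasks for tasks in history if confirmed & tasks]
    if not peers:
        return []
    ranked = []
    for task in candidates:
        likelihood = sum(task in tasks for tasks in peers) / len(peers)
        if likelihood >= min_likelihood:  # drop tasks unlikely to be performed
            ranked.append((task, likelihood))
    # Most likely tasks first, so they are presented before fatigue sets in.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


history = [{"T1", "T2", "T3"}, {"T1", "T2"}, {"T1", "T4"}, {"T5", "T6"}]
order = prioritize_tasks(history, confirmed={"T1"}, candidates=["T2", "T3", "T4", "T5"])
print([task for task, _ in order])
```

In this example T5 is never presented, because no past respondent who performed T1 also performed T5.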

The third major advantage of the PC-based survey approach is that potentially better scaling techniques than those available in paper-and-pencil surveys can be used. Thus, in the Armstrong Laboratory's research, the nine-point relative time spent scale is followed by a second stage in which tasks that received the same rating can be fed back to the respondent for review, to see whether any tasks are miscategorized and to move them to a higher or lower rating category. This procedure serves to overcome rating errors due to context effects and a changing frame of reference regarding comparison of each task to "all other tasks performed." A third stage allows each of the nine rating categories to be further subdivided into two or three categories, thus extending the range of the scale, which currently has too low a ceiling or too high a floor for some tasks to represent the desired percent of time spent. Graphical scales are also possible, with responses being entered by moving a cursor along a horizontal line of 80 points enclosed at each end by the individual's highest and lowest time spent tasks. But the ability to provide scales which estimate total time spent on tasks in absolute terms is emerging as the potentially most important scaling contribution of the PC-based approach. The procedure used first decomposes the concept of "total time spent" into its constituent dimensions, "frequency of performance" and "time it usually takes for a single performance of a task," and permits the respondent to estimate these values by stating numerical values of his or her own choosing, rather than using the limited response levels of a seven- or nine-point scale.
The constituent dimensions are themselves very useful, especially the "time it usually takes for a single performance of a task," which can be useful in setting training standards, manpower requirements, and productive capacity of individuals, as well as being helpful in identifying differences in performance times for individuals, skill levels, and different weapon systems. The dimensional measures are converted to a common numeric base by the computer and cross-multiplied to indicate total hours of time spent per week, per month, and per year by the respondent. The advantages of the absolute time estimation procedure are: (1) frequency and time to perform estimation can be accomplished for each task without the need to compare it with all other tasks (less susceptible to context, range, and changing reference effects); (2) a much wider range of responses is possible, such that respondents can give tasks their "true" weight relative to all other tasks performed; and (3) it is much easier for excessively high responses (which is the general tendency) to be detected and feedback given to the respondent to re-evaluate his or her response as soon as the estimation is made, since it has been found that respondents will take time to re-evaluate a single task, when it is brought to their attention, but are reluctant to make changes at the conclusion of the estimation (or rating) exercise when their job description containing all tasks is fed back to them for evaluation and editing of responses.
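The conversion to a common numeric base can be illustrated with a minimal Python sketch. The period table, the 260-workday year, the 2,080-hour review ceiling, and all function names are assumptions made for illustration; they are not taken from the survey software itself.

```python
# Hypothetical conversion factors: how many reporting periods fall in one year.
# The 260-workday year (5 days x 52 weeks) is an assumption, not a survey rule.
PERIODS_PER_YEAR = {"day": 260, "week": 52, "month": 12, "year": 1}


def hours_per_year(times_performed, period, minutes_per_performance):
    """Multiply frequency by time per performance on a common yearly base."""
    performances_per_year = times_performed * PERIODS_PER_YEAR[period]
    return performances_per_year * minutes_per_performance / 60


def time_spent_summary(times_performed, period, minutes_per_performance):
    """Return total hours per week, per month, and per year for one task."""
    yearly = hours_per_year(times_performed, period, minutes_per_performance)
    return {"week": yearly / 52, "month": yearly / 12, "year": yearly}


def needs_review(yearly_hours, ceiling=2080):
    """Flag an estimate for immediate feedback.

    The default ceiling (40 hours x 52 weeks) is a hypothetical threshold.
    """
    return yearly_hours > ceiling


# A task performed 3 times per week, taking 20 minutes each time.
summary = time_spent_summary(times_performed=3, period="week", minutes_per_performance=20)
print(summary)
```

Because the respondent states frequency and duration in units of his or her own choosing, the only fixed elements in such a scheme are the conversion factors; a check like `needs_review` can run as soon as each estimate is entered.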

Absolute time estimation has one disadvantage: a tendency toward overestimation of total time spent across all tasks. However, the feedback mechanisms that have been built into the computer-administered survey system tend to alleviate a good bit of this problem. The average overestimation for respondents who overestimate appears to be in the neighborhood of 100 to 400 hours per year, which seems large until you compute the average overestimation per task performed, an amount of one to two hours per year on the average, which is not bad at all. The currently used relative time spent scale was chosen because of this tendency in absolute time estimation. However, this did not solve the problem. It merely masked it by using numbers that could not be judged against real time, and these numbers were converted to percentages, because they too did not add up to any meaningful number. For that matter, absolute time values for each task could be converted to a percentage of the total absolute time and would probably result in percentages with more realistic values for the highest and lowest time spent tasks than the relative scale procedures. But the individual absolute time estimate for each task is too valuable in itself, in spite of its small amount of overestimation error, to be transformed in any way that makes it no longer interpretable as absolute time for the dubious advantage of making the total time value more realistic.

Efforts to compare the reliability and validity of the PC-based procedures for gathering occupational data with the paper-and-pencil procedure are planned, to determine which approach obtains more information and less noise per response. If, as anticipated, the PC-based procedure yields more and better information in less time per respondent, then smaller samples of job incumbents can be expected to yield results comparable to those of the large samples being used currently. Thus, for example, job types containing as few as five to ten members may become as replicable as the 20-member job types of the past. At the same time, the job analysis process may become simpler and faster, because the clustering process will yield more distinct job types including a higher percentage of the sample.

Data Linkage Approaches

We have examined two data linkage approaches designed to extract more information from data sets at less cost. The first approach is a KSA-to-task or equipment-to-task linkage system which does not require SMEs or job incumbents to individually assess the linkage of every KSA or item of equipment to each task in order to determine whether a given KSA is required for the performance of a given task (i.e., direct linkage: requires NTasks x NKSAs to be compared). Rather, this system is able to determine whether a KSA or item of equipment is required for a task by simply having job incumbents rate the tasks they perform in their job and indicate the extent to which they use each KSA relevant to their jobs (indirect linkage: requires only the rating of NTasks + NKSAs). Since each job incumbent is rating tasks, KSAs, and equipment, certain statements can be made concerning the relationships between these ratings if the hypothesis holds that, for example, a task (Ti) and a knowledge (Ki) are indeed linked, i.e., Ki is required to perform Ti:

(1) There should be a high correlation between a given Ti and a given Ki across cases, for those cases which have a non-zero value for Ti and Ki or for Ti only. (Cases which have zeros for both Ti and Ki can be ignored, as well as cases which have a zero for Ti and a non-zero for Ki. In other words, we are concerned only with those cases where the task is performed and the knowledge is used [supports knowledge requirement], or the task is performed but the knowledge is not used [indicates no knowledge requirement]. The correlation should be tested for r = 0 and the probability associated with the T-value should be converted to Z.)

(2) There should be a significant difference between the mean of the Ki ratings associated with the cases that perform Ti (Xa) and the mean of the Ki for those cases that do not perform Ti (Xb). Xa should be significantly higher than Xb. The probability associated with the T-value should be converted to Z.

(3) A Chi-square test should be performed in which the number of (1,1) cases (Ti = 1 and Ki = 1) is tested against the expected value (Ti=1 x Ki=1)/N, where the actual value must be greater than the expected value, and the number of (1,0) cases (Ti = 1 and Ki = 0) is tested against the expected value (Ti=1 x Ki=0)/N, where the actual value must be less than the expected value. A Chi-square value in the wrong direction should be treated as if it were a negative Chi-square value and the two Chi-square values summed accordingly. The resulting Chi-square value should be converted to Z (i.e., Z equals the square root of the absolute Chi-square value), with Z retaining the sign of the Chi-square value.

The extent of the linkage between Ti and Ki may be represented by one or more of the Z values computed above (depending on the nature of the data), and a Z value should be computed to represent the extent of linkage of every KSA to every task. The KSAs should be rank ordered accordingly, from highest positive Z to lowest negative Z, under each task. The Kis with the highest Z values are the most likely to be required for a task, provided their Z value is equal to or greater than 2.00. The rank ordering can be validated by having SMEs verify the task-to-knowledge linkages for a number of tasks. The process can only work if some cases use the Ki and some do not. However, if everyone or no one uses a given Ki, then it is not important, in any practical application, to be able to establish a precise linkage of the Ki to a task, since it is either a requirement for every job or a requirement for no job. The profile of Z values for each task across Kis will then permit the clustering of tasks into task modules having similar profiles of knowledge requirements, something much to be desired for a training decision system or a job restructuring system. Transpose the Ti-Ki matrix of Z values, and knowledges can be clustered into knowledge modules according to the similarity of their profiles by usage on tasks.
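The signed Chi-square statistic of item (3) and the rank ordering of KSAs under a task can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' software: ratings are assumed to have been reduced to 0/1 flags (1 = task performed or knowledge used), the function names are hypothetical, and the correlation and mean-difference statistics of items (1) and (2) are omitted.

```python
import math


def signed_chi_square_z(task_flags, ksa_flags):
    """Signed Z for one task-KSA pair from the (1,1) and (1,0) cell counts."""
    n = len(task_flags)
    if n == 0:
        return 0.0
    t1 = sum(task_flags)                       # cases performing the task
    k1 = sum(ksa_flags)                        # cases using the knowledge
    k0 = n - k1                                # cases not using the knowledge
    o11 = sum(1 for t, k in zip(task_flags, ksa_flags) if t and k)
    o10 = t1 - o11
    total = 0.0
    # (observed, expected, True if observed should exceed expected under linkage)
    for observed, expected, want_higher in ((o11, t1 * k1 / n, True),
                                            (o10, t1 * k0 / n, False)):
        if expected == 0:
            continue
        chi = (observed - expected) ** 2 / expected
        right_way = observed > expected if want_higher else observed < expected
        total += chi if right_way else -chi    # wrong direction counts as negative
    return math.copysign(math.sqrt(abs(total)), total)


def rank_ksas(task_flags, ksa_flags_by_name, threshold=2.00):
    """Rank KSAs under a task from highest to lowest Z; flag those at or above 2.00."""
    scored = [(name, signed_chi_square_z(task_flags, flags))
              for name, flags in ksa_flags_by_name.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [(name, z, z >= threshold) for name, z in scored]


task = [1] * 10 + [0] * 10                     # 20 cases, 10 perform the task
ksas = {"K1": [1] * 10 + [0] * 10,             # used exactly by those who perform it
        "K2": [1, 0] * 10}                     # used by every other case, unrelated
ranking = rank_ksas(task, ksas)
print(ranking)
```

In this made-up example K1 clears the 2.00 threshold and K2 does not, which is the kind of result SMEs would then be asked to verify.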

At this point there are a number of ways task modules clustered on commonality of knowledges could easily be linked to the appropriate knowledge clusters, although evaluating the averages of the Z values which link all Tis in a task module to all Kis in a knowledge module would be the most straightforward.

In summary, the described task-to-KSA linkage approach drastically cuts the number of ratings needed from SMEs or job incumbents (NTasks + NKSAs versus NTasks x NKSAs) and can lead to numerous other useful products, such as knowledge-based task modules, knowledge modules, and the linkage between them. The same procedure would, of course, apply to the linking of equipment to tasks and to knowledges, a three-way linkage. To carry this a step further, tasks could then be clustered into modules based on commonality of knowledges and equipment. The possibilities go on and on, if other factors are brought in (e.g., weapon systems).

A second data linkage approach is one which is designed to link multiple weapon systems to tasks or knowledges in terms of whether the task or knowledge applies to each of the systems and how similar the task or knowledge requirement is from one weapon system to another. The ultimate use of the procedure would be the ex post facto creation of a weapon system-specific job inventory from a standard inventory and its file of responses.

This approach is best described in steps:

Step 1. Obtain ratings of weapon system familiarity and usage in present job from job incumbents when they fill out the background section of a standard job inventory.

Step 2. Select up to 10 raters from the job inventory data base who indicated moderate to high familiarity with two or more weapon systems and send these raters another job inventory booklet together with a list of the weapon systems they are familiar with, the systems being coded "A", "B", ... "F" (a maximum of six systems).

Step 3. In a space following each task, have the raters indicate whether the task applies to one or more of the listed weapon systems and whether it is essentially the same task for each weapon system by using the following rating scheme: (a) if the task applies to one or more systems, list the appropriate codes (e.g., "A", "B", ... "F"); (b) if the task is essentially the same for two or more weapon systems, write the codes side-by-side (e.g., ACE); (c) if the task differs for two or more weapon systems, place a hyphen or dash between the codes representing those systems (e.g., ACE-BD-F, which indicates the task is essentially the same for weapon systems A, C, & E, and for systems B & D, but the task as performed on weapon systems A, C, & E is different from how it is performed on B & D, and the task for system F is different from that on the other five systems); (d) if a task is not performed on one or more of the listed systems, the codes for those systems should not be listed.

Step 4. When the codes are data entered (i.e., letters and dashes in the order listed by the rater next to each task) together with the cross-reference of codes to weapon systems used by that rater, another piece of software will compute the N(N + 1)/2 binary comparisons inherent in the coding (including the comparison of each weapon system with itself), with "+1" indicating similarity, "-1" indicating dissimilarity, and 0 indicating no information. For example, for a task with the coding ACE-BD-F, there would be 6(7)/2 = 21 binary comparisons, as follows:

AA = 1    BB = 1    CC = 1    DD = 1    EE = 1    FF = 1
AB = -1   BC = -1   CD = -1   DE = -1   EF = -1
AC = 1    BD = 1    CE = 1    DF = -1
AD = -1   BE = -1   CF = -1
AE = 1    BF = -1
AF = -1

Had one asked for similarity ratings for the individual pairs, 21 separate ratings would have had to be generated, a grueling undertaking if hundreds of tasks are to be rated. Yet the shorthand coding (i.e., ACE-BD-F) contains all this binary information in a compact form.
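The expansion of a shorthand code into its binary comparisons can be sketched in Python as follows. The function name is hypothetical, and assigning 0 to every comparison that involves a system the rater left out of the code is an assumption about how "no information" would be recorded.

```python
from itertools import combinations_with_replacement


def expand_code(code, systems):
    """Expand a shorthand code such as 'ACE-BD-F' into N(N + 1)/2 comparisons.

    code    -- letters written side by side are 'essentially the same';
               hyphens separate groups that differ from one another
    systems -- all system codes on the rater's list, e.g. 'ABCDEF'
    """
    group_of = {}
    for index, group in enumerate(code.split("-")):
        for system in group:
            group_of[system] = index
    comparisons = {}
    for a, b in combinations_with_replacement(systems, 2):
        if a not in group_of or b not in group_of:
            comparisons[a + b] = 0      # task not listed for a or b: no information
        elif group_of[a] == group_of[b]:
            comparisons[a + b] = 1      # same group (or a system with itself)
        else:
            comparisons[a + b] = -1     # different groups
    return comparisons


pairs = expand_code("ACE-BD-F", "ABCDEF")
print(len(pairs), pairs["AC"], pairs["AB"], pairs["EF"])
```

A single pass over the code therefore produces all 21 values that would otherwise have to be rated pair by pair.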

Step 5. Create a weapon system x weapon system matrix for each task and insert the binary comparisons for every rater in the matrix. If a square matrix is used, then the binary value inserted in the R x C cell will also be inserted in the C x R cell (e.g., AB = BA), resulting in 6^2 = 36 pairwise values per task per rater.

Step 6. Use a transitive connectivity algorithm to infer as many layers of indirect, nonredundant comparisons as possible. For example, if one rater said that a task is essentially the same for weapon systems A & B (zero order connection), and another rater stated that the task is essentially the same for systems B & C, then the inference can be made that the task is essentially the same for weapon systems A & C (first order connection, because A is connected to C through one intervening connector, B). If still another rater states that the task is essentially the same for weapon systems C & D, then A & D are connected by a second order connection (through B & C); and so on. The number of indirect connections could easily exceed the number of direct connections by a factor of ten in some cases. Thus, once the coded tasks have been expanded to N(N + 1)/2 binary relationships, the delineation of relationships can be extended still further by means of a transitive connectivity algorithm.
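The paper does not specify the transitive connectivity algorithm, so the following Python sketch is only one possible reading: a breadth-first search over "essentially the same" judgments, in which the order of a connection is the number of intervening connectors on the shortest path. The function name is hypothetical, and the sketch makes no attempt to reconcile conflicting "different" judgments.

```python
from collections import deque


def connection_orders(same_pairs):
    """Return {(x, y): order} for every pair of systems linked by 'same' judgments.

    Order 0 is a direct judgment; order 1 passes through one intervening
    system; order 2 through two; and so on.
    """
    graph = {}
    for a, b in same_pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    orders = {}
    for start in graph:
        distance = {start: 0}
        queue = deque([start])
        while queue:                           # breadth-first search from start
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in distance:
                    distance[neighbor] = distance[node] + 1
                    queue.append(neighbor)
        for end, steps in distance.items():
            if start < end:                    # record each unordered pair once
                orders[(start, end)] = steps - 1
    return orders


# Three raters each supply one direct judgment: A & B, B & C, C & D.
orders = connection_orders([("A", "B"), ("B", "C"), ("C", "D")])
print(orders)
```

Three direct judgments here yield six connected pairs, which illustrates how indirect connections can outnumber direct ones.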

Step 7. Compute a task-by-task measure of the extent to which each task is performed on each weapon system (from the diagonal of the final matrix) and a task-by-task similarity measure for all possible pairs of weapon systems as to the extent to which a task is similar across weapon systems. Note that, although similarity began as a binary measure, it will become a continuous measure as more and more raters agree or disagree on whether it is essentially the same task for two weapon systems.
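One simple way a binary measure becomes continuous is to average the +1 and -1 values across raters, as in the Python sketch below. The averaging rule, the exclusion of 0 ("no information") values, the restriction to off-diagonal pairs, and the function name are all assumptions; the paper does not state how the final measure is computed.

```python
def mean_similarity(rater_comparisons):
    """Average +1/-1 judgments across raters for each pair of different systems.

    rater_comparisons -- one dict per rater, mapping a pair such as 'AB'
                         to +1 (same), -1 (different), or 0 (no information)
    """
    totals, counts = {}, {}
    for comparisons in rater_comparisons:
        for pair, value in comparisons.items():
            if value == 0 or pair[0] == pair[1]:
                continue                       # skip missing data and self-comparisons
            totals[pair] = totals.get(pair, 0) + value
            counts[pair] = counts.get(pair, 0) + 1
    return {pair: totals[pair] / counts[pair] for pair in totals}


raters = [
    {"AA": 1, "AB": 1, "AC": -1},
    {"AA": 1, "AB": 1, "AC": 0},
    {"AA": 1, "AB": 1, "AC": -1},
    {"AA": 1, "AB": -1, "AC": 0},
]
similarity = mean_similarity(raters)
print(similarity)
```

With four raters, three of whom judge the task the same on A and B, the A-B similarity is 0.5 rather than a flat +1 or -1.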

Step 8. Decompose each task in the original task list into as many tasks as are required to accommodate dissimilarities among the weapon systems upon which it is performed. For example, if it is concluded that the task "Inspect O-rings" is essentially the same task when performed on systems A, C, & E, but different on B & D, and different also on F, this task should be decomposed into three tasks: (1) Inspect O-rings on weapon systems A, C, & E; (2) Inspect O-rings on systems B & D; and (3) Inspect O-rings on system F.

Step 9. Each job incumbent's time spent values on the original tasks should be subdivided among the decomposed tasks according to the time spent ratings he or she gave each weapon system in the background section of the survey. Thus, if the rater's total time spent on the task "Inspect O-rings" was 8% and he or she had rated weapon systems A = 4, B = 6, C = 5, D = 4, E = 6, and F = 7, the weapon system-specific task "Inspect O-rings on systems A, C, & E" should be allocated (4+5+6)/(4+6+5+4+6+7) x 8% = (15/32)(8%) = 3.75% time spent for that rater, etc.
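The proportional allocation in Step 9 can be sketched in Python as follows; the function name and data layout are hypothetical. With the ratings as listed (A = 4, C = 5, E = 6, out of a total of 32), the A, C, & E task receives 15/32 of the 8%, or 3.75%.

```python
def allocate_time_spent(percent_time, system_ratings, code):
    """Split a task's percent time spent among its decomposed tasks.

    percent_time   -- time spent on the original task, in percent
    system_ratings -- the incumbent's time spent rating for each weapon system
    code           -- grouping such as 'ACE-BD-F' (one group per decomposed task)
    """
    groups = code.split("-")
    rated_total = sum(system_ratings[s] for group in groups for s in group)
    return {
        group: percent_time * sum(system_ratings[s] for s in group) / rated_total
        for group in groups
    }


ratings = {"A": 4, "B": 6, "C": 5, "D": 4, "E": 6, "F": 7}
shares = allocate_time_spent(8, ratings, "ACE-BD-F")
print(shares)
```

The shares for the three decomposed tasks add back up to the original 8%, so no time is gained or lost in the decomposition.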

It is now possible to determine what tasks are performed on which weapon systems, which tasks are essentially the same for two or more weapon systems, how much total time is being spent per weapon system, which job incumbents have current experience on tasks relevant to a certain weapon system (even if they have never worked on that system), and which tasks might be similar on a new weapon system, given its known similarity to existing weapon systems.


We would like to emphasize that the approaches described in this paper for increasing the information value of occupational survey response data, and even creating new information from existing data, are only the tip of the iceberg. Other opportunities should be explored, especially in the areas of asymmetric, multidimensional, and adaptive clustering, which will perhaps best be described in another paper at a later time. The rapidly increasing power of the computer, and the reasonable cost of highly capable machines, now make previously unthinkable approaches to data gathering and analysis of much more sophisticated data not only thinkable, but eminently doable.


Albert, W.G., Rouse, I.F., Selander, D.M., Yadrick, R.M., Phalen, W.J., Weissmuller, J.J., Dittmar, M.J., & Tucker, D.L. (1993, November). Development and test of computer-administered survey software. Proceedings of the 35th Annual Conference of the Military Testing Association. Williamsburg, VA: U.S. Coast Guard Headquarters.

Albert, W.G., & Phalen, W.J. (1991, May). Statistical analysis of automated survey data collected from chapel management specialists (AFS 893X0). Proceedings of the Seventh International Occupational Analysts Workshop. Randolph AFB, TX: USAF Occupational Measurement Squadron.

Cragun, J. R., & McCormick, E. J. (1967). Job inventory information: Task and scale reliabilities and scale interrelationships. PRL-TR-67-15. Lackland AFB TX: Personnel Research Laboratory.

Driskill, W.E., & Bower, F.B., Jr. (1978). The stability over time of Air Force enlisted career ladders as observed in occupational survey reports. Proceedings of the 20th Annual Conference of the Military Testing Association. Oklahoma City, OK: U.S. Coast Guard Institute.

Mitchell, J.L., Phalen, W.J., & Hand, D.K. (1992, October). Multilevel occupational analysis: Hierarchies of tasks, modules, jobs, and specialties. In the symposium, Organizational analysis issues in the military (H.W. Ruck, chair). Proceedings of the 34th Annual Conference of the Military Testing Association. San Diego, CA: Navy Personnel Research & Development Center.

Pass, J.J., & Robertson, D. (1978). Sample size and stability of task analysis inventory response scales. Proceedings of the 20th Annual Conference of the Military Testing Association. Oklahoma City, OK: U.S. Coast Guard Institute.

Phalen, W.J., & Mitchell, J.L. (1993, June). Innovations in occupational measurement technology for the U.S. Military. In the symposium, Military Occupational Analysis: Issues and Advances in Research and Application (H.W. Ruck, chair). Proceedings of the Eighth International Occupational Analysts Workshop. San Antonio, TX: USAF Occupational Measurement Squadron.

Watson, W.J. (1974, February). The similarity of job types reported from two independent analyses of occupational data. AFHRL-TR-73-58, AD-776 777. Lackland AFB, TX: Occupational Research Division, Air Force Human Resources Laboratory.
