It is interesting to note how usability testing methods have evolved over the years. In July 1970 at Bell Laboratories (AT&T’s Bell Labs) in Piscataway, New Jersey, we began conducting usability tests on new business information systems. These were legacy AT&T office systems that were being converted from purely manual to computer-based systems.
In these tests, participants completed task scenarios while using a keyboard and viewing a CRT screen (we were not yet using a mouse). Our first tests were conducted in rented hotel rooms, but we soon migrated to on-site conference rooms. Finally, in early 1972 we moved into a new 11-story office building where we constructed our own usability lab on the top floor, complete with a one-way mirror and an overhead camera.
In these early ‘hotel room’ usability tests, participants were tested one at a time, with a usability practitioner sitting next to them recording comments, success rates, and the time to complete each task (using a stopwatch). In the earliest tests there were no real-time observers, but later we allowed developers to sit quietly in the room behind the participant. From the beginning, each session was videotaped, and the tapes were reviewed afterward as the report was prepared. We used one camera pointed at the participant’s head and hands, and at the monitor.
My first article on this testing methodology was presented at the 1972 Human Factors Society annual conference*. In this paper, I described the process that we had been using at Bell Laboratories to conduct usability tests.
In the years following, we continued to conduct usability tests in more sophisticated test facilities. All of the new facilities had one-way mirrors to accommodate observers. Typically, these early usability tests consisted of one-hour testing sessions in which participants would perform a series of tasks, while thinking aloud.
During most of these testing sessions, participants were allowed to take as long as they needed to complete each task scenario, while the usability test facilitator observed and took notes. The facilitator typically recorded comments made by participants and noted interesting user behaviors. Much of this early usability testing focused simply on determining whether three or four participants were able to complete the tasks.
Most of these early tests were so qualitative in nature that they actually resembled live ‘expert review’ sessions of the new systems. Even so, the resulting usability reports made many suggestions for improvements. By today’s standards, these tests were very ‘soft’, and the test sessions were difficult to repeat or replicate, making it almost impossible to conduct valid and meaningful retests.
Even so, these tests probably did help to improve the early user interfaces. However, most of the improvements were likely the result of experienced usability practitioners making skilled observations and recommendations during the tests, rather than of any detailed analyses of quantitative results.
Probably the first useful quantitative information was the time it took participants to complete the scenarios. I remember one situation in 1971 when a system developer submitted a new system to be tested. We created scenarios and conducted the test, collecting, among other data, the time to complete each scenario. His supervisor then told him to make all of our recommended changes; in other words, to make all recommended changes that would reduce the time necessary to complete the new computer-based tasks.
A couple of months later, he resubmitted the material to be re-tested, and we used the same scenarios to conduct the re-test. We found that the time participants required to complete the scenarios in the re-test was not reliably different from the first test; it was almost exactly the same as before. Disappointed with the outcome, we took the time necessary to determine exactly what changes the developer had made during his revisions. Our detailed evaluation revealed that during the two-month re-design period, he had not changed anything. We found out later that he had worked on other projects, and had simply re-submitted the original system for testing.
*Testing manual procedures in computer-based business information systems, Proceedings of the 16th Annual Meeting of the Human Factors Society, October 1972.