Once the problems were better understood and several solutions had been proposed, we created several competing wireframes to see which would best elicit the success levels we were seeking. These were eventually combined into three homepage wireframes (A, B, and C). After only one test, B was eliminated, and the final two wireframes (A and C) were tested ‘head-to-head’.
We had 65 participants attempt to complete 136 scenarios (68 using Wireframe A and 68 using Wireframe C). Each participant spent about one hour completing the scenarios using Wireframe A and then Wireframe C, or Wireframe C then Wireframe A.
We used a FirstClick testing methodology in which we collected and analyzed only the first click each participant made after reading the scenario. In previous testing, we had observed that the first click was critical: participants who had difficulty making that initial decision frequently had problems finding the correct answer.
This type of testing enabled us to considerably expand the number of scenarios (each scenario took participants less than 30 seconds), and to see which of the two wireframes elicited the best initial performance. The test was conducted using Bailey’s Usability Testing Environment (UTE) and TechSmith’s Morae.
The test results showed no reliable difference between the two wireframes in terms of success, but Wireframe C did elicit reliably faster performance.
The following two figures show the percentage of participants who clicked first on each of the links for two different scenarios. Clicks shown in green were correct, and those in red were incorrect. We were particularly interested in two different response patterns. The following figure shows a scenario where a fairly large number of participants tended to agree on the wrong response, i.e., the wrong link. This means that something about that link erroneously attracts clicks, while the correct link does not. This usually can be fixed by changing the link names.
The following figure shows a different pattern of responses. In this case, few people could agree on the correct response, and even fewer on any one incorrect response. Those making wrong responses showed little consistency in their choices. For this type of problem, a workable solution is more difficult to find.
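The two response patterns described above can be separated mechanically once the first clicks for a scenario have been tallied. The following is a minimal sketch of that idea, not the analysis performed by UTE or Morae. The function names, the 25% agreement threshold, and the link labels are all illustrative assumptions, and the sample clicks are made up rather than taken from the study.

```python
from collections import Counter


def first_click_shares(clicks):
    """Return {link: percentage of participants whose first click was that link}."""
    if not clicks:
        return {}
    counts = Counter(clicks)
    total = len(clicks)
    return {link: 100.0 * n / total for link, n in counts.items()}


def classify_pattern(clicks, correct_links, agree_threshold=25.0):
    """Summarize one scenario's first clicks.

    agree_threshold is a hypothetical cutoff (in percent): if a single wrong
    link attracts at least this share of first clicks, participants are
    treated as "agreeing" on the wrong response.
    """
    shares = first_click_shares(clicks)
    # Success rate: share of first clicks that landed on any correct link.
    success = sum(p for link, p in shares.items() if link in correct_links)
    wrong = {link: p for link, p in shares.items() if link not in correct_links}
    top_wrong_link, top_wrong_share = max(
        wrong.items(), key=lambda kv: kv[1], default=(None, 0.0)
    )

    if top_wrong_share >= agree_threshold:
        pattern = "agreed-wrong"   # one wrong link draws many clicks
    elif wrong:
        pattern = "scattered"      # wrong clicks spread over many links
    else:
        pattern = "all-correct"

    return {
        "success": success,
        "top_wrong_link": top_wrong_link,
        "top_wrong_share": top_wrong_share,
        "pattern": pattern,
    }


# Made-up first clicks for two hypothetical scenarios.
agreed = ["Link B"] * 10 + ["Link A"] * 6 + ["Link C"] * 2 + ["Link D"] * 2
scattered = ["Link A"] * 4 + ["Link B", "Link C", "Link D",
                              "Link E", "Link F", "Link G"]

print(classify_pattern(agreed, correct_links={"Link A"}))
print(classify_pattern(scattered, correct_links={"Link A"}))
```

Under these assumptions, a scenario flagged as "agreed-wrong" is a candidate for renaming the link that attracts the erroneous clicks, while a "scattered" scenario points to the harder case where no single link is responsible.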
Some of the scenarios that originally elicited poor success rates in the baseline test continued to show poor performance in the FirstClick test, even after we had made many changes to the homepage that were expected to improve user performance on these scenarios. For example, in the baseline test the ‘Budget’ scenario had an overall success rate of only 17%, and in the FirstClick test only 59% of participants made a successful first click on ‘Budget’. We also noted that two scenarios that elicited good success rates in the baseline test continued to show very good performance in the FirstClick test. The FAS scenario, for example, had perfect performance in both tests.