06 April 2012

Week 12: Reviewing the Data

This week, I've been moving data from the spreadsheet used to capture notes during testing into another spreadsheet that compiles further statistics from those data.  It assesses four measures:
  1. results of the system usability scale
  2. effectiveness of tasks
  3. efficiency of tasks
  4. results of the appearance scale
It then takes these four measures and averages them for an overall benchmark score (a rough sketch of that arithmetic follows the quoted explanation below).  It also calculates the mean of the responses to the single ease question (SEQ) asked after every task: "Overall, on a scale from 1 to 7, where 1 is Very Difficult and 7 is Very Easy, this task was…" I looked up the significance of the SEQ because I wasn't really sure what it was supposed to measure, and I found this explanation:
Was a task difficult or easy to complete? Performance metrics are important to collect when improving usability but perception matters just as much. Asking a user to respond to a questionnaire immediately after attempting a task provides a simple and reliable way of measuring task-performance satisfaction. Questionnaires administered at the end of a test such as SUS, measure perception satisfaction. [1]
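
Roughly, here is what the spreadsheet's benchmark and SEQ calculations amount to.  This is only a sketch of the arithmetic described above, not the actual spreadsheet formulas, and every number in it is a placeholder rather than real data from my study.

    from statistics import mean

    # The four measures, each on a 0-100 scale (placeholder values).
    sus_score = 82.0       # system usability scale
    effectiveness = 98.0   # percent of task attempts completed successfully
    efficiency = 74.0      # expert time vs. participant time on task
    appearance = 80.0      # appearance scale

    # Overall benchmark score: the average of the four measures.
    benchmark = mean([sus_score, effectiveness, efficiency, appearance])

    # SEQ: mean of the 1-7 ratings collected after each task (placeholders).
    seq_responses = [6, 7, 5, 6, 7, 4]
    seq_mean = mean(seq_responses)

    print(f"Benchmark: {benchmark:.0f}, mean SEQ: {seq_mean:.1f}")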
I'll admit, going into testing I was really only concerned with whether participants succeeded or failed at the tasks I created.  And I was feeling pretty good that there were only two failures among the 84 possibilities (14 separate measures for each of six participants). That works out to a 98% effectiveness rating, but alas, that tells only part of the story.

The measure that is most suspect, in my opinion, is the efficiency rating. I spoke with Tanya about this, and there really is no hard-and-fast rule for it. What the UX team typically does is time themselves doing each task at a moderate pace and use those times as "expert" benchmarks, then compare each participant's time on task against them, counting only the tasks that ended in success.  Pretty much, if one participant struggles a bit or spends time exploring the interface and trying to figure out what to do, it can really throw off the overall times. I was dismayed to see that the mean efficiency of these tasks was only 74%, which falls short of the standard 80% we strive for on any metric.
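
My understanding of that efficiency calculation works out to something like the sketch below: for each task that ended in success, compare the "expert" benchmark time against the participant's time on task, then average the ratios.  The exact formula is an assumption on my part, and the timings are made-up examples rather than data from my sessions.

    from statistics import mean

    def efficiency_pct(expert_times, participant_times, successes):
        """Mean of expert/participant time ratios over successful tasks,
        capped at 100% per task, expressed as a percentage."""
        ratios = [
            min(expert / observed, 1.0)
            for expert, observed, ok in zip(expert_times, participant_times, successes)
            if ok
        ]
        return 100 * mean(ratios)

    # Placeholder timings in seconds, not real study data.
    expert = [30, 45, 60]
    observed = [40, 50, 95]
    succeeded = [True, True, True]
    print(f"Efficiency: {efficiency_pct(expert, observed, succeeded):.0f}%")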

But the overall benchmark score for my study squeaked in at 81, which I guess means my designs were an improvement.  One of the big issues we were trying to address with this project was making it easier and clearer for users to add channels to a content item. In WEM 8.0 summative testing, only 40% of customers and 44% of non-current customers were able to do this.  In my study, 100% of customers and 67% of non-customers were able to add channels without any prompting, for an overall score of 83%.  The one non-customer who failed this task said it was likely an issue with the prototype because she just didn't notice that the "square" icons were supposed to be check boxes.  So here is the overall scorecard for this round of testing:

WEM v8.2 Pickers scorecard after one round of usability testing

The scorecard is a PowerPoint slide, and I got to learn how to link data from an Excel spreadsheet into PowerPoint.  The only hitch was that the error message PowerPoint gave me, saying it could not update the data because it couldn't find the linked file, was very confusing, and I had to do a Google search to find where I needed to update the file path.  I never would have found it on my own.

From the Home button, go to Prepare and then Edit Links to Files
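
For anyone who hits the same error, that menu path is where the fix lives in PowerPoint itself.  As a side note, a .pptx file is just a ZIP package under the hood, so you can also see exactly where a broken link points by reading the package's relationship parts; external links, including linked Excel data, show up there with TargetMode="External".  The short sketch below only lists those external targets, and the file name in it is a made-up example.

    import zipfile
    import xml.etree.ElementTree as ET

    # Package-relationships namespace used by the .rels parts in Open XML files.
    REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"

    def list_external_links(pptx_path):
        """Print every external link target found in the package's .rels parts."""
        with zipfile.ZipFile(pptx_path) as pkg:
            for name in pkg.namelist():
                if not name.endswith(".rels"):
                    continue
                root = ET.fromstring(pkg.read(name))
                for rel in root.findall(f"{REL_NS}Relationship"):
                    if rel.get("TargetMode") == "External":
                        print(f"{name}: {rel.get('Target')}")

    list_external_links("pickers_scorecard.pptx")  # made-up file name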
Next week I'll start interpreting these data and trying to weave an interesting story into a PowerPoint presentation that I'll eventually give to internal stakeholders.  I plan to use some video clips from the tests rather than just throwing out a bunch of statistics.

Footnotes

[1] Sauro, J. If you could only ask one question, use this one.
