Usability Testing a Quantity Conversion Feature

Overview

Timeframe: February 2017
Company: Home Depot
Role: Planning, facilitation, synthesis, design recommendations, strategic prioritization

Home Depot offers a wide variety of special order products – things we don’t stock in our stores. These products are available in a variety of units like bundles, cases, packs, pallets, pieces… You get the idea.

When customers ask for “30 lightbulbs” or “enough shingles to cover 1,200 square feet of roof,” it is technically possible for them to purchase just the right amount of product, but it’s not straightforward. After all, just how many lightbulbs are in a “case”?

It’s difficult for our sales associates to do these conversions quickly, so to help simplify the sales process, we shipped a set of features to support effortless quantity conversions. Basically, we made it possible for an associate to enter the quantity their customer wants and have the system automatically convert that quantity into something they can purchase.
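To make that conversion concrete, here is a minimal sketch of the rounding involved, in Python. The function name, pack sizes, and numbers are illustrative assumptions, not the production implementation: the core idea is rounding a requested quantity up to a whole number of sellable packages, which is exactly why the quantity a user entered can change.

    import math

    def to_sellable_quantity(requested_units: float, units_per_package: float):
        """Round a requested quantity up to a whole number of sellable packages.

        Returns the package count and the resulting quantity in base units,
        which may be more than the customer asked for.
        """
        packages = math.ceil(requested_units / units_per_package)
        return packages, packages * units_per_package

    # A customer wants 30 lightbulbs, but the product sells in cases of 8.
    print(to_sellable_quantity(30, 8))       # (4, 32) -- the entered quantity "changes"

    # Shingles to cover 1,200 sq ft, sold in bundles covering 33.3 sq ft each.
    print(to_sellable_quantity(1200, 33.3))  # (37, ~1232.1)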

I led a 9-person usability study focused on evaluating this experience. We looked at whether users noticed that their quantity had changed from what they originally entered, and whether they could rationalize why.

Synthesizing data – video on one screen, timestamps and notes on the other

Problem

This study was primarily aimed at answering the following two questions:

  1. When we pull the rug out from under users (by changing their quantity), do they notice?
  2. If they do notice, are they able to rationalize why this happened?

Solution

Scope & Constraints

We had an unusual amount of bandwidth for this study: a week or so to prepare, access to a more private environment (rather than the Pro Desk itself), 30+ minutes per session, and 2 days to conduct the study.

Our scope was extremely broad. We tested a total of 13 hypotheses covering everything from interaction design details and interface copy to high-level workflows.

Format

The test was divided into two parts. We started with a task-based usability study and concluded with more of an interview. All of the sessions were conducted in person at a local Home Depot store.

Steps I Took

Here is how I approached this study.

1. Decided what to test

For a number of months our team had been capturing many of the assumptions that we were basing our design decisions on. Sometimes we had high confidence that an assumption was valid. Other times we had low confidence. I compiled a list of higher risk assumptions that warranted validation.

I built a database to organize much of our user research data, all the way from assumptions to findings

2. Formed hypotheses

After reviewing and editing that list of assumptions, I translated them into testable hypotheses using Jeff Gothelf’s recommended format from his book Lean UX (i.e., “We believe [this to be true]. We will know we’re right when we see [this signal or measure].”).
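For illustration, here are a couple of hypotheses expressed in that format, captured as simple records so they can later be traced to findings. These examples are hypothetical, not the study’s actual 13 hypotheses.

    from dataclasses import dataclass

    @dataclass
    class Hypothesis:
        belief: str  # "We believe [this to be true]."
        signal: str  # "We will know we're right when we see [this signal or measure]."

    # Hypothetical examples in the Lean UX format.
    hypotheses = [
        Hypothesis(
            belief="We believe associates will notice when the quantity they entered is converted.",
            signal="We will know we're right when we see participants comment on the changed quantity unprompted.",
        ),
        Hypothesis(
            belief="We believe associates can rationalize why their quantity changed.",
            signal="We will know we're right when we see participants correctly describe the package conversion when asked.",
        ),
    ]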

3. Created test script

Next I wrote tasks and questions that were specifically crafted to get insight into those hypotheses. This became my test script.

4. Conducted sessions

Another designer and I went to the store on day one, expecting to test with about 3 participants. We ended up doing 4 sessions. The following day we went to the store again and did 5 more sessions!

We captured the screen and audio for later review.

5. Reviewed recordings

I spent several days after the sessions combing through the recordings and observer notes. Not only did I make very detailed notes about what participants said and did, but I also captured timestamp data for key events.

Recall the two primary questions this study addressed:

  1. When we pull the rug out from under users (by changing their quantity), do they notice?
  2. If they do notice, are they able to rationalize why this happened?

Technically, these questions could be answered with a simple “yes” or “no,” but that would only reveal the presence of usability issues, not anything about their severity.

Timestamp data reveals both the presence and severity of usability issues.

First I captured timestamps for specific actions, then ran simple calculations to determine the time elapsed between them
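As a rough sketch of those calculations (the event names and times below are made up, not data from the study): subtract the timestamps of key events, such as when a participant submitted a quantity versus when they noticed it had changed, and treat long gaps as a signal of severity.

    from datetime import timedelta

    # Hypothetical key events per participant, in seconds into the session recording.
    sessions = {
        "P1": {"entered_quantity": 312, "noticed_change": 371, "explained_change": 498},
        "P2": {"entered_quantity": 205, "noticed_change": 214, "explained_change": 240},
    }

    def elapsed(events, start, end):
        """Time between two key events, or None if the later event never happened."""
        if start not in events or end not in events:
            return None
        return timedelta(seconds=events[end] - events[start])

    for pid, events in sessions.items():
        print(pid,
              "time to notice:", elapsed(events, "entered_quantity", "noticed_change"),
              "time to explain:", elapsed(events, "noticed_change", "explained_change"))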

6. Analyzed notes and timestamp data

I captured nearly 400 distinct notes from all 9 sessions, including over 100 direct quotes. Along with the qualitative side, I now had all the quantitative timestamp data. There was a lot of data to analyze!

I was able to identify some meaningful trends by analyzing users’ behavior and comments by task.

The timestamp data lent substantial weight to my findings, clearly indicating we had some pretty severe usability problems on our hands.

Ugh, look at all that red… Red equals “bad”, in case you wondered.

7. Produced findings and recommendations

I produced 16 findings from the study. To socialize these results I created a lightweight, focused report, which also included 20 specific design recommendations to address the usability issues the study had revealed.

8. Added design work to the backlog

I worked with our team’s product manager to vet my recommendations and ended up including 14 of them on our team’s backlog. I played a role in defining this work and reviewing the subsequent design work (produced by another designer on the team).

Outcome

As I mentioned above, we took some specific actions to address the usability problems surfaced in this study. This has improved our ability to onboard more products and scale our catalog; however, some of the usability problems we identified are still not resolved. It turns out these problems go much deeper than a specific feature or interface; there are bigger, systemic challenges we’re facing with our product data.

Throughout this study, I also had some important revelations about usability testing itself. Here are some of the things I learned.

Test Earlier

Don’t wait until something has been released to production before conducting some kind of usability testing. We have since started testing preliminary concepts much earlier in our design process.

Limit Scope

Establish a very clear and limited scope up front. We originally tried to tackle 13 hypotheses; in reality, we should have limited this to 2–3. Thankfully, before I did too much analysis, my new manager helped me trim the scope down to just 2 questions, but by that point we had already gathered an immense amount of data. These days I mostly restrict my research to a single hypothesis or question at a time.

Limit Participants

Limit the test to 5 participants. I was very pleased to have so many willing participants, but it would have been better to conduct two separate studies with 4–5 participants each rather than a single study with 9 participants.

Increase Collaboration

Collaborate with non-design stakeholders on the test plan. Depending on the study, stakeholders often have their own questions and concerns that the research could help address. In this particular study I waited to involve our product manager until after the study was complete; he would have benefited from being involved much sooner. Involving stakeholders early also helps with post-testing communication and persuading them to buy in to the results, especially when the results conflict with their strongly-held beliefs.