
Google Assistant with John Legend

A late-stage usability study

Background

Overview

The product was the John Legend Cameo with the Google Assistant.

If you said ‘talk like a Legend’, you would hear John Legend's voice for a wide range of options, actions, and features.






The Problem

We got a message from John Legend's PR team stating that they were worried his voice would sound TOO much like the default Assistant voices and the voice used for user-generated content (such as map directions).

Basically, his PR team was concerned that someone would record a clip of the Google Assistant saying something obscene like, “Hi, I'm John Legend! And I am an asshole,” and people wouldn't be able to tell whether it was John Legend's voice saying the whole thing or whether the Assistant's voice was involved.




My Role

My responsibilities included:

  • Helping create the study plan

  • Recruiting participants

  • Moderating sessions

  • Note taking

  • Data analysis

  • Presentation of insights w/ recommendations 

Research Questions

[Image: research questions]

Process & Challenges

Time Constraint

The first challenge was that the launch of the feature was quickly approaching: we had only a month to prove to Mr. Legend's PR team that the voices we would suggest would address their concerns.

Given the time constraints and the fact that we had 8 voices already available, we planned a quick usability study to test our general hypotheses:

  • The female voices would be more easily distinguishable than the male voices.

  • At least one male voice would be easily & accurately distinguishable.

Recruiting Participants

Due to the sensitive nature of the study, we needed to test with internal users. As part of the recruitment process, I wrote an email that described the incentive for participating and the time required, included a link to the screener questions, and mailed it out to our internal UX forum group. From there, I compiled a shortlist of participants based on their availability, product area, and role. In the end, we got a good mix of participant genders and UX backgrounds.

How to test 8 different voices?

The main issue at hand was thinking through how we could test whether or not the default Assistant voices would be easily discernible from John Legend's voice.

We decided we could play short audio files covering the main use cases of the Assistant, to replicate a more realistic scenario. Some example actions included:

  • Telling time

  • Playing music

  • Getting directions

  • Stating an address

  • Setting a reminder

  • Adding a contact

  • Making a call to a family member

Procedure & Methodology

A between-subjects usability study

Since the study request came so last minute, we needed to prioritize: was it more important for each participant to test every voice (within subjects) but only listen to one or two audio clips per variant, or for each participant to test a single voice variant with all the audio clips? We decided on the latter because there was so much variance across the audio clips (in the number of voice changes) that a within-subjects design might have produced unreliable data.

This meant we could only test with one participant for each condition (8 participants, 8 voices). In an ideal world, we would have liked to test many more users per condition, but real-world constraints are impossible to avoid, so we knew this would only be the first step of a more thorough research plan, with a quantitative survey to follow up.
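To make the design concrete, here is a minimal sketch of how a between-subjects assignment like this could be set up. All voice, participant, and clip names below are hypothetical placeholders, not the study's actual labels:

```python
# Illustrative sketch of the between-subjects setup: each of the 8
# participants hears every clip, but only in one voice variant.
# Voice, participant, and clip names are hypothetical.
import random

voices = [f"voice_{i}" for i in range(1, 9)]   # the 8 available Assistant voice variants
participants = [f"P{i}" for i in range(1, 9)]  # the 8 recruited internal participants
clips = ["telling_time", "playing_music", "directions", "address",
         "reminder", "add_contact", "call_family"]

random.shuffle(voices)                          # randomize voice-to-participant pairing
assignment = dict(zip(participants, voices))    # between subjects: one voice per participant

for participant, voice in assignment.items():
    print(f"{participant} hears all {len(clips)} clips in {voice}")
```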

Moderation

We asked participants to listen to each audio file and raise their hand every time they heard a voice change. Each audio file varied in the number of voice changes; most had around 4-5, but some clips had up to 13.

Example of how audio clip voice changes would occur


Piloting the study plan

We tested this method (of raising their hand) with our first participant; however, we got immediate feedback that it imposed a real cognitive load. We observed the participant raising their hand, quickly putting it down, trying again, and then asking to redo the task.

The participant later stated that listening for changes while raising their hand was a bit confusing at first. We assumed this could be the case with the rest of our participants as well, so we discussed how to remedy this potential issue.


Iterating on the plan

We took this feedback and moved to a simpler method: tally marks on a piece of paper. Same idea, except instead of raising their hands, participants would simply strike a tally mark every time they heard a voice change.

On the note-taking side, we also kept track of whether participants asked to listen to a clip a second time.


Impact & Findings

Analyzing Data

The process was pretty straightforward (thankfully), since the data was documented as a binary success rate (correct or incorrect).

Essentially, the main insights matched our hypothesis that the female voices would be more discernible than the male voices. However, we found that not every female voice performed better than every male voice.

With that being said, the goal of the research was to confidently offer up one voice for each gender to appease JL's PR team.

Keeping this goal in mind, I analyzed the success rate for each voice and extracted the top-performing voice variant for each gender. Afterwards, I met with the embedded UX researcher on the project to go over the initial findings. In an effort not to bias the results, he had not told me at the start of the study which voice variants he thought would perform best; in the end, his assumption for the male voice matched up with the findings.
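For the curious, the per-voice analysis boils down to something like the sketch below, assuming a trial counts as correct when the participant's tally exactly matches the clip's actual number of voice changes. All trial rows here are invented for illustration:

```python
# Minimal sketch of the per-voice success-rate analysis.
# All trial rows are hypothetical; they are not the study's data.
from collections import defaultdict

# (voice, clip, participant_tally, actual_changes) per trial
trials = [
    ("female_1", "telling_time", 4, 4),
    ("female_1", "directions", 12, 13),
    ("female_2", "telling_time", 4, 4),
    ("male_1", "directions", 13, 13),
    # ... one row per clip heard by that voice's participant
]

correct = defaultdict(int)
total = defaultdict(int)
for voice, clip, tally, actual in trials:
    total[voice] += 1
    # Binary scoring: a trial is correct only if the tally exactly
    # matches the actual number of voice changes in the clip.
    correct[voice] += int(tally == actual)

success_rate = {v: correct[v] / total[v] for v in total}
for voice, rate in sorted(success_rate.items(), key=lambda kv: -kv[1]):
    print(f"{voice}: {rate:.0%}")
```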

Recommendations

Because the methodology was not especially rigorous in terms of the sample size per condition, I made it clear that these results should be treated as a signal warranting more research.

I recommended that the top two performing voices be tested further through a follow-up survey for more validation. The team then took those two voices (1 female, 1 male) and launched a survey testing the comprehension of each voice, following the same general format. The survey results gave the team statistically significant data backing up the notion that the proposed voice variants were easily and accurately discernible from John Legend's voice.
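The write-up doesn't specify which statistical test the team ran; as one plausible approach, a one-sided binomial test could check whether a voice variant's survey success rate beats chance. All numbers below are invented for illustration:

```python
# Hypothetical significance check for one voice variant's survey results.
# Neither the actual test nor the counts are specified in the write-up.
from scipy.stats import binomtest

n_responses = 200    # made-up number of survey respondents
n_correct = 178      # made-up count who identified every voice change
chance_rate = 0.5    # null hypothesis: respondents are just guessing

result = binomtest(n_correct, n_responses, p=chance_rate, alternative="greater")
print(f"success rate: {n_correct / n_responses:.0%}")
print(f"one-sided p-value: {result.pvalue:.2e}")
```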

With a usability study and a quantitative survey backing up our voice variants, John Legend's PR team was satisfied, and the feature launched with great success for all parties involved!


Quick Summary

[Image: quick summary]

 

 

Thanks for reading!

Feel free to check out some of my other projects, and if you have any questions about the research process for this study, contact me and I would be happy to go over it in more detail!

