
Emoji Category Icon Labeling

A survey study with the Expressions team

 


Background

Previous research suggests that users rely on the emoji keyboard's categories to find their desired emoji.


Therefore, it is important that the icons used to label these categories are intuitive and help users find the emojis they are searching for.

 

cropped2.png

Research Goals

  • To better understand and inform the design of the new category label icons for emoji.

  • To evaluate the current set of icon labels, new design variation sets, and competitors' keyboards.

Designs Evaluated

Designs_Evaluated.png
 

Research Questions

Which set.png
 

To answer these questions, the Expressions team conducted a survey

Overview.png

My Role

My Role.png

 

Process & Challenges


I drafted 9 surveys through Qualtrics

  1. Basic survey structure

    • Present stimuli (emoji) to participants

    • Ask which category they would expect to find it in

    • Ask how confident they were with their answer

    • Randomized question order to eliminate order-effect bias

  2. Piloted drafts 

    • Small samples

    • Whitelisting participants

  3. Launched final surveys

Example test items in Qualtrics


I exported the raw data from Qualtrics into Google Sheets to begin cleaning the data and diving into the analysis


Qualtrics to sheets.png

Filtering the Noise

I flagged anomalies in the data that would otherwise have skewed the results

As I took a closer look at the data, I noticed a couple of areas of concern…

 

Recruiting from MTurk came with an issue of bots and noisy, irrelevant data (see the flag-formula sketch after this list):

  • Extremely short duration for completion

  • Same answer for every question 

    • Ex. Some participants would select ‘A’ for every survey question

  • Participant drop-off (stopping halfway through)
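
A minimal flag-formula sketch in Sheets can catch all three patterns at once. Here I assume completion time in seconds is in column B and the answer columns run C through R; the column letters and the 60-second cutoff are illustrative rather than the exact thresholds used:

  =IF(OR(B2 < 60, COUNTUNIQUE(C2:R2) = 1, COUNTBLANK(C2:R2) > 0), "flag", "keep")

Dragged down the sheet, this flags any row that was completed suspiciously fast, straight-lined the same answer on every question, or dropped off partway through.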

Example of a bot completing the survey extremely quickly, creating irrelevant data.

Since the N was relatively small (around 60 per survey), having more than 10 unreliable responses was a real problem. To remedy this, I deleted the irregular data points and relaunched the survey to make up for the lost sample size.


I converted 8,400 responses by hand but knew the process needed to be less time-consuming

TOO MUCH DATA! AHHHHHH!

Before being able to use descriptive statistics, I first needed to:

Convert responses to binary

  • Whether the answer was correct (+1) or incorrect (-1)

Rescale the binary according to confidence

  • How confident they were about their answer 

    • Ex. An incorrect answer given with high confidence would be scored (-4), while a correct answer given with slight confidence would be scored (+2); see the sketch after this list
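
In Sheets terms, this scoring collapses into one small formula per response. A sketch, assuming the participant's answer is in B2, the answer key for that question in C2, and confidence coded 1–4 (Slightly Confident = 2, Highly Confident = 4) in D2; the cell references and the 1–4 coding are assumptions for illustration:

  =IF(B2 = C2, 1, -1) * D2

An incorrect, highly confident answer then works out to -1 × 4 = -4, and a correct, slightly confident answer to +1 × 2 = +2, matching the example above.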

The issue was that this part of the analysis was not easily scalable and quite painful :(


 

Improving process for future launches

I revisited the problem and wrote a formula in Sheets to do this tedious conversion

NewCode2.png
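
A sketch of the kind of formula that does this (cell references are illustrative, following the layout above) uses an ARRAYFORMULA so the scoring runs over the whole response column from a single cell instead of being dragged down row by row:

  =ARRAYFORMULA(IF(B2:B = "", "", IF(B2:B = C2:C, 1, -1) * D2:D))

Because one cell converts every row, newly collected responses are scored automatically, which is what makes the method reusable for later launches.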

After I had already presented the findings, I wanted to make sure no one else would have to do those conversions by hand again, and that future iterations of this survey design would go more smoothly.

Second iteration.png

Doing this made the method more scalable for later use by the Expressions team.

The next researcher on this project was able to replicate this survey structure and use some of the analysis techniques to conduct their survey more efficiently. It was amazing to see my efforts help a fellow researcher over a year later!


 

Descriptive Statistics

After re-scaling all responses, I needed to calculate the following (each reduces to a short Sheets formula, sketched after this list):

  • The average score (the mean)

  • The standard deviation (to understand how much the scores varied around the mean)

  • The 95% Confidence Interval (to show the range in which the true population mean likely falls)

  • The Standard Error of Measurement (to determine how precise the measurement is; the smaller the SEM, the more precise the instrument)
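
A sketch of those formulas, assuming the roughly 60 scaled scores for one design set × emoji combination sit in E2:E61 (the range is illustrative, SEM is computed as the standard error of the mean, and the confidence interval uses the normal approximation):

  Mean:               =AVERAGE(E2:E61)
  Standard deviation: =STDEV(E2:E61)
  SEM:                =STDEV(E2:E61) / SQRT(COUNT(E2:E61))
  95% CI half-width:  =CONFIDENCE(0.05, STDEV(E2:E61), COUNT(E2:E61))

The 95% CI half-width is what later becomes the error bars on the graphs.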

This was done for:

  • 8 different emoji categories (e.g., people, activities, food)

  • with 2 variants for each category (each variant being an emoji from that category)

  • 9 different sets

    • Current set

    • 4 new design sets from Google

    • 4 competitor designs

Impact and Results

I used descriptive statistics to quantify users' levels of understanding of emoji categorization for each design set

I calculated the average scaled confidence score, with error bars, for each design set in relation to each emoji stimulus.

The intent was to uncover:

  • Which design sets are performing better FOR specific emojis.

  • Which emojis are hard for users to categorize REGARDLESS of design set.

Essentially, emojis with a low average confidence score were not entirely clear to users.

 

Confusion.png
 

Displaying Results

With my audience in mind, I pivoted to graphs as a way to better communicate the findings

My rationale was that the main stakeholders were designers. The raw data is useful, but it takes real effort to sift through and could lead to incorrect interpretations.

Left: initial raw data after calculation. Right: graph with average confidence scores and 95% confidence intervals plotted.

 

Stakeholder Feedback

Stakeholder feedback.png

Quick Summary

Summary.png

Thanks for reading!

Feel free to check out some of my other projects, and if you have any questions about the research process for this study, contact me and I would be happy to go over it in more detail!

 
