Emoji Category Icon Labeling

A survey study with the Expressions team

Background

Previous research suggests that users tend to rely on the emoji keyboard's categories to find the emoji they want.

Therefore, it is important that the icons used to label these categories are intuitive and help users find the emoji they are searching for.

Research Goals

To better understand and inform the design of the new emoji category label icons.

To evaluate the current set of icon labels, the new design sets, and competitors’ keyboards.

Designs Evaluated

Research Questions

To find the answers to these questions, the Expressions team conducted a survey

My role

Process & Challenges

I drafted 9 surveys through Qualtrics

Basic survey structure
- Present a stimulus (an emoji) to participants
- Ask which category they would expect to find it in
- Ask how confident they were in their answer
- Randomize the question order to reduce order-effect bias
Piloted drafts
- Small samples
- Whitelisting participants
Launched final surveys

I exported the raw data from Qualtrics into Google Sheets to begin cleaning up the data and diving into the analysis

Filtering the Noise

I flagged anomalies in the data that would otherwise have skewed the results

As I took a closer look at the data, I noticed a few areas of concern:

Recruiting through MTurk (Amazon Mechanical Turk) brought in bots and noisy, irrelevant data

Extremely short duration for completion
Same answer for every question
- Ex. Some participants would select ‘A’ for every survey question
Participant drop-off (stopping halfway through)
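
The three warning signs above can be sketched as simple checks. This is a minimal illustration, not the spreadsheet workflow used in the study: the `Response` fields, the reason labels, and the 60-second cutoff are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Response:
    """One participant's survey submission (hypothetical field names)."""
    participant_id: str
    duration_seconds: float
    answers: list  # one entry per question; None means unanswered


# Hypothetical cutoff; a real threshold would come from pilot timings.
MIN_DURATION_SECONDS = 60


def flag_reasons(response, min_duration=MIN_DURATION_SECONDS):
    """Return the list of reasons a response looks unreliable."""
    reasons = []
    # Extremely short completion time
    if response.duration_seconds < min_duration:
        reasons.append("too_fast")
    answered = [a for a in response.answers if a is not None]
    # Same answer for every question that was answered
    if len(answered) > 1 and len(set(answered)) == 1:
        reasons.append("straight_lined")
    # Stopped before answering every question
    if len(answered) < len(response.answers):
        reasons.append("dropped_off")
    return reasons
```

A response with an empty list of reasons would be kept; anything else would be reviewed before deciding whether to delete it.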

Example of a bot completing the survey extremely quickly, thus creating irrelevant data.

Since the N was relatively small (around 60 per survey), having over 10 unreliable responses was a real concern. To remedy this, I deleted the irregular data points and relaunched the survey to make up for the sample size I had lost.

I converted 8,400 responses by hand, but knew the process needed to be less time-consuming

Before I could use descriptive statistics, I first needed to:

Convert responses to binary

Whether the answer was correct (+1) or incorrect (-1)

Rescale the binary according to confidence

How confident they were about their answer
- Ex. An incorrect answer given with high confidence would be scored (-4), while a correct answer given with slight confidence would be scored (+2)
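
The two-step conversion can be sketched as a single function. Only two weights are stated in this write-up (slightly confident = 2, highly confident = 4); the other two labels and weights below are assumptions added to complete a four-point scale, and the function name is hypothetical.

```python
# Confidence label -> weight. Entries marked "assumed" do not appear in the study write-up.
CONFIDENCE_WEIGHTS = {
    "not confident": 1,         # assumed
    "slightly confident": 2,    # from the example above
    "moderately confident": 3,  # assumed
    "highly confident": 4,      # from the example above
}


def scaled_score(is_correct, confidence):
    """Convert one response into a signed, confidence-weighted score."""
    # Step 1: binary direction, +1 for correct and -1 for incorrect
    direction = 1 if is_correct else -1
    # Step 2: rescale by how confident the participant said they were
    weight = CONFIDENCE_WEIGHTS[confidence.strip().lower()]
    return direction * weight
```

With this mapping, `scaled_score(False, "Highly Confident")` gives -4 and `scaled_score(True, "Slightly Confident")` gives 2, matching the example.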

The issue was that this part of the analysis was not easily scalable and quite painful :(

Improving process for future launches

I revisited the problem and wrote a formula in Google Sheets to do this tedious conversion

After I had already presented the findings, I wanted to make sure no one else would have to do those conversions by hand again, and that future iterations of this survey design would go more smoothly.

Doing this made the method more scalable for later use by the Expressions team.

The next researcher on this project was able to replicate this survey structure and use some of the analysis techniques to conduct their survey more efficiently! It was amazing to see that my efforts had helped a fellow researcher over a year later!

Descriptive Statistics

After rescaling all responses, I needed to calculate:

The average score (the mean)
The standard deviation (to understand how spread out the scores were around the mean)
The 95% Confidence Interval (to show the range within which the true population mean is likely to fall)
The Standard Error of the Mean (to determine how precisely the sample mean estimates the population mean; the smaller the SEM, the more precise the estimate)
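
A minimal sketch of these four calculations for one list of scaled scores. It makes assumptions that are not confirmed by the study: SEM is treated as the standard error of the mean (sample standard deviation divided by the square root of n), and the 95% interval uses the normal approximation (mean ± 1.96 × SEM). The `describe` function is a hypothetical helper, not the spreadsheet formula used here.

```python
import math
import statistics


def describe(scores, z=1.96):
    """Mean, sample SD, SEM, and an approximate 95% CI for a list of scores."""
    n = len(scores)
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample standard deviation (n - 1 denominator)
    sem = sd / math.sqrt(n)        # assumed: standard error of the mean
    margin = z * sem               # half-width of the interval
    return {
        "n": n,
        "mean": mean,
        "sd": sd,
        "sem": sem,
        "ci_low": mean - margin,
        "ci_high": mean + margin,
    }
```

The `ci_low` and `ci_high` values are what would be drawn as error bars around each mean.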

This was done for:

8 different emoji categories (e.g. people, activities, food)
with 2 variants for each category (each variant being an emoji within the category)
9 different sets
- Current set
- 4 new design sets from Google
- 4 competitor designs

Impact and Results

I used descriptive statistics as a way to quantify users’ levels of understanding around emoji categorization for each design set

I calculated the average scaled confidence score, with error bars, for each design set and each emoji stimulus.

The intent was to uncover:

Which design sets are performing better FOR specific emojis.
Which emojis are hard for users to categorize REGARDLESS of design set.
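
Both questions can be read off a table of mean scores per design set and emoji. The sketch below is illustrative only: the row layout `(design_set, emoji, scaled_score)`, the function names, and the threshold of 0 for "hard to categorize" are assumptions, not details from the study.

```python
from collections import defaultdict
from statistics import mean


def mean_scores(rows):
    """Average scaled score for each (design_set, emoji) pair."""
    grouped = defaultdict(list)
    for design_set, emoji, score in rows:
        grouped[(design_set, emoji)].append(score)
    return {key: mean(values) for key, values in grouped.items()}


def best_set_per_emoji(cell_means):
    """For each emoji, the design set with the highest mean score."""
    best = {}
    for (design_set, emoji), avg in cell_means.items():
        if emoji not in best or avg > best[emoji][1]:
            best[emoji] = (design_set, avg)
    return best


def hard_emojis(cell_means, threshold=0.0):
    """Emojis whose mean score is below the threshold in every design set."""
    by_emoji = defaultdict(list)
    for (_, emoji), avg in cell_means.items():
        by_emoji[emoji].append(avg)
    # If even the best-performing set is below the threshold, the emoji is hard regardless of set
    return sorted(e for e, avgs in by_emoji.items() if max(avgs) < threshold)
```

`best_set_per_emoji` addresses the first question and `hard_emojis` the second; the threshold would need to be chosen with the team rather than taken from this example.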

Essentially, a low average confidence score for an emoji meant that its category was not entirely clear to users.

Displaying Results

With my audience in mind, I pivoted to creating graphs as a way to better communicate the findings

My rationale was that the main stakeholders were designers. The raw data is useful, but it might take too much effort to sift through or could lead to an incorrect interpretation.

*Initial raw data after calculation*

*Graph with average confidence score and 95% confidence intervals plotted*

Stakeholder Feedback

Quick Summary

Thanks for reading!

Feel free to check out some of my other projects. If you have any questions about the research process for this study, contact me and I would be happy to go over it in more detail!

Contact Me
