Background
Previous research suggests that users tend to rely on these categories to find their desired emoji.
Therefore it is important that the icons used to label these categories are intuitive and help users find the emojis they are searching for.
Research Goals
To better understand and inform the design for the new category label icons for emoji.
To evaluate the current set of icon labels, the new design variations, and competitors’ keyboards.
Designs Evaluated
Research Questions
To find the answers to these questions, the Expressions team conducted a survey.
My role
Process & Challenges
I drafted 9 surveys through Qualtrics
Basic survey structure
Present stimuli (emoji) to participants
Ask where they would expect to find it
Ask how confident they were with their answer
Randomized the question order to reduce order effect bias
Piloted drafts
Small samples
Whitelisting participants
Launched final surveys
I exported the raw data from Qualtrics into Google Sheets to begin cleaning and analyzing the data
Filtering the Noise
I flagged anomalies in the data that would otherwise have skewed the results
As I took a closer look at the data, I noticed a couple of areas of concern:
Recruiting from MTurk brought in bots and noisy, irrelevant data
Extremely short duration for completion
Same answer for every question
Ex. Some participants would select ‘A’ for every survey question
Participant drop-off (stopping halfway through)
Since the N was relatively small (around 60 per survey), having more than 10 unreliable responses was a real risk to the results. To remedy this, I deleted the irregular data points and then relaunched the survey to make up for the sample size I had lost.
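The three warning signs above (very short completion time, the same answer for every question, and drop-off) can be checked programmatically. The following is a minimal sketch rather than the process actually used in this study: the Response fields, the 60-second cutoff, and the flag names are all hypothetical and would need to match the real Qualtrics export.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical cutoff: anything faster than this is treated as suspiciously quick.
MIN_DURATION_SECONDS = 60


@dataclass
class Response:
    participant_id: str
    duration_seconds: float
    answers: List[Optional[str]]  # None marks a question the participant never answered


def flag_reasons(response: Response) -> List[str]:
    """Return the reasons a response looks unreliable (an empty list means keep it)."""
    reasons = []
    if response.duration_seconds < MIN_DURATION_SECONDS:
        reasons.append("too_fast")
    given = [a for a in response.answers if a is not None]
    if len(given) < len(response.answers):
        reasons.append("dropped_off")
    # Straight-lining: the same option picked for every answered question.
    if len(given) > 1 and len(set(given)) == 1:
        reasons.append("same_answer")
    return reasons


# Made-up example rows, not data from the study.
responses = [
    Response("p1", 240, ["A", "C", "B", "D"]),
    Response("p2", 25, ["A", "A", "A", "A"]),
    Response("p3", 180, ["B", "D", None, None]),
]
flagged = {r.participant_id: flag_reasons(r) for r in responses}
kept = [r.participant_id for r in responses if not flagged[r.participant_id]]
```

Returning the reasons instead of a simple yes/no keeps the decision reviewable: a flagged row can still be inspected by hand before it is deleted.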
I converted 8,400 responses by hand but knew the process needed to be less time-consuming
Before I could use descriptive statistics, I first needed to:
Convert responses to binary
Whether the answer was correct (+1) or incorrect (-1)
Rescale the binary according to confidence
How confident they were about their answer
Ex. If a user was incorrect and Highly Confident, they would be scored a (-4), while a correct answer that was Slightly Confident would be scored a (+2)
The issue was that this part of the analysis was not easily scalable and quite painful :(
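The two-step conversion described above (correctness as +1 or -1, then rescaled by confidence) can be sketched in a few lines. This is an illustration under stated assumptions, not the formula used in the study: only the weights for "slightly confident" (2) and "highly confident" (4) come from the examples above, while the other two labels and the function name are assumed in order to complete a 1 to 4 scale.

```python
# Only "slightly confident" -> 2 and "highly confident" -> 4 come from the write-up;
# the other two labels and weights are assumptions that complete a 1-4 scale.
CONFIDENCE_WEIGHTS = {
    "not confident": 1,
    "slightly confident": 2,
    "moderately confident": 3,
    "highly confident": 4,
}


def scaled_score(chosen: str, expected: str, confidence: str) -> int:
    """Convert one response to +1/-1 for correctness, then scale it by confidence."""
    correctness = 1 if chosen == expected else -1
    return correctness * CONFIDENCE_WEIGHTS[confidence.strip().lower()]


# An incorrect but highly confident answer, and a correct but hesitant one.
wrong_but_sure = scaled_score("Food", "Activities", "Highly Confident")
right_but_unsure = scaled_score("Food", "Food", "Slightly Confident")
```

With the weights in a single lookup table, changing the confidence scale means editing one place rather than re-scoring every response.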
Improving process for future launches
I revisited the problem and wrote a formula in Google Sheets to do this tedious conversion
After I had already presented the findings, I wanted to make sure no one else would have to do those conversions by hand again. I wanted to ensure that future iterations of this survey design would go more smoothly.
Doing this helped make this particular method more scalable for later use by the Expressions team.
The next researcher on this project was able to replicate this survey structure and use some of the analysis techniques to conduct their survey more efficiently! It was amazing to see that my efforts helped a fellow researcher over a year later!
Descriptive Statistics
After rescaling all responses, I needed to calculate:
The average score (the mean)
The standard deviation (to understand how widely individual scores varied around the mean)
The 95% Confidence Interval (to show the range within which the true population mean is likely to fall)
The Standard Error of Measurement (to determine how precise the measurement is; the smaller the SEM, the more precise the measurement)
This was done for:
8 different emoji categories (e.g. people, activities, food)
with 2 variants for each category (an emoji within the category)
9 different sets
Current set
4 new design sets from Google
4 competitor designs
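The statistics listed above can be computed for one group of rescaled scores at a time, for example all responses to a single emoji within a single design set. The sketch below makes several assumptions that the write-up does not confirm: the SEM is computed as the standard error of the mean (standard deviation divided by the square root of N), the 95% interval uses the normal approximation (1.96), and the function name and sample scores are made up for illustration.

```python
import math
import statistics
from typing import Dict, List

Z_95 = 1.96  # normal-approximation critical value for a 95% interval (assumption)


def describe(scores: List[float]) -> Dict[str, float]:
    """Summarize one group of rescaled scores (needs at least two values)."""
    n = len(scores)
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample standard deviation (divides by n - 1)
    sem = sd / math.sqrt(n)  # assumed here to mean the standard error of the mean
    margin = Z_95 * sem
    return {
        "n": n,
        "mean": mean,
        "sd": sd,
        "sem": sem,
        "ci_low": mean - margin,
        "ci_high": mean + margin,
    }


# Made-up rescaled scores for one emoji in one design set, not data from the study.
scores = [4, 2, -4, 3, 4, -1, 2, 4]
summary = describe(scores)
```

With roughly 60 participants per survey the normal approximation is usually close; a t-distribution critical value would give a slightly wider interval.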
Impact and Results
I used descriptive statistics as a way to quantify users’ levels of understanding around emoji categorization for each design set
I calculated the average scaled confidence score, with error bars, for each design set in relation to each emoji stimulus.
The intent was to uncover:
Which design sets are performing better FOR specific emojis.
Which emojis are hard for users to categorize REGARDLESS of design set.
Essentially, a low average confidence score for an emoji meant that its category was not entirely clear to users.
Displaying Results
With my audience in mind, I pivoted to creating graphs as a way to better communicate the findings
My rationale behind this was that the main stakeholders were designers. The raw data is useful, but it might take too much effort to sift through or could lead to an incorrect interpretation.
Stakeholder Feedback
Quick Summary
Thanks for reading!
Feel free to check out some of my other projects. If you have any questions about the research process for this study, feel free to contact me and I would be happy to go over it in more detail!