Race and ethnicity: Collecting demographic data in a survey

There are multiple reasons why you might want to collect race/ethnic data on a survey, and each of those reasons has different implications for what you ask and how. In this post, I will clarify the different purposes that might be served by asking about race/ethnicity and provide some suggestions for creating a question that gathers usable data. This post focuses on American race/ethnic categories; in other parts of the world there are even more options and considerations.


First, what is the difference between race and ethnicity, and why do I keep writing “race/ethnic”? Here’s a great overview of the topic from Stanford’s Gendered Innovations. Basically, race is a social construct based on oppression and colonization; ethnicity is a shared language or culture.

Even though race isn’t real, there are some good reasons to track it. One is that you may want to know whether there are differences in outcomes for members of different population groups, or whether participants in your program from different race/ethnic groups have different needs. 

You might be concerned that some groups have more unmet need than others, or that your program needs to culturally and linguistically reflect its participants. Or you may want to be able to articulate a disparity to funders and allies in order to support a program that addresses those disparities. For example, there are documented disparities in health care and education. If that’s your aim, then you may want to conduct some research in advance to identify groups that you’re particularly concerned about and make sure that your data collection strategy is aligned to that goal. I’ve written on presenting data by race here and Urban Institute presents some best practices here.

Another reason you might be collecting race/ethnic data is that you want to ensure that your program is serving a population that matches your community, or that nobody is being systematically excluded from participating in your program or services. If that’s your purpose, you’ll be comparing your sample to data from the US Census, and you may want to use their race/ethnic categories — which were updated for this year’s census.
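To make that comparison concrete, here is a minimal sketch of lining up your sample against a benchmark. Everything in it is illustrative: the function name `share_gaps`, the placeholder groups ("Group A" and so on), and all of the numbers are made up, and in practice the benchmark percentages would come from census data for your own community.

```python
from collections import Counter


def share_gaps(sample_responses, benchmark_shares):
    """Compare each group's share of the sample to a benchmark share.

    Returns {category: (sample_percent, benchmark_percent, gap_in_points)}.
    A negative gap means the group is under-represented in the sample.
    """
    counts = Counter(sample_responses)
    n = len(sample_responses)
    gaps = {}
    for category, benchmark in benchmark_shares.items():
        sample_share = 100 * counts.get(category, 0) / n
        gaps[category] = (
            round(sample_share, 1),
            benchmark,
            round(sample_share - benchmark, 1),
        )
    return gaps


# Hypothetical benchmark percentages (stand-ins for real census figures).
benchmark = {"Group A": 50.0, "Group B": 30.0, "Group C": 20.0}

# Hypothetical program participants, one answer per person.
participants = ["Group A"] * 14 + ["Group B"] * 5 + ["Group C"] * 1

gaps = share_gaps(participants, benchmark)
for category, (sample, bench, gap) in gaps.items():
    print(f"{category}: sample {sample}%, benchmark {bench}%, gap {gap:+} points")
```

A large negative gap is only a prompt to look more closely; with a small sample, a few respondents can move these percentages a lot.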

Here is how the US Census asks about race/ethnicity in the 2020 Census.

Census 2020 ethnicity question. Question text reads: Note: please answer both Question 6 about Hispanic origin and Question 7 about race. For this census, Hispanic origins are not races. 6. Is this person of Hispanic, Latino, or Spanish origin? Answer choices: No, not of Hispanic, Latino, or Spanish origin; Yes, Mexican, Mexican Am., Chicano; Yes, Puerto Rican; Yes, Cuban; Yes, another Hispanic, Latino, or Spanish origin.

Census 2020 race question. Question text reads: What is this person's race? Mark one or more boxes and print origins. Answer choices: White; Black or African Am.; American Indian or Alaska Native; Chinese; Filipino; Asian Indian; Other Asian; Vietnamese; Korean; Japanese; Native Hawaiian; Samoan; Chamorro; Other Pacific Islander; Some other race -- print race or origin. Under the first 3 options, there are text boxes to write in national origin or affiliation.

But even the Census Bureau researchers have agreed that asking about race and ethnicity separately is a little confusing and doesn’t really reflect how some populations think about their racial backgrounds. When they get to the question that asks for their race, some people who identify as Hispanic/Latino will select “other”.

Most Americans don’t think about race and ethnicity separately, so you might want to ask about race and ethnicity in a single question like this:

Question text: Which of the following best describes you. Answer choices: Asian or Pacific Islander, Black or African American, Hispanic or Latino, Native American or Alaskan Native, White or Caucasian, a race/ethnicity not listed here

In this version, respondents can check multiple boxes, so your percentages will add up to more than 100%; just add a footnote to your charts. You will be able to make comparisons with this data. For example, do individuals who identify with one group have higher graduation rates than people who identify with a different group? With a little coding and cleanup, you can transform the data to match the census's categories if you need to compare to census data.
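Here is a small sketch of why check-all-that-apply percentages exceed 100%. It assumes each respondent's answer is stored as a list of the boxes they checked; the function name `tally_multiselect` and the four sample responses are made up for illustration.

```python
from collections import Counter


def tally_multiselect(responses):
    """Percent of respondents who checked each box.

    Each respondent is counted once per box they checked, so the
    percentages can sum to more than 100.
    """
    n = len(responses)
    # set() guards against the same box being recorded twice for one person.
    counts = Counter(choice for checked in responses for choice in set(checked))
    return {choice: round(100 * count / n, 1) for choice, count in counts.items()}


# Hypothetical responses: one list of checked boxes per respondent.
responses = [
    ["Black or African American"],
    ["Hispanic or Latino", "White or Caucasian"],
    ["Asian or Pacific Islander"],
    ["Black or African American", "Hispanic or Latino"],
]

percentages = tally_multiselect(responses)
total = sum(percentages.values())

for choice, pct in percentages.items():
    print(f"{choice}: {pct}% of respondents")
print(f"Sum of percentages: {total}% (respondents could check more than one box)")
```

The denominator is the number of respondents, not the number of boxes checked, which is what keeps each individual percentage interpretable as "share of people who identify this way."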

If you want to have respondents check only one box, you’ll need to add a multi-racial or bi-racial answer choice like this, which comes from Versta Research:

 Question text: Which of the following best describes you. Answer choices: Asian or Pacific Islander, Black or African American, Hispanic or Latino, Native American or Alaskan Native, White or Caucasian, Multiracial or biracial, a race/ethnicity not listed here

This option is nice and tidy and can easily be recoded to match the census categories. But sometimes, these answer choices don’t capture the full ethnic diversity of our neighborhoods.
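If you collected the check-all-that-apply version of the question but need one category per respondent for reporting, one possible approach is to collapse anyone who checked more than one box into the multiracial category. The sketch below assumes that rule, which is a judgment call rather than a standard; the function name and the "No answer" label are hypothetical.

```python
MULTIRACIAL = "Multiracial or biracial"
NOT_ANSWERED = "No answer"  # hypothetical label for skipped questions


def collapse_to_single(checked):
    """Turn one respondent's list of checked boxes into a single category.

    Assumed rule: no boxes -> "No answer"; one box -> that box;
    two or more different boxes -> "Multiracial or biracial".
    """
    unique = set(checked)
    if not unique:
        return NOT_ANSWERED
    if len(unique) == 1:
        return next(iter(unique))
    return MULTIRACIAL


# Hypothetical responses: one list of checked boxes per respondent.
responses = [
    ["White or Caucasian"],
    ["Black or African American", "Hispanic or Latino"],
    [],
]

single_categories = [collapse_to_single(r) for r in responses]
print(single_categories)
```

Note that this rule treats someone who checked "Hispanic or Latino" and one other box as multiracial, which may not match how they would describe themselves or how the census would classify them. Keep the original multi-select answers so you can recode differently later.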

I once presented the demographic data from a neighborhood survey that captured race/ethnicity using only the census categories. When I presented the data, some audience members pointed out that by including populations such as Cape Verdean and Haitian in the Black/African-American category, my data didn’t fully represent the cultural and linguistic diversity of the neighborhood. If you want to use your survey to make sure that your programming truly reflects your community’s cultural diversity, you might want to include national origin. The census does this with the write-in text boxes below some of the answer choices, as in the example above. Alternatively, you could add a separate national origin question to your survey. To speed up the coding, you might write this as a multiple-choice question where the neighborhood’s major national origins are given as answer choices.

Question text: what is your ancestry or national origin. Answer choices: Cambodian, Cape Verdean, Dominican, Haitian, Jamaican, Korean, Other origin not listed here

Ultimately, how you collect this data should reflect what you’re going to do with it. There’s no need to create a long question with 20 answer choices if you don’t really have a plan to analyze or use that data. And you don’t want to create a question that alienates or offends respondents. While your funders and stakeholders might dictate how you report the data, they usually don’t dictate how you ask it. It is OK to recode data later if you are reporting to an agency that uses categories that don’t meet all of your needs. 







Pieta Blakely

About Pieta Blakely

I help mission-based organizations measure their impact so that they can do what they do well. I started my nonprofit career as a teacher in workforce development and adult basic education. It was important work and I was worried that we didn’t really know if we were doing it well. In the process of trying to answer that question, I got a Masters in Education and a PhD in Social Policy, and became an evaluator.
