Methods

Initial article collection

To gather the initial list of trans-related news articles, the Trans News Initiative used Media Cloud, an open-source media research project by the Media Ecosystems Analysis Group, the Initiative for Digital Public Infrastructure at University of Massachusetts Amherst, and the School of Journalism at Northeastern University.

Articles were retrieved from Media Cloud’s “United States - National” collection, which includes over 240 sources, using a set of identity-related search terms. A team of journalists, researchers, and subject matter experts manually selected the initial eight terms listed below, focusing specifically on trans communities rather than broader LGBTQIA+ communities. Each term is listed next to its exact query, which accounts for different stylizations, spellings, and descriptors. An article was considered a match if the query appeared in its headline or, when available, its body text. Because of legal and technical constraints, this project currently prioritizes breadth over depth and surfaces only article headlines, not full body text.

term | query
transgender | transgender
trans | "trans man" OR "trans men" OR "trans woman" OR "trans women" OR "trans person" OR "trans people" OR "trans individual" OR "trans community"
genderfluid | "gender-fluid" OR genderfluid OR "gender fluid"
gender non conforming | "gender non-conforming" OR "gender non conforming" OR "gender nonconforming"
gender expansive | "gender-expansive" OR genderexpansive OR "gender expansive"
nonbinary | "non-binary" OR nonbinary OR "non binary"
gender identity | "gender identity"
gender ideology | "gender ideology"
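
To make the matching rule concrete, the sketch below checks a piece of text against the eight queries from the table above. It is only an illustration: the names `QUERIES`, `query_phrases`, and `matching_terms` are hypothetical, and the whole-word, case-insensitive matching is an assumption rather than a description of how Media Cloud evaluates queries.

```python
import re

# Term -> query pairs copied from the table above.
QUERIES = {
    "transgender": 'transgender',
    "trans": '"trans man" OR "trans men" OR "trans woman" OR "trans women" OR '
             '"trans person" OR "trans people" OR "trans individual" OR "trans community"',
    "genderfluid": '"gender-fluid" OR genderfluid OR "gender fluid"',
    "gender non conforming": '"gender non-conforming" OR "gender non conforming" OR "gender nonconforming"',
    "gender expansive": '"gender-expansive" OR genderexpansive OR "gender expansive"',
    "nonbinary": '"non-binary" OR nonbinary OR "non binary"',
    "gender identity": '"gender identity"',
    "gender ideology": '"gender ideology"',
}


def query_phrases(query):
    """Split an OR-joined query into lowercase phrases with the quotes removed."""
    return [part.strip().strip('"').lower() for part in query.split(" OR ")]


def matching_terms(text):
    """Return every term whose query has at least one phrase present in the text.

    Assumption: phrases match as whole words, ignoring case.
    """
    lowered = text.lower()
    hits = []
    for term, query in QUERIES.items():
        for phrase in query_phrases(query):
            pattern = r"(?<!\w)" + re.escape(phrase) + r"(?!\w)"
            if re.search(pattern, lowered):
                hits.append(term)
                break  # one matching phrase is enough for this term
    return hits
```

For instance, `matching_terms("Court rules on trans women in sports")` returns only the `trans` term, while a headline about transit funding returns nothing, because the bare word "trans" is never queried on its own.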

The team used Python scripts to collect and process this data each week. Weekly data is available from June 2025 onward. Historical data is available in three-month chunks from January 2020 to June 2025. The site itself is updated monthly. Each article is included as a row in the data with the following properties:

variable | description
id | A unique alphanumeric identifier for the article
indexed_date | The date the article was accessed via Media Cloud
language | All articles are tagged as “en” for English
media_name | The publication that published the article
media_url | The publication’s URL
publish_date | The original publication date of the article
title | The article’s headline
url | The article’s URL
query | The original query term from the list above
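
For readers working with the data in Python, one way to represent a row is sketched below. The class name `ArticleRow` and the helper `row_from_dict` are hypothetical. The field names come from the table above, and every value is kept as text because the source does not state the date format.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ArticleRow:
    """One article, with the nine properties listed in the table above."""
    id: str
    indexed_date: str   # kept as text; the date format is not documented here
    language: str       # "en" for every article, per the table
    media_name: str
    media_url: str
    publish_date: str   # kept as text; the date format is not documented here
    title: str
    url: str
    query: str


def row_from_dict(raw):
    """Build an ArticleRow from a dict, raising KeyError if a column is missing."""
    return ArticleRow(**{f.name: raw[f.name] for f in fields(ArticleRow)})
```

Extra keys in the input dict are ignored, so the helper keeps working if further columns are added to the data later.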

Article classification

Wire stories published by multiple outlets were treated as individual articles rather than collapsed into one, prioritizing news dissemination and reach over unique reporting. Generic news round-ups and recaps (e.g., “Weekend Report”, “Top Stories”, “News Roundup”) were filtered from the event data. We then used the RoBERTa-base model to compute an embedding for each article headline and clustered the headlines on these embeddings using HDBSCAN. Each cluster was labelled by an LLM tasked with creating an umbrella phrase from the individual article headlines in that cluster.
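
The round-up filter described above could look something like the sketch below. It uses only the three example phrases named in the text. The project's actual pattern list is not given, so `ROUNDUP_PATTERN`, `is_roundup`, and `drop_roundups` are hypothetical names, and the case-insensitive substring matching is an assumption.

```python
import re

# Only the three examples named in the text; the project's full list is not given.
ROUNDUP_PATTERN = re.compile(r"weekend report|top stories|news roundup", re.IGNORECASE)


def is_roundup(headline):
    """Return True if the headline contains one of the generic round-up phrases."""
    return ROUNDUP_PATTERN.search(headline) is not None


def drop_roundups(headlines):
    """Keep only the headlines that do not look like generic round-ups."""
    return [headline for headline in headlines if not is_roundup(headline)]
```

Because wire stories are kept as individual articles, this sketch makes no attempt to deduplicate identical headlines.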

Subsequently, we asked two teams of subject matter experts to manually label a subset of event clusters with the 13 overarching (non-exclusive) themes listed below. This created a sample of roughly 8,000 theme-labelled news articles, which were used to train a classifier that subsequently applied the overarching theme labels to all other news articles. We also created a review system that allowed for manual correction of event labels and themes. Event clusters were included in the data visualization if they had 10 or more related articles.

theme | description
Resilience, resistance, & solutions | Public demonstrations, protests, pride events, and other acts of resilience and resistance
Anti-trans violence & hate | Dangerous and damaging acts directed at the trans community, including but not limited to harassment, hate speech, and fatal violence
Trans youth & parental rights | The debate around parental rights and child autonomy, including conversations about data privacy, K-12 education policy, and conflicts over which parents' rules prevail
Ideology & culture wars | Philosophical, moral, religious, and cultural arguments surrounding transgender communities
U.S. federal measures | Actions, policies, legislation, and rulings taken at the federal level by federal representatives, including rhetoric and threats
Censorship & free speech | Censorship of supportive trans-related speech and use of harmful anti-trans rhetoric under the guise of free speech
Health care & bodily autonomy | Access to health care and the ability to make one's own decisions regarding gender-affirming care, abortions, and other medical care
Trans & nonbinary identity | Both the celebration of identity through protective laws or personal experiences and the suppression of it through measures such as laws restricting legal identification
Transnationality | International events and laws, transnational conversations about gender policy, and policies around trans migrants
Pop culture & creativity | Representation across pop culture, the arts, and other creative fields
Access to public space | The ability to exist in public regardless of how you identify, including the right to nondiscriminatory employment, bathroom access, marriage, voting, entertainment, and more
Trans people in sports | Gender inclusivity and bans in sports spanning youth, school, collegiate, professional, and Olympic levels
U.S. state measures | Actions, policies, legislation, and rulings taken at the state level or by state representatives
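
The inclusion rule for the data visualization (event clusters with 10 or more related articles) can be sketched as a simple count. The names `MIN_ARTICLES` and `visible_clusters` are hypothetical, and the input is assumed to be one cluster identifier per article.

```python
from collections import Counter

MIN_ARTICLES = 10  # threshold stated in the text: 10 or more related articles


def visible_clusters(cluster_ids):
    """Return the set of cluster ids that have at least MIN_ARTICLES articles.

    Assumption: cluster_ids holds one cluster identifier per article.
    """
    counts = Counter(cluster_ids)
    return {cluster_id for cluster_id, n in counts.items() if n >= MIN_ARTICLES}
```

A cluster with exactly 10 articles is kept and one with 9 is dropped, matching the "10 or more" wording.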

Publication political lean

To capture a publication’s political lean, we averaged three ratings: Ground News’ media bias rating, which aggregates leans from Ad Fontes Media, AllSides, and Media Bias/Fact Check; Media Cloud’s collections of publications grouped by the political lean of the voters who tweeted them; and Media Cloud’s collections of publications grouped by the political lean of the politicians whose followers tweeted them. Because the lean is assigned at the publication level, it may not reflect the political lean of individual coverage areas, sections, articles, reporters, or editors.

Each publication was assigned a value for each of the first three columns below. For example, CNN’s media bias rating from Ground News was “Lean left” (-1), CNN was tweeted evenly by Republican/Democratic voters in 2018 (0), and CNN was tweeted somewhat more by followers of liberal politicians in 2019 (-1).

Ground News Lean | Media Cloud Twitter/X Voter Lean, 2018 | Media Cloud Twitter/X Follower Lean, 2019 | value
Left | Tweeted more by Democratic voters | Tweeted more by followers of Liberal politicians | -2
Lean left | Tweeted somewhat more by Democratic voters | Tweeted somewhat more by followers of Liberal politicians | -1
Center | Tweeted evenly by Republican/Democratic voters | Tweeted evenly by followers of Conservative/Liberal politicians | 0
Lean right | Tweeted somewhat more by Republican voters | Tweeted somewhat more by followers of Conservative politicians | 1
Right | Tweeted more by Republican voters | Tweeted more by followers of Conservative politicians | 2

Those three values were then averaged, giving CNN an average value of (-1 + 0 - 1) / 3 ≈ -0.67. Each publication’s average was then mapped to one of the lean ranges below, and the resulting label is the political lean that we use. CNN’s average of -0.67 falls in the “Lean left” range.

lean | value
Left | < -1.5
Lean left | >= -1.5 and < -0.5
Center | >= -0.5 and <= 0.5
Lean right | > 0.5 and <= 1.5
Right | > 1.5
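
The scoring and mapping steps can be written out as a short sketch. The function names `average_lean` and `lean_label` are hypothetical. The numeric values and range boundaries are taken from the two tables above, and the example reproduces the CNN calculation from the text.

```python
# Ground News label -> value, copied from the scoring table above.
GROUND_NEWS_VALUES = {"Left": -2, "Lean left": -1, "Center": 0, "Lean right": 1, "Right": 2}


def average_lean(ground_news, voter, follower):
    """Average the three per-source values, each between -2 and 2."""
    return (ground_news + voter + follower) / 3


def lean_label(average):
    """Map an average value to a lean label using the ranges in the table above."""
    if average < -1.5:
        return "Left"
    if average < -0.5:
        return "Lean left"
    if average <= 0.5:
        return "Center"
    if average <= 1.5:
        return "Lean right"
    return "Right"


# CNN: "Lean left" from Ground News (-1), tweeted evenly by voters (0),
# tweeted somewhat more by followers of liberal politicians (-1).
cnn_average = average_lean(GROUND_NEWS_VALUES["Lean left"], 0, -1)
print(round(cnn_average, 2), lean_label(cnn_average))  # -0.67 Lean left
```

Note that the boundaries are asymmetric: exactly -0.5 and exactly 0.5 both map to Center, -1.5 maps to Lean left, and 1.5 maps to Lean right.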