Methods
Please use this form to report any errors or issues you encounter in the project. Your feedback helps us keep this tool functional and up to date.
Initial article collection
To gather the initial list of trans-related news articles, the Trans News Initiative used Media Cloud, an open-source media research project by the Media Ecosystems Analysis Group, the Initiative for Digital Public Infrastructure at the University of Massachusetts Amherst, and the School of Journalism at Northeastern University.
News articles were queried from Media Cloud’s “United States - National” collection, which includes over 240 sources, using a set of identity-related search terms. A team of journalists, researchers, and subject matter experts manually selected the initial eight identity-related terms listed below, focusing specifically on trans communities rather than broader LGBTQIA+ communities. Each term is listed next to its exact query, which accounts for different stylizations, spellings, and descriptors. An article was considered a match if the query term appeared in its headline or, when available, its body text. Because of legal and technical constraints, this project currently prioritizes breadth over depth and surfaces only article headlines, not full body text.
| term | query |
|---|---|
| transgender | transgender |
| trans | "trans man" OR "trans men" OR "trans woman" OR "trans women" OR "trans person" OR "trans people" OR "trans individual" OR "trans community" |
| genderfluid | "gender-fluid" OR genderfluid OR "gender fluid" |
| gender non conforming | "gender non-conforming" OR "gender non conforming" OR "gender nonconforming" |
| gender expansive | "gender-expansive" OR genderexpansive OR "gender expansive" |
| nonbinary | "non-binary" OR nonbinary OR "non binary" |
| gender identity | "gender identity" |
| gender ideology | "gender ideology" |
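The query table above can be approximated locally as a phrase matcher. This is a minimal sketch, not the project's code and not Media Cloud's search engine: the `QUERIES` dictionary copies the table, while `query_phrases` and `matching_terms` are hypothetical helper names, and case-insensitive whole-word matching is an assumption about how the search behaves.

```python
import re

# Term -> query pairs copied from the table above.
QUERIES = {
    "transgender": 'transgender',
    "trans": ('"trans man" OR "trans men" OR "trans woman" OR "trans women" OR '
              '"trans person" OR "trans people" OR "trans individual" OR "trans community"'),
    "genderfluid": '"gender-fluid" OR genderfluid OR "gender fluid"',
    "gender non conforming": ('"gender non-conforming" OR "gender non conforming" OR '
                              '"gender nonconforming"'),
    "gender expansive": '"gender-expansive" OR genderexpansive OR "gender expansive"',
    "nonbinary": '"non-binary" OR nonbinary OR "non binary"',
    "gender identity": '"gender identity"',
    "gender ideology": '"gender ideology"',
}


def query_phrases(query: str) -> list[str]:
    """Split an OR query into lowercase phrases with the quotes removed."""
    return [part.strip().strip('"').lower() for part in query.split(" OR ")]


def matching_terms(text: str) -> list[str]:
    """Return every term whose query has at least one phrase in the text."""
    lowered = text.lower()
    matches = []
    for term, query in QUERIES.items():
        for phrase in query_phrases(query):
            # Assumption: whole-word, case-insensitive matching.
            if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
                matches.append(term)
                break
    return matches
```

A headline can match more than one term, so `matching_terms` returns a list in the same order as the table.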
The team used Python scripts to collect and process this data each week. Weekly data is available from June 2025 onward. Historical data is available in three-month chunks from January 2020 to June 2025. The site itself is updated monthly. Each article is included as a row in the data with the following properties:
| variable | description |
|---|---|
| id | A unique alphanumeric identifier for the article |
| indexed_date | The date the article was accessed via Media Cloud |
| language | All articles are tagged as “en” for English |
| media_name | The publication that published the article |
| media_url | The publication’s URL |
| publish_date | The original publication date of the article |
| title | The article’s headline |
| url | The article’s URL |
| query | The original query term from the list above |
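One way to represent a row with these properties is a small typed record. This is an illustrative sketch only: the `ArticleRow` class and `parse_row` helper are hypothetical names, and it assumes both dates are stored as ISO 8601 strings, which the description above does not confirm.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ArticleRow:
    """One article, with the fields listed in the table above."""
    id: str
    indexed_date: date
    language: str
    media_name: str
    media_url: str
    publish_date: date
    title: str
    url: str
    query: str


def parse_row(raw: dict[str, str]) -> ArticleRow:
    """Build an ArticleRow from a dict of strings (e.g. one CSV row)."""
    return ArticleRow(
        id=raw["id"],
        # Assumption: dates are ISO 8601; keep only the YYYY-MM-DD part.
        indexed_date=date.fromisoformat(raw["indexed_date"][:10]),
        language=raw["language"],
        media_name=raw["media_name"],
        media_url=raw["media_url"],
        publish_date=date.fromisoformat(raw["publish_date"][:10]),
        title=raw["title"],
        url=raw["url"],
        query=raw["query"],
    )
```

Freezing the dataclass keeps rows immutable once parsed, which makes accidental edits during later processing steps easier to catch.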
Article classification
Wire stories published by multiple outlets were treated as individual articles rather than collapsed into one, prioritizing news dissemination and reach over unique reporting. Generic news round-ups and recaps (e.g., “Weekend Report”, “Top Stories”, “News Roundup”) were filtered from the event data. We then used the RoBERTa-base model to generate an embedding for each article headline and clustered those embeddings using HDBSCAN. Each cluster was labelled by an LLM prompted to produce an umbrella phrase from the individual article headlines in that cluster.
Next, we asked two teams of subject matter experts to manually label a subset of event clusters with the 13 overarching (non-exclusive) themes listed below. This created a sample of roughly 8,000 theme-labelled news articles, which was used to train a classifier that then applied the overarching theme labels to all other news articles. We also created a review system that allowed for manual correction of event labels and themes. Event clusters were included in the data visualization if they had 10 or more related articles.
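The round-up filtering step could look something like the sketch below. The project's actual filter rules are not described here, so this is an assumption: the pattern covers only the three example phrases quoted above, and `is_roundup` and `drop_roundups` are hypothetical names.

```python
import re

# Assumption: only the three example phrases from the text are covered.
ROUNDUP_PATTERN = re.compile(
    r"\b(weekend report|top stories|news roundup)\b",
    re.IGNORECASE,
)


def is_roundup(headline: str) -> bool:
    """True if the headline contains one of the generic round-up phrases."""
    return ROUNDUP_PATTERN.search(headline) is not None


def drop_roundups(headlines: list[str]) -> list[str]:
    """Keep only the headlines that do not look like generic round-ups."""
    return [h for h in headlines if not is_roundup(h)]
```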
| theme | description |
|---|---|
| Resilience, resistance, & solutions | Public demonstrations, protests, pride events, and other acts of resilience and resistance |
| Anti-trans violence & hate | Dangerous and damaging acts directed at the trans community, including but not limited to: harassment, hate speech, and fatal violence |
| Trans youth & parental rights | The debate around parental rights and child autonomy, including conversations about data privacy, K-12 education policy, and conflicts over which parents' rules prevail |
| Ideology & culture wars | Philosophical, moral, religious, and cultural arguments surrounding transgender communities |
| U.S. federal measures | Actions, policies, legislation, and rulings taken at the federal level by federal representatives, including rhetoric and threats |
| Censorship & free speech | Censorship of supportive trans-related speech and use of harmful anti-trans rhetoric under the guise of free speech |
| Health care & bodily autonomy | Access to healthcare and the ability to make one's own decisions in regard to gender affirming care, abortions, and other medical care |
| Trans & nonbinary identity | Both the celebration of identity through protective laws or personal experiences and the suppression of it through measures such as laws restricting legal identification |
| Transnationality | International events and laws, transnational conversations about gender policy, and policies around trans migrants |
| Pop culture & creativity | Representation across pop culture, the arts, and other creative fields |
| Access to public space | The ability to exist in public regardless of how you identify — this includes the right to nondiscriminatory employment, bathroom access, marriage, voting, entertainment, and more |
| Trans people in sports | Gender inclusivity and bans in sports spanning youth, school, collegiate, professional, and Olympic levels |
| U.S. state measures | Actions, policies, legislation, and rulings taken at the state level or by state representatives |
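The inclusion rule described in this section (event clusters are shown only if they have 10 or more related articles) can be sketched as a simple count over cluster assignments. The function name and the input shape, one cluster label per article, are assumptions made for illustration.

```python
from collections import Counter

# Threshold stated in the text: 10 or more related articles.
MIN_ARTICLES_FOR_DISPLAY = 10


def clusters_to_display(cluster_labels: list[str]) -> set[str]:
    """Return the cluster labels that have enough articles to be shown.

    `cluster_labels` holds one entry per article (hypothetical input shape).
    """
    counts = Counter(cluster_labels)
    return {
        label for label, n in counts.items()
        if n >= MIN_ARTICLES_FOR_DISPLAY
    }
```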
Publication political lean
To capture a publication’s political lean, we averaged three ratings: Ground News’ media bias rating, which aggregates leans from Ad Fontes Media, AllSides, and Media Bias/Fact Check; Media Cloud’s collections of publications tweeted by voters, grouped by political lean; and Media Cloud’s collections of publications tweeted by followers of politicians, grouped by political lean. Because this is done at the publication level, it may not reflect the political lean of individual coverage areas, sections, articles, reporters, or editors.
Each publication was assigned a value for each of the first three columns below. For example, CNN’s media bias rating from Ground News was “Lean left” (-1), CNN was tweeted evenly by Republican/Democratic voters in 2018 (0), and CNN was tweeted somewhat more by followers of liberal politicians in 2019 (-1).
| Ground News Lean | Media Cloud Twitter/X Voter Lean, 2018 | Media Cloud Twitter/X Follower Lean, 2019 | value |
|---|---|---|---|
| Left | Tweeted more by Democratic voters | Tweeted more by followers of Liberal politicians | -2 |
| Lean left | Tweeted somewhat more by Democratic voters | Tweeted somewhat more by followers of Liberal politicians | -1 |
| Center | Tweeted evenly by Republican/Democratic voters | Tweeted evenly by followers of Conservative/Liberal politicians | 0 |
| Lean right | Tweeted somewhat more by Republican voters | Tweeted somewhat more by followers of Conservative politicians | 1 |
| Right | Tweeted more by Republican voters | Tweeted more by followers of Conservative politicians | 2 |
Those three values were then averaged, giving CNN an average value of (-1 + 0 + -1) / 3 ≈ -0.67. Each publication's average was then mapped to one of the lean ranges below, and the resulting label is the political lean that we use. CNN's -0.67 falls in the “Lean left” range.
| lean | value |
|---|---|
| Left | < -1.5 |
| Lean left | >= -1.5 and < -0.5 |
| Center | >= -0.5 and <= 0.5 |
| Lean right | > 0.5 and <= 1.5 |
| Right | > 1.5 |
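The averaging and mapping steps above can be written out as a short sketch. The function names are hypothetical and this is not the project's own script, but the values and range boundaries are taken directly from the two tables and the CNN example.

```python
# Value assigned to each source rating, from the first table above.
LEAN_VALUES = {
    "Left": -2,
    "Lean left": -1,
    "Center": 0,
    "Lean right": 1,
    "Right": 2,
}


def average_lean(values: list[int]) -> float:
    """Average the per-source lean values for one publication."""
    return sum(values) / len(values)


def lean_label(average: float) -> str:
    """Map an average value to a lean, using the ranges in the table above."""
    if average < -1.5:
        return "Left"
    if average < -0.5:
        return "Lean left"
    if average <= 0.5:
        return "Center"
    if average <= 1.5:
        return "Lean right"
    return "Right"


# CNN example from the text: Lean left (-1), even (0), somewhat liberal (-1).
cnn_average = average_lean([-1, 0, -1])
cnn_lean = lean_label(cnn_average)
```

Note that the boundaries are not symmetric at the edges: -1.5 maps to “Lean left” and 1.5 maps to “Lean right”, while both -0.5 and 0.5 map to “Center”.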