Dahyun Choi

I study core questions at the intersection of politics, markets, and governance: how government institutions and organized interests produce and evaluate information, and how they strategically shape the scientific and technical foundations of public policy. Using machine learning, causal inference, and formal theory, I examine settings where regulatory agencies evaluate expertise, interest groups compete over policy implementation and knowledge production, and legislatures direct public investment in innovation.

I also develop machine learning methods for measuring political and organizational behavior.

Recent News

May 2026 Scheduled to give a talk at Northwestern Kellogg (Political Economy Rookiefest)
May 2026 Scheduled to give a talk at Cornell (Congress & History Conference)

Peer-Reviewed Publications

Fine-tuned Large Language Models Can Replicate Expert Coding Better than Trained Coders: A Study on Informative Signals Sent by Interest Groups
Understanding the political process in the United States requires examining how information is provided to politicians and the general public. While existing studies point to interest groups as strategic information providers, studying this aspect empirically has been challenging due to the need for expert-level annotation in measurement. We make two contributions. First, we demonstrate that fine-tuned large language models (LLMs) can replicate expert-level annotation in a specialized area with greater accuracy than lightly trained workers, crowd-workers, and zero-shot LLMs. Second, we quantify two types of interest group signals that are difficult to separate empirically by other means: 1) informative signals that help agents improve political decisions, and 2) associative signals that influence preference formation but lack direct relevance to the substantive topic of interest. We demonstrate the utility of this approach in two applications where our classifier generalizes out of distribution. Methodologically, this study demonstrates the applicability of large language models to complex expert-driven measurement tasks; substantively, it shows that interest groups strategically tailor the composition of their signals under different institutional settings.

with Brandon Stewart and Denis Peskoff

Forthcoming in Political Science Research and Methods

Why Interest Groups With Divergent Goals Collaborate: Evidence From Climate Regulation
Why do interest groups with contrasting interests and policy goals work together? I present a theory of collaborative policy production and show that interest groups can achieve higher policy gains through collaboration, even when their ideal policy goals diverge significantly. To test the theoretical results, I introduce original measurement strategies that reveal systematic patterns in which firms and environmental groups invest in joint efforts to improve fine-grained details of policy to achieve greenhouse gas emissions targets. The analysis, using public comments spanning 2010–2020, demonstrates that comments written jointly by environmental groups and firms contain more information that can contribute to the quality of policy implementation than individual efforts alone, despite compromises on policy preferences. These findings highlight the hidden dynamics of regulatory politics, wherein divergent political goals are reconciled in the service of high-quality policy implementation.

2026. Economics & Politics, 38: 46–61.

Working Papers

Partisan Bias and the Resilience of High-Impact Science
How do partisan bias and scholarly impact within academic communities jointly shape the use of evidence in regulatory policymaking? I investigate this question using a novel dataset of 16,783 peer-reviewed studies evaluated by the Environmental Protection Agency for the Integrated Science Assessments (ISA), which inform the National Ambient Air Quality Standards. I find that Democratic administrations are 15.4% more likely to cite pro-regulatory studies, while Republican administrations are 17.5% less likely to do so. These partisan effects are comparable in magnitude to the effect of a two-standard-deviation change in study impact measures. Yet partisan bias is moderated by evidentiary impact: high-impact studies are cited consistently across administrations, while lower-impact studies are less penalized when aligned with an administration's policy agenda. Evidence on participant selection in the ISA process suggests a plausible mechanism underlying this pattern. Together, the findings suggest that while science retains epistemic authority, its application is shaped by political context within the administrative state.

Revise & Resubmit at American Journal of Political Science

How Much Data Is Enough? A Design-aware Approach to Empirical Sample Complexity
How much data is needed to ensure that a model performs reliably on new, unseen data? Despite their central importance to empirical research design, sample size decisions are often made heuristically, guided more by resource constraints than by principled diagnostics. Existing tools like power analysis and cross-validation offer limited insight into how predictive performance scales with sample size. We introduce a design-aware, empirical framework for estimating sample complexity bounds tailored to applied settings. By fitting smooth extrapolation functions to model performance on resampled pilot data, our method estimates the sample size needed to achieve researcher-specified generalization guarantees. Through applications to supervised learning tasks involving extensive human-annotated data, we show that generalization often stabilizes with as little as 10% of typical labeling costs. This approach provides a statistically grounded, interpretable diagnostic for generalization performance and a practical tool for political scientists designing data-intensive studies under resource constraints or design uncertainty.
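The core idea of fitting a smooth extrapolation function to pilot-data performance can be sketched as follows. This is an illustrative Python toy, not the paper's or the scR package's implementation: it assumes a power-law learning curve (err(n) = a·n^(−b) + c) and entirely hypothetical pilot error rates, then inverts the fitted curve to estimate the sample size needed for a target error.

```python
# Illustrative sketch only (assumed power-law form and made-up pilot data),
# not the scR implementation.
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b, c):
    """Assumed learning-curve form: error decays as a * n^(-b) toward floor c."""
    return a * n ** (-b) + c

# Hypothetical pilot results: held-out error at increasing subsample sizes.
sizes = np.array([100, 200, 400, 800, 1600], dtype=float)
errors = np.array([0.30, 0.24, 0.19, 0.16, 0.14])

# Fit the smooth extrapolation function to the pilot learning curve.
params, _ = curve_fit(power_law, sizes, errors, p0=[1.0, 0.5, 0.1], maxfev=10000)
a, b, c = params

def required_n(target_error):
    """Smallest n whose extrapolated error falls at or below the target."""
    if target_error <= c:
        raise ValueError("Target is below the fitted asymptote; unreachable.")
    return int(np.ceil((a / (target_error - c)) ** (1.0 / b)))

n_star = required_n(0.13)  # estimated labels needed for 13% held-out error
```

The payoff of this style of diagnostic is that the answer comes from the researcher's own task and model, rather than from a generic rule of thumb: if the curve flattens early, additional labeling buys little.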

with Perry Carter

Revise & Resubmit at American Journal of Political Science

When Algorithms Govern
Innovation by Design: How Legislative Institutions Shape the Direction of Federal R&D
Politics of Academic Experts: Evidence from Antitrust Regulations

with Nolan McCarty

Interest Group Ecologies and Ideological Niches

with Charles Cameron

Sample Complexity for Open-Ended Responses

with Perry Carter and Narrelle Gilchrist


Software

scR
scR is an R package developed by Carter & Choi (2025) that helps researchers determine how much data is needed for reliable generalization. It provides a design-aware empirical framework that estimates sample complexity by smoothly extrapolating model performance from resampled pilot data. Additionally, scR offers theoretical guidance by calculating the Vapnik–Chervonenkis dimension (VCD). This interpretable diagnostic is particularly helpful for empirical researchers designing data-intensive studies under resource constraints or uncertainty. For more details, see Carter & Choi (2025), "How Much Data Is Enough? A Design-aware Approach to Empirical Sample Complexity" (doi:10.31219/osf.io/evrcj_v2).

Available on CRAN · with Perry Carter