The Six Camps of Metascience

A field guide for policymakers (and everyone who cares about American science).

Jun 11, 2026

Two things can be true at once:

Science is the ultimate public good and has resulted in massive positive spillovers, making R&D one of the most important investments the federal government makes.1
The systems we use to fund, select, structure, and conduct that science have serious problems and could work much better.

The federal government spends nearly $200 billion a year on research and development. If we could make the process of identifying, distributing, and managing those funds even 5% more effective, we would effectively unlock another $10 billion a year for higher expected value science and technology.2 For context, that is slightly more than what we spend annually on NASA’s entire science budget, or on the National Cancer Institute.

This insight has driven a series of “science reform” movements over the years, some within existing science policy circles and some from other vantage points. Metascience is the study of the scientific enterprise, aimed at generating evidence about what increases the reliability, productivity, and impact of science. But different science reformers have distinct core hypotheses about what has gone wrong and what is most tractable to fix. How do the people building new Focused Research Organizations fit in with those working on the replication crisis? Are they all doing metascience? And how is metascience different from science policy in general?3

If policymakers and agency leaders are going to change how they spend billions of dollars of taxpayer money based on metascience advice, they should have a clear picture of the landscape of metascience and its various goals, perspectives, and terms.

This is our attempt to map the landscape. We’ve split the territory into six recognizable metascience ‘camps’. Each camp advances a distinct hypothesis about which parts of the scientific enterprise most need attention and how we should test what works.4

A few caveats before we share the landscape.

These camps are permeable. Many individuals and institutions move between two or three, and the borders are intellectual emphases rather than walls. Each is grappling with a different part of the same elephant.
For this landscape, we are also limiting ourselves to metascience rather than the broader landscape of science policy. The line is debatable, but we think of science policy and metascience as overlapping circles: metascience asks how the scientific enterprise functions and how to make it work better, where evidence about outcomes is the primary arbiter. Science policy includes many metascience questions but also broader normative, political, diplomatic, and national security ones. Not all metascience is science policy, and not all science policy is metascience, but the two have plenty to say to each other.
The methods we list for each hypothesis reflect dominant practice, not a logical constraint. In principle, most questions could be pursued with multiple methods.

With those caveats, here are the six camps and some examples of their associated research and initiatives.

Research Integrity

Hypothesis: Poor statistical practices and publication bias have often led to the publication of inaccurate or overstated claims, leaving researchers to spend time following unproductive paths of inquiry. Science could be more reliable, but current institutional incentives perpetuate the problem.

Theory of change: Fix methods and publication norms → fix the literature → more reliable science.

Policy relevance: This is the version of metascience that NIH Director Jay Bhattacharya has spoken about directly and launched initiatives to address. The “replication crisis” showed that in fields like behavioral psychology, a large fraction of published results don’t hold up when repeated (see examples below). A representative policy question: should we require replication studies as a condition of large grants?

Evidence and methods: Common methods include large-scale replication studies, meta-analyses, statistical power analyses, and adversarial collaborations to replicate empirical results. The inference that science needs stricter standards is supported by the pattern of failures rather than by randomized controlled trials (RCTs) testing the standards themselves.

Examples:

“Why Most Published Research Findings Are False,” which argued that most published findings are likely false when studies are small, effect sizes are small, and many teams pursue the same question. This is the foundational text of the replication crisis literature.
The Reproducibility Project: Psychology, which attempted to replicate 100 studies from three top psychology journals and found that only 36% of replications produced statistically significant results, even though 97% of the original studies had reported significant findings.
The Many Labs projects, which have run successive replications of classic and contemporary psychology findings, with replication rates varying widely (from 30% to 75%) across iterations (2014, 2018, 2016).
Data Colada established the foundational case against “p-hacking” (exploiting flexibility in data collection and analysis until a statistically significant result appears) and has since surfaced evidence of data fabrication in high-profile research.
Registered Reports, a publishing format in which a study’s design and analysis plan undergo peer review and receive in-principle acceptance before any data are collected. This structure was designed to reduce publication bias and selective reporting of positive results.
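The argument in “Why Most Published Research Findings Are False” can be made concrete with a little arithmetic. The sketch below computes the positive predictive value (the share of statistically significant findings that reflect a true relationship) from three inputs: the pre-study odds that a tested relationship is real, statistical power, and the significance threshold. The function name and the two example scenarios are illustrative assumptions, not figures from the paper, and this is the simplest version of the formula, which ignores bias and multiple competing teams.

```python
def positive_predictive_value(prior_odds, power, alpha=0.05):
    """Share of statistically significant findings that are actually true.

    prior_odds: ratio of true to false relationships among those tested
    power:      probability a study detects a true relationship (1 - beta)
    alpha:      probability a study flags a false relationship as significant
    """
    true_positives = power * prior_odds   # per one false relationship tested
    false_positives = alpha               # per one false relationship tested
    return true_positives / (true_positives + false_positives)


# Hypothetical scenarios, chosen only to show the direction of the effect.
exploratory = positive_predictive_value(prior_odds=0.1, power=0.2)
confirmatory = positive_predictive_value(prior_odds=1.0, power=0.8)

print(f"Small studies, long-shot hypotheses: {exploratory:.0%}")
print(f"Well-powered studies, even odds:     {confirmatory:.0%}")
```

Under these assumed inputs, fewer than half of the significant findings in the first scenario would be true, which is the sense in which low power and low prior odds can make “most” published findings false.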

Open Science

Hypothesis: A lack of transparency enables bad practices and slows progress. The case for openness is both that transparency is good in itself and that science advances faster when it’s open by default.

Theory of change: Increase transparency → increase accountability → faster, more reliable science.

Policy relevance: This perspective shows up in transparency requirements in agency grants and in open access mandates. Examples include the 2022 White House Office of Science and Technology Policy memorandum requiring publications (and underlying data) to be available without an embargo, and Plan S, a mandate from a coalition of European research funders for open access immediately upon publication. It has produced practical infrastructure that many researchers now use regularly, such as preprints, preregistration, and registered reports. A representative policy question: should agencies mandate open data, preregistration, and open access for federally funded research?

Evidence and methods: Historically, this camp’s approach has been normative and values-based, with some empirical analysis. The movement’s strength comes from a combination of advocacy-driven policy reform, platform development, and community norm-setting, increasingly paired with empirical evaluation of outcomes.

Examples:

The Open Science Framework, a free platform used by hundreds of thousands of researchers to host preregistrations, protocols, data, and study materials.
The Public Library of Science (PLOS), which was founded in 2000 through an open letter signed by ~34,000 scientists who pledged to publish only in journals making articles freely available, and which has since grown into one of the largest open-access publishers in science and demonstrated that fully open-access publications are commercially viable.
Preprint servers such as arXiv and bioRxiv, which allow researchers to share manuscripts before peer review and have dramatically accelerated dissemination in physics, mathematics, biology, and adjacent fields.
Open data citation analyses, including a multivariate analysis that found papers making their data publicly available received more citations than comparable papers without publicly available data.

Metrics of Science

Hypothesis: Science is a complex system with discoverable regularities. Large-scale data can reveal how breakthroughs happen, how careers unfold, and what predicts impact.

Theory of change: Understand patterns in how science works → more informed future program design → higher impact future research.

Policy relevance: This body of work provides evidence on optimal team size, career stage effects, and the novelty-versus-convention tradeoff. A representative policy question: should government programs fund more small teams, given that they are more likely to produce disruptive work than large teams?

Evidence and methods: Common methods are largely descriptive and include computational analysis of bibliometric data and machine learning on large scientific corpora. Some scholars track social dynamics (teams, citations, networks); others study materials used in research, arguing that new methods and tools are the primary engine of breakthroughs.

Examples:

The “Science of Science” research program, which synthesized large-scale quantitative analyses of scientific careers, citation networks, team dynamics, and knowledge production, providing the canonical statement of the field’s research agenda and methodological toolkit.
Disruption index research, which uses citation network patterns to distinguish papers and patents that consolidate fields from those that disrupt them and to measure whether disruptive work has been declining over recent decades.
Hot streaks research, which identifies periods within scientific, artistic, and film careers where individuals produce their highest-impact works in concentrated bursts, with no accompanying change in productivity, often following a phase of exploration that transitions into focused work.
Team size and innovation studies, which find that smaller teams are systematically more likely to produce disruptive work while larger teams develop and extend existing fields, suggesting that funding portfolios should support a diversity of team sizes.
Atypical combinations research, which finds that the highest-impact papers tend to combine conventional foundations of prior knowledge with unconventional mixtures of ideas (e.g., drawing on research from two fields that rarely go together).
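As a rough illustration of how a disruption index can be computed from citation data, the sketch below follows one commonly cited formulation. Among later papers, it counts those that cite only the focal paper (n_i), those that cite both the focal paper and its references (n_j), and those that cite only the references (n_k), then takes (n_i - n_j) / (n_i + n_j + n_k). Scores near 1 suggest disruptive work and scores near -1 suggest consolidating work. The toy citation data, function name, and paper labels are hypothetical, and published variants differ in details such as citation time windows.

```python
def disruption_index(focal, focal_refs, citations):
    """Compute a simple disruption score for one focal paper.

    focal:      identifier of the focal paper
    focal_refs: identifiers of the papers the focal paper cites
    citations:  dict mapping each later paper to the set of papers it cites
    """
    refs = set(focal_refs)
    only_focal = both = only_refs = 0
    for cited in citations.values():
        cites_focal = focal in cited
        cites_refs = bool(refs & set(cited))
        if cites_focal and not cites_refs:
            only_focal += 1      # builds on the focal paper alone
        elif cites_focal and cites_refs:
            both += 1            # treats the focal paper as part of prior work
        elif cites_refs:
            only_refs += 1       # bypasses the focal paper entirely
    total = only_focal + both + only_refs
    # Returning 0.0 when nothing cites the paper or its references is a
    # convenience choice made for this sketch.
    return (only_focal - both) / total if total else 0.0


# Hypothetical citation data for a focal paper "F" that cites "R1" and "R2".
later_papers = {
    "A": {"F"},
    "B": {"F"},
    "C": {"F", "R1"},
    "D": {"R2"},
    "E": {"X"},   # unrelated paper, ignored by the index
}
score = disruption_index("F", ["R1", "R2"], later_papers)
print(score)
```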

Innovation Economics

Hypothesis: Funding structures and incentive design causally shape the direction, quality, and quantity of science.

Theory of change: Understand incentives and causation → redesign programs → better science and translation.

Policy relevance: If a policymaker needs to know what might happen when you pull a specific lever, this is where to look. A representative policy question: do person-based grants produce more innovation than project-based grants and, if so, should more of the R&D portfolio be invested through person-based mechanisms?

Evidence and methods: Both causal and descriptive empirical methods are used, including RCTs, natural experiments, and other quasi-experimental analyses. This body of work relies on research designs that isolate cause and effect, which makes its evidence attractive for specific mechanism questions. It focuses on testing questions where credible counterfactuals exist.

Examples:

Howard Hughes Medical Institute versus NIH comparisons, which find that scientists funded under HHMI’s “people, not projects” model, where grantees receive freedom to change direction and tolerance for early failures, produce more high-impact papers and explore more novel topics than comparable NIH-funded scientists.
Intellectual Property (IP) protection and follow-on innovation studies, including evidence that a brief, non-patent form of IP held by the firm Celera over portions of the human genome reduced subsequent research and product development on those genes by 20–30%. Later research suggests that gene patents, by contrast, had little effect on follow-on innovation.
Studies of expertise versus bias in peer review, including a quasi-experimental analysis of NIH grant evaluations suggesting that though reviewers are more biased about projects in their own subject area, the benefits of their expertise still weakly dominate the costs of bias. This implies that policies seeking to limit bias by using impartial evaluators may actually reduce decision quality.
Small Business Innovation Research (SBIR) Program evidence, including a quasi-experimental study of Department of Energy SBIR applicants showing that an early-stage award roughly doubles the probability of subsequent venture capital investment and produces large positive effects on patenting and commercialization.

R&D Management and Implementation

Hypothesis: Organizational design and management practices shape what science gets done. The same funding mechanism can succeed or fail depending on how the agency implements it. Beyond evidence about which mechanisms should work, you also need evidence about how to make those mechanisms succeed in practice.

Theory of change: Understand how R&D organizations work in practice → design more practical reforms → more sustainable adoption.

Policy relevance: This approach answers the implementation questions generated by policy recommendations. A representative policy question: how should we design the organizational structure for a new agency metascience office, i.e. where should it sit, what authority does the director need, and how do we make sure it exists long enough to produce results?

Evidence and methods: Common methods include institutional analysis, comparative case studies, program evaluation, mixed-methods research, process tracing, and interviews with agency officials.

Examples:

Comparative institutional analyses of NSF, NIH, DARPA, and ARPA-E, which document how organizational features such as program-director authority, hiring flexibility, and active project management distinguish ARPA-style agencies from traditional grantmakers.
Studies of “open” versus “conventional” innovation procurement, including a quasi-experimental study comparing outcomes for awardees of the Air Force SBIR program’s “open” topic and conventional topic competitions. Winning an open award increased subsequent non-SBIR Pentagon contracts, venture capital investment, and high-originality patents, whereas winning a conventional award increased the probability of winning a future SBIR contract.
Analysis of ARPA-E’s active program management, where program directors retain authority to expand, contract, or cancel awards based on performance. Research found ARPA-E-funded cleantech startups filed patents at roughly twice the rate of comparable firms.
NIH intramural technology transfer impact analysis, which traces how NIH-licensed inventions move from intramural labs to market. The study found that, in contrast to the simplistic linear “pipeline” model of innovation that agency policy often assumes, a chain-linked model in which research and product development continuously inform each other is a better fit to what NIH technology transfer offices actually do.

Institutional Entrepreneurs

Hypothesis: Existing institutions are too ossified to reform. The best path forward is to build new entities that demonstrate radically different models.

Theory of change: Generate hypotheses for new organizations → create existence proofs → competitive pressure and policy templates for incumbents.

Policy relevance: These experiments generate real-world evidence about what is possible. That evidence can be unusually persuasive to policymakers because it is demonstrated rather than hypothetical. A representative policy question: how much of a funder’s portfolio should fund new institutions?

Evidence and methods: Institutional entrepreneurs conduct real-world organizational experimentation, but these institutions are more proofs of concept than formalized experiments. The work resembles entrepreneurial iteration more than social science hypothesis testing. Its methods carry high persuasive power for demonstrating feasibility but make limited claims to generalizability.

Examples:

“Reinventing Discovery: The New Era of Networked Science,” an early and influential argument that internet-enabled collaboration tools could fundamentally transform how scientific discovery happens, providing an intellectual foundation for many of the open and collaborative experiments that followed.
Arc Institute, founded in 2021 with roughly $650 million in private funding, which provides investigators with eight years of unrestricted funding and lab support for up to 20 people in lieu of project-based grant applications.
Convergent Research and the Focused Research Organization (FRO) model, founded in 2021, which structures mid-scale engineering projects as nonprofit, time-limited startups (typically $30–50 million over 5–6 years). FROs aim to build tools and datasets that fall in the gap between academic labs and industry.
Arcadia Science, founded in 2021 as a for-profit research institute combining basic research, tool-building, and translational development under one roof, as well as experimenting with publishing outside traditional journals to release research products earlier and more openly.

Charting the map

The camps in this landscape agree on a surprising amount! That federal R&D investment has a positive overall ROI is a longstanding consensus across metascience and science policy. In addition, metascience scholars and policy entrepreneurs largely agree that the current system of science funding is suboptimal, that peer review can suppress risk-taking, that administrative burden is excessive, that career incentives distort behavior, and that evidence should inform system improvements.

But those shared premises can lead to very different reform recommendations, depending on the theory of change, method, and intellectual tradition.5 This metascience landscape reflects the diverse priorities held by the metascience community. Some camps, particularly Innovation Economics, R&D Management, and Institutional Entrepreneurs, converge on a similar diagnosis of the funding system’s biggest problems. Others approach the same enterprise from different angles: Research Integrity worries about the reliability of published findings; Open Science prioritizes transparency. This intellectual diversity is healthy and is a sign that the field of metascience is maturing.

Why a map matters

We are in a critical window for R&D reform. The Trump administration’s proposed funding changes and reorganizations of federal science agencies create both pressure and opportunity to revisit the structure of federally funded research programs. Multiple kinds of metascience evidence can inform that work, and policymakers are recognizing it: they are expressing demand for economic evidence to drive system reform, R&D management knowledge to ensure reforms actually work, and examples of new institutions they can use in legislation and program design.

NSF’s recent launch of X-Labs is an indication of that appetite. X-Labs will award institutional block funding to independent organizations with operational autonomy, with milestone-based awards to encourage team-based, infrastructure-heavy, and long-horizon approaches to research. This program structure was informed by metascience and is a new experiment in portfolio diversification.

We are also excited to see the evidence-generating and evidence-translating communities depicted in this map continue to grow. We believe the most durable reforms tend to draw from several perspectives at once. For example, a reform inspired by new institution existence proofs (Institutional Entrepreneurs), backed by causal evidence (Innovation Economics), and embedded in an organizationally realistic design (R&D Management and Implementation) is more likely to survive first contact with reality than one informed by any single tradition alone. Building more connective tissue between metascience communities would help ensure that our perspectives and methods do not stay siloed and that our reforms are more robust.

Building the connective tissue

Some funders have already incorporated evidence-building into their research programs, but this remains all too rare. Even the Congressional mandate under the Evidence Act for federal agencies, including research agencies, to maintain an evidence-to-practice pipeline did not produce a swell of metascience evidence.6

One of us (Jenn) experienced the dynamic of adapting evidence to program design firsthand while leading early-stage innovation programs at NASA. For program managers in the federal government, the demands of satisfying institutional requirements and culture while confronting resource limitations (people, IT, etc.) can make implementing changes to program design feel like hand-to-hand combat at best and insurmountable at worst. It helps when there is external evidence to point to, but that is often insufficient when proposed changes to institutional norms could introduce protest and oversight risk, create confusion for proposers and reviewers, and not pay off for many years.

In short, even if they want to, most program managers don’t have the time or the institutional cover to implement a routine evidence-to-practice pipeline that could in turn provide real-world experimental evidence.

Further, program managers often have their own hypotheses about program design and impact they’d love to evaluate. But they either are unaware of the metascience community eager to study these same questions or face significant barriers to working with the researchers (data access restrictions, timing of research findings not aligning with program reform windows, etc.).

Our map of the metascience landscape can help agencies recognize that they don’t need to embark on evidence production alone. For example, questions they’re already required to ask under the Evidence Act have intellectual homes — communities with methods, scholars, and accumulated evidence that could dramatically improve the quality of their answers.

At the same time, better coordination with the external metascience community cannot fully substitute for internal metascience capacity. Agencies’ experience with the last six years of learning agendas and evaluation plans developed under the Evidence Act demonstrates this: even when agencies have identified the right questions, they have lacked the in-house staff with metascience expertise to design and run high-quality evaluations, and the most policy-relevant data (proposals, reviewer scores, panel deliberations, and selection decisions) often cannot be shared with outside researchers for confidentiality and statutory reasons.

We believe that an important first step to drive more metascience understanding and improve the social ROI on R&D is to institutionalize metascience capacity inside the government through dedicated metascience units. In 2024, the UK established a Metascience Unit, and the Administration’s FY 2027 budget request proposed a Metascience Unit at NSF, which is a good start. These units should draw on research questions, methods, and external expertise from across the metascience landscape.

Policymakers are hungry for evidence-informed ideas and answers. If we applied even a fraction of the funding we deploy for object-level science to studying what works and implementing those practices at scale, the payoffs could be enormous. Science needs more evidence about itself, and there is a broad metascience landscape poised and ready to help.

1. This is even more true for early-stage investments in research, where market failures and externalities disincentivize private sector investments.

2. For example, researchers spend an estimated 44.3% of their funded time on “meeting [administrative] requirements rather than conducting active research,” a share that has been increasing over time; peer review processes hundreds of thousands of proposals annually at substantial cost; and the federal R&D portfolio is heavily concentrated in a single funding mechanism (project-based grants).

3. Metascience community members Buck & Khanduja acknowledge this confusion and have suggested that to “clarify things, we should think of metascience at multiple levels, just like economics.”

4. These six camps are not all the same kind of thing: some are built around diagnoses, some around methodological traditions, and some around theories of institutional change. We group them because each offers a unique hypothesis about where reform should intervene and what kind of evidence should count. They are more like intellectual homes than closed tribes.

5. In a recent publication in Nature, researchers reanalyzed the original data underlying 100 published studies across the social and behavioral sciences using whatever analytical approach they considered best. Only a third of the reanalyses reached the same result as the original study, while one-fourth did not even reach the same broad conclusion. The study reinforces that analytical choices matter to the answers you get.

6. Federal science agencies are not oblivious to the aims of metascience. Under the Foundations for Evidence-Based Policymaking Act of 2018, agencies are required to identify priority questions about their own operations and develop plans to answer them. NSF’s first learning agenda, for example, included many questions of metascience: how its peer review process works, whether eliminating proposal deadlines would reduce the burden on scientists, and what makes some funding mechanisms more effective than others. But agencies’ ability to answer these questions remains thin. Most agencies lack dedicated evaluation staff with metascience expertise, use descriptive analyses when they need causal evidence, and publish results inconsistently.

A guest post by

Caleb Watney

Managing Director of Public Policy at Coefficient Giving

A guest post by

Jenn Gustetic

Accelerating breakthrough research. 👩🏼‍💻Currently @IFP (director of metascience and R&D policy). 👩🏻‍🚀Formerly @NASA space technology executive, White House OSTP Open Innovation assistant director, Harvard Fellow. 👩🏼‍🎓 ASE @UF. Tech Policy @MIT.

Zac Hill

Jun 22

The only better thing than elite merch is elite meta-merch

Becoming Human

Jun 11

The replicability crisis is not a process problem, it is a fundamental epistemological issue, especially in “sciences” like psychology, sociology, economics, even cognitive science.

Since the early 20th century, every field of inquiry has sought to be like physics and chemistry, a form of intellectual certainty-envy.

So each field has hemmed itself within analytic frameworks where only calculable factors are accepted in the academy, and most if not all does not yield to computation due to confounding, emergence, etc.

We have to stop pretending that there is any idea of certainty in non-sciences. The only places where there is that level of precision are very specific scales and systems that change over massive time periods.

Non-replicability is likely a fundamental aspect of those fields, and is exploited for tenure, it is not a thing to be fixed via review processes.
