For decades, the impact of a research paper was largely synonymous with the journal it appeared in. The Journal Impact Factor (JIF) reigned as a proxy for quality, and researchers optimized submissions accordingly. But that monoculture is fracturing. Funders, tenure committees, and even governments are demanding evidence of broader societal impact—things like policy influence, public engagement, and data reuse. Meanwhile, the rise of altmetrics and open science has made it possible to track diverse forms of attention. The challenge now is not a lack of metrics, but how to choose and combine them without falling into old traps. This guide is for research evaluation professionals, senior academics, and lab leaders who already understand basic citation analysis and need a practical framework for selecting and defending a multi-metric impact strategy.
Who Needs to Map Impact Differently—and Why Now?
The pressure to move beyond journal rankings comes from multiple directions. In Europe, the Coalition for Advancing Research Assessment (CoARA) has committed signatories to reform how research is evaluated, basing assessment primarily on qualitative judgment supported by responsible use of quantitative indicators. In the UK, the Research Excellence Framework (REF) 2021 assessed impact through case studies and gave it 25% of the overall score. Similar shifts are underway in Australia, Canada, and parts of Asia. For individual researchers, the stakes are personal: promotion and grant success increasingly hinge on telling a compelling story about reach and significance, not just publishing in high-JIF venues.
But abandoning the JIF entirely is not practical for most institutions. Journal rankings still serve as a quick filter for hiring committees and funding panels, especially in fields where altmetrics are sparse. The real task is to build a layered map—one that uses traditional citation data as a foundation, then overlays newer indicators where they add genuine insight. This requires understanding the strengths and blind spots of each metric type.
Consider a typical scenario: a mid-career researcher in environmental policy has published in both top-tier ecology journals and open-access policy briefs. Her h-index is solid but not stellar, because her policy papers are rarely cited in academic journals. Yet those briefs have been referenced in three national environmental regulations and downloaded over 50,000 times. A traditional evaluation would undervalue her work. A citation cartography that includes policy citations and altmetrics would tell a different story. The question is how to construct that map systematically, without cherry-picking metrics to flatter a particular profile.
We wrote this guide because the tools exist—but the methodology for combining them is still immature. Researchers and evaluators need transparent, reproducible criteria for deciding which metrics to include, how to weight them, and how to communicate the results. That is what we aim to provide.
The Metric Landscape: Three Approaches to Mapping Impact
No single metric captures the full reach of a research output. Instead, we see three broad strategies that evaluators adopt, often in combination. Each has distinct trade-offs.
Approach 1: Traditional Citation Metrics (JIF, h-index, Field-Weighted Indicators)
This is the baseline. Journal Impact Factor, h-index, and field-normalized metrics like the Field-Weighted Citation Impact (FWCI) from Scopus remain widely used. Their strengths are familiarity, availability, and a long track record. They also correlate reasonably well with peer review in many disciplines. However, they suffer from well-known biases: they favor English-language journals, reward incremental work over risky or interdisciplinary research, and can be gamed through self-citation or citation cartels. Moreover, they tell you nothing about who is citing the work or why. A paper might be cited hundreds of times—but only by a small group of collaborators, or for negative reasons.
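To make the blind spot concrete, here is a minimal sketch of the standard h-index definition (the largest h such that h papers have at least h citations each). The two citation lists are hypothetical and chosen only to show how differently shaped records can produce similar or misleading values.

```python
def h_index(citation_counts):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, citations in enumerate(ranked, start=1):
        if citations >= rank:
            h = rank
        else:
            break
    return h


# Two hypothetical researchers with very different citation profiles.
steady = [12, 9, 7, 6, 5, 3, 1]
one_hit = [950, 4, 3, 2, 1, 0, 0]

print(h_index(steady))   # 5
print(h_index(one_hit))  # 3
```

The second profile contains a paper with 950 citations, yet its h-index is lower than the first. The number alone says nothing about who cited the work or why.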
Approach 2: Altmetrics and Engagement Indicators
Altmetrics capture online attention: mentions in news outlets, social media shares, blog posts, policy documents, Wikipedia citations, and downloads from repositories like PubMed Central. Tools like Altmetric.com and PlumX aggregate these signals. Their main advantage is speed: they can start accumulating within days of publication, unlike citations, which can take years. They also capture non-academic audiences, which is critical for demonstrating societal impact. But altmetrics are noisy. A viral tweet does not equal a meaningful change in practice. They are also platform-dependent and can be manipulated (e.g., by bot-driven shares). For most fields, altmetrics supplement rather than replace citation data.
Approach 3: Policy and Practice Impact Metrics
This category includes citations in policy documents (tracked by Overton or Altmetric), references in clinical guidelines, and mentions in patents. These are especially valuable for applied fields like public health, engineering, and environmental science. They directly link research to decisions that affect people's lives. However, coverage is patchy across disciplines, and the lag between publication and policy uptake can be long. Furthermore, not all policy citations are equal—a brief mention in a background document carries less weight than being cited as evidence for a regulation. Evaluators need to look beyond counts and examine the nature of the engagement.
How to Choose the Right Metric Mix: Decision Criteria
Selecting metrics is not a one-size-fits-all exercise. The right combination depends on your evaluation purpose, the discipline, and the career stage of the researcher. We recommend applying the following criteria.
Purpose Alignment
What decision will the metrics inform? For tenure and promotion within a traditional department, a balanced portfolio of citation-based and altmetric indicators is often expected. For a grant application focused on societal impact, policy citations and engagement data should carry more weight. For internal benchmarking, field-normalized indicators help correct for disciplinary differences. Always start with the question, not the metric.
Disciplinary Norms
In the humanities, book citations and reviews matter more than journal articles. In computer science, conference proceedings are primary. In the social sciences, policy citations are increasingly tracked. A metric that works in biomedicine may be meaningless in philosophy. Use discipline-specific databases when available (e.g., the Book Citation Index for humanities). If you are evaluating a cross-disciplinary team, consider using multiple normalized indicators and avoid averaging them.
Data Quality and Coverage
Not all metrics are equally reliable. Altmetrics from social media can be skewed by a single influencer. Citation counts from Google Scholar are less curated than those from Scopus or Web of Science. Policy citation databases have limited coverage of non-English documents. Before committing to a metric, verify that the source data is comprehensive for your field and that the provider is transparent about its methodology. Avoid metrics that are opaque or change their calculation without notice.
Researcher Career Stage
Early-career researchers (ECRs) often have low h-indices and few policy citations simply because they have not had time to accumulate them. Using the same metrics for ECRs and senior professors is unfair. For ECRs, consider relative indicators like percentile rank within their cohort, or qualitative evidence such as prize nominations and invited talks. For senior researchers, long-term citation impact and mentorship metrics become more relevant.
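One way to read "percentile rank within their cohort" is sketched below. The function name, the "at or below" convention, and both cohorts are assumptions made for illustration; a real protocol would need to define the cohort (field, years since PhD) explicitly.

```python
def percentile_rank(value, cohort_values):
    """Share of the cohort (in percent) with a value at or below `value`."""
    if not cohort_values:
        raise ValueError("cohort_values must not be empty")
    at_or_below = sum(1 for v in cohort_values if v <= value)
    return 100.0 * at_or_below / len(cohort_values)


# Hypothetical citation counts for researchers 3-5 years past their PhD.
ecr_cohort = [2, 5, 8, 11, 14, 20, 26, 31, 40, 55]
# Hypothetical citation counts for full professors in the same field.
senior_cohort = [150, 300, 420, 610, 880, 1200, 1500, 2100, 3400, 5000]

print(percentile_rank(31, ecr_cohort))     # 80.0
print(percentile_rank(31, senior_cohort))  # 0.0
```

The same raw count of 31 citations places a researcher near the top of one cohort and at the bottom of the other, which is why the comparison group has to be stated alongside the number.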
Trade-Offs in Practice: A Structured Comparison
To illustrate how these criteria play out, we compare three common metric portfolios across several dimensions. The table below summarizes the trade-offs.
| Portfolio | Strengths | Weaknesses | Best For |
|---|---|---|---|
| JIF + h-index + FWCI | Familiar, easy to compute, widely accepted | Ignores societal impact, rewards incremental work, field biases | Internal benchmarking in traditional departments |
| Altmetrics (Altmetric Attention Score) + PlumX | Captures public engagement, fast, good for social media | Noisy, platform-dependent, can be gamed | Public engagement evaluations, media campaigns |
| Policy citations + Patent citations + Altmetrics | Direct evidence of real-world use, aligns with impact agendas | Patchy coverage, long time lags, harder to compare across fields | Applied research, policy-oriented grants, REF impact cases |
In practice, most evaluators combine elements from all three portfolios. For example, a promotion dossier might include a JIF-based section for the candidate's top five papers, an altmetric summary for public engagement, and a narrative section on policy influence supported by Overton screenshots. The key is to be explicit about why each metric is included and what it is allowed to indicate.
A common mistake is to treat a composite score from a tool like SciVal or InCites as a definitive ranking. These platforms offer many normalized indicators, but most of them are built on citation data. They do not capture the qualitative context that a reader would get from reading the actual policy document or news article. Always pair quantitative metrics with qualitative evidence, such as a brief annotation explaining the nature of the engagement.
Implementation Path: Building Your Impact Map
Once you have chosen your metric mix, the next step is to operationalize it. We recommend a phased approach.
Phase 1: Audit Existing Data Sources
Identify which data sources your institution already subscribes to or can access freely: Scopus, Web of Science, Google Scholar, Altmetric.com, Overton. Check whether they cover your researchers' disciplines adequately. For policy citations, Overton has strong coverage for English-language policy documents but is weaker for non-English ones. If your researchers work in a non-English context, consider supplementing with manual searches of government websites.
Phase 2: Define a Metric Protocol
Write a short document specifying which metrics will be used for which purpose, how they will be calculated, and what thresholds (if any) apply. For example: "For promotion to associate professor, candidates must provide their h-index from Scopus, the FWCI for their top 10 papers, and a narrative of up to three policy or practice impacts with evidence." This protocol should be reviewed by a committee to ensure fairness and transparency.
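A protocol like the one quoted above can also be kept in machine-readable form so that dossiers are checked consistently. The sketch below is one possible encoding; the class name `MetricRule`, the `PROTOCOL` dictionary, and the `missing_items` helper are all hypothetical, and the rules simply restate the example sentence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricRule:
    metric: str            # what the candidate must provide
    source: str            # where the figure must come from
    scope: str             # which outputs it covers
    required: bool = True


# Hypothetical encoding of the example protocol sentence.
PROTOCOL = {
    "promotion_associate_professor": [
        MetricRule("h-index", "Scopus", "all publications"),
        MetricRule("FWCI", "Scopus", "top 10 papers"),
        MetricRule("impact narrative", "candidate",
                   "up to 3 policy or practice impacts with evidence"),
    ],
}


def missing_items(decision, submitted):
    """Return required protocol items that the submitted dossier lacks."""
    return [rule.metric for rule in PROTOCOL[decision]
            if rule.required and rule.metric not in submitted]


dossier = {"h-index": 14, "FWCI": 1.3}
print(missing_items("promotion_associate_professor", dossier))
# ['impact narrative']
```

Keeping the source and scope next to each metric makes it harder for a figure from a different database to slip into a dossier unnoticed.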
Phase 3: Collect and Curate Data
Assign a research support officer to gather the data. For altmetrics, use the DOI to pull scores from Altmetric.com or PlumX. For policy citations, export a list from Overton and then manually verify the top hits. For traditional metrics, use Scopus or Web of Science author profiles. All data should be date-stamped because metrics change over time.
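Date-stamping is easier to enforce if every collected value is stored as a record that carries its retrieval date. The following is a minimal sketch under that assumption; the record layout, the `latest_as_of` helper, and the DOI are hypothetical, and no provider API is called.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class MetricSnapshot:
    doi: str
    metric: str
    value: float
    source: str
    retrieved_on: date  # the date stamp for this reading


def latest_as_of(snapshots, doi, metric, cutoff):
    """Most recent snapshot for (doi, metric) retrieved on or before cutoff, else None."""
    eligible = [s for s in snapshots
                if s.doi == doi and s.metric == metric and s.retrieved_on <= cutoff]
    return max(eligible, key=lambda s: s.retrieved_on, default=None)


# Hypothetical readings for one placeholder DOI.
snapshots = [
    MetricSnapshot("10.1234/example.1", "citations", 40, "Scopus", date(2023, 1, 15)),
    MetricSnapshot("10.1234/example.1", "citations", 52, "Scopus", date(2024, 1, 15)),
    MetricSnapshot("10.1234/example.1", "citations", 61, "Scopus", date(2024, 9, 1)),
]

record = latest_as_of(snapshots, "10.1234/example.1", "citations", date(2024, 3, 1))
print(record.value, record.retrieved_on)  # 52 2024-01-15
```

With a cutoff in place, a later reading (61 in this example) cannot leak into a decision that was meant to rest on earlier data.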
Phase 4: Present the Map
Create a visual summary—a dashboard or one-page report—that shows the metrics alongside a brief narrative. Avoid dumping raw numbers. Instead, highlight the most telling indicators and explain what they reveal. For example: "Dr. X's work on air quality has been cited in three WHO reports (policy impact) and has an Altmetric Attention Score in the top 5% of all publications in environmental health (public engagement)."
One team we read about used a radar chart with axes for citation impact, policy influence, public engagement, and interdisciplinary reach. This gave a quick visual of a researcher's overall profile. The chart was supplemented with a table of raw numbers for those who wanted details.
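A radar chart of this kind needs each axis scaled to a common range first. The sketch below assumes scores have already been normalized to the interval from 0 to 1 (for example, as percentile ranks divided by 100) and only computes the polygon vertices, which any plotting library could then draw. The axis order and the example scores are hypothetical.

```python
import math

AXES = ["citation impact", "policy influence",
        "public engagement", "interdisciplinary reach"]


def radar_vertices(scores):
    """Map one score in [0, 1] per axis to (x, y) vertices of the radar polygon."""
    n = len(AXES)
    points = []
    for i, axis in enumerate(AXES):
        value = scores[axis]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"score for {axis!r} must be between 0 and 1")
        # First axis points straight up; later axes proceed clockwise.
        angle = math.pi / 2 - 2 * math.pi * i / n
        points.append((value * math.cos(angle), value * math.sin(angle)))
    return points


# Hypothetical profile: modest citation impact, strong policy influence.
profile = {
    "citation impact": 0.45,
    "policy influence": 0.90,
    "public engagement": 0.70,
    "interdisciplinary reach": 0.30,
}
for axis, (x, y) in zip(AXES, radar_vertices(profile)):
    print(f"{axis:25s} x={x:+.2f} y={y:+.2f}")
```

How each axis is normalized matters more than the drawing itself, so the normalization rule belongs in the accompanying table of raw numbers.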
Risks of Getting the Map Wrong
Choosing the wrong metrics—or applying them without nuance—can damage careers and misallocate resources. Here are the most common risks.
Gaming and Perverse Incentives
If you overemphasize altmetrics, researchers may chase viral attention rather than rigorous work. If you rely solely on the h-index, they may avoid risky, high-reward projects. The history of the JIF shows that any metric, once used for evaluation, becomes a target for optimization. Mitigate this by using a diverse portfolio and by emphasizing qualitative evidence alongside numbers.
Disciplinary Inequity
Metrics that work in one field can systematically disadvantage others. For example, typical h-index values are lower in mathematics than in biomedicine because citation rates differ between the fields. Using a single threshold across all departments would be unfair. Always use field-normalized indicators or percentile ranks, and adjust for career stage.
Data Quality Pitfalls
Citation databases have errors: missing publications, duplicate entries, incorrect author disambiguation. Altmetrics can be inflated by bots or by a single viral post that does not reflect genuine engagement. Policy citation databases may miss important gray literature. Before making a high-stakes decision, verify the top-cited items manually. A quick check of a few papers can reveal whether the data is trustworthy.
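Duplicate entries are one of the easier errors to screen for before a manual check. The sketch below groups records by a normalized DOI, relying on the fact that DOIs are case-insensitive and often appear with a resolver prefix. The record layout, the helper names, and the DOIs are hypothetical.

```python
from collections import defaultdict

DOI_PREFIXES = ("https://doi.org/", "http://doi.org/",
                "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalize_doi(raw):
    """Lower-case a DOI and strip a leading resolver URL or 'doi:' label."""
    doi = raw.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi.strip()


def find_duplicates(records):
    """Return {normalized DOI: [record indices]} for DOIs that occur more than once."""
    groups = defaultdict(list)
    for index, record in enumerate(records):
        groups[normalize_doi(record["doi"])].append(index)
    return {doi: idxs for doi, idxs in groups.items() if len(idxs) > 1}


# Hypothetical export with one publication listed twice.
records = [
    {"title": "Air quality and policy", "doi": "10.1234/ABC.1"},
    {"title": "Air Quality and Policy", "doi": "https://doi.org/10.1234/abc.1"},
    {"title": "Another paper", "doi": "doi:10.1234/xyz.9"},
]
print(find_duplicates(records))  # {'10.1234/abc.1': [0, 1]}
```

This catches only duplicates that share a DOI. Records without one, or with author disambiguation problems, still need the manual check described above.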
Overreliance on Composite Scores
Composite scores (like the Altmetric Attention Score or the SciVal Topic Prominence percentile) are convenient but opaque. They combine multiple signals using provider-defined weights that are not visible in the final number. If you use them, understand what goes into the score and whether it aligns with your values. For instance, the Altmetric Attention Score weights news mentions much more heavily than Twitter mentions. That may be appropriate for some evaluations but not for others.
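The opacity problem can be shown with a toy weighted sum. The weights below are hypothetical and are not any provider's actual formula; the point is only that very different attention profiles can collapse into the same composite number.

```python
# Hypothetical weights for illustration only; not any provider's real formula.
WEIGHTS = {"news": 8.0, "blog": 5.0, "policy": 3.0, "social": 0.25}


def composite_score(mentions):
    """Weighted sum of mention counts, keyed by source type."""
    return sum(WEIGHTS[source] * count for source, count in mentions.items())


paper_a = {"news": 5, "social": 0}     # a handful of news stories
paper_b = {"news": 0, "social": 160}   # one burst of social media posts

print(composite_score(paper_a))  # 40.0
print(composite_score(paper_b))  # 40.0
```

Both papers score 40.0, yet one reflects news coverage and the other social media volume. Reporting the components next to the composite keeps that difference visible.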
Frequently Asked Questions
Should we drop the Journal Impact Factor entirely?
Not necessarily. The JIF still has value as a rough filter, especially for early-career researchers who lack a long publication record. But it should not be the sole metric. Use it alongside other indicators, and always contextualize it within the discipline. Many institutions now require candidates to provide the JIF of the journals they publish in, but also to explain why that journal was chosen and what other impact their work has had.
How do we handle self-citations and citation cartels?
Most databases allow you to exclude self-citations when calculating the h-index or FWCI. For a fair evaluation, we recommend using a self-citation filter. Citation cartels (groups that excessively cite each other) are harder to detect. Look for unusual patterns, such as a sudden spike in citations from a small set of authors. If you suspect a cartel, recompute the indicator with citations from the suspect group excluded and check how much it changes. Fractional counting, which splits credit for a paper among its co-authors, corrects for inflated author lists rather than for cartels, so treat it as a separate adjustment.
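Both checks can be sketched in a few lines, assuming you have the author list of each citing paper. The functions, the notion of a "top citer share", and the author names are hypothetical; a high share is a prompt for closer reading, not proof of a cartel.

```python
from collections import Counter


def exclude_self_citations(citing_author_lists, paper_authors):
    """Keep citations whose author list shares no one with the cited paper."""
    own = set(paper_authors)
    return [authors for authors in citing_author_lists if own.isdisjoint(authors)]


def top_citer_share(citing_author_lists, top_n=3):
    """Fraction of citations that involve any of the top_n most frequent citing authors."""
    if not citing_author_lists:
        return 0.0
    counts = Counter(a for authors in citing_author_lists for a in set(authors))
    top = {author for author, _ in counts.most_common(top_n)}
    hits = sum(1 for authors in citing_author_lists if top & set(authors))
    return hits / len(citing_author_lists)


# Hypothetical citing papers for one article written by "lee".
citations = [["lee", "kim"], ["park"], ["park", "cho"], ["park"], ["diaz"]]

external = exclude_self_citations(citations, ["lee"])
print(len(external))                       # 4
print(top_citer_share(external, top_n=1))  # 0.75
```

Here one self-citation is removed, and three of the four remaining citations come from papers involving a single author, which is the kind of concentration worth a manual look.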
What about open access and preprint impact?
Preprints and open-access articles often attract more altmetric attention and citations, plausibly because they are freely available. Where that is the case, it is a genuine advantage, not a bias to correct for. However, when comparing researchers, be aware that some fields have embraced preprints more than others. For evaluation, include preprints if they have a DOI and are citable, but note the version (e.g., preprint vs. peer-reviewed). Some metrics platforms now track preprint citations separately.
How often should we update impact maps?
Citation and altmetric data change over time. For annual performance reviews, use a snapshot from a consistent date each year. For promotion dossiers, use the data as of the submission date. Avoid updating the map retroactively after a decision has been made, as this can introduce bias. If you are tracking impact for a grant report, consider using a rolling 5-year window for citations and a 2-year window for altmetrics.
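The rolling windows suggested above can be applied with a simple date filter. This is a sketch under the assumption that each citation or mention has a date; the helper names and all dates are hypothetical, and the window is taken to exclude its start date and include the snapshot date.

```python
from datetime import date


def window_start(as_of, years):
    """Date `years` years before `as_of`, falling back to 28 Feb for leap days."""
    try:
        return as_of.replace(year=as_of.year - years)
    except ValueError:  # as_of is 29 February and the target year is not a leap year
        return as_of.replace(year=as_of.year - years, day=28)


def count_in_window(event_dates, as_of, years):
    """Count events dated after the window start and up to and including as_of."""
    start = window_start(as_of, years)
    return sum(1 for d in event_dates if start < d <= as_of)


snapshot_date = date(2024, 6, 30)
citation_dates = [date(2018, 3, 1), date(2019, 6, 30), date(2019, 7, 1),
                  date(2022, 1, 15), date(2024, 6, 30)]
mention_dates = [date(2021, 12, 1), date(2022, 7, 1), date(2024, 1, 10)]

print(count_in_window(citation_dates, snapshot_date, years=5))  # 3
print(count_in_window(mention_dates, snapshot_date, years=2))   # 2
```

Whichever boundary convention you choose, write it into the protocol so that two reports built from the same data agree.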
Can we compare impact across different fields?
Yes, but only with field-normalized indicators. The FWCI from Scopus and the Category Normalized Citation Impact (CNCI) from InCites are designed for cross-field comparison. They adjust for the average citation rate in each subject category. However, they still rely on citation data only. For a truly cross-disciplinary comparison, you would need to include non-citation metrics and qualitative evidence. In practice, many institutions compare researchers only within the same department or field.
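The idea behind field normalization is a ratio of actual to expected citations. The sketch below is a deliberately simplified version: real FWCI and CNCI calculations match papers on field, publication year, and document type using provider-specific rules, whereas here the baseline is just a hypothetical list of citation counts for comparable papers.

```python
from statistics import mean


def normalized_impact(paper_citations, field_baseline):
    """Ratio of a paper's citations to the average for comparable papers."""
    expected = mean(field_baseline)
    if expected == 0:
        raise ValueError("baseline average is zero; the ratio is undefined")
    return paper_citations / expected


# Hypothetical baselines: citations to papers of the same type and year.
mathematics_baseline = [0, 1, 2, 3, 4]       # average 2
biomedicine_baseline = [5, 10, 20, 25, 40]   # average 20

print(normalized_impact(6, mathematics_baseline))   # 3.0
print(normalized_impact(30, biomedicine_baseline))  # 1.5
```

A value of 1.0 means the paper is cited as often as the baseline average. In this example the mathematics paper with 6 citations outperforms its field by more than the biomedicine paper with 30.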
Recommendations for Your Next Steps
Moving beyond journal rankings does not mean abandoning metrics. It means using them more intelligently and honestly. Based on the framework above, here are specific actions you can take.
1. Audit your current evaluation criteria. List every metric currently used for hiring, promotion, or funding decisions. Ask whether each metric serves a clear purpose and whether it could be gamed or biased. Remove any metric that cannot be justified.
2. Pilot a multi-metric dashboard for one department. Choose a department that is open to experimentation. Collect citation, altmetric, and policy impact data for all faculty members over a defined period. Present the results in a radar chart or similar visual. Gather feedback on whether the dashboard feels fair and useful.
3. Draft a metric protocol. Write a one-page document that specifies which metrics will be used for which decisions, how they will be normalized, and what qualitative evidence is required. Circulate it for comment before finalizing.
4. Train evaluators. Run a workshop for committee members on interpreting altmetrics and policy citations. Many evaluators are unfamiliar with these data sources and may dismiss them as unreliable. Education reduces resistance.
5. Communicate the change. When you introduce a new metric system, explain why it is being adopted and how it benefits researchers. Emphasize that the goal is to capture a fuller picture of impact, not to create a new ranking. Transparency builds trust.
The cartography of research impact is still being drawn. No map is perfect, but a map that acknowledges its limitations is far more useful than one that pretends to be complete. By combining traditional citation metrics with newer indicators and always grounding them in context, you can create an evaluation system that is fairer, more accurate, and more aligned with the diverse ways research makes a difference.