Promoting Novelty in Science
Jay Bhattacharya
Mikko Packalen
March 2017
Acts of genius
• Notoriously difficult to forecast or promote
genius
– It can be hard to recognize at the time.
– Such ideas can occur to even the most unlikely
people.
– Active research agenda to identify acts of genius and
their correlates
• Weinberg – HIT science
• Azoulay/Zivin – Superstar scientists
• By themselves, acts of genius will rarely move a
scientific discipline forward
Promoting Acceptance of Novelty
• Every new scientific idea needs vetting by
other scientists to be successful
– Debate, extensions, combinations
• In vibrant fields, scientists try out other
scientists’ new ideas
• Dead fields chew over the same ideas
Research Agenda
• Identify structures and correlates of scientific
environments that promote the acceptance of
novelty
– Age structure of scientific teams
– Willingness of journals to publish papers where
novel ideas are used
– Openness of funders to trying out novel ideas
Words and Ideas
• The words of a publication or patent reveal
the underlying idea inputs.
• A new idea is represented either by a new
word (e.g. microprocessor) or combination of
old words (e.g. polymerase chain reaction).
• Our results are not driven by synonyms.
• Our results are not driven by young scientists
using buzzwords.
Friday-Crusoe Overlapping
Generations Model of Science
• One scientist born each period, lives two
periods
– In each period, there are two scientists, one
young, one old
• One new idea introduced each period
• Each scientist has one unit of effort to spend
working on ideas in each period
Scientific Advance (I)
• Science advances through scientists trying out
ideas
– Initial efforts yield increasing returns
– Eventually, decreasing returns kick in after an idea
has been thoroughly explored
Scientific Advance (II)
• The variance parameter, σ, mediates the
nature of scientific advance
– If σ is near zero, a revolutionary advance happens
as total effort approaches μ
– If σ is much greater than zero, all effort produces
incremental advances
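A minimal sketch of how σ could shape the advance curve. The slides do not state the functional form, so this assumes total advance on an idea is a normal CDF in cumulative effort with mean μ and standard deviation σ. That form is one S-shaped curve consistent with increasing and then decreasing returns. The name total_advance is hypothetical.

```python
import math

def total_advance(effort, mu=1.0, sigma=0.25):
    """Total advance from cumulative effort on one idea.

    Assumption (not from the slides): a normal CDF with mean mu and
    standard deviation sigma, which is S-shaped in effort.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (effort - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))

# Small sigma: almost all of the advance arrives as effort crosses mu.
jump_small_sigma = total_advance(1.1, sigma=0.01) - total_advance(0.9, sigma=0.01)

# Large sigma: the same effort interval yields only an incremental gain.
jump_large_sigma = total_advance(1.1, sigma=5.0) - total_advance(0.9, sigma=5.0)
```

Under this assumed form, a σ near zero makes the curve close to a step at μ (a "revolutionary" advance), while a large σ spreads the gains thinly across all effort levels.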
Scientist Choices
• Young scientist: chooses between two ideas
when allocating effort:
– the currently new idea and the idea introduced
last period
• Old scientist: chooses among three ideas
when allocating effort:
– the currently new idea, the new idea introduced
last period, and the new idea introduced two
periods ago
Costs of Entry
• To work productively on an idea, scientists
must pay an effort price of 0<k<1.
• This effort yields no scientific advance in itself
– This is a model of learning
– k is exogenous
• Key comparative static: the NIH can alter k by
subsidizing ideas or scientists
Scientist Objective Function
• Scientists maximize the total advance over
their lifetime for which their effort is
responsible
• In the current version of the model, when a
younger and older scientist work on the same
idea, they share credit for the change in total
advance that their efforts create
– In future versions, we will explore the “Matthew
effect”
Equilibrium
• Over a lifetime, each scientist plays a game
against two other scientists:
– The old scientist when the scientist is young
– The young scientist when the scientist is old
• We search for a mixed strategy Nash
equilibrium in the steady state
Comparative Static: k
• Large k: scientists work by themselves on one idea
their whole life
Comparative Static: k
• Small k: scientists work together on ideas
• Prediction: NIH funding increases trying out
Common Methodology
Comprehensive Corpus of Peer-Reviewed Biomedical Papers
• Medline database (1946-2011)
• 16 million+ publications
– Abstracts, titles, etc.
Indexing Text
• N-gram approach: we index
all 1, 2 or 3 word sequences
in the Medline data (mainly
titles and abstracts)
• We use a UMLS thesaurus to
handle the problem of
synonyms.
– For journal rankings and NIH
calculations only
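The n-gram indexing step can be sketched as follows. This is an illustration only: the tokenization rule (lowercase, alphanumeric runs) and the names ngrams and build_index are assumptions, not the authors' implementation.

```python
import re
from collections import defaultdict

def ngrams(text, max_n=3):
    """Yield every 1-, 2- and 3-word sequence in the text.

    Assumption: tokens are lowercase alphanumeric runs; the slides do
    not describe the tokenizer actually used.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])

def build_index(papers):
    """Map each n-gram to the set of paper ids whose text contains it.

    papers: iterable of (paper_id, text) pairs, e.g. title plus abstract.
    """
    index = defaultdict(set)
    for paper_id, text in papers:
        for gram in ngrams(text):
            index[gram].add(paper_id)
    return index
```

For example, the text "Polymerase chain reaction" yields three 1-grams, two 2-grams and one 3-gram, so a multi-word idea is captured as a single index key.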
“Organ or Tissue Function”
C0302600|angiogenic|angiogenesis|angiogenicprocess
“Amino Acid, Peptide or Protein”
C1171892|vascularendothelialgrowthfactor_A_human|vascularendothelialgrowthfactor|_VEGFA_|_VPF_|_VEGF148_|vascularendothelialgrowthfactorahuman|vascularendothelialgrowthfactorhuman|_VEGF_|vascularpermeabilityfactor|_VEGF_proteinhuman|_VEGFA_proteinhuman|vascularendothelialgrowthfactor_A_
“Pharmacologic Substance”
C0796392|_BEVACIZUMAB__UNIDENTIFIED_|_BEVACIZUMAB_|recombinanthumanizedantivegfmonoclonalantibody|antivegf|bevacizumabbiosimilar_BEVZ92_|bevacizumab|immunoglobulin_G1_humanmousemonoclonalrhumabvegfgammachainantihumanvascularendothelialgrowthfactordisulfidewithhumanmousemonoclonalrhumabvegflightchaindimer|antivegfmonoclonalantibody|antivegfrhumab|monoclonalantibodyantivegf|antivegfhumanizedmonoclonalantibody|moab_VEGF_|rhumabvegf
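A small sketch of how thesaurus entries like the ones shown could be used to collapse synonyms onto one concept. It assumes each entry is a single pipe-delimited line with the concept identifier first and its term variants after it, as in the examples on this slide. The function names are hypothetical, and the distributed UMLS files may be laid out differently.

```python
def parse_thesaurus_line(line):
    """Split 'CONCEPT_ID|term|term|...' into (concept_id, [terms])."""
    concept_id, *terms = line.strip().split("|")
    return concept_id, [t for t in terms if t]

def build_synonym_map(lines):
    """Map every term variant to its concept id (first entry wins)."""
    mapping = {}
    for line in lines:
        concept_id, terms = parse_thesaurus_line(line)
        for term in terms:
            mapping.setdefault(term, concept_id)
    return mapping

# Entry copied from the slide above.
synonyms = build_synonym_map(
    ["C0302600|angiogenic|angiogenesis|angiogenicprocess"]
)
```

With such a map, mentions of "angiogenic" and "angiogenesis" would be counted as the same idea.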
Estimating the Age of an Idea
• We take the year an idea is first mentioned in
the database as its year of origin in the
biomedical literature.
– Rank ideas based on future mentions in text
• The list of ideas produced reads like a good
history of scientific advance in biomedicine,
e.g.:
– 1986 (top idea): polymerase chain reaction
– 2001 (top idea): small interfering RNA
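The dating and ranking steps can be sketched like this. The input layout, a stream of (idea, year) mention pairs, and both function names are assumptions for illustration. The mention data below is a toy example, not taken from Medline.

```python
from collections import Counter

def year_of_origin(mentions):
    """Earliest year in which each idea is mentioned."""
    origin = {}
    for idea, year in mentions:
        if idea not in origin or year < origin[idea]:
            origin[idea] = year
    return origin

def top_idea_by_cohort(mentions):
    """For each origin year, the idea with the most later mentions."""
    mentions = list(mentions)
    origin = year_of_origin(mentions)
    later = Counter(idea for idea, year in mentions if year > origin[idea])
    best = {}
    for idea, year in origin.items():
        if year not in best or later[idea] > later[best[year]]:
            best[year] = idea
    return best

# Toy mention stream (illustrative only).
toy_mentions = [
    ("polymerase chain reaction", 1986),
    ("polymerase chain reaction", 1990),
    ("polymerase chain reaction", 1995),
    ("some rare phrase", 1986),
    ("some rare phrase", 1987),
]
```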
Age of Idea Inputs for A Paper
• For each paper in the Medline data, we
calculated the distribution over the age of the
idea inputs (words and word combinations)
referenced in the text.
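As a rough sketch, the age distribution for one paper follows from the publication year and each idea's year of origin. The function name and the choice to count each distinct idea once per paper are assumptions, not details given in the slides.

```python
from collections import Counter

def idea_age_distribution(paper_ideas, pub_year, origin):
    """Count a paper's distinct idea inputs by age in years.

    paper_ideas: n-grams found in the paper's text
    pub_year:    year the paper was published
    origin:      dict mapping idea -> year of first mention
    Ideas missing from `origin` are skipped.
    """
    ages = Counter()
    for idea in set(paper_ideas):
        if idea in origin:
            ages[pub_year - origin[idea]] += 1
    return dict(sorted(ages.items()))
```

A paper published in 2005 that uses one idea dated 2001 and one dated 1986 would have the distribution {4: 1, 19: 1}.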
Examples of Ideas Identified
How Innovative is Research
Funded by the National
Institutes of Health?
Does the NIH Prioritize Novel Science?
• Compare the vintage of ideas used in NIH
funded published work vs. vintage of ideas
used in other published work
– By idea input type [e.g. genes vs. research tools]
– By time period
Approach
• Look at the age of all the ideas in a paper
• Calculate a ratio of idea mentions per paper in
NIH funded work vs. idea mentions per paper
in non-NIH funded work
• Control for number of authors, basic science
vs. clinical paper…
• Focus on the cohort of papers published between
2000 and 2009.
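The core ratio can be sketched as below, without the controls for number of authors or basic vs. clinical status, which would require a regression. The input layout and the function name are assumptions for illustration.

```python
from collections import Counter

def mentions_per_paper_ratio(papers):
    """Ratio of idea mentions per paper, NIH funded vs. not, by idea age.

    papers: iterable of (nih_funded: bool, idea_ages: list of int)
    Returns {idea_age: ratio}; ages with no mentions in the non-NIH
    group are omitted to avoid dividing by zero.
    """
    counts = {True: Counter(), False: Counter()}
    n_papers = {True: 0, False: 0}
    for funded, ages in papers:
        n_papers[funded] += 1
        counts[funded].update(ages)

    ratios = {}
    if n_papers[True] == 0 or n_papers[False] == 0:
        return ratios
    for age, other_count in counts[False].items():
        nih_rate = counts[True][age] / n_papers[True]
        other_rate = other_count / n_papers[False]
        ratios[age] = nih_rate / other_rate
    return ratios
```

A ratio above 1 at a given idea age means that ideas of that vintage appear more often per paper in NIH funded work than in other work.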
Idea-Mentions per Paper (NIH funded
vs. non-NIH funded)
Add Controls for Basic/Applied Status,
Number of Authors
Distribution of Idea Categories
Control for Idea Category
“Gene or Genome”
“Amino Acid, Peptide, or Protein”
“Pharmacologic Substance”
Is the NIH Funding Work With More
Recent Ideas Now?
• Up to now, results have focused on papers
published between 2000 and 2009
• Compare the relative idea-use measure for
papers published in the
– 1990s
– 2000s
– 2010s
1990s
2000s
2010s
Summary
• Innovative work funded more often than
other work
• Papers building on the very newest ideas
receive less funding than papers building on
new ideas in general
– The exception is research on genetics
• Interpretation: NIH rewards innovativeness
but not the very newest ideas and less so
since 2000
Age and the Acceptance of
Novelty
• Hypothesis: early career scientists are more likely to try out new ideas
Costs and Benefits of Age
• Older researchers have vested interests in old
ideas
– Planck: “Science advances one obituary at a time.”
– Watson: “Experience kills you as a scientist.”
• Older researchers have greater non-research
demands on their time
• Older researchers have (and need) security of
tenure to pursue risky new research paths.
Meta Data
• The Medline database contains important
meta-data about each paper.
• Medline: author(s) and author order, year
published, research area
Estimating Author Career Age
• We adapt standard author disambiguation
methods to uniquely identify each author on
each publication.
– Smalheiser and Torvik (2009) method (based on
author name, coauthors, field of publication,
language, and affiliation).
– We employ a simpler method as well based only
on author name and coauthors
Summary
• Younger biomedical researchers are much
more likely to try out newer ideas than older
ones.
• Younger researchers, paired together with
mid-career senior authors, are most likely to
try out newer ideas.
• Larger scientific teams are more likely to try
out newer ideas than smaller teams.
Ranking Journals By
Willingness
to Publish Papers That
Try Out Newer Ideas
Motivation (I): Citations Do Not
Measure Innovation
• Biomedical journals are typically ranked based
on citation counts (or some variant).
• Citations are an indifferent measure of
innovation.
– They are strongly influenced by the social
structure of a scientific discipline. (Kuhn, 1962)
– This structure does not always reward innovation.
Motivation (II): Journal Rankings are
Important
• The editorial policies of top ranked journals (in
part) direct the development of a scientific
discipline.
– Of course, journals reflect scientific developments
as well.
• It is useful to have an alternate ranking that
reflects socially desirable attributes (such as
the propensity to reward innovation).
Costs and Benefits of Publishing
Innovative Articles
• Ex ante, it may be difficult to tell whether an
innovative claim is visionary or foolish.
– e.g. continental drift vs. cold fusion
• Risk aversion on the part of editors may
militate against publishing articles that are
innovative in ways that cut against the
received wisdom of the field.
Measuring the Innovativeness of
Journals
• We have already classified each published
paper based on whether it tries out a newer
idea
– Indicator: the paper is in the top x% of papers referencing
a newer idea for the year published, for x=1%, 5%, and 20%.
• We take an average of this indicator over all
papers published in each biomedical journal in
each year.
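The journal measure can be sketched as follows. The sketch assumes each paper has already been reduced to the age of the newest idea it references, and that "top x%" means the x% of papers in a publication year with the youngest such idea. Both are assumptions, as is the function name. Ties at the cutoff are broken by input order.

```python
import math
from collections import defaultdict

def journal_innovativeness(papers, top_share=0.20):
    """Share of each journal's papers in the year's top `top_share`.

    papers: iterable of (journal, year, newest_idea_age) tuples
    Returns {(journal, year): share of papers flagged as top}.
    """
    by_year = defaultdict(list)
    for journal, year, age in papers:
        by_year[year].append((age, journal))

    flagged = defaultdict(int)
    total = defaultdict(int)
    for year, rows in by_year.items():
        rows.sort(key=lambda row: row[0])          # youngest ideas first
        cutoff = math.ceil(top_share * len(rows))  # papers in the top x%
        for rank, (_, journal) in enumerate(rows):
            total[(journal, year)] += 1
            if rank < cutoff:
                flagged[(journal, year)] += 1
    return {key: flagged[key] / total[key] for key in total}
```

Averaging the 0/1 indicator within each journal and year gives a score between 0 and 1 on which journals can be ranked.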
Summary
• The top journals (based on impact factor) are
also the most open to papers that try out
newer ideas
• There is considerable variation though.
– Some top journals are unlikely to publish papers
that try out newer ideas
– Some lower ranked journals are more likely to
publish such papers
Conclusion
• Science needs structures that encourage
tinkering with new ideas
• A research agenda that identifies system
properties that encourage scientific
innovation could help.
– Distinct from a research agenda that tracks inputs
into scientific genius
• Funding agencies need to more systematically
encourage the trying out of novel ideas