Research Profiling – Using VantagePoint to characterize a body of research publications:
• A series of short presentations (“podcasts”)
• Mining Web of Science data
• Case example: nano-enhanced, thin-film solar cells
Alan Porter
Director of R&D, Search Technology, Inc.
[& Georgia Tech]
[email protected]
Pod 1: Overview of Research Profiling & Getting data from Web of Science
Research Profiling
1. Overview of the general process & getting data
2. Data into VantagePoint & cleaned
3. Basic descriptors
+ (tentatively):
a) Trends
b) Topical emphases & Changes
c) Influence Measures
d) Research Networking: Maps
e) Locating a body of research: science & geo maps
f) Super Profiling: Breakouts
g) Advanced Analyses
Session Strategy
A. ~10 minutes per session – sequential, but you can
skip to topics of interest after the introduction
B. Aim: To stimulate your ideas on how to apply
VantagePoint to gain insights from sets of research
publications
C. This first set of sessions keys on Web of Science
(“WOS”) results with a technology topic search focus
– i.e., “what?”
D. A future set will key on WOS search results based on
searching on a given organization – i.e., a “who?”
focus
E. Case example: Nano-enhanced Solar Cells
[with special thanks to Ying Guo]
5 Stages in Mining External R&D Knowledge
1. Literature review (within research community)
2. Research Profiling: Characterizing a body of
research publication activity
• Focus on research activities
• Largely descriptive
3. Tech Mining
• Multiple data to mine
• To generate effective technical intelligence
4. Structured Knowledge Discovery
5. Literature-Based Discovery (“LBD”)
Research Profiling 1: Getting Going
A. General overview of the Research Profiling process and its aims
• Questions
• Answers
• Data
B. Search; download
How to do Tech Mining (or Research Profiling): 8 steps
1. Spell out the questions and how to answer
them
2. Get suitable data
3. Search (iterate)
4. Import into text mining software (e.g.,
VantagePoint)
5. Clean the data
6. Analyze & interpret
7. Represent the information well – communicate!
8. Standardize and semi-automate where possible
Start with the questions!
Types of Questions
Text and data mining techniques are good at
addressing:
WHO?
WHAT?
WHEN?
WHERE?
Additional questions usually require more
human insight:
HOW?
WHY?
“Answers”: Innovation Indicators
• Technology Life Cycle Indicators
- e.g., growth curve location & projection
• Innovation Context Indicators
- e.g., presence or absence of success factors
(funding, standards, infrastructure, etc.)
• Product Value Chain and Market
Prospects Indicators
- e.g., applications, sectors engaged
Six information types
Technical Information
• Science, Technology & Innovation (“ST&I”) Databases (e.g., Web of Science; CSCD, Thomson Innovation)
• Internet Sources (e.g., Googling)
• Technical Expertise
Contextual Information
• Business, competition, customer, policy, popular content Databases (e.g., Thomson One)
• Internet Sources (e.g., blogs, website profiling)
• Business Expertise
VantagePoint Import Filters and Tools
A wealth of diverse information sources for innovation management
On-line Data Sources: Cambridge Scientific Abstracts, Delphion, Dialog, EBSCOHost, Ei Engineering Village, Factiva, ISI Web Of Knowledge, Lexis Nexis, Micropatent, Ovid, Patbase, Questel-Orbit, SilverPlatter, STN, Thomson Innovation
Databases: Aerospace, Art Abstracts, Biobase, Biological Abstracts, Biological Sciences, Biosis, Biotechno, Business & Industry, CAPlus (AnaVist export), Cassis, CBNB, Claims, Computer & Info Systems, Corrosion, Current Contents, Derwent Biotech Abstracts, Derwent Innovations Index, Derwent World Patent Index, Ei Compendex, EMBase, EnCompass Literature, EnCompass Patents, Energy, EnergySciTech, Engineering Materials Abstr, Envr Sci & Pollution Mgmt, ERIC, EuroPat, FamPat, Focust, Food Sci & Tech, Foodline Market, Foodline Science, Forege, Frosti, FSTA, Gale PROMT, GeoRef, Global Reporter, IFIPAT, IFIUDB, INPADOC, INSPEC, IPA, ISD, ITRD, JAPIO, JICST, Kosmet, LGST, MATBUS, Medline, METADEX, Mgmt and Org Studies, Micropatent Materials, Mobility, NSF Awards, NTIS, Pascal, Patent Citation Index, PCT, PCTPAT, Phin, Pira, Pluspat, PROMT, PsycINFO, PubMed, Rapra, Recent Refs, Reference Manager, Science Citation Index, SciSearch, Scopus, Tech Research, ToxFile, Transport, USApps, USPat, Waternet, WaterResAbs, Web of Science, WeldaSearch, Wisdomain
Custom Data: Comma/tab delimited tables, Microsoft Excel and Access, SmartCharts, XML
Record/Field Tools:
• Combine duplicate records
• Remove duplicate records
• Create “frankenrecords” (merge records from dissimilar sources)
• Classify records
• Merge fields
• Clean up fields
• Apply thesauri
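The “Combine duplicate records” idea can be illustrated with a small Python sketch. This is an assumed approach, not VantagePoint’s own matching logic: records are dicts with hypothetical `title`, `year` and `source` keys, and two records count as duplicates when their normalized titles and years agree.

```python
import re


def record_key(record):
    """Matching key: lower-cased title with punctuation collapsed, plus the year."""
    title = re.sub(r"[^a-z0-9 ]", " ", record.get("title", "").lower())
    return (" ".join(title.split()), record.get("year"))


def combine_duplicates(records):
    """Keep one record per key and list every source database it came from."""
    merged = {}
    for rec in records:
        key = record_key(rec)
        if key in merged:
            merged[key]["sources"].append(rec["source"])
        else:
            merged[key] = dict(rec, sources=[rec["source"]])
    return list(merged.values())


records = [  # hypothetical records from two databases
    {"title": "Thin-film CdTe solar cells", "year": 2005, "source": "WOS"},
    {"title": "Thin film CdTe Solar Cells.", "year": 2005, "source": "INSPEC"},
    {"title": "ZnO nanorods for solar cells", "year": 2007, "source": "WOS"},
]
combined = combine_duplicates(records)
print(len(combined), combined[0]["sources"])  # 2 ['WOS', 'INSPEC']
```

Exact-key matching like this misses near-duplicates with typos or abbreviated titles; those still need fuzzy matching and a human check.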
Management Issues
• Requires access to external information (license)
• Bulk processing is a must
• Download in electronic form
• Requires competence in searching
Case Examples (screenshots):
• Getting to the data – usually via the internet
• Getting the data – search within databases
• Retrieving the data
Resources
• www.theVantagePoint.com – offers multiple papers and some case analyses
• View the VantagePoint Video Tutorial Series by Paul Oldham on the website, especially Sessions 1, 2 & 3
• Porter, A.L., Cunningham, S.W., Tech Mining: Exploiting New Technologies for Competitive Advantage, Wiley, 2005.
• Porter, A.L., Kongthon, A., Lu, J-C., Research Profiling: Improving the Literature Review, Scientometrics, Vol. 53, pp. 351-370, 2002.
Pod 2: Cleaning the Data in VantagePoint
Getting the data into VantagePoint
1. Open VantagePoint
2. File > Import Raw Data File
3. Import Wizard opened: Select Files
4. Select a suitable import filter > Next
5. Select fields to import
  - maybe Secondary Fields too
  - you can later “import more fields”
Case Examples
Summary Sheet (VPT file):
- Fields available
- Counts
- Coverage of record set
“Right-Click” to:
- set data type
- rename
- view statistics
- etc.
Search Refinement
• Confirm your search boundaries: time, geographical, institutional
• Check your search quality
  - Precision – how much noise did you retrieve?
  - Recall – what did you miss?
• Check in VantagePoint
  - Are you finding researchers and organizations you expect?
  - Topical inclusion – especially check key terms
    – Keywords (author’s)
    – Keywords Plus (based on recurring phrases in the titles of papers referenced by the documents you’ve retrieved)
    – Title NLP (Natural Language Processing) phrases
    – Or a combination of these (use “Merge Fields”)
  - You may well identify terms to try out in your WOS search
• Ask knowledgeable technical folks to review and advise
• Redo your search and download
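Precision and recall usually have to be estimated rather than known. A minimal sketch, assuming you hand-judge a random sample of retrieved records for precision, and keep a hypothetical benchmark list of record IDs known to be relevant (for instance, papers the technical experts name) to estimate recall:

```python
def estimate_precision(judgements):
    """Share of a random sample of retrieved records judged relevant (True/False)."""
    return sum(judgements) / len(judgements)


def estimate_recall(retrieved_ids, benchmark_ids):
    """Share of a benchmark set of known-relevant records that the search retrieved."""
    benchmark = set(benchmark_ids)
    return len(benchmark & set(retrieved_ids)) / len(benchmark)


# Hypothetical relevance judgements for 10 sampled records
sample = [True, True, False, True, True, True, False, True, True, True]
print(estimate_precision(sample))  # 0.8

# Hypothetical record IDs
retrieved = {"WOS:001", "WOS:002", "WOS:003", "WOS:005"}
benchmark = ["WOS:001", "WOS:003", "WOS:004", "WOS:006"]
print(estimate_recall(retrieved, benchmark))  # 0.5
```

Low estimated recall suggests adding search terms; low precision suggests tightening them.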
Data Cleaning
• Just pointers here
• Fields > List Cleanup – Window opens
  - Select field
  - Select “.fuz” to apply: e.g.,
    – Organization Names.fuz
    – Person Names.fuz
    – General.fuz
    – BritishAmericanSpelling.fuz
  - Option: Verify matches w/another Field [e.g., Person Names with Author Affiliation]
• Fields > Thesaurus – Window opens
  - Select field
  - Select “.the” to apply: e.g., provided by Search Technology:
    – Country.the
    – AcadCorpGov.the
  - Or select custom thesauri: e.g.,
    – Azerbaijan Natl Acad Sci name variations in WOS.the
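List Cleanup’s fuzzy matching can be approximated with Python’s standard `difflib`. This is only an analogy: the similarity measure and the 0.85 threshold are assumptions, not the rules inside the `.fuz` files. Each name is mapped to the first earlier name it closely resembles; the result still needs checking, which is what “Verify matches w/another Field” is for.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lower-case and drop periods/commas so 'Inst.' and 'Inst' compare equal."""
    return " ".join(name.lower().replace(".", " ").replace(",", " ").split())


def list_cleanup(names, threshold=0.85):
    """Map each name to the first earlier name whose normalized form is similar enough."""
    representatives = []  # canonical forms, in order of first appearance
    mapping = {}
    for name in names:
        match = next(
            (rep for rep in representatives
             if SequenceMatcher(None, normalize(name), normalize(rep)).ratio() >= threshold),
            None,
        )
        if match is None:
            representatives.append(name)
            match = name
        mapping[name] = match
    return mapping


names = ["Georgia Inst Technol", "Georgia Inst. Technol.", "Beijing Inst Technol"]
for variant, canonical in list_cleanup(names).items():
    print(f"{variant!r} -> {canonical!r}")
```

Raising the threshold merges fewer variants (safer); lowering it catches more spellings but risks joining distinct organizations.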
Whew!
• Remember to check your search coverage.
• Redo a refined search as needed
• Import and clean your data as warranted
• And the next podcast will get us into Research Profiling!
• Basic Descriptors coming up next
Pod 3: Dealing with single fields: Getting set to work with Lists
Research Profiling Segment 3:
“Basic descriptors”
A. Data prep – getting the target fields
(variables) all set
B. “Top N” lists and such
[single field tallies across the record set]
Nano-enhanced Thin-film Solar Cells
Analysis of Global Research Activities
with Future Prospects
Ying Guo
Ph.D. Candidate, Beijing Institute of Technology
Visiting Student, Georgia Institute of Technology
Alan L. Porter
Lu Huang
International Association for Management of Technology, 2009
Data Prep (1)
1. If you have refined your search, re-import
2. Clean -- as suitable to meet your objectives,
for basic descriptors, especially check:
a. Publication Years [year.the available, but Web of Science data are usually clean]
b. Countries [apply country.the]
c. Affiliations [organization names.fuz]
d. Authors [person names.fuz; potentially “verify
matches with another field” – use Affiliations
to help disambiguate names]
3. If you are apt to deal with a topic in the
future, save List Cleanup results as your own
topical thesaurus.
Data Prep (2)
1. Topical fields
  a. Make Macro-disciplines from Subject Categories [not a standard VP thesaurus, but we plan to make available on our new academic website]
  b. Keywords: decide if you want to MERGE some combination of: Keywords (author’s) & Keywords Plus & Title (NLP) phrases & Abstract (NLP) phrases
2. Keyword Clumping options
  a. Human: Scan the combo Keywords field of choice; make groups of interesting terms using FIND
  b. Statistical: After a little pre-cleaning, use Factor Mapping to form groups of the top %’s [e.g., 1%, 2%, 5% of records]; examine their performance; pick the best level to get at topical emphases
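Merging keyword fields and keeping only terms above a record-share cutoff (the step before Factor Mapping the top 1%, 2% or 5%) can be sketched as below. The field names `keywords_author` and `keywords_plus` are hypothetical stand-ins for the Web of Science fields, and each term is counted once per record.

```python
from collections import Counter

# Hypothetical field names standing in for Keywords (author's) and Keywords Plus
FIELDS = ["keywords_author", "keywords_plus"]


def merge_fields(record, fields):
    """Union of the terms in several keyword fields, de-duplicated case-insensitively."""
    merged, seen = [], set()
    for field in fields:
        for term in record.get(field, []):
            key = term.strip().lower()
            if key and key not in seen:
                seen.add(key)
                merged.append(key)
    return merged


def terms_above_share(records, fields, min_share):
    """Terms found in at least min_share (e.g. 0.01 for 1%) of the records."""
    counts = Counter()
    for rec in records:
        counts.update(merge_fields(rec, fields))  # each term once per record
    cutoff = min_share * len(records)
    return sorted(t for t, n in counts.items() if n >= cutoff)


records = [  # hypothetical
    {"keywords_author": ["ZnO", "Solar cell"], "keywords_plus": ["ZNO", "FILMS"]},
    {"keywords_author": ["TiO2"], "keywords_plus": ["FILMS"]},
    {"keywords_author": ["ZnO"], "keywords_plus": []},
]
print(terms_above_share(records, FIELDS, 0.5))  # ['films', 'zno']
```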
Top N’s
1. (Document types)
2. (Publication Years)
3. (Times Cited)
4. Countries
5. Affiliations
6. Funding agencies
7. Authors
8. Journals (or Sources)
9. Key terms
10. Subject Categories
11. Macro-Disciplines
12. Organization Types
Top N’s
1. Pick your output venue(s) – e.g., in VP and/or
MS Excel, Word, Powerpoint
2. Decide if normalization is in order
a. % of All (or something else)
b. Across databases or datasets
c. Table or Figure
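A “Top N” tally with “% of All” normalization, as a minimal sketch assuming records are dicts with a hypothetical multi-valued `countries` field. Each value is counted once per record, so the percentages can sum to more than 100% when papers list several countries.

```python
from collections import Counter


def top_n(records, field, n):
    """Top n values of a multi-valued field as (value, record count, % of all records)."""
    counts = Counter()
    for rec in records:
        counts.update(set(rec.get(field, [])))  # count each value once per record
    total = len(records)
    return [(value, c, round(100 * c / total, 1)) for value, c in counts.most_common(n)]


records = [  # hypothetical
    {"countries": ["USA", "China"]},
    {"countries": ["USA"]},
    {"countries": ["India"]},
    {"countries": ["USA", "India"]},
]
print(top_n(records, "countries", 2))  # [('USA', 3, 75.0), ('India', 2, 50.0)]
```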
DONE! Research Profiling Segment 3: “Basic descriptors”
A. Data prep – getting the target fields (variables) all set
B. “Top N” lists and such [single field tallies across the record set]
  - Fields from the dataset
  - Derived fields
Up next in Segment 4:
• 2 Fields together (matrices)
• Trends
• Discerning “Hot and New” topics
Pod 3+: VP Help & Interactions/Exercises
Help!
1. VantagePoint Help
2. Analyst’s Guide
Interacting
1. Discuss uses of VantagePoint to answer your research profiling questions
  - If you are together in a real or virtual group, discuss materials presented
  - Here’s a starter question (next slide)
2. Perform hands-on exercises
Interactive Ideas/Exercises
1. What “MOT” (management of technology, or
technology policy, or research opportunity)
questions might you want to answer from a Web of
Science dataset?
[next slides illustrative]
IAMOT 2009
For S&T Policy Maker and Manager:
• What are national R&D strengths and weaknesses?
• What is the existing status, and what future developments for thin-film solar cells are likely?
• How to gauge relative opportunities for collaborative
development, as well as monitor emerging competitors?
[Figure: Our Paper – Global Research Activities with Future Prospects: MOT questions (Who, What, When, Where, Why, How) addressed by Technology Data Mining. Need more experts’ inputs (we’re working on this)]
We look at:
1. What research fields are involved? – map of science
2. Quantity – publication numbers and trends
3. Diversity – national contrasts
4. Quality – citations
5. Patterns of research networking – using VantagePoint
6. “Hot” nano-materials
For data:
• Basic Dataset: a global dataset of nano publications downloaded from the SCI
• Search Expression: defined “thin film and (solar or photovoltaic)” as our search expression
• Result Dataset: acquired the dataset containing 1659 records for the time period from 2001 to mid-2008
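To see how the search expression behaves, here is a rough local re-implementation that applies “thin film and (solar or photovoltaic)” to a record’s topic text. It is an approximation, not Web of Science’s query parser: it assumes hyphens and spaces are interchangeable, matches terms as word prefixes (so “films” matches “film”), and does no lemmatization.

```python
import re


def has_term(text, phrase):
    """Case-insensitive match of a word or phrase at a word start; hyphens count as spaces."""
    words = [re.escape(w) for w in phrase.lower().split()]
    pattern = r"\b" + r"[\s-]+".join(words)
    return re.search(pattern, text.lower()) is not None


def matches_query(text):
    """thin film AND (solar OR photovoltaic)"""
    return has_term(text, "thin film") and (
        has_term(text, "solar") or has_term(text, "photovoltaic")
    )


titles = [  # hypothetical titles
    "Nanostructured thin-film solar cells",
    "CdTe thin films for photovoltaic devices",
    "Thin film transistors on glass",
]
print([matches_query(t) for t in titles])  # [True, True, False]
```

Running a sketch like this on a sample of downloaded records is one quick way to spot noise the expression lets in.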
Interactive Ideas/Exercises
2. Search on a topic with colleagues; consider how to refine your search
• Import preliminary search results into VP [do you have the right import filter?]
• Scan key terms, Subject Categories, etc. to check coverage and identify ways to enhance your search
• Refine and rerun the search if warranted and time permits
Interactive Ideas/Exercises
3. Given your MOT questions, what data cleaning is in order?
• Step through cleaning actions for each key field
• Apply suitable “List Cleanup” (using appropriate “.fuz” files)
• Apply thesauri as suitable (“.the” files)
Interactive Ideas/Exercises
4. A possible exercise: Thesaurus enhancement
• Run the AcadCorpGov.the on your cleaned Affiliations field [get rid of existing groups]
• On that resulting field, “Create Group Using Thesaurus” using this same “.the” file.
• Select “Group for Each Alias.”
• Research (e.g., Google) & assign some of the multiply-occurring organizations to one of the 4 groups.
• “Create thesaurus using groups”; select all 4 groups; save as AcadCorpGov-new date.the
• Run it as thesaurus; run it to create groups.
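The thesaurus-enhancement loop amounts to: map known aliases to groups, flag what stays unassigned, research those names, and add them as new aliases. A minimal sketch with a hypothetical two-group thesaurus held in a Python dict; the `.the` file format itself is not modeled here.

```python
def apply_thesaurus(values, thesaurus):
    """Map each value to its group via exact, case-insensitive alias lookup; None if unassigned."""
    lookup = {alias.lower(): group
              for group, aliases in thesaurus.items() for alias in aliases}
    return {v: lookup.get(v.lower()) for v in values}


thesaurus = {  # hypothetical groups and aliases
    "Academic": ["Georgia Inst Technol", "Beijing Inst Technol"],
    "Government": ["Natl Renewable Energy Lab"],
}
affiliations = ["Georgia Inst Technol", "NREL", "Natl Renewable Energy Lab"]

first_pass = apply_thesaurus(affiliations, thesaurus)
unassigned = [v for v, group in first_pass.items() if group is None]
print(unassigned)  # ['NREL']

# After researching the unassigned names, add them as aliases and rerun.
thesaurus["Government"].append("NREL")
second_pass = apply_thesaurus(affiliations, thesaurus)
```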
Interactive Ideas/Exercises
5. A Web of Science Key Terms exercise
• Merge fields (candidates include Keywords-Author; KeywordsPlus; Title NLP phrases; Abstract NLP phrases)
• Apply general.fuz
• Apply stopwords.the
• Make your own “interesting” key terms set
  - Scan for an interesting term; use FIND with “select all” and make a GROUP of variations of that term
  - Repeat for several interesting terms, making more groups
  - Create a new Field from Group Names
• Use Factor Map to statistically make a key terms set
  - Make a group in the Key Terms field – selecting interesting terms appearing in, say, >1% of the records
  - Run Factor Map – then check out the resulting term grouping (in a new Key Terms field created)
• Compare the two key term sets – is either useful?
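The “human” route (FIND with select all, one GROUP per term, then a field from group names) can be sketched with regular expressions. The patterns and group names below are hypothetical; a term joins the first group whose pattern it matches.

```python
import re


def make_groups(terms, patterns):
    """Put each term in the first group whose regular expression matches it."""
    groups = {name: [] for name in patterns}
    for term in terms:
        for name, pattern in patterns.items():
            if re.search(pattern, term, flags=re.IGNORECASE):
                groups[name].append(term)
                break
    return groups


def field_from_groups(record_terms, groups):
    """New per-record field: the names of the groups whose members appear in the record."""
    member_of = {t: g for g, members in groups.items() for t in members}
    return sorted({member_of[t] for t in record_terms if t in member_of})


terms = ["ZnO", "zinc oxide", "ZnO nanorods", "TiO2", "titanium dioxide", "anatase"]
patterns = {  # hypothetical groups of term variants
    "ZnO": r"\bzno\b|zinc oxide",
    "TiO2": r"\btio2\b|titanium dioxide|anatase",
}
groups = make_groups(terms, patterns)
print(field_from_groups(["ZnO nanorods", "anatase", "fullerene"], groups))  # ['TiO2', 'ZnO']
```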
Interacting
1. We’ll insert more candidate exercises as we
proceed, without great elaboration – use as you
choose
2. Now, back to the show
Pod 4: Matrices
[Table: Nano-enhanced Solar Cell Web of Science Subject Category Concentrations of the Leading Countries (USA, India, Germany, Japan, China) in Materials Science, Multidisciplinary; Physics, Applied; Physics, Condensed Matter; Chemistry, Physical; Energy & Fuels; and Materials Science, Coatings & Films. Materials Science, Multidisciplinary: USA 126, India 132, Germany 83, Japan 68, China 63]
[Figure: Acad-Corp-Gov Publishing by Country]
[Table: Cross-national Collaboration among the top 10 countries (USA, India, Germany, Japan, China, France, UK, South Korea, Mexico, Spain): co-publication counts, with % International Cooperation (among top 10) for each country]
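One way to compute a “% International Cooperation (among top 10)” column, assuming it means the share of a country’s records that also list at least one other top-10 country (the slide does not spell out the definition). The records are hypothetical.

```python
from collections import Counter


def international_share(records, countries):
    """Per country: % of its records that also list another country from `countries`."""
    target = set(countries)
    total, collab = Counter(), Counter()
    for rec in records:
        present = set(rec["countries"]) & target
        for c in present:
            total[c] += 1
            if len(present) > 1:
                collab[c] += 1
    return {c: round(100 * collab[c] / total[c], 1) for c in countries if total[c]}


records = [  # hypothetical
    {"countries": ["USA", "China"]},
    {"countries": ["USA"]},
    {"countries": ["USA", "UK"]},
    {"countries": ["China"]},
]
shares = international_share(records, ["USA", "China", "UK"])
print(shares)  # {'USA': 66.7, 'China': 50.0, 'UK': 100.0}
```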
Matrix-related Topics covered in VantagePoint
• Matrix Viewer – multiple visualizations available
• Activity-Diversity – scattergram for one variable based on 2 others
• Aduna Clustering – colorful visualization of intersecting sets (e.g., coauthoring), with the capability to zoom to records at those intersections (extending to >2-way connections)
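Behind every matrix is a co-occurrence count: for two fields, how many records carry each pair of values. A minimal sketch with hypothetical `countries` and `subjects` fields; passing the same field twice gives a “variable vs. itself” matrix (e.g., Country by Country), whose diagonal holds each value’s record count.

```python
from collections import defaultdict


def co_occurrence_matrix(records, row_field, col_field):
    """Record counts for each (row value, column value) pair, e.g. Country x Subject Category."""
    matrix = defaultdict(lambda: defaultdict(int))
    for rec in records:
        for r in set(rec.get(row_field, [])):
            for c in set(rec.get(col_field, [])):
                matrix[r][c] += 1
    return matrix


records = [  # hypothetical
    {"countries": ["USA", "China"], "subjects": ["Physics, Applied", "Energy & Fuels"]},
    {"countries": ["USA"], "subjects": ["Physics, Applied"]},
]
by_subject = co_occurrence_matrix(records, "countries", "subjects")
print(by_subject["USA"]["Physics, Applied"])  # 2
by_country = co_occurrence_matrix(records, "countries", "countries")
print(by_country["USA"]["USA"], by_country["USA"]["China"])  # 2 1
```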
Pod 5: Trends
Trends
1. Decide if normalization is in order
a. Over time [rate of change]
b. Most recent year
2. Decide if comparative analyses are in order
a. What/who are the benchmarks?
b. How do you want to present your results?
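The two normalizations can be sketched as follows, using hypothetical counts. A part-year (such as mid-2008 in the case data) is best left out of rate-of-change figures, since it would understate growth.

```python
def growth_rates(counts_by_year):
    """Year-over-year % change in record counts."""
    years = sorted(counts_by_year)
    return {
        year: round(100 * (counts_by_year[year] - counts_by_year[prev]) / counts_by_year[prev], 1)
        for prev, year in zip(years, years[1:])
        if counts_by_year[prev]  # skip division by zero
    }


def shares_in_year(counts, year):
    """Each entity's % of all records in one year, e.g. the most recent full year."""
    total = sum(by_year.get(year, 0) for by_year in counts.values())
    return {entity: round(100 * by_year.get(year, 0) / total, 1)
            for entity, by_year in counts.items()}


# Hypothetical counts
print(growth_rates({2005: 40, 2006: 50, 2007: 75}))  # {2006: 25.0, 2007: 50.0}
country_counts = {"China": {2006: 20, 2007: 30}, "USA": {2006: 40, 2007: 45}, "India": {2007: 25}}
print(shares_in_year(country_counts, 2007))  # {'China': 30.0, 'USA': 45.0, 'India': 25.0}
```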
[Figure: DSSC research by organization type (from SCI)]
[Figure: # of author affiliations/paper for DSSC publications (SCI)]
[Figure: Nano-Structured ZnO Thin-film Solar Cells Publication by Countries and Years, 2001-2007 (China, India, Japan, USA, Mexico, Germany, South Korea, Spain, France). China and India are notable!]
[Figure: Nano-Structured ZnO Thin-film Solar Cells Publication: Top 10 countries by Years, 2001-2007, as % shares] – note the increasing share for India & China
[Figure: DSSC Publications (SCI) with % 2006 or later]
[Figure: Share of Nano-enhanced Thin-film Solar Cells Publications by Countries (USA, India, Germany, Japan, China) in 2001, 2003, 2005 and 2007 (Science Citation Index, 2001-08, part-year)]
[Figure: Projecting Nano-enhanced Solar Cell Research Activity – actual data and projected data]
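The slide does not say how the projection was made. As an assumed, deliberately simple stand-in, here is an ordinary least-squares straight line fitted to yearly record counts and extended forward; growth-curve models (e.g., logistic) are a common alternative when a topic may be saturating. The counts are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def project(counts_by_year, future_years):
    """Extend a straight-line trend of yearly record counts to future years."""
    years = sorted(counts_by_year)
    slope, intercept = fit_line(years, [counts_by_year[y] for y in years])
    return {y: round(slope * y + intercept, 1) for y in future_years}


# Hypothetical full-year counts (leave out a part-year such as mid-2008)
counts = {2003: 100, 2004: 120, 2005: 140, 2006: 160}
print(project(counts, [2008, 2010]))  # {2008: 200.0, 2010: 240.0}
```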
Research activity and impact characteristics – First Way
[Figure: quality (# of citations) vs. activity (# of records) by country: USA, UK, South Korea, Mexico, France, Spain, Japan, Germany, India, China]
• Nodes above the diagonal suggest relatively higher quality (US and UK). Below the diagonal, the closer a node is to the diagonal, the higher the quality of that country’s research.
Research activity and impact characteristics – Second Way
[Figure: # of Aged* Citations vs. # of Records, 2001 and 2006, for US, Germany, Japan, India and China; each year is denoted by the start and end points of a country’s line]
• The steeper the slope of the line connecting these two points, the greater the increase in quality of the country’s research on this topic
• Compared with Japan and Germany, China and India are upgrading!
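The “steeper slope” reading can be made numeric: the slope of the segment from a country’s (records, citations) point in the start year to its point in the end year. A sketch with hypothetical values; how the citations were “aged” is not specified on the slide and is not modeled here.

```python
def upgrade_slope(start, end):
    """Slope of the line from (records, citations) in the start year to the end year."""
    (r1, c1), (r2, c2) = start, end
    if r2 == r1:
        raise ValueError("record count unchanged; slope is undefined")
    return (c2 - c1) / (r2 - r1)


# Hypothetical (records, citations) pairs for 2001 and 2006
print(upgrade_slope((10, 20), (30, 80)))  # 3.0
print(upgrade_slope((12, 40), (20, 48)))  # 1.0
```

A larger slope means citations grew faster than output between the two years.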
Pod 6: “Hot topics”
Case example: nano-enhanced, thin-film solar cells [Ying Guo, Lu Huang & me]
“Hot” topic as shown by relative trends
ZnO attracts increasing attention in recent years and is on trend to catch up with TiO2
[Table: Top 20 Key Terms, with the Ratio of Occurrences 2007-08 to those in 2001-06 (“ratio-recent”) and # Records]
Top 20 Key Terms: conjugated polymer; fabrication; TiO2; chemical vapor deposition; amorphous silicon; morphology; semiconductor; fullerene; zinc oxide; microstructure; spray pyrolysis; heterojunction; CdTe; electrodeposition; CuInSe2; anatase; chemical bath deposition; Cu(In; sol-gel; photoconductivity
ratio-recent (# Records), in descending order: 1.14 (47); 0.85 (74); 0.85 (61); 0.74 (66); 0.65 (28); 0.53 (72); 0.52 (94); 0.50 (48); 0.48 (49); 0.46 (51); 0.41 (65); 0.36 (49); 0.32 (37); 0.29 (102); 0.28 (92); 0.24 (21); 0.22 (39); 0.17 (21); 0.00 (37); 0.00 (22)
Top 20 Key Terms combined: ratio-recent 0.44
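Ratio-recent can be computed from per-record term lists as below (the field name `terms` and the records are hypothetical). Because the recent window (2007-08, with 2008 only part-year) is much shorter than the earlier one (2001-06), a single ratio means little on its own; comparing each term against the combined ratio for all top terms (0.44 here) shows which terms are gaining share.

```python
from collections import Counter


def ratio_recent(records, field, recent_years, earlier_years):
    """Per term: records in the recent period divided by records in the earlier period."""
    recent, earlier = Counter(), Counter()
    for rec in records:
        terms = set(rec.get(field, []))
        if rec["year"] in recent_years:
            recent.update(terms)
        elif rec["year"] in earlier_years:
            earlier.update(terms)
    # Terms absent from the earlier period get no ratio here; list comparison covers them
    return {t: round(recent[t] / earlier[t], 2) for t in sorted(earlier)}


records = [  # hypothetical
    {"year": 2003, "terms": ["TiO2", "ZnO"]},
    {"year": 2004, "terms": ["TiO2"]},
    {"year": 2007, "terms": ["ZnO"]},
    {"year": 2008, "terms": ["ZnO", "TiO2"]},
]
print(ratio_recent(records, "terms", range(2007, 2009), range(2001, 2007)))
# {'TiO2': 0.5, 'ZnO': 2.0}
```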
New Topics via List Comparison
• Create a VP sub-dataset for the recent nano-enhanced solar cells publications (new VP file – I used 2007-08)
• Create a VP sub-dataset for the earlier publications (I used 2001-06)
• Under GROUPS, choose LIST COMPARISON; I did so from the select keywords list (82) for 2007-08 and made a new group of those unique to this dataset in comparison to the earlier one.
• Results: “characterize” and “deposit” are the 2 novel ones
[Warrants in-depth probing to check if these are meaningful]
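At its core, the “Unique” output of a list comparison is a set difference between two term lists. A minimal sketch with illustrative term lists standing in for the recent and earlier sub-datasets; matching is case-insensitive here, which is an assumption.

```python
def list_comparison(recent_terms, earlier_terms):
    """Terms unique to each list and terms in both (case-insensitive)."""
    recent = {t.lower() for t in recent_terms}
    earlier = {t.lower() for t in earlier_terms}
    return {
        "unique_recent": sorted(recent - earlier),
        "unique_earlier": sorted(earlier - recent),
        "common": sorted(recent & earlier),
    }


# Illustrative term lists for the recent and earlier sub-datasets
result = list_comparison(["deposit", "characterize", "TiO2"], ["TiO2", "sol-gel"])
print(result["unique_recent"])  # ['characterize', 'deposit']
```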
Key Terms by First Year: New Key Terms Recently
Year | Records | New Terms | New key terms
2005 | 225 | 3 | device [8 of 54]; TiO2 film [8 of 29]; cD [5 of 27]
2006 | 334 | 2 | nanocrystal [10 of 25]; room temperature [4 of 24]
2007 | 372 | 2 | DEPOSIT [37 of 52]; CHARACTERIZE [25 of 25]
2008 | 174 | 0 |
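A “key terms by first year” table can be sketched by recording the earliest year each term appears and grouping terms by that year. This is an illustration with hypothetical records, not the Terms by Year macro itself; the `min_records` filter is an assumed way to drop one-off noise.

```python
from collections import Counter, defaultdict


def new_terms_by_year(records, field, min_records=1):
    """Group terms by the first year they appear; keep terms found in >= min_records records."""
    first, total = {}, Counter()
    for rec in records:
        for term in set(rec.get(field, [])):
            total[term] += 1
            first[term] = min(first.get(term, rec["year"]), rec["year"])
    by_year = defaultdict(list)
    for term, year in first.items():
        if total[term] >= min_records:
            by_year[year].append(term)
    return {year: sorted(terms) for year, terms in sorted(by_year.items())}


records = [  # hypothetical
    {"year": 2005, "terms": ["device"]},
    {"year": 2006, "terms": ["room temperature"]},
    {"year": 2007, "terms": ["deposit", "device"]},
    {"year": 2008, "terms": ["deposit"]},
]
print(new_terms_by_year(records, "terms", min_records=2))  # {2005: ['device'], 2007: ['deposit']}
```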
Recent Entrants
• We need not restrict the temporal comparison to
key terms or topics
• Same modus operandi can be applied to identify
new or recent entrants to the research (e.g., first
papers on the topic from a given organization)
• Another variant is the inverse – to look for
which participants seem to have abandoned the
topic (no publications since Year X)
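The same first-year logic, plus a last-year check, identifies entrants and apparent leavers. A minimal sketch with hypothetical organizations and a hypothetical `affiliations` field; “Year X” is the `year_x` parameter.

```python
def first_last_years(records, field):
    """(first year, last year) of publication for each value of a field."""
    span = {}
    for rec in records:
        year = rec["year"]
        for value in set(rec.get(field, [])):
            lo, hi = span.get(value, (year, year))
            span[value] = (min(lo, year), max(hi, year))
    return span


def entrants_and_leavers(records, field, year_x):
    """Entrants: first paper in or after year_x. Leavers: no paper in or after year_x."""
    span = first_last_years(records, field)
    entrants = sorted(v for v, (lo, _) in span.items() if lo >= year_x)
    leavers = sorted(v for v, (_, hi) in span.items() if hi < year_x)
    return entrants, leavers


records = [  # hypothetical organizations
    {"year": 2003, "affiliations": ["Org A", "Org B"]},
    {"year": 2004, "affiliations": ["Org B"]},
    {"year": 2007, "affiliations": ["Org A", "Org C"]},
]
print(entrants_and_leavers(records, "affiliations", 2006))  # (['Org C'], ['Org B'])
```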
Pod 7: Maps
Visualization (Maps)
1. VantagePoint Maps
  • Auto-correlation maps
  • Cross-correlation maps
  • Factor maps
2. Social Network Analysis (SNA)
3. Science Overlay Maps
4. Geo-mapping
Auto-Correlation Maps
[Figure: NETFSC (nano-enhanced thin-film solar cells) research networking comparison: USA (dispersed) vs Germany (1 central organization)]
[Figure: Auto-correlation vs. Cross-correlation – Nano-enhanced Solar Cells Country Research Networks]
Factor Map (Principal Components Analysis) – groups terms based on their tendency to co-occur across records
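The difference between auto- and cross-correlation can be sketched with Pearson correlation. Auto-correlation compares two entities by whether they appear on the same records; cross-correlation compares their profiles over another field (e.g., authors by key terms). Pearson correlation is an assumption here; VantagePoint’s own similarity measure may differ, and the records are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def auto_correlation(records, field, a, b):
    """Correlate two entities by whether they appear on the same records."""
    va = [1 if a in rec.get(field, []) else 0 for rec in records]
    vb = [1 if b in rec.get(field, []) else 0 for rec in records]
    return pearson(va, vb)


def cross_correlation(records, field, other_field, a, b):
    """Correlate two entities by their profiles over another field."""
    vocab = sorted({t for rec in records for t in rec.get(other_field, [])})

    def profile(entity):
        return [sum(1 for rec in records
                    if entity in rec.get(field, []) and t in rec.get(other_field, []))
                for t in vocab]

    return pearson(profile(a), profile(b))


records = [  # hypothetical author/key-term records
    {"authors": ["A", "B"], "terms": ["ZnO"]},
    {"authors": ["A", "B"], "terms": ["ZnO", "TiO2"]},
    {"authors": ["C"], "terms": ["TiO2"]},
    {"authors": ["A"], "terms": ["ZnO"]},
]
print(round(auto_correlation(records, "authors", "A", "B"), 3))
print(round(cross_correlation(records, "authors", "terms", "A", "C"), 3))
```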
Social Network Analysis (SNA)
• VantagePoint offers several application opportunities
  - Create a sub-dataset for a given country or organization
  - Within that target group, for the given research topic, explore research network connections
• Examples
  - Collaborations
  - Shared interests
  - Discrepancies between interests & collaboration
• Working with Pajek adds options
  - Calculation of networking statistical measures (e.g., centrality)
  - More mapping nuances
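As an example of the networking measures mentioned, degree centrality can be computed from co-authorship directly. A minimal sketch on hypothetical records: each pair of co-listed authors becomes an edge, and centrality is a node’s number of distinct collaborators divided by (n - 1). Pajek offers this and many more measures.

```python
from collections import defaultdict
from itertools import combinations


def collaboration_edges(records, field="authors"):
    """Undirected edges for every pair co-listed on a record, weighted by # of shared records."""
    weights = defaultdict(int)
    for rec in records:
        for a, b in combinations(sorted(set(rec.get(field, []))), 2):
            weights[(a, b)] += 1
    return weights


def degree_centrality(weights):
    """Distinct collaborators per node divided by (n - 1); isolates are not included."""
    neighbours = defaultdict(set)
    for a, b in weights:
        neighbours[a].add(b)
        neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {}
    return {node: len(nb) / (n - 1) for node, nb in neighbours.items()}


records = [  # hypothetical co-authored records
    {"authors": ["A", "B", "C"]},
    {"authors": ["A", "D"]},
    {"authors": ["B", "C"]},
]
edges = collaboration_edges(records)
print(edges[("B", "C")])  # 2
print(degree_centrality(edges)["A"])  # 1.0
```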
Science Overlay Map [see: www.idr.gatech.edu – includes “how to make your own map” and full citations]
[Figure: Nanotechnology Thin-film Solar Cells Publications by Research Field – Nano-Thin-Film Publications 2001-08 distribution, overlaid over the base 175 Subject Category Science Map (Leydesdorff & Rafols, forthcoming). Labeled categories include Materials Science, Multidisciplinary; Materials Science, Coatings & Films; Physics, Applied; Physics, Condensed Matter; Chemistry, Physical; Energy & Fuels]
Science Overlay Mapping
1. Start with Web of Science file in VantagePoint
  • Map the Subject Categories or
  • Cited Subject Categories (somewhat complicated process)
    - Special import filter to extract cited source titles
    - Applies a special Find/Replace thesaurus to those to make titles more standardized (e.g., J vs. Jnl vs. Journal)
    - We then apply a special macro that uses a Journal-to-Subject Category thesaurus to get Cited Subject Categories (“SCs”)
  • Output a vector file of SCs or Cited SCs
2. In Pajek
  • Select the SCI (175 SC) or SCI+SSCI (221 SC) base map
  • Edit your map (e.g., change node size)
  • Output in desired format (e.g., jpeg)
3. In MS Powerpoint
  • Overlay on the appropriate base map
4. Or, go to www.idr.gatech.edu/ -- select “Upload Map”
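The cited-Subject-Category steps (standardize journal titles, look them up in a Journal-to-SC thesaurus, count per category in the base map’s order) can be sketched as below. The Find/Replace rules, thesaurus entries and category order are hypothetical, and the exact vector file format expected by Pajek or the idr.gatech.edu upload is not reproduced.

```python
import re
from collections import Counter

# Hypothetical Find/Replace rules standardizing journal-title variants
TITLE_FIXES = [(r"\bjournal\b", "j"), (r"\bjnl\b", "j")]


def standardize_title(title):
    """Lower-case, drop periods, apply the Find/Replace rules, collapse spaces."""
    t = title.lower().replace(".", " ")
    for pattern, replacement in TITLE_FIXES:
        t = re.sub(pattern, replacement, t)
    return " ".join(t.split())


def cited_sc_vector(cited_titles, journal_to_sc, base_map_order):
    """Counts of cited Subject Categories, in the category order the base map uses."""
    counts = Counter()
    for title in cited_titles:
        for sc in journal_to_sc.get(standardize_title(title), []):
            counts[sc] += 1
    return [counts[sc] for sc in base_map_order]


journal_to_sc = {  # hypothetical Journal-to-Subject Category thesaurus entries
    "j appl phys": ["Physics, Applied"],
    "sol energ mat sol c": ["Energy & Fuels", "Materials Science, Multidisciplinary"],
}
cited = ["Journal Appl. Phys.", "J APPL PHYS", "Jnl Appl Phys", "Sol Energ Mat Sol C"]
order = ["Physics, Applied", "Energy & Fuels", "Chemistry, Physical"]
print(cited_sc_vector(cited, journal_to_sc, order))  # [3, 1, 0]
```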
Geomapping
[Figure: Geo-map: Nano-enhanced Solar Cells – European Institutions with >=10 papers]
Pod 7+: Activities for Matrices, Trends, Hot Topics & Maps + … “SuperProfile”
Research Profiling
Interactions/Exercises for Matrices, Trends & Hot Topics
**The following exercises may be downloaded at http://www.thevantagepoint.com/webinars.cfm
Interactive Ideas/Exercises
6. Matrix Fun & Games
• In VantagePoint, on your dataset, make a matrix of interest
  - One family of matrices involves Time (e.g., Year) vs. another variable [“When vs. …”]
  - Another family involves Topic (e.g., Key terms, Subject Categories) vs. Performer (e.g., Country, Affiliation, Author) [“What vs. Who”]
  - An important matrix type entails a variable vs. itself (e.g., Author by Author; Country by Country)
• Try out matrix operations
  - Flood the matrix to different degrees [use the Up & Down bars in the upper left corner cell (headings by headings cell)]
  - Open detail views to explore a group of cells together; select an entry in a detail view to see the records to which it pertains in the title view
  - Paint groups of cells; then re-sort
• Relate the analytical possibilities to the MOT questions they could help answer
  - Address one or more MOT questions via your matrix content
Interactive Ideas/Exercises
7. Matrix Viz
• In VantagePoint, with your matrix open, run the MatrixViewer script. [If the view is too cluttered or not interesting, make a more suitable matrix, possibly by creating a group on a particular variable to select key entities.]
• Try different “Layouts”; select and move entities in the viewer
• Export the most interesting layout to file.
Interactive Ideas/Exercises
8. Activity-Diversity
• Make a group of Top Affiliations in your dataset [experiment with this – maybe start with an interesting 15-20]; create a field from group items.
• Open the Activity-Diversity Scatter 3D script; select that field to plot; select the field to measure Diversity (e.g., Subject Categories; Affiliations); select your minimum; try a Graphic Size.
• Say “yes” to “make changes to this chart” – and try out various sizes, axis formats, font and label angles – to get a plot you like. [Hint: You can keep redoing – but you can’t edit once you say ‘no.’]
• Interpret – what can you say about differences in research focus?
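To get a feel for what such a plot compares, here is a rough sketch of the two axes: activity as the number of records per entity, and diversity as the number of distinct values of a second field seen at least a minimum number of times. This definition of diversity is an assumption, not the script’s documented formula; the records are hypothetical.

```python
from collections import Counter, defaultdict


def activity_diversity(records, entity_field, diversity_field, minimum=1):
    """Per entity: (# records, # distinct diversity_field values seen in >= minimum of its records)."""
    activity = Counter()
    spread = defaultdict(Counter)
    for rec in records:
        for entity in set(rec.get(entity_field, [])):
            activity[entity] += 1
            spread[entity].update(set(rec.get(diversity_field, [])))
    return {
        entity: (activity[entity], sum(1 for n in spread[entity].values() if n >= minimum))
        for entity in activity
    }


records = [  # hypothetical
    {"affiliations": ["Org A"], "subjects": ["Physics, Applied"]},
    {"affiliations": ["Org A"], "subjects": ["Energy & Fuels", "Physics, Applied"]},
    {"affiliations": ["Org B"], "subjects": ["Physics, Applied"]},
]
print(activity_diversity(records, "affiliations", "subjects"))
print(activity_diversity(records, "affiliations", "subjects", minimum=2))
```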
Interactive Ideas/Exercises
9. Aduna Clustering
• Create a sub-dataset for a country of interest; save the VP file.
• Create a “top n” (e.g., 10-30) affiliations group in that country dataset.
• Run the AdunaClusterMap macro for that group
• Do you spot any interesting inter-institutional collaborations? Any collaborations involving more than 2 organizations?
• Consider whether such cluster maps could address your MOT issues
  - At a higher level (inter-country collaboration investigation)
  - At a lower level (co-authoring patterns)
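The intersections a cluster map displays can be tallied directly: for each record, take the exact combination of selected organizations it carries, and count combinations of two or more. A sketch with hypothetical organization names; each record is counted once, under its exact combination.

```python
from collections import Counter


def shared_records(records, field, selected):
    """Record counts for every combination of 2 or more selected entities appearing together."""
    chosen = set(selected)
    combos = Counter()
    for rec in records:
        present = tuple(sorted(set(rec.get(field, [])) & chosen))
        if len(present) >= 2:
            combos[present] += 1
    return combos


records = [  # hypothetical
    {"affiliations": ["Univ X", "Lab Y"]},
    {"affiliations": ["Univ X", "Lab Y", "Co Z"]},
    {"affiliations": ["Univ X"]},
    {"affiliations": ["Lab Y", "Univ X"]},
]
for combo, n in shared_records(records, "affiliations", ["Univ X", "Lab Y", "Co Z"]).items():
    print(" + ".join(combo), n)
```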
Interactive Ideas/Exercises
10. Plot Matrix (for Trend)
• In your VP Summary sheet, check if you have “Number of Authors” [alternatively, “Number of Affiliation (name only)”]; if not, import them (they may be secondary fields in the Web of Science import filter)
• Make a matrix of Number of Authors by Publication Year
• Sort; select all values except the last year.
• Run the PlotMatrix script
• Examine the resulting plots in MS Excel; pick one you like, or make another (like the colorful plot of affiliations by year in Pod 5)
• Interpret
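The matrix in this exercise (year by number of authors) also yields a simple trend line: mean authors per paper by year, dropping the last year as the exercise suggests. A minimal sketch with hypothetical records; the plotting done by PlotMatrix is not reproduced.

```python
from collections import Counter, defaultdict


def authors_by_year(records):
    """Matrix: publication year -> {number of authors: # records}."""
    matrix = defaultdict(Counter)
    for rec in records:
        matrix[rec["year"]][len(rec["authors"])] += 1
    return matrix


def mean_authors(matrix, drop_last_year=True):
    """Mean authors per paper by year, optionally dropping the final (possibly part-) year."""
    years = sorted(matrix)
    if drop_last_year:
        years = years[:-1]
    return {
        y: round(sum(k * n for k, n in matrix[y].items()) / sum(matrix[y].values()), 2)
        for y in years
    }


records = [  # hypothetical
    {"year": 2006, "authors": ["A", "B"]},
    {"year": 2006, "authors": ["C", "D", "E", "F"]},
    {"year": 2007, "authors": ["A", "B", "C"]},
    {"year": 2008, "authors": ["A"]},
]
print(mean_authors(authors_by_year(records)))  # {2006: 3.0, 2007: 3.0}
```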
Interactive Ideas/Exercises
11. Hot and New
• List Comparison
  - Pod 6 illustrated use of “List Comparison” to hunt for new terms in recent years; try your own version.
  - Pick a suitable set of key terms. If these are a subset of a large field, it may be handy to make a new field of just those terms (e.g., by using “Group” capabilities)
  - Break your data set into “recent” and “earlier” based on publication years; create new Sub-datasets.
  - Under the “Groups” menu, select “List Comparison”; compare the same key terms field in the 2 sub-datasets. Start with “Unique” and explore what may be of interest. [Expect lots of noise, but some interesting “new” to discover.]
  - Try out “List Comparison” for other purposes – e.g., compare two organizations for relative emphases.
• Expectancy Values
  - Open your Publication Years field. Show your key terms of interest in a Detail Window [see next slide]
  - Sort in the Detail Window on the Expectancies (terms with triple or double Up arrows are quick candidate “HOT” topics)
[Figure: Another Way to get at Hot Topics]
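A common way to define an expectancy is observed over expected, with expected = row total × column total / grand total, as in a contingency table. That is an assumption here: the thresholds behind VantagePoint’s single, double and triple arrows are not given. A sketch on a hypothetical term-by-year matrix:

```python
def expectancies(matrix):
    """Observed / expected per cell, with expected = row total * column total / grand total."""
    row_totals = {r: sum(cols.values()) for r, cols in matrix.items()}
    col_totals = {}
    for cols in matrix.values():
        for c, n in cols.items():
            col_totals[c] = col_totals.get(c, 0) + n
    grand = sum(row_totals.values())
    return {
        (r, c): n / (row_totals[r] * col_totals[c] / grand)
        for r, cols in matrix.items()
        for c, n in cols.items()
    }


# Hypothetical key term x publication year counts
terms_by_year = {"ZnO": {2006: 2, 2007: 8}, "TiO2": {2006: 8, 2007: 2}}
ratios = expectancies(terms_by_year)
print(round(ratios[("ZnO", 2007)], 2), round(ratios[("ZnO", 2006)], 2))  # 1.6 0.4
```

Ratios well above 1 in the latest years are the cells that would merit an “up” flag.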
Interactive Ideas/Exercises
12. Tracking Term Appearance: Terms by Year
• Pick a terms field (e.g., “Keywords (author’s)”) – but check record coverage
• Open the Terms by Year macro and run for “First Year,” including Summary report in Excel
• Examine the resulting VP list – sort by successive years and see if you can spot a set of potentially interesting “new in Year X” terms for recent years
Mapping
1. Pod 7 introduced 3 types of VantagePoint maps + a couple of maps that begin with VP analyses, extending to use of other software
2. No separate exercise for Factor Maps in VP here – adapt the ideas presented in Pod 7 to large term sets and try them out yourself.
3. No separate exercises for:
  • Science overlay maps [Pod 7 points to a helpful website to make your own maps from Web of Science Subject Category lists]
  • Geo-mapping – Pod 7 presented it to illustrate possibilities [there are other ways to create geo-maps from Web of Science affiliation information, processed through VP, working with mapping software]
Interactive Ideas/Exercises
13. Correlation Maps in VantagePoint: Collaboration Patterns within an Organization
• Select the target organization; create a sub-dataset for it
• Open the authors LIST; create a group of interesting authors (e.g., top 15)
• Open the Mapping Wizard; Create an auto-correlation map
• Then go back to the Wizard and Create a cross-correlation map for those same interesting authors; select a topic field (e.g., key terms or Subject Categories)
• Compare the maps – open a couple of Detail Windows to explore what is going on – similarities? Differences?
• Right-click in a map – explore the various options – especially “Edit Preferences”
  - Change the threshold for showing links
  - Change the canvas size
  - Change the font size
Interactive Ideas/Exercises
14. SuperProfile! [really versatile ‘research profiling’ tool – provides “breakouts” for a set of entities to show other field values]
• From the Scripts menu, select SuperProfile
• Pick a field (or group) that you would like to profile (e.g., Country, Subject Category, Publication Year, Highly Cited papers); make selections as the Wizard poses them
• In the “Browser” then – Pick Column Type (e.g., Top Items); Pick Field (e.g., Subject Category); Pick # (e.g., how many Subject Categories to list out); Pick minimum # to include (the “Remove items” option); Pick output type (sheet is in VP; try Excel); Add to Profile.
• Pick another – Column Type (e.g., another “Top Items” type field) – or let’s try “Percent Recent-Database”; Pick field (Publication Year); Pick # of years to use as “recent”; Add to Profile
• Check the MS Excel results; if not quite what you want, redo; if they are what you want, edit for appearance.
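A SuperProfile-style breakout can be approximated in a few lines: one row per profiled entity, with a “Top Items” column from another field and a “percent recent” column. This is a sketch of the idea, not the script’s implementation; the field names and records are hypothetical.

```python
from collections import Counter, defaultdict


def super_profile(records, entity_field, top_field, top_k, recent_from):
    """One row per entity: # records, its top_k values of top_field, % of records since recent_from."""
    by_entity = defaultdict(list)
    for rec in records:
        for entity in set(rec.get(entity_field, [])):
            by_entity[entity].append(rec)
    profile = {}
    for entity, recs in by_entity.items():
        tops = Counter(v for r in recs for v in set(r.get(top_field, [])))
        recent = sum(1 for r in recs if r["year"] >= recent_from)
        profile[entity] = {
            "records": len(recs),
            "top_items": [v for v, _ in tops.most_common(top_k)],
            "percent_recent": round(100 * recent / len(recs), 1),
        }
    return profile


records = [  # hypothetical
    {"countries": ["China"], "subjects": ["Physics, Applied"], "year": 2007},
    {"countries": ["China"], "subjects": ["Physics, Applied", "Energy & Fuels"], "year": 2004},
    {"countries": ["USA"], "subjects": ["Chemistry, Physical"], "year": 2005},
]
for country, row in super_profile(records, "countries", "subjects", 1, 2006).items():
    print(country, row)
```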