A presentation on new bibliometric indicators such as h-index, eigenfactor, SNIP, SJR, Publish or Perish; and the use of Google Scholar and Scopus for citation analysis.
1. Bibliometrics:
From Garfield to Google Scholar
Elaine M. Lasda Bergman
University at Albany
Upstate NY SLA Spring Meeting
April 20, 2012
2. What we’re going to cover
• What is the study of bibliometrics?
• Bibliometrics that assess entire journals
– JIF, Eigenfactor, SNIP, SJR
• Bibliometrics that assess authors, articles, and
institutions
– citation count, h-index, e-index, etc.
3. What is bibliometrics?
• Scholarly communication: Eugene Garfield traced
the history and evolution of ideas from one
scholar to another
• Measures the scholarly influence of articles,
journals, scholars, and institutions
5. Three sources for citation data
• Citation data overlaps, but not completely
• Unique citing references in all three databases
• Unique metrics developed using each
database
– Metrics could be computed in any one of these
but most are tied to a particular source
7. What is measured?
• Journal Ranking
– “Quality” or “Importance” of journal relative to
other journals
• Usually within a given field of study
• There are many ways to measure “quality,”
“importance”
8. “Impact”
• Journal Impact Factor (JIF)
• Web of Science – Journal Citation Reports
• Basically “how fast are ideas spreading from
this journal to other publications?”
• Formula is a ratio:
Number of citations in a given year to articles
the journal published in the previous two years,
divided by the number of scholarly articles it
published in those two years
9. Journal Impact Factor
• Journal of Hypothetical Examples
– 100 citing references appearing in 2010 to
articles published in the Journal in 2008 and 2009
– 200 total articles published in the Journal in
2008 and 2009
– JIF = 100 / 200 = 0.50
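The two-year ratio above can be sketched in a few lines of Python (the journal name and counts are the slide's hypothetical example):

```python
def journal_impact_factor(citations_to_prev_two_years, articles_prev_two_years):
    """Two-year Journal Impact Factor: citations received this year to
    items the journal published in the previous two years, divided by
    the number of scholarly articles published in those two years."""
    return citations_to_prev_two_years / articles_prev_two_years

# Journal of Hypothetical Examples, measured in 2010:
# 100 citing references to its 2008-2009 articles; 200 articles published.
print(journal_impact_factor(100, 200))  # 0.5
```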
10. Concerns with impact factor
• Cannot be used to compare across disciplines
• Two year time frame not adequate for social
sciences, humanities
• Coverage of some disciplines not sufficient in
Web of Science
• Is a measure of “impact” a measure of
“quality”?
11. “Influence”
• Eigenfactor.org
• Web of Science: Journal Citation Reports
• Eigenvector analysis: Similar to Google
PageRank, “chain of citations”
• Takes into account the total amount of
“citation traffic” appearing in JCR
• Each citation is weighted by the influence of
the citing journal, divided by the total number
of citations appearing in that journal
12. “Influence”
• Journal Impact Factor:
– All citing references weighted equally
• Eigenfactor:
– Some citing references are more important
than others
• Citing articles from journals that are heavily
cited themselves demonstrate greater influence
13. Considerations
• Eigenfactor will always be bigger if a journal is
larger, i.e., publishes more articles
• Article Influence Score: corrects for journal
size
– takes the journal’s Eigenfactor score and further
divides it by the number of articles in the journal
– correlates with the JIF
14. Examples
• For the year 2011, Neurology had an Eigenfactor
score of 0.159. Since all Eigenfactor scores in the
JCR sum to 100, its articles account for 0.159% of
all citation traffic in the JCR
• For the year 2011, Neurology had an article
influence score of 2.57. This means an average
article in this journal is roughly 2.5 times more
influential than an average article in the JCR
• www.eigenfactor.org
15. “Citation Potential”
• SNIP: Source Normalized Impact Per Paper
• Uses Scopus data
• Citation potential: how frequently papers in the
journal’s subject field cite recent literature
• SNIP is the ratio of the journal’s average
citation count per paper to the citation
potential of its subject field
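As a rough sketch of the normalization idea (the real SNIP calculation, due to Moed, derives citation potential from the citing behavior of the field; the function and numbers below are illustrative assumptions only):

```python
def snip(mean_citations_per_paper, field_citation_potential):
    """Source Normalized Impact per Paper, simplified: the journal's
    average citations per paper divided by the 'citation potential'
    of its field (how densely that field cites recent literature).
    Values above 1 mean the journal is cited more than is typical
    for its field."""
    return mean_citations_per_paper / field_citation_potential

# Hypothetical: a biomedical journal and a history journal with the
# same raw citation rate, in fields with very different citation density.
print(snip(4.0, 4.0))  # 1.0 -- average for a high-citation field
print(snip(4.0, 2.0))  # 2.0 -- strong for a low-citation field
```

This is why SNIP scores, unlike the raw JIF, can be compared across disciplines.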
16. Pros and cons of SNIP
• Can compare SNIP scores across disciplines
• Aggregate of a journal, so larger journals
automatically have higher scores than smaller
journals
17. “Prestige”
• SJR: Scimago Journal Rank
• Uses Scopus data
• Measures “current average prestige per
paper”
Prestige factors include: # of journals in the
Scopus database, # of articles in Scopus from
this journal, citation count, eigenvector analysis
of important citing references, corrections for
self-citations, and normalization by the number
of significant works published in the journal.
18. Pros and Cons of SJR
• Corrects for self citations
• Correlated to JIF
• Scores can be compared across disciplines
• Web version provides data on countries
• Three year window not good for social sciences
• http://www.scimagojr.com/
26. Citation count
• Number of times cited within a given time
period
– Journals, Authors, Articles, etc.
• Does not take into account
– Materials not included in citation database
– Self citations
– Variations in citation patterns/rates
27. Citation count
• Citation counts will vary depending on which
database you use
• It is very difficult to get a complete count of all
citing references
28. H-index
• Scopus, Google Scholar, WoS?
• Meant to account for differences in citation
patterns (i.e., “one-hit wonders” vs. consistent
record of scholarship)
“A scientist has index h if h of his/her Np
papers have at least h citations each and the
other (Np − h) papers have no more than h
citations each” (Hirsch 2005)
29. H-index Example
Number of citations per article:
Article    Scholar A    Scholar B
1          10           27
2          10           12
3          9            5
4          8            4
5          7            4
6          6            2
7          6            2
Total      56 citations 56 citations
h-index    6            4
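The computation behind the example can be sketched directly from Hirsch's definition (the two citation lists are the slide's hypothetical scholars):

```python
def h_index(citation_counts):
    """h is the largest number such that h of the papers have
    at least h citations each (Hirsch 2005)."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:  # the rank-th most-cited paper has >= rank citations
            h = rank
    return h

scholar_a = [10, 10, 9, 8, 7, 6, 6]   # 56 citations total
scholar_b = [27, 12, 5, 4, 4, 2, 2]   # 56 citations total
print(h_index(scholar_a))  # 6
print(h_index(scholar_b))  # 4
```

Same total citation count, very different h-index: Scholar B's citations are concentrated in one "one-hit wonder" paper.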
30. Variations on the H-index
• G-index (Egghe 2006): gives greater weight to highly cited articles
– The largest number g such that the top g articles have received a
combined total of at least g² citations
• E-index (Zhang 2009): gives greater weight to highly cited articles
– The square root of the surplus of citations in the h-set beyond h²
• Contemporary h-index (Sidiropoulos et al. 2006): gives greater
weight to newer articles
– “parameterized”: citations to an article published in the current
year count 4 times; four years ago, 1 time; six years ago, 4/6
times
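The g-index and e-index follow mechanically from those definitions; a minimal sketch, reusing the Scholar A data from the earlier example:

```python
import math

def g_index(citation_counts):
    """Largest g such that the top g papers together have received
    at least g**2 citations (Egghe 2006)."""
    counts = sorted(citation_counts, reverse=True)
    total, g = 0, 0
    for rank, cites in enumerate(counts, start=1):
        total += cites                # running total of the top-rank papers
        if total >= rank ** 2:
            g = rank
    return g

def e_index(citation_counts):
    """Square root of the citation surplus in the h-set beyond the
    theoretical minimum h**2 needed for that h (Zhang 2009)."""
    counts = sorted(citation_counts, reverse=True)
    h = sum(1 for rank, cites in enumerate(counts, start=1) if cites >= rank)
    surplus = sum(counts[:h]) - h ** 2
    return math.sqrt(surplus)

scholar_a = [10, 10, 9, 8, 7, 6, 6]
print(g_index(scholar_a))  # 7 (top 7 papers: 56 >= 49 citations)
print(e_index(scholar_a))  # ~3.74 (h-set has 50 citations, surplus 14 over 36)
```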
31. Variations on the H-index
• Individual h-index (Batista et al. 2006): accounts for co-authors
– Divides the h-index by the average number of authors per paper
• Alternative individual h-index (Harzing): accounts for co-authors
– Normalizes citation counts: divides each paper’s citations by its
number of authors, then computes the h-index on the normalized counts
• Another alternative individual h-index (Schreiber 2006):
accounts for co-authors
– Uses fractional paper counts instead of dividing citations,
keeping the full citation count
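The first two co-author corrections can be sketched as follows (the seven-paper data and the two-authors-per-paper assumption are illustrative, not from the slides):

```python
def h_index(citation_counts):
    """Plain Hirsch h-index on a list of per-paper citation counts."""
    counts = sorted(citation_counts, reverse=True)
    return sum(1 for rank, cites in enumerate(counts, start=1) if cites >= rank)

def individual_h(citation_counts, author_counts):
    """Batista et al. 2006: the h-index divided by the mean number
    of authors per paper."""
    return h_index(citation_counts) / (sum(author_counts) / len(author_counts))

def alternative_individual_h(citation_counts, author_counts):
    """Harzing's variant: divide each paper's citations by its own
    author count first, then compute the h-index on those values."""
    normalized = [c / a for c, a in zip(citation_counts, author_counts)]
    return h_index(normalized)

citations = [10, 10, 9, 8, 7, 6, 6]
authors = [2, 2, 2, 2, 2, 2, 2]       # every paper co-authored by two people
print(individual_h(citations, authors))             # 3.0 (h of 6, mean 2 authors)
print(alternative_individual_h(citations, authors)) # 4 (h of the halved counts)
```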
32. Variations on the H-index
• Age weighted citation rate and AW index (Jin
2007): accounts for variations in citation
patterns over time
– AWCR = the sum of all age-weighted citation
counts (each paper’s citations divided by the
paper’s age in years)
– AW-index = the square root of the AWCR
– Per-author AWCR: each paper’s age-weighted count is
also divided by its number of authors
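A minimal sketch of the age weighting, assuming a hypothetical corpus given as (citations, age-in-years) pairs:

```python
import math

def awcr(papers):
    """Age-weighted citation rate (Jin 2007): each paper's citation
    count is divided by the paper's age in years, then summed."""
    return sum(cites / age for cites, age in papers)

def aw_index(papers):
    """AW-index: the square root of the AWCR, so that a stable
    citation rate yields a value comparable to the h-index."""
    return math.sqrt(awcr(papers))

# Hypothetical corpus: (citations, age in years) for each paper.
corpus = [(10, 2), (9, 3), (4, 1)]
print(awcr(corpus))      # 12.0  (5 + 3 + 4)
print(aw_index(corpus))  # ~3.46 (sqrt of 12)
```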
33. Publish or Perish
• Retrieves citation data from Google Scholar
• Good for interdisciplinary topics and fields
relying on conference papers or reports
• Offers the greatest variety of metrics
• Drawbacks: dirty data, unverified data,
nonscholarly sources
50. Considerations
• Don’t measure an individual article’s impact by the
metrics for the entire journal
• Do I need a comparison within a discipline or across
disciplines?
• Does the citation pattern matter or just the count?
• Does the database being used cover my subject as
thoroughly as possible?
• To what degree does my subject area rely on
non-journal scholarly publications?
• Not all citing references are positive!
The “Father of Bibliometrics” wanted to trace scholarly thought. Bibliometrics are an empirical measurement: the importance of a scholar, journal, etc., can be gauged either by reputation or by bibliometric measurement. Both have merits.
The sources differ in the number of journals covered, how those journals are evaluated, and their pros and cons. Google Scholar has more foreign-language material, conference proceedings, government reports, unpublished manuscripts, dissertations, and theses. Scopus is not consistent before 1996 (even then, questionable!); Google Scholar covers roughly the same time frame; WoS goes back very far.
First measurement, developed by E. Garfield
A JIF greater than 1 means the journal received more citing references that year than the number of articles it published in the previous two years.
Carl Bergstrom and team at the University of Washington. Scaled so that the sum of all “citation traffic” appearing in JCR for that year = 100. So the Eigenfactor measures how likely that journal is to appear within the total citations in the JCR.
Henk Moed, Leiden University, Netherlands. In this case, the idea is: how much is the journal cited relative to how much it “could have been” cited? Different disciplines have different generally accepted citation potentials.
“Normalizing” the impact by dividing by citation potential accounts for discrepancies in citation rates between disciplines. [Need an example to do live]
SCImago grew out of a think tank at the University of Granada in Spain.
Library Science
Most often used for promotion and tenure dossiers
A g-index of 3 means the top 3 articles received a combined 9 citations. E-index: h² is the theoretical minimum number of citations required to obtain an h-index of h; if there are highly cited articles, there will be a larger surplus of citations beyond that minimum. The contemporary h-index is a concern for those with career interruptions (birth of a child, etc.).
In certain disciplines co-authors are a detriment; in others it doesn’t matter.
Measures the number of citations for a corpus of work, adjusted for age: each paper’s citation count is divided by the age of the paper, and the weighted counts are summed (AWCR). The AW-index then takes the square root, which allows comparison with the h-index; if the rate of citation is stable, it will approximate the h-index.
Knowing the differences in these databases’ coverage and knowing what each metric can provide can help bring the strong points of a journal’s/author’s/article’s/institution’s scholarly record to light.