A brief introduction to bibliometrix

Massimo Aria and Corrado Cuccurullo

2020-09-28

bibliometrix

https://www.bibliometrix.org

Latest version

## bibliometrix  3.0.3

Citation for package ‘bibliometrix’

To cite bibliometrix in publications, please use:

Aria, M. & Cuccurullo, C. (2017) bibliometrix: An R-tool for comprehensive science mapping analysis, Journal of Informetrics, 11(4), pp 959-975, Elsevier.

A BibTeX entry for LaTeX users is

@Article{,
  author = {Massimo Aria and Corrado Cuccurullo},
  title = {bibliometrix: An R-tool for comprehensive science mapping analysis},
  journal = {Journal of Informetrics},
  volume = {11},
  number = {4},
  pages = {959-975},
  publisher = {Elsevier},
  year = {2017},
  url = {https://doi.org/10.1016/j.joi.2017.08.007},
}

Authors’ affiliations

Dr. Massimo Aria

Full Professor in Social Statistics

PhD in Computational Statistics

Laboratory and Research Group STAD Statistics, Technology, Data Analysis

Department of Economics and Statistics

University of Naples Federico II

http://www.massimoaria.com

Dr. Corrado Cuccurullo

Full Professor in Strategy and Corporate Governance

PhD in Management

Department of Management and Economics

University of Campania Luigi Vanvitelli

https://sites.google.com/site/cocuccurunina2/

Introduction

The bibliometrix package provides a set of tools for quantitative research in bibliometrics and scientometrics.

Bibliometrics turns the main tool of science, quantitative analysis, on itself. Essentially, bibliometrics is the application of quantitative analysis and statistics to publications such as journal articles and their accompanying citation counts. Quantitative evaluation of publication and citation data is now used in almost all scientific fields to evaluate the growth, maturity, leading authors, conceptual and intellectual maps, and trends of a scientific community.

Bibliometrics is also used in research performance evaluation, especially in university and government labs, by policymakers, research directors and administrators, information specialists and librarians, and scholars themselves.

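bibliometrix supports scholars in three key phases of analysis:

- Data importing and conversion to R format;

- Bibliometric analysis of a publication dataset;

- Building matrices for co-citation, coupling, collaboration, and co-word analysis.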

Bibliographic databases

bibliometrix works with data extracted from the four main bibliographic databases: SCOPUS, Clarivate Analytics Web of Science, Cochrane Database of Systematic Reviews (CDSR) and RISmed PubMed/MedLine.

SCOPUS (https://www.scopus.com), founded in 2004, offers a great deal of flexibility for the bibliometric user. It lets users query different fields, such as titles, abstracts, keywords, references and so on. SCOPUS makes it relatively easy to download query results, although there are some limits on very large result sets with over 2,000 items.

Clarivate Analytics Web of Science (WoS) (https://www.webofknowledge.com), owned by Clarivate Analytics, was founded by Eugene Garfield, one of the pioneers of bibliometrics. This platform includes many different collections.

Cochrane Database of Systematic Reviews (https://www.cochranelibrary.com/cdsr/about-cdsr) is the leading resource for systematic reviews in health care. The CDSR includes Cochrane Reviews (the systematic reviews) and protocols for Cochrane Reviews as well as editorials. The CDSR also has occasional supplements. The CDSR is updated regularly as Cochrane Reviews are published “when ready” and form monthly issues; see publication schedule.

PubMed comprises more than 28 million citations for biomedical literature from MEDLINE, life science journals, and online books. Citations may include links to full-text content from PubMed Central and publisher websites.

Data acquisition

Bibliographic data may be obtained by querying the SCOPUS or Clarivate Analytics Web of Science (WoS) database by diverse fields, such as topic, author, journal, timespan, and so on.

In this example, we show how to download data, querying a term in the manuscript title field.

We choose the generic term “bibliometrics”.

Querying from Clarivate Analytics WoS

At the link https://www.webofknowledge.com, select Web of Science Core Collection database.

Write the keyword “bibliometrics” in the search field and select title from the drop-down menu (see figure 1).

Figure 1

Choose SCI-EXPANDED and SSCI citation indexes.

The search yielded 291 results on May 09, 2016.

Results can be refined using options on the left side of the page (the type of manuscript, source, subject category, etc.).

After refining the query, you can add records to your Marked List by clicking the button “add to marked list” at the end of the page and selecting the records to save (see figure 2).

Figure 2

The Marked List page provides you with a list of publications selected and various means of exporting data.

To export the data you desire, choose the export tool and follow the three intuitive steps (see figure 3).

Figure 3

The export tool allows you to choose which fields to save. Select the fields you are interested in (for example, all the available data about the marked records).

To download an export file in a format suitable for the bibliometrix package, make sure to select the option “Save to Other File Formats” and choose BibTeX or Plain Text.

The WoS platform allows exporting only 500 records at a time.

The Clarivate Analytics Web of Science export tool creates an export file with a default name “savedrecs” with an extension “.txt” or “.bib” for plain text or BibTeX format respectively. Export files can be separately stored.
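Because of the 500-record limit, a larger result set has to be exported in several batches. As a minimal sketch (the record count below is a made-up figure and the variable names are illustrative only), the record ranges to request in each export can be computed in base R:

```r
# Hypothetical size of a WoS result set (illustrative value only)
n_records  <- 1234
batch_size <- 500   # export limit stated above

# First and last record of each export batch
from <- seq(1, n_records, by = batch_size)
to   <- pmin(from + batch_size - 1, n_records)

batches <- data.frame(batch = seq_along(from), from = from, to = to)
print(batches)
```

Each batch produces its own export file, so it helps to rename the files as they are downloaded (the default name is always “savedrecs”).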

Querying from SCOPUS

SCOPUS is accessed via https://www.scopus.com.

To find all articles whose title includes the term “bibliometrics”, simply write this keyword in the search field and select “Article Title” (see figure 4).

Figure 4

The search yielded 414 results on May 09, 2016.

You can download the references (up to 2,000 full records) by checking the ‘Select All’ box and clicking on the link ‘Export’. Choose the file type “BibTeX export” and “all available information” (see figure 5).

Figure 5

The SCOPUS export tool creates an export file with the default name “scopus.bib”.

bibliometrix installation

Download and install the most recent version of R (https://cran.r-project.org).

Download and install the most recent version of RStudio (https://rstudio.com).

Open RStudio and, in the console window, type:

install.packages("bibliometrix", dependencies=TRUE) ### installs bibliometrix package and dependencies

library(bibliometrix)   ### load bibliometrix package
## To cite bibliometrix in publications, please use:
## 
## Aria, M. & Cuccurullo, C. (2017) bibliometrix: An R-tool for comprehensive science mapping analysis, Journal of Informetrics, 11(4), pp 959-975, Elsevier.
##                         
## 
## http:\\www.bibliometrix.org
## 
##                         
## To start with the shiny web-interface, please digit:
## biblioshiny()
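When the installation step is part of a script that is run repeatedly, it can be convenient to install a package only if it is missing. The helper below is a sketch of that idea; ensure_package is a hypothetical name introduced here, not a bibliometrix function.

```r
# Hypothetical helper (not part of bibliometrix): install a package only
# when it is not already available, then report whether it can be loaded.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, dependencies = TRUE)
  }
  requireNamespace(pkg, quietly = TRUE)
}

# Usage (commented out because it may trigger a download):
# if (ensure_package("bibliometrix")) library(bibliometrix)
```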

Data loading and converting

The export file can be read and converted in R using the function convert2df:

convert2df(file, dbsource, format)

The argument file is a character vector containing the names of the export files downloaded from the SCOPUS, Clarivate Analytics WoS, Digital Science Dimensions, PubMed or Cochrane CDSR website. file can also contain the name of a json/xml object downloaded using the Digital Science Dimensions or PubMed APIs (through the packages dimensionsR and pubmedR).

e.g. file <- c("file1.txt", "file2.txt", …)

file <- "https://www.bibliometrix.org/datasets/savedrecs.bib"

M <- convert2df(file = file, dbsource = "isi", format = "bibtex")
## 
## Converting your isi collection into a bibliographic dataframe
## 
## Done!
## 
## 
## Generating affiliation field tag AU_UN from C1:  Done!

convert2df creates a bibliographic data frame in which rows correspond to manuscripts and columns to the Field Tags of the original export file.

convert2df accepts two additional arguments: dbsource and format.

The argument dbsource indicates from which database the collection has been downloaded.

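Its accepted values include:

- “isi” or “wos” (for the Clarivate Analytics Web of Science database);

- “scopus” (for the SCOPUS database);

- “dimensions” (for the Digital Science Dimensions database);

- “pubmed” (for the PubMed/MedLine database);

- “cochrane” (for the Cochrane Library database of systematic reviews).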

The argument format indicates the file format of the imported collection. It can be “plaintext” or “bibtex” for a WoS collection and must be “bibtex” for a SCOPUS collection. The argument is ignored if the collection comes from PubMed or Cochrane.
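Since WoS export files use the extension “.txt” for plain text and “.bib” for BibTeX, the value of format can be derived from the file name. The following is a minimal sketch of that mapping; guess_format is a hypothetical helper, not a function of the package.

```r
# Hypothetical helper: map a WoS export file extension to the value
# expected by the `format` argument of convert2df().
guess_format <- function(file) {
  ext <- tolower(tools::file_ext(file))
  switch(ext,
         txt = "plaintext",
         bib = "bibtex",
         stop("Unrecognised export file extension: ", ext))
}

guess_format("savedrecs.txt")
guess_format("https://www.bibliometrix.org/datasets/savedrecs.bib")
```

The result could then be supplied as format = guess_format(file) in a convert2df call with dbsource = "isi".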

Each manuscript contains several elements, such as authors’ names, title, keywords and other information. All these elements constitute the bibliographic attributes of a document, also called metadata.

Data frame columns are named using the standard Clarivate Analytics WoS Field Tag coding scheme.

The main field tags are:

Field Tag   Description
AU          Authors
TI          Document Title
SO          Publication Name (or Source)
JI          ISO Source Abbreviation
DT          Document Type
DE          Authors’ Keywords
ID          Keywords associated by SCOPUS or ISI database
AB          Abstract
C1          Author Address
RP          Reprint Address
CR          Cited References
TC          Times Cited
PY          Year
SC          Subject Category
UT          Unique Article Identifier
DB          Bibliographic Database

For a complete list of field tags see https://www.bibliometrix.org/documents/Field_Tags_bibliometrix.pdf
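To illustrate how these field tags are used as column names, the sketch below builds a small toy data frame by hand (the records are invented placeholders, not real data, and only a few of the tags are included) and extracts some information from it with base R. It assumes that multiple authors in AU are separated by “;”.

```r
# Toy data frame using a few of the field tags above (hypothetical records)
M_toy <- data.frame(
  AU = c("AUTHOR A;AUTHOR B", "AUTHOR C"),
  TI = c("FIRST TOY TITLE", "SECOND TOY TITLE"),
  PY = c(2014, 2015),
  TC = c(3, 12),
  stringsAsFactors = FALSE
)

# First author of each manuscript: first element of the ";"-separated AU field
first_authors <- vapply(strsplit(M_toy$AU, ";", fixed = TRUE),
                        function(a) a[1], character(1))

# Manuscripts sorted by times cited (TC), most cited first
most_cited <- M_toy[order(-M_toy$TC), c("AU", "PY", "TC")]
print(most_cited)
```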

Bibliometric Analysis

The first step is to perform a descriptive analysis of the bibliographic data frame.

The function biblioAnalysis calculates main bibliometric measures using this syntax:

results <- biblioAnalysis(M, sep = ";")

The function biblioAnalysis returns an object of class “bibliometrix”.

An object of class “bibliometrix” is a list containing the following components:

List element          Description
Articles              the total number of manuscripts
Authors               the authors’ frequency distribution
AuthorsFrac           the authors’ frequency distribution (fractionalized)
FirstAuthors          corresponding author of each manuscript
nAUperPaper           the number of authors per manuscript
Appearances           the number of author appearances
nAuthors              the number of authors
AuMultiAuthoredArt    the number of authors of multi-authored articles
MostCitedPapers       the list of manuscripts sorted by citations
Years                 publication year of each manuscript
FirstAffiliation      the affiliation of the corresponding author
Affiliations          the frequency distribution of affiliations (of all co-authors for each paper)
Aff_frac              the fractionalized frequency distribution of affiliations (of all co-authors for each paper)
CO                    the affiliation country of the corresponding author
Countries             the affiliation countries’ frequency distribution
CountryCollaboration  the intra-country (SCP) and inter-country (MCP) collaboration indices
TotalCitation         the number of times each manuscript has been cited
TCperYear             the yearly average number of times each manuscript has been cited
Sources               the frequency distribution of sources (journals, books, etc.)
DE                    the frequency distribution of authors’ keywords
ID                    the frequency distribution of keywords associated to the manuscript by SCOPUS and Thomson Reuters’ ISI Web of Knowledge databases
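The difference between a plain and a fractionalized frequency distribution can be shown on toy data. The sketch below is not the package's implementation; it assumes the usual convention that each manuscript contributes a total weight of 1, split equally among its authors, and the author names are placeholders.

```r
# Toy AU field (hypothetical): authors of three manuscripts, separated by ";"
au <- c("AUTHOR A;AUTHOR B", "AUTHOR A", "AUTHOR B;AUTHOR C;AUTHOR D")

author_list <- strsplit(au, ";", fixed = TRUE)
n_per_paper <- lengths(author_list)      # number of authors per manuscript
authors     <- unlist(author_list)

# Plain frequency: one count per author appearance
full_counts <- table(authors)

# Fractionalized frequency: each appearance weighs 1 / (authors on that paper)
weights     <- rep(1 / n_per_paper, times = n_per_paper)
frac_counts <- tapply(weights, authors, sum)

print(full_counts)
print(round(frac_counts, 2))
```

Under this convention the fractionalized counts sum to the number of manuscripts, while the plain counts sum to the number of author appearances.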

Functions summary and plot

To summarize the main results of the bibliometric analysis, use the generic function summary. It displays the main information about the bibliographic data frame and several tables, such as annual scientific production, top manuscripts per number of citations, most productive authors, most productive countries, total citations per country, most relevant sources (journals) and most relevant keywords.

The main information table describes the collection size in terms of number of documents, number of authors, number of sources, number of keywords, timespan, and average number of citations.

Furthermore, several co-authorship indices are shown. In particular, the Authors per Article index is calculated as the ratio between the total number of authors and the total number of articles. The Co-Authors per Article index is calculated as the average number of co-authors per article. In this case, the index takes into account author appearances, whereas for the Authors per Article index an author is counted only once, even if they have published more than one article. For that reason, Authors per Article index \(\le\) Co-authors per Article index.

The Collaboration Index (CI) is calculated as Total Authors of Multi-Authored Articles/Total Multi-Authored Articles (Elango and Rajendran, 2012; Koseoglu, 2016). In other words, the Collaboration Index is a Co-authors per Article index calculated using only the multi-authored article set.

Elango, B., & Rajendran, P. (2012). Authorship trends and collaboration pattern in the marine sciences literature: a scientometric study. International Journal of Information Dissemination and Technology, 2(3), 166.

Koseoglu, M. A. (2016). Mapping the institutional collaboration network of strategic management research: 1980–2014. Scientometrics, 109(1), 203-226.
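As a worked example, the three indices can be recomputed from the counts reported in the MAIN INFORMATION ABOUT DATA output for this collection. The only assumption made here is that the number of multi-authored documents equals the number of documents minus the single-authored ones.

```r
# Counts reported in the summary output for this collection
documents            <- 291
authors              <- 523
author_appearances   <- 635
single_authored_docs <- 144
authors_multi        <- 402   # authors of multi-authored documents

# Authors per Document: each author counted once
authors_per_doc <- authors / documents

# Co-Authors per Document: every author appearance counted
coauthors_per_doc <- author_appearances / documents

# Collaboration Index: restricted to multi-authored documents
# (assumption: multi-authored = all documents minus single-authored ones)
multi_authored_docs <- documents - single_authored_docs
collaboration_index <- authors_multi / multi_authored_docs

round(c(authors_per_doc, coauthors_per_doc, collaboration_index), 2)
```

The rounded results agree with the values 1.8, 2.18 and 2.73 shown in the AUTHORS COLLABORATION block of the output.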

summary accepts two additional arguments. k is a formatting value that indicates the number of rows of each table. pause is a logical value (TRUE or FALSE) used to allow (or not) a pause in screen scrolling. With k=10, you see the first 10 authors, the first 10 sources, etc.

options(width=100)
S <- summary(object = results, k = 10, pause = FALSE)
## 
## 
## MAIN INFORMATION ABOUT DATA
## 
##  Timespan                              1985 : 2015 
##  Sources (Journals, Books, etc)        141 
##  Documents                             291 
##  Average years from publication        14.7 
##  Average citations per documents       11.73 
##  Average citations per year per doc    0.7463 
##  References                            6768 
##  
## DOCUMENT TYPES                     
##  art exhibit review              1 
##  article                         160 
##  article; proceedings paper      7 
##  biographical-item               1 
##  book review                     32 
##  correction, addition            1 
##  editorial material              41 
##  letter                          16 
##  meeting abstract                4 
##  note                            3 
##  review                          25 
##  
## DOCUMENT CONTENTS
##  Keywords Plus (ID)                    475 
##  Author's Keywords (DE)                365 
##  
## AUTHORS
##  Authors                               523 
##  Author Appearances                    635 
##  Authors of single-authored documents  121 
##  Authors of multi-authored documents   402 
##  
## AUTHORS COLLABORATION
##  Single-authored documents             144 
##  Documents per Author                  0.556 
##  Authors per Document                  1.8 
##  Co-Authors per Documents              2.18 
##  Collaboration Index                   2.73 
##  
## 
## Annual Scientific Production
## 
##  Year    Articles
##     1985        4
##     1986        3
##     1987        6
##     1988        7
##     1989        8
##     1990        6
##     1991        7
##     1992        6
##     1993        5
##     1994        7
##     1995        1
##     1996        8
##     1997        4
##     1998        5
##     1999        2
##     2000        7
##     2001        8
##     2002        5
##     2003        1
##     2004        3
##     2005       12
##     2006        5
##     2007        5
##     2008        8
##     2009       14
##     2010       17
##     2011       20
##     2012       25
##     2013       21
##     2014       29
##     2015       32
## 
## Annual Percentage Growth Rate 7.177346 
## 
## 
## Most Productive Authors
## 
##    Authors        Articles Authors        Articles Fractionalized
## 1     BORNMANN L         8     BORNMANN L                    4.67
## 2     KOSTOFF RN         8     WHITE HD                      3.50
## 3     MARX W             6     MARX W                        3.17
## 4     HUMENIK JA         5     ATKINSON R                    3.00
## 5     ABRAMO G           4     BROADUS RN                    3.00
## 6     D'ANGELO CA        4     CRONIN B                      3.00
## 7     GARG KC            4     BORGMAN CL                    2.50
## 8     GLANZEL W          4     MCCAIN KW                     2.50
## 9     WHITE HD           4     PERITZ BC                     2.50
## 10    ATKINSON R         3     KOSTOFF RN                    2.10
## 
## 
## Top manuscripts per citations
## 
##                                   Paper                                     DOI  TC TCperYear
## 1  DAIM TU, 2006, TECHNOL FORECAST SOC CHANG     10.1016/j.techfore.2006.04.004 211     14.07
## 2  WHITE HD, 1989, ANNU REV INFORM SCI TECHNOL   NA                             196      6.12
## 3  BORGMAN CL, 2002, ANNU REV INFORM SCI TECHNOL NA                             192     10.11
## 4  WEINGART P, 2005, SCIENTOMETRICS              10.1007/s11192-005-0007-7      151      9.44
## 5  NARIN F, 1994, SCIENTOMETRICS                 10.1007/BF02017219             141      5.22
## 6  CRONIN B, 2001, J INF SCI                     NA                             129      6.45
## 7  CHEN YC, 2011, SCIENTOMETRICS                 10.1007/s11192-010-0289-2      101     10.10
## 8  HOOD WW, 2001, SCIENTOMETRICS                 10.1023/A:1017919924342         71      3.55
## 9  D'ANGELO CA, 2011, J AM SOC INF SCI TECHNOL   10.1002/asi.21460               64      6.40
## 10 NARIN F, 1994, EVAL REV                       10.1177/0193841X9401800107      62      2.30
## 
## 
## Corresponding Author's Countries
## 
##           Country Articles   Freq SCP MCP MCP_Ratio
## 1  USA                  81 0.3057  76   5    0.0617
## 2  UNITED KINGDOM       27 0.1019  27   0    0.0000
## 3  GERMANY              17 0.0642  12   5    0.2941
## 4  FRANCE               13 0.0491  11   2    0.1538
## 5  BRAZIL               12 0.0453  10   2    0.1667
## 6  CHINA                10 0.0377   8   2    0.2000
## 7  INDIA                10 0.0377  10   0    0.0000
## 8  AUSTRALIA             8 0.0302   6   2    0.2500
## 9  CANADA                8 0.0302   7   1    0.1250
## 10 SPAIN                 8 0.0302   8   0    0.0000
## 
## 
## SCP: Single Country Publications
## 
## MCP: Multiple Country Publications
## 
## 
## Total Citations per Country
## 
##      Country      Total Citations Average Article Citations
## 1  USA                       1831                     22.60
## 2  GERMANY                    330                     19.41
## 3  ITALY                      163                     32.60
## 4  AUSTRALIA                  134                     16.75
## 5  UNITED KINGDOM             125                      4.63
## 6  CANADA                     111                     13.88
## 7  INDIA                       85                      8.50
## 8  IRAN                        74                     37.00
## 9  SPAIN                       73                      9.12
## 10 BELGIUM                     70                     10.00
## 
## 
## Most Relevant Sources
## 
##                                                            Sources        Articles
## 1  SCIENTOMETRICS                                                               49
## 2  JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY       14
## 3  JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE                       8
## 4  JOURNAL OF DOCUMENTATION                                                      6
## 5  JOURNAL OF INFORMATION SCIENCE                                                6
## 6  JOURNAL OF INFORMETRICS                                                       6
## 7  BRITISH JOURNAL OF ANAESTHESIA                                                5
## 8  LIBRI                                                                         5
## 9  SOCIAL WORK IN HEALTH CARE                                                    5
## 10 TECHNOLOGICAL FORECASTING AND SOCIAL CHANGE                                   5
## 
## 
## Most Relevant Keywords
## 
##    Author Keywords (DE)      Articles Keywords-Plus (ID)     Articles
## 1      BIBLIOMETRICS               63    SCIENCE                   38
## 2      CITATION ANALYSIS           11    INDICATORS                24
## 3      SCIENTOMETRICS               7    IMPACT                    23
## 4      IMPACT FACTOR                5    CITATION                  20
## 5      INFORMATION RETRIEVAL        5    CITATION ANALYSIS         15
## 6      PEER REVIEW                  5    JOURNALS                  14
## 7      CITATION                     4    H-INDEX                   13
## 8      CITATIONS                    4    PUBLICATION               12
## 9      H-INDEX                      4    INFORMATION-SCIENCE       10
## 10     IMPACT FACTORS               4    IMPACT FACTORS             8
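The output does not state how the Annual Percentage Growth Rate is obtained. The reported value is consistent with a compound annual growth rate between the first and the last year of the timespan; the sketch below reproduces it under that assumption, which is made for this example and is not taken from the package documentation.

```r
# First and last rows of the Annual Scientific Production table above
first_year <- 1985; articles_first <- 4
last_year  <- 2015; articles_last  <- 32

# Compound annual growth rate, in percent (assumed formula)
n_years     <- last_year - first_year
growth_rate <- ((articles_last / articles_first)^(1 / n_years) - 1) * 100

growth_rate
```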

Some basic plots can be drawn using the generic function plot:

plot(x = results, k = 10, pause = FALSE)

Analysis of Cited References

The function citations generates the frequency table of the most cited references or the most cited first authors (of references).

For each manuscript, cited references are in a single string stored in the column “CR” of the data frame.

For a correct extraction, you need to identify the field separator used between references by the ISI or SCOPUS database. Usually, the default separator is “;” or “.  ” (a dot followed by two spaces).

# M$CR[1]

The figure shows the reference string of the first manuscript. In this case, the separator field is sep = ";".
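To make the role of the separator concrete, the following base R sketch splits toy reference strings on “;” and tabulates the results. It only illustrates the idea behind a cited-reference frequency table and is not the code used by the citations function; the reference strings are invented placeholders, and the first author is assumed to be the text before the first comma.

```r
# Toy CR strings (hypothetical), one per manuscript, references separated by ";"
cr <- c("REF A, 2001, JOURNAL X;REF B, 1999, JOURNAL Y",
        "REF A, 2001, JOURNAL X;REF C, 2010, JOURNAL Z",
        "REF A, 2001, JOURNAL X")

# Split every string on the separator and strip surrounding whitespace
refs <- trimws(unlist(strsplit(cr, ";", fixed = TRUE)))

# Frequency table of cited references, most cited first
ref_freq <- sort(table(refs), decreasing = TRUE)
print(ref_freq)

# Frequency table of cited first authors (text before the first comma)
first_authors <- sub(",.*$", "", refs)
author_freq   <- sort(table(first_authors), decreasing = TRUE)
print(author_freq)
```

Choosing the wrong separator would leave several references glued together in one string, so they would be counted as a single, spurious reference.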