The General Index : Public Resource : Free Download, Borrow, and Streaming : Internet Archive

Welcome to the General Index

The General Index consists of three tables derived from 107,233,728 journal articles. The first is a table of n-grams, ranging from unigrams to 5-grams, extracted using spaCy. Each of its 355,279,820,087 rows pairs an n-gram with a journal article id. The second table is constructed using YAKE and consists of 19,740,906,314 rows, each pairing a keyword with an article id. The third table associates each article id with metadata.
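To make the row structure concrete, here is a minimal sketch of how (n-gram, article id) rows could be produced and then joined to a metadata table. It is an illustration only, not the pipeline used to build the General Index: plain whitespace splitting stands in for spaCy tokenization, and the article id, the `title` field, and the function names are all hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Set, Tuple

Row = Tuple[str, str]  # (ngram, article_id)


def ngram_rows(article_id: str, text: str, max_n: int = 5) -> Iterator[Row]:
    """Yield one (ngram, article_id) row per n-gram, for n = 1..max_n."""
    # Assumption: whitespace splitting stands in for spaCy tokenization.
    tokens: List[str] = text.split()
    for n in range(1, max_n + 1):
        for start in range(len(tokens) - n + 1):
            yield " ".join(tokens[start:start + n]), article_id


def articles_mentioning(
    term: str, rows: Iterable[Row], metadata: Dict[str, Dict[str, str]]
) -> List[Dict[str, str]]:
    """Return the metadata records of articles that have a row for `term`."""
    ids: Set[str] = {article_id for ngram, article_id in rows if ngram == term}
    return [metadata[i] for i in sorted(ids) if i in metadata]


# Hypothetical article id and metadata record, for illustration only.
rows = list(ngram_rows("article-0001", "Access to knowledge is a human right"))
metadata = {"article-0001": {"title": "An example article"}}
hits = articles_mentioning("human right", rows, metadata)
```

The same lookup pattern applies to the keyword table, since its rows also pair a term with an article id. At the scale of the real tables the join would be run in a database or a columnar query engine rather than in memory.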

Public Resource has also made available The TDM Today Show and an early release of The Florilegium: A Special Index to Plants. A previous article about our work in this area appeared in Nature. The General Index was also the subject of a more recent article in Nature. There is also a Special Index to Species available.

Declaration of Support for the General Index

“Public Resource, a registered nonprofit organization based in California, has created a General Index to scientific journals. The General Index consists of a listing of n-grams, from unigrams to five-grams, extracted from 107 million journal articles.

The General Index is non-consumptive, in that the underlying articles are not released, and it is transformative in that the release consists of the extraction of facts that are derived from that underlying corpus. The General Index is available for free download with no restrictions on use. This is an initial release, and the hope is to improve the quality of text extraction, broaden the scope of the underlying corpus, provide more sophisticated metrics associated with terms, and other enhancements.

Access to the full corpus of scholarly journals is an essential facility to the practice of science in our modern world. The General Index is an invaluable utility for researchers who wish to search for articles about plants, chemicals, genes, proteins, materials, geographical locations, and other entities of interest. The General Index allows scholars and students all over the world to perform specialized and customized searches within the scope of their disciplines and research over the full corpus.

Access to knowledge is a human right and the increase and diffusion of knowledge depends on our ability to stand on the shoulders of giants. We applaud the release of the General Index and look forward to the progress of this worthy endeavor.”

Signatories to the Declaration of Support

  1. Dr. Vinton G. Cerf, Internet Pioneer
  2. Dr. Gitanjali Yadav, National Institute of Plant Genome Research and Cambridge University
  3. Dr. Ross Mounce, Arcadia
  4. Dr. Ian T. Foster, University of Chicago
  5. Dr. Amitabh Joshi, J.C. Bose National Fellow; Jawaharlal Nehru Centre for Advanced Scientific Research
  6. Heather Joseph, Executive Director, SPARC
  7. Dr. Corynne McSherry, Legal Director, Electronic Frontier Foundation
  8. Dr. Lawrence Liang, Ambedkar University School of Law
  9. Dr. Dinesh Singh, Former Vice Chancellor, University of Delhi
  10. Dr. Pamela Samuelson, University of California, Berkeley School of Law
  11. Alexander B. Howard, Director, The Digital Democracy Project
  12. Blair MacIntyre, Professor, Georgia Tech
  13. Kaylea Champion, PhD Student, University of Washington
  14. Samuel Klein, Curator, Knowledge Futures Group
  15. Eric Brunner, Oregonian
  16. Peter C. Richardson, Associate Technical Fellow, The Boeing Company
  17. Federico Leva, Wikimedia Italia
  18. Nick Shockey, Director of Programs & Engagement, SPARC
  19. Christof Schöch, Professor of Digital Humanities, Trier University, Germany
  20. Dave Hansen, Librarian, Duke University
  21. Lambert Heller, Librarian, Leibniz Information Centre for Science and Technology
  22. Cameron Neylon, Professor, Curtin University
  23. Roger Levy, Professor, Massachusetts Institute of Technology
  24. Lingfei Wu, Professor, The University of Pittsburgh
  25. Onur Varol, Professor, Sabanci University
  26. Matthew Elvey, Medical Researcher, Yale University
  27. Daniel Stökl Ben Ezra, Professor, Ecole Pratique des Hautes Etudes
  28. James Evans, Professor, University of Chicago
  29. Peter Suber, Office for Scholarly Communication, Harvard University
  30. Philip Young, Librarian, Virginia Tech
  31. Gavin Moodie, Doctor, University of Toronto
  32. Memo Cordova, Librarian, Boise State University
  33. Oscar Perea Rodriguez, Lecturer, University of San Francisco
  34. Kyle K. Courtney, Copyright Advisor, Harvard University
  35. Agitha T.G, Professor, Retired
  36. Subbiah Arunachalam, Professor, Indian Institute of Science
  37. Rochelle Pinto, Independent Researcher
  38. Rahul Siddharthan, Professor "G", The Institute of Mathematical Sciences, Chennai
  39. Dr. Himender Bharti, Professor, Punjabi University, Patiala
  40. Fernando Gonzalez-Candelas, Professor, University of Valencia, Spain
  41. Jasjeet Singh Bagla, Professor, IISER Mohali
  42. Anirudh Gupta, Data Scientist, Thoughtworks
  43. Michael Travers, Software Engineer, Parker Institute for Cancer Immunotherapy
  44. M P Gururajan, Professor, IIT Bombay
  45. Tim O'Reilly, CEO, O'Reilly Media
  46. Chris Mills, Engineering Manager, Indeed
  47. Mark Johnson, Technologist and Adjunct Professor, North Carolina State University
  48. Jeff Cox, Lawyer, UniCourt
  49. Cable Green, Director of Open Education, Creative Commons
  50. Ashutosh Sharma, Research Student, The University of Trans-Disciplinary Health Sciences and Technology, India
  51. David S. Reed, Founder, Center for Public Administrators
  52. Jean-Claude Guédon, Professor (retired), Université de Montréal
  53. Chris Hartgerink, Director, Liberate Science GmbH
  54. Khaeruddin Kiramang, Student, Curtin University
  55. Ramy Arnaout, Professor, Beth Israel Deaconess Medical Center
  56. Ian Connor, Industry Supervisor, QUT
  57. Martin R. Lucas, Lawyer, Chambers of Martin R. Lucas
  58. Jorge Cortell, Former Associate Professor of Intellectual Property, Polytechnic University of Valencia
  59. Lane Rasberry, Research Scientist, School of Data Science, University of Virginia
  60. James Clement, Research Scientist, Betterhumans Inc
  61. Uri Hasson, Professor of Cognitive Neuroscience, University of Trento Italy
  62. LJ Eads, Data Scientist, The MITRE Corporation
  63. Jerry Goldman, Professor Emeritus, Northwestern University
  64. Alex O. Holcombe, Professor, The University of Sydney
  65. Rajarshi Das, Research, FatBrain.AI
  66. M Madhan, Librarian, Jindal Global University
  67. Mark Hahnel, CEO, Figshare
  68. Nidhal Selmi, Software Engineer, Arizona State University
  69. John J. Murphy, Physician/Clinical Informaticist, Veterans Health Administration
  70. Allen Riddell, Assistant Professor, Indiana University Bloomington
  71. Derek Hefley, Graduate Student, Missouri S&T
  72. Antonio Max, Author, Independent Researcher
  73. Bethaney Hatch, Executive Assistant, ArrantaBio
  74. Deborah Salerno, Independent Medical Writer, Salerno Scientific
  75. Geethanjali Sreenivasarao Pavar, PhD Researcher, University of Edinburgh
  76. Dr. Nimal Chandrasena, Former Associate Professor, University of Colombo
  77. Carlos Denner, Professor, University of Brasilia
  78. Álvaro Saladén Roa, Professor, Universidad de Cartagena
  79. NAFIUL, Student, Mymensingh Medical College
  80. Vincent Raymond, Professor, Université Laval (Ville de Québec)
  81. Dr. O. O. Ilori, Head of Department, Obafemi Awolowo University
  82. Gina Santos Itchon, Professor, Xavier University - Ateneo de Cagayan
  83. Dr. Marjorie J. Hinds, Independent Researcher
  84. Rafael Lairet, Professor, Universidad Simón Bolívar
  85. C. Mitchell Clark, University of Nebraska-Lincoln
  86. Oladimeji Oluwalasinu, Student, Obafemi Awolowo University Ile-Ife
  87. Zhiwen Hu, Professor, Zhejiang Gongshang University
  88. Sahaya G. Selvam, Associate Professor, Marist International University College
  89. Dr. Johannes Kabisch, Professor, NTNU
  90. Philip Meier, CTO, Maila Health
  91. Filip Vukovinski, Researcher, Staatsinstitut
  92. Catherine Demoliou, Professor, University of Nicosia
  93. Daniel Mietchen, Researcher, Ronin Institute
  94. Marc Robinson-Rechavi, Professor, University of Lausanne
  95. Tristan Henderson, Graduate Student, Mississippi State University
  96. Ivan Arisi, Scientist, European Brain Research Institute (EBRI)
  97. V. Jithin, Student, Wildlife Institute of India
  98. Amos Bairoch, Professor, Swiss Institute of Bioinformatics
  99. Peter Murray-Rust, Dr, University of Cambridge
  100. Robert H'obbes' Zakon, Founding Principal, Zakon Group LLC