Case Study: Public Library of Science Thesaurus: Year One


Published on March 17, 2014

Author: accessinnovations



Presented at the 10th annual Data Harmony Users Group meeting on Tuesday, February 11, 2014 by Rachel Drysdale of PLOS. Discusses the process of building and integrating their new thesaurus into the PLOS journals workflow and publication platform. From constructing the thesaurus to creating channels for feedback and updates, through building new current awareness and discovery tools, to gathering data for article level metrics and web site analytics, follow their progress through to today’s PLOS websites and services.

The PLOS Thesaurus: the first year
Rachel Drysdale – Taxonomy Manager, PLOS
DHUG 2014, 11th February, 2014

Public Library of Science - evolution
• 2000 PLOS founded
• 2003 PLOS Biology
• 2004 PLOS Medicine
• 2005 PLOS Computational Biology (June), PLOS Genetics (July), PLOS Pathogens (September)
• 2006 PLOS ONE
• 2007 PLOS Neglected Tropical Diseases

Journal                             Article Count
PLOS Biology                        3,450
PLOS Medicine                       2,626
PLOS Computational Biology          3,112
PLOS Genetics                       4,048
PLOS Pathogens                      3,639
PLOS ONE                            87,296
PLOS Neglected Tropical Diseases    2,444

The same article counts, captioned: beautiful monster….

Overview – today’s talk
The Solution: Good Thesaurus + Machine Aided Indexing
• Building the new Thesaurus with AI
• The initial implementation at
• MAIstro integration into Publishing workflow
• Thesaurus maintenance
The Service:
• Content Discovery
• Article Analysis
• Relative Metrics

Starting point 2011 – the old Taxonomy
• Inadequate in content – just over 3100 specific terms
• Inflexible in structure – terms in pre-defined paths
• Housed in Editorial Manager – ossified and difficult to update
• Author-chosen terms - association with article

PLOS delivered to Access Innovations….
• A copy of the old PLOS Taxonomy
• Over 2,000 suggested changes
• “Research analysis and methods” branch request
• Use cases:
– Subject Area-based searches
– Hierarchy-based exploration of our corpus
– Email Alerts based on Subject Area searches
– RSS Feeds based on Subject Areas

Access Innovations added:
• STEM vocabulary
• Broader/Narrower term relationships
• Rules for the Machine Aided Indexing
• Synonyms
• Analysis with respect to the PLOS corpus
To and fro with PLOS ….
Result: Vastly improved NISO Z39.19-compliant thesaurus

Statistics

            Old Taxonomy    A. I. Thesaurus
Terms       3,132           10,156
Synonyms    0               3,291
Tiers       5               7
Rules       0               14,798

Top-level Terms
1. Biology and life sciences
2. Computer and information sciences
3. Earth sciences
4. Engineering and technology
5. Environmental sciences and ecology
6. Medicine and health sciences
7. Physical sciences
8. Research and analysis methods
9. Science policy
10. Social sciences

Infrastructure
• PLOS Taxonomy server: Thesaurus – plos2012thes
• Data Harmony Thesaurus Master and MAI Rule Builder
• Corpus fed to the Taxonomy Server for MAI, article by article
• Initial implementation: Title – Abstract - Results – Methods
• Top 8 hits selected

Elapsed time from project kick-off until terms appeared on published articles: 9 months

Learning curve – teething troubles
• Not all articles had Subject Area terms – why not?
• Initial implementation – text to index: Title + Abstract* + Results + Methods
• Upon consideration – text to index: Full Text (though not references)
• Implementation of “all paths”
• Polyhierarchy implications

The polyhierarchy and Search
Consider “White blood cells”, which sits on four paths:
• Biology and life sciences > Immunology > Immune cells > White blood cells
• Medicine and health sciences > Immunology > Immune cells > White blood cells
• Biology and life sciences > Cell biology > Cellular types > Animal cells > Blood cells > White blood cells
• Biology and life sciences > Cell biology > Cellular types > Animal cells > Immune cells > White blood cells
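The “all paths” idea can be sketched as a walk up the broader-term links from a term to every top-level term. This is only an illustration: the BROADER mapping below is a hypothetical reconstruction of the slide's example, and all_paths is an invented helper, not part of Data Harmony or MAIstro.

```python
# Hypothetical broader-term links, reconstructed from the slide's example.
BROADER = {
    "White blood cells": ["Immune cells", "Blood cells"],
    "Immune cells": ["Immunology", "Animal cells"],
    "Blood cells": ["Animal cells"],
    "Animal cells": ["Cellular types"],
    "Cellular types": ["Cell biology"],
    "Cell biology": ["Biology and life sciences"],
    "Immunology": ["Biology and life sciences", "Medicine and health sciences"],
}


def all_paths(term, broader=BROADER):
    """Return every path from a top-level term down to `term`."""
    parents = broader.get(term, [])
    if not parents:
        # No broader term: this is a top-level term, so the path starts here.
        return [[term]]
    paths = []
    for parent in parents:
        for path in all_paths(parent, broader):
            paths.append(path + [term])
    return paths


if __name__ == "__main__":
    for path in all_paths("White blood cells"):
        print(" > ".join(path))
```

Indexing an article with “White blood cells” under this scheme makes it findable from any of the four paths, which is why polyhierarchy matters for hierarchy-based search.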

Establishing update cycle - articles:
• Initial implementation: Entire back-corpus indexed at once
• New Papers:
– PLOS submits text to MAIstro at publication
– MAI returns terms and term frequencies
– PLOS stores terms in search engine
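A minimal sketch of the selection step, assuming the returned terms arrive as a term-to-frequency mapping and that the “top 8” are chosen by frequency (the slides do not state the ranking criterion). The function name and the sample data are hypothetical; this does not show the MAIstro interface itself.

```python
def top_terms(term_frequencies, limit=8):
    """Pick the `limit` most frequent terms; ties are broken alphabetically."""
    ranked = sorted(term_frequencies.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _count in ranked[:limit]]


if __name__ == "__main__":
    # Invented frequencies for one article, for illustration only.
    sample = {
        "White blood cells": 12,
        "Immune cells": 9,
        "Apoptosis": 9,
        "Cell biology": 4,
        "Immunology": 3,
    }
    print(top_terms(sample, limit=3))
```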

Establishing update cycle - thesauri:
Separate instances (nerves):
• Production server – plosthes.2013-6
• Working version – plosthes.2013-7
When ready to release a new version:
• Load onto test server – MAI corpus - Index
• Test:
– new/changed/deleted terms
– rule changes
– structural changes
– any implementation changes
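Checking for new, deleted and structurally changed terms between the production and working versions can be sketched as a comparison of two term-to-broader-terms mappings. The data structure, the function name and the sample terms are assumptions for illustration; the actual Data Harmony export format is not shown in the slides.

```python
def diff_thesauri(old, new):
    """Compare two {term: [broader terms]} mappings.

    Returns added terms, deleted terms, and terms whose broader terms changed.
    """
    old_terms, new_terms = set(old), set(new)
    moved = sorted(
        term for term in old_terms & new_terms
        if set(old[term]) != set(new[term])
    )
    return {
        "added": sorted(new_terms - old_terms),
        "deleted": sorted(old_terms - new_terms),
        "moved": moved,
    }


if __name__ == "__main__":
    # Invented miniature versions, standing in for the production and working copies.
    production = {"T cells": ["White blood cells"], "Snails": ["Animals"]}
    working = {
        "T cells": ["White blood cells"],
        "Memory T cells": ["T cells"],
        "Snails": ["Molluscs"],
    }
    print(diff_thesauri(production, working))
```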

Thesaurus updates – why?
• More terms: Memory T cells, Monocotyledons
• Errrm…: Report gene detection
• What?: Webs
• Hierarchy changes deemed desirable: Geographical locations, Organisms
• (Un)Rule(y): snails, fabrication, pumas

Thesaurus updates – how?

Rule-Building in MAIstro – Pumas before...
p53 upregulated modifier of apoptosis or

Rule-Building in MAIstro – Pumas after…
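The kind of disambiguation such a rule has to perform can be sketched as a context check: “puma” alongside gene vocabulary is treated as the p53 upregulated modifier of apoptosis, otherwise as the animal. This is not MAIstro rule syntax, and the context words and the returned term names (“Apoptosis”, “Pumas”) are hypothetical choices for illustration.

```python
import re

# Hypothetical context vocabulary suggesting the gene sense of "PUMA".
GENE_CONTEXT = re.compile(r"\b(apoptosis|p53|gene|protein)\b", re.IGNORECASE)
PUMA = re.compile(r"\bpumas?\b", re.IGNORECASE)


def index_puma(text):
    """Return a (hypothetical) Subject Area term for a text mentioning puma(s)."""
    if not PUMA.search(text):
        return None
    if GENE_CONTEXT.search(text):
        return "Apoptosis"  # assumed target term for the gene sense
    return "Pumas"          # assumed target term for the animal sense


if __name__ == "__main__":
    print(index_puma("PUMA mediates p53-dependent apoptosis in tumour cells"))
    print(index_puma("Pumas were tracked with GPS collars"))
```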


Thesaurus updates – prioritisation?
Mis-hits and missed term reports:
• Ourselves: article pages
• Our readers:
– in email complaints
– on Twitter
– in correspondence with our editorial staff
– via Journal and Saved Search alerts
– via article pages – Flagged Term reports


Things we learned – Thesaurus editorial
• Tension: strict and rigorous taxonomy/ontology construction vs user utility
• Abbreviations and Synonyms
• Issues that continue to exercise us:
– T cells/Memory T cells
– Obesity/Childhood obesity
– When should we make both explicit?
• Rule work – working to top 8

Building a new project - exports

Building a new project - import

Content Discovery How has having the thesaurus changed the way that users interact with PLOS web sites?

Problem: How to keep up?
Solution: Current Awareness Tools
• Journal alerts
• Saved Searches
• RSS feeds
• Hierarchy exploration


Journal alerts

Saved search

RSS feeds

Hierarchy exploration

Relative Metrics

Relative Metrics: Defining a Paper’s Peer Group
1. Group papers by Subject Area
– Accommodate multiple topics per paper
2. Group papers by age
– Important for comparison of cumulative measures like total downloads or citations
3. Determine norms for peer group
– The average usage of each paper is compared with the median usage of its peer group
More on Relative Metrics at:
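The three steps can be sketched as follows, assuming each paper carries a list of Subject Areas, a publication year and a single usage figure. The field names, the use of publication year as the age grouping, and the sample numbers are all assumptions; this is not the PLOS implementation.

```python
from statistics import median


def relative_usage(papers, paper_id, subject_area):
    """Ratio of a paper's usage to the median usage of its peer group.

    The peer group is every paper in the same Subject Area published in the
    same year (the paper itself included).
    """
    target = papers[paper_id]
    if subject_area not in target["subject_areas"]:
        raise ValueError("paper is not indexed with this Subject Area")
    peer_usage = [
        paper["usage"]
        for paper in papers.values()
        if subject_area in paper["subject_areas"] and paper["year"] == target["year"]
    ]
    norm = median(peer_usage)
    if norm == 0:
        raise ValueError("peer group median usage is zero")
    return target["usage"] / norm


if __name__ == "__main__":
    # Invented papers; a paper may belong to several Subject Areas.
    papers = {
        "a": {"subject_areas": ["Immunology"], "year": 2013, "usage": 300},
        "b": {"subject_areas": ["Immunology"], "year": 2013, "usage": 100},
        "c": {"subject_areas": ["Immunology", "Genetics"], "year": 2013, "usage": 200},
        "d": {"subject_areas": ["Immunology"], "year": 2012, "usage": 1000},
    }
    print(relative_usage(papers, "a", "Immunology"))
```

Because a paper can carry several Subject Areas, the same paper gets one ratio per area, each against a different peer group.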




Area of development - Editorial Workflow

The PLOS Thesaurus and Peer Review
Maintaining a copy of the PLOS thesaurus in Editorial Manager helps with editor and reviewer matching
• Classifications for People
• Classifications for Papers

The PLOS Thesaurus and Peer Review
• Authors select Subject Area terms related to their article submissions
• Editors and Reviewers select terms that represent their areas of expertise
• Staff and Editors use these terms to help ensure editors and reviewers are well matched to the submissions they are handling

Planned Enhancements
• Automate the application of terms associated with Editors, Reviewers and submitted articles with MAIstro
• Provide Editors and Staff with detailed terms to assist with reviewer selection and vetting
– Academic disciplines help Editors gauge Subject Area relevance of potential Reviewers
– Methods, protocols and model organisms help Editors gauge technical suitability of potential Reviewers

Dramatis personae:
• Jonas Dupuich, Product Manager
• Patrick Polischuk, Product Manager
• Sebastian Toomey, Interaction Designer
• Jennifer Lin, Senior Product Manager
• Martin Fenner, ALM Technical Lead
• Kallie Huss, Senior Publications Assistant
• John Chodacki, Director - Product Management

