Linked science presentation 25

Information about Linked science presentation 25
Science

Published on October 19, 2014

Author: FrancescoOsborne

Source: slideshare.net

Description

Clustering Citation Distributions for Semantic Categorization and Citation Prediction
by F. Osborne, S. Peroni, E. Motta

In this paper we present i) an approach for clustering authors according to their citation distributions and ii) an ontology, the Bibliometric Data Ontology, for supporting the formal representation of such clusters. This method allows the formulation of queries that take into consideration the citation behaviour of an author, and it predicts future citation behaviour with a good level of accuracy. We evaluate our approach with respect to alternative solutions and discuss the predictive abilities of the identified clusters.

URL: http://oro.open.ac.uk/40784/1/lisc2014.pdf

1. Clustering Citation Distributions for Semantic Categorization and Citation Prediction. Francesco Osborne (a), Silvio Peroni (b, c), Enrico Motta (a). (a) KMi, The Open University, United Kingdom; (b) Department of Computer Science and Engineering, University of Bologna, Bologna, Italy; (c) Institute of Cognitive Sciences and Technologies, CNR, Rome, Italy. October 2014

2. Is it possible to say who will have a bigger impact?

3. Can I exploit this information for semantic expert search?

4. Clustering of Citation Distribution Authors’ data Clusters of authors with similar citation patterns Extraction of semantic features RDF BiDO Ontology Our approach

5. Clustering Citation Distributions We cluster the citation distributions by exploiting a bottom-up hierarchical clustering algorithm. We thus need to define: • A norm • A metric to assess the quality of a set of clusters

9. Clustering Citation Distributions A B C D 1. dis(A, B) = dis(C, D) 2. dis(A, C) > 0, dis(B, D) > 0 3. Can be computed incrementally

10. Clustering Citation Distributions A simple way to satisfy these three requirements is to use a normalized Euclidean distance: [equation lost in extraction]
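The slide's formula did not survive extraction, so the following is only a sketch of what a normalized Euclidean distance and a bottom-up clustering loop might look like. It assumes that the Euclidean distance is divided by the mean of the two distributions' citation totals, and that C and D in the requirements are scaled copies of A and B; neither assumption is confirmed by the slides. The names `normalized_distance`, `centroid`, `agglomerate` and the `threshold` stopping rule are hypothetical.

```python
import math
from itertools import combinations


def normalized_distance(a, b):
    """Euclidean distance divided by the mean of the two vectors' totals.

    The normalisation term is an assumption made for this sketch.
    """
    euclid = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    scale = (sum(a) + sum(b)) / 2
    return euclid / scale if scale else 0.0


def centroid(members):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(members) for col in zip(*members)]


def agglomerate(vectors, threshold):
    """Merge the two closest clusters until no pair is within threshold."""
    clusters = [[v] for v in vectors]
    while len(clusters) > 1:
        # Find the pair of clusters whose centroids are closest.
        (i, j), best = min(
            (
                ((i, j), normalized_distance(centroid(clusters[i]),
                                             centroid(clusters[j])))
                for i, j in combinations(range(len(clusters)), 2)
            ),
            key=lambda pair: pair[1],
        )
        if best > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters
```

Under these assumptions, scaling both distributions by the same factor leaves the distance unchanged, while a distribution and its scaled copy stay a positive distance apart.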

13. Clustering Citation Distributions We want to maximize the homogeneity of the cluster populations in the following years.

14. Standard deviation is not the solution…

16. Clustering Citation Distributions We estimate the homogeneity by computing the weighted average of the MAD: [equation lost in extraction] MAD (Median Absolute Deviation) is a robust measure of statistical dispersion; it quantifies the variability of a univariate sample of quantitative data.
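As a rough illustration of the homogeneity estimate, the sketch below computes the MAD of each cluster and averages the results. Weighting each cluster by its number of members is an assumption, since the slide's formula is not legible; `mad` and `weighted_average_mad` are hypothetical names, and each cluster is represented simply as a list of its members' citation counts in a later year.

```python
from statistics import median


def mad(values):
    """Median absolute deviation: the median of |x - median(values)|."""
    values = list(values)
    centre = median(values)
    return median(abs(x - centre) for x in values)


def weighted_average_mad(clusters):
    """Average the per-cluster MAD, weighting each cluster by its size.

    Size weighting is an assumption made for this sketch.
    """
    total = sum(len(cluster) for cluster in clusters)
    return sum(len(cluster) * mad(cluster) for cluster in clusters) / total


# A single extreme value barely moves the MAD, which is why it is called robust.
print(mad([1, 2, 3, 4, 100]))
print(weighted_average_mad([[10, 12, 14], [100, 100, 100, 400]]))
```

A lower value means that the members of each cluster stay closer together in the following years.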

17. Clustering Citation Distributions We then compute the memberships of all authors in our dataset with the centroids of the resulting clusters. [equation lost in extraction]

18. Finally we calculate a number of statistics for estimating the evolution of the members of each cluster.

19. How can we represent this data? Bibliometric data are subject to the simultaneous application of different variables. In particular, one should take into account at least: • the temporal association of such data to entities; • the particular agent who provided such data (e.g., Google Scholar, Scopus, our algorithm); • the characterisation of such data in at least two different kinds, i.e., numeric bibliometric data (e.g., the standard bibliometric measures such as h-index, journal impact factor, citation count) and categorical bibliometric data (so as to enable the description of entities, e.g., authors, according to specific descriptive categories).
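The three variables listed above can be pictured as fields of a single record. The sketch below is only an illustration in plain Python, not the BiDO model itself; the class and field names (`BibliometricRecord`, `valid_from`, `agent`, and so on) and the sample values are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Union


@dataclass(frozen=True)
class NumericMeasure:
    name: str     # e.g. "h-index" or "citation count"
    value: float


@dataclass(frozen=True)
class CategoricalMeasure:
    category: str  # a descriptive category assigned to the entity


@dataclass(frozen=True)
class BibliometricRecord:
    entity: str       # who or what is described, e.g. an author identifier
    valid_from: date  # temporal association of the data to the entity
    agent: str        # who provided the data, e.g. "Google Scholar"
    measure: Union[NumericMeasure, CategoricalMeasure]


# Two records about the same (made-up) author from different agents.
records = [
    BibliometricRecord("john-doe", date(2014, 7, 11), "Scopus",
                       NumericMeasure("h-index", 12)),
    BibliometricRecord("john-doe", date(2014, 7, 11), "our algorithm",
                       CategoricalMeasure("increasing")),
]
numeric = [r for r in records if isinstance(r.measure, NumericMeasure)]
```

Keeping the agent and the time on every record lets two sources disagree about the same author without either value overwriting the other.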

20. BiDO

21. Extraction of Semantic features

28. :increasing-with-premature-deceleration-and-low-logarithmic-slope-in-[243,729)-5-years-beginning a :ResearchCareerCategory ;
    :hasCurve [ a :Curve ; :hasTrend :increasing ; :hasAccelerationPoint :premature-deceleration ] ;
    :hasSlope [ a :Slope ; :hasStrength :low ; :hasGrowth :logarithmic ] ;
    :hasOrderOfMagnitude :[243,729) ;
    :concernsResearchPeriod :5-years-beginning .
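The category name on this slide appears to be assembled from the individual features. The sketch below shows one way such a label could be built. It assumes that the order-of-magnitude intervals are consecutive powers of 3, an inference from 243 = 3^5 and 729 = 3^6 that the slides do not state; `magnitude_bucket` and `category_name` are hypothetical helpers.

```python
def magnitude_bucket(citations, base=3):
    """Return (low, high), consecutive powers of base, with low <= citations < high.

    Base 3 is an assumption inferred from the [243,729) interval on the slide.
    """
    if citations < 1:
        raise ValueError("citations must be at least 1")
    low = 1
    while low * base <= citations:
        low *= base
    return low, low * base


def category_name(trend, acceleration, strength, growth, citations, period):
    """Compose a category label in the pattern used on the slide."""
    low, high = magnitude_bucket(citations)
    return (f"{trend}-with-{acceleration}-and-{strength}-{growth}-slope-in-"
            f"[{low},{high})-{period}")


# An author with 500 citations falls in the [243,729) interval.
print(category_name("increasing", "premature-deceleration", "low",
                    "logarithmic", 500, "5-years-beginning"))
```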

29. :john-doe :holdsBibliometricDataInTime [
    a :BibliometricDataInTime ;
    tvc:atTime [ a time:Interval ; time:hasBeginning :2014-07-11 ] ;
    :accordingTo [ a fabio:Algorithm ; frbr:realization [ a fabio:ComputerProgram ] ] ;
    :withBibliometricData :increasing-with-premature-deceleration-and-low-logarithmic-slope-in-[243,729)-5-years-beginning ] .

30. Evaluation • We evaluated our method on a dataset of 20,000 researchers working in the field of computer science in the 1990-2010 interval. • This dataset was derived from the database of Rexplore, a system that supports the exploration of scholarly data and integrates several data sources (Microsoft Academic Search, DBLP++ and DBpedia).

31. Evaluation

32. Evaluation

Y   C18 (1.4%)           C22 (2.5%)           C25 (2.7%)           C28 (2.3%)           C29 (8.8%)
    range      mean      range      mean      range      mean      range      mean      range      mean
6   420-800    567±98    160-280    209±34    100-180    129±25    60-100     72±14     40-60      39±9
7   440-960    610±120   160-320    225±45    100-200    138±30    60-120     79±18     40-80      45±14
8   440-1020   650±137   160-400    246±58    100-260    158±45    60-160     90±26     40-100     50±18
9   440-1260   699±186   160-440    269±74    100-340    187±68    60-200     104±37    40-120     57±25
10  480-2940   751±411   160-500    292±85    100-400    211±82    60-280     125±57    40-160     68±35
11  480-2480   826±336   180-660    331±112   100-520    241±100   60-540     155±103   40-200     82±47
12  480-3520   914±467   180-860    370±151   100-640    270±126   60-440     166±96    40-260     97±60

33. Evaluation

34. Future Work • Augment the clustering process with a variety of other features (e.g., research areas, co-authors); • apply this technique to groups of researchers rather than single individuals; • extend BiDO in order to provide a semantically-aware description of such new features; • make available a triplestore of bibliometric data linked to other datasets such as Semantic Web Dog Food and DBLP.

35. Questions? francesco.osborne@open.ac.uk silvio.peroni@unibo.it e.motta@open.ac.uk BiDO Ontology: http://purl.org/spar/bido
