Published on February 18, 2014
Lessons Learned from LOD (Linked Open Data) Failure and Big Data: The Future Trend Youngwhan Lee, Ph.D. Phone: 010-7997-0345 Email: firstname.lastname@example.org Facebook: Youngwhan Nick Lee Twitter: nicklee002
Web Evolution and Big Data
Internet Today • 2010: Estimated 10^11 Web pages in the world • 2012: Social media: Facebook (1 billion monthly active users) • 5 exabytes of data from the invention of writing through 2003 • As of 2012, 7 exabytes of data generated every day • Is “big data” a big pile of garbage?
Web Explosion and Big Data • Number of Web users (Mar. 2012): 2.3 billion • 10^11 Web pages in the world (est. 2010) – The Web has existed for about 7,000 days (i.e., 20 years), so humans have created over 10 million pages a day on average. • Digital information created in 2010: 1 zettabyte (10^21 bytes) • "There was 5 exabytes of information created between the dawn of civilization through 2003, but that much information is now created every 2 days, and the pace is increasing." – Eric Schmidt (2010) • In 2012, almost 7 exabytes are created every day. We call it “Big Data.” What does this mean?
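The arithmetic behind these figures can be checked with a short sketch. All inputs are the slide's own estimates; the variable names are only illustrative, and the yearly total simply assumes the 2012 daily rate holds for 365 days.

```python
# Back-of-the-envelope check of the slide's estimates.
WEB_PAGES = 10**11        # estimated Web pages in the world (2010)
WEB_AGE_DAYS = 7_000      # roughly 20 years since the inception of the Web

# Average number of pages created per day over the life of the Web.
pages_per_day = WEB_PAGES / WEB_AGE_DAYS

EXABYTE = 10**18          # bytes
ZETTABYTE = 10**21        # bytes

# "Almost 7 exabytes a day" in 2012, extrapolated to a full year (assumption).
daily_bytes_2012 = 7 * EXABYTE
yearly_bytes_2012 = daily_bytes_2012 * 365

print(f"pages per day: {pages_per_day:,.0f}")
print(f"2012 yearly volume: {yearly_bytes_2012 / ZETTABYTE:.1f} ZB")
```

The first figure comes out above 14 million, which is consistent with the slide's "over 10 million pages a day."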
[Figure: DIKW diagram. Axis labels: Aggregation, Understanding. Stages: data analysis, knowledge structuring, curation. Technologies: RIF, SPARQL, OWL, RDF, LOD, NoSQL, MapReduce, R-DBMS] Modified, based on Gene Bellinger, Durval Castro, Anthony Mills http://www.systems-thinking.org/dikw/dikw.htm , http://yjhyjh.egloos.com/39721
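Extracting Information/Knowledge from Big Data and the Web • Information retrieval – SEO (Search Engine Optimization), PageRank, EdgeRank • Data mining (data science): information (knowledge) can be extracted by programs – Statistical analysis, rule-based analysis, neural network analysis – Visualization • Using knowledge engineering – Accumulate and link ontologies using RDF/OWL – Link raw data and open it so that it can be analyzed (Linked Open Data; LOD) – Knowledge that programs can analyze logically can be extracted • SPARQL • RIF (Rule Interchange Format) • Using human power: curation – Filter and synthesize information using human eyes and knowledge • E.g.: pinterest.com, videocooki.com, storify.com, scoop.it, curated.by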
Pareto’s Law Longtail Bighead
Longtail Phenomena, from The Long Tail by Chris Anderson (Wired, Oct. ’04), adapted to information domains [Figure: Popularity curve. Longtail Applications: Mobile Apps, iPhone Apps, Android Apps, SNS Apps, Facebook Apps, Twitter Apps, LOD and Others, Medical Apps, Public-information Apps, … Bighead Applications: …]
Approaches in Knowledge Engineering • Ontology construction – Cyc – WolframAlpha – Siri • Web of Data – LOD, LOD2
Old “Layercake” of Semantic Web • Information exchange
Linked Open Data (LOD) Principles • Linking Open Data (LOD) means connecting data and opening it to the public • A little history of the LOD project – Tim Berners-Lee proposed the LOD (Linking Open Data) project (2006) – Since the proposal, numerous countries and organizations have participated, causing the amount of LOD data to explode – Wikipedia: DBpedia (www.dbpedia.org) – The Bio2RDF project opened data in 27 fields related to biology, genetics, and medicine, with data sets of about 2.3 billion (Bio2RDF.org) (2008.10) – The BBC announced its participation in the LOD project (www.bbc.org) and is now one of the institutions actively using the data – US Data.gov released 5 billion data triples – The US Library of Congress announced that it would join the LOD project (http://id.loc.gov/authorities/sh85042531#concept) – The NY Times (data.nytimes.com) released data from 150 years of publication (2009.10) – The US White House released a plan to open data in RDF (2009.11) • 4 Principles of LOD 1. Use URIs as names for things 2. Use HTTP URIs 3. When someone looks up a URI, provide useful information 4. Include links to other URIs
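The four principles can be illustrated with a toy, in-memory sketch. This is not how any real LOD server is implemented: the dictionary stands in for the Web, and every example.org URI, property name, and function name below is hypothetical.

```python
# Toy illustration of the four Linked Data principles (no network access).
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Principles 1 and 2: things are named with HTTP URIs (hypothetical ones here).
SEOUL = "http://example.org/resource/Seoul"
KOREA = "http://example.org/resource/South_Korea"
CAPITAL_OF = "http://example.org/property/capitalOf"
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Stand-in for the Web: what a server would return when each URI is looked up.
DATASET: Dict[str, List[Triple]] = {
    SEOUL: [(SEOUL, LABEL, "Seoul"), (SEOUL, CAPITAL_OF, KOREA)],
    KOREA: [(KOREA, LABEL, "South Korea")],
}

def look_up(uri: str) -> List[Triple]:
    """Principle 3: looking up a URI yields useful information about it."""
    return DATASET.get(uri, [])

def linked_uris(uri: str) -> List[str]:
    """Principle 4: a description includes links to other URIs."""
    return [obj for (_, _, obj) in look_up(uri) if obj.startswith("http://")]

def crawl(start: str) -> List[str]:
    """Follow links breadth-first to discover related resources."""
    seen: List[str] = []
    queue = [start]
    while queue:
        uri = queue.pop(0)
        if uri in seen:
            continue
        seen.append(uri)
        queue.extend(linked_uris(uri))
    return seen

print(crawl(SEOUL))
```

Starting from the Seoul URI, the crawl reaches the South Korea URI only because the first description links to it, which is the point of the fourth principle.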
Advantages of LOD • Elegant • Expandable • Flexible • Powerful • Decentralized • Participatory • Inclusive, and • “Free” to use
Change of Web Structure [Figure: two panels. Panel 1: user interface; Web for humans (linking pages); Web-page link bus. Panel 2: the same, plus mashups; Web for computers (linking data); Web-data link bus]
[Figure: snapshots dated May 2007, Mar. 2008, Sep. 2008, July 2009]
SPARQL (SPARQL Protocol and RDF Query Language)
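As a rough idea of what a SPARQL SELECT query does, the sketch below matches triple patterns containing ?variables against a small list of triples. It is a simplified stand-in, not a SPARQL engine: the data, the "ex:" names, and the functions `match` and `select` are all assumptions made for illustration.

```python
# Minimal sketch of basic graph pattern matching, the core idea of SPARQL SELECT.
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical data; the population value is illustrative only.
TRIPLES: List[Triple] = [
    ("ex:Seoul", "ex:capitalOf", "ex:South_Korea"),
    ("ex:Tokyo", "ex:capitalOf", "ex:Japan"),
    ("ex:Seoul", "ex:population", "9700000"),
]

# The SPARQL query that the first call below imitates, for reference.
QUERY = """
PREFIX ex: <http://example.org/>
SELECT ?city ?country
WHERE { ?city ex:capitalOf ?country . }
"""

def match(pattern: Triple, triple: Triple,
          binding: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the same value everywhere it appears.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result

def select(patterns: List[Triple]) -> List[Dict[str, str]]:
    """Join all patterns, returning one variable binding per solution."""
    solutions: List[Dict[str, str]] = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for binding in solutions
            for triple in TRIPLES
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return solutions

rows = select([("?city", "ex:capitalOf", "?country")])
joined = select([("?city", "ex:capitalOf", "?country"),
                 ("?city", "ex:population", "?pop")])
print(rows)
print(joined)
```

The second call adds a pattern that shares ?city with the first, so only cities that have both a country and a population survive the join.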
Web 3.0: Merging the Two Perspectives [Figure: Technology Innovation Perspective: WWW Proposal (1989), Semantic Web, LOD Proposal (2006), “GGG” Proposal (2007), “WEB2” Proposal (2009); Knowledge-based Semantics, Data-based Semantics, Next Generation Web. Market Behavior Perspective: WEB 1.0, WEB 2.0, Web 3.0. Phases: Technical Proposal Phase, Practical Use Phase]
But no Champagne… • Definition unclear – Berners-Lee’s 4 principles are ambiguous • Interpretation difficult • Inconsistent • Difficult both to learn and use • Difficult to build browsers and reasoners • “Free” to use – Full of incomplete and inconsistent RDFs, with no way to make them evolve • In short, “Garbage in, Garbage out” experienced
Solution to LOD problems: LOD2 • LOD2 Stack: A Technical Approach – Linked Data Management – Enrichment and Quality Improvement – Various tools to use • Storage and querying • Revision and authoring • Interlinking and fusing • Classification and enrichment • …
Q: Is this technical approach for LOD good enough? A: Business approach is definitely needed.
Big Data What did we do with big data in 2013? What would we do with big data in 2014?
Big Data and Data Supremacy • “The End of Theory” by Chris Anderson
Implication • Issue: the haves and have-nots are separated – E.g., in marketing • 4Ps – Price, product, place, promotion • STP – Segmentation, targeting, and positioning
Implication • Is a technical approach needed?
Business Approach • Data Markets – Azure Data Marketplace – Data.com – Infochimps.com – DataMarket.com – Kaggle.com
Data Market: Azure Data Marketplace
Data Market: Data.com
Data Market: Infochimps.com
Data Market: DataMarket.com
Data Market: Kaggle.com
Conclusion • Positioning for Korea – Where are we? – Where are we heading?
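References • Web 3.0 Is Changing the World (웹3.0 세상을 바꾸고 있다) – Youngwhan Lee • A Semantic Web Primer (Cooperative Information Systems series) – Grigoris Antoniou, Frank van Harmelen • Semantic Web for the Working Ontologist, Second Edition: Effective Modeling in RDFS and OWL – Dean Allemang, James Hendler • Ontology: The Key to the Evolution of the Internet (온톨로지: 인터넷 진화의 열쇠) – Sangkyu Rho, Jinsoo Park • World Wide Web (월드와이드웹) – Tim Berners-Lee • Curation (큐레이션) – Steven Rosenbaum, translated by Si-eun Lee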
Web sites • Problems of Linked Data – http://milicicvuk.com/blog/2011/07/26/problems-of-linked-data14-identity/ • LOD2 – http://lod2.eu/Welcome.html – http://stack.lod2.eu/blog/ • How to Define Web 3.0 – http://howtosplitanatom.com/news/how-to-define-web-30-2/ • SPARQL by Example – http://www.cambridgesemantics.com/semantic-university/sparqlby-example#(1) • Practical P-P-P-Problems with Linked Data – http://www.mkbergman.com/917/practical-p-p-p-problems-withlinked-data/ • Linked-Data-Api – https://code.google.com/p/linked-data-api/