Pentaho and NoSQL

Published on March 12, 2014

Author: feristhia1

Source: slideshare.net

Description

This is the PowerPoint slide presentation given at the Jakarta Java Meet Up on 7th March 2014 at BliBli.com.

1. JaMU – Jakarta 7 Maret 2014. Pentaho and NoSQL. Java Meet Up (JaMU), Jakarta, 7th March 2014. Feris Thia, feris@phi-integration.com, 08176-474-525

2. ABOUT ME. Founder: Feris Thia, PHI-Integration. 2007, 2013.

3. ABOUT ME. Book Author: Feris Thia. November 2013.

4. ABOUT ME. Community Manager: Feris Thia. Excel Indonesia User Group (EIUG), Pentaho User Group Indonesia (Pentaho-ID). 2008 (~1000 members), 2013 (~5000 members).

5. ABOUT ME. PHI-Integration Clients. Community Manager: Feris Thia.

6. AGENDA
• DATA PREPARATION: What is it and why is it important?
• PENTAHO DATA INTEGRATION: Popular open source ETL
• NOSQL: An emerging non-relational database technology

7. PROBLEMS?

8. Image source: http://www.huntbigsales.com/winning-in-the-meeting-after-the-meeting/
"What caused sales to increase in this area? Did something unusual happen?"
"WHAT?? So we cannot make any decisions until the data is ready."
"We need some time to prepare additional data to answer that."
"Yes, sir…."

9. TYPICAL SOLUTION: SOPHISTICATED REPORTING OR DASHBOARD APPLICATION! Image source: http://wrapbootstrap.com/preview/WB0KDM51J/

10. PROBLEMS REMAIN… Image source: http://reallybadboss.com/wp-content/uploads/2012/02/frustration.jpg

11. Time Spent on Data Preparation: 80%
• Data Quality: 50%
• Extract, Transformation & Load: 30%

13. DATA PREPARATION IS THE KEY
1. Entry Systems
2. Data Preparation
3. Reporting, Basic Data Presentation
4. Performance Dashboard (Visualization)
Note: Data preparation is often underestimated.

14. DATA WAREHOUSE
1. Entry Systems
2. Data Warehouse
3. Business Intelligence

15. DATA WAREHOUSE

16. CHALLENGES

17. EXTRACT
• INTEGRATION: of many data sources
• INCREMENTAL: extract only changes
• DATA SIZE: big data
• INFRASTRUCTURE: network failure, high latency, slow I/O, etc.
• DATA QUALITY: missing data, conversion, etc.
• PROTOCOL: driver availability, reliability, etc.
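
The INCREMENTAL item (extract only changes) can be sketched with a "watermark" that remembers the latest change already extracted. This is a minimal illustration, not how Pentaho Data Integration implements it; the `sales` table, its `updated_at` column, and the `extract_changes` function are hypothetical names chosen for the example.

```python
import sqlite3

def extract_changes(conn, last_watermark):
    """Return rows changed after last_watermark, plus the new watermark."""
    rows = conn.execute(
        "SELECT id, amount, updated_at FROM sales "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    ).fetchall()
    # If nothing changed, keep the old watermark.
    new_watermark = rows[-1][2] if rows else last_watermark
    return rows, new_watermark

# Hypothetical source table with ISO-8601 timestamps (they sort as text).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (1, 100.0, "2014-03-01T09:00:00"),
        (2, 250.0, "2014-03-05T14:30:00"),
        (3, 75.0, "2014-03-07T08:15:00"),
    ],
)

# First run picks up only rows changed after the stored watermark.
first_batch, watermark = extract_changes(conn, "2014-03-02T00:00:00")
# A second run with the updated watermark finds nothing new.
second_batch, watermark_after = extract_changes(conn, watermark)
```

The sketch assumes the source system reliably maintains a last-modified timestamp; sources without one need another change-detection approach.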

18. TRANSFORM
• NORMALIZE
• DENORMALIZE
• SPLIT / MERGE
• DATA REDUCTION (aggregate, etc.)
• TRANSPOSE
• TEXT PARSING
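
Two of the transform operations, data reduction by aggregation and transpose, can be illustrated in a few lines of plain Python. The sample rows and the function names `aggregate` and `transpose` are made up for this sketch; in an ETL tool the same work would normally be done by dedicated steps.

```python
from collections import defaultdict

# Hypothetical detail rows: one row per sale.
rows = [
    {"region": "West", "month": "Jan", "sales": 100},
    {"region": "West", "month": "Feb", "sales": 120},
    {"region": "East", "month": "Jan", "sales": 80},
    {"region": "East", "month": "Jan", "sales": 20},
]

def aggregate(rows):
    """Data reduction: sum sales per (region, month)."""
    totals = defaultdict(int)
    for r in rows:
        totals[(r["region"], r["month"])] += r["sales"]
    return dict(totals)

def transpose(totals):
    """Transpose: turn month values into columns, one record per region."""
    wide = defaultdict(dict)
    for (region, month), total in totals.items():
        wide[region][month] = total
    return dict(wide)

totals = aggregate(rows)
wide = transpose(totals)
```

Here `wide` holds one entry per region with a key per month, which is the shape a report or spreadsheet usually expects.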

19. LOAD
• PERFORMANCE: of many data sources
• CHANGES: structure, data type, column size, etc.
• DATA SIZE: big data
• INFRASTRUCTURE: network failure, high latency, slow I/O, etc.
• DATA MAPPING: sync with correlated data
• OUTPUT FORMAT: Excel, PDF, HTML, RDBMS, etc.

20. DEMO: Data structure changes to increase SQL query performance.

21. Pentaho Data Integration: Open Source ETL

22. FEATURES AND BENEFITS
• Open Source
• Cost Efficient
• More than 200 modules
• Multi OS Platform
• Working with emerging Big Data platforms
• Low Learning Curve

23. DEMO
1. Basic Extract and Transformation
2. More I/O
3. Helper Table (Closure)
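
The slides do not describe the helper table used in the demo. Assuming it refers to a closure table, which stores every ancestor/descendant pair of a hierarchy so that subtree queries need no recursion, a minimal sketch could look like this. The node names and the `build_closure` function are hypothetical.

```python
def build_closure(parent_of):
    """Return sorted (ancestor, descendant, depth) rows, including self rows."""
    closure = []
    for node in parent_of:
        current, depth = node, 0
        # Walk up from the node to the root, recording each ancestor.
        while current is not None:
            closure.append((current, node, depth))
            current = parent_of[current]
            depth += 1
    return sorted(closure)

# Hypothetical hierarchy: each node maps to its parent (None for the root).
parent_of = {"Root": None, "A": "Root", "B": "Root", "C": "A"}
closure = build_closure(parent_of)

# All descendants of Root come from one simple filter on the helper table.
descendants_of_root = sorted(
    d for a, d, depth in closure if a == "Root" and depth > 0
)
```

The trade-off is extra rows (one per ancestor/descendant pair) in exchange for simpler and faster hierarchy queries.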

24. NoSQL: Not only SQL

25. TIMELINE: Emergence of open source NoSQL
• 1998: NoSQL coined
• 2004: Google's MapReduce paper published
• 2006: Hadoop started
• 2007: MongoDB started, Neo4J initial release
• 2008: Apache HBase, Apache Cassandra
• 2009: Redis initial release
• 2012: Google Spanner paper published

26. NOSQL GROUPS
• DOCUMENT: MongoDB, CouchDB, Riak
• WIDE COLUMN: Cassandra, HBase, Hypertable
• GRAPH: Neo4J, OrientDB
• KEY-VALUE <K, V>: Redis, MemcacheDB, SimpleDB

27. NOSQL VS SQL
Source: http://gigaom.com/2010/07/12/nosql-pioneers-are-driving-the-webs-manifest-destiny/
• Key-Value
  – Use cases: In-memory cache, web-site analytics, log file analysis
  – Advantages: Simple, replication, versioning, locking, transactions, and sorting, web-accessible, schema-less, distributed
  – Disadvantages: Simple, small set of data types, limited transaction support
  – Key products: Redis, Scalaris, Tokyo Cabinet
• Tabular or Columnar
  – Use cases: Data mining, analytics
  – Advantages: Rapid data aggregation, scalable, versioning, locking, web-accessible, schema-less, distributed
  – Disadvantages: Limited transaction support
  – Key products: Google BigTable, HBase or HyperTable, Cassandra
• Document Store
  – Use cases: Document management, CRM, business continuity
  – Advantages: Stores and retrieves unstructured documents, map/reduce, web-accessible, schema-less, distributed
  – Disadvantages: Limited transaction support
  – Key products: CouchDB, MongoDB, Riak
• Traditional Relational
  – Use cases: Transaction processing, typical corporate workloads
  – Advantages: Well documented and supported, mature code, widely implemented in production
  – Disadvantages: Cost, vertical scaling, increased complexity
  – Key products: Oracle, Microsoft SQL Server, MySQL Cluster

28. NoSQL VS SQL
• Schemas are much more flexible
• Non-relational (no joins)
• Horizontal scalability
• Master-slave
• Peer-to-peer
• Data pipeline
  – Expressions
  – Functional programming
• ACID (Atomicity, Consistency, Isolation, Durability)
• BASE (Basically Available, Soft state, Eventual consistency)
• CAP (Consistency, Availability, Partition tolerance)

29. DB-ENGINES.COM DB RANKING PER 7 MARCH 2014

Rank  Last Month  DBMS                  Database Model     Score    Changes
1     1           Oracle                Relational DBMS    1491.8   -8.43
2     2           MySQL                 Relational DBMS    1290.21  1.83
3     3           Microsoft SQL Server  Relational DBMS    1205.28  -8.99
4     4           PostgreSQL            Relational DBMS    235.06   4.61
5     5           MongoDB               Document store     199.99   4.81
6     6           DB2                   Relational DBMS    187.32   -1.14
7     7           Microsoft Access      Relational DBMS    146.48   -6.4
8     8           SQLite                Relational DBMS    92.98    -0.03
9     9           Sybase ASE            Relational DBMS    81.55    -6.33
10    10          Cassandra             Wide column store  78.09    -2.23

30. MongoDB: Document Oriented Database
• Schemaless
• Distributed
• Auto Sharding
• Map Reduce Capabilities
• Multi Platform
• Structures
  – Database
  – Collections
  – Documents
• Document
  – A record is a document
  – Similar to JSON objects

31. MongoDB: Basic Commands & Expressions
• MongoDB Shell
• Insert
    db.koleksi.insert({nama: "PHI-Integration", type: "Company"})
• Insert / Update
    db.koleksi.update({nama: "PHI-Integration"}, {nama: "Lightora"}, {upsert: true})
• Delete
    db.koleksi.remove({nama: "PHI-Integration", type: "Company"})
• Read / Query
    db.koleksi.find({nama: "PHI-Integration", $and: [{posting: {$gt: 100}}, {posting: {$lt: 200}}]})
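
To make the Read / Query expression easier to follow, here is a toy evaluator in plain Python that applies the same kind of filter to a list of dictionaries. It is only an illustration of the semantics and not MongoDB's implementation: the `matches` function and the sample documents are hypothetical, and only equality, `$gt`, `$lt`, and `$and` are handled.

```python
def matches(doc, query):
    """Return True if doc satisfies query (equality, $gt, $lt, $and only)."""
    for key, cond in query.items():
        if key == "$and":
            # Every sub-query in the list must match.
            if not all(matches(doc, sub) for sub in cond):
                return False
        elif isinstance(cond, dict):
            value = doc.get(key)
            for op, bound in cond.items():
                if op not in ("$gt", "$lt"):
                    raise ValueError("unsupported operator: " + op)
                if value is None:
                    return False
                if op == "$gt" and not value > bound:
                    return False
                if op == "$lt" and not value < bound:
                    return False
        elif doc.get(key) != cond:
            # Plain value: the field must equal it exactly.
            return False
    return True

# Hypothetical documents in the "koleksi" collection.
docs = [
    {"nama": "PHI-Integration", "posting": 150},
    {"nama": "PHI-Integration", "posting": 250},
    {"nama": "Lightora", "posting": 150},
]

query = {
    "nama": "PHI-Integration",
    "$and": [{"posting": {"$gt": 100}}, {"posting": {"$lt": 200}}],
}
result = [d for d in docs if matches(d, query)]
```

Only the first document passes, because it has the right `nama` and a `posting` value strictly between 100 and 200. In MongoDB the same range can also be written without `$and`, as `{posting: {$gt: 100, $lt: 200}}`.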

32. MONGODB DEMO
1. Basic Commands
2. PDI Extract and Load
3. Aggregation Framework

33. Neo4j Graph Database: Node, Relationship, Properties, Cypher

34. Neo4J: Basic Utility, Commands & Expressions
• Neo4J Web Admin
• Create Node
    CREATE (n {property_name: "property_value"})
• Create Relation
    CREATE (n)-[:RELATION]->(m)
• Where:
  – n, m are identifiers
  – :RELATION is the relation name

35. Neo4J: Basic Commands & Expressions
• Matching and Returning Objects
    START emil=node:people(name='Emil')
    MATCH (emil)-[:MARRIED_TO]-(madde)
    RETURN madde
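
The MATCH pattern above has no arrowhead, so it follows the MARRIED_TO relationship in either direction. A plain-Python sketch of that idea, using a hypothetical list of relationships and a hypothetical `neighbours` function (this is not how Neo4j stores or traverses a graph):

```python
# Hypothetical relationships stored as (start, type, end) triples.
relationships = [
    ("Emil", "MARRIED_TO", "Madde"),
    ("Emil", "KNOWS", "Johan"),
]

def neighbours(name, rel_type):
    """Return nodes linked to name by rel_type, ignoring direction."""
    found = []
    for start, kind, end in relationships:
        if kind != rel_type:
            continue
        if start == name:
            found.append(end)
        elif end == name:
            found.append(start)
    return found

spouse_of_emil = neighbours("Emil", "MARRIED_TO")
spouse_of_madde = neighbours("Madde", "MARRIED_TO")
```

Because direction is ignored, the lookup works from either end of the relationship, which mirrors the undirected pattern in the query.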

36. HIERARCHICAL MODEL: Neo4j Case Demo. Nodes shown: Root, Child 1, Child 2, Child 3, Child 4, Child 5.

37. Q&A

38. CONTACT ME
Universitas Multimedia Nusantara, New Media Tower, Lv. 12, Scientia Boulevard St., Tangerang, Banten, 15811
Phone: +6221-7038-7738
Mobile: +628176-474-525
https://www.facebook.com/feris.thia
@FerisThia
feris@phi-integration.com

39. BIG THANK YOU!
