A peek into the future

Published on February 5, 2014

Author: PrateekChauhan2

The presentation is divided into 3 parts:
1. ORM: Object-Relational Mapping
2. NoSQL
3. Big Data


BEFORE STARTING…
• Are relational tables the most efficient way to manage data?
• Do companies like Facebook and Twitter really use a traditional relational DBMS to manage their data?


WAYS TO ACCESS A DATABASE
• Using a GUI-based DBMS
• Using a console-based DBMS
• Using a database embedded in an application (most important)


THE BRIDGE: JDBC
• Standard Java API for database-independent connectivity between the Java programming language and a wide range of databases.
• JDBC provides a flexible architecture for writing database-independent applications that can run on different platforms and interact with different DBMSs with little or no modification.
• JDBC includes APIs for each of the tasks commonly associated with database usage:
  – Making a connection to a database
  – Creating SQL statements
  – Executing SQL queries in the database
  – Viewing and modifying the resulting records
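The four tasks listed above can be sketched without a Java driver setup. The example below is only an analogy: it uses Python's built-in `sqlite3` module as a stand-in for JDBC, and the `users` table, its columns, and the sample rows are assumptions made up for illustration.

```python
import sqlite3

def run_demo():
    # 1. Make a connection (in-memory database, so nothing is written to disk).
    conn = sqlite3.connect(":memory:")
    try:
        # 2. Create SQL statements; "?" placeholders keep values out of the SQL text.
        conn.execute(
            "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)"
        )
        conn.executemany(
            "INSERT INTO users (name, age) VALUES (?, ?)",
            [("Asha", 31), ("Ben", 24)],
        )
        # 3. Execute a query in the database.
        before = conn.execute("SELECT name, age FROM users ORDER BY id").fetchall()
        # 4. View and modify the resulting records.
        conn.execute("UPDATE users SET age = age + 1 WHERE name = ?", ("Ben",))
        after = conn.execute("SELECT name, age FROM users ORDER BY id").fetchall()
        return before, after
    finally:
        conn.close()
```

In JDBC the same steps would roughly correspond to obtaining a connection, preparing a statement, executing it, and iterating over the result set.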

JDBC
Pros of JDBC
• Clean and simple SQL processing
• Good performance with small data
• Very good for small applications
• Simple syntax, so easy to learn
Cons of JDBC
• Complex when used in large projects
• Large programming overhead
• No encapsulation
• Hard to implement the MVC concept
• Queries are DBMS-specific

The Problem

• Mapping member variables to columns
• Mapping relationships
• Handling data types (esp. Boolean)
• Managing changes to object state

The Problem: Object-Relational Mapping!

Saving without ORM
• Database configuration
• The model object
• Service method to create the model object
• Database design
• DAO method to save the object using SQL queries
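A minimal sketch of these hand-written pieces, again using Python's `sqlite3` in place of JDBC. The `UserDetails` model, the `user_details` table, and the `UserDao` and `create_user` names are hypothetical. Note how the Boolean field has to be converted by hand in both directions.

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class UserDetails:  # the model object (hypothetical)
    user_id: int
    name: str
    active: bool

class UserDao:
    """Hand-written data access object: every column mapping is spelled out."""

    def __init__(self, conn):
        self.conn = conn
        # Database design is done by hand and must be kept in sync with the class.
        conn.execute(
            "CREATE TABLE user_details "
            "(user_id INTEGER PRIMARY KEY, name TEXT, active INTEGER)"
        )

    def save(self, user):
        # SQLite has no Boolean column type, so the flag is stored as 0 or 1.
        self.conn.execute(
            "INSERT INTO user_details (user_id, name, active) VALUES (?, ?, ?)",
            (user.user_id, user.name, 1 if user.active else 0),
        )

    def find(self, user_id):
        row = self.conn.execute(
            "SELECT user_id, name, active FROM user_details WHERE user_id = ?",
            (user_id,),
        ).fetchone()
        # Map the row back to an object, converting 0/1 back to a bool.
        return None if row is None else UserDetails(row[0], row[1], bool(row[2]))

def create_user(dao, user_id, name):
    """Service method: builds the model object and hands it to the DAO."""
    user = UserDetails(user_id, name, active=True)
    dao.save(user)
    return user
```

Every new field on the model means touching the table definition, the `INSERT`, and the `SELECT`, which is the overhead an ORM tries to remove.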

The ORM Way
• JDBC database configuration – ORM-specific configuration
• The model object – annotations
• Service method to create the model object – use the ORM framework API
• Database design – not needed!
• DAO method to save the objects using SQL queries – not needed!
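To make the "not needed" items concrete, here is a toy mapper that derives the table and the SQL from the model class itself. It is an illustration only, not the API of any real ORM framework: `TinyMapper`, `SQL_TYPES`, and `UserDetails` are hypothetical names, and dataclass field annotations play the role that annotations play in a Java ORM.

```python
import sqlite3
from dataclasses import dataclass, fields, astuple

# Assumed mapping from Python field types to SQLite column types.
SQL_TYPES = {int: "INTEGER", str: "TEXT", bool: "INTEGER", float: "REAL"}

class TinyMapper:
    """Toy stand-in for an ORM session; not the API of any real framework."""

    def __init__(self, conn):
        self.conn = conn

    def create_table(self, model):
        # The table layout is derived from the class, so no manual design step.
        cols = ", ".join(f"{f.name} {SQL_TYPES[f.type]}" for f in fields(model))
        self.conn.execute(f"CREATE TABLE IF NOT EXISTS {model.__name__} ({cols})")

    def save(self, obj):
        model = type(obj)
        self.create_table(model)
        names = [f.name for f in fields(model)]
        placeholders = ", ".join("?" for _ in names)
        self.conn.execute(
            f"INSERT INTO {model.__name__} ({', '.join(names)}) VALUES ({placeholders})",
            astuple(obj),
        )

    def find_all(self, model):
        flds = fields(model)
        names = ", ".join(f.name for f in flds)
        rows = self.conn.execute(
            f"SELECT {names} FROM {model.__name__} ORDER BY rowid"
        ).fetchall()
        # Convert each column back to the declared field type (e.g. 0/1 to bool).
        return [model(*(f.type(v) for f, v in zip(flds, row))) for row in rows]

@dataclass
class UserDetails:  # the model object; its fields drive the mapping
    user_id: int
    name: str
    active: bool
```

Saving becomes `TinyMapper(conn).save(UserDetails(1, "Asha", True))`, with no per-class SQL. A real framework adds much more (relationships, change tracking, caching), which this sketch deliberately leaves out.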

THE ONLY DISADVANTAGE
• Boilerplate code
  => XML configuration files
  => XML system files
  => Extra classes such as POJOs, etc.

NoSQL: THE NAME
• SQL: in general, “traditional relational DBMS”.
• Past decade: an RDBMS isn’t always the best solution.
• NoSQL: “No SQL” => not using a traditional RDBMS

ISSUES WITH RDBMS
• Primary issue: an RDBMS is a big package with all the features, but sometimes we don’t need all of them.
COMPROMISES
• Convenient
• Multi-user
SIMILAR
• Safety
• Persistent
BOOSTS
• Reliable
• MASSIVE (big data)
• Efficient

NoSQL SYSTEMS
Alternative to traditional RDBMS
Pros
• Flexible schema
• Quicker/cheaper to set up
• Massive scalability: handle big data
• Relaxed consistency: higher performance and availability
Cons
• No declarative query language: more programming
• Relaxed consistency: fewer guarantees

Example: Social-Network Graph
• Each record: User ID1, User ID2 …
• Separate records: User ID, name, age, gender …
[Figure: friendship graph over users A to L]

Example: Social-Network Graph
• TASK: Find all friends of a given user.
• TASK: Find all friends of friends of a given user.
• TASK: Find all women friends of men friends of a given user.
• TASK: Find all friends of friends of … friends of a given user.
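The four tasks can be sketched over an in-memory adjacency map. The friendship pairs and profiles below are made-up sample data (the edges of the original figure are not recoverable), and "friends of friends" is taken here to mean everyone two hops away, excluding the user.

```python
from collections import deque

# Hypothetical data: friendship pairs and per-user profile records.
FRIEND_PAIRS = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "E"), ("E", "F")]
PROFILES = {
    "A": {"gender": "F"}, "B": {"gender": "M"}, "C": {"gender": "F"},
    "D": {"gender": "F"}, "E": {"gender": "M"}, "F": {"gender": "F"},
}

def build_adjacency(pairs):
    """Friendship is symmetric, so each pair is stored in both directions."""
    adj = {}
    for u, v in pairs:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def friends(adj, user):
    return set(adj.get(user, ()))

def friends_of_friends(adj, user):
    result = set()
    for f in friends(adj, user):
        result |= friends(adj, f)
    result.discard(user)
    return result

def women_friends_of_men_friends(adj, profiles, user):
    men = {f for f in friends(adj, user) if profiles[f]["gender"] == "M"}
    result = set()
    for m in men:
        result |= {w for w in friends(adj, m) if profiles[w]["gender"] == "F"}
    result.discard(user)
    return result

def reachable(adj, user):
    """Friends of friends of ... friends: breadth-first search from the user."""
    seen = {user}
    queue = deque([user])
    while queue:
        current = queue.popleft()
        for f in adj.get(current, ()):
            if f not in seen:
                seen.add(f)
                queue.append(f)
    seen.discard(user)
    return seen
```

The last task needs an unbounded number of hops, which is awkward to express as a fixed number of relational joins and is one motivation for graph-oriented systems.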

INCARNATIONS OF NoSQL
• MapReduce framework: OLAP (big operations)
• Key-value store: OLTP (small operations)
• Document stores
• Graph database systems

MapReduce Framework
• Originally from Google; open-source implementation: Hadoop.
• Two main functions:
  1. Map: turns each input record into intermediate key-value pairs, so the problem splits into independent sub-problems.
  2. Reduce: combines all values that share a key to produce the output records.
• Languages built on top of Hadoop:
  1. Hive: SQL-like language
  2. Pig: statement language
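The two functions are easiest to see on the classic word-count problem. This is a single-process sketch of the idea, not Hadoop code; the function names are chosen for illustration, and the `shuffle` step stands in for the grouping a real framework performs between the two phases.

```python
from collections import defaultdict

def map_phase(line):
    """Map: turn one input record into (key, value) pairs."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group values by key; a real framework does this between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    mapped = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```

Because each `map_phase` call looks at one record and each `reduce_phase` call looks at one key, both phases can in principle be spread across many machines.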

Graph Database Systems
• Data model: nodes and edges.
• Nodes may have properties.
• Edges may have labels or roles.
• Examples: Neo4j, FlockDB, Pregel
[Figure: three nodes (ID: 1, ID: 2, ID: 3) connected by edges labeled “Friends” and “Likes”]
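A minimal in-memory sketch of this data model follows. It is not how Neo4j, FlockDB, or Pregel are implemented; `PropertyGraph`, `Node`, and `Edge` are hypothetical names, and any edges you add are your own choice, since the direction of the edges in the slide's figure is not recoverable.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: int
    properties: dict = field(default_factory=dict)  # nodes may have properties

@dataclass(frozen=True)
class Edge:
    source: int
    target: int
    label: str  # edges may have labels or roles

class PropertyGraph:
    def __init__(self):
        self.nodes = {}
        self.edges = []

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = Node(node_id, properties)

    def add_edge(self, source, target, label):
        self.edges.append(Edge(source, target, label))

    def neighbors(self, node_id, label):
        """IDs reachable from node_id over outgoing edges with the given label."""
        return [
            e.target for e in self.edges
            if e.source == node_id and e.label == label
        ]
```

For example, after `g.add_edge(1, 2, "Friends")`, the call `g.neighbors(1, "Friends")` returns the IDs of node 1's friends. A real graph database would index edges per node rather than scanning the whole edge list.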

AGAIN, SOME QUESTIONS…
• What is the largest file you’ve dealt with so far?
• What is the maximum download speed you get?
• How much time is required just to transfer that data?
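The last question is simple arithmetic: transfer time is size divided by speed. The figures below (1 TB of data over a 10 MB/s link) are assumptions picked for illustration, not numbers from the presentation.

```python
def transfer_time_seconds(size_bytes, speed_bytes_per_second):
    """Time needed to move size_bytes at a constant transfer speed."""
    if speed_bytes_per_second <= 0:
        raise ValueError("speed must be positive")
    return size_bytes / speed_bytes_per_second

TB = 10**12  # decimal terabyte
MB = 10**6   # decimal megabyte

seconds = transfer_time_seconds(1 * TB, 10 * MB)  # assumed: 1 TB at 10 MB/s
hours = seconds / 3600
```

Under these assumptions the transfer alone takes more than a day, before any processing starts, which is why big-data systems tend to move the computation to the data rather than the other way round.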

What is Big Data?
• Every day, we create 2.5 quintillion bytes of data, so much that 90% of the data in the world today has been created in the last two years alone.
• From the beginning of recorded time until 2003, we created 5 billion gigabytes (5 exabytes) of data.
• In 2011, the same amount was created every two days.
• In 2013, the same amount was created every 10 minutes.
THIS IS “BIG DATA”

What is Big Data? Finally…
• ‘Big data’ is similar to ‘small data’, but bigger.
• But bigger data requires different approaches:
  – Techniques, tools, architecture
• With the aim of solving new problems
  – Or old problems in a better way

Types of Data
• Relational data (tables/transactions/legacy data)
• Text data (web)
• Semi-structured data (XML)
• Graph data
  – Social network, Semantic Web (RDF), …
• Streaming data
  – You can only scan the data once
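The "scan once" constraint on streaming data changes how even basic statistics are computed. As one possible illustration, the sketch below keeps a running mean and variance using Welford's online update, so each value is seen once and never stored. The `RunningStats` and `summarize` names are hypothetical.

```python
class RunningStats:
    """One-pass mean and population variance (Welford's online algorithm)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self._m2 / self.count if self.count else 0.0

def summarize(stream):
    """Consume an iterable exactly once and return its running statistics."""
    stats = RunningStats()
    for x in stream:
        stats.update(x)
    return stats
```

Memory use stays constant regardless of how long the stream is, which is the property that matters when the data cannot be replayed.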

What to do with this data?
• Aggregation and statistics
  – Data warehouse and OLAP
• Indexing, searching, and querying
  – Keyword-based search
  – Pattern matching (XML/RDF)
• Knowledge discovery
  – Data mining
  – Statistical modeling
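Keyword-based search is commonly built on an inverted index, which maps each word to the documents that contain it. The sketch below is a minimal version under simple assumptions: whitespace tokenization, lowercase matching, and queries that require every word to be present. The function names are illustrative.

```python
from collections import defaultdict

def build_index(documents):
    """Map each lowercase word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return IDs of documents containing every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

The index is built once, and each query then touches only the entries for its own words instead of scanning every document.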


Big Data Analytics Technologies
• NoSQL: non-relational database solutions such as HBase, Cassandra, MongoDB, Riak, CouchDB, and many others.
• Hadoop: an ecosystem of software packages, including MapReduce, HDFS, and many others.

Summarizing…
• Key enablers for the appearance and growth of ‘big data’ are:
  + Increase in storage capabilities
  + Increase in processing power
  + Availability of data


