Schema Design


Published on February 21, 2014

Author: mongodb

Source: slideshare.net

Schema Design
Mike Friedman, Perl Engineer & Evangelist, MongoDB

Agenda
• What is a Record?
• Core Concepts
• What is an Entity?
• Associating Entities
• General Recommendations

All application development is Schema Design

Success comes from Proper Data Structure

What is a Record?

Key → Value
• One-dimensional storage
• Single value is a blob
• Query on key only
• No schema
• Value cannot be updated, only replaced
[Diagram: Key → Blob]

Relational
• Two-dimensional storage (tuples)
• Each field contains a single value
• Query on any field
• Very structured schema (table)
• In-place updates
• Normalization requires many tables, joins, and indexes, and results in poor data locality
[Diagram: rows identified by a Primary Key]

Document
• N-dimensional storage
• Each field can contain 0, 1, many, or embedded values
• Query on any field & level
• Flexible schema
• Inline updates *
• Embedding related data improves data locality, requires fewer indexes, and performs better
[Diagram: document identified by _id]

Core Concepts

Traditional Schema Design: focus on data storage

Document Schema Design: focus on data use

Another way to think about it:
• What answers do I have?
• What questions do I have?

Three Building Blocks of Document Schema Design

1 – Flexibility
• Choices for schema design
• Each record can have different fields
• Common structure can be enforced by application
• Easy to evolve as needed

2 – Arrays: Multiple Values per Field
• Each field can be:
  – Absent
  – Set to null
  – Set to a single value
  – Set to an array of many values
• Query for any matching value
  – Can be indexed, and each value in the array is in the index
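The four field states and the "query for any matching value" behaviour can be sketched in plain Python. This is only an emulation for illustration: `field_matches` and `find` are hypothetical helpers, the sample contacts are invented, and the matching rule merely approximates how an equality query treats an array field (queries for null are not modelled).

```python
# Hypothetical contacts showing the four states a field can take.
contacts = [
    {"name": "Ann"},                                   # "email" is absent
    {"name": "Bob", "email": None},                    # set to null
    {"name": "Cy", "email": "cy@example.com"},         # single value
    {"name": "Di", "email": ["di@example.com",         # array of many values
                             "di@work.example.com"]},
]


def field_matches(doc, field, value):
    """Return True if doc[field] equals value or is a list containing it.

    An approximation of equality matching on array fields, not
    MongoDB's actual implementation.
    """
    if field not in doc:
        return False
    stored = doc[field]
    if isinstance(stored, list):
        return value in stored
    return stored == value


def find(docs, field, value):
    """Return the documents whose field matches value."""
    return [d for d in docs if field_matches(d, field, value)]


names = [d["name"] for d in find(contacts, "email", "di@work.example.com")]
print(names)
```

The same call works whether the stored field is a single value or an array, which is the point of the slide: the query does not need to know which shape a given document uses.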

3 – Embedded Documents
• An acceptable value is a document
• Nested documents provide structure
• Query any field at any level
  – Can be indexed
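"Query any field at any level" is usually expressed with a dotted path such as "address.city". A minimal sketch of resolving such a path over nested dictionaries follows; `get_path` is a hypothetical helper, the sample document (including the "geo" sub-document) is invented, and arrays are deliberately left out.

```python
def get_path(doc, path):
    """Follow a dotted path such as "address.city" through nested dicts.

    Returns None when any step along the path is missing.
    Arrays are not handled in this sketch.
    """
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# Hypothetical contact with two levels of embedding.
contact = {
    "name": "Ann",
    "address": {
        "street": "1 Main St",
        "city": "Springfield",
        "geo": {"country": "US"},
    },
}

city = get_path(contact, "address.city")
country = get_path(contact, "address.geo.country")
missing = get_path(contact, "address.zip_code")
print(city, country, missing)
```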

What is an Entity?

An Entity
• Object in your model
• Associations with other entities

Referencing (Relational): has_one, belongs_to, has_many, has_and_belongs_to_many
Embedding (Document): embeds_one, embedded_in, embeds_many

MongoDB has both referencing and embedding for universal coverage

Let's model something together
How about a business card?

Business Card

Referencing

Contacts
{
  "_id": ...,
  "name": ...,
  "title": ...,
  "company": ...,
  "phone": ...,
  "address_id": ...
}

Addresses
{
  "_id": ...,
  "street": ...,
  "city": ...,
  "state": ...,
  "zip_code": ...,
  "country": ...
}

Embedding

Contacts
{
  "_id": ...,
  "name": ...,
  "title": ...,
  "company": ...,
  "address": {
    "street": ...,
    "city": ...,
    "state": ...,
    "zip_code": ...,
    "country": ...
  },
  "phone": ...
}
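The practical difference between the two layouts above is how many lookups it takes to assemble one business card. The sketch below counts lookups against in-memory stand-ins for collections. `CountingStore`, the sample values, and both `card_by_*` functions are hypothetical and only illustrate the idea; they are not a MongoDB driver API.

```python
class CountingStore:
    """In-memory stand-in for a collection that counts lookups by _id."""

    def __init__(self, docs):
        self.docs = {d["_id"]: d for d in docs}
        self.fetches = 0

    def get(self, doc_id):
        self.fetches += 1
        return self.docs[doc_id]


# Referencing: the contact points at a separate address document.
addresses = CountingStore([{"_id": 10, "street": "1 Main St", "city": "Springfield"}])
contacts_ref = CountingStore([{"_id": 1, "name": "Ann", "address_id": 10}])

# Embedding: the address lives inside the contact document.
contacts_emb = CountingStore([
    {"_id": 1, "name": "Ann",
     "address": {"street": "1 Main St", "city": "Springfield"}},
])


def card_by_reference(contact_id):
    """Build the card with one lookup per collection."""
    contact = contacts_ref.get(contact_id)
    address = addresses.get(contact["address_id"])
    return {"name": contact["name"], "city": address["city"]}


def card_by_embedding(contact_id):
    """Build the card from a single document."""
    contact = contacts_emb.get(contact_id)
    return {"name": contact["name"], "city": contact["address"]["city"]}


ref_card = card_by_reference(1)
emb_card = card_by_embedding(1)
ref_fetches = contacts_ref.fetches + addresses.fetches
emb_fetches = contacts_emb.fetches
print(ref_card == emb_card, ref_fetches, emb_fetches)
```

Both functions return the same card; the referenced layout needs two lookups where the embedded layout needs one.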

Relational Schema
Contact
• name
• company
• title
• phone
Address
• street
• city
• state
• zip_code

Document Schema
Contact
• name
• company
• address
  – street
  – city
  – state
  – zip_code
• title
• phone

Relational vs. Document Schema
• Relational: Contact (name, company, title, phone) and Address (street, city, state, zip_code) as separate tables
• Document: Contact (name, company, address {street, city, state, zip_code}, title, phone) as a single document
How are they different? Why?

Schema Flexibility
Two contact documents with different fields:

{
  "name": ...,
  "title": ...,
  "company": ...,
  "address": { "street": ..., "city": ..., "state": ..., "zip_code": ... },
  "phone": ...
}

{
  "name": ...,
  "url": ...,
  "title": ...,
  "company": ...,
  "email": ...,
  "address": { "street": ..., "city": ..., "state": ..., "zip_code": ... },
  "phone": ...,
  "fax": ...
}

Example

Let’s Look at an Address Book

Address Book
• What questions do I have?
• What are my entities?
• What are my associations?

Address Book Entity-Relationship
[Figure: entity-relationship diagram]
• Contacts: name, company, title
• Twitters: name, location, web, bio (1 to 1 with Contacts)
• Thumbnails: mime_type, data (1 to 1 with Twitters)
• Portraits: mime_type, data (1 to 1 with Contacts)
• Addresses: type, street, city, state, zip_code (N per Contact)
• Phones: type, number (N per Contact)
• Emails: type, address (N per Contact)
• Groups: name (N to N with Contacts)

Associating Entities

One to One
[Figure: Address Book entity-relationship diagram, repeated]

One to One: Schema Design Choices
• contact references twitter (contact has twitter_id)
• twitter references contact (twitter has contact_id)
• Both sides hold a reference
  – Redundant to track relationship on both sides
  – Both references must be updated for consistency
• Contact embeds twitter
  – May save a fetch?

One to One: General Recommendation
• Full contact info all at once
  – Contact embeds twitter
• Parent-child relationship
  – "contains"
• No additional data duplication
• Can query or index on embedded field
  – e.g., "twitter.name"

One to Many
[Figure: Address Book entity-relationship diagram, repeated]

One to Many: Schema Design Choices
• contact references phones (contact has phone_ids: [ ])
• phone references contact (phone has contact_id)
• Both sides hold references
  – Redundant to track relationship on both sides
  – Both references must be updated for consistency
• Contact embeds phones
  – Not possible in relational DBs
  – Save a fetch?

One to Many: General Recommendation
• Full contact info all at once
  – Contact embeds multiple phones
• Parent-children relationship
  – "contains"
• No additional data duplication
• Can query or index on any field
  – e.g., { "phones.type": "mobile" }
• Exceptional cases…
  – Scaling: maximum document size is 16MB
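A query such as { "phones.type": "mobile" } has to look inside every element of the embedded phones array. The following pure-Python sketch emulates that fan-out; `values_at` and `find_by` are hypothetical helpers, the contacts are invented, and the logic only approximates dotted-path matching over arrays of embedded documents.

```python
def values_at(doc, path):
    """Collect every value reachable by a dotted path, fanning out over lists."""
    current = [doc]
    for key in path.split("."):
        next_level = []
        for item in current:
            # A list of embedded documents is searched element by element.
            candidates = item if isinstance(item, list) else [item]
            for candidate in candidates:
                if isinstance(candidate, dict) and key in candidate:
                    next_level.append(candidate[key])
        current = next_level
    return current


def find_by(docs, path, value):
    """Return documents where any value at the dotted path equals value."""
    return [d for d in docs if value in values_at(d, path)]


# Hypothetical contacts, each embedding its phones.
contacts = [
    {"name": "Ann", "phones": [
        {"type": "mobile", "number": "555-0100"},
        {"type": "work", "number": "555-0101"},
    ]},
    {"name": "Bob", "phones": [{"type": "home", "number": "555-0102"}]},
    {"name": "Cy"},  # no phones at all
]

mobile_owners = [d["name"] for d in find_by(contacts, "phones.type", "mobile")]
print(mobile_owners)
```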

Many to Many
[Figure: Address Book entity-relationship diagram, repeated]

Many to Many: Traditional Relational Association
• Groups: name
• GroupContacts (join table): group_id, contact_id
• Contacts: name, company, title, phone
Use arrays instead of a join table

Many to Many: Schema Design Choices
• group references contacts (group has contact_ids: [ ])
• contact references groups (contact has group_ids: [ ])
• Both sides hold references
  – Redundant to track relationship on both sides
  – Both references must be updated for consistency
• group embeds contacts
• contact embeds groups
• Both sides embed
  – Redundant to track relationship on both sides
  – Duplicated data must be updated for consistency

Many to Many: General Recommendation
• Depends on use case
  1. Simple address book
     – Contact references groups (contact has group_ids: [ ])
  2. Corporate email groups
     – Group embeds contacts for performance
• Exceptional cases
  – Scaling: maximum document size is 16MB
  – Scaling may affect performance and working set
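For the simple address book case, keeping group_ids on the contact means the association lives on one side only, so there is nothing duplicated to keep consistent. A minimal sketch with in-memory lists follows; the data and the helper names (`members_of`, `groups_of`, `rename_group`) are hypothetical.

```python
# Hypothetical groups and contacts; each contact references its groups.
groups = [
    {"_id": 1, "name": "Family"},
    {"_id": 2, "name": "Work"},
]
contacts = [
    {"_id": 100, "name": "Ann", "group_ids": [1, 2]},
    {"_id": 101, "name": "Bob", "group_ids": [2]},
    {"_id": 102, "name": "Cy", "group_ids": []},
]


def members_of(group_id):
    """Names of contacts whose group_ids array contains group_id."""
    return [c["name"] for c in contacts if group_id in c["group_ids"]]


def groups_of(contact):
    """Names of the groups a contact references."""
    wanted = set(contact["group_ids"])
    return [g["name"] for g in groups if g["_id"] in wanted]


def rename_group(group_id, new_name):
    """One update in one place; no contact document has to change."""
    for g in groups:
        if g["_id"] == group_id:
            g["name"] = new_name


work_members = members_of(2)
ann_groups = groups_of(contacts[0])
rename_group(2, "Team")
bob_groups = groups_of(contacts[1])
print(work_members, ann_groups, bob_groups)
```

If groups instead embedded copies of their contacts, as in the corporate email case, a change to a contact would have to be repeated in every group that embeds it; that is the consistency cost traded for read performance.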

Document model: holistic and efficient representation
[Figure: document model diagram]
• Groups: name (N to N with Contacts)
• Contacts: name, company, title, with embedded:
  – twitter (1): name, location, web, bio, and thumbnail (1): mime_type, data
  – addresses (N): type, street, city, state, zip_code
  – phones (N): type, number
  – emails (N): type, address
• Portraits: mime_type, data (1 to 1 with Contacts, referenced)

Contact document example
{
  "name": "Gary J. Murakami, Ph.D.",
  "company": "MongoDB, Inc.",
  "title": "Lead Engineer",
  "twitter": {
    "name": "Gary Murakami",
    "location": "New Providence, NJ",
    "web": "http://www.nobell.org"
  },
  "portrait_id": 1,
  "addresses": ...,
  "phones": ...,
  "emails": ...
}

Working Set
To reduce the working set, consider…
• Reference bulk data, e.g., portrait
• Reference less-used data instead of embedding
  – Extract into referenced child document
Also for performance issues with large documents
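Extracting bulk data into a referenced child document can be sketched as a small transformation. Here `extract_portrait` is a hypothetical helper, the contact and its 50,000-byte portrait are invented, and a plain dict stands in for a separate portraits collection.

```python
def extract_portrait(contact, portraits, portrait_id):
    """Move the embedded portrait into `portraits` and leave a reference.

    Returns a new, slimmer contact; the input contact is not modified.
    """
    slim = {k: v for k, v in contact.items() if k != "portrait"}
    portraits[portrait_id] = contact["portrait"]
    slim["portrait_id"] = portrait_id
    return slim


# Hypothetical contact carrying bulky, rarely used image data.
fat_contact = {
    "name": "Ann",
    "title": "Engineer",
    "portrait": {"mime_type": "image/png", "data": b"\x00" * 50_000},
}

portraits = {}  # stand-in for a separate portraits collection
slim_contact = extract_portrait(fat_contact, portraits, portrait_id=1)

print(sorted(slim_contact))
print(len(portraits[1]["data"]))
```

Reads that only need the name and title now touch the small contact document, and the portrait is fetched by portrait_id only when it is actually displayed.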

General Recommendations

Legacy Migration
1. Copy existing schema & some data to MongoDB
2. Iterate schema design development: measure performance, find bottlenecks, and embed
   1. one to one associations first
   2. one to many associations next
   3. many to many associations
3. Migrate full dataset to new schema

New Software Application? Embed by default
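The "embed" step of such a migration amounts to pre-joining child rows into their parent. A minimal sketch for a one to many association follows, assuming the legacy tables have already been copied as lists of row dicts; the table layout, column names, and `embed_phones` are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows copied from two legacy relational tables.
contact_rows = [
    {"id": 1, "name": "Ann"},
    {"id": 2, "name": "Bob"},
    {"id": 3, "name": "Cy"},
]
phone_rows = [
    {"id": 10, "contact_id": 1, "type": "mobile", "number": "555-0100"},
    {"id": 11, "contact_id": 1, "type": "work", "number": "555-0101"},
    {"id": 12, "contact_id": 2, "type": "home", "number": "555-0102"},
]


def embed_phones(contact_rows, phone_rows):
    """Turn parent and child rows into contact documents with embedded phones."""
    phones_by_contact = defaultdict(list)
    for row in phone_rows:
        # The foreign key and surrogate id are dropped once the phone is embedded.
        phones_by_contact[row["contact_id"]].append(
            {"type": row["type"], "number": row["number"]}
        )
    return [
        {
            "_id": c["id"],
            "name": c["name"],
            "phones": phones_by_contact.get(c["id"], []),
        }
        for c in contact_rows
    ]


documents = embed_phones(contact_rows, phone_rows)
for doc in documents:
    print(doc)
```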

Embedding over Referencing
• Embedding is a bit like pre-joined data
  – BSON (Binary JSON) document ops are easy for the server
• Embed (90/10, following these rules of thumb)
  – When the "one" or "many" objects are viewed in the context of their parent
  – For performance
  – For atomicity
• Reference
  – When you need more scaling
  – For easy consistency with "many to many" associations without duplicated data

It's All About Your Application
• Programs+Databases = (Big) Data Applications
• Your schema is the impedance matcher
  – Design choices: normalize/denormalize, reference/embed
  – Melds programming with MongoDB for best of both
  – Flexible for development and change
• Programs MongoDB = Great Big Data Applications

Thank You
Mike Friedman, Perl Engineer & Evangelist, MongoDB
