Migrating from MongoDB to Neo4j - Lessons Learned


Published on February 18, 2014

Author: seenickcode

Source: slideshare.net


This month we will learn how to use the (somewhat) new 2.0 branch of Michael Hunger's Batch Importer.

We'll also have a concise presentation on our experiences at Shindig Labs migrating production MongoDB data into Neo4j, covering:


1. key considerations made when choosing to move to a graph database

2. estimating the effort involved in changing our code

3. our data modeling approach

4. how we exported our data

5. how we used the Batch Importer to import our data (step by step)

Others will be sharing their experiences as well and an open discussion will follow.

Meetup Feb 17th, 2014: Migrating from MongoDB to Neo4j

Agenda
• Intros – name, what you do, interest in Neo4j?
• Case Study, Moving from MongoDB
  – considerations, why and how
  – steps taken, findings
  – using the Batch Importer
• Group Discussion
  – experiences from others?

source: http://neo4j.rubyforge.org/guides/why_graph_db.html

Case Study, Moving from MongoDB

source: http://neo4j.rubyforge.org/guides/why_graph_db.html

Our Startup
– A mobile drink discovery platform: explore new drinks, post photos, learn new facts, follow other drink aficionados (whisky, beer, wine, cocktail experts)

Using MongoDB
– Pluses for us:
  • flexible (by far the most substantial benefit)
  • good documentation
  • easy to host and integrate with our code
– Downsides for us:
  • lots of collections needed (i.e. for mapping data, many-to-many relationships; see the sketch below)
  • queries with multiple joins
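To make the mapping-collection pain concrete, here is a minimal sketch of the pattern, assuming Mongoid-style models; the class and field names are hypothetical, not from the original deck. Because MongoDB has no joins, every many-to-many relationship gets its own collection plus extra queries in application code:

# Hypothetical mapping collection for the User <-> Drink "likes" relation.
class UserLikedDrink
  include Mongoid::Document
  field :user_id,  type: BSON::ObjectId
  field :drink_id, type: BSON::ObjectId
end

# Fetching a user's liked drinks takes two round trips:
drink_ids = UserLikedDrink.where(user_id: user.id).pluck(:drink_id)
drinks    = Drink.where(:_id.in => drink_ids)

In Neo4j the same relation becomes a single LIKED relationship between User and Drink nodes, which is exactly what the import scripts later in this deck create.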

Relying on Redis
– Needed to cache a lot in Redis
– We cached:
  • user profile
  • news feed
– Too much complexity:
  • another denormalized data model to manage
  • more difficult to test
  • increase in bugs and edge cases
– Still awesome, but we just relied on it too much

Evaluating Neo4j
– Our goals:
  • simplify the data model (less denormalization)
  • speed up highly relational queries
  • keep our flexibility (schemaless data model)
– Considerations:
  • how will we host it?
  • will it make our codebase more complex?
  • support?
  • is it easy to troubleshoot production issues?

How We Evaluated
1. Set up an instance on Amazon EC2 (though Heroku was still an option as well)
2. Imported realistic production data with the Batch Importer
3. Took our most popular, slowest query and tested it
4. Wrote more example queries for standard use cases (creating nodes, relationships, etc.) – was it easy to use?
5. Ran a branch of our code with Neo4j for a month

How We Evaluated (continued)
1. Made sure we could get good support for the product
2. Determined the effort involved in hosting it on Amazon EC2 (though Heroku was also an option)
3. Determined the effort needed to import bulk data and change our data model
4. Audited each line of code and made a list of the types of queries we'd need; estimated the effort involved in updating our codebase
5. Imported production data, then took our most popular, slowest query and tested performance
6. Wrote other, more common queries and tested performance further (using Apache Benchmark)
7. Checked whether the driver support (in this case Ruby) was okay and well-written – would it be maintained years from now?
8. Tested it as a code branch for at least a month

Our Findings
1. So far so good (we've been testing for a few weeks now)
2. Set up an instance on Amazon EC2 – wasn't that bad
3. Complex queries were a lot faster
4. The Ruby driver (Neography) does the job, though it isn't perfect
5. Planning to use Neo4j's official Ruby library once version 3.0 is finished (which seems not to require JRuby)

Our Findings (continued)
6. We needed to create an abstraction layer in the code to simplify reads and writes with the database (a minimal sketch follows this list); it wasn't that bad though
7. Our data model got a lot more intuitive – no more mapping collections (yay)
8. We can now implement recommendations a lot more easily when we want to
9. We no longer need to rely heavily on Redis and caching
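For flavor, here is a minimal sketch of what such an abstraction layer might look like, assuming the Neography gem mentioned above; the GraphStore class and its method names are hypothetical, not code from the deck:

require 'neography'

# Hypothetical wrapper: the rest of the codebase calls intention-revealing
# methods instead of building Cypher strings everywhere.
class GraphStore
  def initialize(url = "http://localhost:7474")
    @neo = Neography::Rest.new(url)
  end

  # Who follows this user?
  def followers_of(username)
    @neo.execute_query(
      "MATCH (u:User)<-[:FOLLOWS]-(f:User) WHERE u.username = {username} RETURN f",
      { :username => username }
    )
  end

  # Record that a user liked a drink.
  def like_drink(username, drink_name)
    @neo.execute_query(
      "MATCH (u:User), (d:Drink) WHERE u.username = {username} AND d.name = {name} " \
      "CREATE (u)-[:LIKED]->(d)",
      { :username => username, :name => drink_name }
    )
  end
end

Keeping reads and writes behind one interface like this is what made the change testable.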

Our Findings (continued)
10. We think about our data differently now
11. Managing the data model is actually fun

Tutorial on Batch Importer
1. Our example involves real data
2. We will be using Ruby to generate .CSV files representing nodes and relationships
3. Beware: the existing documentation is "not good", to put it lightly
4. We're using the 2.0 version (precompiled binary): https://github.com/jexp/batch-import/tree/20

Steps
1. Install Neo4j
2. Download a binary version of the Batch Importer
3. The Batch Importer requires .CSV files: one type of file imports nodes, another imports relationships
4. Decide on fields that make nodes unique
   – ex: a user has a username, a drink has a name
   – this makes the process of mapping node relationships later a lot easier too

.CSV Format for Nodes
• Tab-separated columns
• Importing Nodes
  – node property names in the first row
  – format is <field name>:<field type> (defaults to String)
  – all rows after that are the corresponding property values
• Importing Relationships
  – separate .CSV file: source node's unique field in the first column, target node's unique field in the second column, the word "type" in the third column
  – since we're already using a unique index on nodes, it's easy to relate them!
  – you can import multiple relationship types between two types of nodes in the same .CSV file

Creating Drink Nodes
• Example output (tab delimited):
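The original slide showed a screenshot of the generated file; here is a plausible reconstruction based on the script below (the drink names are hypothetical, and the columns are real tab characters in the actual file):

name:string:drink_name_index    type:label    name
Old Fashioned                   Drink         Old Fashioned
Negroni                         Drink         Negroni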

Creating Drink Nodes

namespace :export do
  require 'csv'

  task :generate_drink_nodes => :environment do
    # Column separator must be a real tab for the Batch Importer.
    CSV.open("drink_nodes.csv", "wb", { :col_sep => "\t" }) do |csv|
      # Header row: unique indexed field, node label, then properties.
      csv << ["name:string:drink_name_index", "type:label", "name"]
      Drink.all.each do |drink|
        csv << [drink.name, "Drink", drink.name]
      end
    end
  end
end

Running the Script
• Make sure all nodes and relationships are deleted from Neo4j:
  MATCH (n) OPTIONAL MATCH (n)-[r]-() DELETE n,r
• Stop your Neo4j server before importing
• Run the import command (per the binary Batch Importer we downloaded earlier):
  ./import.sh ~/neo4j-community-2.0/data/graph.db user_nodes.csv
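The relationship files generated later are passed to the same command; a sketch assuming the batch-import 2.0 convention of listing the node file before the relationship file (file names as generated by our scripts):

./import.sh ~/neo4j-community-2.0/data/graph.db user_nodes.csv user_rels.csv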

Creating User Nodes
• Example output (tab delimited):
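Again, the slide's screenshot is not in the text; a plausible sample based on the script below (usernames and names are illustrative, columns tab-separated):

username:string:user_username_index    type:label    first_name    last_name
nickTribeca                            User          Nick          Manning
whiskyjane                             User          Jane          Doe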

Creating User Nodes CSV.open("user_nodes.csv", "wb", { :col_sep => "t" }) do |csv| csv << ["username:string:user_username_index", "type:label", "first_name", "last_name"] User.all.each do |user| csv << [user.username, "User", user.first_name, user.last_name] end 20

User to User Relationships
• NOTE: it's easy to relate users to users since we already have an index set up
• Example output (tab delimited):
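A plausible sample based on the script below (usernames are illustrative, columns tab-separated):

username:string:user_username_index    username:string:user_username_index    type
nickTribeca                            whiskyjane                             FOLLOWS
whiskyjane                             nickTribeca                            FOLLOWS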

User to User Relationships

CSV.open("user_rels.csv", "wb", { :col_sep => "\t" }) do |csv|
  # Header: source node's unique field, target node's unique field, rel type.
  csv << ["username:string:user_username_index", "username:string:user_username_index", "type"]
  User.all.each do |user|
    user.following.each do |other_user|
      csv << [user.username, other_user.username, "FOLLOWS"]
    end
    user.followers.each do |other_user|
      csv << [other_user.username, user.username, "FOLLOWS"]
    end
  end
end

User to Drink Relationships
• Example output (tab delimited):
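A plausible sample based on the script below (values are illustrative, columns tab-separated):

username:string:user_username_index    name:string:drink_name_index    type
nickTribeca                            Old Fashioned                   LIKED
nickTribeca                            Negroni                         JOURNALED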

User to Drink Relationships

CSV.open("user_drink_rels.csv", "wb", { :col_sep => "\t" }) do |csv|
  # Header: user's unique field, drink's unique field, rel type.
  csv << ["username:string:user_username_index", "name:string:drink_name_index", "type"]
  User.all.each do |user|
    user.liked_drinks.each do |drink|
      csv << [user.username, drink.name, "LIKED"]
    end
    user.disliked_drinks.each do |drink|
      csv << [user.username, drink.name, "DISLIKED"]
    end
    user.drink_journal_entries.each do |entry|
      csv << [user.username, entry.drink.name, "JOURNALED"]
    end
  end
end

Test Your Data
• Test with some Cypher queries
  – cheat sheet: http://docs.neo4j.org/refcard/2.0
  – ex: MATCH (n:User)-[r:FOLLOWS]-(o) WHERE n.username = 'nickTribeca' RETURN n, r LIMIT 50
• Note: you must limit your results or else the Data Browser will become too slow to use

That's the Tutorial
• You can always migrate data yourself without the Batch Importer
  – i.e. a script that queries MongoDB data and inserts it into Neo4j in real time using your API
• Using the Batch Importer is really fast though
• We found it faster to write and less error prone than writing our own script

Group Q&A • Thanks for coming • @seenickcode • nicholas.manning@gmail.com for questions • Want to present? Let me know. 27
