Published on December 17, 2013
Apache Crunch
● What is it?
● How does it work?
● Why use it?
● Hadoop MapReduce pipelines
● Scrunch
● Joins
www.semtech-solutions.co.nz firstname.lastname@example.org

Apache Crunch – Pipeline
● Crunch is based on Google's FlumeJava
● Provides a Java-based API for MapReduce pipelines
● It uses an MST (multiple serializable type) data model
● Good for processing complex data types
● Better for "non-tuple" data types, e.g.
– Images
– Audio
– Seismic data

Apache Crunch – Pipeline
● What is a MapReduce pipeline?
– Map
– Shuffle
– Reduce
– Combine
● Stages are arranged in sequence and/or in parallel
● Chains can potentially be very long
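
The four stages named on this slide can be sketched as a word count in plain Python. This is only an illustration of the data flow, not the Crunch API: every function name below is hypothetical, and the sketch assumes the usual Hadoop arrangement in which the combine step runs on each input split before the shuffle.

```python
from collections import Counter, defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def combine_phase(pairs):
    """Combine: pre-aggregate pairs locally, before they are shuffled."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())


def shuffle_phase(partitions):
    """Shuffle: group the values from all partitions by key."""
    grouped = defaultdict(list)
    for pairs in partitions:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse each key's list of values into one result."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(splits):
    """Run map and combine once per input split, then shuffle and reduce."""
    partitions = []
    for split in splits:
        mapped = [pair for line in split for pair in map_phase(line)]
        partitions.append(combine_phase(mapped))
    return reduce_phase(shuffle_phase(partitions))


if __name__ == "__main__":
    # Two hypothetical input splits, each a list of text lines.
    splits = [["crunch runs on hadoop", "hadoop runs jobs"],
              ["crunch builds pipelines"]]
    print(word_count(splits))
```

A longer pipeline would feed the output of one such map/shuffle/reduce round into the next, which is the chaining that the slide refers to.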

Apache Crunch – Scala
● Scrunch is a Scala wrapper for Apache Crunch
● Reduced code
● Functional and OO styles
● Uses type inference for Map / Reduce
● Incorporates Java Materialize functionality
● Includes a REPL (read-eval-print loop)

Apache Crunch – Joins
● Details of joins available in Crunch
– Inner / Outer, like SQL joins
– Same with Left / Right / Full joins
– MapSide join is an in-memory join
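
The join types listed here follow SQL semantics, which can be sketched in plain Python. This is an assumption-laden illustration rather than Crunch code: `hash_join` is a hypothetical helper, and indexing the right-hand input in a dictionary stands in for the idea behind a map-side join, where one input is held in memory.

```python
def hash_join(left, right, how="inner"):
    """Join two lists of (key, value) pairs; missing sides become None."""
    if how not in ("inner", "left", "right", "full"):
        raise ValueError(f"unknown join type: {how}")

    # Hold the right-hand input in memory, indexed by key.
    index = {}
    for key, value in right:
        index.setdefault(key, []).append(value)

    rows = []
    matched = set()
    for key, left_value in left:
        if key in index:
            matched.add(key)
            rows.extend((key, left_value, rv) for rv in index[key])
        elif how in ("left", "full"):
            # Keep left rows that have no partner on the right.
            rows.append((key, left_value, None))

    if how in ("right", "full"):
        # Keep right rows that were never matched by a left row.
        for key, right_values in index.items():
            if key not in matched:
                rows.extend((key, None, rv) for rv in right_values)
    return rows


if __name__ == "__main__":
    # Hypothetical sample data.
    users = [(1, "ann"), (2, "bob")]
    orders = [(2, "book"), (3, "lamp")]
    for how in ("inner", "left", "right", "full"):
        print(how, hash_join(users, orders, how))
```

The in-memory index is what makes this approach fast, and also what limits it: it only works when one side of the join is small enough to fit in memory.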

Apache Crunch – Performance
● A lightweight API that runs efficiently
● Crunch is a thin veneer on top of MapReduce
● Two implementations available
– Hadoop Writables
– Avro
● Avro implementation much faster

Apache Crunch – API
● Data Model
– Pipeline
– MRPipeline
– MemPipeline
– PCollection
– PTable
– PGroupedTable
– Source
– Target
– Emitter
– PType
● Operators
– DoFn
– CombineFn
– FilterFn
– Joins
– Cartesian
– Sort
– Secondary Sort
– PObject
– BloomFilters

Contact Us
● Feel free to contact us at
– www.semtech-solutions.co.nz
– email@example.com
● We offer IT project consultancy
● We are happy to hear about your problems
● You pay only for the hours you need to solve your problems

A short introduction to Apache Crunch. What is it, and how does it simplify and aid the creation of Hadoop pipelines?