Cloudera federal summit


Published on March 20, 2014

Author: mattcarroll148



Briefing on the DoDIIS Apps Engine, an open-source platform designed to make it easy for an enterprise to transition to the Cloud.

EzBake: A Secure Apps Engine for DoDIIS
Matthew Carroll, GM of 42six
February 06, 2014

Outline
•  Why Build an Apps Engine?
•  The Architecture
•  What's Next

The DoDIIS App Conundrum

Budget cuts
•  There is not enough money to transition the 400+ apps within DIA (business and mission)
•  Outsourcing IaaS to C2S and GovCloud needs to be monitored as cost-reimbursable over time
•  Application elasticity is critical to understanding the true costs of ownership and maintenance
•  Data is a much bigger cost than expected
•  Need to consolidate systems engineering support

Technology migration is not simple
•  Most apps are CRUD-based: write a report, find a report
•  Security business logic is baked into each app
•  Number one question: why can't I choose the technology that best fits my app?

Not a Big Data problem… yet
•  On the order of TBs at best
•  Highly connected, but not big

Security is the ultimate killer of time
•  Most time is spent meeting PL3 needs and encrypting traffic

Analytics are great, but…

The community drive to analytics and enrichment engines has left DIA playing catch-up in its migration of apps to a common platform. The platform therefore must:

1.  Make it as easy as possible for any legacy app to transition
2.  Not dictate technologies
3.  Provide standards for security and access to datastores
4.  Deploy across multiple brokers (e.g. EC2, OpenStack, VMware) and be completely transparent to the app team

Where to start?

Starting was hard, but it became very clear that several epics were essential to migrate applications efficiently:

1.  Streaming of data into applications must be done in a standard way. For DIA, the velocity and size of the data matter less than the method by which the data is consumed and distributed. To answer this, a stream-based data interface must be built to support the nexus of data distribution within the environment; we call this Frack.
2.  Everyone likes the concept of migrating to NoSQL, but it becomes unmanageable from a DevOps perspective if everyone picks their own database for their own use cases. Furthermore, the point is to be multi-tenant. So we created Datasets: a means to expose indexing patterns instead of explicit databases, exposed through a common security layer.
3.  Too much time is spent baking non-application-specific logic into each application rather than supporting a common service tier. To build standards around common service-based functions, we built Services.

Integration vs. Engineering

Of the major issues identified early in the project, the most hindering was the deployment model:
•  App teams were spending 80% of their time integrating with new databases and new services rather than building application functionality
•  Each application followed its own System Installation Procedure (SIP) to deploy its own software
•  Scale was defined through provisioning of machines rather than true automated elasticity

Goal: start developing within 1 hour and deploy capability within 30 days

EzBake

EzBake provides an integrated way to compose the different elements of your application: collecting, processing, storing, and querying data.
•  Focus on application logic
•  Simple API that leverages complex, distributed frameworks
•  Easy-to-use local development kit
•  Deploy in minutes
•  The framework is accredited; applications inherit its accreditation
•  Subscription-based data-feed model
•  Automated elasticity
•  Design for failure

The Components

The core of the platform is built from pure open-source solutions and is broken into the following primary components:
•  Streaming Ingest (Frack): the interface for building data-flow topologies; abstracts the physical stream processor
•  Common Services (Procedures): scaled, commonly used Thrift services, typically utilized during streaming ingest
•  Data Persistence (Datasets): our indexing patterns, exposed as Thrift services; abstracts the physical databases
•  Query: both direct access to Datasets and Aggregate Query across the various Datasets
•  Security: at both the data-persistence and user-access layers
•  Batch Analytics: MapReduce abstractions that take input from and write output to Datasets, and will leverage the GovCloud DataCloud
•  Deployment: currently uses OpenShift for automated deployment, with a planned migration to Docker + YARN
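The Streaming Ingest component can be illustrated with a toy topology: the app author chains workers over a source, and the physical stream processor behind the interface is the platform's concern. The function and worker names here are hypothetical, not the real Frack API.

```python
# Illustrative data-flow topology in the spirit of Frack: records from
# a source pass through an ordered chain of workers. The real platform
# runs this on a distributed stream processor; this sketch runs inline.
from typing import Callable, Iterable


def run_topology(source: Iterable[dict],
                 workers: list) -> list:
    """Push each record through the worker chain in order."""
    out = []
    for record in source:
        for work in workers:
            record = work(record)
        out.append(record)
    return out


def normalize_dates(rec: dict) -> dict:
    # Example of a common enrichment step done during ingest.
    rec["date"] = rec["date"].replace("/", "-")
    return rec


def tag_source(rec: dict) -> dict:
    rec["source"] = "demo-feed"
    return rec


results = run_topology(
    source=[{"id": 1, "date": "2014/02/06"}],
    workers=[normalize_dates, tag_source],
)
print(results)  # [{'id': 1, 'date': '2014-02-06', 'source': 'demo-feed'}]
```

Because the topology is declared against an abstract interface, the same worker chain could be re-hosted on a different stream processor without app changes, which is the point of abstracting the physical layer.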

Technology Agnostic
•  Instead of a jack-of-all-trades index for free-text search, geospatial search, etc., use mission-specific indices for specific application-logic needs
•  Focus on storage patterns rather than database-specific operations, thereby enforcing data-access standards across the enterprise
•  Allow new cartridges for web frameworks, including Node.js, Python, Ruby, etc.

Each app has its own needs; it is not on the platform builder to force the team into a particular technology, but rather to offer a solution that meets the use case.

The Architecture

(architecture diagram slide)

Sharing
•  Sharing is exposed via the Common Services and the Aggregate Query
•  The intent of the Common Services is to expose any functionality currently ingrained within stove-piped applications. By exposing that functionality as a service, other applications can leverage it instead of application teams writing the same logic over and over again, such as entity extraction, date normalization, etc.
•  The Common Services are wrapped in Thrift services and scaled out on the virtual infrastructure deployed through OpenShift
•  The Aggregate Query is in development for delivery in EzBake v2.0; the current design extends Impala to expose the EzBake Datasets as input for the distributed query engine
•  App teams will expose "intents" within the Datasets to which they can respond, like "person", "place", or "event", and the Impala engine will plan the query and aggregate the results back to the requestor

Sharing is the key component of EzBake for achieving cost savings and providing agility for the application developer.
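The "intents" mechanism described above can be sketched as a registry plus fan-out: each dataset registers the intents it can answer, and the query layer asks every matching dataset, then merges the results. In the real design this fan-out happens inside Impala; everything below, including the registry and handler names, is an illustrative stand-in.

```python
# Toy sketch of intent-based aggregate query: datasets register handlers
# for an intent ("person", "place", ...); a query fans out to all of
# them and aggregates the hits back to the requestor.

REGISTRY = {}


def register(intent: str, handler) -> None:
    """A dataset declares an intent it can respond to."""
    REGISTRY.setdefault(intent, []).append(handler)


def aggregate_query(intent: str, term: str) -> list:
    """Fan the query out to every dataset claiming the intent."""
    hits = []
    for handler in REGISTRY.get(intent, []):
        hits.extend(handler(term))
    return sorted(set(hits))


# Two hypothetical apps expose the "person" intent over their Datasets.
register("person", lambda t: [f"app-a:{t}"])
register("person", lambda t: [f"app-b:{t}"])

print(aggregate_query("person", "carroll"))
# ['app-a:carroll', 'app-b:carroll']
```

The requestor never needs to know which datasets exist, only the intent, which is what lets new apps join the shared query surface without coordination.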

Security
•  Datasets are where the bulk of the security occurs, applying row-level security to the data based on the user's authorization string
•  Row-level security must be implemented in different ways to support multiple types of datastores; for example, for the term dataset, which is ElasticSearch, we included a filter plugin that applies the boolean logic check at query time
•  Embedding security across the platform allows application teams to streamline their accreditation process

Built in from the start, EzBake implements security across all features.
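The boolean check mentioned above can be sketched in miniature: each row carries a visibility label, and the filter releases a row only when the user's authorizations satisfy it. This is a simplified illustration of the idea, not the actual ElasticSearch plugin; it handles only flat "A&B" / "A|B" labels, whereas a real implementation must handle nesting and escaping.

```python
# Minimal sketch of a query-time row-visibility check: every row has a
# boolean label, and the user's authorization set must satisfy it.

def visible(expression: str, auths: set) -> bool:
    """Return True if the user's authorizations satisfy the row label."""
    if "&" in expression:                 # all terms required
        return all(tok in auths for tok in expression.split("&"))
    if "|" in expression:                 # any term suffices
        return any(tok in auths for tok in expression.split("|"))
    return expression in auths            # single-term label


rows = [("doc-1", "S&REL"), ("doc-2", "S|TS"), ("doc-3", "TS")]
auths = {"S", "REL"}

released = [rid for rid, label in rows if visible(label, auths)]
print(released)  # ['doc-1', 'doc-2']
```

Pushing this filter to query time inside the datastore, as the ElasticSearch plugin does, means unauthorized rows never leave the dataset layer, so application code carries no security business logic of its own.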

Metering and Monitoring
•  JavaScript API for web apps, Thrift API for services, and REST for everything else
•  Improve application usability and usefulness by examining analytics on usage patterns
•  Diagnose issues with the system, services, and apps
•  Determine cost allocation based on which agencies and organizations are using the system

Data-driven decisions
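The cost-allocation bullet boils down to rolling usage events up per organization. A minimal sketch, assuming a hypothetical event shape (in the real platform the events would arrive via the JavaScript, Thrift, or REST APIs):

```python
# Hypothetical metering roll-up: apps emit usage events; the platform
# aggregates them per organization to support cost allocation.
from collections import Counter

events = [
    {"org": "DIA", "app": "report-viewer"},
    {"org": "DIA", "app": "search"},
    {"org": "NGA", "app": "search"},
]

usage_by_org = Counter(e["org"] for e in events)
print(usage_by_org.most_common())  # [('DIA', 2), ('NGA', 1)]
```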

Timeline

(timeline slide)

What's Next
•  Distributed query via Impala (intents are coming)
•  Apache Spark integration (dynamic ranking)
•  Graph support via Titan
•  Change YARN to control Docker
•  Upgrade to CDH5
•  Extend Apache Sentry

Questions? Contact us:
Matthew Carroll, GM, 42six
@mcarroll_
