Published on March 16, 2014
Data Scientist Enablement
DSE 400 - Fast Track to Data Science
Week 7 Roadmap
Advanced Center of Excellence, Modern Renaissance Corporation
In collaboration with the SONO team and others
Content of this document is under Creative Commons Licence CC BY 4.0
Agenda
You can always find the latest version of this document at http://bit.ly/1fyOSnN
Week 7 Overview, Discussions, Learning Path, Activities, Assignment Submission, Adaptive Learning, References, Citation
“Action is the foundational key to all success” - Pablo Picasso
DSE 400 - Week 7 at a glance
Social Discourse: Discuss IBM Watson. Continue building R-COP and Modern Data Platforms-COP.
Learning plan: Read about MapReduce, Lambda Architecture and Google BigQuery.
Activities: Continue Hortonworks Tutorials. Explore Google Public Datasets and BigQuery.
Assignment 7: Perform queries on the Baseball Statistics dataset.
Social Engagement - Week 7 (SONO, LinkedIn, Facebook, Google+)
Discussion: Watch Ken Jennings: Watson, Jeopardy and me, the obsolete and share your thoughts and reflections on the evolving domain of “Cognitive Computing”.
In line with our Open Innovation model, we are expanding our Social Discourse mode to LinkedIn, Facebook and Google+. Discussions on SONO will continue as planned on the DSE 400 Jump Pad. This gives participants more choice, and we hope it will increase social engagement.
Check out the Language R and Modern Data Platforms Communities of Practice (COPs) to help you increase your competence in R, Machine Learning, the Hadoop ecosystem and other platforms. Reach out to Olivia Ramirez, Ellen Brock or Manju Rupani if you want to contribute to these communities.
Recommended Learning Plan
Read Practical illustration of Map-Reduce (Hadoop-style), on real data by Dr. Vincent Granville
Read Lambda Architecture for Big Data Systems by Michael Walker
Read Google BigQuery Tutorial
<Optional> Watch Hadoop - The Data Scientist's Dream
<Optional> Watch Hadoop MapReduce Example - How good are a city's farmer's markets by Helen Zeng
<Optional> Watch Google BigQuery in Ten Minutes
Activities
<Practice> Check out Visualization of the Day at Data Science Central. As the name suggests, it is going to be different every day. Explore alternative ways of representing the data. Could you have presented it in a better way?
<Practice> Visit Google Public Data Directory. Explore Greenhouse Gas Emissions by country. How does your country fare per capita compared to the leading contributors? Also check out the IMF World Outlook dataset. Visualize the data on Unemployment rate (this can be found under the people category).
<Practice> Continue Hortonworks Tutorials on HDP 2.0. We will return to Hadoop and its ecosystem in DSE 502, which will focus on Modern Data Platforms. In the meantime you can also participate in the Modern Data Platforms-Community of Practice and contribute to discussions on this subject.
Assignment 7 - Submission Required
Platforms: HDP 2.0, R-sqldf, BigQuery
Download Sean Lahman’s baseball statistics dataset. Using HDP 2.0 (or an equivalent Hadoop platform), R-sqldf or Google BigQuery, compute the following:
a) Group the data in the Batting table to show the maximum runs for every year.
b) Similarly, group the data in the Batting table to show the average runs for every year.
c) Display the maximum runs for each year and the associated player (last_name and first_name) by joining the Batting and Master tables.
You may reach out to Rachel Fleming <email@example.com> if you have any difficulties with the assignments or are looking for more challenging assignments or activities.
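The three queries in this assignment follow a common SQL pattern: GROUP BY with an aggregate for (a) and (b), and a join back to the per-year maximum for (c). The sketch below illustrates that pattern in Python with the standard sqlite3 module on a tiny in-memory database. It is not a reference solution: the rows are made-up toy data, and the column names (playerID, yearID, R, nameFirst, nameLast) are assumptions about the dataset's schema that you should check against the files you download. The same SQL text should need only minor changes for R-sqldf, Hive on HDP 2.0 or BigQuery.

```python
import sqlite3

def build_toy_db():
    """Create an in-memory database with hypothetical toy rows (not real statistics)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Batting (playerID TEXT, yearID INTEGER, R INTEGER);
        CREATE TABLE Master  (playerID TEXT, nameFirst TEXT, nameLast TEXT);
        INSERT INTO Batting VALUES
            ('p1', 2001, 100), ('p2', 2001, 80),
            ('p1', 2002, 90),  ('p3', 2002, 110);
        INSERT INTO Master VALUES
            ('p1', 'Ann', 'Alpha'), ('p2', 'Bob', 'Beta'), ('p3', 'Cy', 'Gamma');
    """)
    return conn

# (a) maximum runs per year
MAX_RUNS = "SELECT yearID, MAX(R) FROM Batting GROUP BY yearID ORDER BY yearID"

# (b) average runs per year
AVG_RUNS = "SELECT yearID, AVG(R) FROM Batting GROUP BY yearID ORDER BY yearID"

# (c) maximum runs per year with the player's name: compute the per-year
# maximum in a subquery, join it back to Batting to find who scored it,
# then join Master to look up the name. Ties would return several rows per year.
TOP_PLAYER = """
SELECT b.yearID, b.R, m.nameLast, m.nameFirst
FROM Batting AS b
JOIN (SELECT yearID, MAX(R) AS maxR FROM Batting GROUP BY yearID) AS t
  ON b.yearID = t.yearID AND b.R = t.maxR
JOIN Master AS m ON m.playerID = b.playerID
ORDER BY b.yearID
"""

conn = build_toy_db()
max_rows = conn.execute(MAX_RUNS).fetchall()
avg_rows = conn.execute(AVG_RUNS).fetchall()
top_rows = conn.execute(TOP_PLAYER).fetchall()

for label, rows in [("max", max_rows), ("avg", avg_rows), ("top", top_rows)]:
    print(label, rows)
```

Joining back to a per-year maximum, rather than selecting the name alongside MAX(R) in a single GROUP BY, keeps the query portable: most SQL engines reject or leave undefined a non-aggregated column that is not part of the grouping key.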
Submission in PDF format is required
Recommended Deadline: Saturday, 11:59 PM your local time. If you can’t submit your assignment on time, please complete it and turn it in as soon as possible. There is no penalty for late submission, but turning in assignments on time will help you focus on next week’s lessons.
Mail Assignment 7 to <firstname.lastname@example.org> with DSE 400 > Assignment 7 in the subject line.
Submit a single PDF document showing your queries and result samples. Include screenshots as necessary.
For the sake of consistency, name your document DSE 400 - Assignment 7 - Your Full Name.
Do not send document links. Only a single document in PDF format is accepted.
Adaptive Learning Options
Data Scientist Enablement program

Maturity   Composite Score *    Proficiency               Certificate
Level 5    > 90                 Innovating Capability     Black Belt
Level 4    > 80 and <= 90       Architectural Capability  Green Belt
Level 3    > 70 and <= 80       Solutioning Capability    Yellow Belt
Level 2    > 60 and <= 70       Basic Understanding       Completion
Level 1    <= 60                Basic Familiarity         Audit

* The composite score takes into consideration the performance of participants in assignments, activities, projects, social engagement, collaboration, team development, publications, advanced research etc. in all 4 modules of the DSE program.
References, Resources and Additional Reading
17 short tutorials all Data Scientists should read (and practice). Dr. Granville. Data Science Central
Hadoop Illuminated. Kerzner and Maniyam. Hadoop Illuminated LLC. 2013
Hadoop: The Definitive Guide. 3rd Edition. Tom White. O’Reilly Media. 2012
Programming Hive. Capriolo et al. O’Reilly Media. 2012
MapReduce: Simplified Data Processing on Large Clusters. Dean and Ghemawat. Google. 2004
[MIT OCW] How to Process, Analyze and Visualize Data. Marcus & Wu. 2012
The Modern Data Architecture for Predictive Analytics
Big Data - Hadoop, Hive, Pig and HBase video collection
Language R-Community of Practice
Modern Data Platforms-Community of Practice
Data Science Enablement playlist
Citation
Content that appears as is, in this document only, is under Creative Commons License CC BY 4.0. This license does not necessarily apply to other material referenced in this document.
The baseball dataset used in this week’s activities and assignment is attributed to Sean Lahman. This dataset is adapted under Creative Commons Licence 3.0.
Content from IBM, Hortonworks, Google, Data Science Central, O’Reilly Media etc. is excluded from the above Creative Commons License.
For More Information
Week 7 discussions take place during this week on the DSE 400 forums on LinkedIn, Facebook, Google+ and SONO. There is also an active Q&A session for everyone's benefit.
Also check out the Language R-Community of Practice if you would like to advance your competence in R or contribute to this community.
<Mentoring On Demand> You may reach out to Rachel Fleming <email@example.com> if you have any difficulties with the assignments or are looking for more challenging activities. If you need a mentor or someone to help you accelerate along the DSE program, you may reach out to Vishal Kumar <wishall. firstname.lastname@example.org> or Ligia Buzan <email@example.com>.
We welcome questions, thoughts and suggestions. Post these in the right forums/discussions or write to us at <firstname.lastname@example.org>.
You can always find the latest version of this document and other DSE 400 roadmaps at http://bitly.com/bundles/o_4ldaljhta4/1
Thank You The Analytical Engine has no pretensions whatever to originate anything. It can do whatever we know how to order it to perform. It can follow analysis, but it has no power of anticipating any analytical revelations or truths. Its province is to assist us in making available what we are already acquainted with. - Ada Lovelace