
Data scientist enablement dse 400 week 2 roadmap

Published on March 16, 2014

Author: MohanBavirisetty

Source: slideshare.net

Data Scientist Enablement
DSE 400 - Fast Track to Data Science
Week 2 Roadmap
Advanced Center of Excellence, Modern Renaissance Corporation
In collaboration with the SONO team and others
Content of this document is under Creative Commons Licence CC BY 4.0

Agenda
Week 1 Recap
Week 2 at a Glance
Discussions
Required Reading
Practice
Assignments and Submission
Looking Ahead
References
Citation
Acknowledgement
You can always find the latest version of this document at http://bit.ly/1dVHJwO
A strong will, a settled purpose, and an invincible determination can accomplish almost anything. - Thomas Fuller

DSE 400 - Week 1 Recap
During week 1 you:
Understood what Data Science is and articulated what Data Scientists do on a day-to-day basis
Installed R and R-Studio
Explored the UCI Machine Learning Repository
Imported the Housing dataset into R
Explored SONO and participated in discussions

DSE 400 - Week 2 at a glance
Discussions: The fuss about Big Data, statistical sampling, and an optional Q&A.
Reading plan: Read Chapters 4-7 of An Introduction to Data Science, and R for Machine Learning by Allison Chung.
Activities: Play with spreadsheets, continue your research on data visualization tools, connect with local groups, etc.
Assignment 2: Download the Haberman dataset from the UCI Machine Learning Repository into your R-Studio environment and describe it visually.

Social Engagement on SONO - Week 2
http://getsokno.com/redvinef/controllers/cell.php?user_knocell=1002
Discussion 1: What's all this fuss about Big Data? How would you go beyond talking about the 3 or 4 Vs of Big Data: Volume, Variety, Velocity, and Veracity (veracity means the trustworthiness of the data)? How about Value? Do people talk about it in the context of Big Data? Share your thoughts.
Discussion 2: "Statistics is defined as the discipline of using data samples to support claims about populations." Comments?
These discussions are required and will be posted sequentially. If you have access to SONO, you are encouraged to participate in them. There will also be an optional Q&A. For the sake of simplicity and ease of navigation, please do not create additional threads.

SONO Tweaks
SONO or SOKNO (Social Knowledge platform) was chosen for the DSE program to enable social engagement, collaboration and knowledge dissemination, all of which are important to an open initiative like this. We understand that many of you may initially have trouble navigating it. To ease things, here are some tweaks the SONO team and the DSE community are developing as we speak.
To enter a Knowledge Cell (KC), log in first, then use the full URL to enter the right KC. For week 2, use http://getsokno.com/redvinef/controllers/cell.php?user_knocell=1002
The weekly KCs for DSE 400 Week 1, Week 2 and so on map to knocell numbers 1001, 1002 and so on in these URLs.
Once you are in a KC, click on threads to go to the current discussions.
We appreciate your patience during this transitional phase.
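
The week-to-knocell mapping described above can be sketched in a few lines of Python. This is only an illustration: the function name `knowledge_cell_url` is hypothetical, and the rule "week n maps to 1000 + n" is inferred from the two examples given (1001 and 1002), so later weeks are an assumption.

```python
# Base address taken from the week 2 link in the slide.
BASE_URL = "http://getsokno.com/redvinef/controllers/cell.php"


def knowledge_cell_url(week):
    """Build the Knowledge Cell URL for a given DSE 400 week.

    Assumes week n maps to knocell number 1000 + n, as the
    slide states for weeks 1 and 2 (hypothetical helper).
    """
    if week < 1:
        raise ValueError("week numbers start at 1")
    return f"{BASE_URL}?user_knocell={1000 + week}"


print(knowledge_cell_url(2))
```

You still need to log in before following the generated link.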

Week 2 - Recommended Reading Plan
Read Chapters 4-7 from An Introduction to Data Science
Read R for Machine Learning by Allison Chung (Sections 1 to 3.4, pages 1-5)
<Optional> Introduction to Probability and Statistics Using R (Chapters 1-3)
<Optional> Read Chapters 2-7 from Think Stats: Probability and Statistics for Programmers
If you are unfamiliar with basic statistical concepts or need a quick refresher on this topic, please refer to the Statistics Playlist by Khan Academy.
Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician. - Josh Wills

Activities
<Practice> Given the following dataset, manually find the mean, median, mode, variance and standard deviation for this population: { 3, 15, 17, 18, 20, 20, 12, 20, 20, 16, 17, 12, 4, 7, 15, 20, 12, 6, 1, 20 }. Also try using a spreadsheet (such as Excel or Google Spreadsheet) to find the same measures for the same dataset.
<Practice> Math is Fun. Learn what a Relative Frequency Distribution is. Try the example at the bottom of this page.
<Community Outreach> <Optional> Explore and connect with your local R group (or Data Scientist/Big Data groups) and check out any projects, talks and seminars that might interest you. Also discuss with them how you can engage with them and help them in their endeavors.
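
Once you have worked the practice exercise by hand, a short script is one way to check your answers. The sketch below uses Python's standard `statistics` module rather than the R or spreadsheet tools the course uses, and it treats the 20 values as a complete population (hence `pvariance` and `pstdev` rather than the sample versions). The relative frequency table at the end is one plain reading of "count divided by total"; the variable names are illustrative.

```python
import statistics
from collections import Counter

# The 20 values from the practice exercise, treated as a full population.
data = [3, 15, 17, 18, 20, 20, 12, 20, 20, 16,
        17, 12, 4, 7, 15, 20, 12, 6, 1, 20]

mean = statistics.mean(data)
median = statistics.median(data)        # average of the two middle values
mode = statistics.mode(data)            # most frequent value
variance = statistics.pvariance(data)   # population variance (divide by N)
std_dev = statistics.pstdev(data)       # square root of the population variance

# Relative frequency: how often each value occurs, as a share of all values.
counts = Counter(data)
relative_frequency = {value: counts[value] / len(data)
                      for value in sorted(counts)}

print("mean:", mean)
print("median:", median)
print("mode:", mode)
print("population variance:", variance)
print("population std dev:", round(std_dev, 4))
for value, share in relative_frequency.items():
    print(f"{value:>2}: {share:.2f}")
```

If your spreadsheet gives a slightly larger variance, check whether it used the sample formula (divide by N - 1) instead of the population formula.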

Activities - contd ...
<Optional> If you are not fully happy with the statistical functionality of your usual spreadsheet package, download PSPP, a free statistical analysis tool, from SourceForge and play with it.
<Optional> <Advanced> Import the Housing data into R-Studio and describe it statistically. You may need packages like pastecs, which lets you use the stat.desc function.
<Optional> Register for Big Data in Motion. This is a free online webinar scheduled for Jan 30, 2014, 1 PM EST. Attendance is optional but recommended.
Need more? Reach out to our Research Scholar Ms. Rachel Fleming <Rachel@emodern.biz> and ask for more activities and challenges.

Assignment 2 - Submission Required
Assignment: Download the Haberman Survival dataset from the UCI Machine Learning Repository and import it into R-Studio. Generate three graphic representations: a histogram, a scatter plot and a box plot, as depicted on the original slide. Refer to R for Machine Learning by Allison Chung before you attempt this assignment.
Image credit: R for Machine Learning by Allison Chung
<Help On Demand> You may reach out to our Research Scholar Ms. Rachel Fleming <rachel@emodern.biz> if you have any difficulties with this assignment.
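
The assignment itself is meant to be done in R-Studio, but it can help to see what numbers sit behind the three plots. The Python sketch below is only an illustration under stated assumptions: it assumes the UCI file layout of four comma-separated integer columns with no header row (age, year of operation, positive axillary nodes, survival status), and the rows in `SAMPLE_TEXT` are hypothetical, not records from the real dataset. The function names are illustrative as well.

```python
import csv
import io
import statistics
from collections import Counter

# Column order assumed from the UCI description of the file (no header row).
COLUMNS = ("age", "op_year", "nodes", "status")

# HYPOTHETICAL rows in the same layout, for illustration only.
# They are not records from the real Haberman dataset.
SAMPLE_TEXT = """\
34,60,0,1
41,62,3,1
47,65,11,2
52,59,1,1
58,63,0,1
63,66,7,2
"""


def parse_rows(text):
    """Parse comma-separated integer rows into dicts keyed by COLUMNS."""
    reader = csv.reader(io.StringIO(text))
    return [dict(zip(COLUMNS, map(int, row))) for row in reader if row]


def histogram_counts(values, bin_width):
    """Count values per bin; each key is the lower edge of its bin."""
    bins = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(bins.items()))


def five_number_summary(values):
    """Minimum, three quartiles and maximum: the numbers a box plot draws."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)


rows = parse_rows(SAMPLE_TEXT)
ages = [r["age"] for r in rows]

age_bins = histogram_counts(ages, bin_width=10)          # histogram
age_box = five_number_summary(ages)                      # box plot
scatter_points = [(r["age"], r["nodes"]) for r in rows]  # scatter plot

print(age_bins)
print(age_box)
print(scatter_points)
```

To run this on the real data, replace `SAMPLE_TEXT` with the contents of the file you downloaded. Quartile conventions differ between tools, so the box plot R draws may not match these quartiles exactly.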

Submissions
Deadline: Saturday, 11:59 PM your local time.
Mail Assignment 2 to <datascience400@gmail.com>.
Submit a PDF document containing screenshots of your R-Studio workspace that show the three visualizations discussed.
Use the naming convention DSE 400 > Assignment 2 > Your Full Name for your document.
Do not send links to documents.
Please add DSE 400 > Assignment 2 in the subject line.

DSE 400 - Weeks 3-8 ahead
Weeks 3-4: Intro to Machine Learning (ML) - Classification, Clustering, Prediction, NaiveBayes, Recommendations and Boosting algorithms. Refer to R for Machine Learning by Allison Chung. Watch the Caltech Machine Learning videos on Youtube.
Week 5: Visualizations. Present your research: Data Visualization Tools - A Comparative Study.
Weeks 6-7: Processing large data sets. Hadoop Ecosystem. Stream Computing etc.
Week 8: Ethics, Privacy and Building Data Products.

References, Resources and Additional Reading
An Introduction to Data Science
Think Stats: Probability and Statistics for Programmers
Statistics Playlist by Khan Academy
R for Machine Learning by Allison Chung
Introduction to Hypothesis Testing
Single Sample Hypothesis Testing, Part 1 and Part 2
R for Beginners by Emmanuel Paradis
R - Reference Cards
Introduction to R Playlist (Video Collection) on Youtube
Caltech Machine Learning Playlist on Youtube
[MIT OCW] Prediction: Machine Learning and Statistics from MIT Sloan School of Management

Citation
The dataset titled Haberman's Survival Data used here for Assignment 2 comes from the UCI Machine Learning Repository. Donor of Haberman's Survival Data: Tjen-Sien Lim (limt@stat.wisc.edu). It was added to the UCI Machine Learning Repository on March 4, 1999.
R for Machine Learning by Allison Chung is recommended by the MIT course Prediction: Machine Learning and Statistics from the Sloan School of Management. It is adopted in DSE 400 as per OCW guidelines.
Content that appears as is in this document only is under Creative Commons License CC BY 4.0. This license may not necessarily apply to other material referenced in this document.

For More Information
The presentation deck for DSE 400 > Week 1 Roadmap can be found at http://bit.ly/1hC5wAV
Week 2 discussions take place during this week on SONO DSE 400 Week 2.
<Help On Demand> You may reach out to our Research Scholar Ms. Rachel Fleming <rachel@emodern.biz> if you have any difficulties with the assignments.
We welcome questions, thoughts and suggestions. Post these on SONO in the right forum/discussion or write to us at <datascience400@gmail.com>.
You can always find the latest version of this document at http://bit.ly/1dVHJwO

Fun@Work Geographic distribution of clicks for Week 1 Roadmap

Fun@Work Open Source Humor

Acknowledgement
We thank our community of committed and passionate volunteers, experts, educators, innovators, benefactors, advisers, advocates, mentors and supporters. We are also grateful for the outstanding support and encouragement from the SONO team as well as other organizations such as MIT Sloan School of Management, IBM, HortonWorks, R-Project, Creative Commons, Open Courseware Consortium, Stanford University, Caltech, O’Reilly Publications and Data Science Central.

Thank You
