Barga Data Science lecture 1

Information about Barga Data Science lecture 1

Published on April 24, 2016

Author: rsbarga

Source: slideshare.net

1. Deriving Knowledge from Data at Scale

6. Deriving Knowledge from Data at Scale Will

8. Deriving Knowledge from Data at Scale Will Not

9. Deriving Knowledge from Data at Scale rsbarga@gmail.com

10. Deriving Knowledge from Data at Scale this is important… again important…

12. Deriving Knowledge from Data at Scale relative scale Profile Yourself, upload to dropbox for Lecture 1 in PDF or Word

13. Deriving Knowledge from Data at Scale What kind of things does a data scientist do?...

14. Deriving Knowledge from Data at Scale Dilbert Jan 5, 2000 Define “Data Scientist”

15. Deriving Knowledge from Data at Scale
“By definition all scientists are data scientists. In my opinion, they are half hacker, half analyst, they use data to build products and find insights. It’s Columbus meets Columbo – starry eyed explorers and skeptical detectives.” Monica Rogati (LinkedIn)
[Chart: Search Trends for “Data Scientist”]
“A data scientist is someone who can obtain, scrub, explore, model and interpret data, blending hacking, statistics and machine learning. Data scientists not only are adept at working with data, but appreciate data itself as a first-class product.” Hilary Mason (Bit.ly)

16. Deriving Knowledge from Data at Scale Computer Science

17. Deriving Knowledge from Data at Scale 65% of enterprises feel they have a strategic shortage of data scientists, a role many did not even know existed 12 months ago…

26. Deriving Knowledge from Data at Scale 10 Important Ideas: each will be a topic of at least one lecture

27. Deriving Knowledge from Data at Scale #1 Interdisciplinary Data Science critical component of your success going forward

28. Deriving Knowledge from Data at Scale #2 Democratization of Machine and Statistical Learning Algorithms using the algorithms understand their meaning and potential impact

29. Deriving Knowledge from Data at Scale #3 Build a solid foundation of good coding practices

30. Deriving Knowledge from Data at Scale #4 Data Strategy thinking in terms of a data strategy is a useful paradigm

31. Deriving Knowledge from Data at Scale #5 Little Data

32. Deriving Knowledge from Data at Scale #6 The Space between the Data Set and the Algorithm

33. Deriving Knowledge from Data at Scale #7 Being Human

34. Deriving Knowledge from Data at Scale #8 Causation or Causality, Correlation and Experiments

35. Deriving Knowledge from Data at Scale #9 Feedback Loop

36. Deriving Knowledge from Data at Scale #10 Causing the Future Prediction Causation not only capable of Predicting the Future, but also of Causing the Future

43. Deriving Knowledge from Data at Scale My perspective…

44. Deriving Knowledge from Data at Scale [Diagram: process cycle with steps 1–5, labeled Building Predictive Models and Business Insights] Note: This is a variant of the Cross-Industry Standard Process for Data Mining (CRISP-DM)

45. Deriving Knowledge from Data at Scale My Process Model

46. Deriving Knowledge from Data at Scale
Stages: Define Objective; Access and Understand the Data; Pre-processing; Feature and/or Target construction
1. Define the objective and quantify it with a metric, optionally with constraints. This typically requires domain knowledge.
2. Collect and understand the data, and deal with the vagaries and biases in data acquisition (missing data, outliers due to errors in the data collection process, more sophisticated biases due to the data collection procedure, etc.).
3. Frame the problem as a machine learning problem (classification, regression, ranking, clustering, forecasting, outlier detection, etc.). Some combination of domain knowledge and ML knowledge is useful.
4. Transform the raw data into a “modeling dataset” with features, weights, targets, etc., which can be used for modeling. Feature construction can often be improved with domain knowledge. The target must be identical to (or a very good proxy for) the quantitative metric identified in step 1.

47. Deriving Knowledge from Data at Scale
Stages: Feature selection; Model training; Model scoring; Evaluation; Train/Test split
5. Train, test and evaluate, taking care to control bias/variance and to report the metrics with the right confidence intervals (cross-validation helps here). Be vigilant against target leaks, which typically lead to unbelievably good test metrics. This is the ML-heavy step.

48. Deriving Knowledge from Data at Scale
Stages: Define Objective; Access and Understand the Data; Pre-processing; Feature and/or Target construction; Feature selection; Model training; Model scoring; Evaluation; Train/Test split
6. Iterate steps (2) – (5) until the test metrics are satisfactory.
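
Step 5 of the process above (train, test, evaluate, report metrics with confidence intervals) can be sketched roughly as follows. This is only an illustration, not the lecture's code: the synthetic one-feature dataset, the toy nearest-centroid "model", and the helper names (`k_fold_indices`, `fit_centroids`, `predict`, `cross_validate`) are all assumptions made for this example, and the interval is a rough normal approximation over the fold scores.

```python
import random
import statistics


def k_fold_indices(n, k, rng):
    """Shuffle 0..n-1 and deal the indices into k nearly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_centroids(xs, ys):
    """'Train' a toy model: the mean feature value of each class."""
    return {
        label: statistics.mean(x for x, y in zip(xs, ys) if y == label)
        for label in set(ys)
    }


def predict(centroids, x):
    """Predict the class whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def cross_validate(xs, ys, k=5, seed=0):
    """Return (mean accuracy, rough 95% half-width, per-fold accuracies)."""
    folds = k_fold_indices(len(xs), k, random.Random(seed))
    scores = []
    for held_out in folds:
        test = set(held_out)
        # Train only on rows outside the held-out fold.
        train_x = [x for i, x in enumerate(xs) if i not in test]
        train_y = [y for i, y in enumerate(ys) if i not in test]
        model = fit_centroids(train_x, train_y)
        correct = sum(predict(model, xs[i]) == ys[i] for i in held_out)
        scores.append(correct / len(held_out))
    mean = statistics.mean(scores)
    # Normal approximation using the spread of the fold scores.
    half_width = 1.96 * statistics.stdev(scores) / len(scores) ** 0.5
    return mean, half_width, scores


# Synthetic, well-separated data: class 0 near 0, class 1 near 5.
data_rng = random.Random(42)
xs = [data_rng.gauss(0, 1) for _ in range(100)]
xs += [data_rng.gauss(5, 1) for _ in range(100)]
ys = [0] * 100 + [1] * 100

mean_acc, half_width, fold_scores = cross_validate(xs, ys)
print(f"accuracy = {mean_acc:.3f} +/- {half_width:.3f} over {len(fold_scores)} folds")
```

Because the model is fitted only on the training folds, the held-out score is not contaminated by the rows it is evaluated on. A target leak, by contrast, would show up as a feature that already encodes the label and pushes the score suspiciously close to perfect.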

49. Deriving Knowledge from Data at Scale Access Data Pre-processing Feature construction Model scoring

50. Deriving Knowledge from Data at Scale Machine Learning Lectures on Top Techniques

51. Deriving Knowledge from Data at Scale Out of Class Reading Week One

52. Deriving Knowledge from Data at Scale Break, 10 minutes…

58. Deriving Knowledge from Data at Scale in favor of more information beats better algorithms
2. You will write data manipulation algorithms
• Data is surprising enough, need algorithm certainty
• Defect count is proportional to line count
• Use as high level a language as possible

62. Deriving Knowledge from Data at Scale 3. Latter case: get first 80% and move on to new problem

66. Deriving Knowledge from Data at Scale
2. Don’t require a large data set before starting analysis.
3. Always try things out on small portions of data first.

67. Deriving Knowledge from Data at Scale
1. Immediate zone: less than 60 seconds • 100s per day
2. Bathroom break zone: less than 5 minutes • 10s per day
3. Lunch zone: less than an hour • 5 per day
4. Overnight zone: less than 12 hours • 1 per day

68. Deriving Knowledge from Data at Scale Fast

69. Deriving Knowledge from Data at Scale Slow

72. Deriving Knowledge from Data at Scale Stay in the immediate zone.
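
The turnaround zones listed on the earlier slide can be written down as a small lookup, which makes it easy to check where an analysis step falls before running it on the full data. This is a sketch only: the function name `zone_for`, the reading of "per day" as iterations per day, and the "beyond overnight" fallback are assumptions made for this example and do not come from the slides.

```python
# Upper bound in seconds, zone name, and iterations per day as given on the slide.
ZONES = [
    (60, "immediate", "100s per day"),
    (5 * 60, "bathroom break", "10s per day"),
    (60 * 60, "lunch", "5 per day"),
    (12 * 60 * 60, "overnight", "1 per day"),
]


def zone_for(seconds):
    """Return (zone name, iterations per day) for a run time in seconds."""
    for limit, name, iterations in ZONES:
        if seconds < limit:
            return name, iterations
    # Not on the slide: a fallback label for runs of 12 hours or more.
    return "beyond overnight", "less than 1 per day"


for run_time in (20, 4 * 60, 45 * 60, 8 * 60 * 60):
    name, iterations = zone_for(run_time)
    print(f"{run_time:>6} s -> {name} zone ({iterations})")
```

Timing a step on a small portion of the data and passing the measured seconds to `zone_for` gives a quick check on whether the work is still in the immediate zone.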

77. Deriving Knowledge from Data at Scale Break, 10 minutes…

78. Deriving Knowledge from Data at Scale Causal Analysis in Online Display Advertising Dilbert

79. Deriving Knowledge from Data at Scale The Life of a Browser Process.
1. Observe people taking actions and visiting content
2. Use observed data to build list of prospects
3. Subsequently observe same browser surfing the web the next day
4. Browser visits a site where a display ad spot exists and bid requests are made
5. Auction is held for display spot
6. If auction is won, display the ad
7. Observe browser’s actions after displaying the ad

80. Deriving Knowledge from Data at Scale What Do Advertisers Want? Conversions?
[Chart: Conversion Rates (saw ad). Y-axis: conversion rate, 0% to 14%. Bars: Retargeting and M6D Prospecting for Telecom Company A, Telecom Company B, Telecom Company C]
Three different telecoms. Raw conversion is deceiving (connecting data to business value). What is the effectiveness of the ad?

81. Deriving Knowledge from Data at Scale What Do Advertisers Want? Conversions?
[Chart: Relative Lift: Exposed vs. Unexposed Users. Y-axis: conversion rate, 0% to 14%. Series: did not see ad, saw ad. Bars: Retargeting and M6D Prospecting for Telecom Company A, Telecom Company B, Telecom Company C. Lift labels, in the order shown: 1.05X, 2.62X, 1.11X, 1.31X, 0.92X, 2.26X]
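
Relative lift, as used in the chart above, is the conversion rate of exposed users divided by the conversion rate of unexposed users. A minimal sketch, assuming hypothetical user and conversion counts (none of the numbers below come from the slides) and illustrative function names:

```python
def conversion_rate(conversions, users):
    """Fraction of users who converted."""
    if users <= 0:
        raise ValueError("users must be positive")
    return conversions / users


def relative_lift(exposed_rate, unexposed_rate):
    """Ratio of exposed to unexposed conversion rates."""
    if unexposed_rate <= 0:
        raise ValueError("unexposed rate must be positive")
    return exposed_rate / unexposed_rate


# Hypothetical counts, for illustration only.
exposed = conversion_rate(conversions=131, users=10_000)
unexposed = conversion_rate(conversions=100, users=10_000)
lift = relative_lift(exposed, unexposed)
print(f"exposed {exposed:.2%}, unexposed {unexposed:.2%}, lift {lift:.2f}x")
```

A lift below 1 means exposed users converted less often than unexposed users, so a high raw conversion rate on its own says little about what the ad contributed.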

82. Deriving Knowledge from Data at Scale
What is the causal effect of display advertising on customer conversion?
• Display advertising: showing/not showing a browser a display ad.
• Customer conversion: visiting the advertiser’s website in the next 5 days.

83. Deriving Knowledge from Data at Scale
1. Ask the right question
2. Understand/express the causal process
3. Translate question into a formal quantity
4. Try to estimate it

84. Deriving Knowledge from Data at Scale 1. State question.
What is the effect of display advertising on customer conversion?
• Display advertising: showing/not showing a browser a display ad.
• Customer conversion: visiting the advertiser’s website in the next 5 days.

85. Deriving Knowledge from Data at Scale 2. Express causal process.
O = (W, A, Y) ~ P0
• W – Baseline Variables
• A – Binary Treatment (Ad)
• Y – Binary Outcome (Purchase)

86. Deriving Knowledge from Data at Scale Data Structure: Our Viewers.
• CHARACTERISTICS (W): Color, Sex, Head Shape
• TREATMENT (A): Ad, No Ad
• CONVERSION (Y): No, Yes

87. Deriving Knowledge from Data at Scale 3. Define quantity.
• Additive Impact: $E[Y_{A=\text{ad}}] - E[Y_{A=\text{no ad}}]$
• Relative Impact: $E[Y_{A=\text{ad}}] / E[Y_{A=\text{no ad}}]$

88. Deriving Knowledge from Data at Scale 4. Estimate quantity.
1. A/B testing
2. Modeling Observational Data

89. Deriving Knowledge from Data at Scale Hard to get right…
Since we cannot both treat and not treat the SAME individuals, randomization is used to create “EQUIVALENT” groups to treat and not treat.
[Figure values: 3.4 per 1,000; 1.6 per 1,000]
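
With randomized groups, the additive and relative impact can be estimated directly from the two conversion rates. The sketch below is illustrative only. The counts are hypothetical, chosen so that the rates equal 3.4 and 1.6 per 1,000; treating the higher rate as the group that saw the ad is an assumption, as are the group sizes and the use of a Wald (normal approximation) interval for the difference.

```python
import math


def ab_test_summary(conv_treated, n_treated, conv_control, n_control):
    """Additive impact, relative impact, and a 95% Wald interval for the difference."""
    p1 = conv_treated / n_treated
    p0 = conv_control / n_control
    additive = p1 - p0
    relative = p1 / p0
    # Standard error of a difference between two independent proportions.
    se = math.sqrt(p1 * (1 - p1) / n_treated + p0 * (1 - p0) / n_control)
    interval = (additive - 1.96 * se, additive + 1.96 * se)
    return additive, relative, interval


# Hypothetical counts: 34 of 10,000 treated and 16 of 10,000 control users convert.
additive, relative, (low, high) = ab_test_summary(34, 10_000, 16, 10_000)
print(f"additive impact: {additive * 1000:.1f} per 1,000")
print(f"relative impact: {relative:.3f}x")
print(f"95% interval for the difference: {low * 1000:.2f} to {high * 1000:.2f} per 1,000")
```

Conversion events this rare need large groups before the interval excludes zero, which is one reason an A/B test takes time to evaluate.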

90. Deriving Knowledge from Data at Scale
1. Cost of displaying PSAs to the control (untreated) group.
2. Overhead cost of implementing A/B test and ensuring that it is done CORRECTLY.
3. Wait time necessary to evaluate the results.
4. No way to analyze past or completed campaigns.

91. Deriving Knowledge from Data at Scale Estimate The Effects in the Natural Environment (Observed Data)
Use the results of a normal campaign. Red people don’t convert, so they are unlikely to see the ad. Blue and Grey people with round heads are good converters, so they are more likely to see advertisements. So we have a bias in the presentation and hence in the results.

92. Deriving Knowledge from Data at Scale “Need to adjust for the fact that the group that saw the advertisement and the group that didn’t may be very different.”
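
One simple way to make such an adjustment, when the characteristic W is a single discrete variable, is to compare exposed and unexposed users within each level of W and then average those differences over the distribution of W (stratification, also called standardization). The sketch below uses entirely hypothetical counts built so that "red" users rarely see the ad and rarely convert, while "blue" users usually see it and convert more often. The function names are illustrative, and the result is only valid if W captures all confounding and every stratum contains both exposed and unexposed users.

```python
# Hypothetical counts: (characteristic W, saw ad A) -> (users, conversions).
COUNTS = {
    ("red", True): (100, 2),
    ("red", False): (900, 9),
    ("blue", True): (900, 108),
    ("blue", False): (100, 10),
}


def rate(w, a):
    """Observed conversion rate among users with W=w and A=a."""
    users, conversions = COUNTS[(w, a)]
    return conversions / users


def unadjusted_difference():
    """Naive comparison: pooled exposed rate minus pooled unexposed rate."""
    def pooled(a):
        users = sum(n for (_, seen), (n, _) in COUNTS.items() if seen == a)
        conversions = sum(c for (_, seen), (_, c) in COUNTS.items() if seen == a)
        return conversions / users
    return pooled(True) - pooled(False)


def adjusted_difference():
    """Average the within-stratum difference over the distribution of W."""
    total = sum(n for n, _ in COUNTS.values())
    strata = {w for w, _ in COUNTS}
    difference = 0.0
    for w in strata:
        p_w = (COUNTS[(w, True)][0] + COUNTS[(w, False)][0]) / total
        difference += p_w * (rate(w, True) - rate(w, False))
    return difference


print(f"unadjusted additive impact: {unadjusted_difference():.3f}")
print(f"adjusted additive impact:   {adjusted_difference():.3f}")
```

In this constructed example most of the naive difference comes from who was shown the ad rather than from the ad itself, which is the bias the slide describes.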

93. Deriving Knowledge from Data at Scale
1. When can we estimate it? Necessary conditions:
• no unmeasured confounding (need to account for all)
• experimental variability/positivity (present to all groups)
2. Be VERY careful with data collection
• Define cohorts and follow them over time
3. Estimation techniques
• Unadjusted
• Adjust through gA
• MLE (maximum likelihood estimation) estimate of QY
• Double robust, combining gA and QY
• TMLE (targeted maximum likelihood estimation)
4. Many tools exist for estimating binary conditional distributions
• Logistic regression, SVM, GAM, Regression Trees, etc.
Notation: QW = P(W), gA = P(A|W), QY = P(Y|A,W). Two are conditional probabilities…
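
"Adjust through gA" can be illustrated with inverse probability of treatment weighting: each observed outcome is weighted by one over the estimated probability that the user received the treatment they actually got, given W. The sketch below is an assumption-laden illustration, not the lecture's implementation. The records are hypothetical, gA is estimated by simple stratum frequencies rather than by logistic regression or another tool from the list above, and the function names are made up for this example.

```python
from collections import Counter


def make_records():
    """Expand hypothetical counts into one (w, a, y) record per user."""
    counts = {
        ("red", 1): (100, 2),
        ("red", 0): (900, 9),
        ("blue", 1): (900, 108),
        ("blue", 0): (100, 10),
    }
    records = []
    for (w, a), (users, conversions) in counts.items():
        records += [(w, a, 1)] * conversions
        records += [(w, a, 0)] * (users - conversions)
    return records


def estimate_g(records):
    """g(a | w): empirical probability of treatment a within stratum w."""
    by_w = Counter(w for w, _, _ in records)
    by_wa = Counter((w, a) for w, a, _ in records)
    return {(w, a): by_wa[(w, a)] / by_w[w] for w in by_w for a in (0, 1)}


def iptw_mean(records, g, a):
    """Weighted estimate of the mean outcome had everyone received treatment a."""
    for (w, arm), prob in g.items():
        if arm == a and prob == 0:
            raise ValueError(f"positivity violated: nobody with W={w} has A={a}")
    total = sum(y / g[(w, a)] for w, arm, y in records if arm == a)
    return total / len(records)


records = make_records()
g = estimate_g(records)
mean_ad = iptw_mean(records, g, 1)
mean_no_ad = iptw_mean(records, g, 0)
print(f"additive impact: {mean_ad - mean_no_ad:.3f}")
print(f"relative impact: {mean_ad / mean_no_ad:.3f}")
```

The explicit positivity check mirrors the second necessary condition on the slide: if some group never receives a treatment, no amount of reweighting can recover what would have happened to it.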

95. Deriving Knowledge from Data at Scale That’s all for tonight….
