Published on April 2, 2014
Big Data Analytics, R&D Robert Andrew Stevens, CFA John Deere
Disclaimer: The information, views, and opinions contained in this presentation are those of the author and do not necessarily reflect the views and opinions of John Deere.
Outline = Favorite Quotes 1. “when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind” 2. “it takes all the running you can do, to keep in the same place” 3. “The future is already here – it’s just not evenly distributed” 4. “The essence of strategy is the timing of the sunk cost commitment” 5. “Americans can always be counted on to do the right thing...”
“when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind” “I often say that when you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind; it may be the beginning of knowledge, but you have scarcely, in your thoughts, advanced to the stage of science, whatever the matter may be.” Lecture on “Electrical Units of Measurement” (3 May 1883), published in Popular Lectures Vol. I, p. 73; quoted in Encyclopaedia of Occupational Health and Safety (1998) by Jeanne Mager Stellman, p. 1992 http://en.wikiquote.org/wiki/William_Thomson http://en.wikipedia.org/wiki/Lord_Kelvin William Thomson, 1st Baron Kelvin 1824–1907 a.k.a.: Lord Kelvin Occupation: mathematical physicist and engineer
What is Analytics? Turning Data into Decisions [Figure: Production Viewed as a System. Elements: Suppliers of Materials and Equipment; Receipt and Test of Materials; Production, Assembly, Inspection; Tests of Process, Machines, Methods, Costs; Distribution; Consumers; Consumer Research; Design and Redesign. Source: Deming, W.E., Out of the Crisis, 1986 (p. 4)] Take Action!
The Road to Earlier Discovery and Shorter Decision Cycles
Big Data in R&D at John Deere Primarily machine data: CAN and GPS Volume: immeasurable Velocity: fast and furious Variety: nothing is the same Value: TBD
“it takes all the running you can do, to keep in the same place” The Red Queen's race is an incident that appears in Lewis Carroll's Through the Looking-Glass and involves the Red Queen, a representation of a Queen in chess, and Alice constantly running but remaining in the same spot. “Well, in our country,” said Alice, still panting a little, “you'd generally get to somewhere else – if you run very fast for a long time, as we've been doing.” “A slow sort of country!” said the Queen. “Now, here, you see, it takes all the running you can do, to keep in the same place. If you want to get somewhere else, you must run at least twice as fast as that!” http://en.wikipedia.org/wiki/Red_Queen's_race http://en.wikipedia.org/wiki/Lewis_Carroll Charles Lutwidge Dodgson 1832–1898 Pen name: Lewis Carroll Occupation: Writer, mathematician, Anglican cleric, photographer, artist
The Problem/Opportunity Data generated Data analyzed Data captured and stored [Remember: DIKW = Data → Information → Knowledge → Wisdom?]
Ideally, if nothing changes… Today Transition Vision
But the data generated might grow faster than we can manage [Ever hear of “The Internet of Things”?] Today Transition Vision
So, maybe we should try to do something like this… [“If you want to get somewhere else, you must run at least twice as fast as that!”] Today Transition Vision
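The Today / Transition / Vision slides above can be read as a race between two growth rates. The following is a minimal sketch of that reading, not anything taken from the slides: the function name `analyzed_share`, the 10% starting share, and all growth rates are hypothetical values chosen only for illustration.

```python
def analyzed_share(start_share, data_growth, capacity_growth, years):
    """Yearly share of generated data that gets analyzed, capped at 100%.

    All inputs are illustrative assumptions: growth rates are fractions
    per year (0.40 means 40% per year).
    """
    shares = [start_share]
    for _ in range(years):
        # The share changes by the ratio of the two growth factors.
        ratio = (1 + capacity_growth) / (1 + data_growth)
        shares.append(min(1.0, shares[-1] * ratio))
    return shares


# Hypothetical scenarios: data generated grows 40% per year in all three.
scenarios = {
    "capacity grows slower (20%)": 0.20,
    "capacity keeps pace (40%)": 0.40,
    "capacity grows twice as fast (80%)": 0.80,
}
for label, capacity_growth in scenarios.items():
    shares = analyzed_share(0.10, 0.40, capacity_growth, years=5)
    print(label, [round(s, 3) for s in shares])
```

Under these assumptions, keeping pace only holds the analyzed share where it started, which is the Red Queen's point; the share grows only when analysis capacity grows faster than the data generated.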
A Solution: Data Science • Applies everywhere • Practical/feasible? • In R&D? http://www.dataists.com/2010/09/the-data-science-venn-diagram
Data Science in R&D 1. Multidisciplinary Investigations (25%) 2. Models and Methods for Data (20%) 3. Computing with Data (15%) 4. Pedagogy (15%) 5. Tool Evaluation (5%) 6. Theory (20%) W. S. Cleveland (2001). Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics. ISI Review, 69, 21-26. http://www.stat.purdue.edu/~wsc/papers/datascience.pdf
“The future is already here – it’s just not evenly distributed” William Gibson, quoted in The Economist, December 4, 2003 http://www.economist.com/printedition/2003-12-06 http://en.wikipedia.org/wiki/William_Gibson William Gibson 1948–
CERN: Solving the Mysteries of the Universe with Big Data The Large Hadron Collider Computing Challenge • Data volume – High rate large number of channels 4 experiments – 15 PetaBytes of new data each year 30 PB in 2013 • Overall compute power – Event complexity Nb. events thousands users http://openlab.web.cern.ch/sites/openlab.web.cern.ch/files/presentations/Jarp_Big_Data_Boston_final.pdf (09/12/13)
The Scientific Method 1. Formulation of a question 2. Hypothesis 3. Prediction 4. Testing 5. Analysis http://en.wikipedia.org/wiki/Scientific_method An 18th-century depiction of early experimentation in the field of chemistry
“The essence of strategy is the timing of the sunk cost commitment” Verbal communication during UIUC MBA Strategic Management class http://www.amazon.com/Economic-Foundations-Strategy-Organizational-Science/dp/1412905435 http://business.illinois.edu/facultyprofile/faculty_profile.aspx?ID=99 Joseph T. Mahoney 1958– Professor of Business Administration and Caterpillar Chair of Business, University of Illinois at Urbana-Champaign
What happens to Q as P → 0? • Change “Household” to “Firm” • Change “chocolate” to “software” • Now what happens to Q as P → 0? • How could that happen in a Big Data Analytics, R&D context? http://catalog.flatworldknowledge.com/bookhub/reader/2992?e=coopermicro-ch07_s01 Figure 7.1 The Demand Curve of an Individual Household
The One-Day MBA http://www.engineeringtoolbox.com/cash-flow-diagrams-d_1231.html http://en.wikipedia.org/wiki/Net_present_value F0 = Sunk cost investment • Assuming Ft does not decrease* for t > 0, what happens to NPV as F0 → 0? • How could that happen in a Big Data Analytics, R&D context? • What are the implications for strategy?
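The NPV question above can be worked through numerically. This is a minimal sketch using the standard net present value formula, NPV = sum of Ft / (1 + r)^t over t = 0, 1, 2, ...; the discount rate, the cash flows, and the sign convention (F0 entered as a negative outflow) are all assumptions made for illustration, not figures from the presentation.

```python
def npv(rate, flows):
    """Net present value of cash flows F0, F1, ... discounted at `rate` per period."""
    return sum(f / (1 + rate) ** t for t, f in enumerate(flows))


# Hypothetical inputs: three equal future cash flows and a 10% discount rate.
rate = 0.10
future_flows = [40.0, 40.0, 40.0]  # F1..F3, held fixed while F0 varies

# Shrink the up-front (sunk cost) investment F0 toward zero.
for f0 in (-100.0, -50.0, -10.0, 0.0):
    print(f"F0 = {f0:7.1f}  NPV = {npv(rate, [f0] + future_flows):7.2f}")
```

With the future flows held fixed, NPV rises one-for-one as the up-front investment shrinks, and at F0 = 0 it equals the present value of the future flows alone.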
Avoid Sunk Cost Commitments and Vendor Lock-in with Open Source • Apache: http://www.apache.org/ – Hadoop, Hive, Mahout, Pig, Spark… • GRASS GIS: http://grass.osgeo.org/ • Java: http://www.java.com/ + Cassandra • Julia: http://julialang.org/ • Perl: http://www.perl.org/ • Python: http://www.python.org/ • R: http://cran.us.r-project.org/ + RHIPE • Scala: http://scala-lang.org/ + Scalding • SQL: – http://www.mysql.com/ – http://www.postgresql.org/ + PostGIS
“Americans can always be counted on to do the right thing...” “...after they have exhausted all other possibilities.” Also famous for: “We shall never surrender” “peace in our time” (a phrase usually attributed to Neville Chamberlain) And many others relevant to The War on Data http://www.quotedb.com/quotes/2313 https://en.wikipedia.org/wiki/Winston_churchill Sir Winston Churchill 1874–1965 Profession: Member of Parliament, statesman, soldier, journalist, historian, author, painter
Tips for winning The War on Data Teamwork Statistics Partner with IT Learn-Do-Teach Replenish your toolbox Math
Pop Quiz What are the 3 most important things in Real Estate? 1. Location 2. Location 3. Location What are the 3 most important things in Statistics? 1. Look at the data 2. Look at the data 3. Look at the data … especially for Big Data Analytics: 1. Look at the data before you analyze it: Exploratory Data Analysis (EDA) 2. Look at the data while you analyze it: model diagnostics 3. Look at the data after you analyze it: visualization and communication
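As a small illustration of "look at the data before you analyze it", the sketch below uses only the Python standard library. The readings are made up, and the `9999.0` value is a hypothetical sentinel for a missing measurement; the helper names `summarize` and `text_histogram` are likewise invented for this example.

```python
import statistics


def summarize(values):
    """Basic summary statistics for a list of numbers."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }


def text_histogram(values, bins=5, width=40):
    """Crude text histogram: one line per equal-width bin."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all values are equal
    counts = [0] * bins
    for v in values:
        index = min(int((v - lo) / span * bins), bins - 1)
        counts[index] += 1
    peak = max(counts)
    lines = []
    for i, count in enumerate(counts):
        left_edge = lo + span * i / bins
        bar = "#" * round(width * count / peak)
        lines.append(f"{left_edge:10.1f} | {bar} ({count})")
    return "\n".join(lines)


# Hypothetical sensor readings; 9999.0 stands in for a missing-value code.
readings = [71.8, 72.1, 72.4, 71.9, 72.0, 72.3, 9999.0, 72.2]

print(summarize(readings))
print(text_histogram(readings))
```

The mean alone would suggest readings in the thousands, while the median and the histogram show that a single sentinel value is responsible, which is the kind of problem a quick look at the data catches before any modeling starts.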
Other Survival Tips • Visualization and Communication – Tools: R & Rmd, GGobi, Tableau, ArcGIS/GRASS… – Presentations: Tell them 3X, 5Ws • Collaboration: working as a team – File and code version control – Google's R Style Guide • Reproducible Research best practices – Avoid errors by Potti (Duke) and Rogoff & Reinhart (Harvard) • http://en.wikipedia.org/wiki/Anil_Potti • http://en.wikipedia.org/wiki/Reinhart-Rogoff
Summary = Favorite Quotes 1. “when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind” 2. “it takes all the running you can do, to keep in the same place” 3. “The future is already here – it's just not evenly distributed” 4. “The essence of strategy is the timing of the sunk cost commitment” 5. “Americans can always be counted on to do the right thing...” “Those who cannot remember the past are condemned to repeat it.” – George Santayana
Q & A
Contact Information E-mail: firstname.lastname@example.org (business) email@example.com (personal) LinkedIn: http://www.linkedin.com/pub/robert-andrew-stevens-cfa/6a/a04/315 Twitter: https://twitter.com/RobertAndrewSt3 GitHub: https://github.com/robertandrewstevens
2014 Business Analytics Symposium. Loras College held its 2014 Business Analytics Symposium on March 26-27 in Dubuque, ...