Data Science Symposium @ NIST


Published on March 5, 2014

Author: rheimann04



Poster session with Max Watson at the inaugural Data Science Symposium @ NIST

A Blended Approach to Big Data Analytics
Richard Heimann & Max Watson, Data Tactics Corporation

A Blended Approach to Data Science and Big Data Analytics

The Blended Approach to Big Data Analytics and Data Science acknowledges that data science and big data analytics are more than algorithm development; they also require deployment. Deployment is traditionally thought of in terms of the machine rather than the user. The utility of an analytic, however, lies in its ultimate use, which is mediated by how it is deployed and presented to users.

Data science requires two elements to deploy successful analytics:

• Hard elements: Objective pattern discovery and the recognition of precise and accurate patterns, free of pattern paradoxes.
• Soft elements: Sensitivity to user experience and workflow, allowing users to subjectively evaluate patterns for their uniqueness, unexpectedness, actionability, and novelty.

Analytics promises wisdom, but has at times delivered trivial patterns, pattern paradoxes, or an overwhelming number of patterns, ultimately confusing users and leaving its potential unfulfilled. A Blended Approach delivers a nontrivial process of identifying valid, novel, potentially useful, and ultimately comprehensible knowledge from big data systems that domain experts can use to support crucial intelligence decisions.

• Nontrivial: Complex computations are required to expose novel insights into big data. This includes analytic pluralism (many algorithms).
• Novel: The discovered patterns should be new to the organization and understood to be meaningful by users.
• Useful: Analytic frameworks aid the development of valid algorithms. Analysts should be able to act on the discovered patterns to make better decisions.
• Comprehensible: New patterns should be understandable and thereby improve understanding.
• Valid: Algorithms need to be designed and developed by competent data scientists to ensure reliability and validity.
The key components of a Blended Approach to Data Science and Big Data Analytics

1) Objective and Subjective Pattern Discovery: A Blended Approach to Analytics

Data science can often generate hundreds, maybe thousands, of patterns. The task of pattern recognition then becomes one of separating the most useful patterns from the trivial ones. One of the most efficient ways to do this is to let users engage more with the algorithms. The challenge is to address the subjective and the objective jointly, in a hybrid model, and to mediate the dichotomy with techniques that reduce the knowledge acquisition bottleneck. The evaluation of hybrid models has to yield good results. Pattern paradox detection lies midway between the subjective and objective measures. The contrast between goal-directed analysis with user enrichment and data-directed analysis is intimately related to the use of subjective and objective measures.

2) Interactive Analytics + Enhanced Visualization = Intelligent Data Analysis; Shiny

Users are given mutable analytical elements and can tweak parameters, refine the method, visualize the effect, and interpret the subsequent changes. Shiny is a web-based presentation layer for the statistical programming language R and enables the sought-after interaction between users and a given analytic. For example:

Figure 1: Discontinuities, a change detection algorithm used to detect breaks, e.g. in social media.
Figure 2: A unique implementation of topic models, coined topic graphs, that treats network analysis and topic discovery jointly.
Figure 3: A density-based cluster analysis used to classify data, e.g. log files and netflow data.
Figure 4: A supervised-by-supervised outlier detection algorithm, coined LUBaP.

How does the Blended Approach fit into the Data Science ecosystem?
No Free Lunch (NFL) for Data Science suggests that an analytic must be designed for a specific type of problem, because any one analytic performs no better than any other when averaged over all possible problem sets. There is no such thing as a general-purpose algorithm across all problems. The lesson is that the elegance of analytics lies, at times, in its inelegance. Overlapping solutions, or a pluralist approach, may therefore fit best within the blended approach.

The blended approach is a mixture of objective and subjective pattern discovery, facilitated by interactive analytics as well as overlapping solutions. An example would be overlapping two solutions to analyze Data D with Analytics A and Analytics B, where A provides insight into smooth patterns, as a summary analytic does, and B offers insight into rough patterns in the data, as outlier detection does. These two methods offer unique insights and may at times validate each other. Users would understand both structural patterns and structural breaks in D. NFL shows us that this may be the best environment for data science.

The Blended Approach unlocks the power of data science. Data science still tends to approach a problem by assuming there is one best way to solve it, ignoring alternative solutions and, most egregiously, ignoring the user. The lesson is to abandon the question "What is the cleverest way to solve the problem?" in favor of "Are there multiple, overlapping ways to solve this problem?"

Data scientists who have massive amounts of data without massive amounts of clue are going to be displaced by data scientists who have less data but more clue.
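The overlap of Analytics A and Analytics B on Data D described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and does not come from the poster: the function names, the sample values in D, the choice of a trailing moving average for A, and the choice of a median-absolute-deviation (MAD) outlier score for B. The poster's own analytics were presented through R and Shiny; Python is used here only to keep the example self-contained.

```python
from statistics import mean, median


def rolling_mean(data, window=3):
    """Analytics A (hypothetical): trailing moving average that summarizes smooth structure."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    for i in range(len(data)):
        start = max(0, i - window + 1)  # shorter window at the start of the series
        smoothed.append(mean(data[start:i + 1]))
    return smoothed


def mad_outliers(data, threshold=3.5):
    """Analytics B (hypothetical): indices whose modified z-score exceeds the threshold."""
    med = median(data)
    mad = median([abs(x - med) for x in data])
    if mad == 0:
        return []  # no spread around the median, so nothing can be scored
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score
    return [i for i, x in enumerate(data)
            if 0.6745 * abs(x - med) / mad > threshold]


def blended_report(data, window=3, threshold=3.5):
    """Run both analytics on the same data and return their results side by side."""
    return {
        "smooth": rolling_mean(data, window),
        "breaks": mad_outliers(data, threshold),
    }


# Illustrative data D: a stable series with one abrupt spike (made-up values)
D = [10, 11, 10, 12, 11, 40, 11, 10, 12, 11]
report = blended_report(D)
print(report["breaks"])  # -> [5]
```

The `window` and `threshold` arguments stand in for the parameters a user might adjust in an interactive front end: raising `threshold` far enough, for example to 30 on this data, makes the spike at index 5 stop being reported, which is the kind of effect a user would want to see and judge for themselves.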


