Exploratory Data Analysis

Information about Exploratory Data Analysis

Published on February 15, 2014

Author: thinrhino



Talk given by me at GNUnify 2014 on Exploratory Data Analysis

Exploratory Data Analysis
Aditya Laghate
Twitter: @thinrhino

Who am I?
• A pseudo geek
• Freelance software consultant
• Wildlife photographer

Agenda
• Data gathering
• Data cleaning
• Usage of classic Unix tools
• Data analysis

Data Gathering
• Public data websites
• Social websites
• Blogs, websites, etc. via scraping

Data cleaning
• E.g. OpenRefine
  o OpenRefine (formerly Google Refine) is a powerful tool for working with messy data: cleaning it, transforming it from one format into another, extending it with web services, and linking it to databases like Freebase.

Classic Unix Tools
• sed / awk
• Shell scripts
• GNU parallel
  o Examples:
  o cat rands20M.txt | awk '{s+=$1} END {print s}'
  o cat rands20M.txt | parallel --pipe awk '{s+=$1}END{print s}' | awk '{s+=$1} END {print s}'
  o wc -l bigfile.txt
  o cat bigfile.txt | parallel --pipe wc -l | awk '{s+=$1} END {print s}'
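
The parallel examples above rely on one idea: the input is cut into chunks, each chunk is reduced to a partial result, and a final awk adds the partial results together. A minimal sketch of that idea follows. It is an illustration under stated assumptions, not the commands from the talk: it uses split instead of GNU parallel so that it runs where parallel is not installed, it processes the chunks one after another rather than concurrently, and the file nums.txt (the integers 1 to 1000) is a hypothetical stand-in for rands20M.txt.

```shell
#!/bin/sh
# Hypothetical small input standing in for rands20M.txt: the integers 1..1000.
tmpdir=$(mktemp -d)
awk 'BEGIN { for (i = 1; i <= 1000; i++) print i }' > "$tmpdir/nums.txt"

# Single-pass sum over the whole file, as in the first example on the slide.
total_single=$(awk '{s+=$1} END {print s}' "$tmpdir/nums.txt")

# Chunked sum: split the file into 100-line pieces, sum each piece
# separately, then add the partial sums with a second awk.
# (split stands in for "parallel --pipe" here; chunks run sequentially.)
split -l 100 "$tmpdir/nums.txt" "$tmpdir/chunk_"
total_chunked=$(
    for f in "$tmpdir"/chunk_*; do
        awk '{s+=$1} END {print s}' "$f"
    done | awk '{s+=$1} END {print s}'
)

echo "single:  $total_single"
echo "chunked: $total_chunked"

# Remove the temporary files.
rm -rf "$tmpdir"
```

Both totals should agree, because addition gives the same result however the numbers are grouped. The line-count example works the same way: each chunk reports its own count and the final awk adds the counts.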

Data Analysis

Questions
@thinrhino

