Ychao 20140720

Information about Ychao 20140720

Published on July 20, 2014

Author: yuanchao

Source: slideshare.net

Description

Playing with LHC open data in Python: taking on the ATLAS Higgs Machine Learning Challenge
COSCUP 2014

Python can play with LHC data too: learn to take on the Higgs ML challenge with Python. Yuan CHAO (趙元) (National Taiwan University, Taipei, Taiwan), COSCUP 2014/07/19-20

Who am I? Yuan CHAO (John), YChao ...

Researcher in high-energy physics, using OSS for research ...

A large hadron collider? The Large Hadron Collider!

Organisation Européenne pour la Recherche Nucléaire (CERN), Switzerland

Meyrin, Canton de Genève, on the CH/FR border

LHC circumference: 27 km, 50 to 150 m underground

LHC: birthplace of the WWW!!! At night SERN is pitch dark, and there is no 24-hour supermarket ...

LHC: CERN Ski Club

LHC (the land above is farmland ... it can't be expropriated)

ATLAS Detector: A Toroidal LHC Apparatus, a general-purpose detector

CMS Detector: Compact Muon Solenoid, a general-purpose detector, 3.8 T

Learning to discover: a challenge from ATLAS https://www.kaggle.com/c/higgs-boson http://higgsml.lal.in2p3.fr/files/2014/04/documentation_v1.5.pdf

Searching for the Higgs boson: ATLAS Higgs ML Challenge https://www.kaggle.com/c/higgs-boson, $13,000 & 876 teams

Learning to discover: a challenge from ATLAS. It provides 250,000 simulated training events and 550,000 test events. https://www.kaggle.com/c/higgs-boson/leaderboard

I heard one impatient colleague rushed off to submit entries the very next day ... https://www.kaggle.com/c/higgs-boson/leaderboard

What is the Higgs boson? Symmetry breaking?? The origin of mass??? http://en.wikipedia.org/wiki/Higgs_boson

Big News on 2012/07/04: Discovery of a New Boson with Mass ~125 GeV (CERN-HI-1207136_92)

Congrats to Prof. Englert and Higgs!

What is the Higgs boson? BBHNNK? Caught a wild P. Higgs! http://en.wikipedia.org/wiki/Higgs_boson BEGHHK? P. Higgs is ASGC Prof. C.-C. Lin's advisor.

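Introduction to the Standard Model. Scale of the universe (http://htwins.net/scale2/): from ~10^-1 m down to ~10^-18 m, where 1 nanometer = 10^-9 m. Matter: quarks and leptons. Forces and their carriers: strong force (gluon), electromagnetic force (photon), weak force (W/Z bosons), gravity (graviton).

Introduction to the Standard Model (http://atlas.kek.jp/sub/photos/Physics/PhotoPhysicsSM.htm): hadrons (made of quarks, which cannot exist in isolation), leptons, and mediator particles. The "God-damned" particle! (pingooo@FNAL)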

That's it for physics today ... the focus is on how to play with the data

Machine learning? What & Why?

How do you train a machine? Supervised vs. Unsupervised Learning

Supervised Learning
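Supervised learning means fitting a model to examples whose answers (labels) are already known, then using it on new examples. As a minimal sketch, here is a nearest-centroid classifier, chosen only for brevity (it is not one of the methods used in the talk), run on made-up toy data:

```python
from math import dist
from statistics import fmean


def train_centroids(samples, labels):
    """'Training': average the feature vectors of each labelled class."""
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    return {y: tuple(fmean(col) for col in zip(*xs)) for y, xs in by_class.items()}


def predict(centroids, x):
    """Assign x to the class whose centroid is nearest (Euclidean distance)."""
    return min(centroids, key=lambda y: dist(centroids[y], x))


# Made-up toy examples: two features each, labelled "s" (signal) or "b" (background).
train_x = [(1.0, 1.2), (0.8, 1.0), (3.0, 3.1), (3.2, 2.9)]
train_y = ["s", "s", "b", "b"]
model = train_centroids(train_x, train_y)
print(predict(model, (0.9, 1.1)), predict(model, (3.1, 3.0)))
```

The labels do all the work here: without `train_y` there would be nothing to average per class.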

Supervised Learning: voice-bank processing for 徵音梅林. Vowel detection: N U E O I A (example phrases: mei-ka-keng-ken-lian, zhun-xi-lai-sou-pian)

Unsupervised Learning: The Google Cat @ ICML'12. Deep learning trained on 16K cores, done in 3 days, on over 10M YouTube stills. http://arxiv.org/abs/1112.6209
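Unsupervised learning, by contrast, gets no labels and has to find structure on its own. The Google experiment used a very large deep network; a far smaller instance of the same idea is k-means clustering, swapped in here purely for illustration and sketched on made-up 1-D data:

```python
from statistics import fmean


def kmeans_1d(values, centers, iterations=10):
    """Plain k-means on 1-D data: no labels, the algorithm finds the groups itself."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            # Assign each value to its nearest current center (ties go to the first).
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [fmean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers


data = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8]  # toy values, no labels attached
print(kmeans_1d(data, centers=[0.0, 10.0]))
```

The starting centers are arbitrary guesses; the two groups emerge from the data alone.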

LHC Data meets Machine Learning

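Before digitization, everything was done by hand http://en.wikipedia.org/wiki/Cloud_chamber

Digitization lets computers process large amounts of data automatically

Proton bunches cross 40 million times per second (40 MHz), with about 15 collisions per crossing on average

Only about one in a million collisions is actually interesting

It's all selected by computers!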
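Putting the last few numbers together shows why the selection has to be automatic. A back-of-the-envelope sketch that simply takes the slides' figures (40 MHz crossings, ~15 collisions per crossing, ~1 in a million interesting) at face value:

```python
# Rough collision rates, using the figures quoted on the slides.
CROSSING_RATE_HZ = 40_000_000   # bunch crossings per second (40 MHz)
COLLISIONS_PER_CROSSING = 15    # average number of collisions per crossing
INTERESTING_FRACTION = 1e-6     # roughly one in a million collisions is interesting


def collision_rate(crossing_rate_hz, per_crossing):
    """Total proton-proton collisions per second."""
    return crossing_rate_hz * per_crossing


def interesting_rate(total_rate, fraction):
    """Collisions per second worth keeping."""
    return total_rate * fraction


total = collision_rate(CROSSING_RATE_HZ, COLLISIONS_PER_CROSSING)
keep = interesting_rate(total, INTERESTING_FRACTION)
print(f"{total:.0e} collisions/s, about {keep:.0f} interesting per second")
```

That is about 600 million collisions per second, of which only a few hundred per second are worth a closer look.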

Examining the Kaggle challenge data. Data files provided on the Kaggle website: Training dataset: CSV format, 250000 events, ID + 30 features, weighted events!!!, class label: s, b. Test dataset: 550000 events, same format. random_submission: sample for evaluation. AMS metric: Python script for the competition evaluation metric. https://www.kaggle.com/c/higgs-boson/data
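The weighted events and the AMS metric fit together like this: a selection keeps some events, the weights of the kept signal and kept background events are summed into s and b, and AMS turns those two sums into a score. A minimal sketch, assuming the column names `EventId`, `Weight` and `Label` and the AMS definition AMS = sqrt(2((s + b + b_reg) ln(1 + s/(b + b_reg)) - s)) with b_reg = 10 from the challenge documentation; check both against the official files and evaluation script. The tiny CSV below is made up and has only 2 of the 30 feature columns:

```python
import csv
import io
from math import log, sqrt


def read_events(handle):
    """Yield (features, weight, label) from a training-style CSV.

    Assumes columns 'EventId', feature columns, 'Weight' and 'Label' ('s' or 'b').
    """
    for row in csv.DictReader(handle):
        weight = float(row.pop("Weight"))
        label = row.pop("Label")
        row.pop("EventId")
        features = {name: float(value) for name, value in row.items()}
        yield features, weight, label


def ams(s, b, b_reg=10.0):
    """Approximate Median Significance with regularisation term b_reg."""
    return sqrt(2.0 * ((s + b + b_reg) * log(1.0 + s / (b + b_reg)) - s))


def evaluate(events, select):
    """Sum weights of selected signal (s) and selected background (b), then score."""
    s = b = 0.0
    for features, weight, label in events:
        if select(features):
            if label == "s":
                s += weight
            else:
                b += weight
    return s, b, ams(s, b)


# Made-up stand-in for the real training file.
toy_csv = io.StringIO(
    "EventId,DER_mass_MMC,PRI_tau_pt,Weight,Label\n"
    "100000,125.0,40.0,0.5,s\n"
    "100001,90.0,25.0,2.0,b\n"
    "100002,130.0,35.0,0.5,s\n"
    "100003,128.0,22.0,3.0,b\n"
)
s, b, score = evaluate(read_events(toy_csv), lambda f: f["DER_mass_MMC"] > 100.0)
print(f"s={s}, b={b}, AMS={score:.3f}")
```

Because the events are weighted, one heavy background event can hurt the score more than several light signal events help it.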

ROOT: ROOT Object-Oriented Toolkit. A data analysis tool written in C++ (millions of lines); open source, with an integrated C++ interpreter. File formats, I/O handling, graphics, plotting, math, histogram binning, event display, geometric navigation. Powerful fitting (RooFit) and statistical (RooStats) packages. In use by most HEP experiments; the standard tool for producing physics results at the LHC. New tools for model creation and combinations. http://root.cern.ch/drupal/

pyROOT: ROOT Object-Oriented Toolkit. Python binding for ROOT. Not used to C? No problem! All the booking and plotting functions have corresponding Python bindings, and you can use the same data structures you used in C++. http://root.cern.ch/drupal/

TMVA: Multivariate analysis toolkit. Based on supervised learning and embedded in ROOT; easy training and testing. Provides various classifiers: Linear Discriminant (LD), Artificial Neural Networks (NN), Boosted Decision Trees (BDT), ... http://tmva.sourceforge.net/
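To give a feel for the simplest classifier in that list, here is a two-variable Fisher linear discriminant in pure Python on made-up toy events. It is a sketch of the textbook method, not TMVA's own LD code: the weights w = S_w^-1 (m_s - m_b) project each event onto a single number, and one cut on that number separates signal from background.

```python
from statistics import fmean


def mean_vector(points):
    """Component-wise mean of a list of 2-D points."""
    return [fmean(col) for col in zip(*points)]


def within_class_scatter(points, mean):
    """2x2 scatter matrix: sum over points of (x - mean)(x - mean)^T."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for x in points:
        d = (x[0] - mean[0], x[1] - mean[1])
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_weights(signal, background):
    """Fisher direction w = S_w^-1 (m_s - m_b) for two input variables."""
    m_s, m_b = mean_vector(signal), mean_vector(background)
    s_s = within_class_scatter(signal, m_s)
    s_b = within_class_scatter(background, m_b)
    # Total within-class scatter S_w = S_s + S_b, written as [[a, b], [c, d]].
    a, b = s_s[0][0] + s_b[0][0], s_s[0][1] + s_b[0][1]
    c, d = s_s[1][0] + s_b[1][0], s_s[1][1] + s_b[1][1]
    det = a * d - b * c  # must be non-zero for the inverse to exist
    diff = (m_s[0] - m_b[0], m_s[1] - m_b[1])
    # Closed-form 2x2 inverse applied to the mean difference.
    return [(d * diff[0] - b * diff[1]) / det,
            (-c * diff[0] + a * diff[1]) / det]


def discriminant(w, x):
    """Project an event onto the Fisher direction; larger means more signal-like."""
    return w[0] * x[0] + w[1] * x[1]


# Made-up toy events with two input variables each.
signal = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0), (3.0, 2.0)]
background = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
w = fisher_weights(signal, background)
print("w =", w)
print([discriminant(w, x) for x in signal + background])
```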

pyTMVA: Multivariate analysis toolkit. You can use it from Python too! Provides various classifiers: Linear Discriminant (LD), Artificial Neural Networks (NN), Boosted Decision Trees (BDT), ... http://tmva.sourceforge.net/
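Boosted Decision Trees combine many small trees. The smallest possible tree, a single cut on a single variable, already shows how event weights enter training. A minimal sketch of such a weighted decision stump (hypothetical helper `best_stump`, not TMVA's BDT implementation; it only tries cuts of the form "value above threshold means signal"):

```python
def best_stump(values, labels, weights):
    """Find the cut 'value > threshold means signal' with the lowest weighted error.

    Returns (threshold, weighted error as a fraction of the total weight).
    Only this one cut direction is tried, to keep the sketch short.
    """
    total = sum(weights)
    best_error, best_threshold = float("inf"), None
    for threshold in sorted(set(values)):
        # Total weight of events the cut puts on the wrong side.
        error = sum(w for v, y, w in zip(values, labels, weights)
                    if (v > threshold) != (y == "s"))
        if error < best_error:
            best_error, best_threshold = error, threshold
    return best_threshold, best_error / total


# Made-up toy events: one input variable, a label and an event weight.
print(best_stump([10, 20, 30, 40], ["b", "b", "s", "s"], [1, 1, 1, 1]))
print(best_stump([1, 2, 3], ["s", "b", "s"], [1, 5, 1]))
```

In the second call the heavily weighted background event at 2 pulls the cut above it, even though that misclassifies the signal event at 1. Boosting repeats this kind of fit many times, re-weighting misclassified events between rounds.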

Input Variables (five slides of plots)

Correlation Matrix
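A correlation matrix such as the one on this slide holds the Pearson correlation coefficient between every pair of input variables. A minimal pure-Python sketch of that arithmetic on made-up toy columns (the helper names are hypothetical):

```python
from math import sqrt
from statistics import fmean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlation_matrix(columns):
    """Pairwise correlations between named feature columns."""
    names = list(columns)
    return names, [[pearson(columns[a], columns[b]) for b in names] for a in names]


# Made-up columns: perfectly correlated and perfectly anti-correlated with "x".
features = {
    "x": [1.0, 2.0, 3.0, 4.0],
    "2x": [2.0, 4.0, 6.0, 8.0],
    "-x": [-1.0, -2.0, -3.0, -4.0],
}
names, matrix = correlation_matrix(features)
for name, row in zip(names, matrix):
    print(f"{name:>3}", " ".join(f"{r:+.2f}" for r in row))
```

Strongly correlated inputs carry overlapping information, which is one reason to look at this matrix before choosing variables.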

TMVA Outputs: by default, TMVA uses ½ of the sample for training and the other ½ for performance tests.
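The same half-and-half idea applies when training with pure-Python tools. A minimal sketch of a shuffled 50/50 split (hypothetical helper `split_half`; the fixed seed only makes the split reproducible, and this does not reproduce TMVA's own splitting):

```python
import random


def split_half(events, seed=None):
    """Randomly split events into equal-sized training and test halves."""
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


train, test = split_half(range(10), seed=42)
print(len(train), len(test))
```

Testing on events the classifier never saw during training is what makes the performance numbers trustworthy.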

What other tools are there? Pure Python tools: SciPy (NumPy, Matplotlib): scientific computing with Python; interactive operation with IPython; creating & manipulating data; Matlab-like plotting with Matplotlib. scikit-learn: machine learning in Python; works with SciPy, NumPy, matplotlib...; multi-class classification, regression, clustering, and more...

Visualization library: Matplotlib provides plotting tools with Matlab-like syntax

Fork Me on GitHub! https://github.com/yuanchao/pyHiggsML

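You could also win a prize!!! You might even become a paper co-author with nearly three thousand people you've never met

Open Data, Open Access, Open Source: research results openly accessible; taken from the public, shared with the public

That's all, thank you

Thank you for your attention (Merci de votre attention)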

“Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it...” - Dan Ariely (Duke)

Installing ROOT
Get the ROOT binary for Ubuntu:
  Go to http://sourceforge.net/projects/cernrootdebs/
  Download the i386/x86_64 package: click "Files" → "32bits!" → "root_5.32.00_i386.deb"
  Open a terminal and type the following commands:
    $ cd Downloads/
    $ sudo dpkg -i root_5.32.00_i386.deb      (enter your password when asked)
    $ sudo apt-get install libssl0.9.8
    $ sudo apt-get install libjpeg62
    $ source /opt/root/bin/thisroot.sh        (you can put this line in ~/.bashrc)
  You can run ROOT now:
    $ root -l                                 ("-l" means no splash window)
    root [0] TBrowser t                       (make sure there are no error messages)

LHC: confirmed that the Higgs boson is compatible with the Standard Model ... no sign yet of microscopic black holes or supersymmetry ... http://cdsweb.cern.ch/record/1428128?ln=en
