20150522_Example_PyData_use-cases_in_astronomy_research


Published on May 22, 2015

Author: SamuelHarrold

Source: slideshare.net

1. A research group’s use-cases for PyData tools
Samuel Harrold, Astrophysics PhD Student, UT Austin
2015-05-22 @ Continuum Analytics, Austin, TX

4. Motivation
● In 2011:
  ○ Research group mostly used bash scripts, awk, Fortran, IDL, IRAF.
  ○ Pipeline was tightly coupled with old computers, cameras, camera software.
● Goals for new computers and camera:
  ○ Make pipeline loosely coupled, cross-platform.
  ○ Develop skills for non-academic job market.
● Led research group in adopting Python tools.

5. Summary
● Conflict of interest: Engineering vs publishing papers.
● To adopt best practices from industry, science needs more tools that lower the entry barrier.
  ○ Example: It’s hard to mine your data if you don’t know how to create a database.
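As a hedged illustration of the database example on this slide: Python's standard-library `sqlite3` module can create a small, queryable database without running a server. The table name, columns, and rows below are hypothetical placeholders, not the research group's data or schema.

```python
import sqlite3

# Hypothetical observation log; names and values are illustrative placeholders.
rows = [
    ("star_a", "2015-01-01", 14.2),
    ("star_a", "2015-01-02", 14.5),
    ("star_b", "2015-01-01", 16.0),
]

# An in-memory database needs no server or setup; pass a file path to persist it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE observations (star TEXT, night TEXT, magnitude REAL)")
conn.executemany("INSERT INTO observations VALUES (?, ?, ?)", rows)
conn.commit()

# "Mining" the data is then a query: nights observed and mean magnitude per star.
summary = conn.execute(
    "SELECT star, COUNT(*), AVG(magnitude) "
    "FROM observations GROUP BY star ORDER BY star"
).fetchall()
conn.close()

print(summary)
```

The same query pattern works against a file-backed database, which is one way to avoid maintaining a separate database server.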

6. Outline
● Motivation
● Use-cases
● FAQ from researchers

7. Use of some PyData tools
● Anaconda: Environment management.
● IPython Notebooks: Copy-paste code share.
● scikit-image: Detecting stars.
● pandas: Data organization.
● statsmodels, emcee: Robust statistics.
● astropy, astroML: Astronomy-specific.

8. Use-case: Star brightness vs time
● “Time-series photometry.”
● Objective:
  ○ Extract relative brightness of stars from images during acquisition.
https://github.com/ccd-utexas/tsphot
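A minimal sketch of what "relative brightness" can mean here, assuming the common differential-photometry approach of dividing the target star's measured flux by a comparison star's flux from the same image. This is not taken from tsphot; the function name and flux values are hypothetical.

```python
from statistics import mean


def relative_light_curve(target_flux, comparison_flux):
    """Return target/comparison flux ratios, normalized to a mean of 1."""
    if len(target_flux) != len(comparison_flux):
        raise ValueError("flux series must have the same length")
    # Dividing by a comparison star cancels changes that affect both stars
    # equally, such as thin clouds or varying transparency.
    ratios = [t / c for t, c in zip(target_flux, comparison_flux)]
    mean_ratio = mean(ratios)
    return [r / mean_ratio for r in ratios]


# Hypothetical fluxes (counts) from four images. Both stars dim by half in
# the third image, as they would behind a passing cloud.
target = [1000.0, 1100.0, 500.0, 900.0]
comparison = [2000.0, 2000.0, 1000.0, 2000.0]

light_curve = relative_light_curve(target, comparison)
print(light_curve)
```

In this sketch the dimming in the third image drops out of the ratio, so only the target star's own variation remains in the light curve.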

9. Use-case: Star brightness vs time
● Status:
  ○ Developed to be good enough for internal use, but not made robust for distribution.
  ○ Conflict of interest: engineering vs publishing papers
https://github.com/ccd-utexas/tsphot

10. Use-case: Data mining platform
● Objective:
  ○ Predict which unobserved white dwarf stars pulsate.
    ■ What stars are there? From catalogs.
    ■ Which stars are published (non)pulsators? From papers.
    ■ Which stars are unpublished (non)pulsators? From our data.
http://www.slideshare.net/SamuelHarrold/20140409-harrold-dataminingdemostellarseminar

11. Use-case: Data mining platform
● Status:
  ○ Shut down due to under-use.
    ■ Users preferred grep + Excel rather than pandas.
    ■ Users didn’t want to maintain MySQL database.
  ○ Conflict of interest: engineering vs publishing papers
http://www.slideshare.net/SamuelHarrold/20140409-harrold-dataminingdemostellarseminar

12. Use-case: Reproducible research
● Objective:
  ○ Compute the physical quantities of a binary star system from time-series photometry.
https://github.com/stharrold/Harrold_2015_SDSSJ1600
https://pypi.python.org/pypi/binstarsolver

13. Use-case: Reproducible research
● Status:
  ○ Citable code on GitHub with DOI from zenodo.org.
  ○ Distributable code published to PyPI.
  ○ Conflict of interest: engineering vs publishing papers
https://github.com/stharrold/Harrold_2015_SDSSJ1600
https://pypi.python.org/pypi/binstarsolver

15. FAQ from researchers
● Questions:
  ○ “Why don’t you use ___?”
  ○ “How does this help publish more papers?”
  ○ “Why should I learn another language?”
● Answers:
  ○ “Look how quickly I can do ___.”
  ○ Examples justify taking time to learn new skills.
  ○ NSF Data Management and Sharing requirements: https://www.nsf.gov/bfa/dias/policy/dmpfaqs.jsp
  ○ TIOBE code popularity index: http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html
  ○ Jake VanderPlas’s blog post on data science and academia: https://jakevdp.github.io/blog/2014/08/22/hacking-academia/
