Coding & Best Practice in Programming in the NGS era


Published on May 7, 2014

Author: flxlex

Source: slideshare.net

Description

A talk I gave at the SeqAhead Scientific Meeting 2014 "NGS Data after the Gold Rush", May 7th 2014

Coding & Best Practice in Programming Why it matters so much in the NGS era Lex Nederbragt Norwegian Sequencing Centre and Centre for Evolutionary and Ecological Synthesis lex.nederbragt@ibv.uio.no @lexnederbragt OK

Who am I @lexnederbragt flxlexblog.wordpress.com

How I became a bioinformatician

2007: a grant GS FLX from Roche/454 Genome Analyzer from Solexa/Illumina ? Let’s try them out!

Specimen • Planktothrix rubescens NIVA CYA 98 • Cyanobacteria • (blue-green algae)

Planktothrix Half a million reads Average length 260 nt 10 million reads 33 nucleotides each Perl

Planktothrix Newbler SHARCGS Assembly Half a million reads Average length 260 nt 10 million reads 33 nucleotides each

Atlantic cod genome project 850 million bases (Mbp) ‘Wild-caught’ GS FLX from Roche/454

Atlantic cod genome project phase 1

Cod genome project phase 2 From Wikimedia commons, user Sagar Joshi

In summary From flickr, user lesterpubliclibrary

Challenges in the next-generation sequencing era

High-throughput sequencing Phase 1: more is better Phase 2: smaller is better Phase 3: single-molecule Phase 4: nanopores

Democratization of sequencing MinION 512 nanopores 150mb/hour Up to 6 hours $900

Sequencing cost Thanks to Matt Clark (TGAC), modified from http://bit.ly/1iiajcS 454 & polony Solexa & SOLiD HiSeq HiSeq X Ten GAII End of the gold rush?

More more more Data Software Mathias Bigge, Ricordisamoa, others (wikimedia commons) TCTCCTAACAACCCCCcACACACACACACTGGTA CTGATGCCATTCTGCTTTACACCTATACACATCA TATACATtATACACACACACACACACACACAACA CTCTCCTAACCCACACACACTGGTACAGATGCCA GTCTGCTTAACACCTACGCACGTATTATACACAC ACACACACAACGCTCTCCTAACCCACACACACAC CAGTCTGCTTTAAACCTACACACATATTATACAA ACGAGTTGGTGACGTAAGGTTGATAAGGGATATT GGTAAGGGTTAAGGGTAGGGTTGGTGTTAGGGGC AAGGGTTAGGGTTAGTGTAAGGGGTAAGGGTTAG TGTAaGGAGTAAGGGTTAGTGTAAGGGGTTAGTG TTATTGTAAGGGGCTAGTGTTAGTGTTAGTGTTC AGGGTTAGTGTTAGGGGTAGGGTTAATgTTTAGG GTAATGTTTAGGGTTAGGGGTATGGGTTAGTGCT AGGGGTCAGGGTTAGTGTTAGGGTTAGACAACCC ACCTGAGAGAACCAGTGCGATGCCGCCGCAGGCG TTGGGCGAGGACATGGAGGTGCCGTTCATCAGCT GGGTCCCCCGGAGGGTCCAGTTGGGGACGGAGGC GATGGCTCCCCCCGGAGCGCTGATGCTGACCCCC AGGGCGCCGTCGATGCTGGGTCCCCGAGACGACC AGGTGTACTGGTTGGCCGGGAGCTTCTCCCTCAG GGAGTACTCCGCCACCATCATGTCGGGGGTCACG TAGGCCCCAACCCCTGGGGACAGACGGAGCGCGT TACACACCTCAACCCCTTACCCTCGGAGCCTACA

Software Constant stream of new software http://wwwdev.ebi.ac.uk/fg/hts_mappers 88 short-read mappers

Software Constant stream of new software http://neidetcher.com/ubuntu_package_dependency.html Installation Judging quality Wikimedia commons, user Thebestofall007

Do we need to be worried?

Do we need to be worried? Self-taught bioinformaticians ACCCCCcACACACACACACTGGTACTGATGCC ACACCTATACACATCATATACATtATACACAC ACACAACACTCTCCTAACCCACACACACTGGT GTCTGCTTAACACCTACGCACGTATTATACAC AACGCTCTCCTAACCCACACACACACCAGTCT TACACACATATTATACAAACGAGTTGGTGACG AAGGGATATTGGTAAGGGTTAAGGGTAGGGTT GCAAGGGTTAGGGTTAGTGTAAGGGGTAAGGG GAGTAAGGGTTAGTGTAAGGGGTTAGTGTTAT TAGTGTTAGTGTTAGTGTTCAGGGTTAGTGTT TTAATgTTTAGGGTAATGTTTAGGGTTAGGGG TGCTAGGGGTCAGGGTTAGTGTTAGGGTTAGA GAGAGAACCAGTGCGATGCCGCCGCAGGCGTT ATGGAGGTGCCGTTCATCAGCTGGGTCCCCCG TTGGGGACGGAGGCGATGGCTCCCCCCGGAGC ACCCCCAGGGCGCCGTCGATGCTGGGTCCCCG GTGTACTGGTTGGCCGGGAGCTTCTCCCTCAG GCCACCATCATGTCGGGGGTCACGTAGGCCCC GACAGACGGAGCGCGTTACACACCTCAACCCC AGCCTACATAACCCAACCCTCTGGAGACGGCA AGTCAGAAATAGaGCTGACCGATTCATCAAAT lots of data, lots of software: recipe for disaster?

Correctness of results http://www.it.bton.ac.uk/staff/je/java/jewl/tutorial/tutorial.html

Reproducibility doi:10.1038/sj.embor.7401143 A reproducibility crisis?

Reproducibility and reusability http://upload.wikimedia.org/wikipedia/commons/4/48/Recycle.jpg

What it boils down to

My (given) title Coding & Best Practice in Programming Why it matters so much in the NGS era Why it matters so much in science Next-generation sequencing specific?

Diagnostic sequencing Wikimedia commons, user Bill Branson

Diagnostic sequencing

Solutions

Solutions Flickr: http://farm4.staticflickr.com/3319/3265787219_bfbc654b5e_o.jpg Wikimedia commons

Best practices 10.1371/journal.pbio.1001745

Best practices Automate repetitive tasks Wikimedia commons, user Pzucchel
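As an illustration of automating a repetitive task, here is a minimal Python sketch that counts the reads in every FASTQ file in a directory instead of checking each file by hand. The function names and the `*.fastq` pattern are hypothetical and not from the talk; the sketch assumes uncompressed FASTQ files with exactly four lines per record.

```python
from pathlib import Path


def count_fastq_reads(fastq_path):
    """Count reads in an uncompressed FASTQ file (assumes four lines per record)."""
    with open(fastq_path) as handle:
        line_count = sum(1 for _ in handle)
    return line_count // 4


def count_reads_per_file(directory, pattern="*.fastq"):
    """Return {file name: read count} for every matching file in the directory."""
    return {
        path.name: count_fastq_reads(path)
        for path in sorted(Path(directory).glob(pattern))
    }
```

Calling `count_reads_per_file("some_directory")` once replaces a manual check per file, and the same script can be rerun unchanged when new data arrives.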

Best practices Coding styles, variable naming etc def test_seq: def sequence_is_DNA:
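The slide shows two function names, `test_seq` and `sequence_is_DNA`. A possible body for the more descriptive name is sketched below; the implementation is an assumption for illustration, not code from the talk, and it treats only A, C, G and T as valid (no N or other ambiguity codes).

```python
DNA_ALPHABET = frozenset("ACGT")


def sequence_is_DNA(sequence):
    """Return True if the sequence is non-empty and contains only A, C, G or T.

    The check is case-insensitive; ambiguity codes such as N are rejected
    (an assumption of this sketch).
    """
    return bool(sequence) and set(sequence.upper()) <= DNA_ALPHABET
```

A name like `sequence_is_DNA` tells the reader what question the function answers and that the result is a yes/no value, so a call such as `if sequence_is_DNA(read):` reads almost like a sentence.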

Best practices Use version control https://www.atlassian.com/git/workflows

Best practices From my own work: $ cd scripts $ ls blat_parse4.pl old_versions snps_flanks_2_fastq.pl $ ls old_versions/ blat_parse2.pl blat_parse_attemp1.pl blat_parse.pl.bak blat_parse.pl blat_parse3_backup.pl blat_parse3.pl

Best practices test, test, test def test_zero(): assert run_the_function(0) == 0 assert x > 0, "cannot handle negative numbers"
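The two snippets on this slide, a unit test and a defensive assertion inside a function, can be combined into a small runnable sketch. The body of `run_the_function` (a square root) is a placeholder chosen for illustration, and the check is relaxed to `x >= 0` so that the zero test can pass; both are assumptions, not the talk's code.

```python
import math


def run_the_function(x):
    # Defensive check: fail early with a clear message on invalid input.
    # Assumption: zero is allowed here, unlike the slide's x > 0.
    assert x >= 0, "cannot handle negative numbers"
    return math.sqrt(x)


def test_zero():
    assert run_the_function(0) == 0


def test_perfect_square():
    assert run_the_function(9) == 3


def test_negative_input_is_rejected():
    try:
        run_the_function(-1)
    except AssertionError:
        return  # the defensive check fired as intended
    raise RuntimeError("expected an AssertionError for negative input")
```

The test functions can be called directly, or collected by a test runner such as pytest, which discovers functions whose names start with `test_`.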

Best practices Document well

Best practices Collaborate http://howdoitradestocks.com/wp-content/uploads/2011/12/share-ideas1.jpg

khmer, a 'case study'

khmer Crusoe et al. doi: 10.6084/m9.figshare.979190 Michael Crusoe Titus Brown

khmer https://github.com/ged-lab/2013-paper-ssspe

khmer Integrated code coverage analysis The “GitHub Flow” model of code review Semantic versioning Continuous integration Integration and acceptance testing
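To make the semantic versioning item concrete: under that scheme a version number has the form MAJOR.MINOR.PATCH, and a change in MAJOR signals a backward-incompatible change. The sketch below is a simplified illustration, not khmer's code; the names `parse_version` and `is_backward_compatible` are hypothetical, and pre-release tags and the special rules for 0.x versions are ignored.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


def parse_version(text):
    """Parse a plain 'MAJOR.MINOR.PATCH' string into a comparable Version."""
    major, minor, patch = (int(part) for part in text.split("."))
    return Version(major, minor, patch)


def is_backward_compatible(installed, required):
    """True if `installed` has the same major version as `required` and is not older."""
    have, need = parse_version(installed), parse_version(required)
    return have.major == need.major and have >= need
```

Parsing into integers matters: compared as plain strings, "1.10.0" would sort before "1.9.0", whereas the tuples compare numerically.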

Beyond best coding practices

Benchmarks http://assemblathon.org/

Benchmarks http://www.genome.org/cgi/doi/10.1101/gr.131383.111

Benchmarks http://www.genomeinabottle.org/ ~8300 10ug vials of DNA for NA12878

(Assembly) validation

(Assembly) validation Assembly doi:10.1186/1471-2105-15-126

Reproducibility ‘platforms’ usegalaxy.org taverna.org.uk/ pythonhosted.org/Sumatra/

Action points

Action points Attend a Software Carpentry Boot Camp http://software-carpentry.org/

Action points Look for signs of best practice

Action points Look for signs of best practice during peer review nature.com

Action points Benchmarking/validation

Action points Develop (under)graduate curriculum

My goal today Flickr: http://farm4.staticflickr.com/3319/3265787219_bfbc654b5e_o.jpg
