6 1


Published on November 19, 2007

Author: Julie


An Introduction to Perl for Bioinformatics - Part 2
Will Hsiao
Simon Fraser University, Department of Molecular Biology and Biochemistry

Outline:
Session 1
  Review of the previous day
  Perl: historical perspective
  Expand on regular expressions
  General use of Perl
  Expand on Perl functions and introduce modules
  Interactive demo on modules
Break
Session 2
  Use of Perl in bioinformatics
  Object-oriented Perl
  Bioperl overview
  Interactive demo on Bioperl
  Introduction to the Perl assignment

Perl in Bioinformatics:
Case in point 1: Human Genome data exchange
  "How Perl Saved the Human Genome Project", Lincoln Stein (1996)
  The sequencing centres all used different data formats
  Perl allowed the various genome centres to exchange and communicate data with each other
  The article introduces a project to produce modules that process all known forms of biological data (Bioperl)

Perl in Bioinformatics (continued):
Case in point 2: Ensembl
  Much of Ensembl is written in Perl
  Ensembl has an extensive Perl API that allows you to access the Ensembl database directly from your Perl code
Case in point 3: GMOD (Generic Model Organism Database)
  A joint effort by model organism databases (worm, fly, corn, rat, yeast, E. coli, Arabidopsis, rice) to develop reusable components that can be adapted for other biological databases
  Written mostly in Java and Perl

Bioinformatics Spectrum:
Figure labels: Math, Biology, Computer Science; Software/data analysis; Perl, Java, C/C++

Perl for bioinformatics in your lab:
Scripting
  automating repetitive analyses
  parsing results obtained from other programs
Wrapping
  accessing other programs (e.g.
  BLAST) through Perl
Web CGI'ing
  Develop an interactive web page for your lab
  Create web forms

Bioperl Overview:
The Bioperl project: a comprehensive, well documented set of Perl modules
Last stable release 1.4.0 (developer 1.5.1)
A bioinformatics toolkit for:
  Format conversion
  Report processing
  Data manipulation
  Sequence analyses
  and more!
Written in object-oriented Perl

What are objects?
Examples of objects in real life: cars, dogs, dishwashers...
Objects have ATTRIBUTES and ACTIONS
Some attributes of a dog: color of fur, height, owner's name, weight, tail position
Some actions of a dog: bark, walk, run, eat, wag tail

What are programming objects?
The idea is borrowed from real-life objects
Attributes are stored as variables: $fur_color, $weight, $tail_position
Actions are implemented as functions: sub dye_fur{}, sub eat{}, sub wag_tail{}
Figure label: A Program Dog Object

Object Exercise:
Pair up with your neighbour (2-3 people)
In the next 2-3 minutes, come up with as many attributes and actions (aka methods) of a DNA sequence object as you can
E.g. attributes of a DNA sequence object: $length=300, $percent_GC=50%
E.g. methods of a DNA sequence object: translate_to_protein, remove_polyA_tail
Share with the class

Objects belong to Classes:
If we take all your suggestions and design a generic template, we can then use this template to create objects
This template is called a Class
An "instance of a class" is called an object
Figure labels: DNA Sequence Class; DNA sequence objects 1, 2, 3 and 4

How do we interact with an object?
Figure labels: WOOF, POLO
Polo is the name of my dog
We have to refer to an object by its name

Interact with a program object:
$Polo is the name of a program dog object
Attributes: $fur_color, $weight, $tail_position
Methods: sub dye_fur{}, sub eat{}, sub wag_tail{}
Figure labels: A Program Dog Object; WOOF

A name is a reference:
Objects have unique names (labels)
You refer to an object by its unique name
The unique name that you give to an object is called a "reference"

Reference in Perl:
A reference is a scalar (simple) variable that refers to a chunk of memory
That memory can hold another variable or an object
Figure labels: $array_ref; Memory; My Program

Reference to an object:
Figure: a protein object in memory, referred to by $my_protein in My Program
  Attributes: $var{SwissProt_ID}, $var{name}, $var{length}, $var{source}, $var{@journal_articles}, $var{%domain_location}
  Methods: sub new{...}, sub return_ID{...}, sub get_domain{...}
$my_protein is called a "reference" to an object (in this case a protein object)
To access the attributes and methods of the protein object, you have to go through its reference (i.e. $my_protein)
Objects have inherent functions that are useful
These inherent functions also have specific names

Object Oriented Programming:
What is O-O programming?
  Simple answer: a way to organize code so that it interacts in certain ways and follows certain rules
  Long answer: to be found in books on O-O
Why O-O programming?
  Provides a well defined framework
  Promotes good practices such as code reuse, abstraction and cleaner design
  Does have certain trade-offs (e.g.
  O-O Perl is usually slower than procedural Perl)
  Designing good object classes requires forethought and skill

To use an object:
Find out which class you need and learn about the class by reading its documentation
Make the class available to your program
Create a new object of the class
Start using the object by modifying its attributes and calling its methods

Example of using objects:
Task: I have a sequence file in GenBank format that I want to convert to EMBL format
How many objects do you think we need to accomplish this task?

1. Find the objects you need:
Objects that we need:
  an object that reads in sequences from a file
  an object that represents a sequence record
  an object that writes sequences to a file
Figure labels: GenBank; Sequence File Input Object; Sequence Object (in memory); Sequence File Output Object; EMBL

Example of using objects (continued):
Solution: I remember that Bioperl provides this functionality, so first I'll take a look at the Bioperl documentation
Website:

Bioperl Documentation demo:
Go to the webpage and navigate to the SeqIO doc
Pay attention to 1) the name of the module, 2) the Synopsis (code examples), 3) the Description, 4) the list of methods

Slides 26 and 27:
Figure labels: Click; List of Modules by Class; Complete List of Modules by Name

2. Make the object class available:
In Perl, classes are implemented as object-oriented modules
To include a class, simply use the module
E.g. use Bio::SeqIO;
Note that the name of the module is case sensitive
By using Bio::SeqIO, my program automatically gains access to any modules included in Bio::SeqIO

3. Create an object:
Make up a name for my object reference (e.g.
$seq_in)
Create the object by calling the object class's "new" method
  every class has a "constructor" method to create an object of that class
  the constructor method is often called "new"
  use the single arrow operator to call methods
Assign the object to the object reference
You can give the object you are about to create some initial attributes (e.g. the file name of my sequence record, the format of the record)

    my $seq_in = Bio::SeqIO->new( -file => "myGBrecord", -format => "genbank" );

4. Call object's methods?
We've seen the -> (single arrow) operator for calling a class method (e.g. new)
The same operator is used for calling an object method
E.g. to ask the $seq_in object to get a sequence record from your GenBank sequence file:

    my $seq_record = $seq_in->next_seq();

Putting it all together:

    #!/usr/bin/perl -w
    use strict;
    use Bio::SeqIO;

    # Create a new Bio::SeqIO object and initialize some attributes
    my $seq_in  = Bio::SeqIO->new( -file => "myGBrecord", -format => "genbank" );
    # The leading ">" opens the output file for writing
    my $seq_out = Bio::SeqIO->new( -file => ">myEMBLrec", -format => 'EMBL' );

    # Read one record from the GenBank file and write it out as EMBL
    my $seq_record = $seq_in->next_seq();
    $seq_out->write_seq($seq_record);

More Bioperl modules:
Bio::SeqIO: sequence input/output
  Retrieve sequence records and write them to files
  Convert sequence records from one format to another
Bio::Seq: manipulating sequences
  Get subsequences ($seq->subseq($start, $end))
  Find the length of the sequence ($seq->length)
  Reverse complement a DNA sequence
  Translate a DNA sequence
  ...etc.
Bio::Annotation: annotate a sequence
  Assign journal references to a sequence, etc.
  A Bio::Annotation is associated with an entire sequence record and not just part of a sequence (see also Bio::SeqFeature)

Some more Bioperl modules:
Bio::SeqFeature: associate feature annotation with a sequence
  "Features" describe specific locations in the sequence
  E.g.
  5' UTR, 3' UTR, CDS, SNP, etc.
  Using this object, you can add feature annotations to your sequences
  When you parse a GenBank file using Bioperl, the "features" of a record are stored as SeqFeature objects
Bio::DB::GenBank, GenPept, EMBL and SwissProt: remote database access
  You can retrieve a sequence from remote databases (through the Internet) using these objects

Even more Bioperl modules:
Bio::SearchIO: parse sequence database search reports
  Parse BLAST reports (make custom reports)
  Parse HMMer, FASTA, SIM4, WABA, etc.
  Custom reports can be output in various formats (HTML, table, etc.)
Bio::Tools::Run::StandAloneBlast: run standalone BLAST through Perl
  By combining this with SearchIO, you can automate and customize BLAST searches
Bio::Graphics: draw biological entities (e.g. a gene, an exon, BLAST alignments, etc.)

Bioperl Summary:
For online documentation:
  For this workshop:
  Tutorial:
  HOWTOs:
  Modules:
Literature: Stajich et al., The Bioperl toolkit: Perl modules for the life sciences. Genome Res. 2002 Oct;12(10):1611-8. PMID: 12368254
Bioperl mailing list:
  Best way to get help using Bioperl
  Very active list (upwards of 10 messages a day)
Use with caution: things change fast and without warning (unless you are on the mailing list...)

Interactive demo on Bioperl:
Open your laptop!
Open a terminal window
Type cd ~/perl_two
Type gedit ./
Let's go over the example together

Summary for Session 2:
Perl is a popular language in bioinformatics because:
  it handles text well
  it has a great user base and support (e.g. Bioperl)
Bioperl is a large collection of object-oriented Perl modules for many biological data analyses
An object is a collection of attributes and methods
You have to access an object through its reference
A reference is a name

Perl Documents:
In-line documentation
  POD = Plain Old Documentation
  Read POD by typing perldoc <module name>
  E.g.
  perldoc perl, perldoc Bio::SeqIO
On-line documentation
  http:/
Books
  Learning Perl (the best way to learn Perl if you already know a bit about programming)
  Beginning Perl for Bioinformatics (example-based way to learn Perl for bioinformatics)
  Programming Perl (THE Perl reference book; not for the faint of heart)

Additional Book References:
Perl Cookbook, 2nd edition (quick solutions to 80% of what you want to do)
Learning Perl Objects, References & Modules (for people who want to learn objects, references and modules in Perl)
Perl in a Nutshell (an okay quick reference)
Perl CD Bookshelf, Version 4.0 (electronic version of the above books; best value, searchable, and kills fewer trees)
Mastering Perl for Bioinformatics (more example-based learning)
CGI Programming with Perl (rather outdated treatment of the subject; not really recommended)
Perl Graphics Programming (if you want to generate graphics using Perl; side note: Perl is probably not the best tool for generating graphics)

Introduction to the Assignment Part A:
Goals:
  To convert passive knowledge into active skills
  To write some simple Perl programs by yourself
Consists of 2 modules:
  Write a program to convert a temperature from F to C
  Write a program to count the frequencies of bases in a sequence (sequence MAN1.fasta can be downloaded from the Day6 wiki)

Introduction to the Assignment Part B:
Goals:
  To see the power of Perl in bioinformatics
  To see how some common bioinformatics tasks are done using Perl
Consists of 3 modules:
  Download E. coli O157:H7 proteins in FASTA format
  Use a regular expression to find a protein motif
  Run BLAST on all proteins in the proteome (>5000 BLAST runs)

Introduction to the Assignment Part B (continued):
Most of the code is given to you; you just have to modify it (in total, no more than 15 lines of new code!!)
You are not expected to know everything in the scripts; it takes time to learn a new language
The TAs and your CS teammates will help you, so don't wait until the last minute to ask for help
Remember, you still have to hand in your own version of the assignment! No copying!

Acknowledgements:
Thanks to Sohrab Shah and Sanja Rojic (CS, UBC) for a wonderful collaboration on the lecture/lab material
Some ideas in this lecture are borrowed from Lincoln Stein's workshop
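
Worked example for Session 2:
The object ideas from Session 2 (a class as a template, a constructor called "new", an object reached through its reference, and the -> operator for calling methods) can be sketched in plain Perl without installing Bioperl. This is only an illustrative sketch: the DNASequence class, its -sequence argument and its method names (sequence, seq_length, percent_gc, reverse_complement) are hypothetical names made up for this example, loosely based on the Object Exercise slide. It is not Bioperl's Bio::Seq and does not show how Bioperl is implemented.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical class for illustration only; this is NOT Bioperl's Bio::Seq.
package DNASequence;

# Constructor, called "new" by convention. It stores the attributes in a
# hash and returns a reference to that hash, blessed into the class.
sub new {
    my ($class, %args) = @_;
    my $raw  = defined $args{-sequence} ? $args{-sequence} : '';
    my $self = { sequence => uc $raw };
    return bless $self, $class;
}

# Attribute: the stored sequence (upper case)
sub sequence {
    my ($self) = @_;
    return $self->{sequence};
}

# Attribute: number of bases in the sequence
sub seq_length {
    my ($self) = @_;
    return length $self->{sequence};
}

# Attribute: percentage of G and C bases (0 for an empty sequence)
sub percent_gc {
    my ($self) = @_;
    my $len = $self->seq_length or return 0;
    my $gc  = ( $self->{sequence} =~ tr/GC// );   # tr counts matching characters
    return 100 * $gc / $len;
}

# Action: return a NEW object holding the reverse complement
sub reverse_complement {
    my ($self) = @_;
    my $rc = scalar reverse $self->{sequence};
    $rc =~ tr/ACGT/TGCA/;
    return DNASequence->new( -sequence => $rc );
}

package main;

# $seq is the reference through which we reach the object
my $seq = DNASequence->new( -sequence => 'atggccgatt' );

printf "%s length=%d GC=%.1f%%\n",
    $seq->sequence, $seq->seq_length, $seq->percent_gc;
print $seq->reverse_complement->sequence, "\n";
```

Run as written, the script prints "ATGGCCGATT length=10 GC=50.0%" followed by "AATCGGCCAT". Returning a new object from reverse_complement, rather than changing the original in place, is a design choice made here so that $seq keeps its original sequence.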
