eXtensible Markup Language APIs in Java 1.6 - Simple and efficient XML parsing using Java language

Technology

Published on December 1, 2008

Author: wpodgorski

Source: slideshare.net

Description

This presentation describes modern ways of parsing XML documents in Java. It shows different approaches to the same problem and compares their capabilities, advantages and disadvantages. It also looks at what to expect from Java 7 in the context of XML.

eXtensible Markup Language APIs in Java 1.6
Simple and efficient XML parsing using Java language

Wojciech Podgórski
http://podgorski.wordpress.com
April 8, 2008

Sections: Introduction; XML APIs in Java; Capabilities and performance comparison; CASE STUDY: Parsing Really Simple Syndication (RSS) doc; What next? Alternatives to APIs, Java SE 7.0 features; Summary; Further reading...

Presentation outline

1 Introduction
    What is parsing
    Different ways of parsing documents
2 XML APIs in Java
    SAX
    DOM
    StAX
3 Capabilities and performance comparison
4 CASE STUDY: Parsing Really Simple Syndication (RSS) doc
5 What next? Alternatives to APIs, Java SE 7.0 features
6 Summary
7 Further reading...

Parsing definition

Parsing, more formally called syntactic analysis, is the process of analyzing a sequence of tokens to determine grammatical structure with respect to a given formal grammar.

Source: http://en.wikipedia.org/wiki/Parsing

Different ways of parsing documents

We can distinguish three main models of parsing XML documents. They differ in the mechanism used to traverse the nodes and in the idea behind processing XML data. The models are:

    SAX - Simple API for XML
    DOM - Document Object Model
    StAX - Streaming API for XML



That's not all! There are other approaches, which won't be described in this presentation.

JAXB - Java Architecture for XML Binding
Technology providing the ability to marshal Java objects into XML and the reverse, i.e. to unmarshal XML elements back into Java objects. It works on top of another parser (mostly streaming parsers).


Javolution
Library providing a real-time, StAX-like implementation which does not force object creation and has a smaller effect on memory footprint and garbage collection, using e.g. lookup tables for retrieving and reusing data.

VTD-XML - Virtual Token Descriptor for XML
Collection of efficient processing technologies, centered around a non-extractive and "document-centric" parsing technique called VTD. Supports random access and XPath.

SAX as a processing model

SAX should first be considered as a specific processing mechanism rather than as a simple API. SAX represents an event-driven architecture: the parser performs an operation each time a particular event occurs. To handle these occurrences, the user defines a number of callback methods, which the parser calls whenever it encounters the corresponding part of the document, for example the start or the end of an element.

Figure: Top-down parsing in SAX API

In Java, the SAX API is a collection of classes and interfaces which should be implemented when constructing an XML parser. The package containing this collection is: org.xml.sax.*

Figure: org.xml.sax.* package class diagram

Basic class structure

    import org.xml.sax.InputSource;
    import org.xml.sax.XMLReader;
    import org.xml.sax.helpers.XMLReaderFactory;

    // Declare document URI
    String xmlURI = "http://example.com/report.xml";

    // Create reader instance
    XMLReader reader = XMLReaderFactory.createXMLReader();

    // Set implementation class of ContentHandler
    reader.setContentHandler(new MyContentHandler());

    // Resolve document source
    InputSource inputSource = new InputSource(xmlURI);

    // Parse document
    reader.parse(inputSource);
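The snippet above relies on a MyContentHandler class that the slides do not show. As a minimal sketch of what such a handler might look like, the example below counts element names. The class name ElementCounter, its count helper and the inline sample document are assumptions made for illustration, and the reader is obtained through the JAXP SAXParserFactory instead of XMLReaderFactory.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler (not from the slides): counts how often each element occurs.
class ElementCounter extends DefaultHandler {
    private final Map<String, Integer> counts = new LinkedHashMap<String, Integer>();

    // Callback invoked by the parser for every opening tag it encounters.
    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        Integer old = counts.get(qName);
        counts.put(qName, old == null ? 1 : old + 1);
    }

    Map<String, Integer> getCounts() {
        return counts;
    }

    // Parses an XML string and returns the element counts in document order.
    static Map<String, Integer> count(String xml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            ElementCounter handler = new ElementCounter();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.getCounts();
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }

    public static void main(String[] args) {
        // prints {report=1, item=2}
        System.out.println(count("<report><item>a</item><item>b</item></report>"));
    }
}
```

The handler never sees the document as a whole; it only reacts to the callbacks, which is why state such as the counter map has to be kept in the handler itself.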

Different SAX implementations

    // Xerces implementation
    XMLReader xercesReader =
        new org.apache.xerces.parsers.SAXParser();

    // JAXP implementation
    SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
    XMLReader jaxpReader = parser.getXMLReader();

    // Piccolo implementation
    XMLReader piccoloReader = new com.bluecast.xml.Piccolo();

Other SAX features

SAX provides a number of interfaces for correct data handling. Some of them process not only the content of the document, but also its structure. Interfaces such as:

    ErrorHandler
    EntityResolver
    DTDHandler

also analyze the structure of the document for possible errors, entity links or elements describing other elements.
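As a rough illustration of two of these interfaces, the sketch below registers an EntityResolver and an ErrorHandler on an XMLReader and logs what each of them sees. The class name HandlerDemo, the sample documents and the example.invalid DTD address are made up for this example. The resolver answers the request for the external DTD with an empty stream, so nothing is fetched over the network.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

// Hypothetical demo class, not part of the presentation.
class HandlerDemo {
    // Parses the XML string and returns a log of what the two handlers saw.
    static List<String> parse(String xml) {
        final List<String> log = new ArrayList<String>();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();

            // Called when the parser needs an external entity such as a DTD.
            reader.setEntityResolver(new EntityResolver() {
                public InputSource resolveEntity(String publicId, String systemId) {
                    log.add("resolve " + systemId);
                    return new InputSource(new StringReader("")); // empty DTD
                }
            });

            // Called for warnings, recoverable errors and fatal errors.
            reader.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { log.add("warning"); }
                public void error(SAXParseException e) { log.add("error"); }
                public void fatalError(SAXParseException e) throws SAXParseException {
                    log.add("fatal");
                    throw e; // a document that is not well-formed cannot be recovered
                }
            });

            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXParseException e) {
            log.add("stopped at line " + e.getLineNumber());
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(parse("<!DOCTYPE a SYSTEM \"http://example.invalid/a.dtd\"><a/>"));
        System.out.println(parse("<a><b></a>"));
    }
}
```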

Advanced SAX features I

The SAX API is considered a very flexible solution, mainly because it can be configured through properties and features.

    void setProperty(String propertyID, Object value);
    void setFeature(String featureID, boolean state);

Properties and features modify the parser's behaviour while it processes a document. For example, a feature can switch on validation of the document against the DTD or schema related to it, in addition to the well-formedness check that the parser always performs.
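To make the effect of a feature concrete, here is a small sketch that toggles the standard SAX validation feature (http://xml.org/sax/features/validation) and collects the validity errors reported to an ErrorHandler. The class ValidationFeatureDemo and the sample document with its internal DTD are assumptions for illustration only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the presentation.
class ValidationFeatureDemo {
    static final String VALIDATION = "http://xml.org/sax/features/validation";

    // Well-formed, but <drink/> is not allowed by the internal DTD.
    static final String SAMPLE =
            "<!DOCTYPE menu [<!ELEMENT menu (food)><!ELEMENT food (#PCDATA)>]>"
            + "<menu><drink/></menu>";

    // Parses the document and returns the validity errors that were reported.
    static List<String> parse(String xml, boolean validate) {
        final List<String> errors = new ArrayList<String>();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setFeature(VALIDATION, validate);
            // DefaultHandler implements ErrorHandler; only error() is overridden here.
            reader.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    errors.add(e.getMessage());
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println("validation off: " + parse(SAMPLE, false).size() + " error(s)");
        System.out.println("validation on:  " + parse(SAMPLE, true).size() + " error(s)");
    }
}
```

With the feature off the document parses silently because it is well-formed; with the feature on, the same document produces validity errors.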

Advanced SAX features II

Among many other interesting SAX features, one is very important and radically extends SAX capabilities. The XMLFilter interface allows creating a cascade of parsers, each performing a different processing operation. Because every stage works on the same stream of events, the document is parsed only once, however many operations are chained.

Figure: Cascade processing using XMLFilter interface
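A cascade of this kind can be sketched with XMLFilterImpl, the helper class from org.xml.sax.helpers that implements XMLFilter. The two filters below (UpperCaseFilter and SkipElementFilter) and the demo class are hypothetical names invented for this example: the first stage upper-cases element names, the second hides the tags of one element, and the final handler only sees the result.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Stage 1 (hypothetical): upper-cases element names.
class UpperCaseFilter extends XMLFilterImpl {
    UpperCaseFilter(XMLReader parent) { super(parent); }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        super.startElement(uri, local.toUpperCase(Locale.ROOT),
                qName.toUpperCase(Locale.ROOT), atts);
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        super.endElement(uri, local.toUpperCase(Locale.ROOT), qName.toUpperCase(Locale.ROOT));
    }
}

// Stage 2 (hypothetical): hides the start and end tags of one element name.
class SkipElementFilter extends XMLFilterImpl {
    private final String skipped;

    SkipElementFilter(XMLReader parent, String skipped) {
        super(parent);
        this.skipped = skipped;
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        if (!skipped.equals(qName)) super.startElement(uri, local, qName, atts);
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        if (!skipped.equals(qName)) super.endElement(uri, local, qName);
    }
}

class FilterCascadeDemo {
    // Runs parser -> UpperCaseFilter -> SkipElementFilter -> handler in one pass.
    static List<String> elementNames(String xml) {
        final List<String> names = new ArrayList<String>();
        try {
            XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            XMLReader cascade = new SkipElementFilter(new UpperCaseFilter(parser), "NOTE");
            cascade.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String local, String qName, Attributes a) {
                    names.add(qName);
                }
            });
            cascade.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        // prints [MENU, FOOD]
        System.out.println(elementNames("<menu><note/><food/></menu>"));
    }
}
```

Each filter is itself an XMLReader, so the application talks only to the last stage; the stages could be reordered or extended without touching the handler.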


What SAX cannot do... I

Q: Why do we need other mechanisms, if SAX is so good?
A: SAX has some serious limitations due to its sequential data access.


What SAX cannot do... II

SAX parses data from beginning to end and does not allow going back. It also has some other drawbacks:

    it is unable to modify the content or structure of the document
    it cannot access specific or random elements
    it cannot access sibling elements
    it is not serializable

So it may seem that SAX is useless. THAT'S NOT TRUE! (See the comparison section.) Every issue mentioned above can be resolved by the SAX complement...

DOM as a processing model

The Document Object Model is based on a completely different idea. It does not parse the document and react to specific events (though it is able to); instead, it builds a tree based on the document's structure and stores it in memory as an object. Because of this, every node in the tree is always available and can be accessed later, many times. Moreover, the structure stored in memory can easily be transformed in many ways.

DOM architecture I

DOM, in contrast to SAX, is a standard developed by the W3C¹. Due to standardization it has a strict architecture divided into levels, each containing required and optional modules. To claim support for a level, an application must implement all the requirements of the claimed level and of the levels below it. There are 3 levels; the newest (DOM Level 3) was developed in 2004 and is the current release of the DOM specification. Every level has its Core, which is the root element for the other modules (figure).

¹ A reference to the standard can be found on the W3C sites.

Figure: Document Object Model architecture (Adapted from original W3C specification)

In Java, DOM has a different structure than SAX. Almost every class representing the Document Object Model implements an interface derived from the org.w3c.dom.Node interface. Such a framework allows very simple data manipulation and traversal between the nodes contained in the tree structure. It is essential to understand how elements are stored in the tree (figure). For example, if we want to read text data from element A, we should get its child node containing the text, rather than try to extract a value from element A itself.
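The point about text being stored in a child node can be checked with a few lines of code. The sketch below is only an illustration: the class DomTextDemo, its root helper and the sample element A are assumptions, not part of the slides.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the presentation.
class DomTextDemo {
    // Parses an XML string and returns its root element.
    static Element root(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element a = root("<A>some text</A>");

        // The element node itself carries no value.
        System.out.println(a.getNodeValue());                      // prints null

        // The text lives in a separate child node of type TEXT_NODE.
        Node text = a.getFirstChild();
        System.out.println(text.getNodeType() == Node.TEXT_NODE);  // prints true
        System.out.println(text.getNodeValue());                   // prints some text
    }
}
```

DOM Level 3 also offers getTextContent(), which concatenates the text of all descendants and so hides this detail when only the text is needed.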

Figure: org.w3c.dom.* package class diagram. From [1]

Basic class structure using Java implementation

    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;

    String docURI = "http://example.org/nutrition.xml";
    // get new DocumentBuilderFactory
    DocumentBuilderFactory docBuilderFactory =
        DocumentBuilderFactory.newInstance();
    // get new DocumentBuilder
    DocumentBuilder docBuilder =
        docBuilderFactory.newDocumentBuilder();
    // initialize document with null
    Document doc = null;
    // parse document
    doc = docBuilder.parse(docURI);
    // extract root element and
    // normalize whole tree (optional)
    doc.getDocumentElement().normalize();

Accessing elements

    // get "food" elements
    NodeList elements = doc.getElementsByTagName("food");
    for (int i = 0; i < elements.getLength(); i++) {
        Node food = elements.item(i);
        // get "Avocado Dip": the name is text inside the element;
        // getNodeName() would always return "food" here
        if (food.getTextContent().contains("Avocado Dip")) {
            NodeList l = food.getChildNodes();
            for (int j = 0; j < l.getLength(); j++) {
                // print out calories
                if (l.item(j).getNodeName().equals("calories")) {
                    System.out.println(l.item(j).getTextContent());
                }
            }
        }
    }

Modifying elements

    ...
    if (l.item(j).getNodeName().equals("calories")) {
        // text content is a String, so it has to be parsed, not cast
        int cal = Integer.parseInt(l.item(j).getTextContent().trim());
        // if food avocado dip has more than 300 cal.
        if (cal > 300) {
            Node avocadoDip = l.item(j).getParentNode();
            // replace it with low fat food
            Element newFood = doc.createElement("LowFatFood");
            // replaceChild is called on the parent of the replaced node
            avocadoDip.getParentNode().replaceChild(newFood, avocadoDip);
        }
    }
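For readers who want to run the modification end to end, here is a self-contained sketch under stated assumptions: the class DomReplaceDemo, the limit parameter and the layout of the sample document (food elements with name and calories children holding plain text) are invented for illustration and may differ from the real nutrition.xml.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; element names mirror the slide's example.
class DomReplaceDemo {
    // Replaces every <food> whose <calories> exceed the limit with <LowFatFood/>.
    static Document replaceHighCalorie(String xml, int limit) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // getElementsByTagName returns a live list, so collect the matches
            // first and change the tree afterwards.
            NodeList foods = doc.getElementsByTagName("food");
            List<Element> tooRich = new ArrayList<Element>();
            for (int i = 0; i < foods.getLength(); i++) {
                Element food = (Element) foods.item(i);
                NodeList cal = food.getElementsByTagName("calories");
                if (cal.getLength() > 0
                        && Integer.parseInt(cal.item(0).getTextContent().trim()) > limit) {
                    tooRich.add(food);
                }
            }
            for (Element food : tooRich) {
                // replaceChild is called on the parent of the node being replaced.
                food.getParentNode().replaceChild(doc.createElement("LowFatFood"), food);
            }
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<menu>"
                + "<food><name>Avocado Dip</name><calories>310</calories></food>"
                + "<food><name>Bagel</name><calories>200</calories></food>"
                + "</menu>";
        Document doc = replaceHighCalorie(xml, 300);
        System.out.println(doc.getElementsByTagName("food").getLength());       // prints 1
        System.out.println(doc.getElementsByTagName("LowFatFood").getLength()); // prints 1
    }
}
```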

Different DOM implementations

    // Xerces DOM implementation
    org.apache.xerces.parsers.DOMParser p =
        new org.apache.xerces.parsers.DOMParser();
    p.parse(new InputSource(xmlURI));
    org.w3c.dom.Document doc = p.getDocument();

    // JDOM DOM implementation
    org.jdom.input.DOMBuilder builder = new org.jdom.input.DOMBuilder();
    org.jdom.Document d = builder.build(new FileInputStream(xmlURI));
    // it's org.jdom.Document not org.w3c.dom.Document!

    // dom4j DOM implementation
    org.dom4j.io.SAXReader reader = new org.dom4j.io.SAXReader();
    org.dom4j.Document document = reader.read(xmlURI);
    // it's org.dom4j.Document not org.w3c.dom.Document!

Advanced DOM features I

DOM provides many advanced functionalities through modules specified in the standard (mainly Level 3 modules). Some of them:

    the MutationEvents module provides methods for listening to changes
    the LS and LS-Async modules provide methods for various kinds of serialization
    the Validation module provides methods for real-time validation

Advanced DOM features II

When using a particular API, it is important to check which modules are implemented and in which version. To do this, we can use:

    boolean hasFeature(String feature, String version);
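As a sketch of how such a check might be combined with one of the Level 3 modules, the example below asks the DOMImplementation whether the Load and Save ("LS", version "3.0") module is available and, if so, uses it to serialize a document. The class DomFeatureDemo and its helper methods are assumptions for illustration; which features report true depends on the DOM implementation in use.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

// Hypothetical demo class, not part of the presentation.
class DomFeatureDemo {
    // Builds a one-element document in memory.
    static Document newDocument(String rootName, String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement(rootName);
            root.setTextContent(text);
            doc.appendChild(root);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Serializes the document with the DOM Level 3 Load and Save module.
    static String serialize(Document doc) {
        DOMImplementation impl = doc.getImplementation();
        // Check the module before relying on it.
        if (!impl.hasFeature("LS", "3.0")) {
            throw new IllegalStateException("DOM Load and Save 3.0 is not supported");
        }
        DOMImplementationLS ls = (DOMImplementationLS) impl.getFeature("LS", "3.0");
        LSSerializer serializer = ls.createLSSerializer();
        return serializer.writeToString(doc);
    }

    public static void main(String[] args) {
        Document doc = newDocument("greeting", "hello");
        System.out.println("Core 2.0: " + doc.getImplementation().hasFeature("Core", "2.0"));
        System.out.println(serialize(doc));
    }
}
```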

Streaming API for XML - different approach

The third approach to processing XML data is based on the idea of treating incoming information about events as a stream. The Streaming API for XML uses a technique called pull parsing, which provides sequential access to the document by adapting the iterator design pattern. The association with java.util.Iterator is not accidental, because part of the API implements this interface.

StAX architecture

StAX in Java divides into two (theoretically) separate APIs:

    cursor API, represented by the XMLStreamReader and XMLStreamWriter interfaces. Regarded as the fastest and most efficient solution.
    event API, represented by the XMLEventReader and XMLEventWriter interfaces. Regarded as a simple and flexible solution.

Both are specified in JSR 173 and contained in javax.xml.stream.*

Difference from the SAX event-driven architecture

The common view that the StAX API is similar to SAX is wrong. The SAX architecture provides a number of interfaces to handle incoming events, which the parser pushes to the application. The StAX event API provides methods for iterating through the event stream, so the application pulls each event and handles it as needed. Moreover, StAX is a symmetric read/write API, which also allows modifying and storing elements.

Basic class structure

    /* Creating readers... */

    // creating input factory
    // ("if" is a reserved word, so the factory needs another name)
    String xmlURI = "http://example.org/nutrition.xml";
    XMLInputFactory factory = XMLInputFactory.newInstance();

    // cursor API reader; the document is read from the URI,
    // a StringReader would parse the URI text itself
    XMLStreamReader cur =
        factory.createXMLStreamReader(new URL(xmlURI).openStream());
    // event API reader (each reader needs its own stream)
    XMLEventReader event =
        factory.createXMLEventReader(new URL(xmlURI).openStream());

Identifying events I

The main issue when using StAX is how to identify the event which has just occurred. There are many ways to do that; the simplest is to check the constant connected with the event (cursor API). The constants are declared in the XMLStreamConstants interface². For example:

    1 - START_ELEMENT
    2 - END_ELEMENT
    3 - PROCESSING_INSTRUCTION

And so on...

² https://java.sun.com/webservices/docs/1.5/api/javax/xml/stream/XMLStreamConstants.html

Accessing elements by iterator I (cursor API)

    int startElem = XMLStreamConstants.START_ELEMENT;
    // while there is next event
    while (cur.hasNext()) {
        // catch event type
        int eventType = cur.next();
        System.out.println(eventType);
        // if event type is START_ELEMENT print the element's name
        // (getElementText() works only for elements containing
        // nothing but text, so it would fail on the root element)
        if (eventType == startElem) {
            System.out.println(cur.getLocalName());
        }
    }
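A runnable variant of the cursor loop, working on an inline document instead of a URL, might look as follows. The class CursorDemo, its describe helper and the sample XML are assumptions made for this sketch; coalescing is switched on so that adjacent text is reported as a single CHARACTERS event.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class, not part of the presentation.
class CursorDemo {
    // Walks the document with the cursor API and describes each relevant event.
    static List<String> describe(String xml) {
        List<String> out = new ArrayList<String>();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
            XMLStreamReader cur = factory.createXMLStreamReader(new StringReader(xml));
            while (cur.hasNext()) {
                int eventType = cur.next(); // pull the next event
                if (eventType == XMLStreamConstants.START_ELEMENT) {
                    out.add("start " + cur.getLocalName());
                } else if (eventType == XMLStreamConstants.CHARACTERS && !cur.isWhiteSpace()) {
                    out.add("text " + cur.getText());
                } else if (eventType == XMLStreamConstants.END_ELEMENT) {
                    out.add("end " + cur.getLocalName());
                }
            }
            cur.close();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [start menu, start food, text Avocado Dip, end food, end menu]
        System.out.println(describe("<menu><food>Avocado Dip</food></menu>"));
    }
}
```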

Identifying events II

In the event API, identifying events is a bit different. XMLEventReader provides the methods:

    XMLEvent nextEvent();
    boolean hasNext();

So, to identify the caught event, we must analyse the XMLEvent object returned by the first method. Once again there are a few ways to do that. We can call the method that returns the event type:

    int getEventType();

Or we can test whether the event is of a certain type with one of the "is" methods, for example isStartElement().

Accessing elements by iterator II (event API)

    // while there is next event
    while (event.hasNext()) {
        XMLEvent e = event.nextEvent();
        // identify event by casting!
        if (e instanceof StartElement) {
            // cast event to specific element
            StartElement se = (StartElement) e;
            QName name = se.getName();
            // print element name
            System.out.println(name.getLocalPart());
        }
    }
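The same loop can also be written with the "is" and "as" methods of XMLEvent instead of instanceof and a cast. The following sketch is an illustration only: the class EventDemo, the sample document and its name attribute are assumptions that do not come from the slides.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

// Hypothetical demo class, not part of the presentation.
class EventDemo {
    // Collects the "name" attribute of every <food> element.
    static List<String> foodNames(String xml) {
        List<String> names = new ArrayList<String>();
        try {
            XMLEventReader events = XMLInputFactory.newInstance()
                    .createXMLEventReader(new StringReader(xml));
            while (events.hasNext()) {
                XMLEvent e = events.nextEvent();
                // isStartElement() identifies the event, asStartElement() converts it.
                if (e.isStartElement()) {
                    StartElement se = e.asStartElement();
                    if ("food".equals(se.getName().getLocalPart())) {
                        Attribute name = se.getAttributeByName(new QName("name"));
                        if (name != null) {
                            names.add(name.getValue());
                        }
                    }
                }
            }
            events.close();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        // prints [Avocado Dip, Bagel]
        System.out.println(foodNames(
                "<menu><food name=\"Avocado Dip\"/><food name=\"Bagel\"/></menu>"));
    }
}
```

Unlike the cursor API, each event here is a self-contained object, so it can be stored in a collection or passed to other code after the reader has moved on.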

Advanced iteration methods

Both StAX APIs provide more advanced iteration methods.

    // XMLEventReader returns an XMLEvent; XMLStreamReader returns the int event type
    XMLEvent nextTag();
    // only in XMLEventReader
    XMLEvent peek();
    // only in XMLStreamReader
    void require(int type, String nsURI, String localN);

The first method advances the cursor, skipping whitespace, comments and processing instructions, until it reaches the start or end of an element. The second returns the next event without moving the cursor. The third checks that the current event has the expected type, namespace and local name, and throws an XMLStreamException if it does not. All of these methods are well documented and should be reviewed by the reader.
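A short sketch of the three methods in use follows. The class name AdvancedIterationDemo, both helper methods and the sample document are assumptions made for illustration. Passing null as the namespace argument of require() means the namespace is not checked.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.events.XMLEvent;

class AdvancedIterationDemo {
    // Hypothetical sample document with whitespace between the tags.
    static final String XML = "<nutrition>\n  <food/>\n</nutrition>";

    /** Cursor API: jump to the root, verify it, then jump to its first child. */
    static String firstChildName(String xml) {
        try {
            XMLStreamReader cur =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            cur.nextTag();   // moves from START_DOCUMENT to the root start tag
            // throws XMLStreamException unless we are on <nutrition>
            cur.require(XMLStreamConstants.START_ELEMENT, null, "nutrition");
            cur.nextTag();   // skips the whitespace before <food/>
            String name = cur.getLocalName();
            cur.close();
            return name;
        } catch (XMLStreamException e) {
            // For brevity, parse errors are rethrown unchecked in this sketch.
            throw new IllegalStateException(e);
        }
    }

    /** Event API: peek() shows the next event without consuming it. */
    static boolean peekMatchesNext(String xml) {
        try {
            XMLEventReader reader =
                XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            XMLEvent peeked = reader.peek();
            XMLEvent next = reader.nextEvent();
            reader.close();
            return peeked.getEventType() == next.getEventType();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstChildName(XML));
        System.out.println(peekMatchesNext(XML));
    }
}
```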

EventFilters and StreamFilters I

The StAX API allows us to create filtered readers. There is no need to write complex stream handlers to process specific events. All that has to be done is to implement one (or both) of two interfaces, each containing a single method.

Interfaces:

    EventFilter
    StreamFilter

Methods:

    public boolean accept(XMLEvent event)           // EventFilter
    public boolean accept(XMLStreamReader reader)   // StreamFilter

EventFilters and StreamFilters II

Implementing a filter is simple:

    public class CharFilter implements EventFilter
    {
        public boolean accept(XMLEvent event)
        {
            return (event.getEventType() ==
                    XMLStreamConstants.CHARACTERS);
        }
    }

The filter above accepts only CHARACTERS events.
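To put such a filter to work, it is passed to XMLInputFactory.createFilteredReader() together with an ordinary reader. The sketch below assumes a wrapper class FilterDemo, a helper textOnly and a sample document, none of which appear in the slides. It nests a CharFilter like the one above so that the example is self-contained.

```java
import java.io.StringReader;
import javax.xml.stream.EventFilter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

class FilterDemo {
    // Hypothetical sample document, not taken from the slides.
    static final String XML = "<food><name>Chocolate ice cream</name></food>";

    /** Accepts only character events, like the CharFilter on the slide. */
    static class CharFilter implements EventFilter {
        public boolean accept(XMLEvent event) {
            return event.isCharacters();
        }
    }

    /** Concatenates all character data, letting the filter hide every other event. */
    static String textOnly(String xml) {
        StringBuilder text = new StringBuilder();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            XMLEventReader raw = factory.createXMLEventReader(new StringReader(xml));
            XMLEventReader filtered = factory.createFilteredReader(raw, new CharFilter());
            while (filtered.hasNext()) {
                // Safe conversion: the filter only lets CHARACTERS events through.
                text.append(filtered.nextEvent().asCharacters().getData());
            }
            filtered.close();
        } catch (XMLStreamException e) {
            // For brevity, parse errors are rethrown unchecked in this sketch.
            throw new IllegalStateException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(textOnly(XML));
    }
}
```

The reading loop itself needs no type checks, which is the main benefit of moving the selection logic into a filter.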

Writing elements I

StAX, as a symmetric API that handles both input and output, is also able to write XML data. It provides two interfaces for that:

    XMLEventWriter (extends XMLEventConsumer)
    XMLStreamWriter

The basic difference between them is that XMLEventWriter offers less functionality: it accepts ready-made XMLEvent objects through add(), while XMLStreamWriter has a dedicated write method for each kind of markup.

Writing elements II

    // using XMLStreamWriter
    OutputStream console = System.out;
    XMLOutputFactory of = XMLOutputFactory.newInstance();
    XMLStreamWriter sw = of.createXMLStreamWriter(console);
    sw.writeStartDocument("1.0");
    // create a document with one meal
    sw.writeStartElement("nutrition");
    sw.writeStartElement("food");
    sw.writeStartElement("name");
    sw.writeCharacters("Chocolate ice cream");
    sw.writeEndElement();
    sw.writeEndElement();
    sw.writeEndElement();
    sw.writeEndDocument();
    // push any buffered output to the console
    sw.flush();

Writing elements III

    // the same using XMLEventWriter
    OutputStream console = System.out;
    XMLEventFactory xef = XMLEventFactory.newInstance();
    XMLOutputFactory of = XMLOutputFactory.newInstance();
    XMLEventWriter ew = of.createXMLEventWriter(console);
    ew.add(xef.createStartDocument("UTF-8", "1.0"));
    // arguments are: prefix, namespace URI, local name
    ew.add(xef.createStartElement("", "", "nutrition"));
    ew.add(xef.createStartElement("", "", "food"));
    ew.add(xef.createStartElement("", "", "name"));
    ew.add(xef.createCharacters("Chocolate ice cream"));
    // end elements must be named explicitly
    ew.add(xef.createEndElement("", "", "name"));
    ew.add(xef.createEndElement("", "", "food"));
    ew.add(xef.createEndElement("", "", "nutrition"));
    ew.add(xef.createEndDocument());
    // push any buffered output to the console
    ew.flush();
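Because XMLEventWriter works on event objects, it can also take its input straight from an XMLEventReader through add(XMLEventReader). The sketch below copies a document this way. The class name CopyDemo, the helper copy and the sample document are illustrative assumptions, and the exact form of the XML declaration in the output depends on the StAX implementation in use.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;

class CopyDemo {
    // Hypothetical sample document, not taken from the slides.
    static final String XML = "<food><name>Chocolate ice cream</name></food>";

    /** Pipes every event of the input document into an event writer. */
    static String copy(String xml) {
        StringWriter out = new StringWriter();
        try {
            XMLEventReader reader =
                XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            XMLEventWriter writer =
                XMLOutputFactory.newInstance().createXMLEventWriter(out);
            writer.add(reader);   // consumes the reader and writes all of its events
            writer.flush();
            writer.close();
            reader.close();
        } catch (XMLStreamException e) {
            // For brevity, stream errors are rethrown unchecked in this sketch.
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(copy(XML));
    }
}
```

Combined with a filtered reader, the same pattern gives a simple way to copy a document while dropping selected events.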

XmlPull

XmlPull is the ancestor of StAX. Although StAX has become a popular standard for parsing XML data, XmlPull has not been retired. Thanks to its small size (the JAR file is only 9 kB), XmlPull is well suited to devices with limited memory and is often used in mobile application development.

http://www.xmlpull.org/

Comparing capabilities I

Developing an application that processes XML data always involves choosing a parser. Selecting the proper API is essential to the success of the project, but the choice is not an easy one. Before making a decision, ask yourself a few questions:

- What needs to be done (using the parser)?
- Is the application platform-dependent? If so, what is the platform?
- Is it a distributed system?

Comparing capabilities II

Benchmarks I
Figures from http://piccolo.sourceforge.net/bench.html

Benchmarks II
Figures from http://piccolo.sourceforge.net/bench.html

Benchmarks III
Figures from http://www.xml.com/lpt/a/1702

Benchmarks IV
Figure from http://www.ximpleware.com/benchmark1.html

CASE STUDY: Parsing Really Simple Syndication documents

RSS definition

RSS is a family of Web feed formats used to publish frequently updated content. An RSS document (which is called a "feed", "web feed" or "channel") contains either a summary of content from an associated web site or the full text, stored as XML. RSS makes it possible for people to keep up with web sites in an automated manner that can be piped into applications or filtered displays.

Source: http://en.wikipedia.org/wiki/RSS

The initials "RSS" are used to refer to the following formats:

- Really Simple Syndication (RSS 2.0)
- RDF Site Summary (RSS 1.0 and RSS 0.90)
- Rich Site Summary (RSS 0.91)

When creating a solution for reading or writing RSS documents, we must remember that RSS is not a standard and has no XML Schema document (or DTD) describing its structure! The only reference can be found at: http
