JAXP SAX Parser Example


In this article we will see an example of SAX parsing. SAX (Simple API for XML) is an event-driven, serial-access mechanism that does element-by-element processing.

Because it reads the document in chunks instead of loading it whole, it uses little memory, which is why it is the preferred parser for large XML documents.

Java ships with full SAX support, so we don’t need to add any external jar. The Java API for XML Processing (JAXP) also lets you plug in another implementation of the SAX API to override the default one.
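To see which implementation JAXP has picked, a minimal sketch like the one below can help. The class name FactoryLookup and the placeholder com.example.MyFactory are assumptions made for illustration; SAXParserFactory.newInstance() and the javax.xml.parsers.SAXParserFactory system property are part of JAXP.

```java
import javax.xml.parsers.SAXParserFactory;

final class FactoryLookup {
    // Returns the class name of the SAXParserFactory implementation JAXP selected.
    static String implementationName() {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        return factory.getClass().getName();
    }

    public static void main(String[] args) {
        // To plug in another implementation, put it on the classpath and name its
        // factory class in the javax.xml.parsers.SAXParserFactory system property:
        //   java -Djavax.xml.parsers.SAXParserFactory=com.example.MyFactory FactoryLookup
        // (com.example.MyFactory is a placeholder, not a real class.)
        System.out.println("SAX factory in use: " + implementationName());
    }
}
```

Run without the system property, this prints whatever default factory the JDK bundles.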

SAX Flow

The SAX API reads the XML as an I/O stream, and the framework delivers SAX events as it reads the data. Once a batch of data has been read, the parser cannot go back to an earlier position or leap ahead to a different one.
Whenever the SAX parser comes across an event, it calls a callback method in the application. For example, the SAX parser calls one method in your application when an element tag is encountered and calls a different method when text is found.

Here is the SAX parser flow:

[Figure: SAX Flow]

Difference between SAX and DOM

SAX is an event-based parser: it reads data in batches and calls the application's callback methods as it encounters certain events. For example, it calls startDocument() when the parser is at the beginning of the document. It calls startElement() when it encounters an opening tag. It then calls characters() to pass in the character data inside the element, and endElement() when it encounters the end of the element. This repeats for every element it reads. Finally, the parser calls endDocument().
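The callback order described above can be observed with a small tracing handler. This is only a sketch: the class EventTrace and its trace() helper are names invented here, and the XML is parsed from an in-memory string to keep the example self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class EventTrace {
    // Parses the XML string and returns the callbacks in the order the parser fired them.
    static List<String> trace(String xml) {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startDocument() {
                events.add("startDocument");
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                events.add("startElement:" + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Text can arrive in several chunks, so consecutive calls are logged once.
                if (!"characters".equals(events.get(events.size() - 1))) {
                    events.add("characters");
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("endElement:" + qName);
            }

            @Override
            public void endDocument() {
                events.add("endDocument");
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return events;
    }

    public static void main(String[] args) {
        trace("<name>Joe</name>").forEach(System.out::println);
    }
}
```

For the input <name>Joe</name> the trace is startDocument, startElement:name, characters, endElement:name, endDocument.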

Since SAX is a stream-based parser, it requires much less memory than DOM: SAX does not construct an in-memory representation of the XML data, as DOM does.
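To make the contrast concrete, here is a rough sketch that counts elements both ways. The class SaxVsDomCount and its two helpers are hypothetical names; the SAX version keeps only a counter, while the DOM version first builds the whole Document tree and then queries it.

```java
import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;

import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class SaxVsDomCount {
    // SAX: nothing is retained except the running count.
    static int countWithSax(String xml, String tag) {
        int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                if (qName.equals(tag)) {
                    count[0]++;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return count[0];
    }

    // DOM: the full tree is built in memory before it can be queried.
    static int countWithDom(String xml, String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName(tag).getLength();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<employees><employee/><employee/></employees>";
        System.out.println("SAX count: " + countWithSax(xml, "employee"));
        System.out.println("DOM count: " + countWithDom(xml, "employee"));
    }
}
```

Both calls return the same count; the difference is in what each approach has to hold in memory to get there.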

SAX Parser Factory

In this example, we have left out namespaces for simplicity. To parse the XML, we need to follow these steps:

  1. Create SAX Parser Factory
    SAXParserFactory saxParserFactory = SAXParserFactory.newInstance();
    
  2. Create SAX Parser
    SAXParser saxParser = saxParserFactory.newSAXParser();
    
  3. We need a content handler which converts the XML into POJOs.
    Extend DefaultHandler and implement the event handlers.
    The handler will most probably end up implementing the callbacks startElement(), characters() and endElement().
  4. Once the parser and the content handler are created, all we have to do is parse the XML.
    saxParser.parse(SAXParserExample.class.getResourceAsStream("emp.xml"), new EmpXmlHandler());
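The steps above skip namespaces. If your XML does use them, one possible variation is to call setNamespaceAware(true) on the factory before creating the parser, after which startElement() receives the namespace URI and local name. The sketch below assumes a made-up namespace http://example.com/emp and a hypothetical class NamespaceAwareDemo.

```java
import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class NamespaceAwareDemo {
    // Returns {uri, localName, qName} as reported for the root element.
    static String[] describeRoot(String xml, boolean namespaceAware) {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);
        String[] root = new String[3];
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                if (root[2] == null) { // record the first (root) element only
                    root[0] = uri;
                    root[1] = localName;
                    root[2] = qName;
                }
            }
        };
        try {
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return root;
    }

    public static void main(String[] args) {
        // The namespace URI below is invented for this example.
        String xml = "<e:employees xmlns:e=\"http://example.com/emp\"/>";
        for (boolean aware : new boolean[] {false, true}) {
            String[] r = describeRoot(xml, aware);
            System.out.println("namespaceAware=" + aware
                    + " uri=" + r[0] + " localName=" + r[1] + " qName=" + r[2]);
        }
    }
}
```

Without namespace awareness, the parser may report an empty uri and localName, which is why the handler in this article compares against qName.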

Let’s start with the example.

Here is the XML we want to parse.

emp.xml:

<?xml version="1.0"?>
<employees>
	<employee id="1">
		<name>Joe</name>
		<age>34</age>
	</employee>
	<employee id="2">
		<name>Sam</name>
		<age>24</age>
	</employee>
	<employee id="3">
		<name>John</name>
		<age>44</age>
	</employee>
</employees>

Here is the POJO representation of the XML.

Employee:

package com.javarticles.sax;

public class Employee {
    private String name;
    private Integer age;
    private Integer id;
    
    Employee(){}
    
    Employee(Integer id, String name, Integer age) {
        this.id = id;
        this.name = name;
        this.age = age;
    }

    public String getName() {
        return name;
    }
    public void setName(String name) {
        this.name = name;
    }
    public Integer getAge() {
        return age;
    }
    public void setAge(Integer age) {
        this.age = age;
    }
    public Integer getId() {
        return id;
    }
    public void setId(Integer id) {
        this.id = id;
    }
    public String toString() {
        return "Employee(id:"+ id + ", name:" + name + ", age:" + age + ")";
    }
}

In the custom handler below, we implement startDocument(), startElement(), endElement(), characters() and endDocument().

  1. startDocument() – We initialize the employee list.
  2. startElement() – When an employee element starts, we create an Employee bean and push it onto the stack. We also parse the id attribute and set the employee’s id.
  3. endElement() – Here we set the values of the name and age elements. empStack.peek() returns the current employee bean. Once the employee end element is encountered, we pop the constructed bean off the stack and add it to the parsed employee list.
  4. characters() – We accumulate the element’s text value.
  5. endDocument() – Once the end of the document is encountered, we print the employee beans.
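The reason characters() appends to a buffer instead of assigning the value directly is that a parser is allowed to deliver one run of text in several calls. A small sketch, using a hypothetical TextAccumulator class and an in-memory document, counts the calls and shows that the accumulated text is complete either way. How many calls you see depends on the parser implementation.

```java
import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class TextAccumulator {
    // Collects all character data and counts how often characters() was called.
    static final class Collector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();
        int calls;

        @Override
        public void characters(char[] ch, int start, int length) {
            calls++;
            text.append(ch, start, length); // append, never overwrite
        }
    }

    static Collector parse(String xml) {
        Collector collector = new Collector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), collector);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return collector;
    }

    public static void main(String[] args) {
        // An entity reference is a common place for a parser to split the text.
        Collector c = parse("<title>Tom &amp; Jerry</title>");
        System.out.println("text: " + c.text);
        System.out.println("characters() calls: " + c.calls);
    }
}
```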

EmpXmlHandler:

package com.javarticles.sax;

import java.util.ArrayList;
import java.util.List;
import java.util.Stack;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class EmpXmlHandler extends DefaultHandler {
    // Employees parsed so far, in document order.
    private List<Employee> empList;
    // Text accumulated for the element currently being read.
    private StringBuilder currentValue;
    // Buffers of the enclosing elements, restored when a child element ends.
    private Stack<StringBuilder> values = new Stack<>();
    // Employee beans under construction.
    private Stack<Employee> empStack = new Stack<>();

    @Override
    public void startDocument() throws SAXException {
        empList = new ArrayList<>();
        System.out.println("Document parsing started");
    }

    @Override
    public void endDocument() throws SAXException {
        System.out.println("end document");
        System.out.println("no. of employees: " + empList.size());
        empList.forEach(System.out::print);
    }

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes attributes) throws SAXException {
        // Save the parent's buffer and start a fresh one for this element.
        values.push(currentValue);
        currentValue = new StringBuilder();
        if (qName.equals("employee")) {
            Employee emp = new Employee();
            emp.setId(Integer.parseInt(attributes.getValue("id")));
            empStack.push(emp);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        if (qName.equals("employee")) {
            empList.add(empStack.pop());
        } else if (qName.equals("name")) {
            empStack.peek().setName(currentValue.toString());
        } else if (qName.equals("age")) {
            empStack.peek().setAge(Integer.parseInt(currentValue.toString().trim()));
        }
        // Restore the parent's buffer now that this element is finished.
        currentValue = values.pop();
    }

    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        currentValue.append(ch, start, length);
    }

}

SAXParserExample:

package com.javarticles.sax;

import java.io.IOException;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.SAXException;


public class SAXParserExample {
    public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException {
        SAXParserFactory saxParserFactory = SAXParserFactory.newInstance();
        SAXParser saxParser = saxParserFactory.newSAXParser();
        saxParser.parse(SAXParserExample.class.getResourceAsStream("emp.xml"), new EmpXmlHandler());
    }
}

Output:

Document parsing started
end document
no. of employees: 3
Employee(id:1, name:Joe, age:34)Employee(id:2, name:Sam, age:24)Employee(id:3, name:John, age:44)

Download the source code

This was an example about SAX Parser.

You can download the source code here: saxParserExample.zip

About Author

Ram's expertise lies in test-driven development and refactoring. He is passionate about open source technologies and loves blogging on various Java and open-source technologies like Spring. You can reach him at rsatish.m@gmail.com
