Apache Camel Splitter using Tokenizer Expression

The Camel Splitter lets you split a payload into a number of fragments so that each one can be processed individually.
See the article on Apache Camel Splitter for an example.
If you split an XML file with XPath, the entire content is loaded into memory. In this article, we will see how to split XML with the Camel tokenizer instead, which can iterate the payload in a streamed fashion.

Split XML using Camel Tokenizer Expression

You can split the XML file using the tokenizer expression and tag name as the token.
The payload that we want to split is an XML document with <articles> as the root element containing one or more <article> elements.

articles.xml:

<?xml version="1.0" encoding="UTF-8"?>
<articles>
  <article category="camel" title="Camel Splitter Example">
    <tags>camel,eia</tags>
    <authors>
    <author>John</author>
    </authors>
  </article>
  <article category="camel" title="Camel JMS Component">
    <tags>camel,jms</tags>
    <authors>
      <author>John</author>
      <author>Joe</author>
    </authors>
  </article>
  <article category="spring" title="Spring JdbcTemplate">
   <tags>spring,jdbc</tags>
    <authors>
      <author>Ram</author>
      <author>Dean</author>
    </authors>
  </article>
</articles>

We will use the tokenizeXML() method to split the file by the tag name of a child node. In the first example we split by <author>; later examples split by <article>.

The CamelSplitSize header holds the total number of exchanges that the payload was split into.

from("direct:author")
    .log("Find all the authors")
    .split().tokenizeXML("author")
        .log("${body} split size: ${header.CamelSplitSize}")
    .end();

CamelSplitXmlTokenizeExample:

package com.javarticles.camel;

import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.camel.CamelContext;
import org.apache.camel.ProducerTemplate;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.impl.DefaultCamelContext;

public class CamelSplitXmlTokenizeExample {
    public static final void main(String[] args) throws Exception {
        CamelContext camelContext = new DefaultCamelContext();
        try {
            camelContext.addRoutes(new RouteBuilder() {
                public void configure() {
                    from("direct:author")
                    .log("Find all the authors")
                    .split().tokenizeXML("author")
                    .log("${body} split size: ${header.CamelSplitSize}")                    
                    .end();      
                }
            });
            ProducerTemplate template = camelContext.createProducerTemplate();
            camelContext.start();
            String filename = "target/classes/articles.xml";
            InputStream articleStream = new FileInputStream(filename);
            template.sendBody("direct:author", articleStream);
        } finally {
            camelContext.stop();
        }
    }
}

Output:

15:01| INFO | MarkerIgnoringBase.java 95 | Find all the authors
15:01| INFO | MarkerIgnoringBase.java 95 | <author>John</author> split size: 5
15:01| INFO | MarkerIgnoringBase.java 95 | <author>John</author> split size: 5
15:01| INFO | MarkerIgnoringBase.java 95 | <author>Joe</author> split size: 5
15:01| INFO | MarkerIgnoringBase.java 95 | <author>Ram</author> split size: 5
15:01| INFO | MarkerIgnoringBase.java 95 | <author>Dean</author> split size: 5
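
To make the idea of splitting by tag name concrete, here is a minimal plain-Java sketch that pulls fragments out of a string with a regular expression. It is an illustration only and should not be read as Camel's implementation. The class name TokenSplitSketch and the method splitByTag are hypothetical, and the sketch deliberately ignores self-closing tags and nested elements with the same name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Conceptual sketch only; hypothetical names, not part of Camel.
final class TokenSplitSketch {

    // Returns every <tagName ...>...</tagName> fragment in document order.
    static List<String> splitByTag(String xml, String tagName) {
        String tag = Pattern.quote(tagName);
        // Opening tag with optional attributes, shortest possible content, closing tag.
        // The "(\\s[^>]*)?>" part keeps <author> from matching <authors>.
        Pattern pattern = Pattern.compile(
                "<" + tag + "(\\s[^>]*)?>.*?</" + tag + ">", Pattern.DOTALL);
        List<String> parts = new ArrayList<>();
        Matcher matcher = pattern.matcher(xml);
        while (matcher.find()) {
            parts.add(matcher.group());
        }
        return parts;
    }

    public static void main(String[] args) {
        String xml = "<authors><author>John</author><author>Joe</author></authors>";
        for (String part : splitByTag(xml, "author")) {
            System.out.println(part);
        }
    }
}
```

Because the match is purely textual, no XML parser is involved and the fragments come back exactly as they appear in the input, which is consistent with the log output above.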

Split XML in Streaming Mode

If your payload is big, you may want to split it in streaming mode, which means the input message is split in chunks instead of all at once. This reduces the memory overhead.
Unlike the XPath engine, which loads the entire XML content into memory, the tokenizeXML() expression iterates the XML payload in a streamed fashion. To enable streaming, call streaming() in your DSL.

CamelSplitXmlTokenizeStreamingExample:

package com.javarticles.camel;

import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.camel.CamelContext;
import org.apache.camel.ProducerTemplate;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.impl.DefaultCamelContext;

public class CamelSplitXmlTokenizeStreamingExample {
    public static final void main(String[] args) throws Exception {
        CamelContext camelContext = new DefaultCamelContext();
        try {
            camelContext.addRoutes(new RouteBuilder() {
                public void configure() {
                    from("direct:authorStreaming")
                    .log("Find all the authors using tokenizer/streaming")
                    .split().tokenizeXML("author").streaming()
                    .log("${body} split size: ${header.CamelSplitSize}")                    
                    .end();    
                }
            });
            ProducerTemplate template = camelContext.createProducerTemplate();
            camelContext.start();
            String filename = "target/classes/articles.xml";
            InputStream articleStream = new FileInputStream(filename);
            template.sendBody("direct:authorStreaming", articleStream);
        } finally {
            camelContext.stop();
        }
    }
}

With stream-based splitting, the CamelSplitSize header is set only on the last (completed) exchange, so it is empty for all earlier fragments.

Output:

15:00| INFO | MarkerIgnoringBase.java 95 | Find all the authors using tokenizer/streaming
15:00| INFO | MarkerIgnoringBase.java 95 | <author>John</author> split size: 
15:00| INFO | MarkerIgnoringBase.java 95 | <author>John</author> split size: 
15:00| INFO | MarkerIgnoringBase.java 95 | <author>Joe</author> split size: 
15:00| INFO | MarkerIgnoringBase.java 95 | <author>Ram</author> split size: 
15:00| INFO | MarkerIgnoringBase.java 95 | <author>Dean</author> split size: 5
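
One way to picture why the total is only available at the end is a lazy iterator that reads the input as it goes. The sketch below is plain Java built on java.util.Scanner and is an assumption-laden illustration, not Camel's code. The class StreamingTokenSketch and the method totalIfKnown are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Scanner;
import java.util.regex.Pattern;

// Conceptual sketch only; hypothetical names, not part of Camel.
final class StreamingTokenSketch implements Iterator<String> {
    private final Scanner scanner;
    private final Pattern pattern;
    private String next;   // look-ahead fragment, null once the input is exhausted
    private int emitted;   // number of fragments handed out so far

    StreamingTokenSketch(InputStream in, String tagName) {
        String tag = Pattern.quote(tagName);
        this.pattern = Pattern.compile(
                "<" + tag + "(\\s[^>]*)?>.*?</" + tag + ">", Pattern.DOTALL);
        this.scanner = new Scanner(in, "UTF-8");
        // Horizon 0 means "search without bound"; the scanner reads more input as needed.
        this.next = scanner.findWithinHorizon(pattern, 0);
    }

    @Override
    public boolean hasNext() {
        return next != null;
    }

    @Override
    public String next() {
        if (next == null) {
            throw new NoSuchElementException();
        }
        String current = next;
        emitted++;
        next = scanner.findWithinHorizon(pattern, 0);
        return current;
    }

    // The total is only known after the last fragment has been handed out.
    Integer totalIfKnown() {
        return next == null ? emitted : null;
    }

    public static void main(String[] args) {
        byte[] xml = "<authors><author>John</author><author>Joe</author></authors>"
                .getBytes(StandardCharsets.UTF_8);
        StreamingTokenSketch it = new StreamingTokenSketch(new ByteArrayInputStream(xml), "author");
        while (it.hasNext()) {
            String part = it.next();
            System.out.println(part + " total: " + it.totalIfKnown());
        }
    }
}
```

In this sketch, totalIfKnown() returns null for every fragment except the last, which mirrors the empty split size in the streaming output above.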

Split XML in Groups

If you want to control how many XML chunks go into each split message, use the group option, which groups N parts together. In this example, we split the XML in groups of two, so you will see two <article> elements in the first group and the remaining one in the second.

from("direct:article")
    .log("Find all the articles in group of 2")
    .split().tokenizeXML("article", 2)
        .log("${body}")
    .end();

CamelSplitXmlTokenizeByGroupExample:

package com.javarticles.camel;

import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.camel.CamelContext;
import org.apache.camel.ProducerTemplate;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.impl.DefaultCamelContext;

public class CamelSplitXmlTokenizeByGroupExample {
    public static final void main(String[] args) throws Exception {
        CamelContext camelContext = new DefaultCamelContext();
        try {
            camelContext.addRoutes(new RouteBuilder() {
                public void configure() {
                    from("direct:article")
                    .log("Find all the articles in group of 2")
                    .split().tokenizeXML("article", 2)
                    .log("${body}")                    
                    .end();                                        
                }
            });
            ProducerTemplate template = camelContext.createProducerTemplate();
            camelContext.start();
            String filename = "target/classes/articles.xml";
            InputStream articleStream = new FileInputStream(filename);
            template.sendBody("direct:article", articleStream);
        } finally {
            camelContext.stop();
        }
    }
}

Output:

14:58| INFO | MarkerIgnoringBase.java 95 | Find all the articles in group of 2
14:58| INFO | MarkerIgnoringBase.java 95 | <article category="camel" title="Camel Splitter Example">
    <tags>camel,eia</tags>
    <authors>
    <author>John</author>
    </authors>
  </article><article category="camel" title="Camel JMS Component">
    <tags>camel,jms</tags>
    <authors>
      <author>John</author>
      <author>Joe</author>
    </authors>
  </article>
14:58| INFO | MarkerIgnoringBase.java 95 | <article category="spring" title="Spring JdbcTemplate">
   <tags>spring,jdbc</tags>
    <authors>
      <author>Ram</author>
      <author>Dean</author>
    </authors>
  </article>
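
The grouping behaviour seen in this output can be sketched in plain Java as concatenating every N fragments into one message body, with a shorter final group when the count does not divide evenly. This is a simplified illustration under that assumption, not Camel's implementation. GroupSketch and group are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Conceptual sketch only; hypothetical names, not part of Camel.
final class GroupSketch {

    // Concatenates consecutive fragments into groups of at most `size` fragments.
    static List<String> group(List<String> parts, int size) {
        if (size < 1) {
            throw new IllegalArgumentException("size must be >= 1");
        }
        List<String> groups = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int inCurrent = 0;
        for (String part : parts) {
            current.append(part);
            inCurrent++;
            if (inCurrent == size) {
                groups.add(current.toString());
                current.setLength(0);
                inCurrent = 0;
            }
        }
        // A trailing, incomplete group is still emitted.
        if (inCurrent > 0) {
            groups.add(current.toString());
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> parts = Arrays.asList(
                "<article>1</article>", "<article>2</article>", "<article>3</article>");
        for (String body : group(parts, 2)) {
            System.out.println(body);
        }
    }
}
```

With three fragments and a group size of two, the sketch yields one body holding two fragments and a second body holding the remaining one, the same shape as the two log entries above.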

Split XML Using Tokenize for a Namespace

Suppose your root or parent tag belongs to a namespace. In our example, <articles> belongs to the namespace https://www.javarticles.com/schema/articles, and the child elements inherit that namespace from the root/parent tag.

articlesNs.xml:

<?xml version="1.0" encoding="UTF-8"?>
<c:articles xmlns:c="https://www.javarticles.com/schema/articles">
  <c:article category="camel" title="Camel Splitter Example">
    <c:tags>camel,eia</c:tags>
    <c:authors>
    <c:author>John</c:author>
    </c:authors>
  </c:article>
  <c:article category="camel" title="Camel JMS Component">
    <c:tags>camel,jms</c:tags>
    <c:authors>
      <c:author>John</c:author>
      <c:author>Joe</c:author>
    </c:authors>
  </c:article>
  <c:article category="spring" title="Spring JdbcTemplate">
   <c:tags>spring,jdbc</c:tags>
    <c:authors>
      <c:author>Ram</c:author>
      <c:author>Dean</c:author>
    </c:authors>
  </c:article>
</c:articles>

You can make the token element inherit the parent’s namespace declarations by passing the name of the root/parent tag as the second argument:

from("direct:article")
    .log("Split by article Element")
    .split().tokenizeXML("c:article", "c:articles")
        .log("${body}")
    .end();

CamelSplitXmlTokenizeNamespaceExample:

package com.javarticles.camel;

import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.camel.CamelContext;
import org.apache.camel.ProducerTemplate;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.impl.DefaultCamelContext;

public class CamelSplitXmlTokenizeNamespaceExample {
    public static final void main(String[] args) throws Exception {
        CamelContext camelContext = new DefaultCamelContext();
        try {
            camelContext.addRoutes(new RouteBuilder() {
                public void configure() {
                    from("direct:article")
                    .log("Split by article Element")
                    .split().tokenizeXML("c:article", "c:articles")
                        .log("${body}")                    
                    .end();                    
                }
            });
            ProducerTemplate template = camelContext.createProducerTemplate();
            camelContext.start();
            String filename = "target/classes/articlesNs.xml";
            InputStream articleStream = new FileInputStream(filename);
            template.sendBody("direct:article", articleStream);
        } finally {
            camelContext.stop();
        }
    }
}

Output:

15:08| INFO | MarkerIgnoringBase.java 95 | Split by article Element
15:08| INFO | MarkerIgnoringBase.java 95 | <c:article category="camel" title="Camel Splitter Example" xmlns:c="https://www.javarticles.com/schema/articles">
    <c:tags>camel,eia</c:tags>
    <c:authors>
    <c:author>John</c:author>
    </c:authors>
  </c:article>
15:08| INFO | MarkerIgnoringBase.java 95 | <c:article category="camel" title="Camel JMS Component" xmlns:c="https://www.javarticles.com/schema/articles">
    <c:tags>camel,jms</c:tags>
    <c:authors>
      <c:author>John</c:author>
      <c:author>Joe</c:author>
    </c:authors>
  </c:article>
15:08| INFO | MarkerIgnoringBase.java 95 | <c:article category="spring" title="Spring JdbcTemplate" xmlns:c="https://www.javarticles.com/schema/articles">
   <c:tags>spring,jdbc</c:tags>
    <c:authors>
      <c:author>Ram</c:author>
      <c:author>Dean</c:author>
    </c:authors>
  </c:article>
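
In the output above, each fragment carries the xmlns:c declaration that originally sat on the parent tag. A rough plain-Java sketch of that idea follows: copy the xmlns declarations from the parent's opening tag into the fragment's opening tag. It is a simplified illustration and not Camel's implementation. NamespaceInheritSketch, namespacesOf and inherit are hypothetical names, and the sketch assumes double-quoted attribute values that contain no '>' character.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Conceptual sketch only; hypothetical names, not part of Camel.
final class NamespaceInheritSketch {
    // Matches xmlns="..." and xmlns:prefix="..." including the leading whitespace.
    private static final Pattern XMLNS = Pattern.compile("\\sxmlns(:\\w+)?=\"[^\"]*\"");

    // Collects the namespace declarations found on the opening tag of parentTag.
    static String namespacesOf(String xml, String parentTag) {
        Matcher open = Pattern
                .compile("<" + Pattern.quote(parentTag) + "(\\s[^>]*)?>")
                .matcher(xml);
        if (!open.find()) {
            return "";
        }
        StringBuilder namespaces = new StringBuilder();
        Matcher declaration = XMLNS.matcher(open.group());
        while (declaration.find()) {
            namespaces.append(declaration.group());
        }
        return namespaces.toString();
    }

    // Inserts the declarations just before the '>' that ends the fragment's opening tag.
    static String inherit(String fragment, String namespaces) {
        int end = fragment.indexOf('>');
        if (end < 0 || namespaces.isEmpty()) {
            return fragment;
        }
        return fragment.substring(0, end) + namespaces + fragment.substring(end);
    }

    public static void main(String[] args) {
        String xml = "<c:articles xmlns:c=\"https://www.javarticles.com/schema/articles\">"
                + "<c:article title=\"Camel Splitter Example\">text</c:article>"
                + "</c:articles>";
        String namespaces = namespacesOf(xml, "c:articles");
        String fragment = "<c:article title=\"Camel Splitter Example\">text</c:article>";
        System.out.println(inherit(fragment, namespaces));
    }
}
```

Without the inherited declaration, a fragment such as <c:article> would use the prefix c without declaring it, so a namespace-aware parser downstream would reject it.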

Split XML with the Tokenizer in Spring DSL

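You can use the <tokenize> element in the Spring DSL to split bodies. The token is provided in the token attribute. In the configuration below, xml="true" selects XML tokenizing, inheritNamespaceTagName names the parent tag whose namespaces are inherited, and group sets the group size.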

applicationContext.xml:

<?xml version="1.0" encoding="UTF-8"?>

<beans xmlns="http://www.springframework.org/schema/beans"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="
        http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
        http://camel.apache.org/schema/spring http://camel.apache.org/schema/spring/camel-spring.xsd
       ">
	<camelContext xmlns="http://camel.apache.org/schema/spring"
		xmlns:c="https://www.javarticles.com/schema/articles">
		<route>
			<from uri="direct:article" />
			<log message="Split by article Element" />
			<split streaming="true">
				<tokenize token="c:article" inheritNamespaceTagName="c:articles" xml="true" group="2"/>
				<log message="${body}" />
			</split>
		</route>
	</camelContext>
</beans>

CamelSplitXmlTokenizeUsingSpringDSL:

package com.javarticles.camel;

import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.camel.CamelContext;
import org.apache.camel.ProducerTemplate;
import org.apache.camel.spring.SpringCamelContext;
import org.springframework.context.ApplicationContext;
import org.springframework.context.support.ClassPathXmlApplicationContext;

public class CamelSplitXmlTokenizeUsingSpringDSL {
    public static final void main(String[] args) throws Exception {
        ApplicationContext appContext = new ClassPathXmlApplicationContext(
                "applicationContext.xml");
        CamelContext camelContext = SpringCamelContext.springCamelContext(
                appContext, false);
        try {
            // Start the context once, then create the template used to send the payload.
            camelContext.start();
            ProducerTemplate template = camelContext.createProducerTemplate();
            String filename = "target/classes/articlesNs.xml";
            InputStream articleStream = new FileInputStream(filename);
            template.sendBody("direct:article", articleStream);
        } finally {
            camelContext.stop();
        }
    }
}

Output:

18:08| INFO | MarkerIgnoringBase.java 95 | Split by article Element
18:08| INFO | MarkerIgnoringBase.java 95 | <c:article category="camel" title="Camel Splitter Example" xmlns:c="https://www.javarticles.com/schema/articles">
    <c:tags>camel,eia</c:tags>
    <c:authors>
    <c:author>John</c:author>
    </c:authors>
  </c:article><c:article category="camel" title="Camel JMS Component" xmlns:c="https://www.javarticles.com/schema/articles">
    <c:tags>camel,jms</c:tags>
    <c:authors>
      <c:author>John</c:author>
      <c:author>Joe</c:author>
    </c:authors>
  </c:article>
18:08| INFO | MarkerIgnoringBase.java 95 | <c:article category="spring" title="Spring JdbcTemplate" xmlns:c="https://www.javarticles.com/schema/articles">
   <c:tags>spring,jdbc</c:tags>
    <c:authors>
      <c:author>Ram</c:author>
      <c:author>Dean</c:author>
    </c:authors>
  </c:article>

Download the source code

This was an example of using the Apache Camel tokenizer to split XML.

You can download the source code here: camelSplitXmlTokenizerExample.zip