Saturday, 22 March 2008

Compression and Base64 Encoding in .NET 2.0 and Java platforms.

Introduction

Distributed enterprise applications typically span heterogeneous platforms, with different layers developed using different technologies and frameworks. The co-existence of these layers, and the communication between them, adds to the complexity of the application being developed. Such applications suffer from poor performance when large amounts of data are transferred across layers. This article discusses message compression, which can be used in .NET 2.0 and Java based applications to reduce the size of the messages transferred and thereby improve overall performance and response time.

Distributed scenario

In this article we will consider a scenario where the client application is a .NET 2.0 smart client which forms the presentation layer. The business layer is developed in Java and contains the business logic required by the presentation layer. The Business Service layer is responsible for responding to queries from the .NET client application. It receives requests from the client in the form of Xml messages, performs the necessary processing (which may involve interaction with the database) and sends the results back to the client, again in the form of Xml messages.

The pictorial representation of this scenario is shown below:

Picture-1

Figure - 1

The key factor here is to define the contract between the client and the Business Service so that both parties understand what the other is saying. This can be achieved with the help of an Xml schema in which we define a strict contract for the request and response messages.

Request and Response Interface

We will define a simple request and response interface schema that will be used by the .NET client and the Java Business Service to construct the Xml data. Both the request and the response interface define CompressedBase64Clob, which is of the xs:base64Binary data type. This clob structure holds compressed, Base64 encoded data that can be transferred across layers.

The schema representation is shown below:

Picture-2

Figure - 2

The code snippets for compression and decompression described later in this article do not cover the intricacies of populating and reading Xml files. The Xml interface is defined only to give an overall picture of a practical scenario in which message compression can be used.

Compression and Encoding

Both .NET and Java include compression packages in their standard libraries which can be used in client and server applications. This article describes compression and decompression of data using the GZip implementation. .NET implements GZip in the System.IO.Compression.GZipStream class and the Java counterpart is present in the java.util.zip package. A description of how the GZip algorithm works, or of any particular implementation, is outside the scope of this article.
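To get a feel for what GZip does to a repetitive Xml payload, the sizes before and after compression can simply be measured. The sketch below is only an illustration: the class name GzipSizeSketch and the sample document are hypothetical, and it assumes a Java 7 or later runtime (for try-with-resources), which is newer than the platform versions this article targets.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

final class GzipSizeSketch {

    // Compresses a byte array with GZip entirely in memory.
    static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(sink)) {
            gzip.write(raw);
        }
        // The stream is closed at this point, so the output is complete.
        return sink.toByteArray();
    }

    // Builds a hypothetical, highly repetitive Xml document (illustration only).
    static String sampleXml(int rows) {
        StringBuilder xml = new StringBuilder("<Rows>");
        for (int i = 0; i < rows; i++) {
            xml.append("<Row><Name>sample</Name></Row>");
        }
        return xml.append("</Rows>").toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = sampleXml(500).getBytes(StandardCharsets.UTF_8);
        System.out.println("raw bytes:     " + raw.length);
        System.out.println("gzipped bytes: " + gzip(raw).length);
    }
}
```

Real messages are rarely this repetitive, so the ratio seen here is an upper bound rather than a prediction; measuring with representative data is the only reliable guide.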

Encoding is the process of transforming information from one format into another. The opposite operation is called decoding [source: Wikipedia]. In this article we will apply Base64 encoding before the data is transferred from the client to the Business Service and vice versa.

The term "Base64" refers to a specific MIME content transfer encoding. It is also used as a generic term for any similar encoding scheme that encodes binary data by treating it numerically and translating it into a base 64 representation. The particular choice of base is due to the history of character set encoding: one can choose 64 characters that are both part of the subset common to most encodings, and also printable. This combination leaves the data unlikely to be modified in transit through systems, such as email, which were traditionally not 8-bit clean. MIME Base64 uses A–Z, a–z, and 0–9 for the first 62 digits [source: Wikipedia].

Both .NET and Java offer a way of converting to and from the Base64 representation, as described in the following sections.
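As a small illustration of the encoding itself, the sketch below encodes and decodes a byte array and computes the size of the encoded text. It assumes Java 8 or later, where java.util.Base64 is part of the standard library; that class is not the Base64 class used in the Java snippets of this article, and the class name Base64Sketch is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class Base64Sketch {

    // Encodes raw bytes using the alphabet A-Z, a-z, 0-9, '+', '/' with '=' padding.
    static String encode(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    // Decodes Base64 text back to the original bytes.
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    // Every group of 3 input bytes (the last group may be partial) becomes 4 characters.
    static int encodedLength(int byteCount) {
        return 4 * ((byteCount + 2) / 3);
    }

    public static void main(String[] args) {
        byte[] raw = "Man".getBytes(StandardCharsets.US_ASCII);
        System.out.println(encode(raw));          // prints "TWFu"
        System.out.println(encodedLength(1024));  // prints 1368
    }
}
```

Because Base64 text is about one third larger than the bytes it represents, compressing before encoding only pays off when GZip saves more than the encoding adds.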

.NET 2.0 way of Compression and Decompression with Base64 encoding

The System.IO.Compression namespace defines the GZipStream class, which provides methods and properties to compress and decompress streams. In this article a utility class, NCompressor, is defined which exposes methods to compress and decompress data in Base64 encoded format.

Class Diagram

The class diagram for NCompressor is given below:

Picture-3

Figure - 3

NCompressor is a static class which exposes methods CompressToBase64String() and DecompressFromBase64String() for compressing and decompressing data.

CompressToBase64String

The code snippet shown below details compressing a string and then encoding the compressed byte array in Base64 format.

using System;
using System.IO;
using System.IO.Compression;
using System.Text;

namespace Compression
{
    public static class NCompressor
    {
        private const int MAX_BUFFER_SIZE = 1024;

        private static byte[] Compress(byte[] byteArray)
        {
            using (MemoryStream inputMemoryStream = new MemoryStream(byteArray))
            using (MemoryStream outputMemoryStream = new MemoryStream())
            {
                using (GZipStream outputGZipStream = new GZipStream(outputMemoryStream, CompressionMode.Compress))
                {
                    byte[] buffer = new byte[MAX_BUFFER_SIZE];
                    int bytesRead;

                    // Copy the input to the GZipStream in chunks of MAX_BUFFER_SIZE bytes.
                    while ((bytesRead = inputMemoryStream.Read(buffer, 0, MAX_BUFFER_SIZE)) > 0)
                    {
                        outputGZipStream.Write(buffer, 0, bytesRead);
                    }
                }

                // The GZipStream must be closed before the compressed bytes are read.
                // MemoryStream.ToArray() still works after the stream has been closed.
                return outputMemoryStream.ToArray();
            }
        }

        public static string CompressToBase64String(string data)
        {
            return Convert.ToBase64String(Compress(Encoding.UTF8.GetBytes(data)));
        }
    }
}

Code snippet - 1

CompressToBase64String() is a public method which accepts the data in string format. It calls the private Compress() method to actually compress the bytes. The string is converted to a byte array using the Encoding class with whichever encoding the application uses. In this instance I have used UTF8 encoding, since Xml data uses UTF8 encoding by default.

Once we have the byte representation of the string, MemoryStreams are used to read the source data from memory and to write the compressed data to memory. This can easily be changed to write to any other stream; memory streams are used here for illustrative purposes. An instance of the GZipStream class is created over the output memory stream with CompressionMode.Compress as its mode, which indicates that the data written to it must be compressed. The byte array is read from the input memory stream in chunks of 1024 bytes and written to the GZipStream, which stores the compressed data in the output memory stream.

Once the source byte array has been read and written completely, the compressed buffer can be retrieved from the output memory stream, but only after the GZipStream has been closed. Any attempt to retrieve the compressed byte array before the GZipStream is closed yields incomplete data which cannot be decompressed later. This behaviour is by design: according to the Microsoft BCL team, the required footer information is written and the compression is finalised only when the GZipStream is closed. Finally, the compressed buffer is converted to its Base64 representation using Convert.ToBase64String().
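The same rule about closing the compression stream applies on the Java side, where GZIPOutputStream writes its trailer only when it is finished or closed. The sketch below makes this visible by comparing how many bytes have reached the underlying stream before and after the close. The class and method names are hypothetical, and the code only illustrates the point; it is not part of NCompressor or JCompressor.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

final class GzipFinalizeSketch {

    // Returns { bytes visible before close, bytes visible after close }.
    static int[] sizesBeforeAndAfterClose(byte[] input) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        GZIPOutputStream gzip = new GZIPOutputStream(sink);
        gzip.write(input);

        // Not finished yet: the trailer (CRC-32 and length) has not been written,
        // and some or all of the compressed data may still be buffered.
        int before = sink.size();

        // close() finishes the compressed stream and writes the trailer.
        gzip.close();
        int after = sink.size();

        return new int[] { before, after };
    }

    public static void main(String[] args) throws IOException {
        byte[] input = "<Request>sample</Request>".getBytes(StandardCharsets.UTF_8);
        int[] sizes = sizesBeforeAndAfterClose(input);
        System.out.println("before close: " + sizes[0] + " bytes");
        System.out.println("after close:  " + sizes[1] + " bytes");
    }
}
```

A buffer captured at the "before" point is truncated, which is why the compressed bytes should only be read once the compression stream has been closed.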

DecompressFromBase64String

This process is the reverse of compression: the compressed Base64 string representation is passed in and the original string data is returned. The code snippet for decompression is shown below:

using System;
using System.IO;
using System.IO.Compression;
using System.Text;

namespace Compression
{
    public static class NCompressor
    {
        private const int MAX_BUFFER_SIZE = 1024;

        private static byte[] Decompress(byte[] byteArray)
        {
            using (MemoryStream inputMemoryStream = new MemoryStream(byteArray))
            using (MemoryStream outputMemoryStream = new MemoryStream())
            {
                using (GZipStream inputGZipStream = new GZipStream(inputMemoryStream, CompressionMode.Decompress))
                {
                    byte[] buffer = new byte[MAX_BUFFER_SIZE];
                    int bytesRead;

                    // Read decompressed data from the GZipStream in chunks of MAX_BUFFER_SIZE bytes.
                    while ((bytesRead = inputGZipStream.Read(buffer, 0, MAX_BUFFER_SIZE)) > 0)
                    {
                        outputMemoryStream.Write(buffer, 0, bytesRead);
                    }
                }

                return outputMemoryStream.ToArray();
            }
        }

        public static string DecompressFromBase64String(string base64String)
        {
            return Encoding.UTF8.GetString(Decompress(Convert.FromBase64String(base64String)));
        }
    }
}

Code snippet - 2

DecompressFromBase64String() is a public method which accepts the compressed and Base64 encoded string. It uses Convert.FromBase64String() to convert the Base64 representation to a byte array and passes that byte array to the private Decompress() method.

Just as compression makes use of MemoryStreams and a GZipStream, the decompression method uses memory streams and a GZipStream in the same way, except that it tells the GZipStream to decompress the data instead of compressing it.
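Before decompressing, a receiver may want a cheap sanity check that the incoming text really is Base64 encoded GZip data. Every GZip stream starts with the two bytes 0x1f 0x8b, so the sketch below decodes the text and tests for that signature. It is an optional addition rather than part of the utility classes in this article; the class name is hypothetical and java.util.Base64 assumes Java 8 or later.

```java
import java.util.Base64;

final class GzipPayloadCheck {

    // Returns true when the text is valid Base64 and the decoded bytes
    // start with the GZip signature 0x1f 0x8b.
    static boolean looksLikeGzip(String base64Payload) {
        byte[] bytes;
        try {
            bytes = Base64.getDecoder().decode(base64Payload);
        } catch (IllegalArgumentException notBase64) {
            return false;
        }
        return bytes.length >= 2
                && (bytes[0] & 0xff) == 0x1f
                && (bytes[1] & 0xff) == 0x8b;
    }

    public static void main(String[] args) {
        // A bare 10-byte GZip header, Base64 encoded (header only, not a full stream).
        System.out.println(looksLikeGzip("H4sIAAAAAAAAAA==")); // prints "true"
        // "TWFu" is the Base64 form of the plain text "Man".
        System.out.println(looksLikeGzip("TWFu"));             // prints "false"
    }
}
```

The check only inspects the signature, so a payload that passes can still be truncated or corrupt; decompression remains the real test.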

Java way of Compression and Decompression with Base64 encoding

The java.util.zip package defines the GZIPOutputStream and GZIPInputStream classes, which expose methods to compress and decompress streams respectively. In this article a utility class, JCompressor, is defined which exposes methods to compress and decompress data in Base64 encoded format.

Class Diagram

The class diagram for JCompressor is given below:

Picture-4

Figure - 4

This example uses the Base64 class which is defined in the java.util.prefs package. This class is not visible outside the package, but its source code is available and can be included in applications. However, make sure that no copyright is violated before copying the source code. Any similar Base64 implementation should serve the purpose equally well.

compressAndBase64Encode

The code snippet shown below details compressing a string and then encoding the compressed byte array in Base64 format.

package Compression;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

public final class JCompressor {

    private static final int MAX_BUFFER_SIZE = 1024;

    private JCompressor() {
        // Prevent instantiation
    }

    public static String compressAndBase64Encode(String data) throws IOException {
        // Use UTF-8 explicitly so that the bytes match the encoding used on the .NET side;
        // getBytes() without an argument uses the platform default charset.
        byte[] unCompressedBytes = data.getBytes("UTF-8");
        byte[] compressedBytes = compress(unCompressedBytes);

        // Base64 is the class copied from java.util.prefs (see above);
        // any equivalent implementation can be substituted.
        return Base64.byteArrayToBase64(compressedBytes);
    }

    private static byte[] compress(byte[] unCompressedBytes) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        GZIPOutputStream out = new GZIPOutputStream(bos);
        ByteArrayInputStream in = new ByteArrayInputStream(unCompressedBytes);

        // Copy the input to the GZIPOutputStream in chunks of MAX_BUFFER_SIZE bytes.
        byte[] buf = new byte[MAX_BUFFER_SIZE];
        int len;
        while ((len = in.read(buf)) > 0) {
            out.write(buf, 0, len);
        }
        in.close();

        // Finish and close the GZIP stream before reading the compressed bytes.
        out.finish();
        out.close();

        return bos.toByteArray();
    }
}

Code snippet - 3

compressAndBase64Encode() is a public method which accepts the data in string format. It calls the private compress() method to actually compress the bytes. The string is converted to a byte array using the getBytes() method of the String class. The encoding used here must match the one used on the .NET side (UTF8 in this article); getBytes() without an argument uses the platform default charset, so passing "UTF-8" explicitly is the safer choice.

This piece of code looks very similar to the compression code written in .NET, except that different stream classes are used for reading the source data and writing the compressed data. The combination of ByteArrayOutputStream and GZIPOutputStream is used to compress the data. Once we have the compressed buffer, the Base64.byteArrayToBase64() method is used to convert the compressed byte array into its Base64 string representation.

base64DecodeAndUncompress

The code snippet shown below details decoding a compressed Base64 string, decompressing the resulting byte array and converting the decompressed bytes back to a string.

package Compression;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;

public final class JCompressor {

    private static final int MAX_BUFFER_SIZE = 1024;

    private JCompressor() {
        // Prevent instantiation
    }

    public static String base64DecodeAndUncompress(String base64EncodedCompressedString) throws IOException {
        // Base64 is the class copied from java.util.prefs (see above);
        // any equivalent implementation can be substituted.
        byte[] compressedBytes = Base64.base64ToByteArray(base64EncodedCompressedString);
        byte[] unCompressedBytes = unCompress(compressedBytes);

        // Decode with UTF-8 explicitly to match the encoding used on the .NET side.
        return new String(unCompressedBytes, "UTF-8");
    }

    private static byte[] unCompress(byte[] compressedBytes) throws IOException {
        ByteArrayInputStream instream = new ByteArrayInputStream(compressedBytes);
        GZIPInputStream in = new GZIPInputStream(instream);
        ByteArrayOutputStream out = new ByteArrayOutputStream();

        // Read decompressed data from the GZIPInputStream in chunks of MAX_BUFFER_SIZE bytes.
        byte[] buf = new byte[MAX_BUFFER_SIZE];
        int len;
        while ((len = in.read(buf)) > 0) {
            out.write(buf, 0, len);
        }
        in.close();
        out.close();

        // Return the decompressed bytes; the caller converts them to a string.
        return out.toByteArray();
    }
}

Code snippet - 4

base64DecodeAndUncompress() is a public method which accepts the compressed Base64 string as its parameter. The Base64 string is converted to a byte array using the Base64.base64ToByteArray() method, and the private unCompress() method is then called to actually decompress the bytes.

This piece of code looks very similar to the decompression code written in .NET, except that different stream classes are used for reading the compressed data and writing the decompressed data. The combination of ByteArrayInputStream and GZIPInputStream is used to decompress the data. Once we have the decompressed buffer, a new string is constructed from it and returned to the caller. The bytes must be decoded with the same encoding that was used when the data was compressed (UTF8 in this article).
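The two directions can be exercised together with a quick round trip. The sketch below is a compact variant of the approach rather than the JCompressor class itself: it assumes Java 8 or later so that the standard java.util.Base64 class and try-with-resources can be used, and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class RoundTripSketch {

    // UTF-8 bytes -> GZip -> Base64 text.
    static String compressToBase64(String text) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(sink)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        }
        // The GZIP stream is closed here, so the compressed data is complete.
        return Base64.getEncoder().encodeToString(sink.toByteArray());
    }

    // Base64 text -> GZip bytes -> UTF-8 string.
    static String decompressFromBase64(String payload) throws IOException {
        byte[] compressed = Base64.getDecoder().decode(payload);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPInputStream gunzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[1024];
            int n;
            while ((n = gunzip.read(buffer)) != -1) {
                sink.write(buffer, 0, n);
            }
        }
        return new String(sink.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical message containing a non-ASCII character to exercise the charset handling.
        String xml = "<Response><Item>caf\u00e9</Item></Response>";
        String payload = compressToBase64(xml);
        System.out.println(decompressFromBase64(payload).equals(xml)); // prints "true"
    }
}
```

Including a non-ASCII character in such a test is worthwhile, because a charset mismatch between the two platforms goes unnoticed with plain ASCII data.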

Conclusion

The compression and decompression solution can be implemented in distributed scenarios to avoid transferring large amounts of data across the network and clogging most of the bandwidth. Messaging frameworks such as MSMQ or JMS, if involved in transferring the messages, also suffer from performance problems when large amounts of data are sent across the network. By compressing the required data we can achieve performance improvements and reduce bandwidth usage considerably.

The approach described in this article uses memory streams for compressing and decompressing the data. If the request and response Xml messages are very large, server side applications may suffer from memory related issues, and hence the memory stream approach may not be the optimal one. An alternate approach which involves storing or flushing the data to disk at appropriate intervals may need to be considered. The environment, data size and any constraints must be thoroughly analysed before implementing this solution.
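One way to avoid holding whole messages in memory is to chain the streams, so that data is compressed and encoded as it is copied from a source stream to a target stream (for example files or sockets). The sketch below shows this idea under stated assumptions: it relies on the wrap() methods of java.util.Base64, which require Java 8 or later, and the class name, method names and buffer size are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class StreamingCompressorSketch {

    private static final int BUFFER_SIZE = 8192;

    // Reads plain bytes from 'source' and writes GZip compressed, Base64 encoded text to 'target'.
    // Closing the chain also closes 'target'.
    static void compressAndEncode(InputStream source, OutputStream target) throws IOException {
        try (OutputStream gzip = new GZIPOutputStream(Base64.getEncoder().wrap(target))) {
            copy(source, gzip);
        }
    }

    // Reads Base64 encoded GZip data from 'source' and writes the original bytes to 'target'.
    // Closing the chain also closes 'source'; 'target' is left open for the caller.
    static void decodeAndDecompress(InputStream source, OutputStream target) throws IOException {
        try (InputStream gunzip = new GZIPInputStream(Base64.getDecoder().wrap(source))) {
            copy(gunzip, target);
        }
    }

    // Copies in fixed-size chunks, so memory use does not grow with the message size.
    private static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[BUFFER_SIZE];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
    }
}
```

With file or network streams on both ends, only one buffer of data is held in memory at a time, at the cost of a slightly more involved API than the string based helpers.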
