Friday, June 3, 2016

Reading and Writing Binary Files in Java - Converting Endianness (Between Big-endian and Little-endian Byte Order)

In this post we will look at working with binary files in Java. The end goal is a simple Java application that reads a binary file and writes it back to a different binary file. The only difference between the two files is the byte order: if we read a file in Big-endian byte order we write it back in Little-endian byte order, and vice versa. Let's first understand what byte order means and what Big-endian and Little-endian are. If you are already familiar with this, you can skip that section of the post.

You can find the complete code for the binary format converter explained in this post on GitHub - https://github.com/pulasthi/binary-format-converter

Understanding byte order and Endianness


Endianness, also known as byte order, refers to the order in which the bytes of a multi-byte value such as an integer or a double are stored. Endianness has no meaning for a single byte: it is stored the same way in both Big-endian and Little-endian. But for values that span multiple bytes the byte order is very important, because reading with the wrong byte order gives you incorrect values.
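
To make the "incorrect values" point concrete, here is a small sketch that reads the same four bytes as an int under both byte orders. The class and method names (WrongOrderDemo, readInt) are made up for this illustration and are not part of the converter.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class WrongOrderDemo {
    // Interprets the first four bytes of raw as an int in the given byte order.
    static int readInt(byte[] raw, ByteOrder order) {
        return ByteBuffer.wrap(raw).order(order).getInt();
    }

    public static void main(String[] args) {
        byte[] raw = {0x00, 0x00, 0x00, 0x01}; // the int 1, written in Big-endian order
        System.out.println(readInt(raw, ByteOrder.BIG_ENDIAN));    // prints 1
        System.out.println(readInt(raw, ByteOrder.LITTLE_ENDIAN)); // prints 16777216
    }
}
```

The bytes never change; only the interpretation does, which is why a reader has to know the byte order a file was written with.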

Big-endian

The most significant byte is stored first. To understand this, let's take the 2-byte hexadecimal value 324F as an example. In a big-endian system it is stored in memory as 32 4F: 32 is stored at memory address 1 and 4F at memory address 2.

Little-endian

The least significant byte is stored first. With the same example, 324F is stored in memory as 4F 32: 4F is stored at memory address 1 and 32 at memory address 2.
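
The 324F example can be checked directly in Java. The following is a minimal sketch, with hypothetical names (ByteLayoutDemo, layout), that writes the value into a 2-byte ByteBuffer under each byte order and prints the bytes in memory order.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class ByteLayoutDemo {
    // Returns the two bytes that represent value under the given byte order.
    static byte[] layout(short value, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.allocate(Short.BYTES).order(order);
        buf.putShort(value);
        return buf.array(); // backing array of the heap buffer, in memory order
    }

    public static void main(String[] args) {
        byte[] big = layout((short) 0x324F, ByteOrder.BIG_ENDIAN);
        byte[] little = layout((short) 0x324F, ByteOrder.LITTLE_ENDIAN);
        System.out.printf("big:    %02X %02X%n", big[0], big[1]);       // big:    32 4F
        System.out.printf("little: %02X %02X%n", little[0], little[1]); // little: 4F 32
    }
}
```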

Reading From Binary File

The following code reads the binary file and stores all of its data in a ByteBuffer. To create the byte buffer we need to allocate enough memory to hold the contents of the specified file.

FileChannel fc = (FileChannel) Files.newByteChannel(Paths.get(filename), StandardOpenOption.READ);
ByteBuffer byteBuffer = ByteBuffer.allocate((int)fc.size());
byteBuffer.order(ByteOrder.BIG_ENDIAN);
fc.read(byteBuffer);
byteBuffer.flip();


Here filename is the path to the binary file we want to read. After we create a FileChannel we can use its "size" method to find the size of the binary file, and use that to allocate space for the ByteBuffer. Note that the size is cast to int, so this approach only works for files smaller than 2 GB. Here we assume that the binary file was written in Big-endian format, so we set the byte order of the buffer to Big-endian. (Big-endian is also the initial order of every newly created ByteBuffer.) You can also tell the program to use the native byte order of the machine with ByteOrder.nativeOrder().


ByteOrder.nativeOrder() // use the native byte order
ByteOrder.BIG_ENDIAN // use Big-endian
ByteOrder.LITTLE_ENDIAN // use Little-endian
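
As a quick check of these options, the sketch below prints the machine's native order and shows that a fresh ByteBuffer starts out as Big-endian until you change it. The class and helper names (ByteOrderDefaults, nativeBuffer) are assumptions made for this example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class ByteOrderDefaults {
    // Allocates a heap buffer that uses the native byte order of the machine.
    static ByteBuffer nativeBuffer(int capacity) {
        return ByteBuffer.allocate(capacity).order(ByteOrder.nativeOrder());
    }

    public static void main(String[] args) {
        // Machine dependent, for example LITTLE_ENDIAN on x86 hardware
        System.out.println("native:  " + ByteOrder.nativeOrder());

        // A newly allocated ByteBuffer is always BIG_ENDIAN, whatever the machine
        System.out.println("default: " + ByteBuffer.allocate(8).order());

        System.out.println("changed: " + nativeBuffer(8).order());
    }
}
```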

After the proper byte order is set we can invoke the FileChannel read method and pass it the byte buffer we created. This copies the data from the file into the byte buffer. Note that a single call to read is not guaranteed to fill the buffer; it returns the number of bytes read, so robust code calls it in a loop until the buffer is full or read returns -1. After reading, it is important to call the flip method on the ByteBuffer. This sets the limit of the buffer to the current position and then resets the position to 0, which lets us read back what we just loaded from the file.
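
The read-and-flip step can be wrapped in a small helper. This is only a sketch under a few assumptions: the names ReadWholeFile and readAll are invented here, FileChannel.open is used instead of the cast from Files.newByteChannel, and read is called in a loop in case one call does not fill the buffer.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class ReadWholeFile {
    // Reads the whole file into a buffer with the given byte order, ready for get calls.
    static ByteBuffer readAll(Path path, ByteOrder order) throws IOException {
        try (FileChannel fc = FileChannel.open(path, StandardOpenOption.READ)) {
            long size = fc.size();
            if (size > Integer.MAX_VALUE) {
                throw new IOException("File too large for one ByteBuffer: " + size + " bytes");
            }
            ByteBuffer buf = ByteBuffer.allocate((int) size).order(order);
            // read() may transfer fewer bytes than requested, so loop until full or end of file
            while (buf.hasRemaining() && fc.read(buf) >= 0) {
                // keep reading
            }
            buf.flip(); // limit = number of bytes read, position = 0
            return buf;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("read-demo", ".bin");
        try {
            Files.write(tmp, new byte[] {0x32, 0x4F, 0x00, 0x01});
            ByteBuffer buf = readAll(tmp, ByteOrder.BIG_ENDIAN);
            System.out.println("position=" + buf.position() + " limit=" + buf.limit());
            System.out.printf("first short: %04X%n", buf.getShort());
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Without the flip call the position would sit at the end of the data, and the getShort call would fail with a BufferUnderflowException.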

Working with Data Types

Since we are dealing with endianness, this only makes sense if the input binary file contains a multi-byte data type. Let's assume that the data type in the binary file is short (that is, 2 bytes) and that we want to extract all the short values in the file into a short array. The following code does just that.

Buffer buffer = byteBuffer.asShortBuffer();
short[] shortArray = new short[(int)fc.size()/2];
((ShortBuffer)buffer).get(shortArray);

First we convert the byte buffer into a short buffer. Then we create a short array to hold the values we get from the short buffer. Since each short takes 2 bytes, we can use the size method as before and divide it by 2. Then we use the get method to copy the values from the buffer into the array. (Note: instead of using a Buffer and casting it to ShortBuffer, you can also declare the variable as a ShortBuffer directly.)
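
Here is a minimal sketch of the variant mentioned in the note, using a ShortBuffer directly. It works on a byte array instead of a file so it can be run on its own, and it sizes the array from the view's remaining() count rather than from the file size. The names ShortsFromBytes and toShorts are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.ShortBuffer;
import java.util.Arrays;

class ShortsFromBytes {
    // Decodes raw bytes into shorts using the given byte order.
    static short[] toShorts(byte[] raw, ByteOrder order) {
        ShortBuffer view = ByteBuffer.wrap(raw).order(order).asShortBuffer();
        short[] values = new short[view.remaining()]; // number of whole shorts available
        view.get(values);
        return values;
    }

    public static void main(String[] args) {
        byte[] raw = {0x32, 0x4F, 0x00, 0x01};
        System.out.println(Arrays.toString(toShorts(raw, ByteOrder.BIG_ENDIAN)));    // [12879, 1]
        System.out.println(Arrays.toString(toShorts(raw, ByteOrder.LITTLE_ENDIAN))); // [20274, 256]
    }
}
```

Here 12879 is 324F and 20274 is 4F32 in hexadecimal, matching the example from the endianness section.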

For other data types you can approach this step in a similar way. The complete code available in the GitHub repo has handlers for short, int, long, float, double and byte.

Writing Back to file 

Now that we have read the binary file we can make some modifications and write the data back to a separate binary file. The change I will make is to the byte order of the data: if we read a file in Big-endian format, we write it back in Little-endian format, and vice versa. The following code segment takes the short array that we created and fills a byte buffer from it. You can use a new byte buffer or reuse the old one, since we do not need its contents anymore. To save memory we will reuse the existing byte buffer.

byteBuffer.clear();
byteBuffer.order(ByteOrder.LITTLE_ENDIAN);
ShortBuffer shortOutputBuffer = byteBuffer.asShortBuffer();
shortOutputBuffer.put(shortArray);

FileChannel out = new FileOutputStream(outputfilename).getChannel();
out.write(byteBuffer);
out.close();


To reset the byte buffer we simply call the clear method on the ByteBuffer. This does not erase the data; it only sets the position back to 0 and the limit to the capacity so the buffer can be filled again. Then we set the byte order of the buffer. Notice that this time we set it to Little-endian, because we read a file in Big-endian format and want to write it to the new file in Little-endian format. As before we get a short buffer from the byte buffer and add the short array to it through the put method. Since we created the short buffer from the byte buffer, it is a view of the same memory: putting shorts into the short buffer writes their bytes into the byte buffer, using the byte order the byte buffer had when the view was created. The view has its own position, so the byte buffer's position stays at 0 and the whole buffer is ready to be written.
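
The clear, order, asShortBuffer and put sequence can be tried without touching any files. The sketch below, with made-up names (SwapShortOrder, swap), decodes the shorts in one byte order and re-encodes them in the other while reusing the same buffer.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.ShortBuffer;

class SwapShortOrder {
    // Returns a copy of raw with every 2-byte value converted to the opposite byte order.
    static byte[] swap(byte[] raw, ByteOrder inputOrder) {
        ByteBuffer buf = ByteBuffer.wrap(raw.clone()).order(inputOrder);

        // Decode the shorts using the byte order of the input
        ShortBuffer in = buf.asShortBuffer();
        short[] values = new short[in.remaining()];
        in.get(values);

        // Reuse the same buffer: reset it, switch the order, and write through a new view
        ByteOrder outputOrder = inputOrder == ByteOrder.BIG_ENDIAN
                ? ByteOrder.LITTLE_ENDIAN : ByteOrder.BIG_ENDIAN;
        buf.clear();
        buf.order(outputOrder);
        buf.asShortBuffer().put(values); // the view must be created after order() is changed

        return buf.array(); // buf.position() is still 0 here, only the view advanced
    }

    public static void main(String[] args) {
        byte[] out = swap(new byte[] {0x32, 0x4F, 0x00, 0x01}, ByteOrder.BIG_ENDIAN);
        System.out.printf("%02X %02X %02X %02X%n", out[0], out[1], out[2], out[3]); // 4F 32 01 00
    }
}
```

Note that a trailing odd byte, if the input has one, is left unchanged because it does not form a complete short.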

Finally we create a new FileChannel for the output file and write the byteBuffer to it. The complete code of the program is listed below and is also available in the GitHub repo - binary-format-converter under the Apache 2.0 License. This is just quick code I put together, so there may be room for improvement; you are welcome to send any improvements via GitHub so I can add the changes to the repo.

Instructions to compile and run the program are in the README.md file of the GitHub repo. I hope you were able to learn something new from this post.


import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.*;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

/**
 * Created by pulasthi on 5/31/16.
 */
public class BinaryFormatConverter {
    private static ByteOrder endianness = ByteOrder.BIG_ENDIAN;
    private static int dataTypeSize = Short.BYTES;

    public static void main(String[] args) {
        // args[0] is the input file and args[1] is the output file
        // args[2] is the byte order of the input file: "big" or "little"
        // args[3] is one of the primitive type names in lower case
        String file = args[0];
        String outputfile = args[1];
        endianness =  args[2].equals("big") ? ByteOrder.BIG_ENDIAN : ByteOrder.LITTLE_ENDIAN;

        ConvertFormat(file,outputfile,endianness,args[3]);
    }

    private static void ConvertFormat(String filename, String outputfilename, ByteOrder endianness, String dataType) {
        try(FileChannel fc = (FileChannel) Files
                .newByteChannel(Paths.get(filename), StandardOpenOption.READ)) {
            ByteBuffer byteBuffer = ByteBuffer.allocate((int)fc.size());

            // The buffer must use the byte order of the input file
            byteBuffer.order(endianness);
            fc.read(byteBuffer);
            byteBuffer.flip();

            Buffer buffer;
            switch (dataType){
                case "short":
                    buffer = byteBuffer.asShortBuffer();
                    short[] shortArray = new short[(int)fc.size()/2];
                    ((ShortBuffer)buffer).get(shortArray);
                    byteBuffer.clear();
                    byteBuffer = endianness.equals(ByteOrder.BIG_ENDIAN) ? byteBuffer.order(ByteOrder.LITTLE_ENDIAN) :
                            byteBuffer.order(ByteOrder.BIG_ENDIAN);
                    ShortBuffer shortOutputBuffer = byteBuffer.asShortBuffer();
                    shortOutputBuffer.put(shortArray);
                    break;
                case "int":
                    buffer = byteBuffer.asIntBuffer();
                    int[] intArray = new int[(int)fc.size()/4];
                    ((IntBuffer)buffer).get(intArray);
                    byteBuffer.clear();
                    byteBuffer = endianness.equals(ByteOrder.BIG_ENDIAN) ? byteBuffer.order(ByteOrder.LITTLE_ENDIAN) :
                            byteBuffer.order(ByteOrder.BIG_ENDIAN);
                    IntBuffer intOutputBuffer = byteBuffer.asIntBuffer();
                    intOutputBuffer.put(intArray);
                    break;
                case "double":
                    buffer = byteBuffer.asDoubleBuffer();
                    double[] doubleArray = new double[(int)fc.size()/8];
                    ((DoubleBuffer)buffer).get(doubleArray);
                    byteBuffer.clear();
                    byteBuffer = endianness.equals(ByteOrder.BIG_ENDIAN) ? byteBuffer.order(ByteOrder.LITTLE_ENDIAN) :
                            byteBuffer.order(ByteOrder.BIG_ENDIAN);
                    DoubleBuffer doubleOutputBuffer = byteBuffer.asDoubleBuffer();
                    doubleOutputBuffer.put(doubleArray);
                    break;
                case "long":
                    buffer = byteBuffer.asLongBuffer();
                    long[] longArray = new long[(int)fc.size()/8];
                    ((LongBuffer)buffer).get(longArray);
                    byteBuffer.clear();
                    byteBuffer = endianness.equals(ByteOrder.BIG_ENDIAN) ? byteBuffer.order(ByteOrder.LITTLE_ENDIAN) :
                            byteBuffer.order(ByteOrder.BIG_ENDIAN);
                    LongBuffer longOutputBuffer = byteBuffer.asLongBuffer();
                    longOutputBuffer.put(longArray);
                    break;
                case "float":
                    buffer = byteBuffer.asFloatBuffer();
                    float[] floatArray = new float[(int)fc.size()/4];
                    ((FloatBuffer)buffer).get(floatArray);
                    byteBuffer.clear();
                    byteBuffer = endianness.equals(ByteOrder.BIG_ENDIAN) ? byteBuffer.order(ByteOrder.LITTLE_ENDIAN) :
                            byteBuffer.order(ByteOrder.BIG_ENDIAN);
                    FloatBuffer floatOutputBuffer = byteBuffer.asFloatBuffer();
                    floatOutputBuffer.put(floatArray);
                    break;
                case "byte":
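                    // Single bytes have no byte order, so the data is written back unchanged
                    byteBuffer = endianness.equals(ByteOrder.BIG_ENDIAN) ? byteBuffer.order(ByteOrder.LITTLE_ENDIAN) :
                        byteBuffer.order(ByteOrder.BIG_ENDIAN);
                    break;
                default:
                    // Fail loudly instead of silently copying the file for an unknown type
                    throw new IllegalArgumentException("Unsupported data type: " + dataType);
            }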

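            // try-with-resources closes the stream and its channel even if the write fails
            try (FileOutputStream fos = new FileOutputStream(outputfilename);
                 FileChannel out = fos.getChannel()) {
                // write() may not drain the buffer in one call, so loop until it is empty
                while (byteBuffer.hasRemaining()) {
                    out.write(byteBuffer);
                }
            }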
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}





pulasthi
