Basics on ByteBuffers, Charsets and Endianness

Foreword

The aim of these exercises is to manipulate ByteBuffer. As we have not seen any networking primitives, we will use files to read/write bytes.

In these exercises, you can write all your code in the main method. We will do better in the future but it is not the goal of these exercises.

Endianness

We want to write a program StoreWithByteOrder that reads long integers from the keyboard and writes them into a file. This program will take two arguments on the command line:
  • the byte order to use when writing the longs to the file: LE for little-endian and BE for big-endian;
  • the filename for the file in which to write the longs.

For example, calling java StoreWithByteOrder LE foo.bin will write the longs in the file foo.bin in little-endian.
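
The effect of the byte order can be observed without touching any file. The following is a minimal sketch, not the expected solution: the class ByteOrderSketch and its helpers encode and toHex are hypothetical names introduced for illustration. It encodes the long 1 in both byte orders and prints the resulting bytes in hexadecimal.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration class, not part of the provided template.
class ByteOrderSketch {
    // Encodes one long into 8 bytes using the given byte order.
    static byte[] encode(long value, ByteOrder order) {
        ByteBuffer buffer = ByteBuffer.allocate(Long.BYTES);
        buffer.order(order); // a freshly allocated ByteBuffer is big-endian by default
        buffer.putLong(value);
        return buffer.array();
    }

    // Formats the bytes as two-digit hexadecimal values separated by spaces.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHex(encode(1L, ByteOrder.BIG_ENDIAN)));    // 00 00 00 00 00 00 00 01
        System.out.println(toHex(encode(1L, ByteOrder.LITTLE_ENDIAN))); // 01 00 00 00 00 00 00 00
    }
}
```

In the actual program, the same buffer.order(...) call would be made once, based on the first command-line argument, before any long is put into the buffer.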

Starting from the template StoreWithByteOrder.java, write the program StoreWithByteOrder.

Tests

To test your code, you need to know how to run a Java program from the command line (i.e., in a terminal). You also need to know how to execute a jar file. You can find detailed instructions here.

Test 1

If you run the following command

% java StoreWithByteOrder BE long-be.bin
1
^D
you should obtain a file long-be.bin containing 7 bytes of value 0 followed by 1 byte of value 1. To see the value of the bytes contained in a file, we provide a tool File2Hex.jar that gives the value of each byte of the file in hexadecimal. Using this tool, you should obtain:
% java -jar File2Hex.jar long-be.bin
00 00 00 00 00 00 00 01

Test 2

If you run the following command

% java StoreWithByteOrder LE long-le.bin
1
^D
you should obtain a file long-le.bin containing 1 byte of value 1 followed by 7 bytes of value 0. Using File2Hex.jar, you should obtain:
% java -jar File2Hex.jar long-le.bin
01 00 00 00 00 00 00 00

Reading a file as a text file

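We want to write a program ReadFileWithEncoding that takes two parameters:
  • the name of the charset to use for decoding (e.g., utf8 or iso-8859-1);
  • the name of the file to read.
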
The program reads the file, decodes it with the given charset and prints the resulting string.

The Java API offers the method String Files.readString(Path path, Charset cs) that does exactly that. However, in this exercise, we ask you to access the file via a FileChannel, which only gives the ability to read/write the raw bytes from/to the file, and to use the method charset.decode to construct the string.

In the template ReadFileWithEncoding.java, write the method stringFromFile using a FileChannel.

You can obtain the size in bytes of a file using the method fileChannel.size().
Warning: Even if the buffer buff has the same size as the file, the method fileChannel.read(buff) is not guaranteed to fill buff in one call. You have to call it until the buffer is full (i.e., !buff.hasRemaining()).
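
The reading loop described in the warning can be sketched as follows. This is only an illustration under stated assumptions: the class ReadFullySketch and the helpers readFully and bytesFromFile are hypothetical names (the template may organize the code differently), and the file is assumed to be small enough for its size to fit in an int.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration class, not part of the provided template.
class ReadFullySketch {
    // Calls read repeatedly until buff is full or the end of the file is reached.
    static void readFully(FileChannel fc, ByteBuffer buff) throws IOException {
        while (buff.hasRemaining()) {
            if (fc.read(buff) == -1) {
                return; // end of file reached before the buffer was full
            }
        }
    }

    // Returns the raw bytes of the file (assumption: the file size fits in an int).
    static byte[] bytesFromFile(Path path) throws IOException {
        try (FileChannel fc = FileChannel.open(path, StandardOpenOption.READ)) {
            ByteBuffer buff = ByteBuffer.allocate(Math.toIntExact(fc.size()));
            readFully(fc, buff);
            buff.flip(); // switch to read mode before extracting the bytes
            byte[] bytes = new byte[buff.remaining()];
            buff.get(bytes);
            return bytes;
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("read-fully", ".bin");
        Files.write(path, new byte[] {0x61, 0x62, 0x63});
        System.out.println(bytesFromFile(path).length); // 3
        Files.delete(path);
    }
}
```

In the exercise itself, the flipped buffer would be passed directly to charset.decode instead of being copied into a byte array.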

Tests

You can test your program with the file test.txt. You need to download the file using "Save as" and not copy-pasting it. If your code is correct, you should obtain:

% java fr.uge.net.buffers.ReadFileWithEncoding utf8 test.txt
a€
and
% java fr.uge.net.buffers.ReadFileWithEncoding iso-8859-1 test.txt
aâ ¬

There is no magic behind this. The file test.txt contains the 5 bytes 61 E2 82 AC 0A. The file itself has no preferred encoding.

If we decode it using the UTF8 charset, these 5 bytes are interpreted as follows:

61 -> a
E2 82 AC -> €
0A -> line feed (newline)
The characters in UTF-8 are represented by a variable number of bytes. You can learn here how this feature is achieved.

In the iso-8859-1 charset, each character is coded by one byte and the 5 bytes are interpreted as follows:

61 -> a
E2 -> â
82 -> control character that cannot be printed
AC -> ¬
0A -> line feed (newline)
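
The two interpretations above can be reproduced without any file by decoding the same 5 bytes with both charsets. This is a small sketch; the class DecodeSketch, its constant CONTENT, and its helper decode are hypothetical names introduced for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class, not part of the provided template.
class DecodeSketch {
    // The 5 bytes contained in test.txt: 61 E2 82 AC 0A.
    static final byte[] CONTENT = {0x61, (byte) 0xE2, (byte) 0x82, (byte) 0xAC, 0x0A};

    // Decodes the given bytes with the given charset.
    static String decode(byte[] bytes, Charset cs) {
        return cs.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) {
        String utf8 = decode(CONTENT, StandardCharsets.UTF_8);
        String latin1 = decode(CONTENT, StandardCharsets.ISO_8859_1);
        System.out.println(utf8.length());   // 3 characters: a, €, line feed
        System.out.println(latin1.length()); // 5 characters: one per byte
    }
}
```

The same bytes yield 3 characters in UTF-8 and 5 characters in iso-8859-1, which is why the two commands print different strings.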

Reading from the standard input

In this exercise, we want to write a program ReadStandardInputWithEncoding which reads bytes from the standard input and decodes them in a given charset. This program takes the name of the charset as input.

The program is meant to be used as follows:

$ cat test.txt | java ReadStandardInputWithEncoding utf8

Usually you access the standard input using a Scanner but in this exercise, we ask you to access it as a stream of bytes. We can obtain a ReadableByteChannel corresponding to the standard input with:

ReadableByteChannel in = Channels.newChannel(System.in);

A ReadableByteChannel behaves like a FileChannel when reading except that we do not know in advance the total number of bytes. You know that you have read all of the bytes when the method readableByteChannel.read returns -1. You will need to read in a fixed-size buffer and extend this buffer when it is full.

In the template ReadStandardInputWithEncoding.java, write the method stringFromStandardInput which reads all the bytes from the standard input and returns the corresponding string.

To increase the size of the buffer, we will create a buffer twice as large and copy the data from the old buffer into the new one.

  • The size of a ByteBuffer can be obtained using the method byteBuffer.capacity().
  • To transfer the data stored in the work-zone of a ByteBuffer src to the beginning of the work-zone of a ByteBuffer dst, we will use dst.put(src).
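
The doubling strategy can be sketched as follows. This is one possible way to organize it, not the expected solution: the class GrowBufferSketch and the helpers grow and readAll are hypothetical names, and the demonstration reads from an in-memory stream rather than from System.in. Note that the old buffer is flipped before the copy so that its work-zone covers exactly the bytes read so far.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class, not part of the provided template.
class GrowBufferSketch {
    // Returns a buffer twice as large that starts with the bytes written to old so far.
    static ByteBuffer grow(ByteBuffer old) {
        ByteBuffer bigger = ByteBuffer.allocate(old.capacity() * 2);
        old.flip();      // work-zone of old = the bytes written so far
        bigger.put(old); // copy them to the beginning of the new buffer
        return bigger;   // still in write mode, positioned after the copied bytes
    }

    // Reads the channel until read returns -1; the result is flipped, ready to decode.
    static ByteBuffer readAll(ReadableByteChannel in, int initialCapacity) throws IOException {
        if (initialCapacity < 1) {
            throw new IllegalArgumentException("initialCapacity must be at least 1");
        }
        ByteBuffer buff = ByteBuffer.allocate(initialCapacity);
        while (in.read(buff) != -1) {
            if (!buff.hasRemaining()) {
                buff = grow(buff); // the buffer is full: double its capacity
            }
        }
        buff.flip();
        return buff;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello, buffers\n".getBytes(StandardCharsets.UTF_8);
        ReadableByteChannel in = Channels.newChannel(new ByteArrayInputStream(data));
        ByteBuffer all = readAll(in, 4); // deliberately small to force several doublings
        System.out.print(StandardCharsets.UTF_8.decode(all));
    }
}
```

Starting from a deliberately small capacity, as in main, is a convenient way to check that the growing logic is exercised even on short inputs.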

Tests

To test your code, you can use:

% cat test.txt | java fr.uge.net.buffers.ReadStandardInputWithEncoding utf8 

To test that you correctly increase the size of your buffer, you can use the file test2.txt.

% cat test2.txt | java fr.uge.net.buffers.ReadStandardInputWithEncoding utf8 
Check that your output contains all the lines of test2.txt. In particular, you should obtain:
 
% cat test2.txt | wc
   10000   40000 2208890   
% cat test2.txt | java fr.uge.net.buffers.ReadStandardInputWithEncoding utf8 | wc
   10000   40000 2208890