HTTP/1.1 client

The HTTP protocol

This text is a very incomplete summary of the HTTP protocol. You can find more information on the Wikipedia page or in the RFCs for the HTTP protocol: RFC 7230, RFC 7231, RFC 7232, RFC 7233, RFC 7234, RFC 7235, RFC 7236 and RFC 7237.

Requests and responses

In the HTTP protocol, once the TCP connection is established with the server (by default on port 80), the client sends requests. The most common requests are GET, HEAD and POST. A GET request asks the server to send a resource (for instance a webpage).
The server responds with an HTTP response containing the requested resource.

GET request

In HTTP 1.1, a GET request is of the form:

GET /index.html HTTP/1.1
Host: igm.univ-mlv.fr

Each line is encoded in ASCII and is terminated by the two characters CR and LF (Carriage Return, Line Feed; "\r\n" in Java). A request starts with a Request-Line, followed by several CRLF-terminated lines, and ends with an empty CRLF-terminated line. The Request-Line is composed of the request method (GET in our example), then the resource (/index.html in our example) and finally the HTTP version (HTTP/1.1). The other lines of the request have the format field: value. In our example, the field Host has the value igm.univ-mlv.fr. The above request is sent on a TCP connection established on port 80 of the server hosting the site igm.univ-mlv.fr and asks for the resource /index.html.
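
As an illustration of the format described above, here is a minimal sketch that builds the text of the example request and encodes it in ASCII. The class name GetRequestSketch and the helper buildGet are hypothetical names chosen for this example; they are not part of the provided files.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class GetRequestSketch {
    // Hypothetical helper: builds the text of a minimal HTTP/1.1 GET request.
    static String buildGet(String host, String resource) {
        return "GET " + resource + " HTTP/1.1\r\n"   // Request-Line
             + "Host: " + host + "\r\n"              // one "field: value" line
             + "\r\n";                               // empty line ends the request
    }

    public static void main(String[] args) {
        String request = buildGet("igm.univ-mlv.fr", "/index.html");
        // Encode in ASCII; the resulting buffer could be written to a SocketChannel.
        ByteBuffer buffer = StandardCharsets.US_ASCII.encode(request);
        System.out.println(buffer.remaining() + " bytes to send");
    }
}
```

The buffer returned by encode is already in read mode, so it could be passed directly to a channel's write method.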

Server's response

The server's answer is composed of two parts: a header (a status line followed by header fields) and a body, separated by an empty line.

A server's response looks like:

HTTP/1.1 200 OK
Date: Thu, 01 Mar 2018 17:28:07 GMT
Server: Apache
Last-Modified: Thu, 15 Sep 2016 09:02:49 GMT
ETag: "254441f-3d0a-53c881c25a040"
Accept-Ranges: bytes
Content-Length: 15626
Content-Type: text/html

<!DOCTYPE html>
<html>
<head>
...

Each header line is encoded in ASCII and terminated by CRLF. The very first (status) line:

HTTP/1.1 200 OK

gives the version of the protocol used for the response (here HTTP/1.1), then the response code (200) and a textual message corresponding to this code (OK). The following header lines are of the form:

header-field-name: header-field-value

Among other useful information, Content-Length gives the number of bytes of the response "body" (here, 15626). The body starts just after the empty line (CRLF) that follows the last CRLF-terminated header line. Since in HTTP/1.1 the server keeps the TCP connection open by default after answering a request, the client needs this information (Content-Length) to identify the end of the response. Another header field, Content-Type, specifies that the resource is in HTML.
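
To make the two line formats concrete, here is a small sketch that extracts the response code from a status line and splits a header line into its name and value. The class and method names (StatusLineSketch, statusCode, splitHeaderLine) are hypothetical and only serve this example.

```java
final class StatusLineSketch {
    // Extracts the numeric code from a status line such as "HTTP/1.1 200 OK".
    static int statusCode(String statusLine) {
        String[] parts = statusLine.split(" ", 3); // version, code, message
        if (parts.length < 2) {
            throw new IllegalArgumentException("Malformed status line: " + statusLine);
        }
        return Integer.parseInt(parts[1]);
    }

    // Splits "name: value" at the first colon; the value may itself contain colons.
    static String[] splitHeaderLine(String line) {
        int colon = line.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Malformed header line: " + line);
        }
        return new String[] { line.substring(0, colon), line.substring(colon + 1).trim() };
    }

    public static void main(String[] args) {
        System.out.println(statusCode("HTTP/1.1 200 OK"));
        String[] field = splitHeaderLine("Content-Length: 15626");
        System.out.println(field[0] + " -> " + Integer.parseInt(field[1]));
    }
}
```

Splitting at the first colon only matters for fields such as Date, whose value contains colons of its own.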

When Content-Length is absent, the server can send the body in chunks instead. This transfer mode is announced by the following header field:

Transfer-Encoding: chunked

How it works is described in RFC 7230 (section 4.1). You do not need to look at this mode of operation for now.

HTTP client

The aim of this exercise is to code a small HTTP client. Our client will be able to make GET requests and to display the body of the response (if it is in text/plain or text/html) after decoding it in the charset specified in the header. We'll use version 1.1 of the HTTP protocol for requests.

The main difficulty lies in processing the server response. The problem is that the size of the header cannot be bounded in advance. When you read from the SocketChannel, you cannot know beforehand whether you will get part of a header line, a whole header line, all header lines, or even part of the response body as well.

To deal with these difficulties, we give you a skeleton of a class HTTPReader that you will complete to handle server responses.

Start with files HTTPReader.java and HTTPException.java as a basis.

An instance of the class HTTPReader holds two fields: the SocketChannel on which the response has to be read and a ByteBuffer to store read bytes.

All methods of this class must respect these rules:

  • the ByteBuffer must be in write mode before and after the method invocation (its work-zone is not ready to be read, but ready to be filled, if the buffer is not full).
  • it is forbidden to read from the socket channel if some bytes remain in the ByteBuffer (all its content must be consumed before reading new bytes from the channel).
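
The first rule can be illustrated by the usual flip/compact pattern. The sketch below is only an in-memory illustration, not the expected HTTPReader code: the class BufferModeSketch and the helper take are hypothetical, and the bytes are put into the buffer by hand instead of being read from a channel.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class BufferModeSketch {
    // Consumes up to n bytes; the buffer is in write mode on entry and on exit.
    static byte[] take(ByteBuffer buffer, int n) {
        buffer.flip();                                   // write mode -> read mode
        byte[] out = new byte[Math.min(n, buffer.remaining())];
        buffer.get(out);                                 // consume the bytes
        buffer.compact();                                // keep unread bytes, back to write mode
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(16);
        // Simulates bytes previously read from the channel.
        buffer.put("HTTP/1.1".getBytes(StandardCharsets.US_ASCII));
        byte[] first = take(buffer, 4);
        System.out.println(new String(first, StandardCharsets.US_ASCII));
        System.out.println("unconsumed bytes: " + buffer.position());
    }
}
```

After compact, the position of the buffer equals the number of bytes that were not consumed, which is what the second rule asks you to check before reading from the channel again.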

Implement the readLineCRLF method. You could start testing it with examples of the main method. Next, you will check its behavior against the JUnit tests of the file HTTPReaderTest.java (you will need the file FakeHTTPServer.java).

Write the readHeader method. You have to return an object of the (provided) class HTTPHeader.

To create the object, you will use the factory method create, which takes as parameters:
  • a String that is the first (status) line of the response,
  • a map that associates each header field to its value.
    Warning: a response header may contain several header lines for the same header field. For instance
    Set-cookie: x-wl-uid=1smBggFQdYEUGLgg29x3Qr/zAwfq42jdGu0mYszL1+mrt/ABZ8xw43Ise90maJaHGuGvUKVQ+0gM=; path=/; domain=.amazon.fr; expires=Mon, 31-Dec-2035 23:00:01 GMT
    Set-cookie: session-id-time=2082754801l; path=/; domain=.amazon.fr; expires=Mon, 31-Dec-2035 23:00:01 GMT
    Set-cookie: session-id=276-2784413-9232431; path=/; domain=.amazon.fr; expires=Mon, 31-Dec-2035 23:00:01 GMT
    
    is equivalent to a single header field Set-cookie with the three strings concatenated with semi-colon (";") characters.
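
One possible way to build such a map is Map.merge, which inserts the value or combines it with the one already present. This is a sketch under assumptions: the class HeaderMergeSketch and its method merge are hypothetical, the header lines are given as an in-memory list, and field names are kept exactly as received.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class HeaderMergeSketch {
    // Builds a field -> value map; repeated fields are joined with ";".
    static Map<String, String> merge(List<String> headerLines) {
        Map<String, String> fields = new HashMap<>();
        for (String line : headerLines) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("Malformed header line: " + line);
            }
            String name = line.substring(0, colon);      // kept as received (assumption)
            String value = line.substring(colon + 1).trim();
            // Insert the value, or append it to the existing one with a semicolon.
            fields.merge(name, value, (oldValue, newValue) -> oldValue + ";" + newValue);
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> fields = merge(List.of(
                "Set-cookie: session-id=276-2784413-9232431; path=/",
                "Content-Type: text/html",
                "Set-cookie: session-id-time=2082754801l; path=/"));
        System.out.println(fields.get("Set-cookie"));
    }
}
```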

Implement the readBytes method.

Write a client, HTTPClient, that takes as arguments a server's address and a resource: the client asks the server for the resource on TCP port 80 and, if the resource is in text/html or text/plain, displays it.
For now, you only consider answers with a header field Content-Length.

Next, take into account answers received in chunked transfer mode. To do so, implement the readChunks method in class HTTPReader.
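
To fix ideas about the chunked format, the sketch below decodes a complete chunked body that is already in memory: each chunk is a size in hexadecimal followed by CRLF, then that many bytes of data followed by CRLF, and a chunk of size 0 ends the body. It is not the expected readChunks implementation, which has to read incrementally from the channel. The names ChunkedSketch, decode and indexOfCrlf are hypothetical, and trailer fields after the last chunk are ignored.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class ChunkedSketch {
    // Decodes a complete chunked body held in memory; trailers are ignored.
    static byte[] decode(byte[] input) {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        int pos = 0;
        while (true) {
            int lineEnd = indexOfCrlf(input, pos);
            if (lineEnd < 0) throw new IllegalArgumentException("Missing chunk-size line");
            String sizeLine = new String(input, pos, lineEnd - pos, StandardCharsets.US_ASCII);
            int semi = sizeLine.indexOf(';');            // drop optional chunk extensions
            if (semi >= 0) sizeLine = sizeLine.substring(0, semi);
            int size = Integer.parseInt(sizeLine.trim(), 16);
            if (size < 0) throw new IllegalArgumentException("Negative chunk size");
            pos = lineEnd + 2;                           // skip the CRLF after the size
            if (size == 0) return body.toByteArray();    // last chunk reached
            if (pos + size + 2 > input.length) throw new IllegalArgumentException("Truncated chunk");
            body.write(input, pos, size);
            pos += size + 2;                             // skip the data and its trailing CRLF
        }
    }

    // Returns the index of the first CRLF at or after 'from', or -1 if none.
    static int indexOfCrlf(byte[] data, int from) {
        for (int i = from; i + 1 < data.length; i++) {
            if (data[i] == '\r' && data[i + 1] == '\n') return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] raw = "4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(new String(decode(raw), StandardCharsets.US_ASCII));
    }
}
```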

Modify your client to handle responses with status codes 301 and 302.

To test the 302 status code, you can use:

  getResource("www-igm.univ-mlv.fr","/~carayol/redirect.php")

To parse the Location header field, you can use the class URL: the method URL.getHost() gives the address of the server and URL.getPath() gives the resource path.
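
As a quick illustration of these two methods, here is a minimal sketch. The class LocationSketch and the helper hostAndPath are hypothetical; the sketch assumes the Location value is an absolute URL, and it obtains the URL object through URI.create(...).toURL(), one of several ways to construct a URL.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

final class LocationSketch {
    // Returns { host, path } for an absolute URL such as a Location value.
    static String[] hostAndPath(String location) {
        try {
            URL url = URI.create(location).toURL();
            String path = url.getPath();
            if (path.isEmpty()) {
                path = "/";   // getPath() is empty when the URL has no path
            }
            return new String[] { url.getHost(), path };
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Unsupported location: " + location, e);
        }
    }

    public static void main(String[] args) {
        String[] target = hostAndPath("http://www-igm.univ-mlv.fr/~carayol/redirect.php");
        System.out.println("host = " + target[0]);
        System.out.println("path = " + target[1]);
    }
}
```

The host and path obtained this way can then be used to issue a new GET request to the redirection target.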