In the HTTP protocol, once the TCP connection is established with
the server (by default on port 80), the client sends requests. The most common requests are GET, HEAD and POST. A GET request asks the server to send a resource (for instance a webpage).
The server responds with an HTTP response containing the requested resource.
GET
In HTTP 1.1, a GET request is of the form:
    GET /index.html HTTP/1.1
    Host: igm.univ-mlv.fr

Each line is encoded in ASCII and is terminated by the two characters CR and LF (Carriage Return, Line Feed; "\r\n" in Java). A request starts with a Request-Line, followed by several CRLF-terminated lines, and ends with an empty CRLF-terminated line. The Request-Line is composed of the request method (GET in our example), then the resource (/index.html in our example) and finally the HTTP version (HTTP/1.1). The other lines of the request have the format field: value. In our example, the field Host has the value igm.univ-mlv.fr.
The above request is sent on a TCP connection established on port 80 of the server hosting the site igm.univ-mlv.fr and asks for the resource /index.html.
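The request format described above can be sketched in Java as follows. This is only an illustration: the class name GetRequestSketch and its two methods are hypothetical and are not part of the files provided for the exercise.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, used only for illustration.
final class GetRequestSketch {
    // Builds the text of a minimal HTTP/1.1 GET request.
    // Every line ends with CRLF and an empty CRLF line ends the request.
    static String buildGetRequest(String host, String resource) {
        return "GET " + resource + " HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"
                + "\r\n";
    }

    // Encodes the request in ASCII; the returned buffer is ready to be
    // written to a channel (position 0, limit at the end of the request).
    static ByteBuffer encode(String request) {
        return StandardCharsets.US_ASCII.encode(request);
    }

    public static void main(String[] args) {
        String request = buildGetRequest("igm.univ-mlv.fr", "/index.html");
        ByteBuffer buffer = encode(request);
        System.out.print(request);
        System.out.println(buffer.remaining() + " bytes to send");
    }
}
```

The returned buffer could then be passed to SocketChannel.write in a loop until it has no remaining bytes.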
The server's answer is composed of two parts: a header and a body.
A server's response looks like:
    HTTP/1.1 200 OK
    Date: Thu, 01 Mar 2018 17:28:07 GMT
    Server: Apache
    Last-Modified: Thu, 15 Sep 2016 09:02:49 GMT
    ETag: "254441f-3d0a-53c881c25a040"
    Accept-Ranges: bytes
    Content-Length: 15626
    Content-Type: text/html

    <!DOCTYPE html>
    <html>
    <head>
    ...

Each header line is encoded in ASCII and terminated by CRLF. The very first (status) line:

    HTTP/1.1 200 OK

gives the version of the protocol used for the response (here HTTP/1.1), then the response code (200) and a textual message corresponding to this code (OK).
The following header lines are of the form:

    header-field-name: header-field-value

Among other useful information,
Content-Length gives the number of bytes of the response "body" (here, 15626): the body starts just after the empty line (CRLF) that follows the last CRLF-terminated header line. Since in HTTP/1.1 the server doesn't close the TCP connection after answering a request, the client needs this information (Content-Length) to identify the end of the response.
Another header field, Content-Type, specifies that the resource is in HTML.
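As an illustration of the header format, here is a minimal sketch that extracts the response code from a status line and splits a header line into its field name and value. The class and method names are hypothetical; the provided HTTPHeader class may organize this differently.

```java
// Hypothetical helper class, used only for illustration.
final class StatusLineSketch {
    // Returns the numeric response code of a status line
    // such as "HTTP/1.1 200 OK".
    static int statusCode(String statusLine) {
        String[] parts = statusLine.split(" ", 3); // version, code, message
        if (parts.length < 2) {
            throw new IllegalArgumentException("Malformed status line: " + statusLine);
        }
        return Integer.parseInt(parts[1]);
    }

    // Splits "name: value" at the FIRST colon, because the value itself
    // may contain colons (as in a Date field). Returns {name, value}.
    static String[] splitHeaderLine(String line) {
        int colon = line.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Malformed header line: " + line);
        }
        return new String[] {
            line.substring(0, colon).trim(),
            line.substring(colon + 1).trim()
        };
    }

    public static void main(String[] args) {
        System.out.println(statusCode("HTTP/1.1 200 OK"));
        String[] field = splitHeaderLine("Content-Length: 15626");
        System.out.println(field[0] + " -> " + Integer.parseInt(field[1]));
    }
}
```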
When Content-Length is not used, another transfer mode exists, through chunks, specified by this header field:

    Transfer-Encoding: chunked

Its operation is described elsewhere; you don't need to look at this mode for now.
The aim of this exercise is to code a small HTTP client. Our client will be able to make GET requests and to display the body of the response (if it is in text/plain or text/html) after decoding it with the charset specified in the header. We'll use version 1.1 of the HTTP protocol for requests.
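In HTTP, the charset is usually given as a parameter of the Content-Type field, for instance text/html; charset=utf-8 (this value is an example and does not come from the response shown earlier). The sketch below shows one possible way to extract it. The class name, the method name and the idea of passing a fallback charset are assumptions made for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, used only for illustration.
final class CharsetSketch {
    // Looks for a "charset=" parameter in a Content-Type value.
    // Returns the fallback when the parameter is absent or unusable.
    static Charset charsetOf(String contentType, Charset fallback) {
        for (String param : contentType.split(";")) {
            String p = param.trim();
            // Case-insensitive test for the "charset=" prefix (8 characters).
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    return fallback; // unknown or malformed charset name
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        Charset fallback = StandardCharsets.ISO_8859_1; // assumed default
        System.out.println(charsetOf("text/html; charset=utf-8", fallback));
        System.out.println(charsetOf("text/html", fallback));
    }
}
```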
The main difficulty lies in processing the server response. The problem is that you cannot bound the header size a priori. When you read from the SocketChannel, you cannot know in advance whether you will read enough bytes to get a whole header line, all the header lines, or even a part of the response body...
To deal with these difficulties, we give you a skeleton of a class HTTPReader that you will complete to handle server responses.
Start with files HTTPReader.java and HTTPException.java as a basis.
An instance of the class HTTPReader holds two fields: the SocketChannel on which the response has to be read and a ByteBuffer to store read bytes.
All methods of this class must respect these rules:
- the ByteBuffer must be in write mode before and after the method invocation (its work-zone is not ready to be read, but ready to be filled, if the buffer is not full);
- the method must first use the data already present in the ByteBuffer (all its content must be consumed before reading new bytes from the channel).
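The buffer discipline above can be illustrated on a buffer alone, without any channel. The sketch below is not the expected readLineCRLF: the class WriteModeSketch and the method pollLine are hypothetical. It only shows the flip / consume / compact pattern that keeps the buffer in write mode before and after the call.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, used only for illustration.
final class WriteModeSketch {
    // Extracts one CRLF-terminated line from a buffer that is in write mode.
    // Returns null when the buffer does not yet hold a complete line.
    // In both cases the buffer is back in write mode on return.
    static String pollLine(ByteBuffer buffer) {
        buffer.flip(); // switch to read mode
        try {
            StringBuilder line = new StringBuilder();
            boolean lastWasCR = false;
            while (buffer.hasRemaining()) {
                char c = (char) (buffer.get() & 0xFF); // ASCII byte to char
                if (lastWasCR && c == '\n') {
                    line.setLength(line.length() - 1); // drop the CR
                    return line.toString();
                }
                line.append(c);
                lastWasCR = (c == '\r');
            }
            buffer.position(0); // incomplete line: consume nothing
            return null;
        } finally {
            buffer.compact(); // keep unread bytes, switch back to write mode
        }
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(64);
        buffer.put("HTTP/1.1 200 OK\r\nDate".getBytes(StandardCharsets.US_ASCII));
        System.out.println(pollLine(buffer)); // a complete line is available
        System.out.println(pollLine(buffer)); // "Date" alone is not a full line
    }
}
```

When pollLine returns null, a real reader would read more bytes from the channel into the same buffer and try again.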
Implement the readLineCRLF method. You can start by testing it with the examples in the main method.
Next, check its behavior against the JUnit tests in the file HTTPReaderTest.java (you will need the file FakeHTTPServer.java).
Write the readHeader method. It must return an object of the (provided) class HTTPHeader. To build this object, use the method create, which accepts as parameters:
- a String that is the first (status) line of the response,
- a map that associates each header field with its value.
A header field may appear several times in a response. For instance:

    Set-cookie: x-wl-uid=1smBggFQdYEUGLgg29x3Qr/zAwfq42jdGu0mYszL1+mrt/ABZ8xw43Ise90maJaHGuGvUKVQ+0gM=; path=/; domain=.amazon.fr; expires=Mon, 31-Dec-2035 23:00:01 GMT
    Set-cookie: session-id-time=2082754801l; path=/; domain=.amazon.fr; expires=Mon, 31-Dec-2035 23:00:01 GMT
    Set-cookie: session-id=276-2784413-9232431; path=/; domain=.amazon.fr; expires=Mon, 31-Dec-2035 23:00:01 GMT

is equivalent to a single header field Set-cookie with the three strings concatenated with semicolon (";") characters.
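One possible way to merge repeated header fields while building the map is sketched below. The class and method names are hypothetical, and the sketch keeps field names exactly as received; whether HTTPHeader normalizes their case is not assumed here.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, used only for illustration.
final class HeaderMergeSketch {
    // Turns header lines ("name: value") into a map. When a field name
    // appears several times, its values are joined with ";".
    static Map<String, String> toFields(List<String> headerLines) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : headerLines) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("Malformed header line: " + line);
            }
            String name = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            // merge() stores the value, or combines it with the existing one.
            fields.merge(name, value, (oldValue, newValue) -> oldValue + ";" + newValue);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Shortened, made-up cookie values for readability.
        List<String> lines = List.of(
                "Set-cookie: a=1; path=/",
                "Content-Type: text/html",
                "Set-cookie: b=2; path=/");
        System.out.println(toFields(lines));
    }
}
```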
Implement the readBytes method.
Write a client, HTTPClient, that takes as arguments a server's address and a resource: the client asks the server for the resource on TCP port 80 and, if the resource is in text/html or text/plain, displays it.
For now, consider only answers with a Content-Length header field.
Next, take into account answers received in chunked transfer mode. To do so, implement the readChunks method in class HTTPReader.
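For reference, in chunked mode the body is a sequence of chunks, each made of a size line in hexadecimal terminated by CRLF, then that many bytes of data followed by CRLF; a chunk of size 0 marks the end. The sketch below decodes such a body from a byte array that is already entirely in memory, which is a simplifying assumption: the real readChunks has to work incrementally on the channel. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, used only for illustration.
final class ChunkedSketch {
    // Decodes a complete chunked body held in memory.
    static byte[] decode(byte[] raw) {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        int pos = 0;
        while (true) {
            int lineEnd = indexOfCRLF(raw, pos);
            if (lineEnd < 0) {
                throw new IllegalArgumentException("Missing chunk-size line");
            }
            String sizeLine = new String(raw, pos, lineEnd - pos, StandardCharsets.US_ASCII);
            int semicolon = sizeLine.indexOf(';'); // optional extensions are ignored
            String hex = (semicolon < 0 ? sizeLine : sizeLine.substring(0, semicolon)).trim();
            int size = Integer.parseInt(hex, 16);
            if (size < 0) {
                throw new IllegalArgumentException("Negative chunk size");
            }
            pos = lineEnd + 2; // skip the size line and its CRLF
            if (size == 0) {
                return body.toByteArray(); // last chunk; trailers are ignored
            }
            if (pos + size + 2 > raw.length) {
                throw new IllegalArgumentException("Truncated chunk");
            }
            body.write(raw, pos, size);
            pos += size + 2; // skip the data and the CRLF that follows it
        }
    }

    // Index of the first CRLF at or after 'from', or -1 if there is none.
    static int indexOfCRLF(byte[] data, int from) {
        for (int i = from; i + 1 < data.length; i++) {
            if (data[i] == '\r' && data[i + 1] == '\n') {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] raw = "4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(new String(decode(raw), StandardCharsets.US_ASCII));
    }
}
```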
Modify your client to handle responses with status codes 301 and 302.
In order to test the 302 status code, you can use:
getResource("www-igm.univ-mlv.fr","/~carayol/redirect.php")
In order to parse the location header field, you can use the class URL. You can use methods URL.getHost() to get the address of the server and URL.getPath() to get the resource path.
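A minimal sketch of this parsing step is shown below. The class name, the method name and the example address are hypothetical. The URL is built through URI.create(...).toURL() rather than the URL constructor, which is one possible choice among others, and the sketch assumes that the location value is an absolute URL.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Hypothetical helper class, used only for illustration.
final class LocationSketch {
    // Splits an absolute location such as "http://host/path" into
    // {host, path}. An empty path is replaced by "/".
    static String[] hostAndPath(String location) {
        try {
            URL url = URI.create(location).toURL();
            String path = url.getPath().isEmpty() ? "/" : url.getPath();
            return new String[] { url.getHost(), path };
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Unsupported location: " + location, e);
        }
    }

    public static void main(String[] args) {
        // Made-up address, not taken from the exercise.
        String[] target = hostAndPath("http://www.example.org/docs/index.html");
        System.out.println("host = " + target[0] + ", path = " + target[1]);
    }
}
```

The two values can then be used to open a new connection and send a new GET request to the redirected address.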