In the HTTP protocol, once the TCP connection is established with the server (by default on port 80), the client sends requests. The most common requests are GET, HEAD and POST. A GET request asks the server to send a resource (for instance a web page).
The server responds with an HTTP response containing the requested resource.
GET
In HTTP 1.1, a GET request is of the form:

    GET /index.html HTTP/1.1
    Host: igm.univ-mlv.fr

Each line is encoded in ASCII and is terminated by the two characters CR and LF (Carriage Return, Line Feed; "\r\n" in Java). A request starts with a Request-Line, followed by several CRLF-terminated lines, and ends with an empty CRLF-terminated line. The Request-Line is composed of the request method (GET in our example), then the resource (/index.html in our example) and finally the HTTP version (HTTP/1.1). The other lines of the request have the format field: value. In our example, the field Host has the value igm.univ-mlv.fr.
The above request is sent on a TCP connection established on port 80 of the server hosting the site igm.univ-mlv.fr and asks for the resource /index.html.
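As an illustration of the request format described above, here is a minimal sketch that builds the example request as a Java string and encodes it in ASCII. The class name GetRequestSketch and its two methods are hypothetical helpers, not part of the provided files.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class GetRequestSketch {
    // Builds the text of a minimal HTTP/1.1 GET request.
    static String buildGetRequest(String host, String resource) {
        return "GET " + resource + " HTTP/1.1\r\n"  // Request-Line
             + "Host: " + host + "\r\n"              // header line "field: value"
             + "\r\n";                               // empty line that ends the request
    }

    // Encodes the request in ASCII; the returned buffer is ready to be
    // written to a channel (position 0, limit = number of encoded bytes).
    static ByteBuffer encode(String request) {
        return StandardCharsets.US_ASCII.encode(request);
    }

    public static void main(String[] args) {
        String request = buildGetRequest("igm.univ-mlv.fr", "/index.html");
        System.out.print(request);
        System.out.println(encode(request).remaining() + " bytes to send");
    }
}
```

In a real client, the buffer returned by encode would be written to the SocketChannel connected to port 80 of the server.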
The server's answer is composed of two parts: a header, then a body.
A server's response looks like:

    HTTP/1.1 200 OK
    Date: Thu, 01 Mar 2018 17:28:07 GMT
    Server: Apache
    Last-Modified: Thu, 15 Sep 2016 09:02:49 GMT
    ETag: "254441f-3d0a-53c881c25a040"
    Accept-Ranges: bytes
    Content-Length: 15626
    Content-Type: text/html

    <!DOCTYPE html>
    <html>
    <head>
    ...

Each header line is encoded in ASCII and terminated by CRLF. The very first (status) line:

    HTTP/1.1 200 OK

gives the version of the protocol used for the response (here HTTP/1.1), then the response code (200) and a textual message corresponding to this code (OK).
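To make the structure of the status line concrete, the following sketch splits it into its three components. The class StatusLineSketch is a hypothetical helper and is not the provided HTTPHeader class; it throws IllegalArgumentException on malformed input, which is only one possible choice.

```java
// Hypothetical helper, for illustration only.
final class StatusLineSketch {
    final String version;  // e.g. "HTTP/1.1"
    final int code;        // e.g. 200
    final String message;  // e.g. "OK" (may contain spaces, as in "Not Found")

    private StatusLineSketch(String version, int code, String message) {
        this.version = version;
        this.code = code;
        this.message = message;
    }

    // Parses a status line given without its terminating CRLF.
    static StatusLineSketch parse(String line) {
        // Split on the first two spaces only, so the message keeps its own spaces.
        String[] parts = line.split(" ", 3);
        if (parts.length < 2) {
            throw new IllegalArgumentException("malformed status line: " + line);
        }
        int code = Integer.parseInt(parts[1]);
        String message = parts.length == 3 ? parts[2] : "";
        return new StatusLineSketch(parts[0], code, message);
    }

    public static void main(String[] args) {
        StatusLineSketch status = parse("HTTP/1.1 200 OK");
        System.out.println(status.version + " / " + status.code + " / " + status.message);
    }
}
```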
The following header lines are of the form:

    header-field-name: header-field-value

Among other useful information, Content-Length gives the number of bytes of the response body (here, 15626): the body starts just after the empty line (CRLF) that follows the last CRLF-terminated header line. Since in HTTP/1.1 the server doesn't close the TCP connection after answering a request, the client needs this information (Content-Length) to identify the end of the response.
Another header field, Content-Type, specifies that the resource is in HTML.
When Content-Length is not used, another transfer mode exists, through chunks, specified by this header field:

    Transfer-Encoding: chunked

How it works is described there. You don't need to look at this mode of operation for now.
The aim of this exercise is to code a small HTTP client. Our client will be able to make GET requests and to display the body of the response (if it is in text/plain or text/html) after decoding it with the charset specified in the header. We'll use version 1.1 of the HTTP protocol for requests.
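The charset mentioned above usually travels as a parameter of the Content-Type value, for example text/html; charset=UTF-8. The sketch below shows one way to extract it. The class CharsetSketch and its fallback parameter are assumptions made for illustration; the provided classes may already offer an equivalent.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class CharsetSketch {
    // Returns the charset named in a Content-Type value, or the fallback
    // when no charset parameter is present or its name is not usable.
    static Charset charsetOf(String contentType, Charset fallback) {
        for (String param : contentType.split(";")) {
            String p = param.trim();
            // Case-insensitive test for the prefix "charset=" (8 characters).
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=UTF-8", StandardCharsets.ISO_8859_1));
        System.out.println(charsetOf("text/html", StandardCharsets.ISO_8859_1));
    }
}
```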
The main difficulty lies in processing the server response. The problem is that you cannot bound the header size a priori. When you read from the SocketChannel, you cannot know in advance whether you will read enough bytes to get a whole header line, all the header lines, or even part of the response body.
To deal with these difficulties, we give you a skeleton of a class HTTPReader that you will complete to handle server responses.
Start with the files HTTPReader.java and HTTPException.java as a basis.
An instance of the class HTTPReader holds two fields: the SocketChannel on which the response has to be read and a ByteBuffer to store the bytes read.
All methods of this class must respect these rules:
The ByteBuffer must be in write mode before and after the method invocation (its work-zone is not ready to be read, but ready to be filled, if the buffer is not full).
New bytes may be read from the channel only once the ByteBuffer is empty (all its content must be consumed before reading new bytes from the channel).
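The write-mode rule above is typically implemented with the flip/compact pattern. The sketch below consumes bytes that are already buffered while leaving the buffer in write mode on entry and on exit. The class WriteModeSketch and the method takeBuffered are hypothetical; no channel is involved, and the put call in main only simulates a previous read from the channel.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class WriteModeSketch {
    // Removes up to max already-buffered bytes and returns them.
    // The buffer is in write mode when the method starts and when it returns.
    static byte[] takeBuffered(ByteBuffer buffer, int max) {
        buffer.flip();                 // write mode -> read mode
        try {
            byte[] out = new byte[Math.min(max, buffer.remaining())];
            buffer.get(out);           // consume the bytes
            return out;
        } finally {
            buffer.compact();          // keep unread bytes, back to write mode
        }
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(16);
        // Simulates bytes previously stored by a read from the channel.
        buffer.put("HTTP/1.1".getBytes(StandardCharsets.US_ASCII));
        byte[] first = takeBuffered(buffer, 4);
        System.out.println(new String(first, StandardCharsets.US_ASCII));
        System.out.println("bytes still buffered: " + buffer.position());
    }
}
```

A caller that receives fewer bytes than requested knows the buffer is now empty, which is the moment when reading again from the channel becomes legitimate.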
Implement the readLineCRLF method. You can start by testing it with the examples in the main method.
Next, check its behavior against the JUnit tests in the file HTTPReaderTest.java (you will need the file FakeHTTPServer.java).
Write the readHeader method. It must return an object of the (provided) class HTTPHeader, built with its method create, which accepts as parameters a String that is the first (status) line of the response and a map that associates each header field to its value.
Note that a header field can appear several times. For example:

    Set-cookie: x-wl-uid=1smBggFQdYEUGLgg29x3Qr/zAwfq42jdGu0mYszL1+mrt/ABZ8xw43Ise90maJaHGuGvUKVQ+0gM=; path=/; domain=.amazon.fr; expires=Mon, 31-Dec-2035 23:00:01 GMT
    Set-cookie: session-id-time=2082754801l; path=/; domain=.amazon.fr; expires=Mon, 31-Dec-2035 23:00:01 GMT
    Set-cookie: session-id=276-2784413-9232431; path=/; domain=.amazon.fr; expires=Mon, 31-Dec-2035 23:00:01 GMT

is equivalent to a single header field Set-cookie with the three strings concatenated with semicolon (";") characters.
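The concatenation of repeated header fields can be sketched with Map.merge. In the sketch below, the class HeaderMergeSketch and the method toMap are hypothetical; the lines are assumed to be already read and stripped of their CRLF, field names are kept exactly as received, and IllegalArgumentException stands in for whatever exception your code should raise on a malformed line.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only.
final class HeaderMergeSketch {
    // Turns header lines of the form "field: value" into a map.
    // Values of a field that appears several times are joined with ";".
    static Map<String, String> toMap(List<String> headerLines) {
        Map<String, String> fields = new HashMap<>();
        for (String line : headerLines) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("not a header line: " + line);
            }
            String name = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            // First occurrence: store the value; later ones: append after ";".
            fields.merge(name, value, (previous, added) -> previous + ";" + added);
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> fields = toMap(List.of(
                "Content-Length: 15626",
                "Set-cookie: a=1; path=/",
                "Set-cookie: b=2; path=/"));
        System.out.println(fields.get("Set-cookie"));
    }
}
```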
Implement the readBytes method.
Write a client, HTTPClient, that takes as arguments a server's address and a resource: the client asks the server for the resource on TCP port 80 and, if it is in text/html or text/plain, displays it.
For now, only consider answers with a Content-Length header field.
Next, take into account answers received in chunked transfer mode. To do so, implement the readChunks method in the class HTTPReader.
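In chunked mode, the body is sent as a sequence of chunks: each chunk starts with a line giving its size in hexadecimal, followed by that many bytes of data and a CRLF, and a chunk of size 0 marks the end. The sketch below decodes such a body from a buffer that already contains all the bytes; it is not the readChunks method itself, which must also read from the channel when the buffer runs out. The class ChunkedSketch is hypothetical, and optional trailer lines after the last chunk are not handled.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class ChunkedSketch {
    // Reads one CRLF-terminated ASCII line from a buffer in read mode.
    // Throws BufferUnderflowException if the buffer ends before the CRLF.
    private static String readLine(ByteBuffer in) {
        StringBuilder sb = new StringBuilder();
        while (true) {
            char c = (char) (in.get() & 0xFF);
            if (c == '\n' && sb.length() > 0 && sb.charAt(sb.length() - 1) == '\r') {
                sb.setLength(sb.length() - 1);   // drop the '\r'
                return sb.toString();
            }
            sb.append(c);
        }
    }

    // Decodes a complete chunked body held in a buffer in read mode.
    static byte[] decode(ByteBuffer in) {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        while (true) {
            String sizeLine = readLine(in);
            int semi = sizeLine.indexOf(';');    // ignore optional chunk extensions
            String hex = (semi < 0 ? sizeLine : sizeLine.substring(0, semi)).trim();
            int size = Integer.parseInt(hex, 16);
            if (size == 0) {
                return body.toByteArray();       // last chunk reached
            }
            byte[] chunk = new byte[size];
            in.get(chunk);                       // chunk data
            body.write(chunk, 0, size);
            readLine(in);                        // CRLF that follows the data
        }
    }

    public static void main(String[] args) {
        byte[] raw = "4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n".getBytes(StandardCharsets.US_ASCII);
        byte[] body = decode(ByteBuffer.wrap(raw));
        System.out.println(new String(body, StandardCharsets.US_ASCII));
    }
}
```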
Modify your client to handle responses with status codes 301 and 302.
To test the 302 status code, you can use:

    getResource("www-igm.univ-mlv.fr","/~carayol/redirect.php")

To parse the Location header field, you can use the class URL: the method URL.getHost() gives the address of the server and URL.getPath() gives the resource path.
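As a rough sketch of this last step, the code below extracts the server address and the resource path from a Location value. The class RedirectSketch, the method hostAndPath and the example address are hypothetical. The URL is obtained through URI.create(...).toURL(), one way among others to build it, and an empty path is replaced by "/" so that it can be used directly in a Request-Line.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Hypothetical helper, for illustration only.
final class RedirectSketch {
    // Returns { host, path } for an absolute Location value.
    static String[] hostAndPath(String location) {
        try {
            URL url = URI.create(location).toURL();
            String path = url.getPath().isEmpty() ? "/" : url.getPath();
            return new String[] { url.getHost(), path };
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("unsupported location: " + location, e);
        }
    }

    public static void main(String[] args) {
        // Made-up Location value, not an actual server response.
        String[] target = hostAndPath("http://www.example.org/moved/page.html");
        System.out.println("host = " + target[0] + ", path = " + target[1]);
    }
}
```

The client can then issue a new GET request to the returned host for the returned path.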