The coding project that accompanies the Bioinformatics lecture is to write a genome assembler that reconstructs as much of the target genome as possible from paired-end reads and long reads.

===Input Format:===
The input is a file starting with a number x. After that, each line contains either a long read or a pair of reads separated by whitespace. The mean insert size among all paired-end reads is approximately x. All reads consist exclusively of the characters 'A', 'C', 'G', 'T'. Reads may contain errors in the form of insertions, deletions, or substitutions; long reads have a significantly higher error rate than short reads.

===Output Format:===
The output is a file containing one contig per line, each consisting exclusively of the characters 'A', 'C', 'G', 'T'.

===Scoring:===
Your assemblers will be run on a number of test instances generated randomly by the provided read-generator (possibly with parameters different from those you expect). For each instance, the contigs are scored by their best alignment to the target genome and by their number (the best score is obtained by a single contig that matches the target genome sequence). Finally, a small portion of the score depends on the running time of your code, and we reserve the right to enforce a (reasonable) timeout on the programs.

===Notes:===
Your assembler should be written in Python or C++; reasonable includes are permitted. If in doubt, talk to us :)

Good luck, have fun coding, and don't hesitate to contact Laurent or Mathias if you're unsure about any aspect of this project.

Cheers,
Laurent & Mathias
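To illustrate the input and output formats described above, here is a minimal Python sketch of a reader and a writer. It is not part of the official project material. The function names `parse_reads` and `write_contigs` are hypothetical, and the sketch assumes that x stands alone on the first non-empty line, that x can be parsed as a floating-point number, and that blank lines may be ignored. None of these details is specified in the project description, so check them against the files produced by the provided read-generator.

```python
import io

VALID_BASES = set("ACGT")


def parse_reads(handle):
    """Parse an input file into (insert_size, paired_reads, long_reads).

    Assumption: the number x is alone on the first non-empty line.
    """
    # Strip whitespace and skip blank lines (assumption: they carry no meaning).
    lines = (line.strip() for line in handle)
    lines = (line for line in lines if line)

    try:
        insert_size = float(next(lines))
    except StopIteration:
        raise ValueError("input is empty") from None

    paired_reads = []  # list of (read1, read2) tuples
    long_reads = []    # list of single reads

    for line in lines:
        fields = line.split()  # split on any run of whitespace
        for read in fields:
            if set(read) - VALID_BASES:
                raise ValueError(f"unexpected character in read: {read!r}")
        if len(fields) == 1:
            long_reads.append(fields[0])
        elif len(fields) == 2:
            paired_reads.append((fields[0], fields[1]))
        else:
            raise ValueError(f"expected 1 or 2 reads per line, got {len(fields)}")

    return insert_size, paired_reads, long_reads


def write_contigs(handle, contigs):
    """Write one contig per line, as the output format requires."""
    for contig in contigs:
        handle.write(contig + "\n")


# Tiny made-up example: one read pair and one long read.
sample = "300\nACGT TTGA\nACGTACGTTTGACCA\n"
insert_size, paired_reads, long_reads = parse_reads(io.StringIO(sample))
```

With real files, pass an open file object instead of `io.StringIO`, for example `with open(path) as f: parse_reads(f)`. Rejecting unexpected characters early is a design choice that makes format misunderstandings visible before any assembly work starts.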