This time the topic of our new DeepHack hackathon is machine translation (MT). At the hackathon we will tackle semi-supervised neural MT: we will try to improve a neural MT model with monolingual data. The qualification task, however, is traditional supervised MT. It is meant to familiarise participants with the MT field.

For this task we picked a well-studied language pair (English-German) and put no restrictions on the data or model architecture used.

Teams

Attention! Each participant must register and upload his/her own solution. If you developed a solution as a team, each team member must register and upload that solution. After the competition ends you will be asked whether you are participating as a team. You can also email info@deephack.me about your team. A whole team qualifies based on the lowest score among its members.

Task

The task is translation of IT texts from English into German, in line with the WMT'16 IT translation task: http://www.statmt.org/wmt16/it-translation-task.html

The data can alternatively be downloaded from OPUS; choose the En-De language pair there. OPUS also contains additional datasets that could be useful.

There is no restriction on training data for the task. You can use the data provided for the WMT'16 IT translation task or any other datasets. The test data consists of new texts in the IT domain.

Metric

We will evaluate submissions with the BLEU score.

  • BLEU - the BLEU score measures how many words and n-grams (sequences of n consecutive words) in a given translation overlap with a reference translation. The most commonly used version is BLEU-4, which considers single words, bigrams, trigrams and 4-grams. It also applies a penalty to translations that are too short.
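
To make the description of BLEU-4 concrete, here is a minimal sketch of a corpus-level computation with one reference per sentence. It is not the MOSES or Google implementation: the function names ngrams and corpus_bleu4 are made up for this example, and plain whitespace tokenisation is an assumption, so the scores will not exactly match the official tools.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu4(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100) with one reference per hypothesis.

    Assumption of this sketch: tokens are obtained by whitespace splitting.
    """
    matches = [0] * max_n  # clipped n-gram matches, one slot per order
    totals = [0] * max_n   # number of hypothesis n-grams, one slot per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    # Without smoothing, a zero match count for any order gives a score of 0.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the four modified precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: applies only when the output is shorter than the reference.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)
```

For example, corpus_bleu4(["a b c d"], ["a b c d e"]) has perfect n-gram precision but is one word shorter than the reference, so only the brevity penalty lowers the score.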

For local validation you can use the reference implementation of BLEU from MOSES or its Python version from Google.

Submission

Your solution should be a Docker image that is run with two mounted folders, input and output, by the "docker run -v /path/to/input_data:/data -v /path/to/output:/output -t {image} {entry_point}" command. The submission must be a zip archive containing the file metadata.json, which should contain the following fields:

  • entry_point - the command that the Docker container is run with
  • image - the Docker repo name
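
As an illustration of the two fields above, the following sketch writes metadata.json into a zip archive. The helper name build_submission, the image name and the entry point are hypothetical, and placing metadata.json at the root of the archive is an assumption; the provided sample submission remains the reference for the exact layout.

```python
import json
import zipfile


def build_submission(image, entry_point, zip_path="submission.zip"):
    """Write metadata.json with the two required fields into a zip archive."""
    metadata = {"image": image, "entry_point": entry_point}
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        # Assumption: metadata.json sits at the root of the archive.
        archive.writestr("metadata.json", json.dumps(metadata, indent=2))
    return zip_path


if __name__ == "__main__":
    # Both values are hypothetical examples, not real names.
    build_submission("myuser/en-de-mt", "python /app/translate.py")
```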

The input folder (mounted at /data in the container) contains a file input.txt with the test sentences in the source language (the sentences that the model should translate). The model should write the translated sentences to output.txt in the output folder (mounted at /output).
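
A minimal sketch of an entry-point script that follows this file contract is shown below. The paths /data and /output come from the docker run command above; the translate function is a hypothetical placeholder for your model, and the one-sentence-per-line layout and UTF-8 encoding are assumptions.

```python
import os


def translate(sentence):
    """Hypothetical placeholder: replace with a call to your trained MT model."""
    return sentence  # identity "translation", for illustration only


def run(input_dir="/data", output_dir="/output"):
    """Translate input.txt line by line and write one output line per input line."""
    in_path = os.path.join(input_dir, "input.txt")
    out_path = os.path.join(output_dir, "output.txt")
    with open(in_path, encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(translate(line.rstrip("\n")) + "\n")


if __name__ == "__main__":
    # Only run when the mounted input file is present (i.e. inside the container).
    if os.path.isfile("/data/input.txt"):
        run()
    else:
        print("no /data/input.txt found; nothing to translate")
```

Writing exactly one output line per input line keeps the translations aligned with the references during scoring.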

We provide a sample solution. The sample submission is included in this repository as sample_submission.zip.