Translates the input data into one or more intermediate languages and then back into the source language. The round trip changes the syntax and wording of the input text while aiming to preserve its meaning. Because this task uses a deep learning model, Facebook’s M2M-100, to perform translations, a GPU is recommended for fast processing.
If the backtranslation is of poor quality, try increasing the beam size. However, a larger beam size consumes more memory and takes more time. You can also try choosing intermediate languages that are closely related to the source language. For example, if your source language is Korean, translating to Chinese and/or Japanese and back might give better results than translating to Spanish.
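The round trip can be sketched in Python as follows. This is a minimal illustration, not the job's implementation: `translate` is a hypothetical callable standing in for the translation model, and each intermediate language gets its own round trip (chaining several intermediate languages in sequence is another possible reading). With Hugging Face Transformers, such a callable could wrap `M2M100ForConditionalGeneration`, where the beam search width is set through the `num_beams` argument of `generate()`.

```python
from typing import Callable, List, Sequence

# Hypothetical translator signature: translate(text, source_lang, target_lang) -> str.
# In practice this might wrap a model such as M2M-100.
Translator = Callable[[str, str, str], str]


def backtranslate(text: str, source_lang: str, pivot_langs: Sequence[str],
                  translate: Translator) -> List[str]:
    """Translate `text` into each pivot language and back; one candidate per pivot."""
    candidates = []
    for pivot in pivot_langs:
        intermediate = translate(text, source_lang, pivot)
        candidates.append(translate(intermediate, pivot, source_lang))
    return candidates


def toy_translate(text: str, source_lang: str, target_lang: str) -> str:
    """Stand-in translator that only tags the text, so the round trip is visible."""
    return f"[{source_lang}->{target_lang}] {text}"


if __name__ == "__main__":
    # Korean source with Chinese and Japanese as intermediate languages.
    for paraphrase in backtranslate("안녕하세요", "ko", ["zh", "ja"], toy_translate):
        print(paraphrase)
```

Passing the translator in as a parameter keeps the round-trip logic independent of the model, so the same function works with a GPU-backed model or a stub in tests.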
Use the synonym substitution job as an alternative if you’re unable to provision the necessary hardware and/or this job is taking too long. Note that the synonym substitution job does not support all of the same languages.
Supported Languages: Chinese, Dutch, English, French, German, Hebrew, Italian, Japanese, Korean, Polish, Spanish, Ukrainian
Substitutes some words in the input text with synonyms drawn from the included WordNet/PPDB dictionaries or from user-supplied dictionaries. User-supplied dictionaries must use the Lucene/Solr synonym format, as shown in the example below.
Example synonyms.txt file:
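A minimal Python sketch of how a dictionary in this format could be parsed and applied. It assumes the core Solr syntax: `#` comment lines, comma-separated lists of equivalent terms, and `=>` for one-way mappings. The sample entries and the `parse_solr_synonyms` and `substitute_synonyms` helpers are hypothetical, and unlike the Lucene analyzers the sketch only matches single whitespace-separated words.

```python
import random
from typing import Dict, List

# Hypothetical sample file in the Solr synonym format.
SAMPLE_SYNONYMS = """
# Lines starting with '#' are comments.
couch, sofa, divan
tv, television
automobile => car, vehicle
"""


def _terms(part: str) -> List[str]:
    """Split a comma-separated list of terms, dropping blanks."""
    return [t.strip() for t in part.split(",") if t.strip()]


def parse_solr_synonyms(text: str) -> Dict[str, List[str]]:
    """Map each term to the terms it may be replaced with."""
    mapping: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "=>" in line:
            # One-way mapping: left-hand terms are rewritten to right-hand terms.
            left, right = line.split("=>", 1)
            sources, targets = _terms(left), _terms(right)
        else:
            # Equivalence list: any term can stand in for any other.
            sources = targets = _terms(line)
        for s in sources:
            mapping.setdefault(s, []).extend(t for t in targets if t != s)
    return mapping


def substitute_synonyms(text: str, synonyms: Dict[str, List[str]],
                        prob: float, rng: random.Random) -> str:
    """Replace each word that has synonyms with a random one, with probability `prob`."""
    out = []
    for word in text.split():
        candidates = synonyms.get(word.lower())
        if candidates and rng.random() < prob:
            out.append(rng.choice(candidates))
        else:
            out.append(word)
    return " ".join(out)


if __name__ == "__main__":
    syn = parse_solr_synonyms(SAMPLE_SYNONYMS)
    print(substitute_synonyms("the couch faces the tv", syn, 0.5, random.Random(7)))
```

Seeding a `random.Random` instance makes a given augmentation run reproducible without touching the global random state.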
Simulates typos a person might make based on the keyboard layout. For example, someone typing in English on a QWERTY keyboard might accidentally type “t” instead of “y” in the word “keyboard”, because “y” and “t” are next to each other on the keyboard. Currently, only QWERTY keyboard layouts are supported.
Users can provide their own keyboard mapping as a JSON file uploaded to the Fusion blob store. The JSON file should use the following format: {"a": "x", "b": "v", …}.
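The idea can be illustrated with a short Python sketch. The adjacency table below is an approximation of a staggered QWERTY letter layout (digits and punctuation are ignored), and the sketch assumes each value in a custom mapping is a string of candidate replacement characters. The function names are hypothetical and do not reflect the job's actual implementation.

```python
import json
import random
from typing import Dict

# Letter rows of a QWERTY keyboard; each lower row is shifted slightly right.
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]


def qwerty_neighbors() -> Dict[str, str]:
    """Approximate the keys physically adjacent to each letter on QWERTY."""
    neighbors: Dict[str, str] = {}
    for r, row in enumerate(QWERTY_ROWS):
        for i, key in enumerate(row):
            near = []
            # Same row: left/right. Row above: columns i, i+1. Row below: i-1, i.
            for dr, cols in ((0, (i - 1, i + 1)), (-1, (i, i + 1)), (1, (i - 1, i))):
                rr = r + dr
                if 0 <= rr < len(QWERTY_ROWS):
                    near.extend(QWERTY_ROWS[rr][c] for c in cols
                                if 0 <= c < len(QWERTY_ROWS[rr]))
            neighbors[key] = "".join(near)
    return neighbors


def load_mapping(json_text: str) -> Dict[str, str]:
    """Parse a user-supplied mapping such as {"a": "x", "b": "v"}."""
    mapping = json.loads(json_text)
    if not isinstance(mapping, dict):
        raise ValueError("keyboard mapping must be a JSON object")
    return mapping


def keyboard_typos(text: str, mapping: Dict[str, str], prob: float,
                   rng: random.Random) -> str:
    """Replace each mapped character with a neighbor with probability `prob`."""
    out = []
    for ch in text:
        candidates = mapping.get(ch.lower())
        if candidates and rng.random() < prob:
            sub = rng.choice(candidates)
            # Keep the original capitalization.
            out.append(sub.upper() if ch.isupper() else sub)
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(keyboard_typos("keyboard", qwerty_neighbors(), 0.2, random.Random(42)))
```

For example, `qwerty_neighbors()["y"]` contains “t”, “u”, “g”, and “h”, which covers the “y” to “t” slip described above.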
Supported Languages: Dutch, English, French, German, Hebrew, Italian, Polish, Spanish, Ukrainian
Randomly splits words by inserting a space (“ ”) at some point within the word.
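A minimal Python sketch of the split, assuming words are whitespace-separated and that each word of two or more characters is split with probability `prob`; the function name and parameters are hypothetical.

```python
import random


def split_words(text: str, prob: float, rng: random.Random) -> str:
    """Insert a space at a random interior position of some words."""
    out = []
    for word in text.split():
        if len(word) >= 2 and rng.random() < prob:
            # Cut strictly inside the word so both halves are non-empty.
            cut = rng.randint(1, len(word) - 1)
            out.append(word[:cut] + " " + word[cut:])
        else:
            out.append(word)
    return " ".join(out)


if __name__ == "__main__":
    print(split_words("randomly split these words", 0.5, random.Random(3)))
```

Removing the inserted spaces always recovers the original characters, so the augmentation changes tokenization but not spelling.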
Supported Languages: Dutch, English, French, German, Italian, Polish, Spanish