Animenosekai / translate

A module grouping multiple translation APIs

Is the DeepL split text correct?

swfsql opened this issue · comments

I believe that the DeepL text splitting should split the text into an array of text lines.

Currently it appears that the regex doesn't split by line, and the regex call doesn't change the text variable.

But I'm not sure whether I'm confusing the calls to LMT_split_into_sentences vs LMT_split_text (using DeepL in the browser, it appears to use the latter).

I made a test in the browser, translating "こんにちは世界" and "こんにちは" on two separate lines, and in case it helps, it made the following calls:

An OPTIONS preflight and then a POST request to https://www2.deepl.com/jsonrpc?method=LMT_split_text:

{
    "jsonrpc":"2.0",
    "method":"LMT_split_text",
    "params":{
        "texts":["こんにちは世界","こんにちは"],
        "commonJobParams":{"mode":"translate"},
        "lang":{
            "lang_user_selected":"JA",
            "preference":{
                "weight":/*..*/,
                "default":"default"
            }
        }
    },
    "id":/*..*/
}

With the response:

{
    "jsonrpc":"2.0",
    "id":/*..*/,
    "result":{
        "lang":{"detected":"JA","isConfident":true,"detectedLanguages":{"JA":1.0}},
        "texts":[
            {"chunks":[{"sentences":[{"prefix":"","text":"\u3053\u3093\u306B\u3061\u306F\u4E16\u754C"}]}]},
            {"chunks":[{"sentences":[{"prefix":"","text":"\u3053\u3093\u306B\u3061\u306F"}]}]}
        ]
    }
}
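
For reference, the sentence splits can be pulled out of a response shaped like the one above as follows. This is only a sketch: the key names (`result`, `texts`, `chunks`, `sentences`) are taken from the captured JSON, not from any documented DeepL API.

```python
# Parse an LMT_split_text-style response into one list of sentence
# strings per input text. Key names mirror the captured JSON only.
response = {
    "jsonrpc": "2.0",
    "result": {
        "lang": {"detected": "JA", "isConfident": True},
        "texts": [
            {"chunks": [{"sentences": [{"prefix": "", "text": "こんにちは世界"}]}]},
            {"chunks": [{"sentences": [{"prefix": "", "text": "こんにちは"}]}]},
        ],
    },
}

split_texts = [
    [sentence["text"] for chunk in text["chunks"] for sentence in chunk["sentences"]]
    for text in response["result"]["texts"]
]
print(split_texts)  # [['こんにちは世界'], ['こんにちは']]
```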

Then an OPTIONS preflight and then a POST request to https://www2.deepl.com/jsonrpc?method=LMT_handle_jobs:

{
    "jsonrpc":"2.0",
    "method": "LMT_handle_jobs",
    "params":{
        "jobs":[
            {
                "kind":"default",
                "sentences":[{"text":"こんにちは世界","id":0,"prefix":""}],
                "raw_en_context_before":[],
                "raw_en_context_after":["こんにちは"],
                "preferred_num_beams":1
            },
            {
                "kind":"default",
                "sentences":[{"text":"こんにちは","id":1,"prefix":""}],
                "raw_en_context_before":["こんにちは世界"],
                "raw_en_context_after":[],
                "preferred_num_beams":1
            }
        ],
        "lang":{"preference":{"weight":{},"default":"default"},"source_lang_computed":"JA","target_lang":"EN"},
        "priority":1,
        "commonJobParams":{"regionalVariant":"en-US","mode":"translate","browserType":1},
        "timestamp":/*..*/
    },
    "id":/*..*/
}

With the response:

{
    "jsonrpc":"2.0","id":/*..*/,
    "result":{"translations":[
        {"beams":[{"sentences":[{"text":"Hello World","ids":[0]}],"num_symbols":3}],"quality":"normal"},
        {"beams":[{"sentences":[{"text":"Hello world","ids":[1]}],"num_symbols":3}],"quality":"normal"}],
    "target_lang":"EN","source_lang":"JA","source_lang_is_confident":false,"detectedLanguages":/*..*/}
}

(for some reason it incorrectly translated both lines to "Hello world", but it did return a separate beam for each line)
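
Extracting one translation per job from a response shaped like the one above could look like this. Again a sketch: the structure (`translations`, `beams`, `sentences`) mirrors the captured JSON only, and here only the first beam of each job is taken.

```python
# Take the first (best) beam of each translation and join its
# sentence texts. Field names come from the captured JSON above.
response = {
    "result": {
        "translations": [
            {"beams": [{"sentences": [{"text": "Hello World", "ids": [0]}],
                        "num_symbols": 3}], "quality": "normal"},
            {"beams": [{"sentences": [{"text": "Hello world", "ids": [1]}],
                        "num_symbols": 3}], "quality": "normal"},
        ]
    }
}

best = [
    " ".join(sentence["text"] for sentence in translation["beams"][0]["sentences"])
    for translation in response["result"]["translations"]
]
print(best)  # ['Hello World', 'Hello world']
```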

I hope this helps in case there is something missing! (I would run more tests, but I'm getting a "Too Many Requests" error, so I guess my IP was rate-limited for sending invalid requests.)

commented

Thanks for reporting this issue!

Seems like #83 closed it. Let me know if you want to reopen 👍

@Animenosekai Thanks!

On another note, I think the regex is not actually being used: its return value is neither stored in a variable nor used as the function's return value:

SENTENCES_SPLITTING_REGEX.split(text), None
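
If the intent is to return the split sentences, a minimal fix would be to return the expression instead of discarding it. The sketch below uses a hypothetical stand-in pattern and function name (the real ones live in the library's source); the trailing `None` follows the expression quoted above.

```python
import re

# Hypothetical stand-in; the library's actual SENTENCES_SPLITTING_REGEX differs.
SENTENCES_SPLITTING_REGEX = re.compile(r"(?<=[.!?])\s+")

def split_sentences(text):
    # re.Pattern.split returns a new list and never mutates `text`,
    # so a bare `SENTENCES_SPLITTING_REGEX.split(text), None` statement
    # builds a tuple that is immediately discarded. Returning it fixes that.
    return SENTENCES_SPLITTING_REGEX.split(text), None

print(split_sentences("Hello world. How are you?"))
# (['Hello world.', 'How are you?'], None)
```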

commented

Oh, that's right, I should look into it!