Animenosekai / translate

A module grouping multiple translation APIs

Is the DeepL split text correct?

swfsql opened this issue · comments

I believe that the DeepL text splitting should split the text into an array of text lines.

Currently it appears that the regex doesn't split by line, and the regex call doesn't change the text variable.

But I'm not sure whether I'm confusing the calls to LMT_split_into_sentences vs LMT_split_text (using DeepL in the browser, it appears to use the latter).

I made a test in the browser, translating "こんにちは世界" and "こんにちは" on two separate lines, and in case it helps, it made the following calls:

An OPTIONS preflight and then a POST request to https://www2.deepl.com/jsonrpc?method=LMT_split_text:

{
    "jsonrpc":"2.0",
    "method":"LMT_split_text",
    "params":{
        "texts":["こんにちは世界","こんにちは"],
        "commonJobParams":{"mode":"translate"},
        "lang":{
            "lang_user_selected":"JA",
            "preference":{
                "weight":/*..*/,
                "default":"default"
            }
        }
    },
    "id":/*..*/
}

With the response:

{
    "jsonrpc":"2.0",
    "id":/*..*/,
    "result":{
        "lang":{"detected":"JA","isConfident":true,"detectedLanguages":{"JA":1.0}},
        "texts":[
            {"chunks":[{"sentences":[{"prefix":"","text":"\u3053\u3093\u306B\u3061\u306F\u4E16\u754C"}]}]},
            {"chunks":[{"sentences":[{"prefix":"","text":"\u3053\u3093\u306B\u3061\u306F"}]}]}
        ]
    }
}
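
For reference, the sentence splits can be pulled out of a response shaped like the one above as follows. This is only a sketch: the key names (`result`, `texts`, `chunks`, `sentences`) are taken from the captured JSON, not from any documented DeepL API.

```python
# Parse an LMT_split_text-style response into one list of sentence
# strings per input text. Key names mirror the captured JSON only.
response = {
    "jsonrpc": "2.0",
    "result": {
        "lang": {"detected": "JA", "isConfident": True},
        "texts": [
            {"chunks": [{"sentences": [{"prefix": "", "text": "こんにちは世界"}]}]},
            {"chunks": [{"sentences": [{"prefix": "", "text": "こんにちは"}]}]},
        ],
    },
}

split_texts = [
    [sentence["text"] for chunk in text["chunks"] for sentence in chunk["sentences"]]
    for text in response["result"]["texts"]
]
print(split_texts)  # [['こんにちは世界'], ['こんにちは']]
```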

Then an OPTIONS preflight and then a POST request to https://www2.deepl.com/jsonrpc?method=LMT_handle_jobs:

{
    "jsonrpc":"2.0",
    "method": "LMT_handle_jobs",
    "params":{
        "jobs":[
            {
                "kind":"default",
                "sentences":[{"text":"こんにちは世界","id":0,"prefix":""}],
                "raw_en_context_before":[],
                "raw_en_context_after":["こんにちは"],
                "preferred_num_beams":1
            },
            {
                "kind":"default",
                "sentences":[{"text":"こんにちは","id":1,"prefix":""}],
                "raw_en_context_before":["こんにちは世界"],
                "raw_en_context_after":[],
                "preferred_num_beams":1
            }
        ],
        "lang":{"preference":{"weight":{},"default":"default"},"source_lang_computed":"JA","target_lang":"EN"},
        "priority":1,
        "commonJobParams":{"regionalVariant":"en-US","mode":"translate","browserType":1},
        "timestamp":/*..*/
    },
    "id":/*..*/
}

With the response:

{
    "jsonrpc":"2.0","id":/*..*/,
    "result":{"translations":[
        {"beams":[{"sentences":[{"text":"Hello World","ids":[0]}],"num_symbols":3}],"quality":"normal"},
        {"beams":[{"sentences":[{"text":"Hello world","ids":[1]}],"num_symbols":3}],"quality":"normal"}],
    "target_lang":"EN","source_lang":"JA","source_lang_is_confident":false,"detectedLanguages":/*..*/}
}

(for some reason it incorrectly translated both lines to "Hello world", but it did return a separate beam for each line)
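
Extracting one translation per job from a response shaped like the one above could look like this. Again a sketch: the structure (`translations`, `beams`, `sentences`) mirrors the captured JSON only, and here only the first beam of each job is taken.

```python
# Take the first (best) beam of each translation and join its
# sentence texts. Field names come from the captured JSON above.
response = {
    "result": {
        "translations": [
            {"beams": [{"sentences": [{"text": "Hello World", "ids": [0]}],
                        "num_symbols": 3}], "quality": "normal"},
            {"beams": [{"sentences": [{"text": "Hello world", "ids": [1]}],
                        "num_symbols": 3}], "quality": "normal"},
        ]
    }
}

best = [
    " ".join(sentence["text"] for sentence in translation["beams"][0]["sentences"])
    for translation in response["result"]["translations"]
]
print(best)  # ['Hello World', 'Hello world']
```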

I hope this helps in case there is something missing! (I would run more tests, but I'm getting a "Too Many Requests" error, so I guess my IP was rate-limited for sending invalid requests.)

commented

Thanks for reporting this issue!

Seems like #83 closed it. Let me know if you want to reopen 👍

@Animenosekai Thanks!

On another note, I think the regex is not actually being used: its return value is neither stored in a variable nor used as the function's return value:

SENTENCES_SPLITTING_REGEX.split(text), None
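
If the intent is to return the split sentences, a minimal fix would be to return the expression instead of discarding it. The sketch below uses a hypothetical stand-in pattern and function name (the real ones live in the library's source); the trailing `None` follows the expression quoted above.

```python
import re

# Hypothetical stand-in; the library's actual SENTENCES_SPLITTING_REGEX differs.
SENTENCES_SPLITTING_REGEX = re.compile(r"(?<=[.!?])\s+")

def split_sentences(text):
    # re.Pattern.split returns a new list and never mutates `text`,
    # so a bare `SENTENCES_SPLITTING_REGEX.split(text), None` statement
    # builds a tuple that is immediately discarded. Returning it fixes that.
    return SENTENCES_SPLITTING_REGEX.split(text), None

print(split_sentences("Hello world. How are you?"))
# (['Hello world.', 'How are you?'], None)
```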

commented

Oh, that's right, I should look into it!