07th-mod / higurashi-patch-compiler

Compiler for Higurashi patches. It's still under development.

Script to apply translation to newer version

enumag opened this issue

Many people use our patches as a base for translations into languages other than Japanese and English. But our patches are constantly evolving, so it would be nice to give them a way to re-apply a translation from an older version to a new one.

Here I want to write down how I think it should work:

1. Parse the scripts of the old translated version

We need to find all the OutputLine() commands, extract some information and save the result into an SQL table with these columns:

  • filename
  • order
  • japanese text
  • translated text

The order column is a number N stating that the line is the Nth OutputLine call in that filename. We'll need it to decide which translation to use when a conflict occurs.
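A minimal sketch of step 1 in Python, using an in-memory SQLite database. The shape of the OutputLine() call assumed here (NULL, Japanese text, NULL, translated text, line mode) and the names extract_lines and store_lines are illustrative assumptions, not something this issue specifies, so the regular expression would need adjusting against the real scripts.

```python
import re
import sqlite3

# Assumed call shape: OutputLine(NULL, "japanese", NULL, "translation", Line_Normal);
# The real scripts may differ, so treat this pattern as a starting point only.
STRING = r'"((?:[^"\\]|\\.)*)"'
OUTPUT_LINE = re.compile(
    r'OutputLine\(\s*[^,]*,\s*' + STRING + r'\s*,\s*[^,]*,\s*' + STRING + r'\s*,'
)


def extract_lines(filename, script_text):
    """Yield (filename, order, japanese, translated) for each OutputLine call."""
    for order, match in enumerate(OUTPUT_LINE.finditer(script_text), start=1):
        yield filename, order, match.group(1), match.group(2)


def store_lines(connection, rows):
    """Create the table if needed and insert the extracted rows."""
    # "order" is an SQL keyword, so the column name is quoted.
    connection.execute(
        'CREATE TABLE IF NOT EXISTS translations ('
        'filename TEXT, "order" INTEGER, japanese TEXT, translated TEXT)'
    )
    connection.executemany('INSERT INTO translations VALUES (?, ?, ?, ?)', rows)
    connection.commit()


# Made-up sample script text, for illustration only.
sample = '''
OutputLine(NULL, "こんにちは。", NULL, "Bonjour.", Line_Normal);
OutputLine(NULL, "さようなら。", NULL, "Au revoir.", Line_Normal);
'''

connection = sqlite3.connect(":memory:")
store_lines(connection, extract_lines("onik_001.txt", sample))
rows = connection.execute(
    'SELECT filename, "order", japanese, translated '
    'FROM translations ORDER BY "order"'
).fetchall()
```

Counting matches per file with enumerate gives the order column directly, which is what the conflict handling in step 2 relies on.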

2. Parse and update the scripts from our newest patch

Use the same regular expressions again to parse each OutputLine() in our patch. For each of them do the following:

Find matching translations in the table using the filename and the Japanese text. Forking files such as zonik_001_vm00_n01.txt need special handling: the lookup should also check the translations for onik_001.

If there is exactly one match (or several matches that all have the same translation), replace the English text with the translation.

If there is no match, add a prefix such as <missing translation> to the English part so that it can be easily found and fixed manually.

If there are two or more matches with different translations, use the order of the previous line to select the most likely translation for the conflict.
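The matching rules in step 2 could look roughly like this in Python. Two things here are assumptions rather than anything decided in this issue: the fork naming rule (drop a leading "z" and a _vmNN_nNN suffix, inferred from the single zonik_001_vm00_n01.txt example) and the reading of "use the order of the previous line" as "prefer the candidate closest after the previously used old line". The function names are hypothetical.

```python
import re

MISSING_PREFIX = "<missing translation>"


def candidate_filenames(filename):
    """Return the filename plus an assumed base name for forked files.

    Assumption: zonik_001_vm00_n01.txt falls back to onik_001.txt by
    dropping the leading "z" and the _vmNN_nNN suffix.
    """
    names = [filename]
    match = re.fullmatch(r"z(.+?)_vm\d+_n\d+(\.\w+)", filename)
    if match:
        names.append(match.group(1) + match.group(2))
    return names


def choose_translation(candidates, previous_order, english):
    """Pick a translation for one OutputLine of the new script.

    candidates: (order, translated) rows from the old translation that
        match on filename and Japanese text.
    previous_order: order of the old line used for the previous
        OutputLine, or None if there is no context yet.
    english: English text currently in the new script.
    Returns (text_to_write, order_used).
    """
    if not candidates:
        # No match: flag the line so it can be found and fixed by hand.
        return f"{MISSING_PREFIX} {english}", previous_order

    def distance(candidate):
        order, _ = candidate
        if previous_order is None:
            return (0, order)  # no context: take the earliest match
        # Prefer lines after the previous one, then the nearest of those.
        return (order <= previous_order, abs(order - previous_order))

    # With a single match, or several identical ones, any pick yields the
    # same text; the ordering only matters for real conflicts.
    order, translated = min(candidates, key=distance)
    return translated, order


text, used = choose_translation([(3, "Salut."), (40, "Bonjour.")], 38, "Hello.")
```

In the usage line, the previous line was matched to old line 38, so the candidate at order 40 is chosen over the one at order 3.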

3. Manually fix the unmatched parts

Use a "find in files" feature in a good editor (I'm doing it in PSPad, but Notepad++ most likely has something similar) to find all occurrences of <missing translation> and fix them manually.

It might be pretty convenient for translators to have just a simple translation text file, where the order is implicitly the line number and each line contains the translation of that particular OutputLine text.
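As a rough illustration of that idea, here is a small Python sketch where line N of a plain text file holds the translation for the Nth OutputLine. The file name and both function names are made up for the example, and it assumes a translation never contains a raw line break (anything like that would have to stay escaped, as it presumably is inside the script strings).

```python
import os
import tempfile


def write_translation_file(path, translations):
    """Write one translation per line; line N belongs to the Nth OutputLine."""
    for text in translations:
        if "\n" in text or "\r" in text:
            raise ValueError("a translation must fit on a single line")
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        for text in translations:
            handle.write(text + "\n")


def read_translation_file(path):
    """Return {order: translation}, where order is the 1-based line number."""
    with open(path, encoding="utf-8") as handle:
        return {
            order: line.rstrip("\n")
            for order, line in enumerate(handle, start=1)
        }


# Round trip through a temporary file; the empty string keeps its slot,
# so later lines do not shift.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "onik_001.fr.txt")
    write_translation_file(path, ["Bonjour.", "", "Au revoir."])
    by_order = read_translation_file(path)
```

The catch with this layout is that adding or deleting a single line shifts every translation after it, so it only works while the file stays in step with the script.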

I think most of the people who come here to ask us for help in translating our scripts don't have as much knowledge about how the patch is made as we do. For that reason, I think that SQL files could be an issue for us because they have a slightly higher barrier for editing. Maybe we could try JSON files instead?

I was ready to also suggest installing a translation tool on our server, but I think it might not be secure enough if it means other people uploading their files to our server instead of starting from scratch there. I'm not sure how these tools work, but if that's a viable and secure option, we can also try that.

SQL is only meant as the script's internal data storage, not something that should be edited manually. The goal was simply to parse old translated scripts and apply the translation on top of a newer patch, not to generate an easy-to-edit translation map. That is certainly doable, but not as easy as a key => value JSON or YAML file because of possible duplicate keys.
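To make the duplicate-key problem concrete, here is a tiny Python example with made-up data: the same Japanese line appears twice with different translations, so a flat key => value map silently drops one of them, while a list of records that also carries the order keeps both.

```python
import json

# Hypothetical extracted lines: identical Japanese text, different translations.
lines = [
    {"order": 1, "japanese": "うん。", "translated": "Oui."},
    {"order": 2, "japanese": "うん。", "translated": "D'accord."},
]

# Flat key => value map: the second entry overwrites the first.
flat = {line["japanese"]: line["translated"] for line in lines}

# A list of records survives a JSON round trip with both entries intact.
as_json = json.dumps(lines, ensure_ascii=False, indent=2)
restored = json.loads(as_json)
```

After this runs, flat has a single entry and "Oui." is gone, whereas restored still holds both lines.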

I'm an expert in PHP and know a lot about web security so I'm confident I can make a secure upgrade tool on the server given enough time. Sadly I'm not confident about getting that time.

I see, looks like I misunderstood the actual use of this tool. At first, I thought you were going to transform the scripts into some easy-to-understand structure and then only update the parts where the engine commands were changed, without touching the translation that was already done by the person. But it looks like the purpose is different, so my comment doesn't really apply.

I think you should go forward with it and not worry about the server tool; I was thinking about a completely different thing. A tool that users can clone and run locally will be healthier for the server and for the future of the org.

Not really. I expect most people would not be able to get it up and running on their own, so it would be up to me to run it for them on my PC anyway, just like I'm running the UI compiler for the Vietnamese and Korean guys.

Any way we can make that easier? Maybe a Python or batch script to automate things a bit?

Well, if I write it, it will be written in PHP, since that's the only language I can work efficiently with. People might be able to install PHP on their own, but then I might need some extensions that are not active by default, a database, Composer to install some libraries... It's easy for anyone a bit experienced with PHP, but I'm not sure other people could do it. Dunno how far it can be automated. I could also put everything inside Docker, but then you would need to install Docker and docker-compose, which are also not easy to grasp for new people.

In theory I could do everything in memory and not require a database, it would just be slower.

We made great progress on automating the installation of Python and all the dependencies using Chocolatey on Windows. It is certainly manageable if it can be done from the command line. If you have some instructions around, I can try making it in Python, fully automated (or at least automating all the busywork).

Ok, let's try that once I have the script. It won't be any time soon though.

No problem, other than working with music and the eventual installer improvements I'm pretty much always idle. Call me whenever you want about this.

Thank you for your help again.

The below might not be directly relevant to the thread, sorry. I realize the tool you're talking about is still necessary, even if you use the method below.

I don't really know too much about this, but I was looking at how RenPy handled it the other day.

I think it works by internally assigning each "base language" line a unique ID, then outputting a 'translation file' with those unique IDs, which the translators type their translation into: https://www.renpy.org/doc/html/translation.html

However, even RenPy says this has issues if you modify the original script by adding or removing lines, as it does not save the IDs in the base script.

If I were to modify the RenPy method, I would assign a unique ID to each line of the translation, and save it as a comment next to the OutputLine for each script file (RenPy does not save the values into the "base language" file unless you ask it to). Then, even if you add or delete a line, each translation line will still be associated with a specific base language line.

I realize this doesn't solve the problem at hand (assigning old translations to the new script), but going forward, you wouldn't have this problem anymore, since each dialogue would have a unique ID even if the script is changed. It should make it easier to edit the translations as well...
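A rough Python sketch of what tagging could look like. It assumes the scripts accept // line comments and that each OutputLine call sits on a single line; the "// id:N" format and the add_line_ids name are invented for the example. Existing IDs are kept and new lines get the next unused number, so an inserted line does not renumber its neighbours.

```python
import re

# Hypothetical tag format: a trailing "// id:N" comment on the OutputLine line.
ID_COMMENT = re.compile(r"//\s*id:(\d+)\s*$")


def add_line_ids(script_text):
    """Append '// id:N' to OutputLine lines that lack one; keep existing IDs."""
    lines = script_text.split("\n")

    # Collect IDs already present so that new ones never collide with them.
    used = []
    for line in lines:
        match = ID_COMMENT.search(line)
        if match:
            used.append(int(match.group(1)))
    next_id = max(used, default=0) + 1

    tagged = []
    for line in lines:
        if "OutputLine(" in line and not ID_COMMENT.search(line):
            line = f"{line} // id:{next_id}"
            next_id += 1
        tagged.append(line)
    return "\n".join(tagged)


original = (
    'OutputLine(NULL, "a", NULL, "A", Line_Normal);\n'
    'OutputLine(NULL, "b", NULL, "B", Line_Normal);'
)
first_pass = add_line_ids(original)

# Insert a new line between the two tagged ones, then tag again.
edited = first_pass.split("\n")
edited.insert(1, 'OutputLine(NULL, "c", NULL, "C", Line_Normal);')
second_pass = add_line_ids("\n".join(edited)).split("\n")
```

In the second pass the inserted line receives id 3, while the two original lines keep ids 1 and 2, so translations keyed by ID would still point at the right lines.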

How do you currently handle translations which are starting from scratch?

> If I were to modify the RenPy method, I would assign a unique ID to each line of the translation, and save it as a comment next to the OutputLine for each script file (RenPy does not save the values into the "base language" file unless you ask it to). Then, even if you add or delete a line, each translation line will still be associated with a specific base language line.

Yeah, this is something I'm considering as well. But I'm not quite sure about polluting the scripts with such comments. I want to first try this tool to see how accurate it can get when matching the lines. If I'm not satisfied with the result, I'll most likely add the IDs to get better results in the future.