solidity-docs / .github

Repo Automation: Diff Bot for Translation Updates

franzihei opened this issue · comments

Problem

The Solidity documentation is being translated through community efforts in the solidity-docs organisation. It is hard and overly complicated for the community translators to keep track of new versions and to manually update a translation as new Solidity versions get released.

Solution

Automation! Optimally, we would like to have a bot similar to the reactjs-translation-bot, which would create PRs with new content that needs to be translated every time the original documentation is updated.

Task

To semi-automate the process, it would be great if we could set up a bot, similar to the reactjs.org translation bot. The bot creates PRs for new content to be translated from the original English version of the docs. More info on the bot can be found here and this is what a PR from the bot looks like. More context on the process behind the reactjs bot can be found in this issue.

More Details on the Bot Design and Functions

  • Set up a bot that creates PRs with diff of newly released docs to translate.
  • The solidity/docs develop branch shall be used as a source.
  • The PRs should be created in each of the language repositories (currently this means in the repos of the Indonesian, Japanese, Farsi and Portuguese translations).
  • The bot should create a PR once per day per language repository.
  • The bot should be written in JS, Python or Bash.
  • The bot should be part of the solidity-docs GitHub organization and should add PRs to all the different translation repos.
  • The bot should use Github Actions as a CI.
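
The constraints listed above (develop as the source, one PR per day per language repository) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the bot's actual design: the repository names, the `SyncJob` type and the `plan_daily_prs` helper are all hypothetical, and a real bot would have to look up existing PRs on GitHub instead of receiving them as an argument.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SyncJob:
    repo: str    # translation repository that should receive a PR
    source: str  # upstream branch the new content is taken from
    day: date    # day the PR belongs to


def plan_daily_prs(repos, day, already_synced, source="develop"):
    """Return one SyncJob for every repo without a bot PR for `day`.

    `already_synced` maps a repo name to the set of days that already
    have a bot PR; a real bot would have to look this up on GitHub.
    """
    jobs = []
    for repo in repos:
        if day in already_synced.get(repo, set()):
            continue  # keep it to one PR per day per repository
        jobs.append(SyncJob(repo=repo, source=source, day=day))
    return jobs


# Hypothetical repository names, for illustration only.
REPOS = ["example-org/docs-indonesian", "example-org/docs-japanese"]
today = date(2022, 1, 12)

jobs = plan_daily_prs(
    REPOS,
    today,
    already_synced={"example-org/docs-japanese": {today}},
)
print([job.repo for job in jobs])  # ['example-org/docs-indonesian']
```

Keeping the list of repositories as plain data means a new language only requires one more entry in the list.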

Relevant Links

I think the description in the linked issue is already rather detailed. Personally, I think we should create a new pull request once a day and not only once per release. This way, the translation could be ready for the release already.

There is more information on the bot here: https://reactjs.org/blog/2019/02/23/is-react-translated-yet.html#the-bot

In order to keep track of what has been merged and what still needs to be done, I think it is important that the translation repositories are forked off the main solidity repository. If you create such a bot, you can assume that this is the case (even if it might not be currently).

It might be good to run this as a github action.

  1. Since we'll be the ones responsible for maintaining the bot long-term, it should be written in something we can expect people on the team to be familiar with. If it's going to be directly based on the React bot, JS sounds like the default choice but in case the contributor wants to build it from scratch, I think the only other choices should be Python or Bash (or C++ but it does not sound like the right tool for the job). Unless it's very short and simple.
  2. I agree that CI like Github Actions sounds like a good idea. It makes it very easy to trigger actions when PRs are merged and does not require setting up and maintaining a separate machine.
  3. Should the bot be part of the main Solidity repo or have a repo of its own? Being in the main repo would probably simplify deployment/updates (just a PR) but I see that React has a separate repo for this. At least assuming that reactjs.org-translation/scripts/ is the bot source React uses now.
  4. PR once a day sounds good to me. Once per release might result in an overwhelming amount of text.

I think the bot should run in the translation repository, especially when it only runs once a day and thus does not need a "PR merged" trigger.

Issue Status: 1. Open 2. Started 3. Submitted 4. Done


This issue now has a funding of 0.26 ETH (859.03 USD @ $3303.96/ETH) attached to it as part of the Solidity fund.

Workers have applied to start work.

These users each claimed they can complete the work by 264 years, 4 months from now.
Please review their action plans below:

1) willianpaixao has applied to start work.

  1. A new workflow is created. It performs a checkout of the develop branch and then a diff; the target branch needs to be defined.
  2. Also, the workflow trigger needs to be defined, either at a specific time (e.g. every day at midnight) or upon every merge to the develop branch.
  3. GitHub Actions and GitHub API will be used to create the PR and pipeline triggers.

2) hhio618 has applied to start work.

I did similar tasks in the past.
My plan:
Create a bot similar to the reactjs-translation-bot
Create a GH action to auto-translate solidity/docs on every change and/or daily.

3) the-wunmi has applied to start work.

  • Create a bot that creates PRs on translation repos with the diff from the official solidity docs repo, using JavaScript.
  • Create a GitHub Actions workflow that would be triggered daily using a cron schedule event.
  • Create an extensible framework that makes it easy to update the list of translation repos, in case new repos are forked/created.

4) maganmk has applied to start work.

I would love to work on this problem.

My plan:

  • Understand the reactjs-translation-bot and sketch how this solution can serve as a template for the solidity docs.
  • Set up the initial bot, with logic for calculating the diff for the newly released docs. The logic will be written in Python, with tests ensuring correct functionality.
  • Utilize GitHub Actions for automated PR creation on each translation repo.
  • Perform end-to-end tests on dummy repos to make sure the bot operates as expected.

I'll continuously update the team on my progress, and will be available to talk if wanted.

5) amitojsingh366 has applied to start work.

I am interested to work on this PR, I will work in the following manner:

  • Have a look at the solidity docs repo and understand how translation works
  • Create a new GitHub action and create a workflow in the English docs repo that uses this action
    • Workflow checks for new additions to the repo
    • Workflow then syncs the new additions with the other repos keeping English as placeholder text

Learn more on the Gitcoin Issue Details page.

Hi,
I time-boxed 4h to implement a Proof of Concept and now I have questions and suggestions.
First, as a Brazilian, I compiled a pt_BR version of the docs, following the Sphinx documentation.

[Screenshot: Screenshot_20220112_222302]

Modifications were needed in conf.py and docs.sh. The next step would be to create a new workflow that is triggered once a day or once per release, as @chriseth said. That workflow would regenerate the .po files.

Now comes my suggestion: rather than creating a PR, I'd prefer to upload to Crowdin, like the other Ethereum projects do. There you already have hundreds of translators working on Ethereum projects. All we need is someone to create an API token and add it to the newly created workflow.

That means, all work could (and in my humble opinion should) be done in the original solidity repo. So this organization and repositories would be deprecated.

What do you think?

Hi @willianpaixao,

Context on why we decided to set up the translation process the way it is right now can be found in this issue. There we clearly elaborated why we don't want to use a translation tool etc. There are also obvious reasons why the community translations are hosted in a different GitHub org/repo than the core solidity repo. Please be aware that the setup chosen has been carefully evaluated and does not require changing at this point. .po files are also not needed since they don't work well with code examples and code comments.

I think the task is outlined quite clearly above. Let me know if you have specific questions concerning the task.

Thanks for the feedback @franzihei, I really appreciate it.
I read the whole issue thread; I wish it had been linked here before.

Ok, IMHO this bot+PR approach is not effective and is very hard to maintain: the overhead of branching, creating a PR and pushing puts an obstacle in front of people who "just want to translate". I have been a Debian doc translator myself since 2006, and the easiest path brings the largest volume of contributions. Please note that this is my personal opinion and experience; I know that the decisions made in ethereum/solidity/issues/10119 were made carefully and with the best intentions.

That said, I'm more than happy to implement the whole translation feature in Sphinx + Crowdin. No bounty needed.
Not to mention that I have no JS knowledge and the react bot is out of my skills, so there's also that factor.

I understand if you decline my offer so let me know your most honest opinion.

Hi again @franzihei, I apologize for being so annoying. 😅

I got really intrigued by all this and took another round of POC. Here are my findings:

  1. I created a fork of the original repo, so I could test my new pipeline. It took about four attempts due to incomplete documentation of the Crowdin official integration.
  2. Then I created the .pot and .po files through Sphinx. Since they could be considered intermediary files, I committed them to another branch. Whether they will be merged to develop is a decision for the maintainers.
  3. Then I took the liberty to create a project in Crowdin, created API keys to link with my forked repo.
  4. Made a few translations just to generate some data.
  5. Voilà! A PR was created. The title, message and even periodicity can be tweaked, there are many options available.

I didn't go a step further and try to publish on Read the Docs, but I trust that my previous POC, shown in the screenshot above, is enough proof that the site generation picks up the translations once the PR is merged and the pipeline builds and publishes a new version with the latest translations.

Note also that only two languages were listed; if a higher number is considered, some improvements would be needed.

I still wait for your thoughts.

Hi @willianpaixao,

Thank you for this. As I mentioned before, there are reasons why we decided against tools like Crowdin and the usage of .po files. One of the reasons against Crowdin is that we don't want to force developers to create yet another account on a third-party platform etc. We believe most translators of the Solidity docs to be rather technically minded, so most of them will appreciate a pure GitHub-based workflow. We have carefully evaluated all other options and came to this conclusion. Furthermore, I personally tried out the Crowdin platform and found that translating in it was rather cumbersome.

Please respect our decision and thank you for your thoughts nonetheless.

Thank you for your feedback!

@franzihei is the bounty on this still active? if yes I would like to work on it

Work on this bounty has been started and a contributor has been chosen. Thank you all for your applications!

Hello everyone
Besides the different directory structures of the original repository and the translations, I noticed another difference: ReactJS translation repositories share the same history as the original repository.
In that case, ReactJS can merge all the conflicts into a single commit and create a PR in the translated repository. In our case, this flow will not work. In effect, the PR would create a conflict for every line that has already been translated. That would be counter-productive for the translators.

Possible solutions are:

  • prepare repositories for translations as exact forks of main repository, including all the files and commits.
  • create custom git flow, suited for solidity-docs. It will be different from ReactJS flow. Possible solution is to use git diff and git am.

Let me know what you think about that.

I'm not sure which workflow is a better choice here but just wanted to note that if we want to switch to translation repos being full forks of the main repo, the sooner, the better. Converting translation repos to that will be very tedious once we have tons of them. Also, this will probably require squashing their whole history to be doable with reasonable effort.

> if we want to switch to translation repos being full forks of the main repo, the sooner, the better. Converting translation repos to that will be very tedious once we have tons of them.

Just so that I understood this correctly,

  • fa, ja and pt are already in the correct format?
  • in and de need to be changed?

Is there anything I can do from my side? I will also adjust it accordingly in the translation guide.

Since there were no translations in de I just updated it. Is it now in the right structure?

@o-lenczyk what do you mean by "prepare repositories for translations as exact forks of main repository, including all the files and commits."? I think it would be best if only the docs folder were needed. It can be in the same structure in the sense that it can be at [lang-name]/docs/[docs files] just like in the solidity repo where it's solidity/docs/[docs files].

I can't comment on which of the solutions is better from a technical PoV.

@franzihei
Just to clarify, I believe there are two different issues here:

  • directory structure: Yes, the only required directory is docs. Forks can remove every other directory and add it to .gitignore.
  • history of commits: ReactJS translation repositories are "full" forks in the sense of shared history. For example, hu.reactjs.org has 5434 commits and reactjs.org 5265 commits. They share the same "root". After creating hu.reactjs.org there are only translation commits in the repository. That history allows them to merge one repository into another. A conflict is created only when a part of the documentation was modified on both sides.
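
To make the "shared history" point concrete, here is a toy model in Python. It is a sketch only: the commit ids and the `ancestors` and `shares_history` helpers are made up for illustration and do not come from any of the repositories discussed here.

```python
def ancestors(parents, tip):
    """Return the set of commits reachable from `tip` (including `tip`).

    `parents` maps each commit id to the list of its parent commit ids.
    """
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return seen


def shares_history(parents, tip_a, tip_b):
    """True if the two tips have at least one commit in common."""
    return bool(ancestors(parents, tip_a) & ancestors(parents, tip_b))


# Toy commit graph: "root" <- "u1" <- "u2" is upstream; "t1" is a
# translation commit on top of "u1"; "x1" was created independently
# (for example by copying files into a fresh repository).
PARENTS = {
    "root": [],
    "u1": ["root"],
    "u2": ["u1"],
    "t1": ["u1"],
    "x1": [],
}

print(shares_history(PARENTS, "u2", "t1"))  # True
print(shares_history(PARENTS, "u2", "x1"))  # False
```

With real repositories, `git merge-base <a> <b>` answers the same question: it prints a common ancestor, or fails when the two histories are unrelated.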

> After creating hu.reactjs.org there are only translation commits in the repository.

Just to clarify more: I assume this means that after creating hu.reactjs.org that repo contains all commits from reactjs.org and it's just that translators only add new commits with translated text on top of that. Then a PR from the bot also includes all the commits added in the reactjs.org since last time. Both the ones that touched the docs and the ones that only changed code.

> Is there anything I can do from my side? I will also adjust it accordingly in the translation guide.

What I meant was converting these repos into full forks (i.e. with full commit history of the main repo). It would require replacing each one with a fresh clone of the solidity repo and then using git to cherry pick the commits that add translations from existing repos and put them on top. It would be mostly command-line work and using the editor to solve any resulting conflicts.

@chriseth do you have any preference on the two options outlined here?

We should test how it works, but I would prefer doing it as reactjs does - full fork of full history. What are the downsides?
I think there might be a way to create a fork of a single subdirectory only (instead of the full repository), but including the full history. This way, changes to files outside of the docs directory would be automatically ignored by a merge commit.

@chriseth

> We should test how it works, but I would prefer doing it as reactjs does - full fork of full history. What are the downsides?

I would say that the only downside is the size of the repository: 60 MB instead of 1.2 MB.

> I think there might be a way to create a fork of a single subdirectory only (instead of the full repository), but including the full history.

I was researching the topic of forking a single directory. One option is git filter-branch, but this command would rewrite history, and we would have unrelated histories anyway. Another option is git subtree split, but it would also create unrelated histories. What do you think about the simplest solution: forking ethereum/solidity, removing everything but README and docs/, and adding the removed paths to .gitignore?

I'm fine with any way that works: Either a way that does not create conflicts in files we are not interested in, or a script that auto-resolves these conflicts.

@chriseth
I did some tests, and unfortunately .gitignore is not working either - it does create conflicts for files we are not interested in:

CONFLICT (modify/delete): libsolidity/codegen/ir/IRGeneratorForStatements.cpp deleted in HEAD and modified in f4e02703741f47fd62f0015f6a8d60ec68badf0b.  Version f4e02703741f47fd62f0015f6a8d60ec68badf0b of libsolidity/codegen/ir/IRGeneratorForStatements.cpp left in tree.

I will try some other options like .gitattributes with different merge strategies, or sparse checkout.

Anyway, merging changes from upstream into translation repositories that contain all the directories is straightforward.
In the case of reduced translation repositories, the problem left is in modify/delete conflicts.
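
As an illustration, the modify/delete conflicts could be picked out of the merge output with a small parser like the one below. It is a sketch that assumes the message format quoted above; the `modify_delete_paths` helper is hypothetical and the wording of git's messages may differ between versions.

```python
import re

# Matches the "modify/delete" conflict lines that `git merge` prints,
# in the format quoted above.
_MODIFY_DELETE = re.compile(
    r"^CONFLICT \(modify/delete\): (?P<path>.+?) deleted in "
)


def modify_delete_paths(merge_output):
    """Return the paths reported as modify/delete conflicts."""
    paths = []
    for line in merge_output.splitlines():
        match = _MODIFY_DELETE.match(line)
        if match:
            paths.append(match.group("path"))
    return paths


sample = (
    "CONFLICT (modify/delete): "
    "libsolidity/codegen/ir/IRGeneratorForStatements.cpp deleted in HEAD "
    "and modified in f4e02703741f47fd62f0015f6a8d60ec68badf0b.  "
    "Version f4e02703741f47fd62f0015f6a8d60ec68badf0b of "
    "libsolidity/codegen/ir/IRGeneratorForStatements.cpp left in tree.\n"
)
print(modify_delete_paths(sample))
```

Parsing human-readable output is fragile; asking git directly for the unmerged paths with `git diff --name-only --diff-filter=U` is likely the more robust choice.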

> it does create conflicts for files we are not interested in:

I think you can't avoid conflicts in the full fork approach. I doubt .gitattributes or a different merge strategy will help. Git simply sees that on one side these files were deleted and on the other they were modified and this can't be reconciled automatically. You have to explicitly resolve conflicts in any files outside of docs/ by running git rm on them.
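
A rough sketch of that resolution step in Python, assuming the list of unmerged paths has already been obtained (for example with `git diff --name-only --diff-filter=U`). The `resolution_commands` helper and the sample paths are hypothetical; the sketch only builds the commands and does not run them.

```python
def resolution_commands(unmerged_paths, docs_dir="docs/"):
    """Build the git commands that settle conflicts outside `docs_dir`.

    Paths under `docs_dir` are returned separately and left untouched so
    that translators can resolve them by hand.
    """
    commands = []
    left_for_translators = []
    for path in unmerged_paths:
        if path.startswith(docs_dir):
            left_for_translators.append(path)
        else:
            # "--" keeps git from reading an odd file name as an option.
            commands.append(["git", "rm", "--", path])
    return commands, left_for_translators


# Sample paths, for illustration only.
paths = [
    "docs/index.rst",
    "libsolidity/codegen/ir/IRGeneratorForStatements.cpp",
]
commands, left = resolution_commands(paths)
print(commands)
print(left)
```

Each command could then be passed to `subprocess.run(command, check=True)` inside the translation repository's working tree.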

@cameel
My initial idea was to resolve conflicts inside docs in favor of the original repository, and the rest of the conflicts (deleted files) in favor of the translation repository. Anyway, I tested the solution mentioned by you:

  • git rm everything
  • git add docs/

and it is also working.
Example PR is looking like this: https://github.com/solidity-docs-test/pl-polish/pull/14/files

  • there is no conflict if only the translated documentation has changed
  • if a part of the documentation that was translated has also changed in the original repository, the conflict is committed.

Looks good to me.

If you want some feedback on the implementation, you could create a draft PR from your repo. Then we could comment on specific parts of code.

One thing I'd recommend already would be to include the date in the PR title and branch name. Commit hashes are useful too but less informative at a glance unless you go through the trouble of looking them up. PRs already have timestamps but in some contexts (e.g. e-mail notifications) you only see the title.
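
For instance, the naming could look something like the sketch below. The `sync` prefix, the title wording and the `pr_names` helper are assumptions made for illustration, not the format the bot actually uses.

```python
from datetime import date


def pr_names(day, upstream_commit, prefix="sync"):
    """Return (branch, title) containing both the date and a short hash."""
    short_hash = upstream_commit[:8]
    stamp = day.isoformat()  # e.g. 2022-01-12, which sorts chronologically
    branch = f"{prefix}-{stamp}-{short_hash}"
    title = f"Sync with upstream docs {stamp} ({short_hash})"
    return branch, title


branch, title = pr_names(
    date(2022, 1, 12),
    "f4e02703741f47fd62f0015f6a8d60ec68badf0b",
)
print(branch)  # sync-2022-01-12-f4e02703
print(title)   # Sync with upstream docs 2022-01-12 (f4e02703)
```

Putting the date first keeps branches and e-mail subjects readable at a glance, while the short hash still identifies the exact upstream commit.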

Thanks @o-lenczyk!

@cameel @chriseth could you have a look at the PR? --> #12

Sure, I was going to review it today.

The bot PR is looking good and I think we should try it out in the German repo. There are some decisions to make (see below) but I think it would be best to merge the bot in the current form (as soon as the small issues are ironed out) and then refine the design in subsequent PRs.

Design/workflow decisions

  1. Should the bot run from the translation-guide repo and be centrally configured by us or should it be independently installed in each translation repo, giving translators freedom to customize every aspect if they wish?
    • A good middle ground IMO would be a reusable action that is just invoked from the workflow file in each repo. The code would be developed centrally in the action repo but translators could easily adjust parameters, disable the bot or add extra steps.
  2. Bot's README in #12 specifies two alternative workflows for adding new translation repositories. We need to settle on one. See #12 (comment).
  3. Do we want to have an account for the bot so that we can customize it to make it look cool or do we prefer to avoid that? The downside of going with an account is that we need to deal with personal access tokens instead of relying on the token available in CI. See #12 (comment).

Yay that sounds great! My opinion on the open questions:

  1. The middle ground option (reusable action that is just invoked from the workflow file in each repo) sounds like a good plan.
  2. If I understand both options correctly, I think I would prefer option 2, since it sounds slightly easier for people starting a new language. It would also mean that we always have an up-to-date "reference" English version in the solidity-docs org, which I like. It would be updated automatically when the develop branch in ethereum/solidity gets updated, correct?
  3. I would minimize anything that has to do with personal access tokens and additional passwords etc, so I tend towards not making an account. But I don't have a strong opinion here.

@chriseth can you also share your opinion on the 3 open questions?

> I think I would prefer option 2, since it sounds slightly easier for people starting a new language.

It actually matters more for the person who will be creating new repos (is it you or @chriseth?). In either variant translators just get a repo where they can immediately start translating, and it's not their problem how that repo got there :)

> It would be updated automatically when the develop branch in ethereum/solidity gets updated, correct?

Not on its own but it would be fairly easy to set up github actions to pull new code into it nightly. And it's much simpler than with pulling changes into translation repos because we don't have to worry about conflicts, creating a PR, etc.; we just pull in new commits and call it done. At least as long as we can assume that it's just a template and no one is actually translating in that repo, i.e. it's basically a copy of main repo's docs/ dir with history.
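
A minimal sketch of such a nightly update, assuming the reference repo contains nothing but upstream commits so that a fast-forward is always possible (if it were filtered down to docs/ with rewritten history, this would not apply as is). The `fast_forward_commands` and `run_all` helpers are hypothetical, and the remote name `origin` is an assumption.

```python
import subprocess


def fast_forward_commands(upstream_url, branch="develop"):
    """Build the git commands for a fast-forward-only update."""
    return [
        ["git", "fetch", upstream_url, branch],       # result lands in FETCH_HEAD
        ["git", "merge", "--ff-only", "FETCH_HEAD"],  # refuse anything but a fast-forward
        ["git", "push", "origin", "HEAD"],            # publish the updated branch
    ]


def run_all(commands, cwd):
    """Run the commands in `cwd`, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, cwd=cwd, check=True)


commands = fast_forward_commands("https://github.com/ethereum/solidity.git")
for command in commands:
    print(" ".join(command))
```

Because `--ff-only` makes the merge fail instead of creating a merge commit, an accidental local change in the reference repo would surface as a failed run and not as a silent divergence.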

Currently I am creating new repos. But so far I just created them "empty", added the maintainers and they took it from there. I'm fine with a different workflow though, as long as I know what I need to do. :)

The funding of 0.26 ETH (274.71 USD @ $1056.57/ETH) attached to this issue has been cancelled by the bounty submitter