git-as-svn / git-as-svn

Subversion frontend server for Git repositories

Home Page:https://git-as-svn.github.io/git-as-svn/htmlsingle/git-as-svn.html

Design question: why no .gitattributes defaults?

MichaelJCole opened this issue

Hi there, getting this message:

https://bozaro.github.io/git-as-svn/htmlsingle/git-as-svn.html#invalid-svn-props

I'm wondering why git-as-svn doesn't have a set of defaults for different file extensions.

If a user commits something as text, but then changes their mind later, does it break anything?

What's the price of setting it wrong?

Thanks!

  1. git-as-svn doesn't store svn props anywhere. They're calculated from .gitattributes so that the svn working copy is consistent with the git working copy. There are some limitations: we cannot express everything that you can do in .gitattributes (because it is much more expressive than svn:auto-props), but we try.
  2. Now, why we reject commits. If we allowed them, the client's view of a specific revision of a file would diverge from the server's view of the same revision of the same file. Svn does not support this situation: it expects the file to be the same on both sides, and the working copy will later break in various horrible ways when the client receives delta updates to the file and suddenly realizes that it cannot apply them.

Normally, you just want to set up a .gitattributes that marks files as text/binary, and then git-as-svn seamlessly translates that to svn users via svn:auto-props, so the svn client automagically sets good properties on files and you don't need to set them by hand.
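To make the translation idea concrete, here is a minimal sketch of how per-file gitattributes tokens could be turned into svn properties. The function name `svn_props_for` and the exact mapping are assumptions made for illustration, not taken from the git-as-svn source; only the property names `svn:eol-style` and `svn:mime-type` are standard svn properties.

```python
def svn_props_for(attrs):
    """Translate gitattributes tokens for one file into svn properties.

    ASSUMPTION: this mapping is illustrative only; the real git-as-svn
    rules may differ in detail.
    """
    props = {}
    if "-text" in attrs or "binary" in attrs:
        # Binary file: no EOL conversion, mark with a generic binary mime type.
        props["svn:mime-type"] = "application/octet-stream"
    elif "eol=lf" in attrs:
        props["svn:eol-style"] = "LF"
    elif "eol=crlf" in attrs:
        props["svn:eol-style"] = "CRLF"
    elif "text" in attrs:
        # Plain "text": use the platform's native line endings.
        props["svn:eol-style"] = "native"
    return props
```

Because the properties are computed from .gitattributes on every request, there is nothing for an svn client to set by hand; a commit whose properties disagree with the computed ones is the case that gets rejected.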

We do not provide a default .gitattributes (though we could for an example repo) because normally the repo is not created by git-as-svn. It either pre-exists or is created via GitLab/Gitea.

Yes, for sure, it can only do what it can do.

For my own work, I don't want to support a user that breaks git-as-svn with an unworkable .gitattributes. So, I shouldn't give them a loaded gun if I don't want to fix their foot.

I'm wondering if I can:

  1. hardcode a default to binary(?) except for some obvious exceptions (.txt, .md, etc)
  2. create a git hook in gitea to reject incompatible changes to .gitattributes

I'm still experimenting with what works and what doesn't in .gitattributes

Do you see any flaws with that approach?

Well, you can't fully break anything as long as you have git revert. We just tell users to add either text or -text for their file extensions.

But there are two issues:

  1. By default (when there's no .gitattributes entry), Git uses a heuristic to determine whether a file is binary or needs CRLF<->LF massage. The SVN client has a different heuristic. Git applies its heuristic each time a file is touched. The SVN client applies its heuristic only when a file is added to the repo. So, we can't get consistent behavior without requiring the SVN user to manually set/unset text/binary props to follow the Git heuristic. Additionally, I saw cases where the Git heuristic was harmful and broke file contents by treating them as text (and applying EOL conversion) when in fact they were binary.
  2. That's why we recommend * -text at the top of .gitattributes. But here comes problem 2: we cannot expose this entry via svn:auto-props, because svn:auto-props is much dumber and doesn't have hierarchy/override logic like .gitattributes does. See this for how miserably SVN fails when it encounters several file masks with conflicting properties that affect the same file.

So. Our workflow is:

  1. * -text at the top of .gitattributes to disable Git heuristic
  2. SVN heuristic is still enabled, so when it detects file as binary, it will automatically add mime-type and you don't need to set props by hand
  3. We add *.foo text to .gitattributes for those files where we want EOL magic. This is mapped to svn:auto-props, so SVN clients automatically handle these files as text.
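The override logic that .gitattributes has and svn:auto-props lacks can be sketched as "the last matching rule wins". The snippet below is a simplified model under stated assumptions: it matches on the bare file name with Python's `fnmatchcase`, which is not Git's full pattern matching, and `RULES` and `effective_text_attr` are hypothetical names used only for this illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical rule list mirroring the workflow above:
# a catch-all "-text" first, then specific overrides.
RULES = [
    ("*", "-text"),
    ("*.foo", "text"),
]

def effective_text_attr(filename, rules=RULES):
    """Return the attribute of the LAST rule whose pattern matches.

    Simplified model: real .gitattributes matching also handles
    directories, path patterns and multiple attributes per line.
    """
    result = None
    for pattern, attr in rules:
        if fnmatchcase(filename, pattern):
            result = attr  # later rules override earlier ones
    return result

print(effective_text_attr("notes.foo"))  # text
print(effective_text_attr("image.png"))  # -text
```

With svn:auto-props there is no such ordering, which is why only the specific `*.foo text` style entries can be exposed and the catch-all `* -text` cannot.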

I'll stress this once again: Git default is "use heuristic based on file contents each time file is manipulated". There is no way to tell SVN client to follow this logic, it has a different heuristic and only applies it when file is added to repo.

Here's a real-world case where Git default heuristic broke files on Windows (git-as-svn is not involved here, just pure Git): asciidoctor/asciidoctor-pdf#1523
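As a small illustration of why treating a binary file as text is destructive, consider a naive CRLF to LF normalization applied to arbitrary bytes. This is not Git's actual conversion code; `normalize_eol` is a hypothetical stand-in, and the sample bytes are made up for the example.

```python
def normalize_eol(data: bytes) -> bytes:
    """Naive stand-in for text normalization: turn CRLF into LF."""
    return data.replace(b"\r\n", b"\n")

# Made-up binary payload that happens to contain the bytes 0x0D 0x0A.
binary_blob = bytes([0x89, 0x50, 0x0D, 0x0A, 0x1A, 0x0A])

converted = normalize_eol(binary_blob)

# One byte was dropped, so the payload is no longer the same file.
print(len(binary_blob), len(converted))

# Converting back does not restore it either: the lone 0x0A that was
# already in the data also gets a 0x0D prepended.
restored = converted.replace(b"\n", b"\r\n")
print(restored == binary_blob)
```

The round trip is lossy because the conversion cannot tell which LF bytes were originally part of a CRLF pair, which is why an explicit -text marking is safer than a content heuristic.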

Ok, this is really helpful, thank you. My intent is to make some "guard rails" and default behavior so the end user doesn't have to understand it - until they can't get what they want because it's not possible.
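One possible guard rail, sketched under assumptions: a check that a server-side hook could run against a proposed .gitattributes. The policy encoded here (first rule must be `* -text`, and no rule may use `text=auto`) is an interpretation of the workflow described earlier in this thread, not something git-as-svn enforces itself. The function name `check_gitattributes` is hypothetical, and wiring it into a Gitea hook is not shown.

```python
def check_gitattributes(content: str) -> list:
    """Return a list of policy problems found in .gitattributes text.

    ASSUMED policy (illustrative only):
      * the first rule must be '* -text' to disable Git's heuristic
      * no rule may use 'text=auto', which re-enables the heuristic
    """
    rules = []
    for raw in content.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        rules.append((parts[0], parts[1:]))

    problems = []
    if not rules or rules[0] != ("*", ["-text"]):
        problems.append("first rule should be '* -text'")
    for pattern, attrs in rules:
        if "text=auto" in attrs:
            problems.append(f"'{pattern}' uses text=auto (content heuristic)")
    return problems
```

A hook built on this would reject a push when the returned list is non-empty and print the messages to the user.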

If I understand, this issue isn't about how the files are diff'd or stored, but about CRLF<->LF conversions.

Git can change its mind. Svn can only set it when the file is added. The problem arises in the SVN client when it receives updates from git with implied text conversions it cannot understand.

Setting * -text tells git to not do these conversions

If I check in some "code.cpp" file, it won't do conversions.

If I later set *.cpp text to enable these conversions:

  1. At some point in the future when code.cpp is "touched", Git will change internal storage to remove CR.
  2. New files work great.
  3. Existing .cpp files will send broken patches.
    Workaround is to checkout the svn repo to new working copy - then re-apply local changes.

If I later remove *.cpp text to disable these conversions (* -text is default):

  1. Git doesn't change storage
  2. It will work for new files and existing files

Am I understanding this right? Or does the problem only arise when .gitattributes uses git's autodetect behavior?

It may also be possible to set core.autocrlf false in the server repository's .git/config file to enforce this, but I don't know what happens if git repos with different configs push between each other.

Thanks again for your help understanding it. I need to work with it a bit more before coming up with a plan, but I think there's a workable solution :-)

You don't need to recheckout svn working copy when modifying .gitattributes. I described a situation that would happen if we allowed svn client to have svn properties inconsistent with what git-as-svn builds based on .gitattributes. So, we're rejecting commits that would otherwise put svn client in inconsistent state.

"It may also be possible to set core.autocrlf false in the server repository's .git/config file"

This:
a) Doesn't affect a bare repo
b) Won't affect clients

So, I hope my explanations here of why things are done the way they are done are understood.

The question about svn properties arises here and there. Last time I tried updating the docs in #332, but the current issue shows that it is still not enough. Possibly I will try directing users to this conversation next time :)