swcarpentry / git-novice

Version Control with Git

Home Page: http://swcarpentry.github.io/git-novice/

Password authentication for GitHub being phased out

edbennett opened this issue · comments

I was delivering this lesson on Friday and shortly after introducing Remotes in GitHub I got an email from GitHub pointing out that password authentication for command-line operations is being phased out in August, at which point all users will either need SSH keys or personal access tokens, with the latter being "recommended" (for reasons as yet unclear to me).

Previously the lesson avoided teaching SSH keys because doing so introduces another concept that needs to be taught, adding unnecessary cognitive load: learners would have to deal with "what is SSH, a public key, a private key" at the same time as "what is a remote, what is push", whereas entering a username and password is likely to be more familiar. Unfortunately it seems that we no longer have that luxury, so we will need to redevelop the "Remotes in GitHub" section to include setting up either a key or an access token, and to explain the security implications of this.

Given we're not using any GitHub specific features in the lesson (as written), maybe we should look at alternate hosts (e.g. GitLab, Bitbucket)? Having supplemental lessons covering setting up ssh would be useful, but given the current difficulties getting learners set up, I'm not sure staying with GitHub is worth the effort of additionally setting up ssh keys.

I think that the additional effort of explaining how to generate a Personal Access Token for the course work is small enough. I will prepare a PR that can serve as a basis of discussion - I had already planned to do this for my checkout anyway before @edbennett first brought up the issue.

I suspect that now GitHub has set a direction of travel, GitLab and BitBucket will eventually follow, so changing platforms just to avoid introducing this material would end up creating extra work.

@dokempf Thanks for agreeing to work on this!

So, I have added a PR that solves this issue by going the route of explaining how to generate a personal access token. I personally feel that this is the way to go, because it does not add a lot of cognitive load compared with setting up SSH. I would not switch hosting provider; after all, their reasons for making this transition are quite valid! Looking forward to your feedback!

Just a quick idea -- would it make any sense to mention / introduce ssh in the shell lesson? There's already some mention of using shells for working with clusters or other remote systems in there, and needing / wanting to work with remote systems is a relatively common reason for scientists to learn using shells, in my experience.

Thank you for the PR @dokempf !! @jttkim you bring up a good point too. I think this is a good question for us to raise to the curriculum advisory committee. @swcarpentry/curriculum-advisors

The Carpentries curriculum advisory committees were temporarily "deactivated" last year, in response to the additional pressures the pandemic has placed on all our lives. We hope to be able to reactivate those committees soon but, in the meantime, I think your question @jttkim @munkm would be best put to the shell-novice Maintainers via a new issue over at https://github.com/swcarpentry/shell-novice/issues/new/

Just realized I was duplicating this issue with #782. I will disagree however that teaching PATs is better than teaching SSH. Isn't the SWC principle to prefer open source solutions over proprietary (i.e. GitHub) solutions?

> Just a quick idea -- would it make any sense to mention / introduce ssh in the shell lesson? There's already some mention of using shells for working with clusters or other remote systems in there, and needing / wanting to work with remote systems is a relatively common reason for scientists to learn using shells, in my experience.

I had written up something last year that I never got around to posting here, about teaching ssh in the Shell lesson. I'm just going to post it here to get it out there, even though it is late to the conversation. Text below.

I would like to propose an alternative path to teaching a traditional two-day SWC workshop, centered around some observations I’ve made while teaching several workshops. It chiefly concerns the use of ssh and similar commands for file transfers and authentication across all lessons, but mostly Unix and Version Control.

While Software Carpentry is meant to be discipline agnostic, a principle I agree with, I have noticed that many learners have a need to learn these skills at the point of use. SWC is not always that time. In addition, showing the practical uses for why we teach certain tools is something that many learners say they want more of, and would provide more clarity to the lessons themselves.

I feel software installation is the bane of all workshops, and getting the corresponding practice files is also somewhat painful at times. In workshops where learners did not receive the practice files ahead of time, it always takes some valuable time to get them set up and configured. Ssh with scp seems to be a possible solution to the file setup issues, and provides a practical example of how the material comes together. Instead of spending time downloading files and moving them, why not just set up ssh keys and then download where you want to while connecting to the curriculum?

Most of the internet hosting sites we would use for version control support both https and ssh, and many computing clusters utilize ssh as a means to log in. I feel that in the Unix lesson, I can cover about five of the default seven modules before running out of time in most three-hour workshops. In two hours I’m compressed further. Adding ssh does mean sacrificing something to make this work, but it allows concepts to be stacked on top of each other more, which provides additional reinforcement. It could be used to play around with an HPC cluster if one has access to it, or just to run a command like scp to download files from a carpentry server. I have made this deviation with ssh to a cluster before in stand-alone workshops.

It is a bit of setup work to cover ssh in the version control lesson; it requires generating a key, connecting that key to the version control site, and doing some other tasks. I do feel that I am able to cover much of the version control lesson, including collaborative exercises, within two hours. Having the normal three gives a lot more buffer for much of the material, and probably gives enough time to set things up with ssh. This probably isn’t possible in my two-hour versions, however. I’d budget some extra time though, as the version control lesson always has items that come unglued for some users, who then need extra support.

> Given we're not using any GitHub specific features in the lesson (as written), maybe we should look at alternate hosts (e.g. GitLab, Bitbucket)? Having supplemental lessons covering setting up ssh would be useful, but given the current difficulties getting learners set up, I'm not sure staying with GitHub is worth the effort of additionally setting up ssh keys.

Given GitHub is owned by Microsoft (not Open Source), and subject to certain export controls, I have also wondered why we do not swap to GitLab or self host on something like Gitea besides the technical debt incurred of moving platforms now. GitHub does appear to behave more and more like a social network of a kind as time goes on, and perhaps that is what we want out of a platform. But, I am going to plus one this suggestion as I have brought it up before.

SSH keys are currently more widely used, both for git and when using the shell. They are platform agnostic, and it may be better to use them instead of personal access tokens.

> Given GitHub is owned by Microsoft (not Open Source), and subject to certain export controls, I have also wondered why we do not swap to GitLab or self host on something like Gitea besides the technical debt incurred of moving platforms now. GitHub does appear to behave more and more like a social network of a kind as time goes on, and perhaps that is what we want out of a platform. But, I am going to plus one this suggestion as I have brought it up before.

One feature that makes Github stand out is the integration with Zenodo, as mentioned in the hosting chapter. Although it's possible to manually archive a release tar/zip file on Zenodo, the automatic archiving of a Github repository is quite useful and less effort. I don't think an equivalent feature is available on any other platform? I wonder how much effort it would be to implement it and if Zenodo would be interested in supporting this?

Zenodo + gitlab is covered at zenodo/zenodo#1404, the summary of which is it's a challenge to support self-hosted instances of services (either github or gitlab), so there's not been much progress, though there exists code to support this.

I think that, just as SSH keys require us to introduce concepts related to SSH, personal access tokens require us to introduce the concept of password managers. We can't just tell students "generate a token" and then expect them to know how to keep it:

  1. they won't remember it
  2. they won't be able to type it without typos
  3. they shouldn't save it in plain text on their computer.

There is a request to introduce SSH keys in HPC Shell: hpc-carpentry/hpc-shell#39. Maybe something similar could be used here? A more complete lesson on SSH keys is available at https://arc-lessons.github.io/security/04_sshkeys.html

I agree with the idea of using SSH, but it can be difficult to convey (cognitive load).

Two suggestions, that may depend on the particular instructor circumstances:

  • Github Desktop will automatically acquire a PAT, and with Desktop present, it appears that Git bash will use the resulting PAT as well (without Github Desktop running) - tested on Windows, probably also works on Mac.
  • Use Gitlab instead. While they also require PAT, they provide in the git push message a direct link to how to generate a PAT:
$ git push
remote: HTTP Basic: Access denied
remote: You must use a personal access token with 'read_repository' or 'write_repository' scope for Git over HTTP.
remote: You can generate one at https://gitlab.com/-/profile/personal_access_tokens

Today we finished a workshop with the USGS where we taught Git using the SSH authentication.
This was chosen for reasons including, but not limited to, the following:

  • ~50% of individuals had two-factor authentication (TFA) enabled, necessitating a uniform switch to PATs or SSH
  • Easy transfer between their two platforms (Github and Gitlab)
  • Ample day-of support and a central IT department willing to help them in the future

I want to catalogue our experience, which I thought was largely positive, and also critically assess things that came up.
Hopefully this is useful, since most individuals were using Git Bash while I was teaching from Ubuntu.

I started off the lesson with an explanation of SSH key pairs as a way for your computer to tell Github who it is and that it should be allowed to edit a repo. I tried not to tread too deep into the subject beyond the idea that they come as a pair, one which stays on your computer and one that is public and given to Github.

Command by Command:

  • ls -al ~/.ssh
    • I decided to start the lesson checking to see who had ed25519 keys pre-generated. Several individuals did, and I explicitly asked them to not go through the next step to avoid overwrite. In hindsight, ssh-keygen will ask if you want to overwrite which might be an easier place to catch these cases.
  • ssh-keygen -t ed25519 -C "your_email@example.com"; ls -al ~/.ssh
    • Explaining ssh-keygen seems to be pretty simple once they know keys need to be generated. Oh we need to generate keys? Ok, this generates keys. I stayed away from going in depth about ed25519 since I am not well informed on it and I thought that was unnecessary. Otherwise, since the program handles most of the defaults for you, I could talk about each successive step without worrying what was displayed on each computer.
  • eval $(ssh-agent -s)
    • This was our first big problem. ssh-agent was already running on my computer so I missed this step on the student laptops. It caused a bunch of people to get a "Can't contact ssh-agent" error that I was not expecting. Also, I didn't have a good way to talk about each part of this line, but it seems very necessary. It could be a throw away setup command but I feel uncomfortable about that.
  • ssh-add ~/.ssh/id_ed25519
    • I think this command is also ok to go over. Students were told that you need to inform SSH of what private key to use and ssh-add is at least an intuitive program name.
  • nano ~/.ssh/id_ed25519.pub
    • I used nano to open up the public key file and copy and then walked through the steps of adding it to Github. I explicitly got asked to redo this step a couple times, since it involved a jump from the command line to the browser. It could have been much smoother but doesn't seem to be any worse than flipping between the command line and Github in the "remotes" lesson.
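
As a rough sketch of the "check first, then generate" steps above: the helper names `has_ed25519_key` and `generate_key_if_missing` are hypothetical and not part of the lesson, and the script assumes the default `id_ed25519` file names. It only calls `ssh-keygen` when no key pair is present, so an existing key is never overwritten.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds if an ed25519 key pair already exists
# in the given directory (defaults to ~/.ssh).
has_ed25519_key() {
    local dir="${1:-$HOME/.ssh}"
    [ -f "$dir/id_ed25519" ] && [ -f "$dir/id_ed25519.pub" ]
}

# Hypothetical wrapper: generate a key only when none is present.
# ssh-keygen will prompt interactively for a passphrase.
generate_key_if_missing() {
    local dir="${1:-$HOME/.ssh}"
    local email="${2:-your_email@example.com}"
    if has_ed25519_key "$dir"; then
        echo "Existing key found in $dir; skipping ssh-keygen."
    else
        mkdir -p "$dir" && chmod 700 "$dir"
        ssh-keygen -t ed25519 -C "$email" -f "$dir/id_ed25519"
    fi
}
```

A learner would run `generate_key_if_missing` with no arguments; those who already have a key just see the "skipping" message instead of an overwrite prompt.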

All in all, with debugging, this was a 30 min lesson.
The nice part is that everything works surprisingly smoothly when every step is followed. Learners seemed to enjoy it because it feels like such an accomplishment at the end. It also includes several steps which are easy places for thumbs up thumbs down re-grouping as well.
That said, I do not know if learners came away with firm understanding of what was happening. I had to keep a lot of explanation shallow and without an in person white board I am not sure if I could go more in depth.

Major Errors:

  • Missing the ssh-agent step above is a frustrating error and tripped me up for a good 5 minutes
  • Due to a misspecified HOME variable in one learner's Git Bash, we were never able to get their SSH working and had to walk them through PAT usage in a breakout room. I also had to make sure to tell them to use the HTTPS download links, which I'm sure confused the other learners.
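
One possible way to catch the missing ssh-agent step before it bites is sketched below. The function names `agent_reachable` and `ensure_agent` are hypothetical. The sketch relies on `ssh-add -l` exiting with status 2 when it cannot contact an agent (status 0 or 1 means an agent answered, with or without keys loaded).

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds only if ssh-add can talk to a running agent.
agent_reachable() {
    command -v ssh-add >/dev/null 2>&1 || return 1
    local status=0
    ssh-add -l >/dev/null 2>&1 || status=$?
    # 0 = keys listed, 1 = agent has no keys, 2 = cannot contact agent
    [ "$status" -ne 2 ]
}

# Hypothetical wrapper: start an agent for this shell session only when
# none is reachable, mirroring the eval $(ssh-agent -s) step.
ensure_agent() {
    if agent_reachable; then
        echo "ssh-agent is already reachable."
    else
        eval "$(ssh-agent -s)"
    fi
}
```

Running `ensure_agent` before `ssh-add ~/.ssh/id_ed25519` would give instructors and learners the same starting point, whether or not an agent was already running.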

Again, I think the experience was positive overall and would be inclined to try it again if it seemed like the best option.

@AnthonyOfSeattle This is very helpful. Thanks.

Is it necessary to use ssh-agent for this? Skipping it (and hence also ssh-add) would remove a lot of the conceptual complexity from this. Spending 30 minutes on "understanding SSH keys" seems to be a large diversion in a lesson that already has a lot of conceptual complexity. (I don't want to get into the discussion about GUIs here, but command-line git already makes version control seem a complicated and challenging thing to use.)

The cost of this would be typing a passphrase on each pull/push. This may lead to the key having a slightly shorter passphrase because people don't want to type a long passphrase each time they push/pull, but that is not much less secure than using password authentication with a short password (the caveat being that GitHub can rate limit the latter, whereas you can copy a private key and brute-force it more rapidly). It is more secure than a naked access token, or a passwordless key, the latter of which is more common than it should be. We could then point out that there are ways of automating this that we won't cover, and point to somewhere that talks about setting up keys in more detail. One place it might make more sense to have this is in the Shell Extras lesson (which does talk about SSH keys, but doesn't currently use ssh-agent for them either, so a PR to add this discussion there would be needed).

There is a trade-off to be made here between getting learners to use version control at all, and making those who do as secure as possible. I don't think the aim of this lesson is to prepare people to be maintainers of high-value repositories that will make them targets for hackers, so personally I think skipping ssh-agent makes sense in exchange for reducing the conceptual load of the lesson. That's just my opinion though; it would be good to get more perspectives.

Possibly instructor notes can mention that ssh-agent can be skipped or mentioned if running out of time.

Returning here to add some more context on the question of including content on SSH & keys in the shell-novice lesson. There are a couple problems with this:

  1. the Shell lesson does not include any content on SSH and connecting to remote machines. (The Instructor Notes for that lesson mention that SSH is omitted intentionally, due to time constraints.) Additionally, it is unclear how such a section should be included without a remote machine to connect to. Many hosting institutions may have a remote machine available for teaching SSH but we cannot assume this would be possible for every workshop. In this regard, the advantage to teaching SSH keys in the Git lesson is that the importance and usage can be explained and demonstrated in the context of connecting/pushing to GitHub.
  2. if a section on SSH and keys were to be added to the shell-novice lesson, it would likely appear late in the lesson. In my experience, Instructors often run out of time before they can reach the later parts of the lesson in SWC workshops. The danger would be that SSH and keys are not covered in the Shell part of a workshop, but are then required in the Git lesson.

For these two reasons I think it would be preferable to cover SSH keys in this git-novice lesson rather than shell-novice.

Personally, I'm inclined to recommend SSH keys. I see it as the more 'general' solution - we're really teaching people to use Git in SWC, not how to use GitHub. Here at the University of Birmingham we have an internal GitLab server that students/staff can use, so I'd be wary of introducing Personal Access Tokens for that reason alone.

With that said - they are more complicated. But PATs require people to be using a password manager and for them to know how to use it. Otherwise they'll just be copying and pasting it into a file and then copying it into the shell, which sort of defeats the point.

In addition, SSH keys are useful for more than just Git. I don't think we necessarily need to introduce SSH'ing in the SWC material (although I must admit - we do cover this in our course that we give alongside SWC).

I've not used PAT but I've found Github's documentation for setting up SSH keys and adding them to your GitHub account to be quite good. I've helped 3-4 people set it up on their machine (including a Windows user) using that documentation.

Would also like to suggest this issue be made high priority, since GitHub will no longer accept passwords from mid-August (~6 weeks from this message) and that will certainly affect workshops being taught in that time frame.

A new episode needs to be made to include this content. I suggest using the commands listed by @AnthonyOfSeattle's comment. It includes ls -al ~/.ssh which is a good introduction to ssh keys. Another change to the lesson because of this content is rewriting the lesson prerequisites. With this content, I think "Some previous experience with the shell is expected, but isn’t mandatory." is no longer accurate for this section.

This is already affecting workshops. What material should be cut or made optional to keep the lesson at the same length?

@kekoziar Given the potential for confusion around ssh-agent that @AnthonyOfSeattle raises, which is conceptually completely divorced from this lesson, and the fact that the longer version will take an entire episode (necessitating skipping something, when everything else in this lesson is important and valuable), I would like to elaborate on my comment upthread. The workflow I would propose would be:

  • ssh-keygen -t ed25519
    • "GitHub doesn't let us log in from the terminal with our username and password, instead we need to use keys. This command generates a key that we can use."
  • cat ~/.ssh/id_ed25519.pub
    • "Now we've generated a key, we need to tell GitHub what it is. We give GitHub our public key by copying and pasting this into the relevant page. You never need to look at your private key, and should never share it. If you move to a new machine, generate a new one."
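
Since the second step depends on learners pasting the public key and never the private one, a small guard could be sketched as follows. The function name `print_public_key` is hypothetical, and the pattern assumes the usual single-line `ssh-ed25519 <base64> [comment]` layout of an ed25519 public key file.

```shell
#!/usr/bin/env bash

# Hypothetical helper: prints the file only if it looks like an ed25519
# PUBLIC key, and refuses anything that looks like a private key.
print_public_key() {
    local file="${1:-$HOME/.ssh/id_ed25519.pub}"
    if grep -q 'PRIVATE KEY' "$file" 2>/dev/null; then
        echo "Refusing: $file looks like a private key." >&2
        return 1
    fi
    if grep -Eq '^ssh-ed25519 [A-Za-z0-9+/=]+( .*)?$' "$file" 2>/dev/null; then
        cat "$file"
    else
        echo "Unexpected format in $file." >&2
        return 1
    fi
}
```

Used in place of a bare `cat`, this prints exactly what should be pasted into GitHub, or an error if the wrong file was chosen.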

cat is already used in this lesson, and avoids flicking to and from a nano screen. The only new command is ssh-keygen, which I would treat similarly to the git config commands used at the start of the lesson—an incantation that you can look up when you need it. In that case I don't think the prerequisites would necessarily need changing. I think this version could fit in a section rather than needing an entire episode.

Could you please explain the reasons for the preference for the longer-form version over this abbreviated one? Thanks!

(If this changes your mind, I'm happy to write the above up into a PR; I can have this by early next week.)

I agree with @kekoziar about the prerequisites and I agree with @edbennett that this could be done with the ssh-keygen command within the episode itself.

Because I know I'm not alone in having the experience of bewildering confusion about the process of generating and maintaining SSH keys, I wrote up an explainer of what exactly is going on with each of the commands and where things will be difficult with ssh-agent depending on operating system: https://codimd.carpentries.org/ssh-git-quickly?view

My hope with this document is that it will demystify the process so that contributors and instructors can find the path that will teach the most useful thing first. 😃

(Note: I will also write up a similar explainer for PAT authentication)

I know that @emcaulay (LC git maintainer) has been thinking about this issue as well and could give some insight from the Library Carpentry side.

@edbennett unfortunately, due to GitHub's changes, and because the Unix shell lesson won't integrate SSH, we have to adapt this lesson. It's a good solution, since SSH is used in many git cloud services. I think we should include more rather than less, and let an instructor determine what they have time to go over. We can do this using callouts and a supplemental episode.

The steps @AnthonyOfSeattle listed are directly from GitHub: Check for existing keys, and generate new keys. And, he stated "Missing the ssh-agent step above is a frustrating error," so we should probably include it.

Thanks, @zkamvar for your doc. I think it will help.

@kekoziar Thanks for your reply. I think I may not have communicated entirely clearly. I completely understand the need to adapt this lesson, which is why I opened this issue in the first place. I agree with others in this discussion that SSH is the better alternative compared to personal access tokens.

I use SSH keys every day, and using ssh-agent is still one of the biggest points where things can go wrong for me. I think that GitHub's instructions are targeted at a different audience to Software Carpentry lessons. I agree with @AnthonyOfSeattle's statement that "Missing the ssh-agent step…is a frustrating error". If you're expecting everyone to correctly type that instruction, and some learners do not type it correctly, then they will see different results and need to ask questions to catch back up. My point is that you can remove that step entirely, because it is not compulsory to be able to use SSH keys with GitHub. Then, there is no risk of frustration if the step is missed. It can't be missed, because it's not there in the first place.

> My point is that you can remove that step entirely, because it is not compulsory to be able to use SSH keys with GitHub.

Correct me if I'm wrong (I've just learned this from @zkamvar 's helpful summary), but this is only true if you do not enter a passphrase during the key generation step. I would want to teach it with @edbennett 's simple 2 step approach, with the additional instruction to not enter a passphrase. Note that ssh-keygen will ask if you want to overwrite an existing key file if one exists, so the ls ~/.ssh step is not necessary.

The one remaining boogeyman that trips me up is the known_hosts file, which instructors usually have while trainees usually do not. A lengthy command, but one that essentially hides this complication, to consider is:

$ ssh -q -T -o StrictHostKeyChecking=accept-new git@github.com
Hi itcarroll! You've successfully authenticated, but GitHub does not provide shell access.

This serves both to test the key and add GitHub server's fingerprint to the known_hosts file.
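
To see in advance whether a learner's machine already has an entry for GitHub, something like the sketch below could be used. The function name `host_is_known` is hypothetical. It relies on `ssh-keygen -F`, which searches a known_hosts file for a host name (including hashed entries). Note also that the `accept-new` value of StrictHostKeyChecking needs OpenSSH 7.6 or later.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds if the host already has an entry in the
# known_hosts file (defaults to ~/.ssh/known_hosts).
host_is_known() {
    local host="$1"
    local file="${2:-$HOME/.ssh/known_hosts}"
    command -v ssh-keygen >/dev/null 2>&1 || return 1
    [ -f "$file" ] || return 1
    ssh-keygen -F "$host" -f "$file" >/dev/null 2>&1
}
```

For example, `host_is_known github.com || echo "first connection will ask about the host key"` tells an instructor which learners will see the fingerprint prompt.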

> My point is that you can remove that step entirely, because it is not compulsory to be able to use SSH keys with GitHub.

> Correct me if I'm wrong (I've just learned this from @zkamvar 's helpful summary), but this is only true if you do not enter a passphrase during the key generation step.

This isn't true. If you have passphrase-protected keys and don't use ssh-agent, then every time you connect to GitHub, you need to type the passphrase for the key. This is very close to what learners are already doing—currently (unless their OS is providing an authentication helper to cache it) they type their GitHub password each time they pull or push. The only change will be that it is now the key passphrase instead.

If you want not to have to type the passphrase every time, then you're correct that you either need ssh-agent or a key without a passphrase.

> I would want to teach it with @edbennett 's simple 2 step approach, with the additional instruction to not enter a passphrase.

I am not keen to encourage learners to use keys without passphrases. These were used to compromise a large number of HPC facilities last year.
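
For anyone wanting to check whether an existing key was created without a passphrase, a rough sketch follows. The function name `key_lacks_passphrase` is hypothetical. It asks `ssh-keygen -y` to read the private key using an empty passphrase supplied via `-P ""`, which should only succeed for an unprotected key. Any other failure (missing file, permissions that are too open, missing ssh-keygen) also gives a non-zero status, so treat a non-zero result as "could not confirm" rather than proof of a passphrase.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds if the private key can be read with an
# EMPTY passphrase, i.e. the key is not passphrase-protected.
key_lacks_passphrase() {
    local key="${1:-$HOME/.ssh/id_ed25519}"
    command -v ssh-keygen >/dev/null 2>&1 || return 2
    [ -f "$key" ] || return 2
    ssh-keygen -y -P "" -f "$key" >/dev/null 2>&1
}
```

A learner could run `key_lacks_passphrase && echo "This key has no passphrase; consider adding one with ssh-keygen -p"`.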

Hi all, I have added a writeup of Personal Access Tokens for future reference: https://codimd.carpentries.org/pat-git-quickly?view

I have also added some more information to the Secure Shell tutorial for instructors to demonstrate it as the learners would see it: https://codimd.carpentries.org/ssh-git-quickly?view

Note: I am acting as a neutral party here. There are benefits and drawbacks for each system in regards to cognitive load, generalizability, and security. My aim here is to give you enough information to compare and contrast the methods without hubris.

I consider myself a perennial novice, and therefore I come to this discussion still as a novice.

I had heard of SSH before I found myself trying to take on this issue as a new maintainer of lc-git. Meanwhile, I had never heard of PAT (personal access tokens) before. That biases me toward SSH as a more familiar solution.

I think setup and preliminaries are valuable parts of our workshops. I'm not in a hurry to skip the first hurdle just so that I can get to the next 5 hurdles. If we don't teach learners about setup, they can't help themselves afterward.

To the SW git maintainers, what is the next step for coming to a decision and implementing a revision?

I'm throwing in my 2 cents that SSH is more universal; AND an instructor can demo the process in a live-coding session in a secure way, because the only visible content is the public key.

The same isn't true for a PAT—having a PAT visible to others is a security risk, and while it might not be an actual issue for an instructor during a workshop, we shouldn't be blasé about demonstrating unsafe actions to learners.

Thank you, @ha0ye, for your input!

After reviewing all the discussion, @kekoziar and I met (as maintainers of the two Git lessons, Software Carpentry and Library Carpentry) and we agreed that SSH was the right choice because it was not a Github-only solution. Therefore, learners could benefit from learning a little about SSH key pairs whether they use Github in their research and work or whether their environment leads them to use a different git hosting solution.

@kekoziar has drafted a superb revision and I've made a review. I think we should get it merged into the lesson shortly.

I believe everything except for the supplemental episode has been committed. I opened a new issue (#824) for the supplemental episode, and will close this one, because the most immediate and main issue that this thread addressed has been resolved. The supplemental episode will be developed on a new branch, so we can merge it fully-formed into the published lesson. Information for how to work with the branch is in the new issue (#824). I look forward to everyone's contributions!