A simple tutorial on Git and GitHub, geared toward scientists and reproducible research applications.

Git and Version Control

NLM REPRODUCIBILITY WORKSHOP

Keith Hughitt 2019-05-15

Outline

  • Overview
    • Intro to version control systems (VCS)
    • Why is VCS useful?
  • Git Basics
    • Installation
    • Seven most useful git commands to know
      1. Creating a new repo (git init)
      2. Adding files to a repo (git add)
      3. Checking a repo's status (git status)
      4. Saving changes (git commit)
      5. Pushing your changes to a remote repo (git push)
      6. Pulling changes made to a remote repo (git pull)
      7. Downloading a copy of a remote repo (git clone)
  • GitHub Basics
    • Overview
    • Why use GitHub?
    • Single-user Workflow
    • Multi-user Workflow
    • Setting up the master and forked repos
    • Making changes
    • Accepting changes
    • Other multi-user workflows
  • Beyond just code
  • Further reading

Overview

The goal of this tutorial is to familiarize the user with the basics of version control systems (VCS), Git, and GitHub.

Of course, there are already numerous tutorials which do this and do a much better job than I could hope to do, e.g.:

I would encourage people to check these out as well.

Here I am just going to try to cover enough to get people started, and hopefully interested enough to try it out and learn more.

Intro to version control systems (VCS)

Version control systems (VCS) are software tools used to track changes to a collection of files and directories and to aid in collaborative development. VCS are most widely used in software development for tracking changes to code, but they can also be used to track changes to other types of work, such as manuscripts, data, etc.

Some popular examples include:

  • Concurrent Versions System (CVS)
  • Subversion (SVN)
  • Git
  • Mercurial

Although the big picture is generally the same for each of these, and using any of them is going to be better than using none, there are some differences in the philosophy and function of each.

CVS and SVN were developed first, and are centralized version control systems. This means that there is a single master codebase on a central server, and client hosts "check out" pieces of this code to make changes.

Newer VCS, including the latter two listed above (Git and Mercurial), follow a different approach called distributed VCS (dVCS). In this model there is no required central repository: every client has a complete copy of the repository, including its full history.

Both approaches have their advantages and disadvantages. The focus in this tutorial, however, is on one of these dVCS: Git.

Why is VCS useful?

Some of the main uses for VCS include:

  • Tracking changes (imagine not having an undo button in Word...)
  • Backing up code or other files (Mirroring on GitHub, etc.)
  • Experimentation (branches)
  • Collaboration

Git Basics

In order to make use of Git and GitHub, you must first download and install the Git client. Below, we focus on using the command-line git command. Depending on your OS, there may also be a graphical (GUI) client for Git that you can use instead. Many modern integrated development environments (IDEs), such as RStudio, also include functionality for interacting with VCS tools, including Git.

Installation

Download and install Git from git-scm.com.
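After installing, you can check that Git is available and tell it the name and email address to record as the author of your commits. A minimal sketch; "Your Name" and "you@example.com" are placeholders for your own details:

```shell
# Confirm that Git is installed and on your PATH
git --version

# Set the identity Git records as the author of your commits.
# "Your Name" and "you@example.com" are placeholders: use your own details.
# --global stores them in your user-level config (e.g. ~/.gitconfig),
# so this only needs to be done once per computer.
git config --global user.name "Your Name"
git config --global user.email "you@example.com"

# Print the stored values to check them
git config --global user.name
git config --global user.email
```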

Seven most useful git commands to know

This is 99% of what you need to know to use Git:

  1. git init
  2. git add
  3. git status
  4. git commit
  5. git push
  6. git pull
  7. git clone

1. Creating a new repo (git init)

To create a new git repository, change into the root directory of the project you want to track and run git init:

$ mkdir test
$ cd test 
$ git init
Initialized empty Git repository in /home/username/test/.git/
$
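git init can also create the directory for you when given a name. A small sketch (test-repo is an arbitrary example name, and the mktemp line only keeps the demo away from your own files):

```shell
# Work in a throwaway directory so the demo doesn't touch your own files
cd "$(mktemp -d)"

# Passing a directory name makes git init create that directory first
git init test-repo
cd test-repo

# The repository's history and settings live in the hidden .git subdirectory
ls -a

# Confirm that Git recognizes this directory as a working tree
git rev-parse --is-inside-work-tree    # prints "true"
```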

2. Adding files to a Git repo (git add)

To tell Git to start tracking a file, stage it with git add:

$ touch foo.txt
$ git add foo.txt
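git add accepts several paths at once, and adding a directory stages everything inside it. In this sketch the file and directory names are made up for the example; git ls-files then lists the files Git is now tracking:

```shell
cd "$(mktemp -d)" && git init -q

# Create a few example files, including one inside a subdirectory
touch foo.txt bar.txt
mkdir data
touch data/raw.csv

git add foo.txt          # stage a single file
git add bar.txt data/    # stage several paths; a directory is added recursively

# List the files Git is now tracking (i.e. staged in the index)
git ls-files
```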

3. Checking a repo's status (git status)

Before making a commit, it's always a good idea to check the status of the repo using git status:

$ touch newfile
$ echo 'Hello World' > newfile 
$ git status
On branch master

Initial commit

Untracked files:
  (use "git add <file>..." to include in what will be committed)

    newfile

nothing added to commit but untracked files present (use "git add" to track)
$ git add .
$ git status
On branch master

Initial commit

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

    new file:   newfile
$
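For a more compact view, git status --short (or -s) prints one line per file with a two-character status code. A sketch reproducing the example above in a throwaway directory:

```shell
cd "$(mktemp -d)" && git init -q

echo 'Hello World' > newfile
git status --short    # "?? newfile": untracked
git add newfile
git status --short    # "A  newfile": staged as a new file
```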

4. Saving changes (git commit)

Once you have done something interesting, commit it!

$ git commit -m 'Important change #1'
[master (root-commit) 9c5205a] Important change #1
 1 file changed, 1 insertion(+)
 create mode 100644 newfile
$ 

Here, the -m parameter is used to specify a commit "message" to associate with the changes you've made.
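To review what you have committed so far, git log shows the history, and --oneline condenses each commit to its abbreviated hash and message. A self-contained sketch; the user name and email are placeholders set only for this throwaway repo, since Git needs an identity before it will commit:

```shell
cd "$(mktemp -d)" && git init -q

# Placeholder identity, stored only in this repo's config
git config user.name "Example User"
git config user.email "user@example.com"

echo 'Hello World' > newfile
git add newfile
git commit -q -m 'Important change #1'

# One line per commit: abbreviated hash followed by the commit message
git log --oneline
```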

Note that when you use git commit -m, only the changes that you have staged (using git add) are included in the commit. To include all changes made to files already in the repo, you can use git commit -am instead. This automatically stages changes to every tracked file (i.e. every file previously added using git add) before committing; new files that have never been added are still left out.

So, to recap:

  • When you want to add a new file, use git add <filename> or git add .
  • When you want to save changes made to one or more existing files in the repo, either stage them with git add <changed_file1> <changed_file2> ... and then run git commit -m "message", or run git commit -am "message" to commit all modified tracked files in one step.
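The difference between the two routes is easiest to see side by side. In this sketch (file names and identity are illustrative), git commit -am picks up the edit to the tracked file but leaves the brand-new file untracked:

```shell
cd "$(mktemp -d)" && git init -q
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"

# First commit: start tracking notes.txt
echo 'first draft' > notes.txt
git add notes.txt
git commit -q -m 'Add notes'

# Modify the tracked file and create a new, untracked one
echo 'second draft' >> notes.txt
echo 'todo' > todo.txt

# -a stages changes to already-tracked files only, then commits
git commit -q -am 'Update notes'

# todo.txt is still untracked, so it shows up as "?? todo.txt"
git status --short
```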

5. Pushing your changes to a remote repo (git push)

Once you have committed some changes, you may want to sync them with a remote repository, such as one hosted on GitHub. This is done using the git push command.

$ git push -u origin master
Counting objects: 3, done.
Writing objects: 100% (3/3), 232 bytes | 0 bytes/s, done.
Total 3 (delta 0), reused 0 (delta 0)
To git@github.com:khughitt/test-repo.git
 * [new branch]      master -> master
Branch master set up to track remote branch master from origin.
$

Note that for this to work, you must first create a remote repo and add a reference to it. We will come back to this part later...
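The reference is added with git remote add. The sketch below uses a local bare repository (remote.git, a made-up name) as a stand-in for a repo created on GitHub, so it runs offline; with GitHub you would pass the repo's address instead, such as git@github.com:khughitt/test-repo.git from the output above. It assumes Git 2.28 or newer for the -b option, used here to name the initial branch master to match this tutorial:

```shell
base="$(mktemp -d)"
cd "$base"

# Stand-in for the empty repo you would create on GitHub
git init -q --bare -b master remote.git

# A local repo with one commit
git init -q -b master project
cd project
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"
echo 'Hello World' > newfile
git add newfile
git commit -q -m 'Important change #1'

# Register the remote under the conventional name "origin"
git remote add origin "$base/remote.git"
git remote -v                 # lists origin's fetch and push URLs

# Push master and set origin/master as its upstream (-u)
git push -u origin master
```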

If your repo is hosted on GitHub, you are pushing over SSH (i.e. using a git@github.com: address, as in the example above), and this is the first time you are pushing changes from the computer you are using, you will also need to add a public SSH key for that computer to your GitHub account.

6. Pulling changes made to a remote repo (git pull)

Once you start to collaborate with other people, you will need a way to sync your repo when other people have made changes to the shared repo.

This is done using the git pull command.

$ git pull
remote: Counting objects: 4, done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
From github.com:khughitt/test-repo
   9c5205a..1336440  master     -> origin/master
Updating 9c5205a..1336440
Fast-forward
 README.md | 3 +++
 1 file changed, 3 insertions(+)
 create mode 100644 README.md
$ 
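You can try git pull without a second person by playing both roles with two clones of one repository. In this sketch, alice and bob are hypothetical collaborators, their identities are placeholders, and a local bare repo again stands in for GitHub (Git 2.28 or newer assumed for -b):

```shell
base="$(mktemp -d)"
cd "$base"
git init -q --bare -b master shared.git     # stand-in for the GitHub repo

# alice starts the project and pushes it
git init -q -b master alice
cd alice
git config user.name "Alice"                # placeholder identity
git config user.email "alice@example.com"
echo 'Hello World' > newfile
git add newfile
git commit -q -m 'Important change #1'
git remote add origin "$base/shared.git"
git push -q -u origin master

# bob clones the shared repo
cd "$base"
git clone -q "$base/shared.git" bob

# alice adds a README and pushes again
cd "$base/alice"
echo '# test-repo' > README.md
git add README.md
git commit -q -m 'Add README'
git push -q

# bob pulls alice's new commit into his copy
cd "$base/bob"
git pull
```

Because bob has made no commits of his own, the pull is a simple fast-forward, just like the output above.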

7. Downloading a copy of a remote repo (git clone)

Finally, you may come across code or other files hosted in an online repo (usually on GitHub) that you wish to download and possibly make changes to. The command to do so is git clone:

$ git clone https://github.com/khughitt/labnote
Cloning into 'labnote'...
remote: Counting objects: 763, done.
remote: Total 763 (delta 0), reused 0 (delta 0), pack-reused 763
Receiving objects: 100% (763/763), 1.90 MiB | 3.95 MiB/s, done.
Resolving deltas: 100% (382/382), done.

The above command downloads the khughitt/labnote repository from Github over https, and stores it on your local machine. By default, it will be saved in your current working directory in a directory with the same name as the repo (here, labnote).
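git clone also takes an optional second argument naming the directory to clone into, e.g. git clone https://github.com/khughitt/labnote my-labnote. The sketch below does the same with a small local repo so it runs offline (repo names and identity are illustrative; Git 2.28 or newer assumed for -b):

```shell
base="$(mktemp -d)"
cd "$base"

# A small local repo to clone from (stand-in for a repo hosted online)
git init -q -b master labnote
cd labnote
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"
echo 'notes' > README.md
git add README.md
git commit -q -m 'Initial commit'
cd "$base"

# Clone into a directory called my-labnote instead of the default "labnote"
git clone -q "$base/labnote" my-labnote

# The clone records where it came from as the remote "origin"
git -C my-labnote remote get-url origin
```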

GitHub Basics

Overview

GitHub is a free online service for hosting and mirroring Git repositories. It hosts mostly open source code, although it also supports "private" repositories that only you and your collaborators can see.

Why use GitHub?

Single-user Workflow

For small projects or scripts that you would like to track and/or share on GitHub, the process is very simple:

  1. Create a repo on GitHub
  2. Follow steps to clone repo and add repo as an upstream remote
  3. Hackity-hack (keep it atomic)
  4. git commit
  5. git push
  6. Repeat steps 3-5.

It is also a good idea to add a README.md to the repo with some notes to yourself or others. It plays the same role as a README.txt, and GitHub displays it on the repo's main page.

Multi-user Workflow

The process for collaborating with other users on a project using Git and GitHub is similar to the single-user workflow described above, with a couple of additional steps along the way.

Setting up the master and forked repos

  1. Create a repo on GitHub (do this once)
  2. Fork the master repo (each user does this)
  3. Follow steps to clone the forked repo and add repo as an upstream remote (each user does this)

Making changes

Next, once a repo has been created and each user has their own fork of that repo, the process each user follows to make changes is the same:

  1. If the master repo has changed, use git pull to merge those changes into your fork.
  2. Make changes
  3. git commit
  4. git push
  5. Submit a pull request
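Step 1 assumes your local copy knows where the master repo is. One common setup, sketched here with local bare repos standing in for the master repo and your fork on GitHub (all names and identities hypothetical, Git 2.28 or newer assumed for -b), is to register the master repo as a second remote named upstream, pull from it, and push the result to your fork (origin):

```shell
base="$(mktemp -d)"
cd "$base"

# "Master" repo, given one commit via a temporary working copy
git init -q --bare -b master main-repo.git
git init -q -b master setup
cd setup
git config user.name "Maintainer"               # placeholder identity
git config user.email "maintainer@example.com"
echo 'v1' > analysis.R
git add analysis.R
git commit -q -m 'First version'
git push -q "$base/main-repo.git" master

# Your fork: on GitHub this is a server-side copy; here, a bare clone
git clone -q --bare "$base/main-repo.git" "$base/fork.git"

# Meanwhile, the master repo gains a new commit
echo 'v2' > analysis.R
git commit -q -am 'Second version'
git push -q "$base/main-repo.git" master

# Your local copy: clone the fork, then register the master repo as "upstream"
cd "$base"
git clone -q "$base/fork.git" mine
cd mine
git remote add upstream "$base/main-repo.git"

# Step 1: merge changes from the master repo, then update your fork
git pull upstream master
git push -q origin master
```

On GitHub itself, the fork is created with the Fork button on the master repo's page rather than with git clone --bare.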

Accepting changes

Once a pull request (PR) has been submitted, it will appear on the master repo. The PR will list all of the commits made, the files changed, and any information the user submitting the PR provided about it.

If this all looks good, then any user with write access to the master repo can "accept" (merge) the PR, and the changes will (usually) be merged into the master repo automatically.

Other multi-user workflows

There are other workflows that can be used for collaboration on GitHub -- the above just illustrates one of these which I am partial to.

For larger efforts, you can also create teams on GitHub so that an entire team owns or manages a repo instead of a single user.

Beyond just code...

One of the nice things about Git and GitHub is that you are not limited to using them for just code. Some other useful things they can be used for include:

Further reading

If you want to learn more, there are a lot of other great tutorials on Git and Github as they pertain to science. Here are just a few examples to help get you started:

References

Note: This tutorial was adapted from an earlier version originally presented at a UMD bioinformatics club meeting in January, 2014.
