SLP: Introduction to git and github

If you are already familiar with git and github, you can skip some parts, depending on your level of familiarity. However, it would be wise to skim the parts you are skipping. Everybody must read the section on tags and branches, as most people who think they know about branching really don't.

Introduction

First, read through the Wikipedia page on git.

Git keeps the files, as well as the entire version history, in a repository. A repository is just a directory structure created via the git init command (which is not covered here). However, most people who use git have the repository kept on a server. One can set up their own server to do so, although the user then has to deal with authentication and authorization (who can access which repositories, etc.). Many people use github, which is what we will be using for this course; other online services also provide git repository hosting, but we won't be going over them here.

The repository itself needs to be cloned in order for the files to be worked on; this is analogous to subversion's checkout command. This is done via the git clone command. There are a few ways to clone a repository:

git clone git@github.com:aaronbloomfield/slp

Or:

git clone https://github.com/aaronbloomfield/slp

For both of these commands, the user who owns the repository is aaronbloomfield, and the repository is slp. The first command connects over SSH, and the second over HTTPS. Either of the two commands will work, although the first one is considered more secure (see here for details). Unless you configure things otherwise, both will require you to authenticate with github any time you clone a private repo, and any time you push (more on pushing, below) to any repository.

The Development Environment Setup (md) page describes how to configure ssh and git so that you do not have to enter your password each time.

Once the repository is cloned, you can work on the files within the created directory. The git commands will work anywhere in that cloned repository -- you don't have to be in the root of that directory structure (which was the case with subversion). There are three separate steps to get a file added to the github repository.

  1. As you create files that you want to add to the repository, you use git add <file>. This stages them for (i.e., gets them ready for) adding to the repository, but does not add them just yet. You can enter git status to see what files have been added and what files have not been added. If you modify a file, and you want to add those modifications to the repository, you also do a git add <file>. You can add as many files as you want before proceeding to the next step.
  2. Once you have made the changes that you want to make, you commit the changes to the repository: git commit -m"I implemented HAL 9000". This takes all the staged changes (those that were added via git add <file>), and adds them to the repository wrapped into a single commit; you can specify the message on the command line as shown in the example. You can commit as many times as you want before proceeding to the next step.
  3. git is a distributed version control system. What this means is that any commits are made to the local repository (i.e., what you have on your computer), and not (yet) made to the version of the repository that github hosts. In order to do that, you execute a git push -- this takes all the commits, and pushes them to the remote server (i.e., github). This means that all the committed changes are now on github.
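
The three steps above can be tried out without touching github at all. The following is only a sketch: it assumes a local bare repository (remote.git) standing in for github, and the directory name slp-demo, the file hal.txt, and the user name and e-mail are all made-up placeholders.

```shell
#!/bin/sh
set -eu

# Scratch area; the bare repository plays the role of github in this demo.
work=$(mktemp -d)
git init --quiet --bare "$work/remote.git"
git clone --quiet "$work/remote.git" "$work/slp-demo"
cd "$work/slp-demo"

# Placeholder identity so that committing works on a fresh machine.
git config user.name "Example Student"
git config user.email "student@example.com"

# Step 1: create a file and stage it.
echo "Hello, HAL" > hal.txt
git add hal.txt
git status --short      # prints "A  hal.txt": staged, but not yet committed

# Step 2: wrap the staged changes into a single local commit.
git commit --quiet -m "I implemented HAL 9000"

# Step 3: send the local commit(s) to the remote.
# The branch is named explicitly here because the stand-in remote starts
# out empty; the text above uses a plain "git push" for a github clone.
git push --quiet origin HEAD
```

Until the last line runs, the commit exists only in the local clone; running git log inside remote.git before and after the push makes the distinction visible.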

git command summary

There are many other git commands; type man git for a list, then type man git-clone for the details about the git clone command (likewise for the other git commands).

Exploring github

There are many services that provide git repositories. Github is one of the most popular. They provide free repositories for education, and have a powerful set of tools available through their web interface for managing your git repositories.

Github defines two types of repositories: public and private. Anybody can create as many public repositories as they want, but note that anybody else can see the public repositories. This course (and this document) is available in a public repository. In contrast, private repositories can only be viewed by those who are authorized to view them. Private repositories cost money, although github provides free repositories for educational use.

When viewing any repository, it will display, underneath the list of files, the contents of a file called README.md (or readme.md, or readme.markdown, etc.). Markdown is a plain text formatting system that aims to make the document readable in both text format as well as easily allowing it to be converted into HTML or similar. In fact, any directory that has a readme.md (or similar) will display the formatted contents underneath the list of files. For a quick example of the Markdown syntax, see here.

Note on the right-hand side that github provides issues and a wiki (both of which can be formatted with Markdown). Take a moment to explore these, as they will be used extensively in this course. For the assignment below, you will need to create formatted wiki pages, and create issues (along with labels and milestones).

Also on the right-hand side is the link for settings. From there, you can select Collaborators, and select other people who can access the repository. If the repo is public, then anybody can clone it. But only collaborators can push to a public repo, or have any access (clone and push) to a private repo. You will need to add the course instructors to your private repo for the git homework (md).

Forks and Pull requests

Github easily allows one to fork a repository that they have access to. This is a github feature; there is no git command named "fork". Basically, this is a means to create a new repo that is a copy of an old repo, and keep track of which repo is the old repo. Since the forked repo belongs to you, you can make any changes to it. This is typically done in two different situations. One is if you want to suggest a change to be sent back to the source repo; this is done in github via a pull request (also a github-only feature), and is described shortly. The other reason is if you want to work on a project that is based off of another project. For example, the aaronbloomfield/github-api-tools repo is a fork of the KnpLabs/php-github-api repo, as the former uses the library that was created in the latter. However, the fork adds features that are not meant to be in the original (i.e., the fork is an application that uses the original).

There are two ways that people can change a repository. The first is if the user has permissions to push to a repo. This is the case for any user's own repositories, and will generally be the case for the group repositories in this course.

However, it may be that one wants anybody to be able to suggest changes to a repository. In this case, the user creates a fork, makes changes to the fork, and then suggests those changes back to the maintainer of the original repository. This is called a pull request (again, this is a github name, and is not a git command, per se). This will send a notification -- or request -- to the maintainer of the original to "pull" the changes into the original repository (hence "pull request"). To submit a pull request, click on the "pull requests" button on the right side of the github repository page.

When you create a fork, it copies the original repo at that point in time. To update your fork with changes made to the original repo after that point, you will want to follow the directions here. Note that you will likely have to configure the upstream repo the first time you do this.
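
As a rough illustration of what syncing a fork involves, the sketch below simulates everything locally. It assumes two bare repositories (original.git and fork.git) standing in for the two github repos; the commit_file helper, the directory names, and the identity are hypothetical and exist only for this demo. The remote name upstream is a common convention, not a requirement.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)

# Hypothetical helper for this demo: commit one file with a placeholder identity.
# usage: commit_file <repo-dir> <file> <message>
commit_file() {
    echo "$3" > "$1/$2"
    git -C "$1" add "$2"
    git -C "$1" -c user.name="Example Student" \
        -c user.email="student@example.com" commit --quiet -m "$3"
}

# The "original" repo, with one commit pushed by its maintainer.
git init --quiet --bare "$work/original.git"
git clone --quiet "$work/original.git" "$work/maintainer"
commit_file "$work/maintainer" README.md "first commit"
git -C "$work/maintainer" push --quiet origin HEAD

# The "fork": a copy of the original at this point in time, plus your clone of it.
git clone --quiet --bare "$work/original.git" "$work/fork.git"
git clone --quiet "$work/fork.git" "$work/mine"

# The original moves on after the fork was made.
commit_file "$work/maintainer" NEWS.md "later commit"
git -C "$work/maintainer" push --quiet origin HEAD

# Sync the fork: configure the upstream remote (first time only),
# fetch its commits, merge them locally, and push them to the fork.
cd "$work/mine"
git remote add upstream "$work/original.git"
branch=$(git rev-parse --abbrev-ref HEAD)
git fetch --quiet upstream
git merge --quiet "upstream/$branch"
git push --quiet origin HEAD
```

With a real fork, the upstream URL would be that of the original github repo, and only the last five commands would be needed.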

Tagging

A tag is a way to denote a particular commit or a particular date. For example, version releases are typically tags. This differs from a branch, which is when development is going to go in multiple different directions. A tag is just a means to demarcate a point in time. Consider the main linux kernel repo -- there are a lot of tags there.

In the examples below, we use "v1.4" as the tag name, but it can be any string.

More information about tagging can be found here; the examples in this section were taken from that page.
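
A minimal sketch of the basic tagging commands, using the "v1.4" name from above. The scratch repository, the file notes.txt, the second tag name v1.4-lw, and the identity are placeholders invented for this demo.

```shell
#!/bin/sh
set -eu

# Scratch repository with a single commit to tag.
work=$(mktemp -d)
git init --quiet "$work/tag-demo"
cd "$work/tag-demo"
git config user.name "Example Student"        # placeholder identity
git config user.email "student@example.com"
echo "release notes" > notes.txt
git add notes.txt
git commit --quiet -m "Prepare release"

# Annotated tag: stored as its own object, with a tagger, date, and message.
git tag -a v1.4 -m "my version 1.4"

# Lightweight tag: just a name pointing at a commit.
git tag v1.4-lw

# List the tags, then show the annotated tag and the commit it marks.
git tag
git --no-pager show v1.4

# A plain "git push" does not send tags; they are pushed explicitly.
# (Commented out because this scratch repository has no remote.)
#   git push origin v1.4
#   git push origin --tags
```

Running git cat-file -t v1.4 and git cat-file -t v1.4-lw shows the difference between the two kinds: the first reports a tag object, the second a commit.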

Branching and Merging

First, a comment about what branching is for. Branching is for when you want a separate development path to occur in conjunction with the original development path. It is NOT to indicate a particular commit or a particular date (that's a tag). Branching is a wonderful and powerful tool -- but, like all tools, it should be used wisely and for its proper purpose. Linus Torvalds, the creator of git and the Linux kernel, has only one branch in his linux kernel github repo, albeit a very large number of tags. (Granted, he doesn't use github as his primary workflow git server, but still...)

Branching and merging are complicated, and one should read this page in its entirety.

A few notes from that page (this is NOT a summary):

Note that if you have local changes (modified files, or things added via git add), then you cannot switch branches. You will have to do one of the following:

  1. Undo the changes (run git checkout <file> to switch to the version before the modifications; run git reset <file> to undo an add)
  2. Save the changes via git stash save
  3. Commit the changes via git commit

The rest of branching and merging is covered in that document.
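
The three options listed above can be seen in a small scratch repository. This is only a sketch: the branch name hal-feature, the file notes.txt, and the identity are made up for the demo, and the final merge happens to be a simple fast-forward because the original branch has not changed in the meantime.

```shell
#!/bin/sh
set -eu

# Scratch repository with one commit on the default branch.
work=$(mktemp -d)
git init --quiet "$work/branch-demo"
cd "$work/branch-demo"
git config user.name "Example Student"        # placeholder identity
git config user.email "student@example.com"
echo "line one" > notes.txt
git add notes.txt
git commit --quiet -m "Initial commit"
main_branch=$(git rev-parse --abbrev-ref HEAD)

# Start a separate development path.
git checkout --quiet -b hal-feature

# Option 1: undo an unwanted local change.
echo "mistake" >> notes.txt
git add notes.txt
git reset --quiet notes.txt        # undo the add
git checkout -- notes.txt          # go back to the version before the edit

# Option 2: set a half-finished change aside, then bring it back later.
echo "work in progress" >> notes.txt
git stash save --quiet "half-finished notes"
git checkout --quiet "$main_branch"     # working tree is clean, so this is safe
git checkout --quiet hal-feature
git stash pop --quiet

# Option 3: commit the change.
git add notes.txt
git commit --quiet -m "Finish notes"

# Merge the branch back into the original development path.
git checkout --quiet "$main_branch"
git merge --quiet hal-feature
git branch                          # lists both branches
```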

git for those coming from subversion

While there are many differences between git and subversion, there are also similarities. Here are the equivalent git commands, as much as is possible, to the SVN commands. Note that the flags to these commands are not listed below, and are generally different between the two systems.

A bunch of SVN commands operate the same (more or less) under git:

And some SVN commands have no equivalent under git: