Git

Local repositories

Creating and tracking files

To set up a local git repository (also called a repo), move to the directory of interest in the terminal/shell and run the following command.

git init

This creates a git repo for the current directory by adding a subdirectory called .git, which contains all the information git needs to do its job. The repo covers every subdirectory and the files inside them, so there is no need to initialize a new repo each time a subdirectory is created.
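As a quick check of the point above, the sketch below initializes a repo in a throwaway directory and asks git, from inside a nested subdirectory, where the top of the repo is. The temporary directory and the subdirectory name src/utils are placeholders chosen for illustration.

```shell
set -e
# Work in a throwaway directory so nothing on your machine is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q

# git stores its bookkeeping in the .git subdirectory.
ls -d .git

# A nested subdirectory is covered by the same repo: asking git for the
# top-level directory from inside it points back to where git init was run.
mkdir -p src/utils
cd src/utils
toplevel=$(git rev-parse --show-toplevel)
echo "$toplevel"
cd "$repo"
```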

The next step is to rename the default branch to main.

git checkout -b main

We change the default branch name because the community as a whole is moving away from the previous conventional name, master.
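To confirm which branch a fresh repo is on, one option is to read the symbolic reference HEAD, which works even before the first commit exists. This is a minimal sketch in a throwaway directory. As an aside, Git 2.28 and later also accept git init -b main, which sets the name at creation time.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git checkout -q -b main

# HEAD is a symbolic reference to the current branch, so this prints the
# branch name even though no commit has been made yet.
branch=$(git symbolic-ref --short HEAD)
echo "$branch"
```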

At any point we can ask for the current status of the repo.

git status

Now for tracking changes with git.

Suppose you create a new file called sentence.txt.

echo "Tracking the first change on git woohoo" >> sentence.txt

Checking the status of the repo (via git status) now lists sentence.txt under Untracked files. To track this file, we need to add it.

git add sentence.txt

Checking the status will now reveal this change under Changes to be committed. The next step is to commit this file, with a short message describing the changes made.

git commit -m "added text for first git commit"

The git commit command permanently records the changes staged by git add in the .git directory. Each commit has a hash value, called its identifier, which uniquely identifies it ¹.
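The three states described above (untracked, staged, committed) can be watched in one go with git status --short, which prints a two-character code per file. The sketch below runs in a throwaway directory; the name and email are placeholders set only for this temporary repo.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

echo "Tracking the first change on git woohoo" >> sentence.txt
before_add=$(git status --short)     # "?? sentence.txt" (untracked)

git add sentence.txt
after_add=$(git status --short)      # "A  sentence.txt" (staged)

git commit -q -m "added text for first git commit"
after_commit=$(git status --short)   # empty: nothing left to commit

# The identifier of the commit we just made (40 hex characters with the
# default SHA-1 object format).
commit_id=$(git rev-parse HEAD)
echo "$commit_id"
```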

We can also view the commits made to a repo in reverse chronological order with git log. If the project has many commits the output can be overwhelming, so we can pass -k to show only the last \(k\) commits. For example, the following shows only the most recent commit.

git log -1

Other flags for git log include --oneline and --graph.
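To see what these flags do, the sketch below makes three small commits in a throwaway repo and prints the history in compact form. The file contents, commit messages and identity are placeholders; --no-pager is only there so the output is printed directly instead of opening a pager.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

# Make three small commits so there is some history to look at.
for n in 1 2 3; do
    echo "line $n" >> sentence.txt
    git add sentence.txt
    git commit -q -m "add line $n"
done

# Last two commits, one line each (abbreviated identifier + message).
last_two=$(git --no-pager log --oneline -2)
echo "$last_two"

# The same history drawn as a text graph; this becomes more useful once
# the history contains branches and merges.
git --no-pager log --oneline --graph
```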

Now, suppose we add another line to this file.

echo "Second line to the first file created" >> sentence.txt

Checking the status will this time reveal this change under Changes not staged for commit. When making a change to a file, it is good practice to first view the exact changes being applied, which can be done via the following command.

git diff

From here on we can use git add and git commit to track these changes.

Alternatively, once the changes have been staged via git add, we can compare the staged version against the last commit.

git diff --staged

Other flags for git diff include --color-words, which highlights the changed words using colours, in case line-level differences are too coarse.
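The difference between git diff and git diff --staged is easiest to see by running both before and after git add. The sketch below does this in a throwaway repo with a placeholder identity.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

echo "Tracking the first change on git woohoo" >> sentence.txt
git add sentence.txt
git commit -q -m "added text for first git commit"

echo "Second line to the first file created" >> sentence.txt

# Before staging: the change shows up in plain git diff only.
unstaged_before=$(git diff)
staged_before=$(git diff --staged)

git add sentence.txt

# After staging: the two views swap.
unstaged_after=$(git diff)
staged_after=$(git diff --staged)

echo "$staged_after"
```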

The process of tracking changes can be thought of as taking snapshots of a project as it progresses, where git add defines what goes into the snapshot (by putting it in a staging area), and git commit actually takes the snapshot and makes a permanent record of it. If nothing is staged, git commit records nothing and suggests using git add or git commit -a; the latter commits every change made to tracked files. This is not good practice and should be avoided, since you may commit changes that you forgot you made.
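The staging area matters most when several files have changed but only some of them belong in the next snapshot. The sketch below edits two files and stages only one; the file names notes.txt and draft.txt, and the identity, are placeholders.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

echo "notes" > notes.txt
echo "draft" > draft.txt
git add notes.txt draft.txt
git commit -q -m "add notes and draft"

# Edit both files, but stage only one of them.
echo "more notes" >> notes.txt
echo "unfinished idea" >> draft.txt
git add notes.txt
git commit -q -m "extend notes"

# Only notes.txt went into the snapshot...
in_commit=$(git diff --name-only HEAD~1 HEAD)
# ...while draft.txt is still reported as modified and unstaged.
still_modified=$(git status --short)
echo "committed: $in_commit"
echo "pending:  $still_modified"
```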

To make git ignore certain files/directories, you can type what to ignore in a file called .gitignore. Below is an example of a .gitignore file where we do the following:

Ignore all files with the extension ".xcf".
Ignore the directory "passwd".
Ignore just the directory "img" inside the directory "data".
Keep a specific .xcf called "results.xcf".
Ignore all ".dicom" files in "data/scans".
Ignore all ".pdf" files in different subdirectories regardless of their position in the directory tree.

*.xcf
passwd/
data/img/
!results.xcf
data/scans/*.dicom
**/*.pdf
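One way to test rules like these without committing anything is git check-ignore, which reports whether a given path is ignored. The sketch below writes the same .gitignore into a throwaway repo and checks a handful of sample files; all of the sample file names are hypothetical.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

# The same rules as in the example .gitignore above.
cat > .gitignore <<'EOF'
*.xcf
passwd/
data/img/
!results.xcf
data/scans/*.dicom
**/*.pdf
EOF

# Sample files (hypothetical names) to test the rules against.
mkdir -p passwd data/img data/scans docs/drafts
touch logo.xcf results.xcf passwd/keys.txt data/img/plot.png \
      data/scans/head.dicom data/table.csv docs/drafts/paper.pdf

# git check-ignore -q exits with 0 when the path is ignored and with 1
# when it is not, so it can be used directly in an if statement.
report=""
for path in logo.xcf results.xcf passwd/keys.txt data/img/plot.png \
            data/scans/head.dicom data/table.csv docs/drafts/paper.pdf; do
    if git check-ignore -q "$path"; then verdict=ignored; else verdict=kept; fi
    report="$report$path:$verdict "
    echo "$path: $verdict"
done
```

Note that the order of the rules matters: !results.xcf only has an effect because it comes after *.xcf.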

Exploring History

The most recent commit can also be referred to by the identifier HEAD. Suppose we make new changes to our file.

echo "Third line to the first file created" >> sentence.txt

We can see the differences between the file now and its state \(k\) commits before the current HEAD by passing HEAD~k to git diff. Alternatively, we can use the full identifier hash of a commit, or just its first 7 characters. For example, to see the difference between the file now and 1 commit ago, we can do the following.

git diff HEAD~1 sentence.txt
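To check that HEAD~k and the abbreviated identifier really name the same commit, the sketch below builds a two-commit history in a throwaway repo, adds an uncommitted third line, and compares the two forms. The second commit message and the identity are placeholders; git rev-parse --short is used here only to look up the abbreviated identifier.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

echo "Tracking the first change on git woohoo" >> sentence.txt
git add sentence.txt
git commit -q -m "added text for first git commit"
echo "Second line to the first file created" >> sentence.txt
git add sentence.txt
git commit -q -m "added second line"
echo "Third line to the first file created" >> sentence.txt   # not committed

# Compared with HEAD, only the uncommitted third line differs.
git --no-pager diff HEAD sentence.txt

# Compared with one commit earlier, the second and third lines both differ.
by_offset=$(git diff HEAD~1 sentence.txt)

# The same commit, referred to by its abbreviated identifier.
short_id=$(git rev-parse --short HEAD~1)
by_id=$(git diff "$short_id" sentence.txt)
echo "$by_id"
```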

To go back to a certain commit, we use git checkout.

git checkout HEAD sentence.txt

The command above restores sentence.txt to its state in the last commit, thus deleting the uncommitted third line. Instead of HEAD, we can go back to any previous commit using its shorter identifier, say abcd123; note that in this case, the snapshot of sentence.txt from the abcd123 commit is also placed in the staging area. If you do not specify a file name when checking out a previous commit, you enter a detached HEAD state; you can "reattach" HEAD by checking out the main branch via git checkout main.
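The three behaviours described above (restoring a file from HEAD, restoring it from an older commit, and detaching HEAD) can be tried safely in a throwaway repo. In this sketch HEAD~1 plays the role of abcd123, git symbolic-ref is used only to report whether HEAD is attached to a branch, and the identity is a placeholder.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git checkout -q -b main
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

echo "Tracking the first change on git woohoo" >> sentence.txt
git add sentence.txt
git commit -q -m "added text for first git commit"
echo "Second line to the first file created" >> sentence.txt
git add sentence.txt
git commit -q -m "added second line"
echo "Third line to the first file created" >> sentence.txt   # not committed

# Discard the uncommitted change: the file goes back to its state at HEAD.
git checkout -q HEAD sentence.txt
lines_after_restore=$(wc -l < sentence.txt | tr -d ' ')

# Bring back the file as it was one commit ago; the old version is staged.
git checkout -q HEAD~1 sentence.txt
staged=$(git diff --staged --name-only)
git checkout -q HEAD sentence.txt     # undo that again

# Checking out a commit without a file name detaches HEAD...
git checkout -q HEAD~1
if git symbolic-ref -q HEAD > /dev/null; then state=attached; else state=detached; fi

# ...and checking out the branch reattaches it.
git checkout -q main
branch=$(git symbolic-ref --short HEAD)
echo "$lines_after_restore $staged $state $branch"
```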

Remote repositories

Often you want a copy of your git repo somewhere besides your own personal computer; this is where git remotes come in. These are repos hosted online, commonly on services like GitHub, GitLab, Bitbucket, etc. Suppose we are using GitHub with the username ghuser1, and the directory of the git repo above is named firstgit. After creating a repo on GitHub with the same name, we need to connect our local repo to this remote.

git remote add origin git@github.com:ghuser1/firstgit.git

In the above, origin is a conventional name used to refer to the remote repo. Note that we assume you have SSH set up with GitHub.

To push local changes to the remote repo, we can do the following.

git push origin main

To pull remote changes to the local repo, we can do the following.

git pull origin main
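The push and pull commands can be tried without a GitHub account by letting a bare repo in a local directory stand in for the hosted remote. This is only a sketch under that assumption: the path remote.git and the identity are placeholders, and a real setup would use the SSH address instead of a local path.

```shell
set -e
work=$(mktemp -d)

# Stand-in for the hosted remote: a bare repo in a local directory.
git init -q --bare "$work/remote.git"

git init -q "$work/firstgit"
cd "$work/firstgit"
git checkout -q -b main
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

echo "Tracking the first change on git woohoo" >> sentence.txt
git add sentence.txt
git commit -q -m "added text for first git commit"

# Connect the local repo to the remote and push the main branch.
git remote add origin "$work/remote.git"
git push -q origin main

# The remote now holds the same commit as the local main branch.
local_id=$(git rev-parse main)
remote_id=$(git -C "$work/remote.git" rev-parse main)
echo "$local_id"
echo "$remote_id"

# Nothing new is on the remote yet, so pulling leaves the branch unchanged.
git pull -q origin main
```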

Collaborative workflow

Suppose now that you are another person (not the owner of the repo) who nonetheless has access to it. In this case, the first step to work on the project is to clone it to your computer, say to a directory called projects.

git clone git@github.com:ghuser1/firstgit.git ~/projects

From here, a basic workflow would be to pull, add, commit, then push every time you make a change.
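One round of this cycle might look like the sketch below. It again assumes a local bare repo standing in for the GitHub remote, with the directories owner and projects playing the two people involved; all paths, identities and file contents are placeholders.

```shell
set -e
work=$(mktemp -d)

# Stand-in remote with one commit on main, pushed by the owner.
git init -q --bare "$work/remote.git"
git init -q "$work/owner"
cd "$work/owner"
git checkout -q -b main
git config user.name "Owner"                   # placeholder identity
git config user.email "owner@example.com"      # placeholder identity
echo "Tracking the first change on git woohoo" >> sentence.txt
git add sentence.txt
git commit -q -m "added text for first git commit"
git remote add origin "$work/remote.git"
git push -q origin main

# The collaborator clones the repo; cloning sets up origin automatically.
git clone -q -b main "$work/remote.git" "$work/projects"
cd "$work/projects"
git config user.name "Collaborator"            # placeholder identity
git config user.email "collab@example.com"     # placeholder identity

# One round of the cycle: pull, edit, add, commit, push.
git pull -q origin main
echo "A line from the collaborator" >> sentence.txt
git add sentence.txt
git commit -q -m "added collaborator line"
git push -q origin main

# The owner picks up the change with a pull.
cd "$work/owner"
git pull -q origin main
owner_lines=$(wc -l < sentence.txt | tr -d ' ')
echo "$owner_lines"
```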

However, by the time you want to push your changes to the remote repo, someone else may have pushed their own changes to it. In this case, git rejects the push and we have to pull the remote changes first. It is possible that the changes you just pulled from the remote conflict with the changes you were about to push. If so, git notifies you of a merge conflict, which we have to resolve manually by looking at the files where it occurred.
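The sketch below reproduces this situation on purpose, so that the conflict can be inspected safely. It assumes a local bare repo in place of the hosted remote and two working copies, owner and collab, that edit the same line; all names and file contents are placeholders. The pull is run with --no-rebase to ask explicitly for a merge, since recent versions of git refuse to pull diverged branches until told how to reconcile them.

```shell
set -e
work=$(mktemp -d)
git init -q --bare "$work/remote.git"

# Owner creates the first commit and pushes it.
git init -q "$work/owner"
cd "$work/owner"
git checkout -q -b main
git config user.name "Owner"                   # placeholder identity
git config user.email "owner@example.com"      # placeholder identity
echo "Shared first line" > sentence.txt
git add sentence.txt
git commit -q -m "first line"
git remote add origin "$work/remote.git"
git push -q origin main

# Collaborator clones; then the owner changes the line and pushes again.
git clone -q -b main "$work/remote.git" "$work/collab"
echo "Owner's version of the line" > sentence.txt
git add sentence.txt
git commit -q -m "owner edit"
git push -q origin main

# Collaborator changes the same line without having pulled first.
cd "$work/collab"
git config user.name "Collaborator"            # placeholder identity
git config user.email "collab@example.com"     # placeholder identity
echo "Collaborator's version of the line" > sentence.txt
git add sentence.txt
git commit -q -m "collaborator edit"

# The push is rejected because the remote has moved on.
if git push -q origin main 2> /dev/null; then pushed=yes; else pushed=no; fi

# Pulling stops with a merge conflict (non-zero exit status).
git pull -q --no-rebase origin main || true
during=$(git status --short)                   # "UU sentence.txt"
markers=$(grep -c '^<<<<<<<' sentence.txt)     # conflict markers in the file
cat sentence.txt

# Resolve by writing the content we actually want, then add, commit, push.
echo "Agreed version of the line" > sentence.txt
git add sentence.txt
git commit -q -m "resolve conflict in sentence.txt"
git push -q origin main
```

Between the <<<<<<< and >>>>>>> markers, git shows both versions of the conflicting lines separated by =======; resolving the conflict means editing the file until only the intended content remains.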

Merge conflicts cause a bit of friction since you have to resolve them manually. They can be reduced by pulling from the upstream repo more frequently; making smaller, atomic commits; breaking files into smaller ones, or making any other change that makes it less likely for more than one person to work on the same file; defining the tasks required for the project and assigning them accordingly; using conventions for code style; and, perhaps most importantly, using different branches.

Extras

You can shorten some of the commands via git aliases.
For more details on undoing changes, see here.
Specific collaborative workflows are detailed here.
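As a small illustration of the first point, aliases are defined through git config. The sketch below sets two of them for a throwaway repo only; the alias names st and last are arbitrary choices, and adding --global to the git config calls would make them available in every repo on the machine.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

# Define two aliases for this repo only.
git config alias.st status
git config alias.last "log -1"

# "git st" now behaves like "git status".
git st --short

# Aliases are ordinary config entries and can be read back.
alias_value=$(git config --get alias.last)
echo "$alias_value"
```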

Thoughts

This content is derived from the excellent tutorial from Software Carpentry.
All of the above assumes git is configured with the name and email on the user's computer.
TODO Details on branching and git restore.
TODO Look into the difference between *.something and **/ *.something in .gitignore

Footnotes:

¹ With high probability.