Git, GitHub, and RStudio – Part 2: Installing Git

In the last post we gave a brief overview of the benefits of version control software and briefly described some of the different software packages that are available.  We concluded by introducing our version control software of choice: Git.

In this post we will cover how to install and get started using Git.  This short tutorial only covers installing and using Git on Windows (Windows 8.1).  Installation on earlier versions of Windows may differ slightly in the details, although the key steps are likely to be much the same.  Details on other operating systems may be added later.

Before you start

This tutorial is largely based on the Pro Git book, which is available free online (including as a PDF) and also from Amazon.com.  The book is well worth a read and is highly recommended for anyone starting out with Git.

If you are feeling a bit impatient now, you should be able to follow this tutorial and get started with Git without first reading the book.  But it is highly recommended that you return to it at some stage in the near future.

Like a lot of software packages (including R), Git has quite a steep learning curve and takes a while to get used to. In particular, it is important to understand the fundamental concepts underlying version control.  We will start with the Git terminal, and in future posts we will talk about how to integrate Git with RStudio or another GUI application.

Downloading Git

Downloading Git for Windows is simply a matter of heading over to the Git for Windows download page and saving the installer to your machine.  At the time of writing, the current stable release of Git for Windows is version 1.9.5, and the installer is about 17 MB.  Of course, if for some reason you want to download a previous version of Git (or even the latest source code), you can easily do that as well. Why? Wait for it… because the developers use version control!

Installing Git

Locate the Git installation file (most likely in the Downloads folder) and open it to start the installation process.

Git is licensed under the GNU General Public License (version 2), which means that you are free to use, copy, modify and redistribute it, provided that any versions you distribute are also released under the GPL.  Pause here to say a big thank you to the kind developers of this and other open-source software, who provide us with such incredible free tools.

Next you will be presented with a set of installation options.  Unless you have a reason to do otherwise, it is probably best to simply accept the default settings.

In the next window you will be given the option of adjusting your PATH environment variable.  Again, the simplest choice is the second option, which lets the installer add Git to your system PATH so that you can also run it from the Windows command prompt.  You should also accept the default settings in the following window, which controls how Git treats line endings in text files.

The installation should now begin and will take a few minutes, depending on your system.  Congratulations! You have now successfully installed Git on your machine!

An alternative way to install Git

If you are planning to use GitHub, the online hosting service for Git repositories (which we will talk about later), you can also install Git by downloading the GitHub for Windows GUI.  This installs Git together with the handy GUI and automatically configures several of your settings.

There is also a GitHub for Mac GUI, which I assume is very similar.

Getting Started: Configuring Git

The first step is to set up Git on your system.

Open Git Bash (type “Git Bash”, without the quotation marks, into the Windows search bar to find the application).  This is a command line environment that comes with Git.  Alternatively, you can open a Windows command prompt window (type “cmd” into the Windows search bar).

If your PATH was modified correctly during setup, typing “git” into the Windows command prompt should print a list of common Git commands.  If the command prompt instead reports that 'git' is not recognized as a command, Git has not been added to your PATH.  Either add Git to the system PATH or use Git Bash.
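If you would like to double-check the installation from Git Bash, the short sketch below asks the shell where the git executable lives and prints its version. It relies on “command -v”, a Bash built-in, so it will not work in the Windows command prompt; there, typing “git --version” (or “where git”) does the same job.

```shell
# Sketch for Git Bash: check that git can be found and report its version.
# command -v prints the path of the git executable if it is on the PATH.
if command -v git >/dev/null 2>&1; then
  echo "git found at: $(command -v git)"
  git --version
else
  echo "git not found: add Git to your PATH or use Git Bash" >&2
fi
```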

The first thing to do with a new installation of Git is to set your user name and email address.  Type the following into Git Bash:

git config --global user.name "YOUR NAME"
git config --global user.email "YOUREMAIL@EXAMPLE.COM"

To check your settings type:

git config --list

This should print out the configuration settings of Git on your system. If the user name or email is incorrect, simply repeat the above commands with the correct information.
You should only have to set up the Git configuration once, although it can be updated or modified at any time.
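Rather than scanning the whole list, you can also ask for individual settings. The sketch below loops over three keys: the two we just set, plus core.autocrlf, which holds the line-ending behaviour chosen during installation (with the default Windows option we would expect it to be true, although your setup may differ). Without --global, “git config --get” searches every configuration level (system, global and repository), which matters here because the installer may record its choices at the system level rather than in your personal settings.

```shell
# Sketch: read back individual settings one key at a time.
# git config --get exits with a non-zero status when a key is not set,
# so each lookup falls back to a short message instead.
for key in user.name user.email core.autocrlf; do
  value=$(git config --get "$key") || value="(not set)"
  echo "$key = $value"
done
```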

Getting Started: Setting up a Git Project and the First Commit

There are two main approaches to starting a Git project. The first is to take an existing project and import it into Git. This is the approach that we will be following here. The second method is to clone an existing Git repository from another server. We will look at this method later when we talk about GitHub.

There are also a number of different tools you can use to work with a Git repository. The most powerful is the Git command line interface. Although it may be challenging at first, it is the only one that gives you access to the full range of Git commands.  As with getting started in R, there is a bit of a learning curve, but it is well worth the effort.  Later we will cover integrating Git with RStudio and GitHub for Windows, both of which provide a simple GUI for the basic Git commands. Using a GUI may be easier and quicker, but it is important to understand what Git is doing with each command, and the command line interface is a great place to learn this.

Example R Script

Let’s start by creating some R code to add to a new Git repository.  You can either follow the example below or follow the steps with your existing R project.

A basic result related to the central limit theorem is that as the sample size n increases, the standard error of the mean decreases.  I’ve written a basic R script that uses a few loops and a plot to examine this:

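Mean <- 100                                  # population mean
SD <- 30                                     # population standard deviation
NVec <- seq(from=1, to=100, length.out=100)  # sample sizes 1, 2, ..., 100
saveMean <- numeric(length(NVec))            # storage for the sample means
NSim <- 100                                  # number of simulated curves
plot(c(0,100), c(0, Mean*2), type="n", xlab="Sample size", ylab="Mean")
for (Sim in 1:NSim) {
  for (X in 1:length(NVec)) {
    N <- NVec[X]
    Rands <- rnorm(N, Mean, SD)   # draw a fresh sample of size N
    saveMean[X] <- mean(Rands)
  }
  lines(NVec, saveMean)           # one curve per simulation
}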
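For reference, the quantity this plot illustrates is the standard error of the mean. For a population standard deviation σ and sample size n it is σ/√n, so with the script’s SD of 30 we can work out roughly how quickly the curves should narrow:

```latex
\mathrm{SE}(\bar{x}) = \frac{\sigma}{\sqrt{n}}, \qquad \sigma = 30:
\quad \mathrm{SE}_{n=1} = 30, \quad
\mathrm{SE}_{n=25} = \frac{30}{5} = 6, \quad
\mathrm{SE}_{n=100} = \frac{30}{10} = 3
```

So by n = 100 most of the simulated means should lie within about two standard errors (±6) of the true mean of 100, compared with roughly ±60 for a single observation.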

I’ve saved this file as “ExampleRCode.R” in a new directory “~/Documents/ExampleGitProject”.

We are quite proud of this little piece of R code, but we realise that there may be ways to improve it, and it will likely change in the future.  Let’s create a Git repository and take a ‘snapshot’ of the R code so that this version is securely stored, and we can go ahead and experiment without fear of losing or overwriting all of our hard work.

Initialise a repository, track and commit

In the Git Bash window, type “cd ~/Documents/ExampleGitProject” (without the quotation marks) to navigate to the directory where the R code is stored.  If you are using different R code or an existing R project, replace the path in the “cd” (change directory) command with the appropriate directory.

To initialise a Git repository in this directory we simply need to type:

git init

You may notice that a hidden directory “.git” has now been created inside your working directory. This is where all the Git repository information is stored, and it is hidden because you should not modify its contents directly.
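If you want to try “git init” without touching your real project, here is a minimal sketch that runs it in a throwaway temporary directory (standing in for ~/Documents/ExampleGitProject) and then checks that the hidden .git directory exists.

```shell
# Sketch: initialise a repository in a throwaway directory.
# Assumption: the mktemp directory stands in for ~/Documents/ExampleGitProject.
project=$(mktemp -d)
cd "$project"
git init
# ls -a also lists hidden entries, so .git shows up here
ls -a
# Prints "true" because we are now inside a Git working tree
git rev-parse --is-inside-work-tree
```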

At the moment the Git repository is empty: we have created the skeleton of the repository but have not added any content. Let’s take a snapshot of our R script:

git add ExampleRCode.R
git commit -m "initial project version"

In the first line we told Git that we want to track our R code, and staged the file for the next commit. You may need to replace the file name with the relevant details for your code. Also bear in mind that Git is case-sensitive, so it is safest to match the capitalisation of your file names exactly. If you have multiple files, you could add all R script files in the working directory by typing:

git add *.R

In the second line of the above code we are ‘committing’ the staged file to the repository. You can think of this as taking a ‘snapshot’ of the file. The text after the -m flag (inside the quotation marks) is a short commit message that describes what is being committed. These messages are very useful when looking back at the different changes you have made to your code.
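To see the whole add-and-commit sequence in one place, here is a self-contained sketch. It assumes a throwaway temporary directory, a placeholder R file, and a hypothetical identity set for this repository only (no --global), so it will not change your real settings. The last two commands show what Git is now tracking and the commit we just made.

```shell
# Sketch: stage and commit a file in a throwaway repository.
# Assumptions: temporary directory instead of ExampleGitProject, placeholder
# file contents, and a hypothetical identity for this repository only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"   # hypothetical identity
printf 'Mean <- 100\nSD <- 30\n' > ExampleRCode.R
git add ExampleRCode.R
git commit -q -m "initial project version"
# git ls-files lists every file Git is tracking
git ls-files
# git log --oneline shows one line per commit: short hash and message
git log --oneline
```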

Well that was easy! You have now created your first Git repository with tracked files and an initial commit.

Modifying your files

Great, so we have a piece of R code that examines the effect of sample size on the standard error of the mean.  We’ve also added this code to our first Git repository.  But, although we are happy that our R code is working, we have a feeling that it could be improved. The use of for loops, especially nested for loops, is often discouraged in R: they can be slow, and R is better than that!  So we want to try to modify our code to improve its efficiency.  We have a few options:

  1. Comment out the existing code and write new commands below the commented code.  This will work fine for a simple fix, or to quickly try something, but what if we wish to modify the code again in the future? It won’t be long before we end up with a long, messy script full of commented-out sections, and without detailed comments describing what each one was trying, it will be hard to keep track of them.
  2. Save a copy of our existing R script with a new name, for example “ExampleRCode_Old.R”.  If we are very clever we might even time-stamp the file name, e.g. “ExampleRCode_220415.R”.  Again, this will probably work fine at first, and with trivial examples like the one we are using here. But it won’t be long before we have a long list of backup files and very little idea what is different about them.
  3. The third option is to completely overwrite our existing code with our new ideas.  This might feel a bit strange and unsafe at first. Totally overwrite our existing code with new experimental stuff and save the same file? Sounds a bit risky! While we are confident that we can improve the code, we don’t want to lose all our previous work in case we can’t find a better method.  No need to worry: a ‘snapshot’ of the R code has been saved in our Git repository, and we can recover it at any stage.

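Let’s go ahead and replace our R script with the following code (this code is based on an example found online). Note that it is not an exact translation of the loop version: each simulated curve is now a running mean of a single sample of 100 values, rather than a fresh sample drawn for every sample size, but it shows the same narrowing of the sample means as the sample size grows:

NSim <- 100    # number of simulations (and the largest sample size)
Mean <- 100
SD <- 30
# Each row holds NSim random draws for one simulation
Matrix <- matrix(rnorm(NSim^2, Mean, SD), nrow=NSim)
# Running mean: the mean of the first 1, 2, ..., n values of x
GetAvg <- function(x) cumsum(x)/seq_along(x)
# apply() returns one column of running means per simulation
Means <- apply(Matrix, 1, GetAvg)
plot(c(0,100), c(0, Mean*2), type="n", xlab="Sample size", ylab="Mean")
# invisible() stops sapply() printing a list of NULLs to the console
invisible(sapply(1:NSim, function(X) lines(1:NSim, Means[,X])))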

Now let’s save the file again.

If we check the status of our repository:

git status

we will see that Git has noticed that one of our tracked files (we only have one tracked file at this stage) has changed.  Git also tells us that this file has not yet been ‘staged’, which means it has not been added to the list of changes to be saved in the next commit.  Let’s stage our modified file and check the status again:

git add ExampleRCode.R
git status

Git is now telling us that the modified “ExampleRCode.R” has been staged and is ready to be committed. Let’s go ahead and commit the changed file to our Git repository:

git commit -m "Replaced code with a vectorised version to improve efficiency"

The text in quotation marks after the -m flag is the commit message, which describes the major changes that we have made to the files in this commit. The Git Bash terminal should now display a short summary saying how many files were changed (1) and how many lines were added and removed (in our case, 6 insertions and 15 deletions; your numbers will depend on the exact contents of your file).
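The sketch below repeats the modify, check, stage and commit cycle in a throwaway repository (same assumptions as before: temporary directory, placeholder file contents, hypothetical repository-local identity). It also uses “git status --short”, a compact form of the status output, and “git diff”, which shows exactly which lines have changed before you stage them; “git diff --staged” does the same for changes that are already staged.

```shell
# Sketch: modify, inspect, stage and commit in a throwaway repository.
# Assumptions: temporary directory, placeholder contents, hypothetical identity.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
printf 'Mean <- 100\n' > ExampleRCode.R
git add ExampleRCode.R
git commit -q -m "initial project version"

# Overwrite the file (a stand-in for pasting in the vectorised code)
printf 'Mean <- 100\nSD <- 30\n' > ExampleRCode.R
git status --short   # " M ExampleRCode.R": modified but not yet staged
git diff             # line-by-line changes that are not yet staged
git add ExampleRCode.R
git status --short   # "M  ExampleRCode.R": staged for the next commit
git commit -q -m "Replaced code with a vectorised version to improve efficiency"
git status --short   # prints nothing: the working tree is clean
```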

We have now completed our second commit to our Git repository.  You can think of this as another snapshot of our project, which we can recover at any time.

Checking the Commit History

The example that we have been using here is obviously a little contrived and simplistic.  But I hope the power of Git is becoming clear, especially once a project has built up a large number of tracked files and many commits.

You can use the Git Bash terminal to check the commit history of your Git repository:

git log

The output in the terminal tells us that there have been two commits to this repository, and the commit messages give us an idea of what is different between the versions.  With only two commits the history may be a bit boring, but as you can imagine, it becomes very valuable once you have an extensive commit history.
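“git log” has many options for condensing or expanding this history. The sketch below builds a throwaway repository with two commits (same assumptions as before) and shows three commonly used forms; “commit -a” stages all modified tracked files before committing. When you run “git log” in a terminal, long output opens in a pager; press q to return to the prompt.

```shell
# Sketch: compact and detailed views of the commit history.
# Assumptions: temporary directory, placeholder contents, hypothetical identity.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
printf 'x <- 1\n' > ExampleRCode.R
git add ExampleRCode.R && git commit -q -m "initial project version"
printf 'x <- 2\n' > ExampleRCode.R
git commit -q -a -m "Replaced code with a vectorised version to improve efficiency"

git log --oneline   # one line per commit: short hash and message
git log --stat      # also lists the files changed in each commit
git log -p -1       # the full changes (patch) of the latest commit only
```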

Where to from here?

In this very brief tutorial we have covered creating our first Git repository, tracking and committing our first file, and committing our modified file to the repository in a second commit. The Git command line is very powerful, and we have only covered a few very simple commands here. If we wish, we can go back to a previous snapshot of our project (e.g. restore our original R script), either permanently (if we decide that our new way of doing things was bad) or temporarily (to experiment a little more with our earlier code).  Git’s branching features are where its real power becomes even more apparent.
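As a small taste of going back to an earlier snapshot, the sketch below uses two standard commands in a throwaway repository (same assumptions as before): “git show”, which prints a file as it was in an earlier commit without changing anything, and “git revert”, which undoes a commit by adding a new commit, so the history (including the version being undone) is kept. This is only one of several ways to go back; Pro Git covers the alternatives, such as checking out old commits and using branches.

```shell
# Sketch: view and then restore an earlier version of a file.
# Assumptions: temporary directory, placeholder contents, hypothetical identity.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
printf 'loop version\n' > ExampleRCode.R
git add ExampleRCode.R && git commit -q -m "initial project version"
printf 'vectorised version\n' > ExampleRCode.R
git commit -q -a -m "Replaced code with a vectorised version"

# View only: print the file as it was one commit ago; nothing is changed
git show HEAD~1:ExampleRCode.R

# Undo the latest commit by recording a new commit that reverses it.
# The vectorised version stays in the history and can be recovered later.
git revert --no-edit HEAD
cat ExampleRCode.R   # the file contains "loop version" again
```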

We also haven’t yet covered cloning remote repositories (copying a Git repository from a remote server, such as GitHub, to our local machine).

If you haven’t already done so, now is probably a good time to read through the Pro Git book.  You may notice that much of this tutorial was based on it, but the book goes into much more detail about Git’s many powerful capabilities, with fully worked examples of how to enter the commands into Git Bash.  It may be a bit of a steep learning curve and take a while to get used to, but it will be worth it!

Once you have gained a good understanding of the Git fundamental concepts and commands by using the terminal, using Git together with RStudio or GitHub for Windows will be a breeze.

In the next post we will talk about how to use RStudio together with Git to easily track your R scripts and make sure you never lose anything again!

Posted in Useful Software
