The last post covered details on how to install Git.
In this post we will introduce GitHub and show how to use RStudio together with Git and GitHub.
What is GitHub?
In the last few posts we have talked about the many benefits of using version control software, and covered the basics of installing and using Git. By using Git and frequently ‘committing’ your changes to your Git repository (together with descriptive commit messages, of course!), you build up a series of snapshots tracking the development of your code. Reverting to a past version of your code is a simple matter of a few Git commands, and all your hard work is safe forever from accidental loss or overwrite.
Or is it?
What if your hard drive crashes, or your laptop is stolen, or you accidentally spill a hot cup of coffee over your machine? (That last one happened to me!) It is always a bad idea to store all your important data in only one place. Enter GitHub.
In short, GitHub is a web-based service that hosts your Git repositories. By ‘pushing’ your local Git repository to your GitHub account you are effectively creating a copy of your entire repository, including its history, on the GitHub servers. Your code is now backed up off your machine, and your hard work is safe from fire, theft or some other office catastrophe.
An additional key feature of GitHub is the ability to share code and collaborate on projects. If you have a public repository (or give others access to your private repo), other users can ‘fork’ your repo, which creates a copy under their own GitHub account that they can then clone to their own local machine. Collaborators can make edits or fix bugs in the code, and you can merge these suggested improvements with a few simple clicks. This is a great way to collaborate on projects with others, and a fantastic way to share your code.
At the time of writing, GitHub accounts are free to set up, and as long as you don’t mind having your code publicly available, the online repositories are also free. Private repositories are available for paid accounts. I believe that GitHub may also offer a small number of free private repositories for a limited time for students with a university email address.
Using GitHub with RStudio
Thanks to the fantastic RStudio developers, it is very easy to integrate RStudio with Git on your local machine and remote repositories stored on GitHub.
Set up RStudio
The first thing to do is set up RStudio so that it can find and access the Git software on your machine (Git should already be installed; see previous posts for instructions on how to do this). You should only need to do this once. The instructions here are for Windows and RStudio Version 0.98.1091. The procedure may vary on other operating systems and other versions of RStudio, but I expect that most of the steps will be the same.
Open up RStudio and click Tools then Global Options. Click on the bottom icon for the version control software, Git/SVN. Either enter the path to the Git executable (git.exe) directly or browse to find the file on your machine. Note that it is important to link to the git.exe inside the bin sub-folder. On my machine there is another git.exe located outside this directory. Do not link to this file, unless you would like to see some rather interesting behaviour from your computer (not recommended unless you have some time to kill).
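If you are not sure where Git lives on your machine, you can ask the command line. The lines below are a minimal sketch, assuming a Unix-style shell such as the Git Bash terminal that ships with Git for Windows (in the plain Windows Command Prompt, `where git` plays a similar role). The variable name `git_path` is just for illustration.

```shell
# Print the full path of the git executable that is first on the PATH
git_path="$(command -v git)"
echo "git executable: $git_path"

# Confirm that the executable runs, and report its version
git --version
```

The path that is printed is a reasonable starting point for the RStudio dialogue box, but check it against the advice above about the bin sub-folder before using it.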
Forking a remote repo
First we need to determine which repository we would like to associate our project with. Head over to GitHub and sign into your account (or create one first if you don’t have one yet).
In this example we will be forking a repo from the MURUG GitHub account. Once you have signed into GitHub, head over to the MURUG page. Here you should see a list of repositories. New repos will be added here over time (usually after the monthly workshops) so it may be worth revisiting this page from time to time. You can also click Follow to be alerted to new updates to the MURUG GitHub account.
Here we will be forking (remember what that means?) the GGPlotMay2015 repo. This contains the code covering the graphics package ggplot, which we will be working through at this month’s MURUG meeting. Big thanks to Joe Fontaine for providing this tutorial code.
Click the GGPlotMay2015 repo and then click Fork in the upper right side of the screen. You should now be taken to your own GitHub account, and see that the GGPlotMay2015 repo has been copied into your account. Great stuff. You now have a complete copy of the entire repo in your own possession to do with as you wish. The next step is to get the code onto your local machine so that you can run and edit it.
There are a couple of different ways to do this. Here we will be using RStudio to access your GitHub account.
Link Project to GitHub Repo
The next step is to link your RStudio project with your Git repo. You can do this by either: 1) starting a brand new project, 2) associating your project with existing work, or 3) linking your project to an existing version control repository.
Here we will follow the third option and associate our RStudio project with the existing repo that we just forked from MURUG.
Click File and New Project. You should be presented with the three options described above. We want the third option: checkout a project from a version control repository. Next you will be presented with an option to choose which version control software you are using. We want to clone from a Git repository.
In the next window you will be asked to enter the details for your remote repository. Click into the GGPlotMay2015 repo in your GitHub account. On the lower right side of the screen you should see a small box showing the URL to the repo. Copy and paste this into the RStudio dialogue box. I have posted a picture of what it looks like on my machine. Note that the repository URL will be different for you (i.e., linked to your GitHub account). The project directory name will be automatically populated with the repo name (you can change it if you like). The final option is to set the location of this repo on your local machine.
After clicking Create Project RStudio should download the repository from your GitHub account and create a new project in the location that you specified. The files are now on your machine and you can begin your work.
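For the curious, what RStudio does here is roughly a `git clone`. The sketch below illustrates this from the command line under some loud assumptions: so that it runs offline, a small local repository named `fork` stands in for your fork on GitHub, and the `README.txt` file, user name and email address are placeholders invented for this example.

```shell
set -eu
sandbox="$(mktemp -d)"   # throwaway directory for this example
cd "$sandbox"

# Hypothetical stand-in for your fork on GitHub, holding one placeholder file
git init --quiet fork
cd fork
git config user.name "Example User"
git config user.email "user@example.com"
echo "placeholder file" > README.txt
git add README.txt
git commit --quiet -m "Add placeholder file"
cd "$sandbox"

# Rough equivalent of New Project > Version Control > Git in RStudio:
# download the repository into a new directory named GGPlotMay2015
git clone --quiet "$sandbox/fork" GGPlotMay2015
cd GGPlotMay2015

git remote -v   # 'origin' records where the clone came from
ls              # the files from the repo are now on your machine
```

With a real fork you would replace the local path with the URL copied from your GitHub page.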
You can now begin working with the existing code, as well as add new files to the RStudio project. Let’s make a few changes to the code and commit these changes to our local Git repo.
Open the ResearchMethods_Rscript in RStudio. You will notice that there is a CSV data file in your working directory (this was downloaded from your GitHub repo). Line 18 of the R script imports this data into R. However, you need to change the path to the CSV file to match its location on your local machine.
Change line 18 of the code to suit your own computer. On my machine this looks like this:
MyData <- read.csv("C:/Users/Adrian/Google Drive/MurdochR/Git/GGPlotMay2015/ResMethods_Data_15Apr2015.csv")
Replace the text inside the quotation marks and before ResMethods_Data_15Apr2015.csv with the appropriate location.
Save the R script. You’ve now made a change to the code, and can commit this change to your repo. This edit is obviously rather minor, and you wouldn’t usually commit (take a snapshot of) your code after every minor edit.
Click the Git tab in the upper right panel of your RStudio window. You will see that there are a couple of files in the top left panel of the Git window. Here you can review the changes and decide which changes to commit to your repo. If you click the R script file, you will see the code posted in the panel below, with red highlighting indicating the old code, and green highlighting the new code that has replaced it. If you are happy with these edits, click the checkbox next to the R script file.
The file is now staged and will be included in the next commit. You can also add a commit message in the panel to the right. This is useful to describe the changes that have occurred in the file. Enter something meaningful into the commit message box (e.g., “updated location of data file”) and click Commit. The change has now been committed to your local Git repo.
You can continue to commit changes to your R project as you continue your work. You can also revert to previous versions of your code very easily. We won’t be exploring these details in this post, but it is worth spending some time exploring the options of Git both inside RStudio and using the Git command line.
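The review, stage and commit steps in the Git tab map onto a handful of Git commands. Here is a minimal sketch of that mapping, assuming a throwaway repository; the file name `analysis.R`, the paths inside it, and the user details are hypothetical and only there to make the example runnable.

```shell
set -eu
repo="$(mktemp -d)"   # throwaway repository for this example
cd "$repo"
git init --quiet
git config user.name "Example User"
git config user.email "user@example.com"

# A hypothetical script with an initial commit, so there is something to change
echo 'MyData <- read.csv("old/location/data.csv")' > analysis.R
git add analysis.R
git commit --quiet -m "initial version"

# Edit the file, as we did with line 18 of the R script
echo 'MyData <- read.csv("new/location/data.csv")' > analysis.R

git status --short       # lists changed files, like the top panel of the Git tab
git --no-pager diff      # old lines are prefixed with '-', new lines with '+'
git add analysis.R       # stage the file (ticking the checkbox)
git commit --quiet -m "updated location of data file"   # the Commit button

git --no-pager log --oneline   # one line per snapshot, newest first
```

Nothing here has left your machine yet; the commits live only in the local repository.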
Push to GitHub
The final step is to share our new code with the world! Fortunately this is very easy to do in RStudio.
Inside the Git window (click the Git tab in the upper right panel of your RStudio window if you’ve closed it already), simply click Push in the top right side of the window. You will be prompted to enter the username and password for your GitHub account. After this, head over to your GitHub account in your web browser. You should notice that the updated R script file has been added to your GitHub repo, along with the commit message describing the changes.
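Behind the Push button is a `git push`. The sketch below shows the idea without touching the network, under stated assumptions: a local bare repository called `remote.git` stands in for your GitHub repo, and `notes.txt` and the user details are placeholders invented for this example.

```shell
set -eu
sandbox="$(mktemp -d)"   # throwaway directory for this example
cd "$sandbox"

# Hypothetical stand-in for the repo on GitHub: a bare repository
git init --quiet --bare remote.git

# A working copy whose 'origin' is that stand-in
# (Git warns that the cloned repository is empty; that is expected here)
git clone --quiet remote.git work
cd work
git config user.name "Example User"
git config user.email "user@example.com"

# Make and commit a change locally
echo "placeholder content" > notes.txt
git add notes.txt
git commit --quiet -m "updated location of data file"

# Rough equivalent of the Push button: send the current branch to origin
git push --quiet origin HEAD

# The commit and its message are now stored in the remote as well
git -C "$sandbox/remote.git" --no-pager log --all --oneline
```

Against a real GitHub repo the push is the point where you are asked to authenticate.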
If you make some significant changes to the code that you think should be incorporated into the original source file (the MURUG repo) you can also create a pull request that alerts the owners of the original repo that you have some suggested improvements for their code. We will leave this for another time.