How Git Works

Paolo Perrotta can take you to Git mastery

Welcome to this review of the Pluralsight course “How Git Works” by Paolo Perrotta.

Git Is Not What You Think

Introduction

Paolo says that although this is an advanced Git course, it should also be accessible to most Git beginners.

If you are completely new to Git you might want to see Xavier Morera’s Using Git with a GUI course first.

We are going to learn about Git internals here.

Some of the most common Git commands are:

  • git add
  • git commit
  • git push
  • git pull
  • git branch
  • git checkout
  • git merge
  • git rebase

Paolo calls these “Porcelain” commands.

Then there are lower level commands such as:

  • git cat-file
  • git hash-object
  • git count-objects

Paolo calls these “Plumbing” commands.

The main point that he makes is:

“If you want to master Git, don’t worry about learning the commands. Instead, learn the model.”

Git Is an Onion

Git is layered. We can’t understand it all at once, so we need to learn it layer by layer.

Before we understand what a Distributed Revision Control System is, we can simplify by focusing on what a Revision Control System is.

However, this still involves a lot of different things. We can peel off one more layer, and what we have is a “Stupid Content Tracker”. This is actually Git’s own definition of itself in its documentation!

If we peel off one more layer we are left with a Persistent Map. It maps keys to values and stores them on your disk drive.

Meet SHA1

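Paolo asks: what are the keys, and what are the values?

The values are any sequence of bytes. The keys are the SHA1 hashes of those values. Note that SHA1 is a hashing algorithm, not encryption: the original bytes cannot be recovered from the hash alone.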

Paolo shows us the SHA1 hash of “Apple Pie” as an example.

You can see this value for yourself using this command on a Mac:

echo "Apple Pie" | git hash-object --stdin

Paolo says the commands are different on Windows, but what matters is the principles.

We see that the resulting hash is completely different when even a single character changes.

Paolo explores how extremely unlikely it is for two separate objects to ever collide with each other.
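To make the hashing step concrete, here is a minimal Python sketch of how Git derives the key for a piece of content. It is not shown in the course; the function name git_blob_hash is my own. It relies on the fact that Git hashes a short header ("blob", the size in bytes, and a NUL byte) followed by the content itself.

```python
import hashlib

def git_blob_hash(content: bytes) -> str:
    """Return the SHA1 that Git assigns to a blob with this content."""
    # Git hashes a small header ("blob <size>" plus a NUL byte) followed by the bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

# echo appends a newline, so the hashed content is "Apple Pie\n" (10 bytes).
apple_pie = git_blob_hash(b"Apple Pie\n")
# Changing a single character yields an unrelated-looking hash.
apple_pia = git_blob_hash(b"Apple Pia\n")
print(apple_pie)
print(apple_pia)
```

The first value printed should be the same 23991897... hash that is passed to cat-file later in this review.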

Storing Things

We see the use of the high level command:

git init

A .git directory is created in the current directory and this is where all the Git objects are stored.

After we write the content to the object database (git hash-object does this when given the -w flag), we look inside the .git directory, in the objects subdirectory, and see another subdirectory called “23”. These are the first two hexadecimal digits of the SHA1 hash of the content that we just saved.

The remaining 38 digits form the name of a file inside this “23” folder.
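The directory layout can be reproduced with a short Python sketch. This is an illustration rather than Git's actual code: write_loose_object is a hypothetical helper, and a temporary directory stands in for a real .git directory. It assumes the usual loose-object format, where the header plus content is zlib-compressed before being written to disk.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path

def write_loose_object(git_dir: Path, content: bytes) -> Path:
    """Store content as a blob the way Git lays out loose objects."""
    store = b"blob " + str(len(content)).encode("ascii") + b"\0" + content
    sha = hashlib.sha1(store).hexdigest()
    # The first two hex digits name the directory, the other 38 name the file.
    path = git_dir / "objects" / sha[:2] / sha[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(zlib.compress(store))
    return path

with tempfile.TemporaryDirectory() as tmp:
    # A throwaway directory standing in for a real .git directory.
    obj_path = write_loose_object(Path(tmp), b"Apple Pie\n")
    relative = obj_path.relative_to(tmp).as_posix()
    stored_bytes = zlib.decompress(obj_path.read_bytes())

print(relative)
```

Because the file is compressed, opening it in a text editor shows unreadable bytes, which is why a command such as cat-file is needed to inspect it.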

Paolo also demonstrates the cat-file command to request the content type:

git cat-file -t 23991897e13e47ed0adb91a0082c31c82fe0cbe5

We can also use -p to retrieve the original value of “Apple Pie”.

What we have seen is how a persistent map works.
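As a rough mental model, the whole persistent map fits in a few lines of Python. The ObjectMap class below is a toy of my own, kept in memory rather than on disk, with method names chosen to mirror what cat-file -t and cat-file -p report for a blob.

```python
import hashlib
import zlib

class ObjectMap:
    """A toy in-memory stand-in for Git's persistent map (not Git's real code)."""

    def __init__(self):
        self._data = {}  # key: SHA1 hex string, value: compressed bytes

    def put(self, obj_type: str, content: bytes) -> str:
        store = f"{obj_type} {len(content)}".encode("ascii") + b"\0" + content
        key = hashlib.sha1(store).hexdigest()
        self._data[key] = zlib.compress(store)
        return key

    def _load(self, key: str):
        # The header ends at the first NUL byte; everything after it is content.
        header, _, content = zlib.decompress(self._data[key]).partition(b"\0")
        obj_type, _size = header.decode("ascii").split(" ")
        return obj_type, content

    def type_of(self, key: str) -> str:
        # Roughly what cat-file -t reports.
        return self._load(key)[0]

    def content_of(self, key: str) -> bytes:
        # Roughly what cat-file -p prints for a blob.
        return self._load(key)[1]

objects = ObjectMap()
key = objects.put("blob", b"Apple Pie\n")
print(objects.type_of(key))
print(objects.content_of(key).decode())
```

The key is derived from the value, so storing the same bytes twice gives the same key and only one entry.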

First Commit!

Now that we understand the persistent map, we can build on this to learn the next layer: the Stupid Content Tracker.

In this lesson, Paolo uses a simple example: a cookbook with menus and recipes. And of course our first recipe will be the apple pie.

Once we have a Git project, we can check the state of its files with the git status command. We add a file to the staging area with git add.

Once our files are staged we can commit:

git commit -m "First commit!"

We can look at the commits with git log.

Now all of this should be familiar to most Git users. But what has happened inside the .git directory?

We find the folder named after the first two digits of the commit’s SHA1. Using cat-file we can view the contents of the commit.

The first line is “tree” followed by a SHA1 hash. Paolo explains this and uses cat-file on the SHA1 to view its contents.

We find the same SHA1 that we saw for the Apple Pie earlier.

Paolo illustrates the Object Database for us.
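The chain from commit to tree to blob can be sketched in Python with a plain dictionary as the object database. This is a simplified model, not the course's material: the file name menu.txt, the author identity and the timestamp are placeholders I made up, and the tree holds just one entry.

```python
import hashlib

store = {}  # toy object database: SHA1 hex -> (type, content bytes)

def put(obj_type: str, content: bytes) -> str:
    raw = f"{obj_type} {len(content)}".encode("ascii") + b"\0" + content
    sha = hashlib.sha1(raw).hexdigest()
    store[sha] = (obj_type, content)
    return sha

def tree_entry(mode: str, name: str, sha: str) -> bytes:
    # A tree entry is "<mode> <name>", a NUL byte, then the 20 raw hash bytes.
    return f"{mode} {name}".encode() + b"\0" + bytes.fromhex(sha)

# Hypothetical file name; the review does not list the cookbook's files.
blob_sha = put("blob", b"Apple Pie\n")
tree_sha = put("tree", tree_entry("100644", "menu.txt", blob_sha))

# Author details and timestamp are placeholders.
who = "A Cook <cook@example.com> 0 +0000"
commit_text = (
    f"tree {tree_sha}\n"
    f"author {who}\n"
    f"committer {who}\n"
    "\n"
    "First commit!\n"
)
commit_sha = put("commit", commit_text.encode())

# Follow the pointers: commit -> tree -> blob.
first_line = store[commit_sha][1].decode().splitlines()[0]
pointed_tree = first_line.split(" ")[1]
tree_content = store[pointed_tree][1]
pointed_blob = tree_content[-20:].hex()  # single entry, so the hash is the last 20 bytes
print(first_line)
print(store[pointed_blob])
```

Following the pointers ends at the same Apple Pie blob as before, which is the point of the lesson: a commit is just another object that refers to other objects by hash.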

Versioning Made Easy

Paolo says now that we understand the object model, the versioning is simple to understand as well.

We add Cheesecake to our menu, stage the updated file and commit:

git commit -m "Add cake"

We can now use git log to see both commits.

If we use cat-file to see the contents of the second commit, we see that it shows the parent:

parent 11779f423b047faefe55abb3fb2911d556fba195

This is the SHA1 of our first commit. All commits, except for the first one, have a parent.

We also see that each commit has its own tree.

Our Git Object model now has 8 objects in total, and we can verify this with the following command:

git count-objects
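One way to see where a total of 8 could come from is to model both commits in Python. The layout below is an assumption on my part (a menu.txt file plus a recipes folder holding README.txt and apple_pie.txt, with made-up README text and a placeholder author), so treat it as a sketch of the idea rather than the exact repository from the course. The idea itself is that unchanged content keeps its hash and is stored only once.

```python
import hashlib

store = {}  # toy object database: SHA1 hex -> raw object bytes

def put(obj_type: str, content: bytes) -> str:
    raw = f"{obj_type} {len(content)}".encode("ascii") + b"\0" + content
    sha = hashlib.sha1(raw).hexdigest()
    store[sha] = raw  # identical content maps to the same key, so it is stored once
    return sha

def tree(entries) -> str:
    # entries: (mode, name, sha) tuples; plain name sorting is enough for this sketch
    body = b"".join(
        f"{mode} {name}".encode() + b"\0" + bytes.fromhex(sha)
        for mode, name, sha in sorted(entries, key=lambda e: e[1])
    )
    return put("tree", body)

def commit(tree_sha: str, message: str, parent: str = "") -> str:
    who = "A Cook <cook@example.com> 0 +0000"  # placeholder identity
    lines = [f"tree {tree_sha}"]
    if parent:
        lines.append(f"parent {parent}")
    lines += [f"author {who}", f"committer {who}", "", message, ""]
    return put("commit", "\n".join(lines).encode())

# First commit (file names and README text are hypothetical).
pie = put("blob", b"Apple Pie\n")  # shared by menu.txt and apple_pie.txt
readme = put("blob", b"Put your recipes here.\n")
recipes = tree([("100644", "README.txt", readme), ("100644", "apple_pie.txt", pie)])
root1 = tree([("100644", "menu.txt", pie), ("40000", "recipes", recipes)])
first = commit(root1, "First commit!")
after_first = len(store)

# Second commit: only menu.txt changes, so the recipes tree is reused as-is.
menu2 = put("blob", b"Apple Pie\nCheesecake\n")
root2 = tree([("100644", "menu.txt", menu2), ("40000", "recipes", recipes)])
second = commit(root2, "Add cake", parent=first)
after_second = len(store)

parent_line = store[second].split(b"\0", 1)[1].decode().splitlines()[1]
print(after_first, after_second)
print(parent_line)
```

Under these assumptions the first commit creates 5 objects (2 blobs, 2 trees, 1 commit) and the second adds only 3 (a new menu blob, a new root tree and the commit itself), giving 8.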

Storing a complete copy of an object when the changes are very small is not efficient, so Git will sometimes store only the differences to save disk space.

But Paolo says those optimization details are not important and we should think in terms of the Git Object Model.

One More Thing: Annotated Tags

A tag is like a label for the current state of the project. There are two types:

  • regular (also called lightweight)
  • annotated

We can create an annotated tag like this:

git tag -a mytag -m "I love cheesecake"

Paolo shows us how to view the contents of this tag including its metadata.
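For reference, the content of an annotated tag object looks roughly like the text built in this Python sketch. The tagger identity and timestamp are placeholders, and the parent hash quoted earlier in this review is reused purely as an example commit hash, so the resulting tag hash will not match any real repository.

```python
import hashlib

def object_id(obj_type: str, content: bytes) -> str:
    raw = f"{obj_type} {len(content)}".encode("ascii") + b"\0" + content
    return hashlib.sha1(raw).hexdigest()

# An example commit hash, borrowed from the parent line quoted in the review.
commit_sha = "11779f423b047faefe55abb3fb2911d556fba195"

# Tagger name, email and timestamp are placeholders.
tag_text = (
    f"object {commit_sha}\n"
    "type commit\n"
    "tag mytag\n"
    "tagger A Cook <cook@example.com> 0 +0000\n"
    "\n"
    "I love cheesecake\n"
)
tag_sha = object_id("tag", tag_text.encode())

# The metadata lines come before the blank line, the message after it.
header, message = tag_text.split("\n\n", 1)
fields = dict(line.split(" ", 1) for line in header.splitlines())
print(fields["object"], fields["type"], fields["tag"])
print(message.strip())
```

So an annotated tag is one more object in the database: it has its own hash, its own metadata and a pointer to the object it labels.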

We learn there are just 4 types of objects in Git:

  • Blobs
  • Trees
  • Commits
  • Annotated Tags

What Git Really Is

Paolo explains that Git is a lot like a file system. This is what we mean when we say Git is a content tracker.

The next module explains branches.

Continue to Part 2: Branches demystified
