What is the Staging Area in Git?

If you’re a software developer, you’re likely using Git on a daily basis. It’s a very useful tool that hardly anyone learns to use properly. I’d say it’s a little like driving a car: we learn for a while, pass an exam, and then do nothing to improve our skills. Sure, with time we gain confidence on the road, maybe even the ability to foresee problems. But how many of you have taken additional training to learn how to handle oversteer or understeer? To learn where the limits of your skills are? How the car actually works? Driving seems extremely straightforward: two or three pedals and a single wheel. And yet there are racing drivers who can handle a car at a level the rest of us never reach. Oh, and then there’s Ken Block, who doesn’t seem to be a mortal being to me.

OK, back to version control. Today I’d like to make you more interested in the tool you are using and show that there’s more to it than the add -> commit -> push scheme.

Tracking files

The first thing you should know about git is that while each file can be either tracked or untracked, you shouldn’t think about your changes as entire files. Let’s create an empty git repository and create an empty file inside:

 $ mkdir git-staging
 $ cd git-staging
 $ git init .
Initialized empty Git repository in /Users/wgonczaronek/Projects/blog/git-staging/.git/
 $ touch README.md

Now we can check the status:

 $ git status
On branch master

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	README.md

nothing added to commit but untracked files present (use "git add" to track)

The file README.md is marked as “untracked” by git. Until we add it, git will not follow any changes we make to it.

 $ vim README.md
 $ cat README.md
# Header

Hello, world.

## Another header.

 $ git status
On branch master

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	README.md

nothing added to commit but untracked files present (use "git add" to track)
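The same observation can be reproduced in a script. This is only a minimal sketch: the mktemp sandbox and the file contents are my own assumptions, chosen so the experiment cannot touch a real repository.

```shell
set -eu

# Throwaway sandbox directory (an assumption made for this example).
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Create a file with some content, but do not add it.
printf '# Header\n\nHello, world.\n' > README.md

# The porcelain status marks untracked paths with "??".
status=$(git status --porcelain)
echo "status: $status"

# Plain git diff compares the working directory with the staging area.
# README.md has no entry there yet, so there is nothing to compare.
diff_output=$(git diff)
echo "diff output length: ${#diff_output}"
```

However much text goes into the file, the status line stays the same and the diff stays empty until the file is added.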

Although we’ve changed the contents of the file, the message has not changed. If we run git diff, the output is empty. We can now run the suggested git add command.

 $ git add README.md
 $ git status
On branch master

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
	new file:   README.md

OK, so git knows that some changes took place and suggests committing them. However, git diff still doesn’t show anything. Why?

Staging area

Before changes are committed, that is, saved to the repository, they are stored in a staging area. To see what is in there, run git diff --cached:

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..8d598fc
--- /dev/null
+++ b/README.md
@@ -0,0 +1,6 @@
+# Header
+
+Hello, world.
+
+## Another header.
+

This place goes by several names: staging area, cache, index. Remember this when reading the docs.
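To see that the different names really point at one place, here is a small sketch. The sandbox directory and the one-line README.md are assumptions of mine; --staged is documented as a synonym of --cached, and git ls-files --stage lists what the index currently holds.

```shell
set -eu

# Throwaway sandbox directory (an assumption made for this example).
repo=$(mktemp -d)
cd "$repo"
git init -q .

printf '# Header\n' > README.md
git add README.md

# Two spellings of the same comparison.
cached=$(git diff --cached)
staged=$(git diff --staged)
if [ "$cached" = "$staged" ]; then
    echo "--cached and --staged print the same diff"
fi

# The index seen directly: mode, blob hash, stage number, path.
index_listing=$(git ls-files --stage)
echo "$index_listing"
```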

What is super important about the staging area is that, unlike a commit, which is immutable, the changes in it can easily be added or removed before committing. Let’s assume we want to change the first header to something more meaningful and delete the last line.

 $ cat README.md
# Git staging area explained

Hello, world.

 $ git status
On branch master

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
	new file:   README.md

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   README.md
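The short status format makes this double state easy to spot: the first letter describes the staging area, the second the working directory. A minimal sketch, assuming a throwaway sandbox and slightly shortened file contents of my own:

```shell
set -eu

# Throwaway sandbox directory (an assumption made for this example).
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Stage one version of the file...
printf '# Header\n\nHello, world.\n' > README.md
git add README.md

# ...then edit it again without staging the new edit.
printf '# Git staging area explained\n\nHello, world.\n' > README.md

# "A" = added in the staging area, "M" = modified in the working directory.
short_status=$(git status --short)
echo "$short_status"

# The same file shows up on both sides.
staged_files=$(git diff --cached --name-only)
unstaged_files=$(git diff --name-only)
echo "staged: $staged_files, not staged: $unstaged_files"
```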

As you can see, the README.md file is in both the “to be committed” and the “not staged for commit” state. Let’s run git diff.

diff --git a/README.md b/README.md
index 8d598fc..2e7e6cc 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,5 @@
-# Header
+# Git staging area explained

 Hello, world.

-## Another header.


Why do we see those changes now? Because git was told to track the file, and plain git diff compares the working directory with the staging area. Now it sees changes that are neither in the staging area nor in any known commit. We can add them to the staging area to create a single, nice commit. But instead of running git add README.md, we’ll patch the staging area with the -p option, which the git add manual describes like this:

       -p, --patch
           Interactively choose hunks of patch between the index and the work
           tree and add them to the index. This gives the user a chance to
           review the difference before adding modified contents to the index.

           This effectively runs add --interactive, but bypasses the initial
           command menu and directly jumps to the patch subcommand. See
           "Interactive mode" for details.

What is a “hunk”? Think of it as a group of changed lines that sit close together, shown with a few lines of surrounding context.
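To get a feeling for where one hunk ends and the next begins, here is a rough sketch. The file numbers.txt and the edits are made up for the example; it relies on git diff showing three lines of context by default, so two edits far apart end up in separate hunks.

```shell
set -eu

# Throwaway sandbox directory (an assumption made for this example).
repo=$(mktemp -d)
cd "$repo"
git init -q .

# A hypothetical file with 20 numbered lines, staged as-is.
seq 1 20 > numbers.txt
git add numbers.txt

# Change only the first and the last line in the working directory.
sed -e '1s/.*/first/' -e '20s/.*/last/' numbers.txt > numbers.tmp
mv numbers.tmp numbers.txt

# Every hunk in the diff starts with a line beginning with "@@".
hunk_count=$(git diff | grep -c '^@@')
echo "hunks: $hunk_count"
```

The two edits are separated by many untouched lines, so git reports them as two hunks, and git add -p would ask about each of them separately.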

 $ git add -p
diff --git a/README.md b/README.md
index 8d598fc..2e7e6cc 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,5 @@
-# Header
+# Git staging area explained

 Hello, world.

-## Another header.

(1/1) Stage this hunk [y,n,q,a,d,s,e,?]?

We have multiple options here. Typing a question mark prints the help:

y - stage this hunk
n - do not stage this hunk
q - quit; do not stage this hunk or any of the remaining ones
a - stage this hunk and all later hunks in the file
d - do not stage this hunk or any of the later hunks in the file
s - split the current hunk into smaller hunks
e - manually edit the current hunk
? - print help

For now I only want to stage the deletion of the second header and leave the change to the first header for a different commit. I’ll use s to split this hunk.

(1/1) Stage this hunk [y,n,q,a,d,s,e,?]? s
Split into 2 hunks.
@@ -1,4 +1,4 @@
-# Header
+# Git staging area explained

 Hello, world.

Now n to skip this hunk and y to stage the other one.

(1/2) Stage this hunk [y,n,q,a,d,j,J,g,/,e,?]? n
@@ -2,5 +2,4 @@

 Hello, world.

-## Another header.

(2/2) Stage this hunk [y,n,q,a,d,K,g,/,e,?]? y

Cool. Now what’s in the staging area? Using git diff --cached:

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..29b5945
--- /dev/null
+++ b/README.md
@@ -0,0 +1,5 @@
+# Header
+
+Hello, world.
+
+

Yes, that’s what we want. Let’s commit this change.

 $ git commit -m "Initial README.md file"
[master (root-commit) 4922bda] Initial README.md file
 1 file changed, 5 insertions(+)
 create mode 100644 README.md
 $ git status
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   README.md

no changes added to commit (use "git add" and/or "git commit -a")

The changes from the staging area have moved into the commit, and git knows that there’s still something left in the working directory. We can check it with git diff:

diff --git a/README.md b/README.md
index 29b5945..2e7e6cc 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-# Header
+# Git staging area explained

 Hello, world.

Exactly what we wanted. Since that’s all we’ve got left, we can take a shortcut and skip the separate git add step by passing a path straight to git commit:

 $ git commit . -m "Changed paragrtaph"
[master 8b87405] Changed paragrtaph
 1 file changed, 1 insertion(+), 1 deletion(-)
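One thing worth knowing about this shortcut: when paths are given, git commit records the working directory contents of those paths only and leaves changes staged for other files alone. A minimal sketch, with a.txt, b.txt and the demo identity being hypothetical names of mine:

```shell
set -eu

# Throwaway sandbox directory (an assumption made for this example).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

# Base commit with two hypothetical files.
printf 'a1\n' > a.txt
printf 'b1\n' > b.txt
git add a.txt b.txt
git commit -q -m "Add a.txt and b.txt"

# Modify both; stage only a.txt.
printf 'a2\n' > a.txt
printf 'b2\n' > b.txt
git add a.txt

# Name only b.txt on the command line.
git commit -q -m "Update b.txt" b.txt

# Files recorded in the new commit.
committed=$(git diff-tree --no-commit-id --name-only -r HEAD)
# Files still waiting in the staging area.
still_staged=$(git diff --cached --name-only)
echo "committed: $committed, still staged: $still_staged"
```

In the walkthrough there is only one file, so git commit . simply takes everything that is left.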

Verify

We can now run git show to see what changes were recorded in each commit. Without arguments it shows HEAD; to pick a specific commit, remember that you can use @ to refer to HEAD and @^ to refer to its parent, the previous commit:

commit 8b874056880c11cf9702850e63362d236edced2f (HEAD -> master)
Author: gonczor <wiktor.gonczaronek@gmail.com>
Date:   Fri Sep 25 20:43:52 2020 +0200

    Changed paragrtaph

diff --git a/README.md b/README.md
index 29b5945..2e7e6cc 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-# Header
+# Git staging area explained

 Hello, world.

And the previous one (git show @^):

commit 4922bda2c57e8ceb02d397ec2faf21e957130bc0
Author: gonczor <wiktor.gonczaronek@gmail.com>
Date:   Fri Sep 25 20:40:41 2020 +0200

    Initial README.md file

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..29b5945
--- /dev/null
+++ b/README.md
@@ -0,0 +1,5 @@
+# Header
+
+Hello, world.
+
+
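If the @ notation feels a bit magical, git rev-parse shows what it resolves to. Again a sketch only: the sandbox, the demo identity and the commit messages are assumptions made for the example.

```shell
set -eu

# Throwaway sandbox directory (an assumption made for this example).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

# Two commits, so that HEAD has a parent.
printf '# Header\n' > README.md
git add README.md
git commit -q -m "Initial README.md file"
printf '# Git staging area explained\n' > README.md
git commit -q -m "Change header" README.md

# "@" is shorthand for HEAD; "@^" is its first parent.
head_hash=$(git rev-parse HEAD)
at_hash=$(git rev-parse @)
parent_hash=$(git rev-parse 'HEAD~1')
at_parent_hash=$(git rev-parse '@^')

latest_subject=$(git log -1 --format=%s @)
previous_subject=$(git log -1 --format=%s '@^')
echo "@  -> $latest_subject"
echo "@^ -> $previous_subject"
```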

Summary

The staging area is an extremely flexible mechanism which makes git a super useful tool. Since we can easily add and remove changes, we can create a clean history. Instead of having a cascade of:

  • Meaningful message
  • Changes
  • Fixes
  • More fixes
  • Typo

or mega commits where we throw in everything at once, we can create commits that are both small and coherent. This is a gift for our future selves, who will be digging through the project history trying to make out what we meant to create just a day before.
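The walkthrough only added changes to the staging area, so here is a small sketch of the opposite direction. It assumes git 2.23 or newer (the version that introduced git restore, which the status output above already suggests) and a repository with at least one commit; notes.txt and the demo identity are hypothetical.

```shell
set -eu

# Throwaway sandbox directory (an assumption made for this example).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

# Base commit with a hypothetical file.
printf 'one\n' > notes.txt
git add notes.txt
git commit -q -m "Add notes.txt"

# Make a change and stage it.
printf 'two\n' >> notes.txt
git add notes.txt
staged_before=$(git diff --cached --name-only)

# Take it back out of the staging area; the working directory is untouched.
git restore --staged notes.txt
staged_after=$(git diff --cached --name-only)
unstaged_after=$(git diff --name-only)

echo "staged before: $staged_before"
echo "staged after: $staged_after"
echo "still modified in working directory: $unstaged_after"
```

The edit itself is not lost: it simply goes back to being an unstaged change that can be staged again later, whole or hunk by hunk.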

Additional resources