Simple Tools | Diff Checker

Reetam Nandi
10 min read · Apr 21, 2021

A tiny CLI command that checks and reports the difference between 2 files

git diff | Image from unix.stackexchange.com

A diff check is massively helpful for answering simple questions such as “Are these files the same, and if not, where do they differ?” (e.g. A.txt vs B.txt). It also helps with a harder variant of the same question: what changed between 2 versions of the same file (e.g. main.py [v1.1.1] vs main.py [v2.5.3])?

To answer the 1st question, there are multiple tools available online; they require you to copy-paste the contents of your files or upload them, and then generate the diff report.
For the 2nd question, any Version Control System (Git, Subversion, etc.) ships with a diff tool that answers it in great colour-coded detail.

But what if you are uncomfortable trusting a website, don’t have a VCS set up, or both? Enter a simple diff check tool that instantly checks for differences and produces an easy-to-understand diff report.

My motivation to write a simple Python script came from the fact that I needed the difference between 2 sensitive files which weren’t in a VCS and it was time-consuming to check manually due to their substantial size.

Before I continue with the usage and finer details of the script, here’s the repo.

Running the Script

There are mainly 2 ways to run the script.
1. Use Python to execute main.py.
2. Use the Executable to run it directly from your Terminal.

1. Using Python

If you have Python installed on your system, clone / download the repo and navigate inside the project folder, where you can bring up your Command Prompt / PowerShell window and do the following:

1. Create and Activate a virtualenv (recommended) then
2. pip install -r requirements.txt followed by
3. python .\main.py to get started

2. Using Executable

If you don’t have Python installed and still want to use it then download the following ZIP. Extract the ZIP, fire up a Command Prompt / Powershell window in the same location as the extracted ZIP and run the executable using .\main.exe to get rolling.

3. Bonus — Build your own Executable

The internet is an amazing but scary place, and since prevention is better than cure, it never hurts to be cautious about random .exe files found online.
So if you’re the cautious kind, go to the executable branch of the repo.
Clone / download it, then bring up your Command Prompt / PowerShell window and do the following:

1. Create and Activate a virtualenv (recommended) then
2. pip install -r requirements.txt followed by
3. python setup.py build to build your .exe file
4. Traverse through the build directory to find the main.exe file and create 3 empty files in that directory named file1.txt, file2.txt and diff.txt (they are the default files that the script looks for)
5. Now execute from Terminal using .\main.exe

When you are done your build directory should look something like this ZIP.

NOTE: To convert .py into .exe, I used cx_Freeze, and the config inside the setup.py file might require changes for operating systems other than Windows. Read the cx_Freeze docs for an in-depth usage guide.

Using the Script

Regardless of the method chosen to run the script you are going to get a screen that looks something like this.

Result after trying to run the script

To fulfil our needs we are going to use the compare command. To see how to use it, run one of the following, depending on how you chose to run the script:
.\main.exe compare --help or python .\main.py compare --help

Result after running compare --help

As we see above, the help menu provides all the necessary info needed to make this work. All we need to do now is pass in the parameters required to get our output.

The ideal usage scenario looks like the following

.\main.exe compare -p <path> -c <path> -o <path>
OR
python .\main.py compare -p <path> -c <path> -o <path>
OR
.\main.exe compare --prev <path> --curr <path> --out <path>
OR
python .\main.py compare --prev <path> --curr <path> --out <path>
OR any combination of the options above

Now replace <path> with the path to your files as you see fit.

A valid file path is either a fully qualified path or a relative path using . or .. . Using ~ to denote a path is not valid, because it is the shell / terminal, not the script, that is responsible for expanding ~ into a fully qualified path.
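If you ever want a script like this to accept ~ as well, Python can do the expansion explicitly. The following is only a sketch: normalize_path is a hypothetical helper of my own naming, and nothing here is taken from the script in the repo.

```python
import os


def normalize_path(raw_path: str) -> str:
    """Expand a leading ~ and return an absolute path.

    Hypothetical helper for illustration; the script itself does not do this.
    """
    expanded = os.path.expanduser(raw_path)  # "~" becomes the user's home directory
    return os.path.abspath(expanded)         # resolves "." and ".." segments


print(normalize_path("~/file1.txt"))
print(normalize_path("./file2.txt"))
```

Both calls print a fully qualified path, which is the form the script expects.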

The script will check if the given paths are valid and throw an error if it can’t find the file like so.

Invalid File Path error

Now, if you’ve failed to pass any of the 3 required arguments, the script starts a prompt asking for the missing input; if the given file doesn’t exist, it keeps prompting until it receives a valid path, like so.

Argument prompt

The file path inside square brackets is the default file that the Script will pick if the input is skipped (Press Enter to skip).

Now if the files are unequal you’ll see something like this.

Unequal Files Output
file1.txt | diff.txt | file2.txt used for comparing Unequal files as shown above

The diff.txt file above shows the difference in a very simple format where lines prefixed by - indicate lines that are present in the Previous version of the file but “removed” from the Current Version of the file and the lines prefixed by + indicate lines that are not present in the Previous version of the file but “added” in the Current Version of the file.
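To make the - / + convention concrete, here is a small sketch that counts added and removed lines in a diff report. The summarize_diff helper and the sample lines are hypothetical and not part of the script; the sketch also assumes a report that covers a single pair of files.

```python
def summarize_diff(diff_lines):
    """Count added and removed lines in a unified diff of one file pair."""
    added = removed = 0
    in_hunk = False
    for line in diff_lines:
        if line.startswith("@@"):
            in_hunk = True  # hunk header: content lines follow
            continue
        if not in_hunk:
            continue  # "---" / "+++" file headers come before the first hunk
        if line.startswith("+"):
            added += 1
        elif line.startswith("-"):
            removed += 1
    return added, removed


sample = [
    "--- file1.txt\n",
    "+++ file2.txt\n",
    "@@ -1,3 +1,3 @@\n",
    " unchanged line\n",
    "-old line\n",
    "+new line\n",
    " another unchanged line\n",
]
print(summarize_diff(sample))  # (1, 1)
```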

This should bring you up to speed on how to use the tool and what output to expect.

You can choose to stop reading here if this is all you needed to know.
The next section is a breakdown of the script.

Technical Details

Python’s Standard Library is a collection of very powerful & useful modules.
The script to produce a diff check is a combination of 2 functions from 2 modules that are part of the Standard Library.

I’ll come around to the CLI part of things after I explain the core function that the script depends on.

Compare function

The compare function takes in the 3 arguments that we passed from the CLI. The first function it needs is cmp() from filecmp.

cmp() takes in 3 arguments: { file A, file B, shallow(default: True) }.

By default, cmp() tries a shallow comparison of the 2 files passed to it: it calls os.stat() on each file and builds a signature from the file type, size and modification time. If the signatures match, it returns True without reading the files. Otherwise (or when shallow is False) it falls back to a deep comparison: files of different sizes are reported as unequal straight away, and files of the same size are read and compared chunk by chunk until the end of file is reached, returning True if every chunk matches and False otherwise.
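A quick way to see cmp() in action is to compare a few throwaway files. This is a minimal sketch: demo_cmp and the temporary file names are made up for illustration, and I pass shallow=False here so that the result depends only on the file contents.

```python
import filecmp
import os
import tempfile


def demo_cmp():
    """Compare identical and differing temp files by content."""
    with tempfile.TemporaryDirectory() as tmp:
        path_a = os.path.join(tmp, "a.txt")
        path_b = os.path.join(tmp, "b.txt")
        path_c = os.path.join(tmp, "c.txt")
        for path, text in [(path_a, "same\n"), (path_b, "same\n"), (path_c, "diff\n")]:
            with open(path, "w") as handle:
                handle.write(text)

        # shallow=False forces a content comparison instead of trusting os.stat()
        same = filecmp.cmp(path_a, path_b, shallow=False)
        different = filecmp.cmp(path_a, path_c, shallow=False)
        return same, different


print(demo_cmp())  # (True, False)
```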

The second function that we use is unified_diff() from difflib.

unified_diff() takes in a total of 8 arguments: { [List of strings from file A],
[List of strings from file B], “name of prev file”, “name of curr file”,
date of prev file, date of curr file, number of context lines, “line terminator”}.

The lists of strings from both files are obtained with file.readlines(), and the names are the same as the input paths, as seen in the code above. I didn’t feel the need to use the rest: dates seemed unnecessary, and the defaults for the number of context lines (default: 3) and the line terminator (default: “\n”) serve their purpose just fine.
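Here is a small, self-contained sketch of unified_diff() on two in-memory lists, standing in for what readlines() would return. The sample lines and the file1.txt / file2.txt labels are placeholders of mine.

```python
import difflib

# Stand-ins for the lists that file.readlines() would return
prev_lines = ["alpha\n", "beta\n", "gamma\n"]
curr_lines = ["alpha\n", "BETA\n", "gamma\n"]

# Dates, context lines (n=3) and line terminator are left at their defaults
diff = list(
    difflib.unified_diff(prev_lines, curr_lines, fromfile="file1.txt", tofile="file2.txt")
)
print("".join(diff), end="")
```

This prints the two file names, a line starting with @@, and then the body, where beta is prefixed with - and BETA with +.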

NOTE: In the generated diff file, every report starts with 3 header lines, as in the example below. The first 2 lines are simply the names that we passed to unified_diff(), and the 3rd line (the hunk header) states which lines of both files the report that follows refers to; here, 5 lines starting at line 1 in each file. The context lines themselves are the unchanged lines printed around each change to set the “context” for the difference.
--- C:\Users\reetam.nandi\Desktop\PyToolset\file1.txt
+++ C:\Users\reetam.nandi\Desktop\PyToolset\file2.txt
@@ -1,5 +1,5 @@
<<diff report here>>

unified_diff() compares the lists of strings it receives and generates the “delta”, or difference, as a report. That’s all there is to it at a surface level; diving any deeper would require going through the library itself and hence is out of scope for this article.
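Putting the two functions together, a compare function along these lines could look roughly like the sketch below. It is my reconstruction under stated assumptions, not a copy of the code in the repo: the parameter names, the True / False return value and the choice of shallow=False are all mine.

```python
import difflib
import filecmp


def compare(prev_path, curr_path, out_path):
    """Write a unified diff of two files to out_path.

    Returns True if the files are identical, False otherwise.
    """
    # Assumption: compare by content rather than relying on os.stat() signatures
    if filecmp.cmp(prev_path, curr_path, shallow=False):
        return True  # nothing to report

    with open(prev_path) as prev_file, open(curr_path) as curr_file:
        prev_lines = prev_file.readlines()
        curr_lines = curr_file.readlines()

    # The file paths double as the names shown in the report header
    diff = difflib.unified_diff(prev_lines, curr_lines, fromfile=prev_path, tofile=curr_path)
    with open(out_path, "w") as out_file:
        out_file.writelines(diff)
    return False
```

Calling compare("file1.txt", "file2.txt", "diff.txt") would then leave the report in diff.txt whenever the two files differ.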

Technical Details — CLI

I came across a wonderful library, Click, which makes building a CLI with Python a breeze. It hides all the complex logic behind simple, understandable functions and gives you all the pieces required to build a CLI of any size, from something as simple as what I built to a deeply nested and complicated CLI like Git.

Before I go on about how I used Click, we need to understand what Decorators do in Python.

Decorators, in an oversimplified sense, are just wrapper functions; with a bit more technical depth, a decorator can be described as a “callable that extends or modifies another callable”. There are multiple articles that go into great depth on decorators, and I would recommend starting there for a solid understanding. Here are two to get started:
1. Easy to read: programiz.com
2. In-Depth: dbader.org
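For a quick taste before reading those articles, here is about the smallest decorator I can think of. The announce and add names are made up purely for illustration.

```python
import functools


def announce(func):
    """Decorator that prints the function name before each call."""

    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        print(f"calling {func.__name__}")
        return func(*args, **kwargs)

    return wrapper


@announce
def add(a, b):
    return a + b


print(add(2, 3))  # prints "calling add", then 5
```

Writing @announce above add is shorthand for add = announce(add), which is the "wrapper function" idea in a nutshell.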

One more thing before we move on, just so that we are all on the same page.
A CLI command looks like the following: command [options] [arguments].
The simplest way to understand it is to see the command as a function. It takes “n” arguments, which are generally the inputs required for the function to execute properly, and it accepts options (optional arguments), which modify how the function executes.

E.g. ls -l /root, ls is a standard command used to list files and directories in Linux, here it takes in /root directory as an argument and it takes -l as an option which directs ls to output in long listing format, so to simplify it means:
Run ls for /root directory and print the [long list] formatted output.

For our scenario we don’t use any arguments but take in 3 required options. The “technically correct” implementation would have been 3 arguments, or 2 arguments and 1 option for the output parameter, since my function needs 3 parameters to execute, of which the 2 file arguments are mandatory and cannot be skipped.

The rationale behind not going for a technically correct implementation is that options in Click provide more features and flexibility than arguments.
The major feature differences being:
1. arguments don’t provide prompts for missing inputs whereas options do.
2. arguments need custom documentation/help text, unlike options where we simply pass in the help text as a parameter.
The first difference alone was enough for me to pick options over arguments
as it provides a fail-safe measure and a better end-user experience.
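The difference is easy to see in a toy example. In this sketch, greet_option and greet_argument are hypothetical commands of mine (not from the script), and Click’s own CliRunner test helper stands in for a user typing at the terminal.

```python
import click
from click.testing import CliRunner


@click.command()
@click.option("--name", prompt="Your name", help="Name to greet.")
def greet_option(name):
    """Greet someone, prompting for the name if --name is missing."""
    click.echo(f"Hello, {name}!")


@click.command()
@click.argument("name")
def greet_argument(name):
    """Greet someone; NAME must be given on the command line."""
    click.echo(f"Hello, {name}!")


runner = CliRunner()

# Option left out: Click prompts, and the runner feeds "Ada" as the typed input
prompted = runner.invoke(greet_option, input="Ada\n")

# Argument left out: there is no prompt, only a usage error
missing = runner.invoke(greet_argument, [])

print(prompted.exit_code, missing.exit_code)
```

The option version recovers from the missing input, while the argument version exits with a usage error.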

CLI Decorators

The decorators in the above code are explained below:

1. click.group() creates an entry point into the CLI and converts the main function into a group called “main” and allows commands and sub-commands to be specified inside it.
2. click.version_option() is used to create the --version option.
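3. click.pass_context marks the decorated function to receive the current Context object as its first argument; the Context can then carry shared state from the group down to its commands and sub-commands, similar to drilling props in React.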
4. main.command() creates a command under the main group.
5. click.option() is used to add options to commands, most of its parameters are rather simple and don’t need further explanation, the ones that I’ll go into detail about are the type, prompt and show_default parameters.

type is used to indicate what sort of input is required from the end-user and if there are any basic sanitization or validation rules to apply before handing over the input to the function it is wrapped over.
I used click.Path() to get a verified & sanitized file path as user input.

prompt, as the name suggests, makes Click ask the user for a value when the option is left out of the command.

show_default shows the fully qualified path to the default file when prompting the user for input.
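To tie the decorators and parameters together, here is a stripped-down sketch of how such a CLI could be wired up. The option names (-p/--prev, -c/--curr, -o/--out) and the default file names follow the article; the version number, prompt texts, help texts, the exists=True check and the body of compare are assumptions of mine, not the code from the repo.

```python
import click


@click.group()
@click.version_option(version="1.0.0")  # assumed version number
@click.pass_context
def main(ctx):
    """Entry point; commands are registered on this group."""
    ctx.ensure_object(dict)  # shared state that commands could read


@main.command()
@click.option("-p", "--prev", type=click.Path(exists=True, dir_okay=False),
              default="file1.txt", prompt="Previous file", show_default=True,
              help="Path to the previous version of the file.")
@click.option("-c", "--curr", type=click.Path(exists=True, dir_okay=False),
              default="file2.txt", prompt="Current file", show_default=True,
              help="Path to the current version of the file.")
@click.option("-o", "--out", type=click.Path(dir_okay=False),
              default="diff.txt", prompt="Output file", show_default=True,
              help="Path where the diff report is written.")
def compare(prev, curr, out):
    """Compare two files and write a diff report."""
    # Placeholder body: the real command would run the comparison here
    click.echo(f"Comparing {prev} with {curr}; report goes to {out}")
```

In a real script, the file would end by calling main() so that Click can parse the command line.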

This covers pretty much everything there is to know about the script.

Final Thoughts

I hope you’ve enjoyed reading this article and are in good health in these trying times.

This has been quite the learning experience for me trying to write an article on a simple script and this is also the first public article that I’ve published so any advice, recommendation or constructive criticism are all welcome.

Before I go, I have one specific person to thank who has been pushing me to do this for quite some time now and is a fantastic friend, mentor and brother to me, so here’s to you, Uddeshya Singh 🍻.

Reetam Nandi

Full Stack Developer | Reading, Learning, Building Stuff | Software Engineer @ Highradius