Calculating checksums of multiple files in PowerShell

Sat 27 Jan 2024

Today I would like to share with you a small script that I developed some time ago and have used regularly ever since. It calculates hashes (checksums) for multiple files and saves them to a text file. It is written in PowerShell.

A bit of background: While working with games, I often need to move large amounts of data. Packages of 150 GB or more are not uncommon. When copying, uploading, or downloading them, how can I make sure that not a single bit has changed? The obvious solution is to calculate a checksum and compare it between the source and the destination location. If the checksums don't match, it is better not to transfer the entire package again. That is why packing it into multiple files (a multi-part .7z archive) is a good idea: only the parts whose checksums don't match need to be transferred again. This, however, requires a convenient way to calculate checksums of multiple files at once.

The script

My script is actually just a single line:

$ExtractName = @{l='Name';e={Split-Path $_.Path -Leaf}}; Get-FileHash -Path INPUT_MASK | Select-Object -Property Hash, $ExtractName > OUTPUT_FILE

To use it:

  1. Open a PowerShell console.
  2. Go to the directory containing the archives to hash.
  3. Paste the command provided above. Before pressing ENTER:
    1. Replace "INPUT_MASK" with the mask of the files to hash. For example, if the archive files are named "Archive.7z.001", "Archive.7z.002", etc., type "Archive.7z.*".
    2. Replace "OUTPUT_FILE" with the name or path of the output file to be created.
  4. Hit ENTER.
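For readability, the same command can also be written over several lines with full parameter and hashtable key names. The following is a minimal sketch wrapped in a function. The name `Save-Checksums` and its parameters are hypothetical. `-Algorithm SHA256` only makes the default explicit, and `Out-File` takes the place of the `>` redirection used in the one-liner.

```powershell
# Hypothetical helper wrapping the one-liner from this article.
function Save-Checksums {
    param(
        [Parameter(Mandatory)] [string] $InputMask,  # e.g. 'Archive.7z.*'
        [Parameter(Mandatory)] [string] $OutputFile  # e.g. 'Checksums.txt'
    )
    # Calculated property: take the Path of each hash result and keep only the file name.
    $extractName = @{ Name = 'Name'; Expression = { Split-Path $_.Path -Leaf } }

    # Hash every file matching the mask, keep Hash plus the computed Name, write a text table.
    Get-FileHash -Path $InputMask -Algorithm SHA256 |
        Select-Object -Property Hash, $extractName |
        Out-File -FilePath $OutputFile
}
```

Usage, in the directory with the archives: `Save-Checksums -InputMask 'Archive.7z.*' -OutputFile 'Checksums.txt'`.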

Example PowerShell session:

PS C:\Users\Adam Sawicki> cd E:\tmp\checksum_test\
PS E:\tmp\checksum_test> $ExtractName = @{l='Name';e={Split-Path $_.Path -Leaf}}; Get-FileHash -Path Archive.7z.* | Select-Object -Property Hash, $ExtractName > Checksums.txt
PS E:\tmp\checksum_test>

If the input files are large, it may take a few minutes to execute. Once it completes, the output file "Checksums.txt" may look like this:

Hash                                                             Name          
----                                                             ----          
CBBABFB5529ACFB6AD67502F37444B9273A9B5BB7AF70EFA0FF1F1EC99B70895 Archive.7z.001
185D73ECBCECB9302981C97D0DDFC4B96198103436F23DB593EA9BAFBF997DAC Archive.7z.002
086640842CC34114B898D2E19270DCE427AC89D64BCD9E8E3D8D955D69588402 Archive.7z.003
BE536C66854530236DA924B1CAED44D0880D28AAA66420F6EBE5F363435BEB4F Archive.7z.004

You can then run the same command on the destination machine of your transfer and compare the two checksum files to make sure the file names and hashes match.
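The comparison can also be scripted instead of done by eye. The sketch below makes three assumptions: the checksum file was written by the command above (SHA-256, one "HASH name" row per file), the listed files are in the current directory, and the name `Test-Checksums` is hypothetical rather than a built-in cmdlet. It re-hashes each listed file and reports whether its hash still matches. Because it parses the formatted table back from text, it depends on that exact layout.

```powershell
# Hypothetical helper: verify files against a checksum file produced by the one-liner.
function Test-Checksums {
    param([Parameter(Mandatory)] [string] $ChecksumFile)

    foreach ($line in (Get-Content -LiteralPath $ChecksumFile)) {
        # Data rows are "<64 hex digits> <file name>"; the header and "----" rows don't match.
        if ($line -match '^([0-9A-Fa-f]{64})\s+(.+?)\s*$') {
            $expected = $Matches[1]
            $name = $Matches[2]
            # A missing file makes Get-FileHash write an error and return nothing, so Match is False.
            $actual = (Get-FileHash -LiteralPath $name -Algorithm SHA256).Hash
            [PSCustomObject]@{
                Name  = $name
                Match = ($actual -eq $expected)  # -eq on strings is case-insensitive
            }
        }
    }
}
```

For example, `Test-Checksums -ChecksumFile Checksums.txt | Where-Object { -not $_.Match }` prints nothing when every file matches.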

Warning

There is one caveat with this script: If you enter just "*" as the input mask (meaning all files) and also put the output file in the same directory, the script creates the output file and starts writing to it while also trying to hash it as an input, which results in an error:

PS E:\tmp\checksum_test> $ExtractName = @{l='Name';e={Split-Path $_.Path -Leaf}}; Get-FileHash -Path * | Select-Object -Property Hash, $ExtractName > Checksums.txt
Get-FileHash : The file 'E:\tmp\checksum_test\Checksums.txt' cannot be read: The process cannot access the file
'E:\tmp\checksum_test\Checksums.txt' because it is being used by another process.
At line:1 char:58
+ ... {l='Name';e={Split-Path $_.Path -Leaf}}; Get-FileHash -Path * | Selec ...
+                                              ~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : ReadError: (E:\tmp\checksum_test\Checksums.txt:PSObject) [Write-Error], WriteErrorException
    + FullyQualifiedErrorId : FileReadError,Get-FileHash

To avoid this issue, either specify a more concrete input mask, like "Archive.7z.*", or specify the output file in a different directory, such as "E:\Tmp\Checksums.txt".
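Another option is to keep hashing all files but explicitly leave the output file out of the input. The following is a sketch under these assumptions: the helper name `Save-DirectoryChecksums` is hypothetical, and the default output name Checksums.txt matches this article's example. `Where-Object` drops the checksum file by name, so it is never hashed even though it may already exist, either from a previous run or because `Out-File` has already created it.

```powershell
# Hypothetical helper: hash every file in $Directory except the output file itself.
function Save-DirectoryChecksums {
    param(
        [Parameter(Mandatory)] [string] $Directory,
        [string] $OutputName = 'Checksums.txt'
    )
    $extractName = @{ Name = 'Name'; Expression = { Split-Path $_.Path -Leaf } }

    Get-ChildItem -LiteralPath $Directory -File |
        Where-Object { $_.Name -ne $OutputName } |   # skip the checksum file
        Get-FileHash -Algorithm SHA256 |
        Select-Object -Property Hash, $extractName |
        Out-File -FilePath (Join-Path $Directory $OutputName)
}
```

For example: `Save-DirectoryChecksums -Directory E:\tmp\checksum_test`.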

How does it work?

The script is made of several parts. First, the Get-FileHash command calculates the hashes of all files matching the input mask (using SHA-256 by default, as the Algorithm column shows). However, its output is not ideal, as it contains absolute paths to the hashed files:

PS E:\tmp\checksum_test> Get-FileHash -Path Archive.7z.*

Algorithm       Hash                                                                   Path
---------       ----                                                                   ----
SHA256          CBBABFB5529ACFB6AD67502F37444B9273A9B5BB7AF70EFA0FF1F1EC99B70895       E:\tmp\checksum_test\Archive....
SHA256          185D73ECBCECB9302981C97D0DDFC4B96198103436F23DB593EA9BAFBF997DAC       E:\tmp\checksum_test\Archive....
SHA256          086640842CC34114B898D2E19270DCE427AC89D64BCD9E8E3D8D955D69588402       E:\tmp\checksum_test\Archive....
SHA256          BE536C66854530236DA924B1CAED44D0880D28AAA66420F6EBE5F363435BEB4F       E:\tmp\checksum_test\Archive....

These locations will likely differ between the source and target machines of the copy, so we want to extract only the file name, without the path. This is the purpose of Split-Path $_.Path -Leaf. Split-Path takes a path string and returns only a part of it. The -Leaf parameter indicates that we are interested in the last element of the path, which is the file name.
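Here is a quick illustration of what -Leaf returns. On Windows, `Split-Path 'E:\tmp\checksum_test\Archive.7z.001' -Leaf` returns `Archive.7z.001`. The sketch below builds a relative path with `Join-Path` instead, so it behaves the same on any OS. The folder name is only an example, and the path does not need to exist.

```powershell
# Build "checksum_test\Archive.7z.001" (or "/" on non-Windows systems).
$fullPath = Join-Path -Path 'checksum_test' -ChildPath 'Archive.7z.001'

$leaf   = Split-Path -Path $fullPath -Leaf    # Archive.7z.001
$parent = Split-Path -Path $fullPath -Parent  # checksum_test
```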

The rest of the script processes a sequence of objects and their properties. As you can see in the listing above, the original Get-FileHash command returns objects with the properties Algorithm, Hash, and Path, displayed as table columns. We want to output only the hash and the file name, so we select a subset of the properties (the Select-Object command) and add a calculated property that has a name: l='Name' and a value evaluated by an expression: e={Split-Path $_.Path -Leaf}, where $_ is the current object in the pipeline, to extract only the file name.
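The calculated property may be easier to follow when it is spelled out with full key names. Here, `l` abbreviates `Label`, which Select-Object accepts as a synonym of `Name`, and `e` abbreviates `Expression`. The sketch below feeds it stand-in objects shaped like Get-FileHash output instead of real hash results. The hash strings are placeholders, not real hashes.

```powershell
# Same calculated property as in the one-liner, with full key names.
$extractName = @{ Name = 'Name'; Expression = { Split-Path $_.Path -Leaf } }

# Stand-ins for Get-FileHash results (Algorithm, Hash, Path); hash values are placeholders.
$hashes = @(
    [PSCustomObject]@{ Algorithm = 'SHA256'; Hash = 'AAAA'; Path = (Join-Path 'checksum_test' 'Archive.7z.001') }
    [PSCustomObject]@{ Algorithm = 'SHA256'; Hash = 'BBBB'; Path = (Join-Path 'checksum_test' 'Archive.7z.002') }
)

# Keep Hash, drop Algorithm and Path, add Name computed from each object's Path ($_).
$selected = $hashes | Select-Object -Property Hash, $extractName
$selected | Format-Table
```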

Other solutions

Of course, this script is only one of many possible solutions to this problem. I am sure it could be implemented just as easily in Python, or maybe even in a good old .bat file.

For my personal use, I would probably prefer Total Commander, a file manager for Windows that I love and use every day. It also supports calculating and verifying file checksums via Files > Create Checksum File(s)... and Files > Verify Checksums (from checksum files).

Thoughts about PowerShell

Writing this script was an exercise for me in learning and using PowerShell. I like this language a lot, although I must admit I don't use it very often; for more complex scripts, I prefer Python. If I used it more, I would probably find some reasons to complain. But I completed the Udemy course "Advanced Scripting & Tool Making using Windows PowerShell", and I think the idea behind this language is just brilliant. PowerShell offers several distinct features that I appreciate:

1. On one hand, it is a .NET language (like C#), so we can write functions and classes, use its rich type system and standard library, and even create a GUI using Windows Forms. On the other hand, it retains a syntax reminiscent of shell languages (Command Arg1 Arg2) rather than the syntax typical of regular programming languages (Function(Arg1, Arg2)).

2. We can browse, create, modify, and remove items in the file system, the Windows Registry, and other locations exposed through PowerShell providers, all using the same syntax, e.g.:

PS E:\tmp\checksum_test> ls

    Directory: E:\tmp\checksum_test

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-a----        2024-01-27      9:07     4294967296 Archive.7z.001
-a----        2024-01-27      9:06     4294967296 Archive.7z.002
-a----        2024-01-27      9:06     4294967296 Archive.7z.003
-a----        2024-01-27      9:07     2582433632 Archive.7z.004
-a----        2024-01-27      9:14            986 Checksums.txt

PS E:\tmp\checksum_test> cd HKLM:\DRIVERS\DriverDatabase
PS HKLM:\DRIVERS\DriverDatabase> ls

    Hive: HKEY_LOCAL_MACHINE\DRIVERS\DriverDatabase

Name                           Property
----                           --------
DeviceIds
DriverFiles
DriverInfFiles
DriverPackages

3. The "verb-noun" convention used in command naming, the ability to chain commands using the pipe operator |, and the variety of commands to process, filter, or sort data make it convenient to create compound commands, like the one shown in this article.

4. Data is passed as .NET objects: a sequence of objects having properties of various types, such as strings, numbers, and booleans. I believe this is how things should have been from the very beginning of system shells. Printing data as text and parsing it back is a notorious source of bugs and security vulnerabilities, just like fixed-size buffers, null terminators, comma separators, etc. A string should be just a string, no matter how many or which specific bytes it contains. Bad things happen when a special character starts "working" in a special way instead of being just another piece of data.
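Points 1 and 4 can be seen together in a short experiment. The sketch below uses a throwaway file in the temp directory; its name is arbitrary and deliberately contains a space. The script does three things:

- It computes SHA-256 directly through .NET classes.
- It compares that result with the `Hash` property of the object returned by Get-FileHash.
- It checks that file properties come back as typed values rather than text.

The script uses an absolute path because .NET methods resolve relative paths against the process's working directory, which is not necessarily PowerShell's current location.

```powershell
# Throwaway file to experiment with (arbitrary name, note the space in it).
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'object demo.txt'
Set-Content -LiteralPath $path -Value 'abc' -NoNewline

# Point 1: .NET classes are directly available. Compute SHA-256 by hand...
$sha = [System.Security.Cryptography.SHA256]::Create()
$stream = [System.IO.File]::OpenRead($path)
try {
    $bytes = $sha.ComputeHash($stream)
}
finally {
    $stream.Dispose()
    $sha.Dispose()
}
# BitConverter gives "BA-78-16-..."; remove the dashes to match Get-FileHash's format.
$manualHash = [System.BitConverter]::ToString($bytes) -replace '-', ''

# ...and compare it with the Hash property of the object returned by Get-FileHash.
$cmdletHash = (Get-FileHash -LiteralPath $path -Algorithm SHA256).Hash
$sameHash = $manualHash -eq $cmdletHash

# Point 4: properties keep their .NET types, so no text parsing is needed.
$item = Get-Item -LiteralPath $path
$lengthType = $item.Length.GetType().FullName         # System.Int64
$timeType = $item.LastWriteTime.GetType().FullName    # System.DateTime
$isSmall = $item.Length -lt 1KB                        # numeric comparison
$isRecent = $item.LastWriteTime -gt (Get-Date).AddDays(-1)  # date comparison
$name = $item.Name                                     # one string, space included

Remove-Item -LiteralPath $path
[PSCustomObject]@{ SameHash = $sameHash; Name = $name; IsSmall = $isSmall; IsRecent = $isRecent }
```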

Copyright © 2004-2024