Comparing Files and Content

With PowerShell


When migrating some processing logic to a new framework/service, I wanted to compare the contents of two JSONL files with a schema like:

{"contents": [{"transaction": {"id": 123, "location": 345, "cost": 175, "dateTime": "2023-01-01", "createDate": "2023-01-01"}}]}

The quickest and most reliable way to tell whether the two files differ is to hash both files and compare the hashes.

(Get-FileHash <path_to_file1>).Hash -eq (Get-FileHash <path_to_file2>).Hash

By default, Get-FileHash uses SHA256, but MD5 is sufficient for file integrity verification, that is, for detecting accidental differences. Just don't use MD5 or SHA1 for a cryptographic purpose, and be aware that MD5 is vulnerable to collision (birthday) attacks: someone can craft two different files, such as a benign one and a malicious one, that share the same MD5 hash. In this case, we created both files ourselves, so that isn't an issue.

Get-FileHash <path_to_file> -Algorithm MD5
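The hash comparison can be wrapped in a small helper so the algorithm is a parameter. This is a minimal sketch: `Test-SameFileHash` is a hypothetical name, the ValidateSet lists the algorithms Get-FileHash supports in PowerShell 7, and New-TemporaryFile assumes PowerShell 5.0 or later:

```powershell
# Hypothetical helper: returns $true when two files produce the same hash.
function Test-SameFileHash {
    param(
        [Parameter(Mandatory)][string]$Path1,
        [Parameter(Mandatory)][string]$Path2,
        # SHA256 is Get-FileHash's default; MD5 is enough for non-adversarial checks.
        [ValidateSet('SHA1', 'SHA256', 'SHA384', 'SHA512', 'MD5')]
        [string]$Algorithm = 'SHA256'
    )
    # Get-FileHash returns an object; compare its Hash strings, not the objects.
    $hash1 = (Get-FileHash -LiteralPath $Path1 -Algorithm $Algorithm).Hash
    $hash2 = (Get-FileHash -LiteralPath $Path2 -Algorithm $Algorithm).Hash
    return $hash1 -eq $hash2
}

# Usage: two temporary files with the same content.
$f1 = New-TemporaryFile
$f2 = New-TemporaryFile
Set-Content -LiteralPath $f1.FullName -Value '{"id": 123}'
Set-Content -LiteralPath $f2.FullName -Value '{"id": 123}'
$same = Test-SameFileHash -Path1 $f1.FullName -Path2 $f2.FullName -Algorithm MD5
Remove-Item -LiteralPath $f1.FullName, $f2.FullName
```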

However, hashing won't work for the files I'm comparing, because I already know they won't be byte-for-byte identical. The values in the JSON array are aggregated based on the properties of the items in the array, and comparing the files in a tool like WinMerge or Meld makes it apparent that the ordering differs because of the createDate. The files are around 200 MB each; even at 1 GB, their size wouldn't be a problem for this comparison.

{
   "id":123,
   "location":345,
   "cost":175,
   "dateTime":"2023-01-01",
   "createDate":"2023-01-01"
}
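To see why ordering alone defeats a hash comparison, the sketch below hashes two made-up JSONL payloads that contain the same records in a different order. `Get-StringHash` is a hypothetical helper that feeds a string's UTF-8 bytes to Get-FileHash through its -InputStream parameter (PowerShell 5.1 or later assumed):

```powershell
# Hypothetical helper: SHA256 of a string's UTF-8 bytes, via Get-FileHash -InputStream.
function Get-StringHash {
    param([Parameter(Mandatory)][string]$Text)
    $bytes  = [System.Text.Encoding]::UTF8.GetBytes($Text)
    $stream = [System.IO.MemoryStream]::new($bytes)
    try {
        (Get-FileHash -InputStream $stream -Algorithm SHA256).Hash
    }
    finally {
        $stream.Dispose()
    }
}

# The same two made-up records, written in a different order.
$fileA = '{"id":1,"cost":10}' + "`n" + '{"id":2,"cost":20}'
$fileB = '{"id":2,"cost":20}' + "`n" + '{"id":1,"cost":10}'

# The raw hashes differ even though the records are the same...
$rawMatch = (Get-StringHash $fileA) -eq (Get-StringHash $fileB)

# ...but once the lines are put in a canonical order, the hashes match.
$sortedA = (($fileA -split "`n") | Sort-Object) -join "`n"
$sortedB = (($fileB -split "`n") | Sort-Object) -join "`n"
$sortedMatch = (Get-StringHash $sortedA) -eq (Get-StringHash $sortedB)
```

Sorting raw lines only helps when each record's text is identical; when fields such as createDate differ, it is more robust to parse the JSON and compare only the properties of interest.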

On Linux, you can use diff; the -c flag produces context-format output:

diff 1.json 2.json -c
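In PowerShell, a rough line-level counterpart is to pass both files' lines to Compare-Object. This is a sketch with hypothetical temporary files; unlike diff, Compare-Object reports which lines appear on only one side rather than producing ordered hunks:

```powershell
# Hypothetical JSONL files in the temp directory; replace with real paths.
$leftFile  = Join-Path ([System.IO.Path]::GetTempPath()) '1.json'
$rightFile = Join-Path ([System.IO.Path]::GetTempPath()) '2.json'
Set-Content -LiteralPath $leftFile  -Value '{"id":1,"cost":10}', '{"id":2,"cost":20}'
Set-Content -LiteralPath $rightFile -Value '{"id":2,"cost":25}', '{"id":1,"cost":10}'

# '<=' marks lines only in the left file, '=>' marks lines only in the right file.
# Lines present in both files are omitted, even if they appear in a different order.
$lineDiff = Compare-Object -ReferenceObject (Get-Content -LiteralPath $leftFile) `
                           -DifferenceObject (Get-Content -LiteralPath $rightFile)
$lineDiff | Format-Table -AutoSize

Remove-Item -LiteralPath $leftFile, $rightFile
```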

Read in the two files:

$json1 = Get-Content $path1 | ConvertFrom-Json
$json2 = Get-Content $path2 | ConvertFrom-Json
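Because JSONL stores one JSON document per line, a slightly more defensive sketch parses each non-empty line on its own, so the result doesn't rely on how ConvertFrom-Json handles multi-line piped input, which may differ between PowerShell versions. `Read-JsonLines` is a hypothetical helper name, and the sample lines are made up in the schema shown earlier:

```powershell
# Hypothetical helper: parse a JSONL file one line at a time.
function Read-JsonLines {
    param([Parameter(Mandatory)][string]$Path)
    Get-Content -LiteralPath $Path |
        Where-Object { $_.Trim() -ne '' } |                # skip blank lines
        ForEach-Object { ConvertFrom-Json -InputObject $_ } # each line is a complete JSON document
}

# Usage with a small temporary JSONL file.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'sample.jsonl'
Set-Content -LiteralPath $path -Value @(
    '{"contents": [{"transaction": {"id": 2, "location": 345, "cost": 175, "dateTime": "2023-01-01", "createDate": "2023-01-01"}}]}'
    '{"contents": [{"transaction": {"id": 1, "location": 345, "cost": 90, "dateTime": "2023-01-02", "createDate": "2023-01-02"}}]}'
)
$records = @(Read-JsonLines -Path $path)
Remove-Item -LiteralPath $path

# Member-access enumeration reaches every transaction across all lines.
$records.contents.transaction.id
```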

Filter the loaded PSObjects down to the properties of interest and sort them by a specific property, in this case id:

# Select specific properties
$j1 = $json1.contents.transaction | Select-Object -Property id, location, cost | Sort-Object {$_.id}

# Exclude specific properties
$j2 = $json2.contents.transaction | Select-Object * -ExcludeProperty dateTime, createDate | Sort-Object {$_.id}
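Both selection styles should leave the same three properties. A quick sketch on made-up inline records in the same schema confirms that including id, location, and cost gives the same shape as excluding dateTime and createDate:

```powershell
# Made-up records in the article's schema, parsed from inline JSON strings.
$sample = @(
    '{"contents": [{"transaction": {"id": 3, "location": 1, "cost": 30, "dateTime": "2023-01-03", "createDate": "2023-01-03"}}]}'
    '{"contents": [{"transaction": {"id": 1, "location": 1, "cost": 10, "dateTime": "2023-01-01", "createDate": "2023-01-05"}}]}'
) | ForEach-Object { ConvertFrom-Json -InputObject $_ }

# Include-list style.
$included = $sample.contents.transaction |
    Select-Object -Property id, location, cost |
    Sort-Object -Property id

# Exclude-list style.
$excluded = $sample.contents.transaction |
    Select-Object -Property * -ExcludeProperty dateTime, createDate |
    Sort-Object -Property id

# Both are ordered by id and expose only id, location, and cost.
$included.id
$excluded[0].PSObject.Properties.Name -join ','
```

The include list is safer if new fields are added to the schema later, since they won't silently enter the comparison.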

At this point, we can run Compare-Object on each of the properties we care about. However, its output for differing values is a bit hard to parse visually:

Compare-Object "Me" "Me_2"
InputObject SideIndicator
----------- -------------
Me_2        =>
Me          <=
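For record-level differences, Compare-Object can also compare whole objects on a chosen list of properties. A sketch with made-up records standing in for $j1 and $j2, assuming both sides were already trimmed to the same properties:

```powershell
# Made-up records standing in for $j1 and $j2.
$left = @(
    [pscustomobject]@{ id = 1; location = 345; cost = 175 }
    [pscustomobject]@{ id = 2; location = 345; cost = 90 }
)
$right = @(
    [pscustomobject]@{ id = 1; location = 345; cost = 175 }
    [pscustomobject]@{ id = 2; location = 345; cost = 95 }
)

# Each output row carries the listed properties plus SideIndicator:
# '<=' means only in $left, '=>' means only in $right. Identical records are omitted.
$recordDiff = Compare-Object -ReferenceObject $left -DifferenceObject $right -Property id, location, cost
$recordDiff | Sort-Object -Property id, SideIndicator | Format-Table -AutoSize
```

Here only the id 2 record shows up, once per side, which is easier to read than comparing single values.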

In the output above, the right-hand (=>) value comes first, so you can get a side-by-side comparison by reversing the InputObject values, making index 0 the left value and index 1 the right value. Alternatively, we can simply iterate through both property arrays.
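A minimal sketch of iterating through both arrays side by side, under stated assumptions: `Compare-FlatObjectList` is a hypothetical name, both lists are sorted by id and have the same length, and every property holds a primitive value. It uses -cne, so strings compare case-sensitively, while PowerShell's usual type coercion still applies (175 and "175" count as equal):

```powershell
# Hypothetical helper: compare two equally long, id-sorted lists of flat objects
# property by property and emit one row per mismatched value.
function Compare-FlatObjectList {
    param(
        [Parameter(Mandatory)][object[]]$Left,
        [Parameter(Mandatory)][object[]]$Right,
        [string]$KeyProperty = 'id'
    )
    if ($Left.Count -ne $Right.Count) {
        throw "Lists differ in length: $($Left.Count) vs $($Right.Count)"
    }
    for ($i = 0; $i -lt $Left.Count; $i++) {
        $l = $Left[$i]
        $r = $Right[$i]
        # Union of property names from both sides, so a missing property is reported too.
        $names = @($l.PSObject.Properties.Name) + @($r.PSObject.Properties.Name) | Select-Object -Unique
        foreach ($name in $names) {
            $lv = $l.$name
            $rv = $r.$name
            if ($lv -cne $rv) {
                [pscustomobject]@{
                    Index    = $i
                    Key      = $l.$KeyProperty
                    Property = $name
                    Left     = $lv
                    Right    = $rv
                }
            }
        }
    }
}

# Usage with made-up records in place of $j1 and $j2.
$sampleLeft = @(
    [pscustomobject]@{ id = 1; location = 345; cost = 175 }
    [pscustomobject]@{ id = 2; location = 345; cost = 90 }
)
$sampleRight = @(
    [pscustomobject]@{ id = 1; location = 345; cost = 175 }
    [pscustomobject]@{ id = 2; location = 346; cost = 95 }
)
$mismatches = @(Compare-FlatObjectList -Left $sampleLeft -Right $sampleRight)
$mismatches | Format-Table -AutoSize
```

Pairing by index is why both lists must be sorted by the same key first; if records can be missing on one side, matching on id (for example, with a hashtable lookup) would be the safer design.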

Here is a function to compare two non-nested PSObjects whose properties are all primitive types:
