When migrating some processing logic to a new framework/service, I wanted to compare the contents of two JSONL
files with a schema like:
{"contents": [{"transaction": {"id": 123, "location": 345, "cost": 175, "dateTime": "2023-01-01", "createDate": "2023-01-01"}}]}
The quickest and most exact way to tell if the two files are different is just to hash the two files and compare the hashes.
(Get-FileHash <path_to_file1>).Hash -eq (Get-FileHash <path_to_file2>).Hash
By default the Algorithm used is SHA256, but MD5 is sufficient for file integrity verification. Just don't use MD5 or SHA1 for cryptographic operations, and be aware that MD5 is vulnerable to collision/birthday attacks: a malicious file that has the same MD5 hash as the official one could be generated. In this case we created both files ourselves, so that is not an issue.
Get-FileHash <path_to_file> -Algorithm MD5
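As a minimal sketch, the hash comparison can be wrapped in a small helper. The function name Test-FileHashEqual and its parameter names are hypothetical and not part of the original workflow; only Get-FileHash and its -Algorithm parameter are built in.

```powershell
# Hypothetical helper: returns $true when both files produce the same digest.
function Test-FileHashEqual {
    param(
        [Parameter(Mandatory)] [string] $Path1,
        [Parameter(Mandatory)] [string] $Path2,
        [string] $Algorithm = 'SHA256'   # same default as Get-FileHash
    )
    $hash1 = (Get-FileHash -Path $Path1 -Algorithm $Algorithm).Hash
    $hash2 = (Get-FileHash -Path $Path2 -Algorithm $Algorithm).Hash
    # Compare the hex digest strings, not the full Get-FileHash result objects
    return $hash1 -eq $hash2
}
```

Calling it with two paths and, optionally, -Algorithm MD5 returns a single Boolean, which is easier to use in a script than eyeballing two digests.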
This method also will not work for the files I'm comparing, since I know that they won't be exactly identical. The JSON array has values that are aggregated based on the properties of the items in the array, and the ordering would be different due to the createDate. This is apparent when comparing the files with a tool like WinMerge or Meld. These files are also on the order of ~200MB, though even at 1GB the comparison wouldn't be a problem.
{
"id":123,
"location":345,
"cost":175,
"dateTime":"2023-01-01",
"createDate":"2023-01-01"
}
If you're on Linux, you can use diff:
diff -c 1.json 2.json
Read in the two files:
$json1 = Get-Content $path1 | ConvertFrom-Json
$json2 = Get-Content $path2 | ConvertFrom-Json
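Since the input is JSONL (one JSON document per line), it can be clearer to parse each line explicitly instead of relying on how ConvertFrom-Json treats multi-line pipeline input. The sketch below assumes every non-blank line is a complete JSON document; Read-JsonLines is a hypothetical helper name.

```powershell
# Hypothetical helper: parse a JSONL file one line at a time,
# skipping blank lines. Assumes each line is a complete JSON document.
function Read-JsonLines {
    param([Parameter(Mandatory)] [string] $Path)
    Get-Content -Path $Path |
        Where-Object { $_.Trim() } |              # drop empty/whitespace-only lines
        ForEach-Object { $_ | ConvertFrom-Json }  # one object per line
}

# Usage, with $path1 and $path2 as above:
# $json1 = @(Read-JsonLines -Path $path1)
# $json2 = @(Read-JsonLines -Path $path2)
```

Wrapping the call in @() keeps the result an array even when the file contains a single record.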
Filter and sort the loaded PSObjects based on specific properties. In this case we sort by id:
# Select specific properties
$j1 = $json1.contents.transaction | Select-Object -Property id, location, cost | Sort-Object {$_.id}
# Exclude specific properties
$j2 = $json2.contents.transaction | Select-Object * -ExcludeProperty dateTime, createDate | Sort-Object {$_.id}
At this point, we can run Compare-Object on each of the properties we care about. Showing differences between actual properties is a bit hard to parse visually:
Compare-Object "Me" "Me_2"
InputObject SideIndicator
----------- -------------
Me_2        =>
Me          <=
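For objects rather than strings, Compare-Object accepts a -Property list, so the comparison is made on the named properties. The sketch below uses made-up sample records standing in for $j1 and $j2; the variable names $left, $right and $diff are illustrative only.

```powershell
# Sample data standing in for $j1 and $j2 (values are invented for illustration)
$left = @(
    [pscustomobject]@{ id = 1; location = 345; cost = 175 }
    [pscustomobject]@{ id = 2; location = 346; cost = 200 }
)
$right = @(
    [pscustomobject]@{ id = 1; location = 345; cost = 175 }
    [pscustomobject]@{ id = 2; location = 346; cost = 250 }
)

# Compare on the listed properties; only records that differ are returned,
# each tagged with a SideIndicator of <= (left only) or => (right only).
$diff = Compare-Object -ReferenceObject $left -DifferenceObject $right -Property id, location, cost
$diff | Format-Table
```

Here the record with id 2 appears once per side because its cost differs, while the matching record with id 1 is omitted.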
You can get a line-by-line comparison by reversing the InputObject array property, where index 0 is Left and index 1 is Right, but we can also just iterate through both property arrays.
Here is a function to compare two PSObjects that contain only primitive type properties and are non-nested: