The challenge:
I wonder what this really is... enc
Where enc is 灩捯䍔䙻ㄶ形楴獟楮獴㌴摟潦弸彥㜰㍢㐸㙽
and the given code is:
''.join([chr((ord(flag[i]) << 8) + ord(flag[i + 1]))
for i in range(0, len(flag), 2)])
Since this is tagged with Reverse Engineering, I'd like to think that this is the code that generated the text. Things to initially notice:
The encoded text appears to be Unicode, not ASCII.
The code contains chr and ord, so characters are being converted to and from integer code points.
There is some bit shifting going on (<<).
You can test whether this code generated the string, since all flags start with picoCTF. You could try Google Translate, but it won't help. It is fun, though, to listen to a language-tailored voice read the gibberish input and output back to you.
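A minimal sketch of that check, assuming the flag starts with picoCTF{. That prefix has 8 characters, so every character has a partner; picoCTF alone has 7, and the original code would raise an IndexError on flag[i + 1]. The function name encode is a hypothetical wrapper around the given one-liner.

```python
# Encoded text from the challenge prompt.
enc = "灩捯䍔䙻ㄶ形楴獟楮獴㌴摟潦弸彥㜰㍢㐸㙽"


def encode(flag):
    # The challenge's one-liner, wrapped in a (hypothetical) function.
    return ''.join([chr((ord(flag[i]) << 8) + ord(flag[i + 1]))
                    for i in range(0, len(flag), 2)])


# Encode the known prefix and compare it with the start of enc.
guess = encode("picoCTF{")
print([hex(ord(c)) for c in guess])
print(enc.startswith(guess))
```

If the guess about the encoding is right, the second line prints True.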
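The first character of enc, 灩, has code point 28777, which matches the first two characters of the flag:
28777 = (ord(flag[0]) << 8) + ord(flag[1])
= (ord('p') << 8) + ord('i')
= (112 << 8) + 105
= 28672 + 105 = 28777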
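One thing to watch when checking this by hand in Python: + binds tighter than <<, so the parentheses in the original code matter. A quick check:

```python
# With parentheses: shift 'p' into the high byte, then add 'i' as the low byte.
with_parens = (ord('p') << 8) + ord('i')
print(with_parens)  # 28777

# Without parentheses Python computes 8 + ord('i') first, i.e. 112 << 113.
without_parens = ord('p') << 8 + ord('i')
print(without_parens == 112 << 113)  # True

# 28777 in hex: 0x70 is 'p', 0x69 is 'i'.
print(with_parens == 0x7069)  # True
```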
What the code is doing
The for loop within the list comprehension generates indexes starting at 0 and incrementing by 2, stopping before len(flag). In other words, it loops over flag two characters at a time. For each pair, ord converts each Unicode character (like 'a') to its Unicode code point (an integer). It does this for both flag[i] and flag[i+1].
The ord(flag[i]) << 8 part shifts the first code point 8 bits to the left:
'a' Unicode code point:             01100001
'a' shifted left by 8 bits: 0110000100000000
Note that shifting left by 8 bits is equivalent to multiplying by 256, and shifting right by 8 bits is equivalent to floor dividing (//) by 256. The shifted first code point and the second code point are then added together, giving one 16-bit value (two bytes, 8 bits each), which chr converts back into a single Unicode character. Assuming the second character is 'b':
'b' Unicode code point: 01100010
  0110000100000000
+         01100010
------------------
  0110000101100010
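The same steps can be checked in Python. This sketch uses format(n, '016b') to print zero-padded 16-bit binary strings; the 'a'/'b' pair is just the example from above.

```python
high = ord('a')  # 97 -> 01100001
low = ord('b')   # 98 -> 01100010

shifted = high << 8
print(format(shifted, '016b'))   # 0110000100000000

combined = shifted + low
print(format(combined, '016b'))  # 0110000101100010

# Shifting left by 8 multiplies by 256; shifting right by 8 floor-divides by 256.
print(shifted == high * 256)              # True
print(combined >> 8 == combined // 256)   # True
```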
Then the list of characters is joined back together into one string.
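Unrolling the list comprehension into a plain loop makes each step visible. This is a sketch assuming an even-length input (the original one-liner has the same requirement); encode_verbose is a hypothetical name.

```python
def encode_verbose(flag):
    out = []
    # Walk the flag two characters at a time: indexes 0, 2, 4, ...
    for i in range(0, len(flag), 2):
        high = ord(flag[i])          # first character's code point
        low = ord(flag[i + 1])       # second character's code point
        packed = (high << 8) + low   # first char in the upper byte, second in the lower
        print(repr(flag[i] + flag[i + 1]), format(packed, '016b'), packed)
        out.append(chr(packed))
    # Join the 16-bit characters into the encoded string.
    return ''.join(out)


result = encode_verbose("abcd")
```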
Reverse the process
We need to take each character of the enc string, get its 16-bit code point with ord, split that value into two 8-bit halves, and chr each half back into a character of the original flag.
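A minimal decoder sketch, assuming every character of enc was produced by the code above (so each code point fits in 16 bits). Shifting right by 8 keeps the upper byte, and & 0xFF keeps the lower byte.

```python
# Encoded text from the challenge prompt.
enc = "灩捯䍔䙻ㄶ形楴獟楮獴㌴摟潦弸彥㜰㍢㐸㙽"


def decode(encoded):
    chars = []
    for ch in encoded:
        code = ord(ch)                   # 16-bit code point holding two characters
        chars.append(chr(code >> 8))     # upper byte: the first original character
        chars.append(chr(code & 0xFF))   # lower byte: the second original character
    return ''.join(chars)


print(decode(enc))
```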
The transformation.py script breaks this down into the individual steps. You can run it yourself (not sure how long this link will last).
You can zero out the lower byte (the right 8 bits) by shifting right and then shifting left: (ord(enc[i]) >> 8) << 8.
0110000101100010 # a + b in binary form
>>
0000000001100001 # after right shift by 8 bits
<<
0110000100000000 # after left shift by 8 bits
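Here is the right-then-left shift in Python, along with the subtraction that recovers the lower byte, for the packed 'a'/'b' value from the diagram:

```python
code = (ord('a') << 8) + ord('b')      # 0110000101100010

upper_in_place = (code >> 8) << 8      # lower 8 bits zeroed
print(format(upper_in_place, '016b'))  # 0110000100000000

lower = code - upper_in_place          # what's left is the lower byte
print(chr(code >> 8), chr(lower))      # a b
```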
You can get the lower byte directly with ord(enc[i]) & 0xFF, which keeps only the right 8 bits:
  10110101100
&    11111111
-------------
     10101100
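The same masking in Python, using the 11-bit example above. The 0b literals make the bit patterns easy to compare, and 0xFF is the same number as 0b11111111.

```python
value = 0b10110101100
mask = 0b11111111

print(mask == 0xFF)                 # True
print(format(value & mask, '08b'))  # 10101100
```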
So the 16-bit code point of each character is obtained with ord(enc[i]). Shifting it right, ord(enc[i]) >> 8, gives the upper byte (the first character's code point), and shifting that back left, (ord(enc[i]) >> 8) << 8, isolates the upper byte in place with the lower byte zeroed. Subtracting this isolated upper byte from the full 16-bit value leaves the lower byte:
ord(enc[i]) - ((ord(enc[i]) >> 8) << 8)
Alternatively, you can get the lower byte directly with a bitwise AND against 11111111, as mentioned above. You could also avoid bit operations entirely by taking the binary string representation of the 16-bit code point and slicing it:
bin(two_byte_char_code)[2:].zfill(16)
The first 8 digits are the upper byte and the last 8 digits are the lower byte.
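A sketch of the string-slicing approach, with no bitwise operators at all, assuming each code point fits in 16 bits. bin() returns a string like '0b110...', so [2:] drops the prefix and zfill(16) pads it to 16 digits; int(bits, 2) turns each 8-digit half back into a number. The name decode_with_strings is hypothetical.

```python
def decode_with_strings(encoded):
    chars = []
    for ch in encoded:
        bits = bin(ord(ch))[2:].zfill(16)   # e.g. '0111000001101001' for 'pi'
        upper, lower = bits[:8], bits[8:]   # first 8 digits, last 8 digits
        chars.append(chr(int(upper, 2)))
        chars.append(chr(int(lower, 2)))
    return ''.join(chars)


print(decode_with_strings(chr(0x7069)))  # pi
```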
Flag
picoCTF{16_bits_inst34d_of_8_XXXXXXXX}
Takeaways
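ASCII fits within a single byte (8 bits give 256 possible values, and ASCII only uses the first 128), while a Unicode character can take 1 to 4 bytes in UTF-8.
Use 0xFF (hexadecimal) in place of the binary 11111111.
You could forgo bit operations by hacking it with strings and slicing/substrings.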
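To make the size difference concrete: in UTF-8, ASCII characters take one byte, while other code points take two to four. The sample characters here are arbitrary, apart from '灩', which is the first character of enc.

```python
samples = ['a', 'é', '灩', '😀']
for ch in samples:
    # Code point in hex and the number of bytes UTF-8 needs for it.
    print(hex(ord(ch)), len(ch.encode('utf-8')))
```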
The flag varies by login or time, or part of it may be generated from the first 8 characters of uuidgen output. My flag was not the same as someone else's who was working on the challenge. This is similar to the bits-and-bytes section of CodeSignal's bit manipulation problems.
What if you're not given the code that encoded the input? You could try to brute-force the text back to ASCII/English characters. I tried this with brute_force_transformation.py.
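The contents of brute_force_transformation.py aren't shown here, so this is a separate, hypothetical sketch of one way to brute-force it. It re-encodes the string with a few candidate byte layouts, reads the resulting bytes back one byte per character (Latin-1), and ranks the results by how much printable ASCII they contain. The codec list and the scoring are assumptions. Both UTF-16 byte orders produce printable text for this challenge (swapping two printable ASCII bytes keeps them printable), so you still have to pick the readable one by eye.

```python
import string

# Encoded text from the challenge prompt.
enc = "灩捯䍔䙻ㄶ形楴獟楮獴㌴摟潦弸彥㜰㍢㐸㙽"

PRINTABLE = set(string.printable)
CODECS = ("utf-8", "utf-16-be", "utf-16-le", "utf-32-be")


def ascii_score(text):
    # Fraction of characters that are printable ASCII (0.0 to 1.0).
    return sum(ch in PRINTABLE for ch in text) / max(len(text), 1)


def candidates(s):
    # Re-encode the code points with each codec, then read the raw bytes
    # back as Latin-1 so that every byte becomes exactly one character.
    for codec in CODECS:
        yield codec, s.encode(codec).decode("latin-1")


# Try every layout and show the most ASCII-looking results first.
ranked = sorted(candidates(enc), key=lambda c: ascii_score(c[1]), reverse=True)
for codec, text in ranked:
    print(f"{ascii_score(text):.2f}  {codec:10}  {text!a}")
```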