Transformation

PicoCTF Reverse Engineering Challenge

The challenge:

I wonder what this really is... enc

Where enc is 灩捯䍔䙻ㄶ形楴獟楮獴㌴摟潦弸彥㜰㍢㐸㙽 and the given code is:

''.join([chr((ord(flag[i]) << 8) + ord(flag[i + 1])) 
    for i in range(0, len(flag), 2)])

Since this is tagged with Reverse Engineering, I'd like to think that this is the code that generated the text. Things to initially notice:

  • The encoded text appears to be Unicode, not ASCII.

  • Since the code contains chr and ord, there is some character code usage.

  • There is some shifting of bits.

  • You can test whether the code generates the string, since all flags start with picoCTF: the first encoded character should correspond to the pair 'p' and 'i'.

  • You could use Google Translate, but it's not going to help. It is fun, though, to listen to a language-tailored voice speak gibberish to you for both input and output.

28777 = (ord(flag[0]) << 8) + ord(flag[1])
(ord('p') << 8) + ord('i')
= (112 << 8) + 105
= 28672 + 105 = 28777
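That check can also be run directly. The sketch below wraps the challenge's one-liner in a function (the name encode is mine, not part of the challenge) and compares the encoding of the known prefix with the start of enc. Note that the parentheses around the shift matter: in Python, + binds more tightly than <<.

```python
def encode(flag: str) -> str:
    # The challenge's expression, wrapped in a function for reuse.
    # Assumes len(flag) is even, otherwise flag[i + 1] runs off the end.
    return ''.join(
        chr((ord(flag[i]) << 8) + ord(flag[i + 1]))
        for i in range(0, len(flag), 2)
    )

enc = "灩捯䍔䙻ㄶ形楴獟楮獴㌴摟潦弸彥㜰㍢㐸㙽"
prefix = "picoCTF{"  # 8 characters, so it packs into exactly 4 code points

print(ord(enc[0]))                # code point of the first encoded character
print(encode(prefix) == enc[:4])  # does the known prefix match the start of enc?
```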

What the code is doing

  1. The for loop within the list comprehension generates a sequence of indexes starting at 0 and incrementing by 2 until the length of flag. In other words, just looping over the flag two characters at a time.

  2. For each index, the expression in the list comprehension uses ord to convert a character (like 'a') to its Unicode code point (an integer). It does this for both flag[i] and flag[i+1].

  3. The ord(flag[i]) << 8 section is bit shifting by 8 bits to the left.

     'a' Unicode code point:  01100001
     Shifted 'a' left by 8 bits:  0110000100000000
    

    Note that bit shifting to the left by 8 bits is equivalent to multiplying by 256, and bit shifting to the right by 8 bits is equivalent to dividing (specifically floor dividing, //) by 256.

  4. The two values (8 bits each) are then added together, and chr converts the 16-bit sum back to a single Unicode character. Assuming that the second character is b:

     'b' Unicode code point:  01100010
     0110000100000000
     +
             01100010
     ----------------
     0110000101100010
    
  5. Then the list of characters is joined back together into one string.
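The steps above can be traced for a single pair of characters. This is a small sketch using 'a' and 'b' as in the illustration; the variable names are only for this example.

```python
first, second = "a", "b"

high = ord(first) << 8   # 'a' moved into the upper byte
low = ord(second)        # 'b' stays in the lower byte
packed = high + low      # both bytes in one 16-bit code point

print(format(ord(first), "08b"))  # 01100001
print(format(high, "016b"))       # 0110000100000000
print(format(packed, "016b"))     # 0110000101100010
print(high == ord(first) * 256)   # True: << 8 is the same as * 256
print(repr(chr(packed)))          # the single character that replaces "ab"
```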

Reverse the process

We need to take each character from the enc string, split its 16-bit code point into two 8-bit halves, and chr each half back into a character of the original flag.

The transformation.py is broken down to show the individual steps. You can run it yourself (not sure how long this link will last).

You can zero out the lower byte (the right 8 bits) by shifting right then shifting left: (ord(enc[i])>>8)<<8.

0110000101100010 # a + b in binary form
>>
0000000001100001 # after right shift by 8 bits
<<
0110000100000000 # after left shift by 8 bits
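The same two shifts can be tried out on the packed 'a' + 'b' value. A minimal sketch, with variable names chosen for illustration:

```python
code_point = 0b0110000101100010        # 'a' and 'b' packed together

upper_shifted = code_point >> 8        # upper byte moved down: the code of 'a'
upper_in_place = upper_shifted << 8    # moved back up, lower byte now zeroed

print(format(upper_shifted, "016b"))   # 0000000001100001
print(format(upper_in_place, "016b"))  # 0110000100000000
print(chr(upper_shifted))              # a
```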

You can also isolate the lower byte directly with ord(enc[i]) & 0xFF, which keeps only the right 8 bits.

10110101100
&  11111111
------------
   10101100
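The mask can be checked on the example value shown above. A short sketch:

```python
value = 0b10110101100  # the 11-bit example value from the illustration
lower = value & 0xFF   # 0xFF == 0b11111111 keeps only the lowest 8 bits

print(format(lower, "08b"))  # 10101100
print(lower == value % 256)  # True: masking with 0xFF equals modulo 256
```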

  1. The two-byte code point is obtained with ord(enc[i]).

  2. We isolate the upper byte with right then left shifting: (ord(enc[i])>>8)<<8. The right shift on its own, ord(enc[i])>>8, is already the code of the first character of the pair.

  3. Subtracting the isolated upper byte from the two-byte code point gives the lower byte: ord(enc[i]) - ((ord(enc[i])>>8)<<8).

    1. Alternatively, you could just get the lower byte with the bitwise AND with 11111111, mentioned above.

    2. You could also avoid bit operations entirely by taking the string representation of the 16-bit code point, bin(two_byte_char_code)[2:].zfill(16), and slicing it: the first 8 characters are the upper byte and the last 8 are the lower byte.
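Put together, the three variants might look like the sketch below. This is not the transformation.py mentioned above; the function names are hypothetical and chosen only to mirror the list.

```python
def decode_subtract(enc: str) -> str:
    # Steps 1-3: shift to isolate the upper byte, subtract to get the lower.
    out = []
    for ch in enc:
        code = ord(ch)
        upper_in_place = (code >> 8) << 8
        out.append(chr(code >> 8))
        out.append(chr(code - upper_in_place))
    return ''.join(out)

def decode_mask(enc: str) -> str:
    # Alternative 1: bitwise AND with 0xFF for the lower byte.
    return ''.join(chr(ord(ch) >> 8) + chr(ord(ch) & 0xFF) for ch in enc)

def decode_slice(enc: str) -> str:
    # Alternative 2: no bit operations, slice the 16-character bit string.
    out = []
    for ch in enc:
        bits = bin(ord(ch))[2:].zfill(16)
        out.append(chr(int(bits[:8], 2)))
        out.append(chr(int(bits[8:], 2)))
    return ''.join(out)

enc = "灩捯䍔䙻ㄶ形楴獟楮獴㌴摟潦弸彥㜰㍢㐸㙽"
flag = decode_mask(enc)
print(flag.startswith("picoCTF{"))                        # sanity check
print(decode_subtract(enc) == flag == decode_slice(enc))  # all three agree
```

All three produce the same string; the mask version is the shortest, while the slicing version is the easiest to follow if bit operations are unfamiliar.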

Flag

picoCTF{16_bits_inst34d_of_8_XXXXXXXX}

Takeaways

  • ASCII fits in a single byte (7 bits, 128 values), while a Unicode character encoded as UTF-8 takes 1 to 4 bytes.

  • 0xFF is the hexadecimal shorthand for binary 11111111, and it works as a mask for the lowest 8 bits.

  • You could forgo bit operations by hacking it with strings and slicing/substring.

  • The flag varies by login or time, or could be partially created with the first 8 characters of uuidgen output. My flag was not the same as that of someone else who was working on the challenge.

  • This is similar to bits and bytes section of CodeSignal problems for bit manipulation.

  • What if you're not given the code that encoded the input? You could try to brute force the text back to ASCII/English characters. I tried this with brute_force_transformation.py.
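As an illustration of that last idea, here is one possible guess-and-check sketch. It is not the brute_force_transformation.py mentioned above, and the approach is an assumption on my part: re-encode the text with a few fixed byte layouts from Python's standard codecs and keep the ones whose bytes are all printable ASCII. The function name printable_layouts is hypothetical.

```python
import string

# Printable ASCII without the whitespace control characters.
PRINTABLE = set(string.printable) - set("\t\n\r\x0b\x0c")

def printable_layouts(enc):
    # Try several byte layouts; keep those that yield only printable ASCII.
    found = {}
    for codec in ("utf-8", "utf-16-le", "utf-16-be", "utf-32-be"):
        raw = enc.encode(codec)
        text = raw.decode("latin-1")  # one character per byte, never fails
        if all(ch in PRINTABLE for ch in text):
            found[codec] = text
    return found

enc = "灩捯䍔䙻ㄶ形楴獟楮獴㌴摟潦弸彥㜰㍢㐸㙽"
layouts = printable_layouts(enc)
hits = [codec for codec, text in layouts.items() if text.startswith("picoCTF{")]

print(sorted(layouts))  # layouts that look like plain ASCII
print(hits)             # layouts that also start with the known flag prefix
```

Both 16-bit layouts give printable text, but only the big-endian one (upper byte first) starts with the flag prefix, which matches the upper/lower byte order worked out above.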
