What I Did When I Found Emojis in Strings Always Lost the Last Byte

Took me a while to not only figure it out but actually fix that Emojis are encoded in a special way using UTF-8.

If things are well on your side, you’ll see a “thinking” emoji here: 🤔

I wanted to be able to enter these in text files, read them in, display them, and save the contents in a way that they are visible in the text file again. Sounds trivial, but every double-encoding trick I found on the web resulted in escaping the unicode char codes (\ud83e etc.)

Reading stuff in worked with basic NSUTF8StringEncoding, but writing not so much. Turns out the last byte of a line was stripped for some reasen. Turns out all of this was my fault. Thanks to playgrounds for giving me feedback about matching encoded data.

Emoji seem to take 7 bytes when you look at the UTF8-encoded NSData representation. In my tests, it seemed that when an emoji was the last character in a line things went broke. I had to enter characters after it to work. These characters wouldn’t be stored, but at least the emoji would be visible.

My fault was that I was padding strings to have a certain length like so:

extension String {
    func padded(length length: Int, filler: String = " ") -> String {
        return self.stringByPaddingToLength(length, withString: filler, startingAtIndex: 0)
    }
}

I was not using NSString but native Swift Strings, so I expected things to go well. But they didn’t. I don’t know for sure why this method from foundation wreaks havoc with strings containing emoji, but it sure did. At first, the fact that 1 emoji character consists of 2 UTF8-encoded unicode values at once resultet in truncating the line. Even when I increased the padding to 2x the visible character count, the last visible character was lost in the end.

I found a cheat: calculate the byte size of the given string and fill up manually:

extension String {
    func padded(length length: Int, filler: String = " ") -> String {
        let byteLength = self.lengthOfBytesUsingEncoding(NSUTF32StringEncoding) / 4
        let paddingLength = length - byteLength
        let padding = String(count: paddingLength, repeatedValue: Character(" "))
        return "\(self)\(padding)"
    }
}

Works like a charm and doesn’t use data. TableFlip beta testers will probably find more edge cases, but at least the app doesn’t crash or ruin your files.