My friend sent a picture to our group chat complaining about the subtitles on his video.
He assumed this was due to the OCR program used to generate the subtitles. Naturally, he was confused about how it managed to interpret 's as two random Chinese glyphs. I was interested and asked him to extract the subtitles so that we could search for more occurrences.
It turns out there were plenty of places where 's was written properly in the subtitles, but 20 places where there were two random Chinese glyphs instead. They were not all the same two glyphs, but the resulting glyphs were always consistent with what the original text was.
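A search like this can be sketched in a few lines of Python. This is only a rough sketch: the function name is made up for illustration, the sample text is invented rather than taken from the subtitle file, and it assumes the stray glyphs fall in the basic CJK Unified Ideographs block (U+4E00 to U+9FFF).

```python
import re

# Assumption: the stray glyphs are in the CJK Unified Ideographs block.
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")

def find_cjk_runs(text: str) -> list[tuple[int, str]]:
    """Return (line number, run) for every run of CJK ideographs in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in CJK_RUN.finditer(line):
            hits.append((lineno, match.group()))
    return hits

# Invented sample, not the real subtitles.
sample = "It's fine here\nbut not here: he\u4e2d\u6587 gone"
for lineno, run in find_cjk_runs(sample):
    print(lineno, run, [f"U+{ord(ch):04X}" for ch in run])
```

Printing the code points next to each hit makes it easier to compare the glyphs against the text they replaced.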
I have no idea what could have caused this replacement. There are plenty of words in the subtitle file that have apostrophes without a problem. I thought it might be an encoding problem, since the replacement for l was one codepoint before the replacement for m.
The offset between one character and its replacement is 0x6a26, and the same holds for l and its replacement. For another character the offset is 0x6a27, which just confused me even more.
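For reference, the offsets here are plain differences between code points. A minimal Python sketch of that arithmetic follows; the function name is made up for illustration, and the glyph is reconstructed from the 0x6a26 offset quoted for l rather than copied from the subtitle file.

```python
def codepoint_offset(original: str, replacement: str) -> int:
    """Return the distance from the original's code point to the replacement's."""
    return ord(replacement) - ord(original)

# Reconstructed from the stated offset, not read from the actual file.
glyph_for_l = chr(ord("l") + 0x6A26)

print(f"U+{ord(glyph_for_l):04X}")                 # U+6A92
print(hex(codepoint_offset("l", glyph_for_l)))     # 0x6a26
```

A constant offset would mean consecutive letters map to consecutive glyphs, which matches the l and m observation. An offset that changes by one for some other letter does not fit that simple picture.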