Originally Posted by Resike
The weird part is that this function strips properly too, but it breaks string.len:
Lua Code:
print(string.gsub("PLAYERNAME-Aggra(Português)", "%-[%a+é'()]+", ""), string.len(string.gsub("PLAYERNAME-Aggra(Português)", "%-[%a+é'()]+", "")))
Returns: "PLAYERNAME"
And I only used the "é" in the pattern to properly handle some French server names like "Chants éternels".
This isn't actually stripping properly, it's just broken in such a way that it looks like it's working.
é and ê are both two-byte characters that share their first byte. When you use é in the pattern, you're actually using \195\169. Since Lua's string functions operate on bytes rather than UTF-8 characters, the first byte (195) matches the first byte of ê (\195\170) and leaves behind the second byte (170), which is invalid by itself. When WoW's print function encounters this invalid byte, it simply ignores anything after it.
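To make the byte-level behaviour concrete, here is a small sketch you can run in any Lua 5.1 interpreter. The variable names are just for illustration, and the characters are written as decimal escapes so the result doesn't depend on how the file is saved:

```lua
-- Decimal escapes keep the example independent of the file's encoding.
local e_acute = "\195\169" -- é in UTF-8
local e_circ  = "\195\170" -- ê in UTF-8

print(#e_acute, #e_circ)    -- 2 2 (lengths are counted in bytes)
print(e_acute:byte(1, -1))  -- 195 169
print(e_circ:byte(1, -1))   -- 195 170

-- A character class is a set of bytes, so [é] is really [\195\169].
-- Against ê it consumes the shared first byte and stops at 170.
local matched = e_circ:match("[" .. e_acute .. "]+")
print(#matched, matched:byte()) -- 1 195
```

So the class never "sees" ê as one character; it only sees that 195 is in the set and 170 is not.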
This snippet escapes any byte greater than 127 (the last value in the ASCII-compatible range of UTF-8), so you can see what is actually left in the string:
Lua Code:
local stripped = string.gsub("PLAYERNAME-Aggra(Português)", "%-[%a+é'()]+", "")
local escaped = stripped:gsub(".", function(c)
    local b = c:byte()
    if b > 127 then return "\\" .. b end -- show non-ASCII bytes as \NNN
end)
print(escaped, #stripped) -- Output: PLAYERNAME\170s) 13
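If the goal is only to drop the realm suffix, one way to sidestep the problem is to avoid listing realm characters at all. The following is a minimal sketch, not something from the original post: stripRealm is a hypothetical helper, and it assumes the character name itself never contains a hyphen, so everything from the first "-" onward is the realm. The hyphen is a single ASCII byte that never occurs inside a multi-byte UTF-8 sequence, so this cannot cut a character in half.

```lua
-- Hypothetical helper: remove the first "-" and everything after it.
local function stripRealm(fullName)
    -- "%-" is a literal hyphen; ".*" then matches all remaining bytes,
    -- whatever they are. The parentheses drop gsub's second return value.
    return (fullName:gsub("%-.*$", ""))
end

-- ê written as \195\170 so the file encoding doesn't matter.
local name = stripRealm("PLAYERNAME-Aggra(Portugu\195\170s)")
print(name, #name)              -- PLAYERNAME 10
print(stripRealm("PLAYERNAME")) -- PLAYERNAME (no realm, unchanged)
```

Because the pattern matches arbitrary bytes after the hyphen, it should also cope with realm names that contain characters nobody thought to add to the class.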