However, if you're ever going to dynamically execute code from the db (such as with the exec() statement), then you have to be careful.
I think that the assumption that every website must accommodate every possible name is fallacious.
If stopping XSS were as simple as finding a magic regex, a lot of us would be out of jobs.
Then you need to decide what you're trying to prevent.
Imagine trying to authenticate a user named "Foo'or True Or'foo": no "dangerous" characters, but there goes your login scheme.
If all you're doing is reading and writing to the db, then properly parameterizing queries should take care of the problem.
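A minimal sketch of what parameterizing looks like, assuming Python's built-in sqlite3 module and a hypothetical users table (the table, the column names and find_user are made up for illustration). The name is passed as a bound value, so the quotes in it never become part of the SQL text:

```python
import sqlite3

def find_user(conn, name):
    # The ? placeholder binds the name as data; it is never spliced into the SQL.
    cur = conn.execute("SELECT id FROM users WHERE name = ?", (name,))
    return cur.fetchall()

# Hypothetical in-memory schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))

hostile = "Foo'or True Or'foo"
rows_before = find_user(conn, hostile)   # no such user yet, and no injection

# The same string can be stored and looked up like any other name.
conn.execute("INSERT INTO users (name) VALUES (?)", (hostile,))
rows_after = find_user(conn, hostile)
```

With string concatenation the same input would have changed the meaning of the WHERE clause. Here it is just an odd-looking name.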
Failure to take this into account when attempting to perform access checks based on filename can have severe consequences.
On the surface, resolving the canonical name of a file or path may appear to be a reasonably simple task to undertake.
Regarding numbers, there's only one case with an 8. characters that you can be sure won't end up in a name. You could easily run a large name list through your expression when you're done and see what falls out (if anything).
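Running a name list through the expression could look something like this. It's only a sketch: NAME_PATTERN is a made-up candidate regex (Unicode letters separated by spaces, apostrophes, hyphens or periods), not one anybody in this thread proposed, and the sample list stands in for a real, much larger one.

```python
import re

# Hypothetical candidate: runs of Unicode letters ([^\W\d_] is "word char
# that is not a digit or underscore"), joined by space, ', . or -,
# with an optional trailing period.
NAME_PATTERN = re.compile(r"[^\W\d_]+(?:[ '.\-][^\W\d_]+)*\.?")

def rejected_names(names, pattern=NAME_PATTERN):
    # Return every name the pattern fails to match in full.
    return [n for n in names if not pattern.fullmatch(n)]

# Stand-in sample; a real test would load thousands of names from a file.
sample = [
    "Mary O'Brien",
    "Jean-Luc Picard",
    "Martin Luther King Jr.",
    "William Gates 3rd",
]
fallen_out = rejected_names(sample)
print(fallen_out)
```

Whatever ends up in fallen_out is a name your users would have been told is invalid.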
Since people really can be named anything, nothing is safe to some extent.
I think you're answering your own question, Skliwz: you're not going to find a regex that covers all Unicode characters and prevents cross-site scripting.
If it's XSS, you only need to stop malicious characters, and even then you need to be doing more.
If it's stopping people from entering names you don't like, then you're SOL: you'll never get a regex that handles every name in every culture.
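On the XSS side, one common reading of "doing more" is to escape the name when you write it into a page instead of trying to filter it on the way in. A minimal sketch, assuming the name lands in HTML text content and using Python's standard html.escape (render_greeting is a hypothetical helper):

```python
import html

def render_greeting(name):
    # Escape at output time; quote=True also escapes " and '.
    return "<p>Hello, {}!</p>".format(html.escape(name, quote=True))

attack = render_greeting("<script>alert(1)</script>")
plain = render_greeting("O'Brien")
```

The apostrophe in O'Brien survives as an entity, so nobody has to be turned away at the form. Other contexts (attributes, JavaScript, URLs) need their own escaping rules, which this sketch does not cover.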
Fortunately, most modern Unix systems provide a function for this as part of the standard C runtime. Be aware that it is not thread-safe, because it changes the current directory as it resolves the path.
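To illustrate the kind of access check that depends on canonical names, here is a rough sketch in Python. It assumes os.path.realpath as the resolver, which may not be the function the passage above has in mind, and is_inside is a hypothetical helper. Both paths are resolved before they are compared, so ".." segments and symlinks can't sneak a file outside the allowed directory:

```python
import os
import tempfile

def is_inside(base_dir, user_path):
    # Resolve symlinks and ".." in both paths before comparing them.
    base = os.path.realpath(base_dir)
    target = os.path.realpath(os.path.join(base, user_path))
    # The target is inside base only if base is their common ancestor.
    return os.path.commonpath([base, target]) == base

base = tempfile.mkdtemp()                       # stand-in for an upload root
ok = is_inside(base, "reports/q1.txt")          # stays under base
escaped = is_inside(base, "../etc/passwd")      # climbs out of base
absolute = is_inside(base, "/etc/passwd")       # absolute path replaces base
```

Comparing the raw strings instead would accept the second and third paths on a naive prefix check. The file can still change between the check and the open, so this narrows the problem without closing it.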