Logic Behind Using utf8_bin for DB Text Collation?
Posted: Thu Jan 13, 2011 6:24 pm
I'm a little confused as to why utf8_bin was selected for DB columns such as first name, last name, product name, category name, etc. It makes those columns use a binary collation, so they sort in ways that are usually undesirable. Probably the worst case is that all lowercase text is pushed below all uppercase text (because it's doing a binary comparison) rather than the two being mixed together. That isn't the only problem, either: accented characters aren't sorted appropriately.
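To illustrate the ordering problem outside the database, here is a rough Python sketch. It is only an approximation: binary_key compares raw UTF-8 bytes, which is roughly what a _bin collation does, while folded_key is a hypothetical stand-in for a case- and accent-insensitive collation, not MySQL's actual algorithm. The sample names are made up.

```python
import unicodedata

# Made-up sample data; "\u00c9mile" is "Emile" with an acute accent on the E.
names = ["alice", "Bob", "\u00c9mile", "Zoe", "eve", "Dave"]


def binary_key(s):
    # Compare the raw UTF-8 bytes, roughly what a _bin collation does.
    return s.encode("utf-8")


def folded_key(s):
    # Rough stand-in for a case- and accent-insensitive collation:
    # decompose characters, drop combining marks, then casefold.
    decomposed = unicodedata.normalize("NFKD", s)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


binary_sorted = sorted(names, key=binary_key)
folded_sorted = sorted(names, key=folded_key)

print(binary_sorted)
print(folded_sorted)
```

With the byte-wise key, every capitalized name comes before every lowercase one and the accented name lands at the very end; with the folded key, the names come out in the order most people would expect from an alphabetical list.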
If you need a fair trade-off between speed and usability, the typical choice (in my experience) is utf8_general_ci rather than utf8_unicode_ci: it's a good middle ground that handles the most common of the situations above correctly while still being faster than utf8_unicode_ci.