utf8rewind
utf8rewind is a C library designed to extend the default string handling functions with support for UTF-8 encoded text. Besides providing functions to deal with UTF-8 encoded text, it also provides functions for converting to and from UTF-16 encoded text, the default encoding for wide strings on Windows.
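At its core, conversion to UTF-16 maps each codepoint to one or two 16-bit units. The sketch below uses a hypothetical helper, utf16_encode, which is not part of the utf8rewind interface. It only shows the standard surrogate-pair arithmetic that any UTF-8 to UTF-16 converter has to perform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a utf8rewind function.
   Encodes one codepoint as UTF-16. Codepoints in the Basic Multilingual
   Plane take one 16-bit unit; U+10000..U+10FFFF take a surrogate pair.
   Returns the number of units written, or 0 for invalid input. */
static size_t utf16_encode(uint32_t cp, uint16_t out[2]) {
    assert(out != NULL);
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;   /* surrogates are not characters */
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;                     /* BMP: stored as-is */
        return 1;
    }
    if (cp > 0x10FFFF) return 0;                   /* beyond the Unicode range */
    cp -= 0x10000;                                 /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));      /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));    /* low surrogate: bottom 10 bits */
    return 2;
}
```

For example, U+1F600 becomes the surrogate pair D83D DE00, while U+20AC fits in the single unit 20AC.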
For a full summary of the interface, please refer to the library interface.
For detailed examples showing how to use the library, please refer to the examples page.
UTF-8 encoded Unicode accounts for over 60 percent of the web. And with good reason! Because UTF-8 is completely backwards-compatible with ASCII, developers only need to change code dealing with individual codepoints. UTF-8 can encode the full range of Unicode codepoints (U+0000 to U+10FFFF) in a maximum of four bytes per codepoint; the original design allowed sequences of up to six bytes, but RFC 3629 restricted UTF-8 to four. And because most text tends to be the Latin alphabet mixed with special characters, and ASCII characters take only one byte each, the common case is a string not much longer than its pure ASCII equivalent.
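The byte counts follow directly from the UTF-8 bit patterns. The hypothetical helper utf8_encode below (not a utf8rewind function) is a minimal sketch of encoding a single codepoint under RFC 3629: values below U+0080 are stored as the identical ASCII byte, and every other codepoint takes two to four bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a utf8rewind function.
   Encodes one Unicode codepoint as UTF-8 (RFC 3629). Writes 1 to 4 bytes
   to out and returns the count, or 0 for surrogates and values above U+10FFFF. */
static size_t utf8_encode(uint32_t cp, unsigned char out[4]) {
    assert(out != NULL);
    if (cp < 0x80) {                 /* 0xxxxxxx: identical to ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;   /* surrogates are not characters */
    if (cp < 0x10000) {              /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {            /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                        /* beyond the Unicode range */
}
```

For example, U+20AC (the euro sign) encodes to the three bytes E2 82 AC, while 'A' stays the single byte 41.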
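UTF-16 encoding solves the same problems as UTF-8, but in a different way. UTF-16 is not backwards-compatible with ASCII: every character, including plain ASCII, is stored as one or two 16-bit units, so an ASCII character occupies two bytes, one of which is zero. Code that treats such a string as ASCII misreads it, for example by stopping at the first zero byte. As a result, all code dealing with strings must be changed in order to handle these new strings. This can be seen in the C standard library, which provides a separate set of wide-character functions operating on wchar_t (UTF-16 on Windows):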
Description | ASCII (char) | UTF-16 (wchar_t on Windows) |
---|---|---|
Get the length of a string | strlen | wcslen |
Copy a string to another | strcpy | wcscpy |
Append to a string | strcat | wcscat |
Convert to lowercase | tolower | towlower |
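To make the ASCII incompatibility concrete, the sketch below stores "Hi" as little-endian UTF-16, the byte order used on Windows. Each ASCII character is followed by a zero byte, so strlen stops after the first character. The buffer name hi_utf16le and the helper utf16le_length are hypothetical; the helper assumes the buffer is terminated by a zero 16-bit unit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>   /* strlen, for comparison */

/* "Hi" encoded as UTF-16LE, followed by a two-byte terminator. */
static const char hi_utf16le[] = { 'H', 0x00, 'i', 0x00, 0x00, 0x00 };

/* Hypothetical helper: counts 16-bit code units up to the zero unit
   that terminates a little-endian UTF-16 byte buffer. */
static size_t utf16le_length(const char *bytes) {
    size_t units = 0;
    assert(bytes != NULL);
    while (bytes[2 * units] != 0 || bytes[2 * units + 1] != 0)
        units++;
    return units;
}
```

Here strlen(hi_utf16le) returns 1, because the zero high byte of 'H' looks like a terminator, while utf16le_length(hi_utf16le) returns 2.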
Converting a project to use UTF-16 after the fact is a serious endeavour that touches all code dealing with strings. Changing existing code to use UTF-8, on the other hand, only affects code that processes individual codepoints.
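The two migration paths can be contrasted in a short sketch. The helpers join_wide, join_utf8 and utf8_codepoints are hypothetical and not part of utf8rewind: the wide-character version has to swap every call for its wcs* counterpart, while the UTF-8 version keeps strcpy, strcat and strlen unchanged and only adds a codepoint counter. The counter assumes its input is already valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Wide-character path: every string call becomes its wcs* counterpart.
   Returns 1 on success, 0 if dst (cap wchar_t units) is too small. */
static int join_wide(wchar_t *dst, size_t cap, const wchar_t *a, const wchar_t *b) {
    assert(dst != NULL && a != NULL && b != NULL);
    if (wcslen(a) + wcslen(b) + 1 > cap) return 0;
    wcscpy(dst, a);
    wcscat(dst, b);
    return 1;
}

/* UTF-8 path: the existing byte-oriented calls work unchanged.
   Returns 1 on success, 0 if dst (cap bytes) is too small. */
static int join_utf8(char *dst, size_t cap, const char *a, const char *b) {
    assert(dst != NULL && a != NULL && b != NULL);
    if (strlen(a) + strlen(b) + 1 > cap) return 0;
    strcpy(dst, a);
    strcat(dst, b);
    return 1;
}

/* Only codepoint-level processing needs new code: count every byte that
   is not a continuation byte (10xxxxxx). Assumes valid UTF-8 input. */
static size_t utf8_codepoints(const char *s) {
    size_t count = 0;
    assert(s != NULL);
    for (; *s != '\0'; ++s)
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    return count;
}
```

Joining "caf" and "é" gives a wide string of 4 units on the first path. On the UTF-8 path strlen reports 5 bytes, and utf8_codepoints is the only new function needed to recover the count of 4 codepoints.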
This project is licensed under the MIT license, a full copy of which should have been provided with the project.
To build the library, use GYP to generate a solution, like so:
tools\gyp\gyp --depth=. --format=msvs2010 utf8rewind.gyp
Alternatively, copy 'include/utf8rewind/utf8rewind.h' and 'source/utf8rewind.c' directly into your existing solution. Make sure the source file is compiled as C code (/TC in Visual Studio). Include the header from your source and start using it.
After generating a solution, build and run the "tests-rewind" project. Verify that all tests pass on your system before continuing.
As a user, you can help the project in a number of ways, in order of difficulty:
Use it - Designers of a public interface often have very different ideas about usability than those actually using it. By using the library, you help the project spread, and your use cases can be taken into consideration when we design the API.
Spread the word - If you find utf8rewind useful, recommend it to your friends and coworkers.
Complain - No library is perfect and utf8rewind is no exception. If you find a fault but lack the means (time, resources, etc.) to fix it, sending complaints to the proper channels can help the project out a lot.
Write a failing test - If a feature is not working as intended, you can prove it by writing a failing test. If you send the test to us, we can make the adjustments necessary for it to pass.
Write a patch - A patch is a code change that makes failing tests pass. A patch must always include a set of tests that fail without the patch. All patches will be reviewed and possibly cleaned up before being accepted.
For inquiries, complaints and patches, please contact {quinten}{lansu} {at} {gmail}.{com}. Remove the brackets to get a valid e-mail address.