utf8rewind
 All Files Functions Typedefs Macros Pages
utf8rewind

Introduction

utf8rewind is a C library designed to extend default string handling functions in order to add support for UTF-8 encoded text. Besides providing functions to deal with UTF-8 encoded text, it also provides functions for converting to and from UTF-16 encoded text, the default on Windows.

For a full summary of the interface, please refer to the library interface.

For detailed examples showing how to use the library, please refer to the examples page.

Why UTF-8?

UTF-8 encoded Unicode accounts for over 60 percent of the web. And with good reason! Because UTF-8 is completely backwards-compatible with ASCII, developers only need to change code dealing with codepoints. UTF-8 can encode the full range of Unicode codepoints in a maximum of six bytes per codepoint. However, because most text tends to be the Latin alphabet mixed with special characters, the common case is strings not much longer than pure ASCII.

Why not UTF-16?

UTF-16 encoding solves the same problems as UTF-8, but in a different way. UTF-16 is not backwards-compatible with ASCII, resulting in invalid codepoints being encountered when the string is treated as ASCII. As a result, all code dealing with strings must be changed in order to handle these new strings. This can be seen in the changes made in the C strings API:

Description ASCII UTF-16
Get the length of a string strlen wcslen
Copy a string to another strcpy wcscpy
Append to a string strcat wcscat
Convert to lowercase tolower towlower

Converting a project to use UTF-16 after the fact is a serious endeavour that touches all code dealing with strings. On the other hand, changing existing code to use UTF-8 only deals with codepoint processing.

Licensing

This project is licensed under the MIT license, a full copy of which should have been provided with the project.

Building the project

Use GYP to generate a solution, like so:

tools\gyp\gyp --depth --format=msvs2010 utf8rewind.gyp
Note
The project has only been tested as compiling and running using Visual Studio 2010 on Windows, but GYP should generate workable output for other platforms as well.

Using the source directly

Copy 'include/utf8rewind/utf8rewind.h' and 'source/utf8rewind.c' directly into your existing solution. Make sure you specify that the source file should be compiled as C code (/TC in Visual Studio). Include the header from your source and start using it.

Running the tests

After generating a solution, build and run the "tests-rewind" project. Verify that all tests pass on your system before continuing.

Helping out

As a user, you can help the project in a number of ways, in order of difficulty:

Use it - Designers of a public interface often have very different ideas about usability than those actually using it. By using the library, you are helping the project spread and could potentially improve it by us taking your project into consideration when we design the API.

Spread the word - If you find utf8rewind useful, recommend it to your friends and coworkers.

Complain - No library is perfect and utf8rewind is no exception. If you find a fault but lack the means (time, resources, etc.) to fix it, sending complaints to the proper channels can help the project out a lot.

Write a failing test - If a feature is not working as intended, you can prove it by writing a failing test. By sending the test to us, we can make the adjustments necessary for it to pass.

Write a patch - Patches include a code change that help tests to pass. A patch must always include a set of tests that fail to pass without the patch. All patches will be reviewed and possibly cleaned up before being accepted.

Contact

For inquiries, complaints and patches, please contact {quinten}{lansu} {at} {gmail}.{com}. Remove the brackets to get a valid e-mail address.