No more undefined index using PHP’s $_GET and $_POST

It’s very annoying having to check whether $_GET['somevalue'] is set before checking its value – but if you don’t, your code is going to throw a a lot of Notice: Undefined index warnings and that’s never good.

I’ve written a quick class to make every PHP developer’s life a bit easier. It’s based on CodeIgniter‘s $this->input class. You can use it as follows and never have to worry about whether the index exists or not… It will just get set to NULL if it doesn’t.

class Input {
     function get($name) {
          return isset($_GET[$name]) ? $_GET[$name] : null;

     function post($name) {
          return isset($_POST[$name]) ? $_POST[$name] : null;

     function get_post($name) {
          return $this->get($name) ? $this->get($name) : $this->post($name);
$input = new Input;

$page = $input->get('page'); // // look in $_GET
$page = $input->post('page'); // // look in $_POST
$page = $input->get_post('page'); // look in both $_GET and $_POST

MD5 C++ Class


The MD5 algorithm produces a 128 bit hash of a byte array (often text).

Don’t Use MD5

MD5 is now considered a “broken” hash and should now no longer be used in high security situations.

OK, If You Must Use MD5…

If you still wish to use MD5, for example to hash user passwords, always add a salt before hashing, to prevent a dictionary attack.

The MD5 class was derived from various C++ examples. The class is thread safe (an instance must be created for each thread) and uses no memory allocation.

In the MD5.h file, note the definition of an unsigned 32 bit integer; you may need to modify this.

typedef unsigned int MD5_UINT32;

There are 6 public functions, all named Compute, in two groups:-

  • to return the MD5 hash as an ASCII (8 bit) string
  • to return the MD5 hash as a wide (UTF-16) string

For each type above there is a Compute function that accepts a “wide” (UTF-16) string; there is an optional parameter to specify if the string should be converted to UTF-8 before hashing. If the string is known to be 8-bit (Unicode 0xff or less), set this to the default “false”.

Use the CMD5 class (files MD5.cpp and MD5.h). The CMD5 class also uses the CUnicode class (file Unicode.h) which has a single static public function.

To use, create an instance of CMD5. Do not share that instance with other threads. Creation of the instance on the stack is recommended. Each public method returns an “unsafe” pointer to either an ASCII or UTF-16 string which is contained in the class instance and is only safe to be used whilst the class remains in scope, or before it is deleted or re-used.

The classes may be downloaded here (6.1Kb zip).

The code below demonstrates the use of all 6 public functions.

#include "md5.h"

//create an instance of MD5 on the stack
//which we will re-use for each example
CMD5 mx;

unsigned char test_array[5] = { 1, 2, 3, 5, 7 };

const wchar_t* res1 = mx.Compute("test string");
// res1 is safe to use until mx is re-used
// normally this would be copied to a safe location
// or used immediately

// version using 'wide' source string. Specify whether to 
// convert the string to utf-8
// (needed if any characters above u+ff are present)
const wchar_t* res2 = mx.Compute(L"test string", false);
// version using byte array
const wchar_t* res3 = mx.Compute(test_array,5);

//methods returning 8 bit string MD5
const char* res4 = mx.Compute_8("test string");
const char* res5 = mx.Compute_8(L"test string", false);
const char* res6 = mx.Compute_8(test_array, 5);

MD5 for sencha touch

There’s no native MD5 functionality in JavaScript, but Joseph Meyers has written his own function, which accoring to this stackoverflow thread is the fastest one in town!

It didn’t take much work, but I’ve converted it into a utility class, ready to drop into your Sencha Touch projects.

Ext.define('MyApp.util.Crypto', {
singleton : true,
alternateClassName : ['Crypto'],

constructor: function(config) {

md5cycle: function(x, k) {
var me = this;
var a = x[0], b = x[1], c = x[2], d = x[3];

a = me.ff(a, b, c, d, k[0], 7, -680876936);
d = me.ff(d, a, b, c, k[1], 12, -389564586);
c = me.ff(c, d, a, b, k[2], 17,  606105819);
b = me.ff(b, c, d, a, k[3], 22, -1044525330);
a = me.ff(a, b, c, d, k[4], 7, -176418897);
d = me.ff(d, a, b, c, k[5], 12,  1200080426);
c = me.ff(c, d, a, b, k[6], 17, -1473231341);
b = me.ff(b, c, d, a, k[7], 22, -45705983);
a = me.ff(a, b, c, d, k[8], 7,  1770035416);
d = me.ff(d, a, b, c, k[9], 12, -1958414417);
c = me.ff(c, d, a, b, k[10], 17, -42063);
b = me.ff(b, c, d, a, k[11], 22, -1990404162);
a = me.ff(a, b, c, d, k[12], 7,  1804603682);
d = me.ff(d, a, b, c, k[13], 12, -40341101);
c = me.ff(c, d, a, b, k[14], 17, -1502002290);
b = me.ff(b, c, d, a, k[15], 22,  1236535329);

a =, b, c, d, k[1], 5, -165796510);
d =, a, b, c, k[6], 9, -1069501632);
c =, d, a, b, k[11], 14,  643717713);
b =, c, d, a, k[0], 20, -373897302);
a =, b, c, d, k[5], 5, -701558691);
d =, a, b, c, k[10], 9,  38016083);
c =, d, a, b, k[15], 14, -660478335);
b =, c, d, a, k[4], 20, -405537848);
a =, b, c, d, k[9], 5,  568446438);
d =, a, b, c, k[14], 9, -1019803690);
c =, d, a, b, k[3], 14, -187363961);
b =, c, d, a, k[8], 20,  1163531501);
a =, b, c, d, k[13], 5, -1444681467);
d =, a, b, c, k[2], 9, -51403784);
c =, d, a, b, k[7], 14,  1735328473);
b =, c, d, a, k[12], 20, -1926607734);

a = me.hh(a, b, c, d, k[5], 4, -378558);
d = me.hh(d, a, b, c, k[8], 11, -2022574463);
c = me.hh(c, d, a, b, k[11], 16,  1839030562);
b = me.hh(b, c, d, a, k[14], 23, -35309556);
a = me.hh(a, b, c, d, k[1], 4, -1530992060);
d = me.hh(d, a, b, c, k[4], 11,  1272893353);
c = me.hh(c, d, a, b, k[7], 16, -155497632);
b = me.hh(b, c, d, a, k[10], 23, -1094730640);
a = me.hh(a, b, c, d, k[13], 4,  681279174);
d = me.hh(d, a, b, c, k[0], 11, -358537222);
c = me.hh(c, d, a, b, k[3], 16, -722521979);
b = me.hh(b, c, d, a, k[6], 23,  76029189);
a = me.hh(a, b, c, d, k[9], 4, -640364487);
d = me.hh(d, a, b, c, k[12], 11, -421815835);
c = me.hh(c, d, a, b, k[15], 16,  530742520);
b = me.hh(b, c, d, a, k[2], 23, -995338651);

a = me.ii(a, b, c, d, k[0], 6, -198630844);
d = me.ii(d, a, b, c, k[7], 10,  1126891415);
c = me.ii(c, d, a, b, k[14], 15, -1416354905);
b = me.ii(b, c, d, a, k[5], 21, -57434055);
a = me.ii(a, b, c, d, k[12], 6,  1700485571);
d = me.ii(d, a, b, c, k[3], 10, -1894986606);
c = me.ii(c, d, a, b, k[10], 15, -1051523);
b = me.ii(b, c, d, a, k[1], 21, -2054922799);
a = me.ii(a, b, c, d, k[8], 6,  1873313359);
d = me.ii(d, a, b, c, k[15], 10, -30611744);
c = me.ii(c, d, a, b, k[6], 15, -1560198380);
b = me.ii(b, c, d, a, k[13], 21,  1309151649);
a = me.ii(a, b, c, d, k[4], 6, -145523070);
d = me.ii(d, a, b, c, k[11], 10, -1120210379);
c = me.ii(c, d, a, b, k[2], 15,  718787259);
b = me.ii(b, c, d, a, k[9], 21, -343485551);

x[0] = me.add32(a, x[0]);
x[1] = me.add32(b, x[1]);
x[2] = me.add32(c, x[2]);
x[3] = me.add32(d, x[3]);


cmn: function(q, a, b, x, s, t) {
var me = this;
a = me.add32(me.add32(a, q), me.add32(x, t));
return me.add32((a << s) | (a >>> (32 - s)), b);

ff: function (a, b, c, d, x, s, t) {
return this.cmn((b & c) | ((~b) & d), a, b, x, s, t);

gg: function(a, b, c, d, x, s, t) {
return this.cmn((b & d) | (c & (~d)), a, b, x, s, t);

hh: function(a, b, c, d, x, s, t) {
return this.cmn(b^c^d, a, b, x, s, t);

ii: function(a, b, c, d, x, s, t) {
return this.cmn(c^(b | (~d)), a, b, x, s, t);

md51: function(s) {
var me = this;
txt = '';
var n = s.length,
state = [1732584193, -271733879, -1732584194, 271733878],
for (i = 64; i <= s.length; i += 64) {
me.md5cycle(state, me.md5blk(s.substring(i - 64, i)));
s = s.substring(i - 64);
var tail = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0];
for (i = 0; i < s.length; i++)
tail[i >> 2] |= s.charCodeAt(i) << ((i % 4) << 3);
tail[i >> 2] |= 0x80 << ((i % 4) << 3);
if (i > 55) {
me.md5cycle(state, tail);
for (i = 0; i < 16; i++)
tail[i] = 0;
tail[14] = n * 8;
me.md5cycle(state, tail);
return state;

md5blk: function(s) {
var md5blks = [],
for (i = 0; i < 64; i += 4) {
md5blks[i >> 2] = s.charCodeAt(i)
+ (s.charCodeAt(i + 1) << 8 )
+ (s.charCodeAt(i + 2) << 16)
+ (s.charCodeAt(i + 3) << 24);
return md5blks;

hex_chr: '0123456789abcdef'.split(''),

rhex: function(n) {
var s = '',
j = 0;
for (; j < 4; j++)
s += this.hex_chr[(n >> (j * 8 + 4)) & 0x0F]
+ this.hex_chr[(n >> (j * 8)) & 0x0F];
return s;

hex: function(x) {
for (var i = 0; i < x.length; i++)
x[i] = this.rhex(x[i]);
return x.join('');

md5: function(s) {
return this.hex(this.md51(s));

add32: function(a, b) {
return (a + b) & 0xFFFFFFFF;


Include in app.js

requires: [ 'MyApp.util.Crypto']

Use like this:


Shannon-Fano Compression Explained and Demonstrated in Native PHP


This article will explain how Shannon-Fano coding works. Named after after Claude Shannon and Robert Fano, apart from run length encoding, this is probably the simplest form of lossless compression. I also include some PHP code to demonstrate compression and decompression natively.

No deep knowledge of mathematics or compression is necessary, other than a basic knowledge of binary.

About Compression

Compression is Everywhere

Just about everybody uses data compression today, probably without realising it. The DVD or  Bluray that you watch, the MP3 you enjoy, and the digital TV that is now the only type in the UK, all use compression to reduce the size of the data that is stored or sent to you. The images on web pages, and sometimes even the web pages, are also compressed.

Lossy Compression

The compression algorithms used by DVDs, Blurays, MP3s and some computer images (such as JPG files) use a form of compression called lossy simply because it does not reproduce he original data perfectly, it acheives a greater level of compression by removing the parts of the video, picture or sound that are not needed to enjoy the experience.

Lossless Compression

Lossless compression is the counterpart to lossy – the original data is returned unchanged when the compressed data is uncompressed. If you extract files from a ZIP or RAR file, these are returned to you exactly as they were before they were added to the ZIP or RAR file.

This article, and the article to follow, deal only with lossless compression.

Compressing a Short Text Example

Let’s start with a simple example. We will compress the word TATTOO. If we consider only a simple character string for the moment, TATTOO requires 6 bytes of storage.

This article will show how TATTOO can be compressed into just 2 bytes using Shannon-Fano encoding. The detailed explanation of Shannon-Fano is below, but you don’t need it yet.

When a computer reads TATTOO from the 6 bytes of storage, it simply reads from the sequential bytes; each byte contains 8 bits of binary data.

The binary representation of TATTOO is:-

01010100 01000001 01010100 01010100 01001111 01001111

..that is, 6 bytes or 48 binary bits.

If we use Shannon-Fano to encode TATTOO, it is reduced to just:-

10011010 1

…or just nine binary bits. The compressed data requires two bytes of storage (it almost fits into one).

How is it done? Here’s how.

Bits into Characters

The 9 bits 100110101 comprise the word TATTOO but unlike the uncompressed data, 8 bits does not represent a single character; the number of bits that represent each character is variable. That’s how it compresses TATTOO into just 9 bits.

You will see shortly that the 9 bit compressed data 100110101 comprises these bits to store the characters:-

BITS Letter
1 T
00 A
1 T
1 T
01 O
01 O

How does the decompression software “know” how many bits make up each character? How does it “know”, for example, that the first bit represents ‘T’, and the next 2 comprise ‘A’, as revealed in the table above? It uses a binary tree.

Binary Tree

The decompression software is supplied with a binary tree which it uses to decode the bitstream that is the compressed data.

The decoder traverses the tree for once for each compressed element (character, in this instance). This is a simple and therefore fast operation for a computer to execute.

A binary tree can be visualised by reference to the illustration below, showing the options to decode the first 3 binary digits of an encoded element.

The tree is traversed by taking the left branch at each node if the current binary digit is 1, and the right branch if it’s zero.

Using A Binary Tree to Decompress

Now we will use the actual binary tree which will be used to decompress the 9 bit compressed data sample. To decompress that data (100110101) we follow these steps. Please refer to the illustration below.

  1. Start at the Root, at the top.
  2. Read each compressed binary digit from left to right. The first digit is 1. If a digit is 1, take the left fork, and if it’s 0 take the right.
  3. Moving from Root we thus branch left because the first digit is 1.
  4. If there are no futher branches possible from our new position, and there is data at that node (a leaf node), use the letter that is at the new position, as the first decoded character. It’s T. Note that the illustration below also shows the binary bits that were used to decode this character, in this case just 1.
  5. After each character is decoded move back to Root. Read the next binary digit, which is 0. Take the right branch this time, from Root. If there are no further branches at the new position, use the letter; however this time, there are further branches.
  6. Read the next binary digit, 0. Again, move left for 1, or right for 0, so move right. At the new position you will see the numbers 00 which represents the bits you have used so far to decode this letter. We’ve arrived at a node which has no further branches but has a letter. We now have the second letter, A.
  7. Return to Root to decode the third character. The next bit is 1. Move left from Root to reach a node with no further branches and the letter T. We have character number 3.
  8. Repeat again,and once again we use the next bit which is 1; the movement down the tree gets the fourth character, T.
  9. The next bit is 0. Take the right branch from Root and then use the next bit, which is 1, and then take the next right fork to arrive the fifth letter, O.
  10. The final bits are 0 and 1, and you may have spotted that this is identical to the previous character; the sixth and final decoded character is O.
  11. We now have our complete decoded word, TATTOO, 48 bits extracted from just 9!

A Complete Compressed Message

When the source data is compressed, the software assigns the characters to be encoded, to the tree nodes, attempting to create a tree that will yield the most efficient compression of the source data.

We have already seen an example of a compressed bitstream; the complete compressed data includes:-

  1. The length of the uncompressed data
  2. Data telling how to construct the binary tree that was used to compress the data
  3. The compressed data

The items 1 and 2 are a necessary overhead that must accompany the compressed data.

Creating the Binary Tree using Shannon-Fano

The Shannon-Fano algorithm used to create the binary tree to compress and decompress is very simple.

  1. Create a empty binary tree. Set the current position to the root
  2. Create a frequency table for all elements present in the source data
  3. Sort the table by frequency so that the most common element is at the start
  4. Split the table so that the total frequencies in both parts are as close as can be. The most common symbols are in the “left” portion, the least in the “right”. You now have two parts.
  5. Work on each part. Split the part so that the total frequencies in both parts are as close as can be.
  6. Repeat 5 until the part has 2 or less symbols.
  7. Assign digits for each part; the left portion is assign 1, the right is assigned 0
  8. Repeat for all parts.
Symbol Frequency>
T 3
O 2
A 1

In this example, we can clearly bisect the symbols by frequency; the most common (T) has a total of 3. After we have divided, the common portion only has one symbol (T), so we add it to the empty tree. This leaves O and A in the remaining section, so these are added to the tree.

Compressing Data using the Binary Tree

Compression is the reverse of the decompression process explained earlier. Using a binary tree created as above, do this for each character in the uncompressed text:-

  1. Find the leaf node for the current character.
  2. Work from that node up to the root. Repeat 2 until you are at the root (recursively).
  3. Add a 1 to the final output if your move up was from the left branch, 0 if from the right. Repeat 3 until all calls at 2 are done.

There is, However, One Small Problem

Shannon-Fano is not very good. The algorithm to assign the bits to symbols does not produce the best compression results. Shannon-Fano is generally not used now; Huffman coding and other methods have replaced it.

Using Shannon-Fano (Regardless) for PHP

If you would like to use Shannon-Fano in PHP, we have prepared a PHP class which compresses and decompresses text in memory. It was created as a means to demonstrate Shannon-Fano, but it could be utilised.

Typical uses could be:-

  • Storing large text in a database BLOB in a compressed form.
  • Compressing binary data.

If you don’t wish to use PHP compression libraries, or are unable to do so, or if you are interested in compression, consider using the class.

Our PHP Class Shannon.php

The class has just four public functions:-

  • compressText which as its name implies, compresses text, producing a byte array.
  • expandText which expands a byte array that was previously compressed from text
  • compressBin which compresses a byte array, producing another byte array.
  • expandBin which expands a byte array that was previously compressed from a byte array

The code snippet below demonstrates simply the use of the class:-



$instance = new Shannon();

$text = "More ending in death, but this time it sounds like a ";
$text.= "solace after life. I lingered round them, under that ";
$text.= "benign sky; watched the moths fluttering among the ";
$text.= "heath, and hare-bells; listened to the soft wind ";
$text.= "breathing through the grass; and wondered how any one ";
$text.= "could ever imagine unquiet slumbers ";
$text.= "for the sleepers in that quiet earth.";

echo "text len=".strlen($text)." characters\n";

$enc_ar = $instance -> compressText($text);

echo "encoded len=".count($enc_ar)." bytes\n";

$org_text = $instance -> expandText($enc_ar);

    echo "decoded text matches\n";
    echo "decoded text DOES NOT match\n";


The above text (the end of Wuthering Heights) comprises 333 characters. The resulting compressed byte array is 227 bytes in length

The PHP class is available to download here.

Beyond Shanon-Fano

A forthcoming blog post will explain and demonstrate Huffman coding, a similar but more efficient method.

  • compressText which as its name implies, compresses text, producing a byte array.
  • expandText which expands a byte array that was previously compressed.

Sencha + PhoneGap for Android tutorial!

I just spent quite a while trying to get Sencha + PhoneGap to play nice and install the demo app on my Android phone, so here are the installation steps for anyone else unfortunate enough to be as clueless as me when it comes to these things!

1. Download Sencha Touch ( and extract contents to c:\wamp\www\ (or whatever web server you are using’s www directory)

2. Download and install Sencha CMD (

3. Open a command prompt at C:\wamp\www\sencha-touch-2.3.1a-commercial\touch-2.3.1

4. Run “sencha generate app YourAppName ../../YourAppFolder”

5. CD to ../../YourAppFolder

6. Run “sencha phonegap init”

7. Edit YourAppFolder/ and change:




8. Connect your android device via USB and enable USB debugging on the device.

9. Run “sencha app build -run native”

10. Done! You should see the demo app running on your device 🙂

Any questions – ask away!

Visual C++ Runtime and Static Linking Made Simple


This is a new explanation of an old topic, hoping to answer developer and user questions about the use of the Visual C++ Runtime component by Windows applications, and indeed the non-use of the component by those applications that link to Windows “statically”.

What is the Runtime C++ Component?

It’s analogous to the .NET framework or the Java runtime. It provides an environment in which C++ applications created with Visual C++ (within Visual Studio) are able to run on a PC. The applications hook into the runtime component to connect to Windows rather than carrying the code within themselves.

The component is usually installed or updated if needed, by applications that require it, but it can be downloaded at no cost from Microsoft.

Linking to the Runtime C++ component is also known as linking dynamically. Your application will use the relevant DLL which will be loaded only once, into memory, and is shared by all applications running on the computer (the code itself is shared; each instance using it enjoys its own memory space, with its own stack etc).

Advantages of Dynamic Linking

  • The application executable is significantly smaller
  • Only one copy of the relevant code is present on the machine at runtime, and can be shared by multiple applications
  • Security and other fixes are applied to the runtime with no need to update applications in order to apply the same fixes

Advantages of Static Linking

  • Simpler deployment and installation – no need to install or update the C++ runtime
  • Startup can be faster

Linking to Windows in Visual Studio

Many articles on this subject include the names of the various libraries that are used in different configurations, and the command line switches. These are not included here; we instead show the project settings to be made from within Visual Studio.

All screenshots are from Visual Studio 2008.

A Simple Demonstration Project

In order to demonstrate static an dynamic linking, I created a simple library (testlib.lib), not a DLL, which will be linked into our main program. This is the library within our solution:-

The main program is a simple Windows 32 console application (testapp.exe) that uses the above library. The application and the library form the entire solution:-

Debug or Release

When working in a debug configuration, the linking method is not normally important, provided that each project within the solution is linked to Windows in the same manner. I normally choose the default: linking dyamically to Windows.

The examples illustrated below all refer to a release configuration.

How to Link a Project Dynamically to Windows

Each project within the solution must be linked to Windows in the same manner.

In the solution explorer pane, right click on a project, then click on properties:-

The default linking method in Visual Studio 2008 (and earlier) is dynamically,which is here described as Multi-threaded DLL.

Repeat for all projects in your solution.

How to Link a Project Statically to Windows

Follow the steps illustrated below, but select Multi-threaded.

Repeat for all projects in your solution.

Linking an MFC Project

If your project uses MFC, in order to change the linking method, go tothe project property pages, general section. Set the Use of MFC item to static or dynamic:-

Understanding Linking Errors

Linking errors can be confusing, and harder to understand than comilation problems.

If the linker complains that items are already defined in LIBCMT or that something is already defined in msvcrt.lib your first action should be to verify that all projects within your solution are linked in the same manner.

Excluding Libraries

Avoid if possible.

Normally you should never need to exclude all, or specific libraries, unless you are linking to a third-party library, and that in itself can cause problems if there are conflicts.

If, for example, you are linking your prioject statically to Windows but wish to link to a third party static library (not a DLL) which has been compiled to link dynamically to Windows, you will see conflicts which can be removed by excluding a library, but this is not recommended.

Easy circular images with CSS!

Circular images are all the rage on the world wide web at the moment. Here’s how you do it.

First – It’s not essential, but this will work much better with images that are already square-shaped, so maybe do that. Nothing will fall into tiny little microscopic pieces of failure if you don’t though.

Second – the HTML.

<div class="img-circular"><img src="duck.jpg" alt="duck" /></div>

Yes, we COULD just use a single DIV and set its background image, to cut down on a bit of HTML, but if you’re pulling your images from a database, and chances are you might be – this method will cause you much less trouble!

Third (and last, actually) – the CSS.

.img-circular {
     width: 417px;
     height: 417px;
     border-radius: 50%;
     overflow: hidden;


Gapless Digital Audio Playback – One Solution

Mind That Gap

This has post has nothing whatsoever to do with developments in our T2A API, other than in that our developers like listening to digital music.

One long-recognised problem with the listening experience of an album which has been encoded as multiple mp3 files (or wma, aac etc) is that of unwanted gaps. If your preference is for a collection of separate songs, this does not affect you, but listeners of music collections where tracks segue into each other, would not want there to be any gap when listening to that album as a collection of digital audio files.

Why Gaps are Heard

The main reason is that when an mp3 or some other format lossless compression audio file is created, a short silence is created at the start and end of the track. Some audio formats include information to allow playing hardware to compensate for this, but mp3 does not.

Solving with Hardware

Some newer equipment is able to achieve gapless playback of multiple tracks, either by using  a crossfade, or by using information embedded in the audio file to allow it to compensate for any gaps at the start and end of each track, where the audio file format includes that information.

If your mp3 playing device leaves gaps in audio that should be gapless, read on.

Solving with Software – Create An Audio Book


A cumbersome but otherwise completely sucessful method is now demonstrated.

By creating an audio book containing a single file with no gaps but with chapters to denote the positions of the former individual tracks, we will achieve gapless playback, provided that we have a player that supports the chosen format, and supports the selection of chapters for playback.

Audio Book Format

For this demonstration we created an .m4b file. This is actually idential to an .m4a, which is an mpeg-4 audio file using the AAC codec. The m4b extension was created so that Apple’s iTunes software and iPod players can recognize the file as an audio book rather than a normal audio track and thus allow “bookmarking” the file

An m4b is thus best suited, as one might expect, to Apple devices and Apple software on other devices, but other software support for m4b files with chapters does exist for other devices.

We looked briefly at other formats; .wma and .aac support chapters, as does .mp3 with a later id3v2 addition. Support for these formats is poor both in terms of encoding software and hardware compatibility.

Step by Step

We chose for this demonstration a well known album in which some tracks should have no gaps; “The Dark Side of the Moon” by Pink Floyd opens with 3 gapless tracks.

Below is a screenshot of the audio book that we created, playing in Apple’sQuicktime, on a PC.

Note the chapters (which includes 2 extra ones to represent positions within original tracks).

Prepare The Full File

  1. Rip your copy of the CD to a lossless format, such as WAV. This should ensure that there are no unwanted gaps at the start or end of each track. The perfect reproduction of the CD will allow compression to the final format file; recompression of a compressed file should not be executed.
  2. Use a suitable editor to join the lossless tracks together.
  3. Play the joined tracks file, ensure there are no gaps.

You now have a single lossless file, which if it is a stereo 44k PCM audio file is about 650Mb for an hours worth of music.

Choose an Encoder

We used the multiformat video and audio encoder XMedia Recode is a free application. It supports the .m4a format with chapters. We created an m4a and then simply renamed the finished file to m4b.

One alternative means to add the chapters is mp4box which we have not tested.

Create the m4a

Load the complete file. Specify chapters by start and end time, and your chosen chapter name. You may wish, as we did with our Pink Floyd demonstration, insert some extra chapters to allow navigation to a point within a track.

Select a suitable quality for the m4a encoding; the default quality is 128Kkbps but we doubled the bandwidth to 256kbps for our m4a.

XMedia Recode will also allow you to specify title, year and other information about the audiobook.

Rename to m4b

As we have seen, m4a files are idential to m4b. When the m4a encoding is completed. rename the extension to .m4b.

The m4b file is now complete.

A short clip (27 seconds, approx 700Kb) from our m4b audiobook is available here to download. This comprises the 15 seconds before our extra chapter marking the “Breathe (reprise)” section of “Time”, and the first 12 seconds of this section.

Play with Quicktime

If you have Apple’s Quicktime application on your PC, play the m4b using QT; you should see and be able to select from the named chapters, and thus be able to select your track of choice.

Using in iTunes and with Apple Devices

Using iTunes in order to upload the m4b to an iPhone, iPod or iPad, you will note that the m4b does not appear in the “music” section – look in the “books” section. Drag it over to your connected device.

We tested our m4b on an iPhone 4S and an iPad. It works especially well on the former, allowing easy navigation to the chapters / tracks.

iTunes also facilitates the easy addition of album artwork to the m4b file.

Using with Android

Users of Android devices may choose to use a free app, The Akimbo Audiobook Player which supports m4b files with chapters. This has been tested with our m4b on a Sony Experia Z1.


This is an effective but quite cumbersome means to achieve gapless playback.

If you’re fed up with annoying gaps, have equipment that supports an audiobook format with chapters, and are dedicated enough, this approach may work for you.

Working with ISO-8859-1 and Unicode Character Sets


This article gives a brief and not too technical explanation of character encoding, and of the titular character encoding methods. I also outline how to work with the methods, how to fix some common problems and how to choose which encoding system to use.

Why is Character Encoding Important?

If a web developer includes an image in some HTML markup, he/she does not have to specify in what fomat the media was saved – the browser rendering engine will interpret that using a signature in the media file; similarly a media player will interpret a video file to discover which format the file is in.

Unfortunately character strings have no signature that allows the processing engine to automatically determine the format of the character encoding, in situations where multiple formats may be encountered, such as a web page, or if .NET or Java process external text files. The developer needs to inform the relevant engine what the character encoding format is.

When Character Encoding Goes Bad

This is a common sight on web pages:-

The price is �100 or about �120

… or the same text showing a different error:-

The price is £100 or about €120

The correct text should displayed as:-

The price is £100 or about €120

See below for a detailed explaination of the problem and the solution.

What is Character Encoding?

Character encoding is the means by which the characters are stored in a sequence (or stream) of bytes.

One Byte Per Character

The simplest format is the use of a single byte for a character giving 255 possible characters, 0 is usually the terminating character..This is sufficient to display most characters in most western languages, or most characters in any given language.

Two Bytes Per Character

If you have ever programmed in Java or .NET, you will almost certainly have encountered 2 byte (or 16 bit) character encoding, since strings are handled internally in this format. This allows the representation of 65535 characters which may initially seem to be sufficient to represent every possible character in written worldwide culture, and it usually is, but not always.


Unicode simplifies things by allowing any character to be displayed within a single and huge character encoding system, which includes thousands of characters, more than can be represented by a 16 bit character encoding.

It also provides a more space efficient format than the aforementioned 16 bit encoding scheme, the popular UTF-8, and you may also encounter  UTF-16 or even UTF-32.

For a more detailed explaination of Unicode see our earlier blog on the subject.

ISO-8859-1 Encoding

ISO-8859-1 is actually a subset of Unicode. It comprises the first 255 Unicode characters (see below for the full character set) and is also sometimes known as Latin-1 since it features most of the characters that are used by Western European languages.

(The developer should be aware that the first 127 characters are encoded identically in ISO-8859-1 and UTF-8, as a single byte).

Many web pages created by English and other Western European language speakers are still encoded in ISO-8859-1, since this is sufficient to represent any possible character that they wish to display.

ISO-8859-1 vs UTF-8

When faced with the choice of character encoding, the choice is between flexibility and storage space and simplicity.

If only ISO-8859-1 characters are to be used in a project (such as a website), then ISO-8859-1 does offer a slight benefit in terms of storage space, and therefore in the case of a web page, of download size.

Fans of the Swedish/Danish TV show The Bridge will be familiar with the events contained in this sample string:-

Saga Norén leaves Malmö and crosses the Øresund Bridge

The text above comprises 54 characters. All the characters are present within the ISO-8859-1 character set, and so the string can be stored as 54 bytes using a simple one character per byte encoding.

If however the string is stored in UTF-8, it requires 57 bytes. This is because the three non-English characters (which are outside of the lower 1-127 range) are stored in two bytes using UTF-8. There is thus a slight space advantage.

I would nevertheless choose UTF-8 to give flexibility to show any possible future characters. Unicode wins.

Web Page Character Encoding Errors Explained

Remember the incorrectly displayed web page text shown above?

Error 1 was:-

The price is �100 or about �120

Error 2 was:-

The price is £100 or about €120

What has gone wrong? Well, the first example shows what happens when text that has been encoded as ISO-8859-1 is displayed on a web page which has told the viewing web browser that the contents are encoded as UTF-8.

The characters £ and are outside of the lower range (1-127) and are therefore encoded differently in UTF-8 and ISO-8859-1.

The second example shows the opposite; text encoded as UTF-8 is displayed in a page which has informed the web browser that the contents are encoded in ISO-8859-1.

Put simply, the web page encoding information does not match the contents, and horrid errors are shown.

In order to display this correct text…

The price is £100 or about €120

.. the simple solution to both problems is to establish which encoding should be used, and then within the


…of an HTML 4 or earlier page, use

<meta http-equiv="content-type" content="text/html;
charset=utf-8" />

…to specify UTF-8 contents or

<meta http-equiv="content-type" content="text/html;
charset=iso-8859-1" />

…if the contents are ISO-8859-1.

For HTML 5 specifying the character set is simpler:-

<meta charset="utf-8" />

The above code fragments are suitable for flat HTML pages; PHP programmers would use

header("Content-Type: text/html;charset=utf-8");

and a JSP page would use

<%@ page contentType="text/html;charset=UTF-8" %>

…to show just a couple of common examples.

Working with Text Files

A simple text file, as we have seen, carries no header or signature to indicate in what encoding format the text was saved. The programmer should determine that encoding format carefully.

For example, to read an ISO-8859-1 text file containing our 54 character sentence above, in C#, you would:-

StreamReader tr = null;

   tr = new StreamReader("saga.txt",
   String testline = tr.ReadLine();


The above code will ensure that the non-English characters are read correctly into the .NET String class instance.

Reference: The ISO-8859-1 Character Set

These are the displayable characters in the ISO-8859-1 character set, with their Hexadecimal values. Characters 0x20 (space) to 0xff are shown.

Character Hex Character Hex Character Hex Character Hex
20 ! 21 22 # 23
$ 24 % 25 & 26 27
( 28 ) 29 * 2A + 2B
, 2C 2D . 2E / 2F
0 30 1 31 2 32 3 33
4 34 5 35 6 36 7 37
8 38 9 39 : 3A ; 3B
< 3C = 3D > 3E ? 3F
@ 40 A 41 B 42 C 43
D 44 E 45 F 46 G 47
H 48 I 49 J 4A K 4B
L 4C M 4D N 4E O 4F
P 50 Q 51 R 52 S 53
T 54 U 55 V 56 W 57
X 58 Y 59 Z 5A [ 5B
\ 5C ] 5D ^ 5E _ 5F
` 60 a 61 b 62 c 63
d 64 e 65 f 66 g 67
h 68 i 69 j 6A k 6B
l 6C m 6D n 6E o 6F
p 70 q 71 r 72 s 73
t 74 u 75 v 76 w 77
x 78 y 79 z 7A { 7B
| 7C } 7D ~ 7E  7F
80  81 82 ƒ 83
84 85 86 87
ˆ 88 89 Š 8A 8B
Œ 8C  8D Ž 8E  8F
 90 91 92 93
94 95 96 97
˜ 98 99 š 9A 9B
œ 9C  9D ž 9E Ÿ 9F
A0 ¡ A1 ¢ A2 £ A3
¤ A4 ¥ A5 ¦ A6 § A7
¨ A8 © A9 ª AA « AB
¬ AC ­ AD ® AE ¯ AF
° B0 ± B1 ² B2 ³ B3
´ B4 µ B5 B6 · B7
¸ B8 ¹ B9 º BA » BB
¼ BC ½ BD ¾ BE ¿ BF
À C0 Á C1 Â C2 Ã C3
Ä C4 Å C5 Æ C6 Ç C7
Ð D0 Ñ D1 Ò D2 Ó D3
Ô D4 Õ D5 Ö D6 × D7
à E0 á E1 â E2 ã E3
ä E4 å E5 æ E6 ç E7
è E8 é E9 ê EA ë EB
ì EC í ED î EE ï EF
ð F0 ñ F1 ò F2 ó F3
ô F4 õ F5 ö F6 ÷ F7
ø F8 ù F9 ú FA û FB
ü FC ý FD þ FE ÿ FF

Converting Fixed-Width Text Files To CSV in C++ (and for free)

C++ Logo

Large Padded Data

A recent data acquisition brought forth the requirement to process fixed-width text files that comprise the data. This would not have been much of a discussion point were it not for the fact that some of the files were huge – 60Gb in one case. Most of these large files comprise the space character, serving as padding for the fixed-width fields; this serves to illustrate how inefficient fixed-width text files are, but that is not the point we’re making here today.

Converting Using Existing Applications

We decided to convert each file into a CSV file, which can easily be read, edited and loaded into a database. There are applications that are able to convert massive text files to CSV, but during our brief trial of a few programs we found:-

  • One application was unreliable (crashed)
  • One application was buggy (unable to handle commas and/or quotes)
  • One application was expensive (several hundred dollars)

… and anyway, programming your own is much more fun. We created a C++ application in order to process these large files as quickly as possible.

Our C++ Application

We created a C++ command line application called texttocsv. We complied it as a Windows 32 executable, but the code is ANSI C++, used no Windows API code, uses no other libraries, and will compile for other operating systems.

texttocsv can read an 8 bit character fixed with text file (there is no support for Unicode) and quickly convert each row to CSV. It will enclose each field in quotes only where needed, and correctly escape quotes within fields.

A Short Example

Consider the following fixed-width text file (test.txt) with two fixed columns of 20 and 50 characters:-

Jupiter             A planet                 
Andromeda           A "nearby" galaxy
Sirius              A star
Eros                An asteroid, a rocky body
Titan               A moon

…we used texttocsv.exe to create the following CSV (test.csv):-

Jupiter,A planet
Andromeda,"A ""nearby"" galaxy"
Sirius,A star
Eros,"An asteroid, a rocky body"
Titan,A moon

Note that the “Andromeda” row has the quotes around “nearby” correctly escaped.

How To Use

The parameters for texttocsv are:-

  1. Destination file
  2. Source file
  3. List of column widths in the source file, separated by a comma
  4. Start column number, 0 based. This parameter is optional.

The example above was created with the following command:-

texttocsv.exe test.csv test.txt 20,50

Our C++ Source Code


The application was created in ANSI C++. We followed the RAII programming technique, creating classes whose destructors release their resources, and which cannot be created using new.

Execute Sequence

The program is entered at main, and receives the parameters as entered by the user. The paramaters are read, and any missing paramters are reported to the user. If all mandatory parameters are present, an instance of CMyProcess (which is derived from CTextToCSV) is created on the stack. The member function Process is executed, and the returned error code is checked and reported.

CTextToCSV Class

The processing is done using an instance of a class derived from CTextToCSV. The derived class should include the method Progress which is invoked for each row processed; our application simply reports to console every 10000 rows, but if the class was used in a GUI, a progress bar could be displayed. The class must be created on the stack – new is not permitted.

CTextToCSV includes some error codes in an enum named ErrCode. The project code should normally interpret the errors and display to the user, as our simple application does.

CTextToCSV will close any open files in its destructor, and the instances of the other classes used (CCharBuffer and CWidthList) free their allocated buffers in their own destructors. Each class instance is created on the stack, guaranteeing that any resources are released.

The classes include buffer overwrite protection. Memory consumption is modest and is proportional to total width of the fixed text file, and memory allocation errors (if the system is short on memory) are handled, returning the ErrCode value OutOfMem.

All memory allocated is freed by the class destructors.

The Code


//class to hold the array of column widths
class CWidthList
   unsigned int* m_Column_Width;
   unsigned int m_Num_Cols;
   unsigned int m_Total_Width;

   //prevent new from being used - force
   //any instance to be on the stack
   void * operator new (size_t);
   void * operator new[] (size_t);

   //return width of column number (zero based)
   //returns 0 if column number was invalid
   unsigned int GetWidth(unsigned int column_number)
      if(column_number>=0 && column_numberm_Column_Width[column_number]);

      return 0;

   //return total width of columns
   unsigned int GetTotalWidth(void)

   //return the number of columns
   unsigned int GetNumCols(void)

   //parse the widths string
   //returns false if out of memory
   bool ParseWidths(const char* p_widths)
      unsigned int total = 0;

      unsigned int i=0;
      unsigned int len=strlen(p_widths);

      //count number of commas in the width string

      //create the buffer for the column widths

         this->m_Column_Width=new unsigned int[this->m_Num_Cols];
      catch (std::bad_alloc)
         //out of memory for the buffer
         return false;

      //set all widths to 0
      while(i      {

      char val[8];
      int valpos=0;
      int w;
      unsigned int cur_col=0;
         else if(c>='0' && cm_Column_Width[cur_col] = w;
      total += w;

      return true;


//simple 8 bit character buffer class
class CCharBuffer

   char* m_Data;
   unsigned int m_Max_Chars;      //maximum number of
                                  //chars allowed
   unsigned int m_Pos;            //current write pos

   //prevent new from being used -
   //force any instance to be on the stack
   void * operator new (size_t);
   void * operator new[] (size_t);


         //allocate num chars plus an extra
         this->m_Max_Chars = 32;
         this->m_Data=new char[this->m_Max_Chars+1];
      catch (std::bad_alloc)
         //out of memory when creating the buffer
         //so mark the buffer as not created
         this->m_Max_Chars = 0;


      //free the allocated buffer
         delete this->m_Data;

   //return false if failed to allocate a buffer
   bool CheckSpace(const unsigned int num_chars)
      if(num_chars m_Max_Chars)
         return true;

      char* new_dest = NULL;

         //allocate num chars plus an extra
         new_dest = new char[num_chars+1];
      catch (std::bad_alloc)
         //out of memory when resizing destination buffer
         return false;

      if(this->m_Pos>0 && this->m_Data!=NULL)
         //copy the existing data into the new buffer

         delete this->m_Data;      //delete the OLD buffer
                                   //(if it existed)

      this->m_Data=new_dest;         //and use the new one

      return true;

   //specify the current position
   void SetPos(const unsigned int val)
      //ensure poos
      else if(val>=this->m_Max_Chars)

   const unsigned int GetPos(void)
      return this->m_Pos;

   //make space and add a character
   //returns false if failed to make space
   bool Add(const char c)
         return false;


      return true;

   //make space and add a string
   bool Add(const char* src,const unsigned int num_chars)
         return false;


      return true;

   //read pointer to the buffer - only valid while the
   //instance is in scope
   char* Read(void)
      return this->m_Data;

//class to process a fixed width text file into a CSV
class CTextToCSV

   //error codes
   enum ErrCode
      None = 0,

   FILE* m_Dest;            //destination (CSV) file
   FILE* m_Src;            //source text file

   CCharBuffer m_Src_Buffer;
   CCharBuffer m_Dest_Buffer;
   CWidthList m_Width;

   unsigned int m_Start_Col;   //start column (optional)

   //prevent new from being used - force any
   //instance to be on the stack
   void * operator new (size_t);
   void * operator new[] (size_t);

//private nethods

   //read field into m_Dest_Buffer
   //returns OutOfMem if failed to resize dest buffer
   ErrCode ReadField(const int curpos,const int width)
   //first, scan src to get the trimmed extents
   //and to discover if comma is present

      int start=curpos;
      int end=curpos+width;

      const char* src_buf=this->m_Src_Buffer.Read();

            //non space found
      //start and end are inclusive

      bool enclose_in_commas = false;
      int i=start;
            //insufficient space in the destination buffer
            return OutOfMem;
         //enclose in quotes and escape any double quote character

         //add opening quotes
            //insufficient space in the destination buffer
            return OutOfMem;

         //copy all characters and escape any double quote
               //simply add the character
                  //out of memory
                  return OutOfMem;

         //add closing quotes
            return OutOfMem;
      return None;

   //process each row
   //returns ErrCode (normally None)
   ErrCode ProcessRow(void)


      //if a CR is found, terminate the src buffer before it
      char* src_buf=this->m_Src_Buffer.Read();

      char* sp=strstr(src_buf,"\r");

      //pad the src buffer with spaces
      int len=strlen(src_buf);
      int pad_len=this->m_Width.GetTotalWidth()-len;
         //pad with spaces
         sp=src_buf + len;

      //read each field
      unsigned int x=0;
      int curpos=0;

      if(this->m_Start_Col>0 &&
         //specified start column is valid
         while(x < this->m_Start_Col)
            curpos += this->m_Width.GetWidth(x);

      while(x < this->m_Width.GetNumCols())
            //failed to read a field due to memory failure
            return OutOfMem;


         //add a comma UNLESS this is the last field
            //insufficient space in the destination buffer
               return OutOfMem;
            //add the width of previous column
            curpos += this->m_Width.GetWidth(x-1);





      return None;


   //close files
   void Close(void)
      //close src file

      //close dest file


   //process progress report as each row is read
   virtual void Progress(unsigned int row_num)

      this->m_Dest = NULL;
      this->m_Src = NULL;

      //close open files and free allocated buffers

   ErrCode Process(const char* p_dest_file,
                   const char* p_src_file,
                   const char* p_widths,
                   const char* p_start_col)

      //parse the widths string
         return OutOfMem;

      //ensure the src buffer has
      //sufficient space to read total width
		  //ensure src buffer min size
         return OutOfMem;

      //open source file
         //failed to open src file

         return FileNotFound;

      //open destination file
         //failed to open dest file
         return FileOpenForWriteFailed;

      //read start column number if set

      unsigned int row=0;      //row counter
         void* result=fgets(this->m_Src_Buffer.Read(),

         ErrCode err = ProcessRow();
            return err;



      //and close files and buffers

      return None;


//class derived from CTextToCSV to allow
//bespoke progress handling
class MyProcess : public CTextToCSV

   //process progress report as each row is read
   void Progress(unsigned int row_num)
         printf("Row %d\r\n",row_num);


int main(int argc, char* argv[])
   printf("TextToCSV Version (c) 2014\r\n\r\n");

   //read params
   int num_param=argc;
   if(num_param   {
      printf("dest filename (e.g. mydata.csv)\r\n");
      printf("source filename (e.g. mydata.txt\r\n");
      printf("column widths (e.g. 10,10,20,30,50\r\n");
      printf("start column position (optional, 0 based)\r\n");


   const char* src=NULL;
   const char* dest=NULL;
   const char* widths=NULL;
   const char* start_col=NULL;

   int i=1;
   while(i   {
      const char* pr=(const char*)argv[i];
      //assign each paramater
         else if(src==NULL)
            src = pr;
         else if(widths==NULL)
            widths = pr;
         else if(start_col==NULL)
            start_col = pr;

   if(src == NULL)
      printf("Missing source filename");
      return -1;
   if(dest == NULL)
      printf("Missing dest filename");
      return -1;
   if(widths == NULL)
      printf("Missing column widths");
      return -1;

   printf("Processing file %s into file %s\r\n\r\n",src,dest);

   //an instance of our class, derived from CTextToCSV
   //note that this instance is created on the stack which is simpler and
   //safer than using new and delete.
   MyProcess curpos;

   //and process the file
   MyProcess::ErrCode err = curpos.Process(dest,src,widths,start_col);

   //read error code if and print a report to console
         case MyProcess::OutOfMem:
            printf("Error: out of memory\r\n\r\n");
         case MyProcess::FileNotFound:
            printf("Error: source file not found\r\n\r\n");
         case MyProcess::FileOpenForWriteFailed:
            printf("Error: unable to open destination file\r\n\r\n");


      printf("process completed, no errors\r\n");


   return 0;