How to recover file contents after Notepad++ crash

Notepad++ is generally a pleasure to use but it does very occasionally crash and empty whatever file you happened to be editing at the time too… Here’s where you can find a backup version to recover your file if this ever happens to you too:

C:\Users\YOUR_USERNAME\AppData\Roaming\Notepad++\backup

Note: At the time of writing I’m running Notepad++ v6.7.8.2 on Windows 7 Professional, but you’ve not got anything to lose by trying this for other versions of Notepad++/Windows.

Thanks to Indrajit on Stack Overflow for posting the solution originally!

Preventing card fraud when accepting Card Not Present (CNP) transactions

If your business accepts Card Not Present card payments (e.g. an eCommerce website) you are probably aware of the built in checks provider by your merchant services provider:  *AVS, CVV, MasterCard SecureCode / Verified by Visa and Fraud Screening.

*Brief description of built in checks

  • Address Verification Checks   – checks the numerical characters of the transactions billing address and postcode against the details held by the card issuer. (This is not widely used by non-UK cards)
  • Card Verification Value (CVV) – Checks the transactions inputted CVV against the value held by the card issuer.
  • MasterCard SecureCode / Verified by Visa  – services created by the Card Schemes to protect you and your customers.
  • Fraud Screening – your merchant provider provides a score indicating the likelihood a transaction is fraudulent, they also highlight anomalies with the transaction (e.g. transaction billing address country does not match value held by the card issuer).

There is however additional checks you can make to help avoid a fraudulent transaction.

  1. Check the customer emails address – proceed with caution with free email address like Yahoo, Hotmail or Gmail as these are more likely to result in fraud. Subscription email addresses like ‘BTConnect’ or ‘Virginmedia’ are usually safer. Or if the email address is the domain of a company website go to that domain and see if it is an established website, if it’s just a parking page the transaction is less safe. You should also check the name in the email address. Does it make sense when comparing it to the card holder name?  Checking the email address should be a part of your overall checking as many of your honest customers may use free email addresses.
  2. Is the order too good to be true? Be aware if you have an order that is a higher value than your normal orders. Also be aware if you get several orders from the same customer in a short space of time. Have a look at your statistics, how frequently do you get orders from the same customer, if you best honest customer is buying from your website once a month and someone purchases from you 3 days in a row, something might be fishy.
  3. Is it unusual in another way? Is there anything else you can think of that doesn’t match your normal customers.
  4. Check the IP address of the transaction and see where it originates from. Compare this with the billing address.  Be aware that fraudsters can use proxy IP addresses.
  5. Where possible ask customers for a land line telephone number which can be checked using Directory Enquiries (unless they are ex-directory). You can also check the supplied name and address details against the details on the Edited Electoral Roll. This is not a guarantee as it is possible to opt out of having your details published. Try ukphonebook.com people search OR the T2A API search for a person and find a residential telephone number methods.
  6.  Have a look at all the transactions occurring on your website not just the successful ones. Was the customer declined several times before they were successful? You shouldn’t immediately think its fraud as sometimes people mistype things, but you should investigate further.
  7. When you do get a fraudulent transaction or charge-back – investigate the details of the transaction and see if there are any clues to help further improve your fraud screening.

These checks can help you when you are reviewing your transactions but you may want to consider building your own solution that uses a combination of these checks against each transaction before they are submitted to protect yourself further.

Form validation made super easy with jQuery

We spend a lot of time coding both client and server-side form valdilation.

This is the routine I now always use for client-side validation, which has been refined over the years to become what I’d consider to be some pretty tight code, whilst providing good quality feedback to the user to help them complete forms quickly and with minimal confusion.

Hopefully you can learn a trick or two that you can use yourself, by looking at the code…

function errorBefore(error, $insertBefore) {
  $('<p class="error">' + error + '</p>').insertBefore($insertBefore);
}

$(function () {

  $("#my-form").submit(function (e) {

    // remove any errors from previous form submission
    $('.error', $(this)).remove();

    // get jQuery reference to form fields
    var $name = $('#name');
    var $email = $('#email');

    // validate name input
    if (!$name.val()) {
      errorBefore('Please enter your name.', $name);
    }

    // validate email input
    if (!$email.val()) {
      errorBefore('Please enter your email.', $email);
    }

    // check for errors
    if ($('.error', $(this)).length) {

      // find first error and focus on form field it relates to
      $('.error', $(this)).first().next('input, select, textarea').focus();

      // stop form submission
      e.preventDefault();
    }

  });

});

SQL, NULL and the 42nd President

Virtually anyone who has written a SQL query will have encountered NULL column items. All of the text books repeat the same sermon:-

NULL is not equal to anything, not even itself.

..which of course means that if a field is not set (i.e. is NULL) it will be ignored by a query such as:-

select name,'good' from player where score >= 60
UNION ALL
select name,'poor' from player where score < 60 ; 

At first glance the above query would appear to return all players, poor and good. If however a player’s score value is not set, the query will not return that player. If the query is modified thus:-

 select name,'good' from player where score >= 60
UNION ALL
select name,'poor' from player where score < 60
UNION ALL
select name,'unknown' from player where score is NULL
;

..all players are returned. Note the use of IS NULL to ensure that rows with an undefined score are returned.

Name Rating
Jason good
Phineas poor
Medea unknown

However it is also in a sense correct to say:-

NULL is not not equal to anything

Confused?

Consider the following simple table, holding the name, year of coming to office, and current status of the President of the United Status (or POTUS):-

CREATE TABLE IF NOT EXISTS `potus` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255),
`year` SMALLINT,
`status` varchar(20),
PRIMARY KEY (`id`)
) ENGINE=innodb DEFAULT CHARSET=utf8;

We then populate the table with the holders of that particular job over the past century:-

insert into potus (name,year,status) VALUES
('Barack Obama', 2009,'current');

insert into potus (name,year,status) VALUES
 ('George W Bush', 2001,'former');

insert into potus (name,year) VALUES
 ('Bill Clinton', 1993);

insert into potus (name,year,status) VALUES
 ('George H Bush', 1989,'former');

insert into potus (name,year,status) VALUES
('Ronald Reagan', 1981,'deceased');

insert into potus (name,year,status) VALUES
('Jimmy Carter', 1977,'former');

insert into potus (name,year,status) VALUES
 ('Gerald Ford', 1974,'deceased');

insert into potus (name,year,status) VALUES
('Richard Nixon', 1969,'deceased');

insert into potus (name,year,status) VALUES
 ('Lyndon Johnson', 1963,'deceased');

insert into potus (name,year,status) VALUES
 ('John Kennedy', 1961,'deceased');

insert into potus (name,year,status) VALUES
('Dwight Eisenhower', 1953,'deceased');

insert into potus (name,year,status) VALUES
('Harry S Truman', 1945,'deceased');

insert into potus (name,year,status) VALUES
 ('Franklin Roosevelt',1933,'deceased');

insert into potus (name,year,status) VALUES
('Herbert Hoover', 1929,'deceased');

insert into potus (name,year,status) VALUES
('Calvin Coolidge', 1923,'deceased');

insert into potus (name,year,status) VALUES
 ('Warren Harding', 1921,'deceased');

insert into potus (name,year,status) VALUES
('Woodrow Wilson', 1913,'deceased');

Keen observers will note that an error was made when inserting the 42nd President, a Mr Clinton; his current status was not inserted into the table, and is thus NULL.

The following query thus, as you would expect, fails to return Mr Clinton, given that his status is not equal to ‘current’ or ‘former’:-

select name from potus where status IN ('current','former');

However you may think that this query, to return all presidents who are not deceased, would return Mr Clinton:-

select name, year from potus
where status !='deceased'
order by year desc
;

… but it does not. Mr Clinton’s status is NULL, and so it is not not equal to ‘deceased’.
NULL will not work with any regular comparitor (equals, not equals, less than etc).

The query produces:-

Name Year
Barack Obama 2009
George W Bush 2001
George H Bush 1989
Jimmy Carter 1977

The following query returns any live presidents, plus any whose health is undefined:-

select name,year from potus where
 (status !='deceased' or status is NULL)
order by year desc
;
Name Year
Barack Obama 2009
George W Bush 2001
Bill Clinton 1993
George H Bush 1989
Jimmy Carter 1977

No more undefined index using PHP’s $_GET and $_POST

It’s very annoying having to check whether $_GET['somevalue'] is set before checking its value – but if you don’t, your code is going to throw a a lot of Notice: Undefined index warnings and that’s never good.

I’ve written a quick class to make every PHP developer’s life a bit easier. It’s based on CodeIgniter‘s $this->input class. You can use it as follows and never have to worry about whether the index exists or not… It will just get set to NULL if it doesn’t.

class Input {
     function get($name) {
          return isset($_GET[$name]) ? $_GET[$name] : null;
     }

     function post($name) {
          return isset($_POST[$name]) ? $_POST[$name] : null;
     }

     function get_post($name) {
          return $this->get($name) ? $this->get($name) : $this->post($name);
     }
}
$input = new Input;

$page = $input->get('page'); // // look in $_GET
$page = $input->post('page'); // // look in $_POST
$page = $input->get_post('page'); // look in both $_GET and $_POST

MD5 C++ Class

Introduction

The MD5 algorithm produces a 128 bit hash of a byte array (often text).

Don’t Use MD5

MD5 is now considered a “broken” hash and should now no longer be used in high security situations.

OK, If You Must Use MD5…

If you still wish to use MD5, for example to hash user passwords, always add a salt before hashing, to prevent a dictionary attack.

The MD5 class was derived from various C++ examples. The class is thread safe (an instance must be created for each thread) and uses no memory allocation.

In the MD5.h file, note the definition of an unsigned 32 bit integer; you may need to modify this.

typedef unsigned int MD5_UINT32;

There are 6 public functions, all named Compute, in two groups:-

  • to return the MD5 hash as an ASCII (8 bit) string
  • to return the MD5 hash as a wide (UTF-16) string

For each type above there is a Compute function that accepts a “wide” (UTF-16) string; there is an optional parameter to specify if the string should be converted to UTF-8 before hashing. If the string is known to be 8-bit (Unicode 0xff or less), set this to the default “false”.

Use the CMD5 class (files MD5.cpp and MD5.h). The CMD5 class also uses the CUnicode class (file Unicode.h) which has a single static public function.

To use, create an instance of CMD5. Do not share that instance with other threads. Creation of the instance on the stack is recommended. Each public method returns an “unsafe” pointer to either an ASCII or UTF-16 string which is contained in the class instance and is only safe to be used whilst the class remains in scope, or before it is deleted or re-used.

The classes may be downloaded here (6.1Kb zip).

The code below demonstrates the use of all 6 public functions.


#include "md5.h"

//create an instance of MD5 on the stack
//which we will re-use for each example
CMD5 mx;

unsigned char test_array[5] = { 1, 2, 3, 5, 7 };

const wchar_t* res1 = mx.Compute("test string");
//
// res1 is safe to use until mx is re-used
// normally this would be copied to a safe location
// or used immediately
//

// version using 'wide' source string. Specify whether to 
// convert the string to utf-8
// (needed if any characters above u+ff are present)
const wchar_t* res2 = mx.Compute(L"test string", false);
// version using byte array
const wchar_t* res3 = mx.Compute(test_array,5);

//methods returning 8 bit string MD5
const char* res4 = mx.Compute_8("test string");
const char* res5 = mx.Compute_8(L"test string", false);
const char* res6 = mx.Compute_8(test_array, 5);

MD5 for sencha touch

There’s no native MD5 functionality in JavaScript, but Joseph Meyers has written his own function, which accoring to this stackoverflow thread is the fastest one in town!

It didn’t take much work, but I’ve converted it into a utility class, ready to drop into your Sencha Touch projects.

Ext.define('MyApp.util.Crypto', {
singleton : true,
alternateClassName : ['Crypto'],

constructor: function(config) {
this.initConfig(config);
},

md5cycle: function(x, k) {
var me = this;
var a = x[0], b = x[1], c = x[2], d = x[3];

a = me.ff(a, b, c, d, k[0], 7, -680876936);
d = me.ff(d, a, b, c, k[1], 12, -389564586);
c = me.ff(c, d, a, b, k[2], 17,  606105819);
b = me.ff(b, c, d, a, k[3], 22, -1044525330);
a = me.ff(a, b, c, d, k[4], 7, -176418897);
d = me.ff(d, a, b, c, k[5], 12,  1200080426);
c = me.ff(c, d, a, b, k[6], 17, -1473231341);
b = me.ff(b, c, d, a, k[7], 22, -45705983);
a = me.ff(a, b, c, d, k[8], 7,  1770035416);
d = me.ff(d, a, b, c, k[9], 12, -1958414417);
c = me.ff(c, d, a, b, k[10], 17, -42063);
b = me.ff(b, c, d, a, k[11], 22, -1990404162);
a = me.ff(a, b, c, d, k[12], 7,  1804603682);
d = me.ff(d, a, b, c, k[13], 12, -40341101);
c = me.ff(c, d, a, b, k[14], 17, -1502002290);
b = me.ff(b, c, d, a, k[15], 22,  1236535329);

a = me.gg(a, b, c, d, k[1], 5, -165796510);
d = me.gg(d, a, b, c, k[6], 9, -1069501632);
c = me.gg(c, d, a, b, k[11], 14,  643717713);
b = me.gg(b, c, d, a, k[0], 20, -373897302);
a = me.gg(a, b, c, d, k[5], 5, -701558691);
d = me.gg(d, a, b, c, k[10], 9,  38016083);
c = me.gg(c, d, a, b, k[15], 14, -660478335);
b = me.gg(b, c, d, a, k[4], 20, -405537848);
a = me.gg(a, b, c, d, k[9], 5,  568446438);
d = me.gg(d, a, b, c, k[14], 9, -1019803690);
c = me.gg(c, d, a, b, k[3], 14, -187363961);
b = me.gg(b, c, d, a, k[8], 20,  1163531501);
a = me.gg(a, b, c, d, k[13], 5, -1444681467);
d = me.gg(d, a, b, c, k[2], 9, -51403784);
c = me.gg(c, d, a, b, k[7], 14,  1735328473);
b = me.gg(b, c, d, a, k[12], 20, -1926607734);

a = me.hh(a, b, c, d, k[5], 4, -378558);
d = me.hh(d, a, b, c, k[8], 11, -2022574463);
c = me.hh(c, d, a, b, k[11], 16,  1839030562);
b = me.hh(b, c, d, a, k[14], 23, -35309556);
a = me.hh(a, b, c, d, k[1], 4, -1530992060);
d = me.hh(d, a, b, c, k[4], 11,  1272893353);
c = me.hh(c, d, a, b, k[7], 16, -155497632);
b = me.hh(b, c, d, a, k[10], 23, -1094730640);
a = me.hh(a, b, c, d, k[13], 4,  681279174);
d = me.hh(d, a, b, c, k[0], 11, -358537222);
c = me.hh(c, d, a, b, k[3], 16, -722521979);
b = me.hh(b, c, d, a, k[6], 23,  76029189);
a = me.hh(a, b, c, d, k[9], 4, -640364487);
d = me.hh(d, a, b, c, k[12], 11, -421815835);
c = me.hh(c, d, a, b, k[15], 16,  530742520);
b = me.hh(b, c, d, a, k[2], 23, -995338651);

a = me.ii(a, b, c, d, k[0], 6, -198630844);
d = me.ii(d, a, b, c, k[7], 10,  1126891415);
c = me.ii(c, d, a, b, k[14], 15, -1416354905);
b = me.ii(b, c, d, a, k[5], 21, -57434055);
a = me.ii(a, b, c, d, k[12], 6,  1700485571);
d = me.ii(d, a, b, c, k[3], 10, -1894986606);
c = me.ii(c, d, a, b, k[10], 15, -1051523);
b = me.ii(b, c, d, a, k[1], 21, -2054922799);
a = me.ii(a, b, c, d, k[8], 6,  1873313359);
d = me.ii(d, a, b, c, k[15], 10, -30611744);
c = me.ii(c, d, a, b, k[6], 15, -1560198380);
b = me.ii(b, c, d, a, k[13], 21,  1309151649);
a = me.ii(a, b, c, d, k[4], 6, -145523070);
d = me.ii(d, a, b, c, k[11], 10, -1120210379);
c = me.ii(c, d, a, b, k[2], 15,  718787259);
b = me.ii(b, c, d, a, k[9], 21, -343485551);

x[0] = me.add32(a, x[0]);
x[1] = me.add32(b, x[1]);
x[2] = me.add32(c, x[2]);
x[3] = me.add32(d, x[3]);

},

cmn: function(q, a, b, x, s, t) {
var me = this;
a = me.add32(me.add32(a, q), me.add32(x, t));
return me.add32((a << s) | (a >>> (32 - s)), b);
},

ff: function (a, b, c, d, x, s, t) {
return this.cmn((b & c) | ((~b) & d), a, b, x, s, t);
},

gg: function(a, b, c, d, x, s, t) {
return this.cmn((b & d) | (c & (~d)), a, b, x, s, t);
},

hh: function(a, b, c, d, x, s, t) {
return this.cmn(b^c^d, a, b, x, s, t);
},

ii: function(a, b, c, d, x, s, t) {
return this.cmn(c^(b | (~d)), a, b, x, s, t);
},

md51: function(s) {
var me = this;
txt = '';
var n = s.length,
state = [1732584193, -271733879, -1732584194, 271733878],
i;
for (i = 64; i <= s.length; i += 64) {
me.md5cycle(state, me.md5blk(s.substring(i - 64, i)));
}
s = s.substring(i - 64);
var tail = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0];
for (i = 0; i < s.length; i++)
tail[i >> 2] |= s.charCodeAt(i) << ((i % 4) << 3);
tail[i >> 2] |= 0x80 << ((i % 4) << 3);
if (i > 55) {
me.md5cycle(state, tail);
for (i = 0; i < 16; i++)
tail[i] = 0;
}
tail[14] = n * 8;
me.md5cycle(state, tail);
return state;
},

md5blk: function(s) {
var md5blks = [],
i;
for (i = 0; i < 64; i += 4) {
md5blks[i >> 2] = s.charCodeAt(i)
+ (s.charCodeAt(i + 1) << 8 )
+ (s.charCodeAt(i + 2) << 16)
+ (s.charCodeAt(i + 3) << 24);
}
return md5blks;
},

hex_chr: '0123456789abcdef'.split(''),

rhex: function(n) {
var s = '',
j = 0;
for (; j < 4; j++)
s += this.hex_chr[(n >> (j * 8 + 4)) & 0x0F]
+ this.hex_chr[(n >> (j * 8)) & 0x0F];
return s;
},

hex: function(x) {
for (var i = 0; i < x.length; i++)
x[i] = this.rhex(x[i]);
return x.join('');
},

md5: function(s) {
return this.hex(this.md51(s));
},

add32: function(a, b) {
return (a + b) & 0xFFFFFFFF;
}

})

Include in app.js


requires: [ 'MyApp.util.Crypto']

Use like this:


alert(Crypto.md5('hello'));

Shannon-Fano Compression Explained and Demonstrated in Native PHP

Introduction

This article will explain how Shannon-Fano coding works. Named after after Claude Shannon and Robert Fano, apart from run length encoding, this is probably the simplest form of lossless compression. I also include some PHP code to demonstrate compression and decompression natively.

No deep knowledge of mathematics or compression is necessary, other than a basic knowledge of binary.

About Compression

Compression is Everywhere

Just about everybody uses data compression today, probably without realising it. The DVD or  Bluray that you watch, the MP3 you enjoy, and the digital TV that is now the only type in the UK, all use compression to reduce the size of the data that is stored or sent to you. The images on web pages, and sometimes even the web pages, are also compressed.

Lossy Compression

The compression algorithms used by DVDs, Blurays, MP3s and some computer images (such as JPG files) use a form of compression called lossy simply because it does not reproduce he original data perfectly, it acheives a greater level of compression by removing the parts of the video, picture or sound that are not needed to enjoy the experience.

Lossless Compression

Lossless compression is the counterpart to lossy – the original data is returned unchanged when the compressed data is uncompressed. If you extract files from a ZIP or RAR file, these are returned to you exactly as they were before they were added to the ZIP or RAR file.

This article, and the article to follow, deal only with lossless compression.

Compressing a Short Text Example

Let’s start with a simple example. We will compress the word TATTOO. If we consider only a simple character string for the moment, TATTOO requires 6 bytes of storage.

This article will show how TATTOO can be compressed into just 2 bytes using Shannon-Fano encoding. The detailed explanation of Shannon-Fano is below, but you don’t need it yet.

When a computer reads TATTOO from the 6 bytes of storage, it simply reads from the sequential bytes; each byte contains 8 bits of binary data.

The binary representation of TATTOO is:-

01010100 01000001 01010100 01010100 01001111 01001111

..that is, 6 bytes or 48 binary bits.

If we use Shannon-Fano to encode TATTOO, it is reduced to just:-

10011010 1

…or just nine binary bits. The compressed data requires two bytes of storage (it almost fits into one).

How is it done? Here’s how.

Bits into Characters

The 9 bits 100110101 comprise the word TATTOO but unlike the uncompressed data, 8 bits does not represent a single character; the number of bits that represent each character is variable. That’s how it compresses TATTOO into just 9 bits.

You will see shortly that the 9 bit compressed data 100110101 comprises these bits to store the characters:-

BITS Letter
1 T
00 A
1 T
1 T
01 O
01 O

How does the decompression software “know” how many bits make up each character? How does it “know”, for example, that the first bit represents ‘T’, and the next 2 comprise ‘A’, as revealed in the table above? It uses a binary tree.

Binary Tree

The decompression software is supplied with a binary tree which it uses to decode the bitstream that is the compressed data.

The decoder traverses the tree for once for each compressed element (character, in this instance). This is a simple and therefore fast operation for a computer to execute.

A binary tree can be visualised by reference to the illustration below, showing the options to decode the first 3 binary digits of an encoded element.

The tree is traversed by taking the left branch at each node if the current binary digit is 1, and the right branch if it’s zero.

Using A Binary Tree to Decompress

Now we will use the actual binary tree which will be used to decompress the 9 bit compressed data sample. To decompress that data (100110101) we follow these steps. Please refer to the illustration below.

  1. Start at the Root, at the top.
  2. Read each compressed binary digit from left to right. The first digit is 1. If a digit is 1, take the left fork, and if it’s 0 take the right.
  3. Moving from Root we thus branch left because the first digit is 1.
  4. If there are no futher branches possible from our new position, and there is data at that node (a leaf node), use the letter that is at the new position, as the first decoded character. It’s T. Note that the illustration below also shows the binary bits that were used to decode this character, in this case just 1.
  5. After each character is decoded move back to Root. Read the next binary digit, which is 0. Take the right branch this time, from Root. If there are no further branches at the new position, use the letter; however this time, there are further branches.
  6. Read the next binary digit, 0. Again, move left for 1, or right for 0, so move right. At the new position you will see the numbers 00 which represents the bits you have used so far to decode this letter. We’ve arrived at a node which has no further branches but has a letter. We now have the second letter, A.
  7. Return to Root to decode the third character. The next bit is 1. Move left from Root to reach a node with no further branches and the letter T. We have character number 3.
  8. Repeat again,and once again we use the next bit which is 1; the movement down the tree gets the fourth character, T.
  9. The next bit is 0. Take the right branch from Root and then use the next bit, which is 1, and then take the next right fork to arrive the fifth letter, O.
  10. The final bits are 0 and 1, and you may have spotted that this is identical to the previous character; the sixth and final decoded character is O.
  11. We now have our complete decoded word, TATTOO, 48 bits extracted from just 9!

A Complete Compressed Message

When the source data is compressed, the software assigns the characters to be encoded, to the tree nodes, attempting to create a tree that will yield the most efficient compression of the source data.

We have already seen an example of a compressed bitstream; the complete compressed data includes:-

  1. The length of the uncompressed data
  2. Data telling how to construct the binary tree that was used to compress the data
  3. The compressed data

The items 1 and 2 are a necessary overhead that must accompany the compressed data.

Creating the Binary Tree using Shannon-Fano

The Shannon-Fano algorithm used to create the binary tree to compress and decompress is very simple.

  1. Create a empty binary tree. Set the current position to the root
  2. Create a frequency table for all elements present in the source data
  3. Sort the table by frequency so that the most common element is at the start
  4. Split the table so that the total frequencies in both parts are as close as can be. The most common symbols are in the “left” portion, the least in the “right”. You now have two parts.
  5. Work on each part. Split the part so that the total frequencies in both parts are as close as can be.
  6. Repeat 5 until the part has 2 or less symbols.
  7. Assign digits for each part; the left portion is assign 1, the right is assigned 0
  8. Repeat for all parts.
Symbol Frequency>
T 3
O 2
A 1

In this example, we can clearly bisect the symbols by frequency; the most common (T) has a total of 3. After we have divided, the common portion only has one symbol (T), so we add it to the empty tree. This leaves O and A in the remaining section, so these are added to the tree.

Compressing Data using the Binary Tree

Compression is the reverse of the decompression process explained earlier. Using a binary tree created as above, do this for each character in the uncompressed text:-

  1. Find the leaf node for the current character.
  2. Work from that node up to the root. Repeat 2 until you are at the root (recursively).
  3. Add a 1 to the final output if your move up was from the left branch, 0 if from the right. Repeat 3 until all calls at 2 are done.

There is, However, One Small Problem

Shannon-Fano is not very good. The algorithm to assign the bits to symbols does not produce the best compression results. Shannon-Fano is generally not used now; Huffman coding and other methods have replaced it.

Using Shannon-Fano (Regardless) for PHP

If you would like to use Shannon-Fano in PHP, we have prepared a PHP class which compresses and decompresses text in memory. It was created as a means to demonstrate Shannon-Fano, but it could be utilised.

Typical uses could be:-

  • Storing large text in a database BLOB in a compressed form.
  • Compressing binary data.

If you don’t wish to use PHP compression libraries, or are unable to do so, or if you are interested in compression, consider using the class.

Our PHP Class Shannon.php

The class has just four public functions:-

  • compressText which as its name implies, compresses text, producing a byte array.
  • expandText which expands a byte array that was previously compressed from text
  • compressBin which compresses a byte array, producing another byte array.
  • expandBin which expands a byte array that was previously compressed from a byte array

The code snippet below demonstrates simply the use of the class:-


<?php

require('Shannon.php');

$instance = new Shannon();

$text = "More ending in death, but this time it sounds like a ";
$text.= "solace after life. I lingered round them, under that ";
$text.= "benign sky; watched the moths fluttering among the ";
$text.= "heath, and hare-bells; listened to the soft wind ";
$text.= "breathing through the grass; and wondered how any one ";
$text.= "could ever imagine unquiet slumbers ";
$text.= "for the sleepers in that quiet earth.";

echo "text len=".strlen($text)." characters\n";

$enc_ar = $instance -> compressText($text);

echo "encoded len=".count($enc_ar)." bytes\n";

$org_text = $instance -> expandText($enc_ar);

if(strcmp($org_text,$text)==0)
{
    echo "decoded text matches\n";
}
else
{
    echo "decoded text DOES NOT match\n";
}

?>

The above text (the end of Wuthering Heights) comprises 333 characters. The resulting compressed byte array is 227 bytes in length

The PHP class is available to download here.

Beyond Shanon-Fano

A forthcoming blog post will explain and demonstrate Huffman coding, a similar but more efficient method.

  • compressText which as its name implies, compresses text, producing a byte array.
  • expandText which expands a byte array that was previously compressed.

Sencha + PhoneGap for Android tutorial!

I just spent quite a while trying to get Sencha + PhoneGap to play nice and install the demo app on my Android phone, so here are the installation steps for anyone else unfortunate enough to be as clueless as me when it comes to these things!

1. Download Sencha Touch (http://www.sencha.com/products/touch/) and extract contents to c:\wamp\www\ (or whatever web server you are using’s www directory)

2. Download and install Sencha CMD (http://www.sencha.com/products/sencha-cmd/download)

3. Open a command prompt at C:\wamp\www\sencha-touch-2.3.1a-commercial\touch-2.3.1

4. Run “sencha generate app YourAppName ../../YourAppFolder”

5. CD to ../../YourAppFolder

6. Run “sencha phonegap init”

7. Edit YourAppFolder/phonegap.local.properties and change:

phonegap.platform=ios

to

phonegap.platform=android

8. Connect your android device via USB and enable USB debugging on the device.

9. Run “sencha app build -run native”

10. Done! You should see the demo app running on your device 🙂

Any questions – ask away!

Visual C++ Runtime and Static Linking Made Simple

Introduction

This is a new explanation of an old topic, hoping to answer developer and user questions about the use of the Visual C++ Runtime component by Windows applications, and indeed the non-use of the component by those applications that link to Windows “statically”.

What is the Runtime C++ Component?

It’s analogous to the .NET framework or the Java runtime. It provides an environment in which C++ applications created with Visual C++ (within Visual Studio) are able to run on a PC. The applications hook into the runtime component to connect to Windows rather than carrying the code within themselves.

The component is usually installed or updated if needed, by applications that require it, but it can be downloaded at no cost from Microsoft.

Linking to the Runtime C++ component is also known as linking dynamically. Your application will use the relevant DLL which will be loaded only once, into memory, and is shared by all applications running on the computer (the code itself is shared; each instance using it enjoys its own memory space, with its own stack etc).

Advantages of Dynamic Linking

  • The application executable is significantly smaller
  • Only one copy of the relevant code is present on the machine at runtime, and can be shared by multiple applications
  • Security and other fixes are applied to the runtime with no need to update applications in order to apply the same fixes

Advantages of Static Linking

  • Simpler deployment and installation – no need to install or update the C++ runtime
  • Startup can be faster

Linking to Windows in Visual Studio

Many articles on this subject include the names of the various libraries that are used in different configurations, and the command line switches. These are not included here; we instead show the project settings to be made from within Visual Studio.

All screenshots are from Visual Studio 2008.

A Simple Demonstration Project

In order to demonstrate static an dynamic linking, I created a simple library (testlib.lib), not a DLL, which will be linked into our main program. This is the library within our solution:-

The main program is a simple Windows 32 console application (testapp.exe) that uses the above library. The application and the library form the entire solution:-

Debug or Release

When working in a debug configuration, the linking method is not normally important, provided that each project within the solution is linked to Windows in the same manner. I normally choose the default: linking dyamically to Windows.

The examples illustrated below all refer to a release configuration.

How to Link a Project Dynamically to Windows

Each project within the solution must be linked to Windows in the same manner.

In the solution explorer pane, right click on a project, then click on properties:-

The default linking method in Visual Studio 2008 (and earlier) is dynamically,which is here described as Multi-threaded DLL.

Repeat for all projects in your solution.

How to Link a Project Statically to Windows

Follow the steps illustrated below, but select Multi-threaded.

Repeat for all projects in your solution.

Linking an MFC Project

If your project uses MFC, in order to change the linking method, go tothe project property pages, general section. Set the Use of MFC item to static or dynamic:-

Understanding Linking Errors

Linking errors can be confusing, and harder to understand than comilation problems.

If the linker complains that items are already defined in LIBCMT or that something is already defined in msvcrt.lib your first action should be to verify that all projects within your solution are linked in the same manner.

Excluding Libraries

Avoid if possible.

Normally you should never need to exclude all, or specific libraries, unless you are linking to a third-party library, and that in itself can cause problems if there are conflicts.

If, for example, you are linking your prioject statically to Windows but wish to link to a third party static library (not a DLL) which has been compiled to link dynamically to Windows, you will see conflicts which can be removed by excluding a library, but this is not recommended.