A Scrum Of One: March 2011

Thursday, March 10, 2011

Never leave well enough alone…until you understand what’s going on!

We now have built a fast generator for predictably unique but ordered Guids.

Before the profiler indicated that DateTime.Now.Ticks was quite an expensive call, I suspected the process of reversing the bytes in a 64-bit integer was too unwieldy.

Having addressed the Ticks counter issue, I decided to investigate the byte-reversing operation anyway. They don’t call me a geek for nuthin’!

This is the code I had my suspicions about:

public static byte[] GetReversedBytesSlow(this Int64 value)
{
    var rgb = BitConverter.GetBytes(value);
    Array.Reverse(rgb);

    return rgb;
}

You can tell that I was looking back fondly to my C/C++ days, where we could cast the integer to an array of 8 bytes and do some pointer math!

Entering ‘unsafe’ territory…

I created a new Class Library project, and marked it as containing unsafe code.

Then I went at the unsafe code leaving caution to the winds:

public static unsafe class Convert
{
    private static byte[] GetReversedBytesFor64BitValue(byte * rgb)
    {
        return new[]
               {
                   rgb[7], rgb[6], rgb[5], rgb[4], rgb[3], rgb[2], rgb[1], rgb[0]
               };

    }

    public static byte[] GetReversedBytes(this UInt64 _this)
    {
        return GetReversedBytesFor64BitValue((byte*)&_this);
    }

    public static byte[] GetReversedBytes(this Int64 _this)
    {
        return GetReversedBytesFor64BitValue((byte*)&_this);
    }
}

Now, that felt really good to do Smile

By hand-unrolling the pointer arithmetic, I was sure that I would definitely be faster than the general reversal algorithm that Array.Reverse would use.

The proof of the pudding…

First, we need to make sure we’re doing the right thing in our unsafe speed-demon.

We write a test:

private static void EnsureArraysAreIdentical(byte[] expected, byte[] actual)
{
    Assert.AreEqual(expected.Length, actual.Length);

    for (var i=0;i<expected.Length; i++)
    {
        Assert.AreEqual(expected[i], actual[i]);
    }
}

private const long TEST_MIN = -10000000L;
private const long TEST_MAX = 10000000L;

[TestMethod]
public void TestReverseBytes64Bit()
{
    for(var l = TEST_MIN; l < TEST_MAX; l++)
    {
        EnsureArraysAreIdentical(GetReversedBytesSlow(l), l.GetReversedBytes());
    }
}

which tests both reversal methods for identical results from –10 million to +10 million, zero inclusive.

The results speak for themselves:

Now to see if we’ve actually made any difference in the performance!

Nice!

We’ve improved the performance almost 3x – it’s just that the results only become noticeable after 20 million integer reversals!

Now that’s what I call a productive afternoon!

Making it fast!

In the last post, I outlined a strategy to generate Guids in a strictly increasing order to address the issues outlined here.

While Guid.NewGuid() natively takes quite a bit of time to create a random Guid, it would be interesting to see if our implementation is slower, or faster – and to investigate ways to speed it up.

Our first run

Hmm.. that seems to be pretty good – but inspecting the code should point a steady finger at the Thread.Sleep(1) call. So how does one make the sequence increase, without the Sleep() call.

Enter Interlocked.Increment()

A strategy to ensure monotonically increasing numbers would be to initialize a counter with DateTime.Now.Ticks, and then in the generation method, to increment it in a thread-safe manner – thus:

private static long _counter;

public static Guid NewSequentialGuid()
{
    lock (LockObject)
    {
        Interlocked.CompareExchange(ref _counter, DateTime.Now.Ticks, 0);
        Interlocked.Increment(ref _counter);

        var rgb = BitConverter.GetBytes(_counter);
        Array.Reverse(rgb);

        return new Guid(_a, _b, _c, rgb);
    }
}

The tests continue to pass, assuring us that the Guids are still unique. Measuring the performance, we get:

Much better – we’ve made things around twice as fast. I would leave things as they are now – except I’m quite curious to see if I can extract some more performance out of it.

The importance of profiling

We have only five lines of code there – where do we start speeding things up?

The best way to find out would be to fire up the profiler in Visual Studio 2010 and run the program in there. That points us to:

Hmm… looks like we’re spending most of our time getting the Ticks count, even if we never use it more than once – definitely wasteful.

Why don’t we cache the Ticks count once, and just use the Interlocked.Increment call to make it unique?

private static long _counter = DateTime.Now.Ticks;

public static Guid NewSequentialGuid()
{
    lock (LockObject)
    {
        Interlocked.Increment(ref _counter);

        var rgb = BitConverter.GetBytes(_counter);
        Array.Reverse(rgb);

        return new Guid(_a, _b, _c, rgb);
    }
}

That results in

Pretty good. So let’s look at a sequence of Guids created 50ms apart.

The implication of this is that every time the OrderedGuid class is initialized, the _counter would get set, and thereafter it would be a simple incremented sequence – even though there was a time delay of 50ms between each Guid being created.

If we wanted the creation of the Guid to be more ‘random’ and be more closely tied to the Ticks count, we could keep the _counter refreshed every so often with a Timer.

private static readonly Timer Timer = new Timer(ResetCounter, null, 0, 100);

private static void ResetCounter(object arg = null)
{
    lock (LockObject)
    {
        _counter = DateTime.Now.Ticks;
    }
}

static OrderedGuid()
{
    lock (LockObject)
    {
        ResetCounter();
    }
}

This would keep the _counter up to date reasonably frequently.

As you can see, we have a good somewhat random distribution between Guids created 50ms apart.

We’ve done this without appreciably slowing things down – given the coarse granularity of the tick-counting approach to timing, all we can say is that we haven’t made things very much different by introducing the timer.

And now we have a viable, fast way of generating predictably-unique, ordered Guids.

Wednesday, March 9, 2011

The First Cut…

The first, naïve implementation of NewSequentialGuid() was:

public static Guid NewSequentialGuid()
{
    lock (LockObject)
    {
        var ticks = DateTime.Now.Ticks;
        return new Guid(_a, _b, _c, BitConverter.GetBytes(ticks));
    }
}

Exercising this code to print out 10 Guids in a tight loop yielded the following results:

Two points leap to our notice:

Several of the Guids are identical – the red circled records are identical.
When the Guids increase, they seem to be increasing at the most significant end of the last two segments of the Guid – the orange circled column is increasing downwards.

Let’s fix the first problem first by introducing a forced delay before creating the Guid, ensuring that the tick count of DateTime has a chance to increment itself.

Fixing the second one requires us to reverse the bytes of the 64-bit integer passed to the last argument to the Guid constructor.

The modified code looks thusly:

public static Guid NewSequentialGuid()
{
    lock (LockObject)
    {
        Thread.Sleep(1);

        var ticks = DateTime.Now.Ticks;
        var rgb = BitConverter.GetBytes(ticks);
        Array.Reverse(rgb);
        return new Guid(_a, _b, _c, rgb);
    }
}

which generates the following result:

Now we’re generating Guids that grow in a reasonable order, and they’re unique.

To further verify that the Guids are unique, we can write a quick test to generate a million or so Guids, put them in a list, sort the list, and ensure that no two Guids are the same.

[TestMethod]
public void TestGuidsAreUnique()
{
    var _guids = new List<Guid>(C_MAX_GUIDS);

    for (var i = 0; i < C_MAX_GUIDS; i++)
    {
        _guids.Add(OrderedGuid.NewSequentialGuid());
    }

    VerifyItemsAreUnique(from _g in _guids orderby _g.ToString() select _g);
}

or even better, with a multi core machine where there can be truly concurrent creation of Guids, we could write this, which creates a million Guids using Parallel.For() into a ConcurrentBag<T>:

[TestMethod]
public void TestGuidsAreUniqueMT()
{
    var _guids = new ConcurrentBag<Guid>();

    Parallel.For(0,
                 C_MAX_GUIDS,
                 () => 0,
                 (_, __, ___) =>
                 {
                     _guids.Add(OrderedGuid.NewSequentialGuid());
                     return _guids.Count;
                 },
                 _ => { });

    Assert.AreEqual(C_MAX_GUIDS, _guids.Count);

    VerifyItemsAreUnique(from _g in _guids orderby _g.ToString() select _g);
}

We are relieved to notice that both our tests pass. So we can now use this OrderedGuid class in our database-based applications.

Enter the ‘Ordered’ Guid

So in the previous post, I talked about the desire for something with the uniqueness of Guids with the ordered-ness of integers.

Microsoft already have something like this in the database from SQL Server 2005 onwards, but in my mind, this takes manifests the worst aspects of both integer and Guid primary keys.

For starters, NEWSEQUENTIALID() is a database function, which implies that the database generates the id for you – which makes it functionally equivalent to an auto-incremented integer, with the added cost of the Guid’s size.

The true strength of the random Guid is that the id is predictably unique before the insert operation – so that sets of related rows can be inserted directly.

So taking a leaf out of that book, I decided to write a predictably unique Guid, which is also ordered, and reasonably quick to create.

General Principle

The following constructor for Guid seems like a good staring point:

public Guid(int a, short b, short c, byte[] d)

We could somehow configure

a to be a (random number) to represent a realm or domain
b to be a (random number) to represent a server in that domain
c to be a (random number) to represent an application on that server

and then provide a monotonically increasing 64-bit value for d, and we would be in business.

The Guid could be self-constructed external to the database, be predictably unique, and be ordered as well.

Setting up the Context Arguments

I created a new Class Library for this OrderedGuid class.

I used the Settings tab to create three parameters along with some default values. These values will persist after any changes made by making changes to the Class Library’s app.config file.

In the static constructor of the OrderedGuid class, set up the parameters with random values.

public static class OrderedGuid
{
    private static int _a;
    private static short _b;
    private static short _c;

    private static readonly object LockObject = new object();
    
    static OrderedGuid()
    {
        lock(LockObject)
        {
            var randomNumberGenerator = new Random();
    
            _a = (int) Settings.Default.Realm_UniqueID;
            while (_a == 0)
            {
                _a = randomNumberGenerator.Next();
            }
    
            _b = (short) Settings.Default.Server_UniqueID;
            while (_b == 0)
            {
                _b = (short) randomNumberGenerator.Next(short.MaxValue);
            }
    
            _c = (short) Settings.Default.Application_UniqueID;
            while (_c == 0)
            {
                _c = (short) randomNumberGenerator.Next(short.MaxValue);
            }
        }
    }

    public static Guid NewSequentialGuid()
    {
        throw new NotImplementedException();
    }
}

We’ll discuss and refine the NewSequentialGuid() implementation shortly. Stay tuned…

A familiar debate…

Those of us who write database-based applications are familiar with several choices that have to be made when designing the database. One of these is the selection of the primary key type.

Several schools of thought exist on the matter, and the discussions generally descend into the realms of religious dogma. There is no single hard-and-fast correct answer.

I am steering clear of several more fundamental debates in this post – specially the discussion of natural vs. surrogate primary keys. We’ll assume that we’re using surrogate primary keys in keeping with the general nature of the post.

From what I’ve learned over the years, the important characteristics of a primary key are as follows:

1) It must to be unique across the table. A typical approach would be to generate a sequence (or let the database automatically create one) and obtain a unique integer value as the primary key when a new row is inserted in a table.

This approach has several issues – first among which is that the primary key for a row isn’t known until after the row is inserted, which makes inserting a set of related rows somewhat involved.

Furthermore, in real life, databases are backed up and restored, and potentially copies of the databases need to be synchronized. So it would be nice if a primary key were globally unique, and not auto-generated by the database. This can quite easily be achieved by making a Guid the primary key type.

2) It must be easy to index (sort) by, since primary keys of one table can be foreign keys in others, and filtering by foreign keys is a very common operation.

The (potentially automatically generated) integer primary key is perfect suited to fill this need.

A Guid primary key is atrociously bad for this operation, since the globally-unique nature of a Guid is obtained by making it somewhat random. Guids do not sort easily, and in code, they natively do not even support ordering – only equality-comparison.

If only there were a way to get the globally unique nature of a Guid with the ordered nature of integers…

Hello World

G’day!

I’m an ‘agile’ software architect by day and a raging .NET code-monkey by night. Here are some of my adventures in software engineering…

Hope you enjoy reading them, and let me know if there’s something new I should be looking at!

- Sof1