Sunday, December 04, 2005

Writing Clear Code

This topic is a bit of a divergence from Custom Actions, but useful for some general coding skills. I'm going to highlight the C# language in this post, although you don't need to know C# to benefit from this discussion - I'll cover the essentials in the post. The topic is how to write clear code that makes it easier to defer the understand the intention of the code rather than the behavior of the code. The subject came up about a few months ago in a discussion with a coworker who initially didn't quite agree with me. I'll lay out the argument here as I did with him.

Take for instance the following code snippet:

string a = "hello";
string b = "hello";
System.Console.WriteLine(a == b);
System.Console.WriteLine((object)a == (object)b);
Lets explore the last two lines. The a == b equivalence operator is working on strings, and by C# rule, two strings are equivalent if they are both null, or if both values are non-null references to string instances that have identical lengths and identical characters in each character position. Translation - two nulls are identical, and any two strings that match, case sensitive, will result in a true expression. In the latter comparison, when we cast both of the strings to objects, we are not comparing the values of the strings, but the objects themselves.

What do you think it will print? The answer is True and True. Why? String literals that are identical within the same assembly refer to the same instance. Translated, 'a' and 'b' are variables that refer to the same underlying object.

Now lets replace the declaration and assignment for string b with:
string b = "he"+"llo";
Now what do you think it will print? The answer is again True and True. Why? Because the Lexical analysis (lexer) in the compiler stripped out the needless additive operator in the literal. The same would have happened if we changed the string to be "hell\u006f" - since '\u006f' is the Unicode escape sequence of the lower-case letter 'o', and this expansion takes place (most likely) prior to the lexer during the transformation phase in the compilation process.

Now lets replace the same line as above with the following two lines:
string b = "he";
b += "llo";
Now what do you think it will print? The answer is now True and False. Why? The strings are equivalent, but since the preprocessing of the source file before the compilation did not identify the strings as being the same literal, they are in different objects. In fact, because a string is immutable (once created cannot be changed), there was the construction of the string "b" during declaration and initial assignment that was later garbage collected when a new instance of "b" was created during the string concatenation.

Now that the necessary background information is understood, can you tell me the intention of the following code snippet?
string a,b;
//code here that assigns and manipulates a and b
...
if (a==b)
compareOK();
else
compareBad();
I'm hoping you are going to tell me that you have no clue as to the intent of the programmer. The intent could have been to compare the two strings in a case sensitive manner, ignoring cultural rules (which is what actually is happening in the above snippet). It also could have been to compare object equivalence, just that the programmer forgot to cast the strings to object first. The intention could also have been a case-insensitive comparison. Another possible intent was to compare strings using cultural rules. The intent of the programmer simply is not clear!

Rewriting the comparison line in the above snippet correctly, assuming the actual behavior of the code was the intent should look like this:
if ( String.Compare(a, b, false,
System.Globalization.CultureInfo.InvariantCulture ) == 0 )
This tells me that the intent of the programmer was to compare the two strings in a case sensitive manner, ignoring cultural rules . There is no other possible intent given the above line of code. This is not saying that the code is correct - just that the intent of the code is clear to those lucky enough to read it, and that the developer thought it through.

If I am skimming through some code and see a non-String.Compare() string comparison, I will sprinkle a //BUGBUG: comment above it. Why? The simple string comparison syntax has been a rather common cause of bugs in C# code - much like the C/C++ switch statement. When investigating an issue in a piece of code, I will first look for these BUGBUG's and the TODO's as a starting point - often with better-than-random chance results.

1 comment:

Adrian said...

A man after my own heart. I became very, very distressed recently to receive commercial, paid for code to various boards which make extensive use of 'goto' (spit, ftuii) in C/C++.