Why Would Anyone Use the Goto Statement?
When I was catching up on my unread blog posts in BlogLines this morning, I saw a post from Jeffrey Palermo entitled 'Does anyone still use the "goto" statement? Microsoft does - level 100'.
Update: Once I posted this I did a check on the end result to make sure all the links work. It seems the blog post that caused me to write this (linked above to an invalid page) has disappeared - I can't see it on his blog anymore. I have no idea why. Luckily, I can still see it in bloglines :) Apologies to those who wanted to go and read it.
In it, he shows the code for the HttpUtility.HtmlEncode method, as shown by the decompiler in Reflector. He then went on to say that this was a method that really should have been refactored into a few different methods. Talk about an itchy refactor finger!
Rather than just leave a comment on his post, I thought it was worthy of a new post to discuss why it was that it seemed that Microsoft would use such an arcane practice that has such a bad stigma attached.
First, we need to remember that Reflector can only show a decompilation. That is, it doesn't know (and has no way of knowing) what the original source code was. It can only see the generated IL, and make its best guess at what might have been there originally.
Second, simplistically speaking, IL reads like a restricted, assembler-like language, and when you're down at that level, there's not that much you can do. A computer is good at adding two numbers together. Really good. And it can do it extremely fast. And that's really all a computer can do :) Everything is based on the fact that it can add. It can also compare numbers - and as a result, we can branch to different memory addresses - and this gives us the ability to do things like 'if' and call methods. AND it can jump - which is goto - but all a jump really is, is an unconditional branch :)
Third, compilers convert language constructs into code that will actually work on a computer, and make optimisations (or optimizations, for you Americans who read this :) along the way. Some language constructs compile to something quite different in IL, and some optimisations compile to something even differenter (I have a feeling that while that's not a valid word, someone else probably thought of it before me).
So now that we know all that, what can we assume? First up, I think it's extremely safe to assume that there aren't any goto's in the original source. Second, if there actually are, I bet there's a very good reason.
You see, there's nothing really wrong with using a goto. That is, technically speaking there's nothing wrong. It affects readability, sure - but if there's a damn good reason, and decent inline comment explaining what's going on, why, and perhaps even a non-optimised version to show what the construct is representing, have we really done anything wrong?
I say no, we haven't done anything wrong. But it has to be used judiciously. If it's of particular benefit to the performance of the application, use it! If it's of no benefit perf-wise, then would it be better to use a more readable construct? Absolutely.
So what compiles into the dreaded goto when you think you've written some whiz bang super impressive readable code?
The Select statement will do it easily, when combined with the Continue statement.
Private Sub SelectMethod()
    Dim i As Int32 = 0
    Do
        i += 1
        Select Case i
            Case 5
                DummyMethod(i + 6)
                Continue Do
            Case 4
                DummyMethod(i - 4)
        End Select
        DummyMethod(i + 1)
    Loop While i < 14
End Sub

Private Sub DummyMethod(ByVal s As Int32)
    'Do Nothing
End Sub
This is some simple sample code that I knocked up (no, I didn't get it pregnant). When we look at its decompilation in Reflector, we get:
Private Sub SelectMethod()
    Dim num1 As Integer = 0
Label_0004:
    num1 += 1
    Select Case num1
        Case 4
            Me.DummyMethod((num1 - 4))
        Case 5
            Me.DummyMethod((num1 + 6))
            GoTo Label_0042
    End Select
    Me.DummyMethod((num1 + 1))
Label_0042:
    If (num1 >= 14) Then
        Return
    End If
    GoTo Label_0004
End Sub
The loop has been totally removed, and we're left with just the goto's to jump back from the bottom to the top again, and one in the middle of the Select. But why? What caused it?
The Continue statement was the culprit. We've demanded that in Case 5, it needs to drop what it's doing and jump back to the start - in fact, if you think about it, the Continue statement is just a goto anyway.
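To make that concrete, here's a rough sketch of the same loop written both ways. The module name and the two function names are mine (hypothetical, not from the original code), and instead of calling DummyMethod I record the value that would have been passed, so the two versions can be compared. The GoTo version is my hand-written approximation of what Reflector showed, not the compiler's actual output.

```vb
Imports System
Imports System.Collections.Generic

Module ContinueAsGoTo
    ' Hypothetical helper: the SelectMethod loop, recording each value
    ' that would have been handed to DummyMethod.
    Function WithContinue() As List(Of Integer)
        Dim calls As New List(Of Integer)
        Dim i As Integer = 0
        Do
            i += 1
            Select Case i
                Case 5
                    calls.Add(i + 6)
                    Continue Do          ' skip the rest, go to the loop test
                Case 4
                    calls.Add(i - 4)
            End Select
            calls.Add(i + 1)
        Loop While i < 14
        Return calls
    End Function

    ' The same logic with the loop and the Continue spelled out as jumps.
    Function WithGoTo() As List(Of Integer)
        Dim calls As New List(Of Integer)
        Dim i As Integer = 0
LoopTop:
        i += 1
        Select Case i
            Case 5
                calls.Add(i + 6)
                GoTo LoopTest            ' this is all Continue Do really does
            Case 4
                calls.Add(i - 4)
        End Select
        calls.Add(i + 1)
LoopTest:
        If i < 14 Then GoTo LoopTop
        Return calls
    End Function

    Sub Main()
        Dim a As List(Of Integer) = WithContinue()
        Dim b As List(Of Integer) = WithGoTo()
        ' Both lines should show the same sequence of values.
        Console.WriteLine(String.Join(",", a))
        Console.WriteLine(String.Join(",", b))
    End Sub
End Module
```

Both functions walk through exactly the same steps. The only difference is whether the jump is hidden behind a keyword or written out by hand.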
What else can cause it? Anything that says that things need to be dropped and start running somewhere else in the same method - that, combined with a circumstance where the compiler can't refactor it down to a set of if/else blocks. The goto won't get inserted if the rest of the loop can be inserted in an else block :)
So what's that give us? An If within an If, but stuff outside the If's. Confused? :)
Private Sub ForMethod()
    For i As Int32 = 0 To 13
        If i < 4 Then
            If i > 8 Then
                Continue For
            End If
            If i < 8 Then
                DummyMethod(i)
            End If
        End If
        If i < 8 Then
            DummyMethod(i + 1)
        End If
    Next
End Sub
Again, this is dependent on being within a loop of some sort, but this time there's no Select - only a couple of If's. In Reflector we see:
Private Sub ForMethod()
    Dim num1 As Integer = 0
Label_0003:
    If (num1 < 4) Then
        If (num1 > 8) Then
            GoTo Label_003B
        End If
        If (num1 < 8) Then
            Me.DummyMethod(num1)
        End If
    End If
    If (num1 < 8) Then
        Me.DummyMethod((num1 + 1))
    End If
Label_003B:
    num1 += 1
    Dim num2 As Integer = 13
    If (num1 <= num2) Then
        GoTo Label_0003
    End If
End Sub
We've now got a goto instead of the loop. If either of the 'If i < 8' tests is removed, there's no goto anymore. There are other things that can cause this too, but the principles remain the same - the generated IL is just a touch too confusing to decompile nicely.
Things aren't really that special. You saw that in that last example I had a simple For loop. As I mentioned, making the contents simpler removed the gotos. But were the goto's actually removed? Hell no.
The decompiler in Reflector (a pretty damn smart decompiler too, I've noticed) may have been able to map the IL to what was recognisable as a For loop, but in the IL, it's just using a goto - it's all it can do - make a decision based on a value (when it gets to the bottom of the loop) and maybe jump back up to the top to do it all again - or maybe not (and exit the loop).
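As a rough illustration, here's a trivial For loop next to a hand-written version that uses nothing but a label, a comparison and a jump. The module and function names are made up for this sketch, and the second function is my approximation of the shape of the IL (do the work, bump the counter, test, maybe jump back), not real compiler output.

```vb
Imports System

Module ForAsGoTo
    ' An ordinary For loop that adds up 0 through 13.
    Function SumWithFor() As Integer
        Dim total As Integer = 0
        For i As Integer = 0 To 13
            total += i
        Next
        Return total
    End Function

    ' The same loop written the way the machine sees it:
    ' loop body, increment, compare, conditional jump back to the top.
    Function SumWithGoTo() As Integer
        Dim total As Integer = 0
        Dim i As Integer = 0
LoopTop:
        total += i
        i += 1
        If i <= 13 Then GoTo LoopTop
        Return total
    End Function

    Sub Main()
        ' Both calls should give the same total.
        Console.WriteLine(SumWithFor())
        Console.WriteLine(SumWithGoTo())
    End Sub
End Module
```

The For keyword buys us readability, nothing more. Underneath, the comparison and the jump are still there.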
So all we've proven is that it's possible to fool a decompiler :) The IL has goto's absolutely everywhere, all through the BCL in .NET. I picked a method completely at random that I figured might have more than one or two lines of code and had a look at the IL - and I found goto's. In fact, the first I found was on the third line of the code! :) This is the start of the System.Windows.Forms.Form.ShowDialog(IWin32Window) method.
L_0000: ldarg.1
L_0001: ldarg.0
L_0002: bne.un.s L_002c
What's this in English? Here's a nice English translation:
If the param to this method does not equal 'Me', then GOTO L_002c.
After all, how else can a dumb CPU that only knows how to add two numbers do it?
I guess Jeffrey was right in his other post that appeared this morning. Writing software is too easy these days. For someone who's trying to claim that it's too easy to write bad software but it takes a 'good' programmer to write good software - it certainly didn't look to me like the work of a good programmer to point and laugh at a so-called poor piece of code. The 'good' programmer would have either known what was going on, or done the research to find out why before trying to make someone else look bad.