April 16th, 2013

The case for keeping dead code

I was reading Martin Van Aken’s great post on how dead code rots a codebase. To be clear up front, I completely agree with him and he makes some wonderful points. In previous teams that I’ve worked with, I’ve had this conversation countless times, which usually results in the forceful removal of both commented-out and unused code. However, in putting together supporting arguments for said conversations, I’ve encountered two main reasons for not removing such code.

One case for keeping commented-out code

I’ll borrow an analogy for commented-out code from Martin’s post (emphasis added):

Comments are supposed to be a trail, like little bread crumbs dropped by the previous programmers to help you find your way. Each of them is a sign, which means you should ponder them : how easy would it make for the person following you to follow this trail ? How straightforward is it ? Having no trail signs at all make the road difficult, but false or contradictory ones can be even worse.

False trail signs absolutely make a trail difficult to follow. However, there’s also a case where having no trail signs poses a problem: when nothing warns travellers away from a dangerous path that’s hard to back out of.

If a block of code seems like an obvious way to solve a problem, but has some serious repercussions, then it will be helpful to include such code and leave it commented out. However, there must be a comment that clearly says “WARNING: do NOT solve this problem with the below code, for reasons x, y, and z.” It would be terrible for a new developer to come along doing some hero refactoring and think that they’ve found a great way to simplify some code — only to introduce a critical bug that’s already been fixed.

One case for keeping unused code

When you maintain a completely internal codebase, there is no reason to keep code that’s no longer used. However, if you maintain a library that has an external API, removing code that you believe is no longer used may have a breaking effect on anyone who uses your API. Once you make a contract to have your API work a specific way, it becomes costly to change that. You will need to communicate the change to everyone who uses the API. Even if the method really is no longer used, you don’t want to prevent someone from using it in a perfectly valid way in the future. This is why you must always design your API with its usage in mind.

Typically, it’s a much better idea to mark the method as deprecated, and then communicate a schedule for its complete removal. But as you know, deprecated methods live in libraries for a very long time, so don’t get too hopeful about removing that unused code just yet. Even if you don’t have a legal obligation to tell users when you drop methods, you may end up angering a whole whack of people when you do it.

A related example for an internal library is a persistence layer. There may be a situation in which the Delete method is never called. Unless this operation is forbidden, you should probably still include it, because it’s an important method for filling out the basic CRUD operations on data. (Of course, you can reduce the code maintenance issue by using an ORM system, but really, that’s just pushing the same issue onto someone else’s plate.)

No hard rules

Once again, this goes to show that there are no hard rules for maintaining code, and that it pays to be aware of how changes you make will affect both your users and your fellow developers.

January 3rd, 2013

Partial initialization for C# enums (and a gotcha)

Microsoft identifies three ways of initializing values for enums, with all of which you should be familiar:

  1. Specify none of the values (implicit initialization):

    enum Meat { Chicken, Beef, Lamb }

  2. Specify all of the values (explicit initialization):

    enum Meat { Chicken = 3, Beef = 14, Lamb = 18 }

  3. Specify the starting value (part of what I’ll call partial initialization):

    enum Meat { Chicken = 1, Beef, Lamb }

It is this last case that I want to expand on (but only a little bit).

First off, Microsoft doesn’t really recommend forcing an enum to start at a nonzero value without also including a member whose value is 0:

In this enumeration, the sequence of elements is forced to start from 1 instead of 0. However, including a constant that has the value of 0 is recommended. For more information, see Enumeration Types (C# Programming Guide).

This is based on how enumerations work under the covers. The default value of an enum is always zero, no matter how its members are defined. If a member is explicitly assigned the value 0 (which happens implicitly for the first member when you specify no values at all), that member is the default. In the previous example, where Chicken = 1, the default value is simply 0, which doesn’t correspond to any named member. However, if I define the enum like this:

enum Meat { Chicken = 1, Beef = 0, Lamb = 18 }

then the default value will be Meat.Beef. So far, this should be nothing new to you.

Here’s something that might be new, though: you can mix and match implicit and explicit initialization anywhere in the enum.

Try to imagine what will happen in this case:

enum Shape { Square = 12, Circle = 17, Triangle, Trapezoid = 22 }

Your first assumption would be correct: Shape.Triangle is 18. Seems innocent enough, right? But note what happens when you add a value like this:

enum Shape { Square = 12, Circle = 17, Hexagon = 18, Triangle, Trapezoid = 22 }

Well, now Shape.Triangle is 19! If you aren’t paying close attention when making such a change, you’re going to break any persisted data that relies on these enum values.

Let’s go ahead and add another shape:

enum Shape { Square = 12, Circle = 17, Hexagon = 18, Triangle, Dodecahedron = 19,
    Trapezoid = 22 }

This is also bad news. You might expect the compiler to flag the collision, but duplicate enum values are perfectly legal and compile without so much as a warning. Check the value of Shape.Triangle now: again 19! And Shape.Dodecahedron is 19 as well.

Looking at all these cases, there’s a simple rule of thumb you can follow when it comes to partial initialization: don’t do it. Either specify values for all the members in your enum, or none of them. Mixing and matching is a recipe for inadvertently introducing bugs into your application.

December 27th, 2012

Let sleeping dogs lie: the political issues with refactoring

I was recently reading some of our project’s code before adding in a new feature. As is typically the case, I felt the burning desire to make the existing code easier to understand. However, refactoring code without a specific purpose can be a controversial topic.

Here’s my take: generally speaking, it’s a Good Thing to refactor code, provided that the changes are not significant (of course, the more well-covered the code is by unit tests, the more leeway you have in what would be considered “significant”). Everybody on the team benefits from code that is easier to read.

On the opposite side of the fence, the most commonly used argument boils down to “if it ain’t broke, don’t fix it”. But there’s another angle that I hadn’t previously considered: team politics.

For better or worse, most organizations end up implicitly viewing the code as being owned by individual programmers, or at least different teams. Because of this, there are a couple of things that can happen when you modify code that you (or your team) didn’t write.

  1. Hurting programmer pride: As programmers, we treat any code we write as a personal mission to solve the problem at hand. When another developer comes along and modifies that code, it feels like being told your code sucks. Hey, maybe it does. But not many people like being told that, especially when the change itself doesn’t serve a specific and objective goal. If you work with a lot of proud people, constantly making changes to their code won’t boost your stock within the team.
  2. The defect blame game: If you’ve been in the industry long enough, you know exactly what happens when a new bug comes in: “Oh, who wrote that code? Give it to them.” This is a side effect of strong code ownership, though not without good reason: typically, the programmer who wrote the code causing the bug has the most knowledge of the subsystem, and will be able to fix the problem the fastest and most effectively. But if you were the last person to touch the code, even if it was as simple as fixing some indenting, there’s a very good chance that the bug will end up on your plate. Not only is this something you probably don’t want to happen, but as someone who probably doesn’t know the code so well, it will take you longer to fix the issue. This is bad for the team as a whole.

These are good reasons to avoid excessive refactoring. But being someone who believes that the advantages of refactoring generally outweigh the disadvantages — especially in the long-term, and especially when managed appropriately — I am still interested in finding the most appropriate way to ensure it is done.

I’ll put forth a couple ways to assuage the political resistance to refactoring other people’s code:

  1. First off, any superficial refactoring of existing code must follow a defined coding standard. It doesn’t particularly matter what that standard is. Tabs vs. spaces? Who cares! If you have a standard, then any changes you make have a valid rationale. If the code violates the standard, then it is broke, so you do fix it. Also, developers can’t technically be offended; instead of just changing the code, you’re fixing it by bringing it up to standard. It’s more of a favour than a slight (disclaimer: this doesn’t apply to all developers).
  2. Second, if you do decide to refactor, make sure you do it on its own branch, and clearly label the change as only including refactoring (as opposed to including possible breaking changes). This way, when others come along to see who introduced a defect, they know that you’re not [necessarily] the one with the best knowledge of the subsystem. Additionally, keeping the change completely separate makes it easier to revert.

As a final note, I think the most important factor in resolving these political issues is communication. Don’t sneak your changes in behind people’s backs. Don’t be a lone wolf. As part of a programming team, it’s important that everybody works towards the same goals. Don’t get me wrong: cleaning up existing code should be one of those goals, but it can only be done successfully if people agree that it is indeed one of those goals.

December 20th, 2012

Refactoring as you go makes it hard to identify the important changes

I’ve always been a big fan of improving code as I read it (though this opinion is controversial). This typically happens when I’m trying to understand some code I’ve never seen before (or code that I’ve written long enough ago to have forgotten it). If I see an opportunity to simplify or clarify the code, I want to do it right there on the spot. For one, it helps me to better understand the feature I’m working on. But also, if I don’t do it now, there’s a very small chance of it ever happening.

But there’s a downside.

When I commit the refactored code along with the new feature, it makes it hard to identify which parts of the check-in belong to the new feature and which are simply refactorings. If the commit causes a bug, then it becomes much more difficult to determine if it was the feature code that caused the issue or the refactoring. Additionally, it makes it harder for the project maintainer to review pull requests in a Fork & Pull environment, or any environment where commits must be reviewed before being accepted into the trunk.

A better solution (and the one I currently use) is to create a separate branch for all refactoring related to a feature, and commit this separately from the feature branch. This keeps the two types of work separate from each other. However, it also has a disadvantage: the feature branch cannot be committed without the refactoring branch if the feature depends on the refactored code. In my case, when I’m refactoring to improve my feature, I want the refactoring in my feature branch, so the refactoring branch must be accepted first. If it is rejected, then the feature branch cannot be committed on its own.

November 28th, 2012

Where the icons used in Visual Studio come from: Babel Icons

This is a fairly low-content post, but today I learned that the source for many of the icons in Visual Studio comes from the Babel package:

The Babel Package comes with a collection of icons that are used in the IntelliSense completion lists and in the drop-down bar used for quick navigation. Most of these icons are arranged in sets of six variations, where each variation corresponds to the access concepts of public, internal, friend, protected, private, and shortcut. For example, you can have different icons for a method depending on whether it is public, protected or private.

Here’s what the icons look like, most of which will be familiar to any of my fellow .NET developers:

Babel Icons