I've gotten burned several times lately with little defects that are indirectly caused by implicit "read the tea leaves" style programming. Here's an example of what I mean taken from my work last week.
<Rule DayLimit="5" />
It's just an attribute in an XML configuration file that specifies configurable business rules, no big deal. The problem was that a value of "0" in the DayLimit attribute was interpreted as a completely different business rule than a positive value in the DayLimit attribute. Fortunately an automated regression test picked up the forgotten requirement. I think it would have elminated some confusion if there had been a separate attribute for the "0" case like AllowPostDating="False" to be more explicit.
I caused an edge case bug by a little bit of sloppy programming. I was pulling data from some validation tables in the database into an array of objects that would be consumed by business logic classes. In this case it was perfectly legal to have "child" rows even if the "header" row didn't exist. In the case of orphan records, I just created the header object and assigned a zero value to it's Rate property. In some cases during business validation I would check whether Rate != 0 to check if the business entity existed at the header level. This was confusing, but workable until some automated tests failed with false validation errors when existing header records had a rate of zero. Once I understood the issue, I corrected the object structure to make the existence test more explicit like the following.
public class ImplicitRecord
{
private decimal _rate;
public decimal Rate
{
get { return _rate; }
set { _rate = value; }
}
}
public class ImplicitBusinessClass
{
public void Process(ImplicitRecord record)
{
// Check if there is ANY rate
if (record.Rate == 0)
{
// create an error message
}
}
}
public class ExplicitRecord
{
private decimal _rate;
private bool _hasRate;
public decimal Rate
{
get { return _rate; }
set { _rate = value; }
}
public bool HasRate
{
get { return _hasRate; }
set { _hasRate = value; }
}
}
public class ExplicitBusinessClass
{
public void Process(ExplicitRecord record)
{
// Check if there is ANY rate
if (!record.HasRate)
{
// create an error message
}
}
}
One of the scariest, most error prone idioms in all of software development is passing around a Hashtable/ArrayList of Hashtable/ArrayList objects. How many pernicious bugs have been caused by fouling up the key values to the Request, Session, and QueryString collections? Assuming you have a choice in the matter, which class below would you rather consume based on it's public API? The answer is ExplicitActionClass unless you're being perverse just to spite me. Remember that other developers will follow behind you, so code to reduce the probability of their mistakes. Make the public API easy to use and intention revealing.
public class HashtableActionClass
{
public ArrayList Execute(Hashtable arguments)
{
// unwrap things in the arguments hashtable and perform work
string userName = (string) arguments["USER_NAME"];
decimal purchaseAmount = (decimal) arguments["PURCHASE_AMOUNT"];
// perform work and return an answer
return new ArrayList();
}
}
public class ReturnClass
{
}
public class ExplicitActionClass
{
public ReturnClass Execute(string userName, decimal purchaseAmount)
{
ReturnClass returnValue = new ReturnClass();
// perform work and log results to the returnValue variable
return returnValue;
}
}
I've never coded in C++, but I'm guessing that passing around pointer structures between objects led to some truly wicked bugs.
Evil Databases
Ambiguous database designs can cause even more damage. I've often run across database tables whose columns represent very different conceptual things depending upon the values in another column. I know there is a little bit of database inefficiency by having a bunch of columns with lots of null data, but I'd still much rather have separate columns for separate logical concepts and pieces of information.
I've been in several situations where the only way to integrate two or more systems was to directly peek into another system's underlying database. This is fraught with so much risk that it's borderline insane. It's risky because you're duplicating a lot of logic required to "interpret" the business meaning in the underlying data. On one hand you have to correctly reproduce the interpretation, and on the other hand you must keep the duplicated logic synchronized through later changes. The synchronization just isn't going to happen because the different codebases are probably being built and tested separately. Any change in the database schema becomes risky without coordinated, large scale automated testing on every downstream system or any form of compile time checks keeping the applications synchronized with the database.
At a previous employer we had an absolutely humongous Operational Data Store (ODS) that contained answers to every question. Many applications in the enterprise touched this monster. A much greener, more idealistic version of myself foolishly tried to give the ODS team a suggestion to optimize a terribly sluggish view I needed to access. I basically had my ear chewed off and told that the code change would take 6 months of regression testing to do something like that. I ended up creating a polling mechanism to cache the very important, extremely volatile data every 15 minutes just so our application could actually function with any kind of decent performance. The biggest usage of our user interface turned out to be a report on the cached data that we'd thrown in at the last minute as a "nice to do" feature request. We almost had to add another web server to the farm just because of the demand for this data that was too difficult to pull out of the ODS.
I think that people are dangerously overexuberant about SOA in general, but prudent usage of SOA should eliminate the need for duplicating the fragile database sharing crap, and that sounds pretty darn good to me (using web services as a thin pass-through data layer instead of raw ADO.NET or JDBC seems rather foolish to me though).
I feel that coding explicitly is orthogonal to the Static versus Dynamic Typing debate. "Duck Typing" is one thing, but hidden meaning in code is just plain bad coding.