How Bad Can it Be, Really? Plenty…
This morning I came in to a horrible problem brought on by a new business focus, and folks not asking the right people what the possible impact would be of a seemingly simple change. Both of the apps I'm responsible for suffered this morning - one was a disaster until I could get a fix in the code for the problem, and another was simply not showing the right data. Arguably, not a lot better, but at least it was up and running.
The problem started long before I joined the Shop with the decision by someone to make all the databases case-insensitive sorting. This means that the data: "Steve" and "STEVE" are different in the database, but if you try to do a SELECT on the data, you'll get both values on any variation of "Steve" in the WHERE clause.
Let's assume we had a table in this database where people's names were held.
ID | First | Last | Age |
412 | Steve | Jobs | 44 |
21 | Tom | Swift | 44 |
332 | Bill | Gates | 44 |
12 | Tom | Slick | 44 |
then it would be possible to do the query:
SELECT * FROM people WHERE FIRST='steve'
and get:
ID | First | Last | Age |
412 | Steve | Jobs | 44 |
but you'd get the exact same results if you did:
SELECT * FROM people WHERE FIRST='STEVE'
or:
SELECT * FROM people WHERE FIRST='sTeVe'
It doesn't matter to the query processor. I can't see a possible reason for this - why not just force all the data in a table's column to be uppercase? It would make it case-insensitive, but not open you up to the following disaster.
Last evening something was added to this mythical table - a new "Steve Jobs". Let's say the table originally looked like this:
ID | First | Last | Age |
412 | STEVE | JOBS | 44 |
21 | Tom | Swift | 44 |
332 | Bill | Gates | 44 |
12 | Tom | Slick | 44 |
and they wanted to correct the mistake in the case of the name. Well... the obvious change to me is to run the SQL:
UPDATE people SET FIRST='Steve', LAST='Jobs' WHERE ID=412
But that's not what was done. No, they created a new person so that the database looked like this:
ID | First | Last | Age |
412 | STEVE | JOBS | 44 |
21 | Tom | Swift | 44 |
332 | Bill | Gates | 44 |
12 | Tom | Slick | 44 |
601 | Steve | Jobs | 44 |
now we're in a pickle. When we try to find Steve with the SQL:
SELECT * FROM people WHERE FIRST='Steve'
we're going to get both of the rows:
ID | First | Last | Age |
412 | STEVE | JOBS | 44 |
601 | Steve | Jobs | 44 |
and where we were expecting one row to be returned, we now have two. Different systems will handle this differently, but there's no way the database will be able to differentiate them by their names. In reality, the ID is all that you have, and that's typically not visible to the users of a complex system.
This is what hit me this morning - two rows, and the new row was the first one returned, and it wasn't completely set up properly, so a lot of the supporting data wasn't there. Disaster.
Since there's nothing you can do to the SELECT statement, you have to filter the output, so my code went from:
String sql = "select ID from people where First='" + name + "'"; ResultSet rs = stmt.executeQuery(sql); if (rs != null) { if (rs.next()) { id = rs.getInt("ID"); } }
to:
String sql = "select First, ID from people where First='" + name + "'"; ResultSet rs = stmt.executeQuery(sql); if (rs != null) { while (rs.next()) { if (name.equals(rs.getString("First"))) { id = rs.getInt("ID"); break; } } }
and then later in the code, of course, I need to check and see that I actually got something.
While it's not horrible, it's something that's totally avoidable by either setting the case on the fields in the table, or allowing correct case determination in the SELECT statements so that this would easily have been found out early in the process.
As it was, I spent several hours on this - fixing code, and planning for other similar problems as they migrated datasets in the database. It's just not necessary.