Last week I was tasked with investigating an Excel report that was failing to generate for certain large accounts. My initial thought was that an inefficient query or poorly implemented logic was to blame. However, after digging through the query logs I couldn't find anything wrong with the queries, and the business logic appeared reasonable for the ask. Yet the report was timing out well past what I would have considered a reasonable runtime.
Without any obvious clues, I decided to beat on it caveman style and see how long it actually needed to generate. After increasing the request timeout from 10 to 20 and then 30 minutes, the report stopped timing out and instead failed with a Java heap memory error. That was the clue I needed.
Consider the following conditions. You have a dataset with several thousand records that you need to match against another collection by a specific key, then output a specific field as a string. This is the approach I encountered in the code:
1. Query the database
2. Convert the query object into an array of structures with keys "id", "description" & "label"
3. For each structure, create a CFC object (foo) representing that item
4. Populate foo with the data from that structure
5. Append the foo object to an array of objects
6. Loop over the objects array and output the value of foo.getDescription() if the value of foo.getId() is in some other structure's list of ids
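The steps above can be sketched out as follows. The original was ColdFusion; this is an illustrative Python translation, with the Foo wrapper, the sample rows, and the matching ids all made up to show the shape of the pattern, not taken from the real report.

```python
class Foo:
    """One object allocated per record, wrapping a plain structure."""

    def __init__(self):
        self._data = {}

    def populate(self, record):
        self._data = dict(record)

    def get_id(self):
        return self._data["id"]

    def get_description(self):
        # Does nothing to the field other than return it.
        return self._data["description"]


# Pretend query result: an array of structures.
rows = [
    {"id": 1, "description": "Widget", "label": "W"},
    {"id": 2, "description": "Gadget", "label": "G"},
    {"id": 3, "description": "Gizmo", "label": "Z"},
]

# Steps 3-5: wrap every structure in its own object.
objects = []
for row in rows:
    foo = Foo()
    foo.populate(row)
    objects.append(foo)

# Step 6: match against another collection's ids and output descriptions.
wanted_ids = {1, 3}
report = [foo.get_description() for foo in objects if foo.get_id() in wanted_ids]
print(report)  # ['Widget', 'Gizmo']
```

With a dozen rows this is harmless; with several thousand rows it allocates one throwaway object per record just to read a field back out.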
Can you see the problem? Would it help if I told you that getDescription() does nothing to the value of the "description" field other than return it?
The solution was an easy one. By removing steps 3, 4 & 5, returning the array of structures, and referencing the "array[some_index].description" field directly, I got the report to stop eating Java heap memory and complete in under 2 minutes.
There was absolutely no value added in converting the array of structures to an array of objects, and it killed the performance of the process. I could have taken this a step further and returned the query object itself, but that would have required more changes to logic not shown here, and it would have gained only milliseconds.
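Sketched the same way (again in illustrative Python rather than ColdFusion, with made-up data), the fix is just to skip the object layer and read the field straight off the structure:

```python
# Same matching logic with the object layer removed: the query result
# stays an array of structures and the field is referenced directly.
rows = [
    {"id": 1, "description": "Widget", "label": "W"},
    {"id": 2, "description": "Gadget", "label": "G"},
    {"id": 3, "description": "Gizmo", "label": "Z"},
]

wanted_ids = {1, 3}

# One pass, no per-record object allocation, no getter indirection.
report = [row["description"] for row in rows if row["id"] in wanted_ids]
print(report)  # ['Widget', 'Gizmo']
```

Same output, a fraction of the allocations, and the intent is arguably clearer too.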
This was an easy fix, but it's a symptom of two common problems that I see time and again in different code bases.
Best Case Only Thinking
Looking at the logic above, you can immediately tell that it was initially tested against a small dataset of maybe a dozen records. No thought was given to what would happen with several thousand records until it was too late and the code was already in production.
Subsequent efforts to improve this code were able to fix obvious issues around poorly optimized queries and the like, but this underlying problem was not obvious without real world sized data.
So for the love of little processors, come up with and test against a realistic worst case scenario before it becomes a reality!
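One cheap way to apply that advice is to generate a production-sized synthetic dataset up front and run the routine against it before it ships. A minimal Python sketch (record counts, the data shape, and the helper names here are all assumptions for illustration):

```python
import time


def build_rows(n):
    """Synthetic worst-case dataset; shape mirrors the report's records."""
    return [{"id": i, "description": f"item-{i}", "label": "x"} for i in range(n)]


def generate_report(rows, wanted_ids):
    """Hypothetical stand-in for the report's matching step."""
    return [row["description"] for row in rows if row["id"] in wanted_ids]


# Don't just test with a dozen records; test with the worst case you expect.
for n in (12, 100_000):
    rows = build_rows(n)
    wanted = set(range(0, n, 2))
    start = time.perf_counter()
    result = generate_report(rows, wanted)
    elapsed = time.perf_counter() - start
    print(f"{n:>7} records -> {len(result)} matches in {elapsed:.3f}s")
```

If the large run blows past your time or memory budget, you find out on your desk instead of in production.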
Blindly Following a Pattern
The second problem is less obvious but just as bad. A programmer, or more dangerously a manager, learns a specific coding technique or pattern (in this case, creating objects to represent items) and applies it everywhere (or mandates that it be applied everywhere) without stopping to think about whether it's the correct pattern for the situation.
As developers, we need to be more than blind programmers following patterns; we should stop to consider what is correct for the situation, and be willing to push back against blind mandates when necessary.
Oh, and in case my opinion on this is not clear:
There is nothing wrong with using CF's built-in data structures to pass data around when more complex objects are not needed! </Rant>