Common indirection found in code

3/19/2023

I fully agree that delivering code capable of solving the business problem as early as possible in the project is fundamental to success. Where I diverge is that once you have code in a working state for medium-sized modules or services, you need to determine the highest transaction rates and largest data volumes the production system will see under load. Then you need to demonstrate that the module or service can meet at least 2 times the production performance requirement under sustained tests. The critical aspect is testing with data volumes larger than the production system data size.

The most disruptive and most common failure I encounter is engineers who test with small amounts of data, see good performance, and then have the system fail to perform well at production data volumes. High-caliber engineers find this incredibly embarrassing and make it a career goal to never have one of their services fail to perform adequately when tested at peak production loads using full-sized data.

I may be excessively focused on performance, but it is mostly because I have seen lack of focus cause good people harm. I regularly meet very talented executives and senior managers who worked for decades to reach a position of authority and are at risk of losing their jobs and their credibility due to a poorly performing system. They have not been able to fix the problems despite multiple attempts over an extended period of time. I disagree with putting performance off until it becomes a problem largely because major companies hire us to fix systems where they have invested millions of dollars and which are on the verge of cancellation because the team has struggled for months or years without reliably delivering sufficient performance.
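The 2x rule can be turned into a pass/fail check fairly easily. What follows is only a minimal sketch under stated assumptions, not a full load-testing harness: the names `LoadTestResult`, `runSustained`, `meetsHeadroom` and `dataIsLargerThanProduction` are hypothetical, the `transaction` callback stands in for one unit of real work against your full-sized data set, and the production peak rate is a number you have to measure or estimate yourself.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <functional>

// Outcome of one sustained run. All names in this sketch are hypothetical.
struct LoadTestResult {
    std::uint64_t transactions;  // transactions completed during the run
    double seconds;              // wall-clock duration of the run
    double perSecond() const {
        return seconds > 0.0 ? transactions / seconds : 0.0;
    }
};

// Calls `transaction` back to back for at least `duration` and counts
// completions. The clock is read once per iteration, so for very cheap
// transactions the measured rate is a slight underestimate.
inline LoadTestResult runSustained(const std::function<void()>& transaction,
                                   std::chrono::milliseconds duration) {
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    const auto deadline = start + duration;
    std::uint64_t count = 0;
    while (Clock::now() < deadline) {
        transaction();
        ++count;
    }
    const std::chrono::duration<double> elapsed = Clock::now() - start;
    return {count, elapsed.count()};
}

// True when the measured rate is at least `headroom` times the production
// peak. The default of 2.0 mirrors the "at least 2 times" rule in the text.
inline bool meetsHeadroom(const LoadTestResult& result,
                          double productionPeakPerSecond,
                          double headroom = 2.0) {
    return result.perSecond() >= headroom * productionPeakPerSecond;
}

// True when the test data set is strictly larger than the production one.
inline bool dataIsLargerThanProduction(std::size_t testRecords,
                                       std::size_t productionRecords) {
    return testRecords > productionRecords;
}
```

In use, you would load a data set for which `dataIsLargerThanProduction` holds, pass your real request handler to `runSustained` with a duration long enough to count as sustained (minutes or hours rather than milliseconds), and fail the build when `meetsHeadroom` returns false.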

Most of the industry is living by the maxim "CPU power is increasing so fast that we don't have to worry about performance". One of the most common quotes is "premature optimization is evil", which they interpret as "don't worry about performance until it has manifested as a problem". A large fraction of our income comes from clients whose systems respond too slowly (high latency), normally because of poor performance.

We were gathering telemetry coming in from high-bandwidth radio and microwave links. If the CPU was blocked and didn't respond, we could lose several frames of data. Since we were stress testing billion-dollar machines, losing data when a critical failure occurred could become a multi-million-dollar cost and require re-running the test. This is a good example of when fixed or bounded latency was critical. High performance was important, but performance was mostly a tool we used to help meet the bounded latency requirements.
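To make the difference between "fast on average" and "bounded" concrete, here is a small sketch. It is an illustration under assumptions rather than anything taken from the telemetry system: `LatencyReport`, `checkDeadline`, `isBounded` and `meanLatency` are hypothetical names, and the latency samples are assumed to have been recorded by whatever timing code you already have.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

using Micros = std::chrono::microseconds;

// Summary of recorded response latencies against a fixed deadline.
// All names in this sketch are hypothetical.
struct LatencyReport {
    Micros worst{0};          // slowest observed response
    std::size_t misses = 0;   // samples that exceeded the deadline
    std::size_t samples = 0;  // total samples examined
};

// Scans every sample, tracking the worst case and counting deadline misses.
inline LatencyReport checkDeadline(const std::vector<Micros>& latencies,
                                   Micros deadline) {
    LatencyReport report;
    report.samples = latencies.size();
    for (const Micros& latency : latencies) {
        report.worst = std::max(report.worst, latency);
        if (latency > deadline) {
            ++report.misses;
        }
    }
    return report;
}

// Bounded latency means no sample at all was allowed to miss the deadline.
// An empty run proves nothing, so it does not count as bounded.
inline bool isBounded(const LatencyReport& report) {
    return report.samples > 0 && report.misses == 0;
}

// Average latency, the number a pure throughput view tends to look at.
inline Micros meanLatency(const std::vector<Micros>& latencies) {
    if (latencies.empty()) {
        return Micros{0};
    }
    Micros total{0};
    for (const Micros& latency : latencies) {
        total += latency;
    }
    return total / static_cast<Micros::rep>(latencies.size());
}
```

With a 1000 microsecond deadline, a run of four 100 microsecond responses and one 1500 microsecond response averages 380 microseconds but fails `isBounded`, while five steady 400 microsecond responses have a slightly worse average and pass. In a setting like the one described above, where a single late response can mean lost frames, the second profile is the one you want.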

Low latency and high performance are two separate but closely related domains. In low latency systems what most people actually mean is latency that is reliably below an acceptable threshold. Early in my career I worked on near real-time systems where latency of CPU availability was critical.

I tend to focus on performance at two levels. The first is getting the maximum speed per CPU core. The second is using clever distributed architecture to allow more CPU cores to participate in the solution. *1 I have found that adequate focus on maximizing performance per CPU core can allow a single CPU to do much more work. This can delay the need for the next set of sophisticated distributed processing techniques. Reducing the number of distributed components can reduce code volume and complexity enough to pay for the cost of local optimization several times over. It is fairly common to see these optimizations increase the work capacity of a single multi-core CPU by at least 10 times, which effectively removes the need to break that work among 10 computers. It really is a question of pay now or pay a lot more not very much later.

I received requests to mentor low latency C/C++ code from several readers as a result of my last blog post, "C++ faster than C# depends on the coding idiom". I love to mentor but thought it would help a larger group if I shared some of the basic rules in a separate blog post. I use these techniques for our Stock and Forex trading prediction engines but they are equally useful in most domains. This is a huge topic area, easily large enough to fill several books, so please forgive items I missed.

It is easy to tell people how to make high performance systems, but it requires years of focus to gain the intuition about where to look and how to think. Feel free to contact us, as we provide this kind of help for teams who cannot afford the time needed to grow their own gurus. I greatly appreciate your comments since they are often the genesis of my next post.
