optimization techniques of the rich and famous
category: code [glöplog]
Hi! I want to make a thread about optimizations. Platform independent. Tell us your tips. Here's mine:
Do things in chunks. If your machine has cache it's way better to keep things in that cache. For example, my nes emulator. I bang out the background and bang out the sprites over it. There's over-draw with the sprites on the background but it's worth it.
When it comes to big reads or writes do it in size of the arch's data-size.
What are your tips!!!!?!?!?
Do things in chunks. If your machine has cache it's way better to keep things in that cache. For example, my nes emulator. I bang out the background and bang out the sprites over it. There's over-draw with the sprites on the background but it's worth it.
When it comes to big reads or writes do it in size of the arch's data-size.
What are your tips!!!!?!?!?
- Speed, size/memory consumption, and simplicity work against each other. Find the best way to measure and balance each of them.
- When prototyping, simple, repeatable and robust is better than fast.
- If computation is slow, make a table! If memory is slow, compute it!
- Don't be too general. Do only the thing you need.
- Get familiar with the hardware. The faster way on one platform may be slow on another.
- When prototyping, simple, repeatable and robust is better than fast.
- If computation is slow, make a table! If memory is slow, compute it!
- Don't be too general. Do only the thing you need.
- Get familiar with the hardware. The faster way on one platform may be slow on another.
- Optimization is less about code and more about data: size, format, layout, transforms, caching, hardware requirements.
- Iterate fast: find ways to try out ideas quickly.
- Find out how others solved a similar problem to get more ideas.
- How do things work under the hood and why? Learn how to read compiler output for different platforms.
- Code that's customized for your particular problem is better optimized than generic code.
- Don't feel attached to your code. It's OK to delete it if you find something better.
- Simple algorithms and data structures are better than fancy ones when data is small. The break-even point is higher than you think.
- Iterate fast: find ways to try out ideas quickly.
- Find out how others solved a similar problem to get more ideas.
- How do things work under the hood and why? Learn how to read compiler output for different platforms.
- Code that's customized for your particular problem is better optimized than generic code.
- Don't feel attached to your code. It's OK to delete it if you find something better.
- Simple algorithms and data structures are better than fancy ones when data is small. The break-even point is higher than you think.
"Don't feel attached to your code". Totally! just because you worked on something a long time doesn't make it good.
- Try everything you can think of; anything is allowed.
- Learn as much boolean logic as you can first.
- Learn the CPU’s instruction set and how it behaves.
- Keep a table of instructions with cycle counts and sizes.
- If you have a puzzle mindset, use it.
- Try alternate techniques; measure and keep the fastest or smallest code.
- Prototype different designs in a higher-level language, then optimize in assembly; keep the fastest or smallest.
- Once the inner loops are optimized, try to reduce the number of loops used.
- Learn as much boolean logic as you can first.
- Learn the CPU’s instruction set and how it behaves.
- Keep a table of instructions with cycle counts and sizes.
- If you have a puzzle mindset, use it.
- Try alternate techniques; measure and keep the fastest or smallest code.
- Prototype different designs in a higher-level language, then optimize in assembly; keep the fastest or smallest.
- Once the inner loops are optimized, try to reduce the number of loops used.
I recently wrote down (for an article) some notes on speed optimization in code execution...
Specifically in the context of oldskool democoding on PC/DOS:
- Make your routines as self-contained and monolithic as possible. You don’t want to be jumping around between various external procedures and functions inside a loop.
- Minimize memory accesses; in inner loops, try to rely only on CPU registers for intermediate calculations.
- Design your data structures so that your algorithm reads memory linearly. This way you make better use of the CPU cache.
- Look for ways to reduce computations inside the inner loop; think about what can be moved one level higher. Anything that can be calculated outside the loop should be moved out of it!
- Use partial loop unrolling. This saves cycles both on jumps and on register counter operations, and at the same time makes code prefetching more effective.
- Minimize branching inside inner loops.
- Read from and write to memory not in single bytes, but in “words” (16-bit) or “double words” (32-bit).
- Minimize multiplications and divisions. Replace them with bitwise tricks where possible. Use fixed-point arithmetic instead of floating-point. (If you dig deeper into the subject, there are whole books about it — e.g. Hacker’s Delight — but be careful, that’s heavy stuff!)
- Use lookup tables with precomputed values.
- Apply self-modifying code.
- Write narrow, non-scalable code. Hardcoding is your friend :)
The less flexible and configurable your code is, the better you can optimize it for a specific case.
- And of course: ingenuity, trickery, and fooling the viewer.
Specifically in the context of oldskool democoding on PC/DOS:
- Make your routines as self-contained and monolithic as possible. You don’t want to be jumping around between various external procedures and functions inside a loop.
- Minimize memory accesses; in inner loops, try to rely only on CPU registers for intermediate calculations.
- Design your data structures so that your algorithm reads memory linearly. This way you make better use of the CPU cache.
- Look for ways to reduce computations inside the inner loop; think about what can be moved one level higher. Anything that can be calculated outside the loop should be moved out of it!
- Use partial loop unrolling. This saves cycles both on jumps and on register counter operations, and at the same time makes code prefetching more effective.
- Minimize branching inside inner loops.
- Read from and write to memory not in single bytes, but in “words” (16-bit) or “double words” (32-bit).
- Minimize multiplications and divisions. Replace them with bitwise tricks where possible. Use fixed-point arithmetic instead of floating-point. (If you dig deeper into the subject, there are whole books about it — e.g. Hacker’s Delight — but be careful, that’s heavy stuff!)
- Use lookup tables with precomputed values.
- Apply self-modifying code.
- Write narrow, non-scalable code. Hardcoding is your friend :)
The less flexible and configurable your code is, the better you can optimize it for a specific case.
- And of course: ingenuity, trickery, and fooling the viewer.
I only have one technique:
- Measure.
The better (i.e., relevant, accurate, fast-to-get-an-answer) your measurements are, the better you will be able to optimize. Conversely, do crap on faith without benchmarks and profiles, and you'll just end up flailing around.
- Measure.
The better (i.e., relevant, accurate, fast-to-get-an-answer) your measurements are, the better you will be able to optimize. Conversely, do crap on faith without benchmarks and profiles, and you'll just end up flailing around.
- Turn off the dynamic voltage and frequency scaling
(For more stable measurements that doesn't depend on your device's temperature or batterylife if applicable)