Jump to content

Object code optimizer

From Wikipedia, the free encyclopedia

A binary optimizer takes the existing output from a compiler and produces either a better execution file or optimized machine code in live program memory, with the same functionality.

An object code optimizer, also known as a binary optimizer, takes a program's object code or machine code either from a linked executable binary file, or from live program memory, and produces sections of optimized code, invoked through either redirection or code replacement, that is functionally equivalent yet more performant. Program speedup is a common optimization goal. Some are intended to modernize old binaries to take better advantage of updated hardware,[1] while others lean heavily on profile-guided optimization and interprocedural optimization to deliver performance gains.[2] Some utilize run-time mechanisms to introspectively improve performance using techniques similar to JIT compilers.[3]

Examples

[edit]
  • COBOL Optimizer was developed by Capex Corporation in the mid 1970s for COBOL. This type of optimizer depended, in this case, upon knowledge of "weaknesses" in the standard IBM COBOL compiler, and actually replaced (or patched) sections of the object code with more efficient code. The replacement code might replace a linear table lookup with a binary search for example or sometimes simply replace a relatively slow instruction with a known faster one that was otherwise functionally equivalent within its context. This technique is now known as strength reduction. For example, on the IBM/360 hardware the CLI instruction was, depending on the particular model, between twice and 5 times as fast as a CLC instruction for single byte comparisons.[1][4]


  • IBM Automatic Binary Optimizer for z/OS (ABO) optimizes old COBOL binaries to improve performance by utilizing newer hardware instructions available on modern IBM Z servers. ABO does not require recompilation from application source code. This allows older COBOL applications to achieve comparable performance to what would be possible if recompiled from source with the latest compilers, without the risk of introducing subtle behavioral changes that may occur when recompiling from source with newer tool chains. In some cases, older COBOL programs may not even have the original source code available.[5]


  • Spike Executable Optimizer developed by Digital Equipment Corporation and later extended by Compaq, was a profile-guided executable optimizer for DEC Alpha binaries on Windows NT and Tru64 Unix. It optimized for code layout and branch reduction, along with profile-guided data cache prefetching, targeted both user mode and kernel mode executable code, and delivered speedups on various programs including up to 40% speedup for Oracle Database running the TPC-C benchmark. It did not require recompilation from source, and included the Spike Optimization Environment (SOE) and Transparent Application Substitution (TAS) to manage executable instrumentation, profile collection, and automatic selection between instrumented and optimize versions of system binaries.[6][7]


  • Solaris Binary Optimizer, formerly known as Sun Studio Binary Code Optimizer, was originally developed by Sun Microsystems and is currently maintained and distributed by Oracle Corporation as the tool binopt. It can be used to both instrument and optimize x86-64 and SPARC binaries for the Oracle Solaris operating system. It's main optimizations are profile-guided inlining, hot/cold code layout and branch optimization, and improved data placement and alignment. It requires binaries built using specialized compiler flags.[8][9]


  • The SOLAR Project included three distinct tools, ALTO for DEC Alpha, PLTO for x86, and ILTO for Itanium. These were primarily link-time optimizers (LTO) that could also function as binary optimizers on finished executables. They would take ECOFF or ELF binary files as input along with relocation data to reconstruct inter-procedural control flow graphs. ALTO performed whole-program analyses and optimizations and applied techniques such as constant propagation, unreachable code elimination, inlining, code layout, and instruction scheduling. Many of these optimizations were profile-guided.[10] There was also work done to target power savings by prioritizing reduced memory operations over cycle count estimates when directing optimizations, since memory accesses have an outsized impact on power.[11] PLTO focused on value profile-guided specialization by instrumenting executables to collect value profiles on runtime constants, then cloning and optimizing code paths for frequent values driven by cost-benefit analysis.[12] ILTO worked by disassembling bundles from Itanium machine code into normalized control‑flow graphs, recovering predicate and speculation information, and rewriting the machine code to apply unpredication/unspeculation, selective if‑conversion, late scheduling, bundling, and layout changes, to produce either faster or more analyzable binaries.[13] There was a system called Squeeze, based on ALTO, that was a binary rewriting tool for compacting binaries for the Alpha architecture without reducing performance. Code footprints were reduced through code factoring and procedural abstraction which increased code reuse for common instruction sequences. Liveness analysis was used to identify and eliminate dead data from executables.[14][15]


  • Dynamo was a research project by Hewlett Packard Labs in the late 1990s that implemented a transparent dynamic binary optimizer for PA-RISC programs running on the HP-UX operating system. Dynamo ran as an interpreter for unmodified binaries that would emit the hottest traces of machine code into a software code cache. Those traces would then be optimized by reducing branching, improving code locality, and applying copy propagation, constant propagation, strength reduction, loop invariant code motion and loop unrolling. Target binaries had to be relinked to it's custom version of the standard C runtime to load Dynamo's shared library. Because it effectively functioned as a lightweight virtual machine, it had broad compatibility across user mode code. It achieved speedups of up to 22% on some SpecInt95 benchmarks.[3] Dynamo later spawned DynamoRIO in collaboration with researchers at MIT.


  • DynamoRIO is an open-source dynamic binary instrumentation framework that extends the original Dynamo project, supporting runtime optimization across multiple architectures and OSes.[16]


  • ADORE was a research project that implemented a dynamic optimizer for unmodified Itanium binaries. Loaded into a target process as a shared library, it would generated optimized code traces in a code cache and live patch the original code to the optimized version. This approach eliminated the interpreter overhead found in Dynamo by allowing the original code to run natively in place before redirecting hot regions to an optimized code cache. It leveraged hardware performance counters to perform profile-guided software prefetching, delivering up to 57% speedups on some SPEC2000 benchmarks.[17]


  • COBRA was a research project that also implemented a dynamic binary optimizer for Itanium executables on Linux with a similar architecture as ADORE, primarily adding phase detection and reoptimization to adapt to changing workloads, with up to 68% speedup on one benchmark using profile-guided prefetching.[18]


  • Dynimize is a dynamic binary optimizer for Linux x86-64 programs. It loads a code cache into a target process and uses x86 to x86 JIT compilation plus hot patching to apply profile-guided compiler optimizations, running as a separate process from its targets. It emphasizes minimal system changes without modifying binaries on disk, does not require recompilation from source, and does not require target processes to be restarted.[19] It optimizes both executable and shared library code in memory, and v2 (beta) also optimizes unmodified Linux Kernels and device drivers in memory.[20] It is a general purpose dynamic binary optimizer for any program that meets its compatibility requirements however commercial support is limited to use with MySQL/MariaDB databases,[19] where it shows speedups of up to 55%.[21]


  • BOLT is a post-link optimizer built on top of the LLVM framework. Utilizing sample-based profiling, BOLT improves the performance of real-world applications even for highly optimized binaries built with both feedback directed optimization and link-time optimization. For GCC and Clang compilers, BOLT speeds up their binaries by up to 20.4% on top of FDO and LTO, and up to 52.1% if the binaries are built without FDO and LTO. It's use of sample based profiling with minimal overhead has made practical deployment of binary optimization throughout Meta's datacenters possible.[2] It can also optimize the Linux Kernel, and has been integrated into the LLVM Project.[22]


  • Propeller is a profile-guided, relinking optimizer developed by Google for warehouse scale applications. It takes a fully linked binary plus linker metadata and runtime profiles, maps sampled PCs back to symbols/basic blocks, and then performs whole‑program, profile‑guided layout optimizations without requiring a source rebuild. It's been deployed to production across Google's datacenters, and has been integrated into the LLVM Project. By using low-overhead sample based profiles collected from the binaries it's optimizing, it can more accurately apply profile-guided code layout optimizations than traditional profile-guided optimization from source code.[23]

Advantages

[edit]

Some binary optimizers allow for the application of better compiler optimizations to executables where source code is not available, in some cases instantly with minimal effort.[5] They may allow for more accurate profile attribution when profile guided optimizations are applied.[22] Whole program optimizations can be applied to finished executables, beyond the scope of traditional source code compilation units.[10] Dynamic binary optimization allows for more accurate, flexible profiling information, specializing shared library code per process, and may not require changes to binaries on disk.[3]

See also

[edit]

References

[edit]
  1. ^ a b "System/360 Instruction Timing Information" (PDF). Archived from the original (PDF) on 2010-07-11. Retrieved 2010-01-07.
  2. ^ a b Panchenko, Maksim; Auler, Rafael; Nell, Bill; Ottoni, Guilherme (2019-02-16). "BOLT: A Practical Binary Optimizer for Data Centers and Beyond". 2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO). pp. 2–14. arXiv:1807.06735. doi:10.1109/CGO.2019.8661201. ISBN 978-1-7281-1436-1. S2CID 49869552.
  3. ^ a b c "Transparent Dynamic Optimization: The Design and Implementation of Dynamo" (PDF). Archived from the original (PDF) on 2003-05-03.
  4. ^ Evans, Michael (1982-12-01). "Software engineering for the Cobol environment". Communications of the ACM. 25 (12): 874–882. doi:10.1145/358728.358732. S2CID 17268690.
  5. ^ a b "IBM Automatic Binary Optimizer for z/OS". Archived from the original on 2025-12-16.
  6. ^ Cohn, Robert S.; Goodwin, David W.; Lowney, P. Geoffrey (1997). "Optimizing Alpha Executables on Windows NT with Spike" (PDF). Digital Technical Journal. 9 (4): 3–20. Archived from the original on 2025-12-16.
  7. ^ Flower, Richard; Luk, Chi-Keung; Muth, Robert; Patil, Harish; Shakshober, John; Cohn, Robert; Lowney, P. Geoffrey (2001). "Kernel Optimizations and Prefetch with the Spike Executable Optimizer with Spike".
  8. ^ "The Binary Code Optimizer". Archived from the original on 2010-07-22. Retrieved 2010-01-07.
  9. ^ "Oracle Solaris Binary Optimizer". Archived from the original on 2024-09-14.
  10. ^ a b Muth, R.; Debray, S. K.; Watterson, S.; De Bosschere, Koen (2001). "Alto: A Link-Time Optimizer for the Compaq Alpha" (PDF). Software—Practice and Experience. 31 (1): 67–101. Archived from the original (PDF) on 2008-02-21. Retrieved 2025-12-17.
  11. ^ Debray, Saumya; Muth, Robert; Watterson, Scott (2001). Software Power Optimization via Post-Link-Time Binary Rewriting (PDF) (Technical report). Department of Computer Science, University of Arizona. Retrieved 2025-12-29.
  12. ^ Schwarz, Benjamin William (2002). Post Link-Time Optimization on the Intel IA-32 Architecture (PDF) (Honors thesis). University of Arizona. Archived from the original (PDF) on 2006-03-01. Retrieved 2025-12-17.
  13. ^ Snavely, Noah. Optimizing and Reverse Engineering Itanium Binaries (PDF) (Thesis). Archived from the original (PDF) on 2006-03-01. Retrieved 2025-12-17.
  14. ^ De Sutter, Bjorn; De Bus, Bruno; De Bosschere, Koen; Debray, Saumya. Combining Global Code and Data Compaction (PDF). Department of Electronics and Information Systems, Ghent University / Department of Computer Science, University of Arizona. Retrieved 2025-12-29.
  15. ^ ""SOLAR" Software Optimization at Link-time And Run-time". Archived from the original on 2016-02-14.
  16. ^ Bruening, Derek; Garnett, Timothy; Amarasinghe, Saman (2003). An Infrastructure for Adaptive Dynamic Optimization. International Symposium on Code Generation and Optimization (CGO). Retrieved 2026-01-13.
  17. ^ Lu, Jiwei; Chen, Howard; Fu, Rao; Hsu, Wei‑Chung; Othmer, Bobbie; Yew, Pen‑Chung (2003). The Performance of Runtime Data Cache Prefetching in a Dynamic Optimization System (PDF). MICRO 36 (36th Annual IEEE/ACM International Symposium on Microarchitecture). Archived from the original (PDF) on 2024-07-05. Retrieved 2025-12-18.
  18. ^ Kim, Jinpyo; Hsu, Wei-Chung; Yew, Pen-Chung (2007). "COBRA: An Adaptive Runtime Binary Optimization Framework for Multithreaded Applications". 2007 International Conference on Parallel Processing (ICPP 2007). p. 25. doi:10.1109/ICPP.2007.23. ISBN 978-0-7695-2933-2. S2CID 15079211.
  19. ^ a b "Dynimize Product Overview". dynimize.com. Dynimize Inc. Archived from the original on 2025-08-11. Retrieved 2025-12-18.
  20. ^ "Dynimize User Guide v2". dynimize.com. Dynimize Inc. Archived from the original on 2025-12-18. Retrieved 2025-12-18.
  21. ^ "Improvements in MySQL & MariaDB CPU Performance With Dynimize". dynimize.com. Dynimize Inc. Archived from the original on 2024-06-23. Retrieved 2025-12-18.
  22. ^ a b "OptimizingLinux.md". GitHub. llvm/llvm-project. Archived from the original on 2025-08-02. Retrieved 2025-12-18.
  23. ^ Shen, Han; Pszeniczny, Krzysztof; Lavaee, Rahman; Kumar, Snehasish; Tallam, Sriraman; Li, Xinliang David (2023). Propeller: A Profile Guided, Relinking Optimizer for Warehouse-Scale Applications. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems. ASPLOS '23. ACM. pp. 617–631. Archived from the original on 2025-07-02. Retrieved 2025-12-18.