Cutting startup work without touching the kernel
A controlled look at how choosing a lighter representation of the same workload cut application-level startup work from 4.45s to 1.00s — and exactly where that optimization stops.
Chan Meng
When a compute workload starts cold, a surprising amount of its startup time goes into work that has nothing to do with producing the answer: importing heavy libraries, deserializing model objects, and rebuilding in-memory structures that were thrown away when the last process exited.
We wanted to know how much of that we could remove at the application level — without privileged containers, custom node pools, or kernel-level checkpointing.
The same answer, lighter representations
The key observation is that one workload can produce identical output from progressively lighter representations of itself. A model loaded through its full framework is the heaviest form. The same model expressed as plain numerical arrays is lighter. Expressed as pure standard-library data and arithmetic, it is lighter still — and, for the right class of workload, it returns the same result.
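To make the lightest rung concrete, here is a minimal sketch of a model expressed as pure standard-library data and arithmetic. It assumes Python and a small logistic-regression model; the parameter values, the JSON layout, and the names `MODEL_JSON`, `load_model`, and `predict` are hypothetical illustrations, not the workload that was measured.

```python
import json
import math

# Hypothetical exported parameters for a small logistic-regression model.
# In practice these would be written out once from the full framework object.
MODEL_JSON = '{"weights": [0.5, -0.25, 1.0], "bias": 0.1}'


def load_model(text):
    """Parse the exported parameters using only the standard library."""
    params = json.loads(text)
    return params["weights"], params["bias"]


def predict(weights, bias, features):
    """Logistic regression: sigmoid of the dot product plus the bias."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


weights, bias = load_model(MODEL_JSON)
score = predict(weights, bias, [1.0, 2.0, 3.0])
print(score)
```

Nothing here imports a numerical framework, so startup cost is limited to the interpreter plus `json` and `math`. Whether a given model can be reduced this far depends on the workload, which is why equivalence has to be checked rather than assumed.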
Walking down that ladder, in a controlled measurement, application-level startup went from 4.45s to 1.00s — a 78% reduction in application-level work — with the output validated as equivalent at each step.
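The measurement loop can be sketched roughly as follows: time each rung's loader, compare its output with the first rung, and compute the reduction. This is an illustrative harness under stated assumptions, not the one used for the reported numbers; `measure_ladder`, `load_heavy`, and `load_light` are hypothetical names, and the toy loaders stand in for real representations. In-process timing like this also understates true cold-start cost, since modules may already be cached.

```python
import math
import time


def reduction_percent(before_s, after_s):
    """Share of the original time that was removed, as a percentage."""
    return (before_s - after_s) / before_s * 100.0


def measure_ladder(rungs, sample, tol=1e-9):
    """Time each rung's loader and check its output against the first rung.

    `rungs` is a list of (name, load) pairs; `load()` returns a predict
    function. Both are hypothetical stand-ins for real representations.
    """
    results = []
    reference = None
    for name, load in rungs:
        start = time.perf_counter()
        predict = load()
        elapsed = time.perf_counter() - start
        output = predict(sample)
        if reference is None:
            reference = output
        if not math.isclose(output, reference, rel_tol=tol, abs_tol=tol):
            raise ValueError(f"{name} diverged from the reference output")
        results.append((name, elapsed, output))
    return results


def load_heavy():
    # Toy stand-in for a framework object holding the parameters.
    params = {"w": 2.0, "b": 1.0}
    return lambda x: params["w"] * x + params["b"]


def load_light():
    # Toy stand-in for the same parameters as plain values.
    w, b = 2.0, 1.0
    return lambda x: w * x + b


results = measure_ladder([("framework", load_heavy), ("stdlib", load_light)], 3.0)

# Reported figures: 4.45s down to 1.00s, which rounds to the stated 78%.
print(f"{reduction_percent(4.45, 1.00):.1f}%")
```

Raising on divergence, rather than logging it, keeps a faster but wrong rung from ever being reported as a win.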
The boundary, stated plainly
These are real, controlled, application-level measurements — repeatable, not a mockup. They are not a production SLA and not a complete end-to-end activation time. The 78% describes application-level work only; it does not remove the infrastructure cost that surrounds it. Pod scheduling, container creation, and image layers still dominate the full cold path, so the end-to-end improvement is far smaller than the application-level number alone suggests.
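The gap between the two numbers is simple arithmetic, sketched below. The infrastructure times are hypothetical placeholders chosen only to show the shape of the effect; no infrastructure timing was reported in this measurement, and `end_to_end_reduction` is an illustrative helper.

```python
def end_to_end_reduction(infra_s, app_before_s, app_after_s):
    """Percentage saved on the full cold path when only app work shrinks."""
    before = infra_s + app_before_s
    after = infra_s + app_after_s
    return (before - after) / before * 100.0


# The infra_s values are hypothetical placeholders, not measurements.
for infra_s in (0.0, 5.0, 20.0):
    pct = end_to_end_reduction(infra_s, 4.45, 1.00)
    print(f"infra={infra_s:>4.1f}s -> end-to-end reduction {pct:.1f}%")
```

The saved time is the same 3.45s in every case. Only the denominator changes, so the larger the surrounding infrastructure cost, the smaller the percentage looks end to end.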
Saying that out loud matters. A measurement describes a setup. Strip the setup away and you are no longer reporting a result: you are implying one you never tested.
Finding the floor
The lightest representation — pure standard library, no numerical framework — is also where this technique stops. Below it, the remaining time is the language interpreter booting and the container being created. You cannot shave those at the application layer; going further means kernel-level checkpoint/restore or deeper infrastructure changes, which carry their own cost and operational trade-offs.
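One rough way to see the interpreter part of that floor is to time a fresh process that does nothing, then one that imports only standard-library modules. This sketch assumes a Python workload; `cold_start_seconds` is a hypothetical helper, and the figures it prints include process-spawn overhead and depend on the machine and on file-system caching. It does not capture container creation at all.

```python
import statistics
import subprocess
import sys
import time


def cold_start_seconds(snippet, runs=3):
    """Median wall-clock time to start a fresh interpreter and run `snippet`."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", snippet], check=True)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


# An interpreter that does nothing approximates the language-level floor.
floor = cold_start_seconds("pass")
# The lightest representation adds only standard-library imports on top.
with_stdlib = cold_start_seconds("import json, math")

print(f"interpreter floor:   {floor:.3f}s")
print(f"stdlib-only startup: {with_stdlib:.3f}s")
```

If the two numbers are close, the application has little left to give, and the remaining time belongs to the interpreter and the platform.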
That is the useful conclusion, not a disappointment. Knowing precisely where application-level optimization ends tells you which problems belong to the app and which belong to the platform. It lets you spend effort where it actually moves the number, and it lets you state that boundary every time you report the result.

Chan Meng
Founding Principal Engineer — Activation, Execution and AI Systems
Activation architecture, execution systems, AI-assisted orchestration, technical proof development, telemetry, and system hardening.