Stealth • Production • Debugging
Elite engineers debug production systems without triggering alerts, downtime, or data leaks.
Real-World Techniques
- eBPF for invisible inspection: attach dynamic tracing to live systems without modifying binaries.
- LD_PRELOAD/LD_AUDIT interception: override specific library calls for targeted instrumentation without recompiling.
- Shadow traffic mirroring: test new services against a live production stream by duplicating packets/requests off the critical path.
1) eBPF tracing, profiling & data paths (no binary changes)
- CO-RE + libbpf (portable BPF): Write once, run across kernels using BTF; loader relocates types/offsets at load time. See libbpf overview and CO-RE explainer, plus BTF docs and libbpf README.
- kprobes/uprobes/tracepoints/fentry: Choose hooks for minimal footprint & safety; lifecycle/limits covered by LWN on BPF & security and tracing primers (Julia Evans).
- USDT across runtimes: Statically defined user probes you can attach with eBPF (eBPF.io USDT, bpftrace docs, Collabora, Open vSwitch USDT guide).
- Ring buffer output (low overhead): Prefer bpf_ringbuf for steady event export. See kernel docs ringbuf, deep dive by Andrii Nakryiko, examples GitHub, and a 2024 walkthrough Hello eBPF Ring Buffers.
- Crisis-mode techniques & flamegraphs: Brendan Gregg’s recent posts: Linux Crisis Tools (2024), Blog index 2024–2025, classics like off-CPU flame graphs.
- Production lessons: Cloudflare’s eBPF engineering (Tubular: fixing socket API, ebpf_exporter).
- Continuous profiling + trace correlation: Polar Signals/Parca: correlating tracing with profiling, and updates (Parca + OTel).
- K8s auto-telemetry (zero code changes): Pixie eBPF collection (how Pixie uses eBPF, HTTP/TLS capture).
Quick start patterns
- Hot function latency (kernel or user): pick fentry/uprobes; emit to ringbuf → userspace aggregator; keep sampling rates modest to avoid noisy neighbors. See ringbuf and Nakryiko.
- Cross-service “slow request” hunt: enable Parca Agent with tracing correlation; slice by trace-ID to see cumulative CPU across hops (how-to).
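The userspace aggregator in the first pattern can stay very small. Below is a minimal sketch in Python, assuming a hypothetical 16-byte event layout (a u64 pid_tgid followed by a u64 latency_ns) chosen only for illustration; the real layout is whatever struct your BPF program submits. Reading records off the ring buffer itself (for example via libbpf's ring_buffer__poll in a C loader) is out of scope here, so the function simply takes raw event bytes.

```python
import struct
from collections import Counter

# Hypothetical event layout (an assumption, not a libbpf convention):
# u64 pid_tgid, u64 latency_ns, native byte order, no padding.
EVENT = struct.Struct("=QQ")


def log2_bucket(latency_ns):
    """Bucket b covers [2**b, 2**(b+1)) ns; zero latencies are clamped into bucket 0."""
    return max(latency_ns, 1).bit_length() - 1


def aggregate(raw_events):
    """Fold raw event records into a power-of-two latency histogram."""
    hist = Counter()
    for raw in raw_events:
        _pid_tgid, latency_ns = EVENT.unpack(raw)
        hist[log2_bucket(latency_ns)] += 1
    return hist


def render(hist):
    """Return one text line per non-empty bucket, lowest latencies first."""
    return [f"[{1 << b}, {1 << (b + 1)}) ns: {hist[b]}" for b in sorted(hist)]
```

Keeping only bucket counts in memory means the reader's footprint does not grow with event volume. Any sampling should still happen on the probe side, since dropping events in userspace does not reduce the cost already paid in the kernel.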
2) Dynamic linker interception — beyond LD_PRELOAD
- Understand the linker’s lifecycle: symbol resolution, PLT/GOT, lazy binding — LWN: dynamic linking, MaskRay on PLT, primer PLT/GOT explainer.
- Use LD_AUDIT (rtld auditing) for stealthier hooks: loads before LD_PRELOAD, separate namespace, fine-grained callbacks. See rtld-audit(7), ld.so(8), Oracle docs (1, 2, 3), Arch man page notes.
- Security posture & trade-offs: SentinelOne’s deep dive on using LD_AUDIT to beat traditional preloading (blog); additional research case study.
Minimal auditor skeleton (conceptual)
Export la_version() and la_symbind* to observe/redirect selected symbol bindings (e.g., connect, sendmsg) and emit metrics — not payloads — to avoid data-handling risk. Exact signatures: rtld-audit(7).
3) Shadow traffic (a.k.a. request mirroring) without touching users
- Envoy: static or header-triggered mirror policies; mirrored responses are ignored (out of band). Docs: route mirroring sandbox and API RequestMirrorPolicy. See header nuances/issues: host rewrite & Host header.
- NGINX: ngx_http_mirror_module creates background subrequests; responses ignored. Docs: module, build flags configure.
- Istio: first-class mirroring/shadowing tasks in service meshes: task, concepts overview, workshop lab.
- Cloud/underlay mirroring: AWS VPC Traffic Mirroring (agentless packet copies to sensors/shadow stacks) — what it is, getting started, how it works, plus AWS blog deployment tips.
Safe patterns
- Mirror requests only or scrub bodies for PII; tag mirrored traffic via headers to avoid poisoning analytics. Envoy/Istio docs above show header-based mirroring.
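As an illustration of the scrub-and-tag pattern, here is a minimal sketch in Python of a helper that prepares a request copy for a shadow stack. The header name x-shadow-traffic and the deny-list of sensitive headers are assumptions made for this example; neither comes from Envoy, Istio, or NGINX, so adapt both to your own conventions.

```python
# Illustrative deny-list (an assumption); extend it for your environment.
SENSITIVE_HEADERS = frozenset({"authorization", "cookie", "x-api-key"})
# Headers that describe a body we are about to drop.
BODY_HEADERS = frozenset({"content-length", "transfer-encoding"})
# Hypothetical tag so downstream analytics can filter shadow traffic out.
SHADOW_TAG_NAME = "x-shadow-traffic"


def to_shadow_request(method, path, headers, body=b""):
    """Return (method, path, headers, body) with credentials and body removed."""
    dropped = SENSITIVE_HEADERS | BODY_HEADERS
    kept = {k: v for k, v in headers.items() if k.lower() not in dropped}
    kept[SHADOW_TAG_NAME] = "1"
    # The body is discarded entirely so PII never reaches the shadow stack.
    return method, path, kept, b""
```

Dropping the body outright is the bluntest option. If the shadow service needs a payload to exercise its logic, a field-level scrubber would replace the last line, at the cost of having to maintain a schema of what counts as PII.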
Field recipes (tight, copyable starting points)
- A. eBPF CO-RE trace of a user function (uprobes) → ringbuf
Build with libbpf; attach uprobes to the symbol you care about; emit a struct with timestamp/args (mind ABI!) to bpf_ringbuf_output; consume in a tiny userspace reader. References: libbpf overview, ringbuf, deep dive, examples.
- B. Zero-rebuild TLS/HTTP observability
In K8s, deploy Pixie (or Parca + tracing) to capture HTTP metadata/TLS statistics without sidecars or code changes; constrain namespaces to reduce blast radius. See Pixie eBPF, Parca correlation.
- C. LD_AUDIT request-metadata tap
Provide an auditor that logs only (dst IP, port, bytes, errno) for connect/sendmsg bindings; ship via Unix domain socket to a local collector; avoid payload capture. API signatures: rtld-audit(7), loader behavior ld.so(8).
- D. Envoy header-driven shadowing (on/off per call)
Add request_mirror_policies with a cluster_header (e.g., x-mirror-cluster) so ops mirror only a subset by injecting that header. See sandbox & API.
- E. NGINX multi-target mirroring
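location /api/ {
    # Each mirror directive issues a background subrequest; its response is ignored.
    mirror /_shadow_a;
    mirror /_shadow_b;
    proxy_pass http://prod_upstream;
}
# Internal-only locations that forward the mirrored copy to each shadow upstream.
location = /_shadow_a { internal; proxy_pass http://shadow_a$request_uri; }
location = /_shadow_b { internal; proxy_pass http://shadow_b$request_uri; }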
Docs: ngx_http_mirror_module
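For recipe C, the local collector on the receiving end of the Unix domain socket could look roughly like the sketch below. It is written in Python and assumes a hypothetical fixed-size datagram per event (IPv4 address, port, byte count, errno in network byte order); the auditor side, which would be written in C against rtld-audit(7), is not shown, and the wire format is an illustration rather than anything defined by the loader.

```python
import ipaddress
import socket
import struct
from collections import Counter
from typing import NamedTuple

# Hypothetical wire format (an assumption): 4-byte IPv4 address, u16 port,
# u32 byte count, i32 errno, network byte order, no padding.
RECORD = struct.Struct("!4sHIi")


class ConnRecord(NamedTuple):
    dst_ip: str
    port: int
    nbytes: int
    errno: int


def parse_record(datagram):
    """Decode one datagram; raises struct.error if its size is wrong."""
    raw_ip, port, nbytes, err = RECORD.unpack(datagram)
    return ConnRecord(str(ipaddress.IPv4Address(raw_ip)), port, nbytes, err)


def collect(sock, max_records):
    """Read max_records datagrams and total the bytes sent per (ip, port)."""
    totals = Counter()
    for _ in range(max_records):
        # Ask for more than RECORD.size so oversized datagrams fail to parse
        # instead of being silently truncated.
        rec = parse_record(sock.recv(64))
        totals[(rec.dst_ip, rec.port)] += rec.nbytes
    return totals
```

Because each record carries only metadata, the collector never sees payload bytes, which matches the recipe's intent. A datagram socket keeps message boundaries, so no framing logic is needed.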
Engineering guardrails (to stay invisible)
- Keep overhead tiny: prefer fentry/tracepoints over kprobes when possible; use ringbuf; sample, don’t stream (see ringbuf, LWN).
- Portable by design: CO-RE + external BTF if needed; don’t bake kernel offsets (libbpf overview, CO-RE).
- Don’t grab payloads: mirror headers/latencies and compute diffs server-side; Envoy/NGINX mirror ignores mirrored responses (Envoy, NGINX).
- Prefer USDT / well-known probes before deep uprobes — less churn, safer upgrades (USDT, bpftrace).
- Know your linker: LD_AUDIT is earlier/stronger than LD_PRELOAD; understand detection trade-offs (rtld-audit, SentinelOne).
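The "compute diffs server-side" guardrail can be illustrated with a payload-free comparison. This is a minimal sketch in Python, assuming you already record status code and latency for both the production and the shadow response; the ResponseMeta type, the function name, and the 1.5x latency tolerance are all hypothetical choices for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResponseMeta:
    status: int        # HTTP status code
    latency_ms: float  # end-to-end latency observed by the comparer


def diff_responses(primary, shadow, latency_tolerance=1.5):
    """Return human-readable findings; an empty list means no regression seen."""
    findings = []
    if primary.status != shadow.status:
        findings.append(f"status {primary.status} -> {shadow.status}")
    # Flag the shadow only when it is slower than the tolerance allows.
    if shadow.latency_ms > primary.latency_ms * latency_tolerance:
        findings.append(
            f"latency {primary.latency_ms:.1f} ms -> {shadow.latency_ms:.1f} ms"
        )
    return findings
```

Comparing only status and latency keeps request and response bodies out of the diffing pipeline entirely, so the comparer cannot leak data it never received.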
Further deep sources (hand-picked)
- Kernel & man-pages: libbpf overview, bpf ringbuf, BTF, rtld-audit(7), ld.so(8).
- Brendan Gregg (2024–2025): Linux Crisis Tools, blog index, AI Flame Graphs.
- Cloudflare engineering on eBPF: Tubular, ebpf_exporter.
- Parca / Polar Signals: Trace↔Profile correlation, Continuous profiling posts.
- Envoy / NGINX / Istio: Envoy mirroring, API, NGINX mirror, Istio mirroring.
- AWS VPC mirroring (agentless): What it is, Getting started, How it works.