Apr 3

Tips and tricks to work around inaccessible observability

3 Comments

jps

Jun 16

The ideas are quite useful for cloud services as well.

But without real-time monitoring, how could such systems be continuously improved based on actual usage?

The air-gapped setup doesn't allow real-time monitoring. The "Pseudonymized metrics and logs" idea is the best we came up with. Do you have any other ideas? The main constraint for these systems is that, by design, they are not supposed to call home (think military-level security).

jps

Jun 16 (edited)

I wasn't thinking about targeted optimizations but about the open-ended improvements we make on production systems. Those need live metrics, the infrastructure setup, and the code all at hand together.

What about recording inputs and passing them through an SLM that summarizes and randomizes them into generated stories? Those could form a kind of "new" journal, together with pseudonymized metric and log samples, that can be manually reviewed, carried out of the air-gapped environment, and expanded into test data for simulation.

Obviously, full metrics and logs would be far too much. So it would be some basic statistics and selected samples, just enough to measure and verify whether the system conditions produced by the simulated input match the original state.
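
To make the idea a bit more concrete, here is a rough Python sketch of the "basic statistics plus selected samples" part. Everything in it is an assumption made for illustration: the record shape (`user`, `latency_ms`), the function names `pseudonymize`, `build_journal` and `matches`, the keyed-HMAC pseudonymization, and the 10% tolerance. The SLM summarization step is left out entirely.

```python
import hashlib
import hmac
import random
import statistics


def pseudonymize(value: str, key: bytes) -> str:
    """Replace an identifier with a keyed hash (hypothetical scheme).

    The key stays inside the air-gapped site, so reviewers outside
    cannot map the pseudonym back to the original value.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def build_journal(records, key, sample_size=3, seed=0):
    """Reduce full records to basic statistics and a few pseudonymized samples."""
    latencies = [r["latency_ms"] for r in records]
    summary = {
        "count": len(latencies),
        "mean_ms": statistics.fmean(latencies),
        "median_ms": statistics.median(latencies),
        "max_ms": max(latencies),
    }
    # Seeded RNG so the same export can be reproduced during review.
    rng = random.Random(seed)
    picked = rng.sample(records, min(sample_size, len(records)))
    samples = [{**r, "user": pseudonymize(r["user"], key)} for r in picked]
    return {"summary": summary, "samples": samples}


def matches(original, simulated, tolerance=0.1):
    """Check that simulated statistics stay within a relative tolerance."""
    for name in ("mean_ms", "median_ms", "max_ms"):
        ref = original[name]
        if abs(simulated[name] - ref) > tolerance * abs(ref):
            return False
    return True


if __name__ == "__main__":
    site_key = b"kept-inside-the-air-gap"  # placeholder key for the demo
    records = [
        {"user": "alice", "latency_ms": 120.0},
        {"user": "bob", "latency_ms": 95.0},
        {"user": "alice", "latency_ms": 310.0},
        {"user": "carol", "latency_ms": 101.0},
    ]
    journal = build_journal(records, site_key)
    print(journal["summary"])
    print(matches(journal["summary"], journal["summary"]))
```

The journal is small enough to be reviewed by hand before it leaves the site, and `matches` is the check that a simulation fed with the expanded test data lands in roughly the same state as the original system.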

Reliability Engineering for Air-Gapped…