Embracing Black Swans in system architecture

Back in 2015, while preparing for a senior architecture role, I learned that decision making under uncertainty is a core skill. At the time, I interpreted this as: we know what we know, we know what we don’t know, and we must accept some uncertainty while iteratively reducing risk over time. After more than a decade of architectural work, I realize I missed a crucial aspect: uncertainty isn’t only about manageable risks. It also encompasses events that can turn the entire system upside down.

Examples are everywhere. The rapid rise of the internet in the 1990s reshaped business models and forced organizations to rethink their technology strategies. The emergence of cloud computing in the late 2000s transformed how systems are built and deployed, catching many legacy vendors off guard. Escalating cybersecurity threats continue to reshape infrastructure and IoT practices. COVID-19 produced wide-ranging effects, from health impacts to remote work adoption and global supply chain disruptions. Sadly, the growing number of military conflicts adds further instability, on top of the terrible human suffering they cause. And of course, there are countless “micro” disruptions: a critical colleague leaving, organizational changes, a supplier discontinuing support, or sudden market shifts.

N. N. Taleb develops these ideas in his Incerto series. He argues that modern life does not follow neat Gaussian distributions. Rare, extreme, and unexpected events — Black Swans — shape our world far more than routine occurrences. Traditional risk management cannot adequately prepare us for them. Life is far more random than we assume, and our systems must embrace this reality through optionality, tolerance for failure, and a refusal to rely on predictability. In doing so, systems become Black‑Swan‑robust (never fully Black‑Swan‑proof), and in some cases even antifragile — capable of improving under stress.

The software industry has intuitively adopted parts of this mindset in fault tolerance and resilience. Chaos engineering, with Netflix’s Simian Army as a well-known example, shows that systems become more reliable when regularly exposed to stressors. On the organizational and evolutionary side, however, our industry still lacks a mature understanding of how to achieve Black Swan robustness. We have developed instincts: iterate, separate interface from implementation, release early, fail fast, defer commitment, apply separation of concerns, follow YAGNI and KISS. I came across Barry O’Reilly’s work on antifragile architecture. While he still emphasizes fault tolerance, he also digs into the areas I’m more interested in by considering business and technology aspects alongside operational concerns. He also rightly draws on the distinction between complicated and complex as defined in the Cynefin framework.

In this post, I attempt to extend these perspectives, explain why these instincts help, and bring the thoughts to a practical conclusion.

Modular system architecture to contain impact

System architecture consists of design decisions that affect multiple products and are difficult to change. I define modularity here as loose coupling between architectural decisions: changing one should not require changes to many others. This does not make any decision “easy to change”, but it keeps the impact local, minimizing ripple effects.

Consider a few examples:

(Starting with an obvious one) Defining system interfaces in a language-independent way (REST, Thrift, gRPC…) shields us from changes in Java, Go, Python, and similar ecosystems.
(A less obvious one) Using protocol-agnostic interaction and data models in industrial communication minimizes system-wide impact when one protocol changes. The Web of Things model is a promising approach, enabling polyglot fieldbus systems through unified interactions and web-friendly data models.
Addressing different system capabilities with distinct mechanisms: configuration, runtime/operations, state persistence, communication, user interface. This separation of concerns brings the usual benefits of choosing the best option for each capability and reducing exposure to compatibility breaks. It also prevents ripple effects when a Black Swan hits some of the capabilities.
Separating internal and external data exchange conceptually, each with its own expectations of performance, precision, and compatibility.
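
To make the protocol-agnostic example more tangible, here is a minimal Python sketch. It is not the Web of Things API: the names `PropertyBinding`, `FakeRegisterBinding`, `FakeNodeBinding`, `Thing`, and `raise_setpoint`, as well as the register number, the node identifier, and the scaling factor, are all hypothetical and exist only for illustration. The point is that the application logic is written once against property names, while each protocol lives behind its own binding.

```python
from abc import ABC, abstractmethod


class PropertyBinding(ABC):
    """Protocol-specific access to named properties (hypothetical interface)."""

    @abstractmethod
    def read(self, name: str) -> float: ...

    @abstractmethod
    def write(self, name: str, value: float) -> None: ...


class FakeRegisterBinding(PropertyBinding):
    """In-memory stand-in for a register-based fieldbus protocol."""

    def __init__(self, register_map):
        self._register_map = register_map  # property name -> register number
        self._registers = {}               # register number -> raw integer

    def read(self, name: str) -> float:
        # Assumption for this sketch: registers store the value times 10.
        return self._registers.get(self._register_map[name], 0) / 10

    def write(self, name: str, value: float) -> None:
        self._registers[self._register_map[name]] = round(value * 10)


class FakeNodeBinding(PropertyBinding):
    """In-memory stand-in for a node-based protocol."""

    def __init__(self, node_ids):
        self._node_ids = node_ids  # property name -> node identifier
        self._nodes = {}           # node identifier -> value

    def read(self, name: str) -> float:
        return self._nodes.get(self._node_ids[name], 0.0)

    def write(self, name: str, value: float) -> None:
        self._nodes[self._node_ids[name]] = value


class Thing:
    """Application-facing model: knows property names, not protocols."""

    def __init__(self, binding: PropertyBinding):
        self._binding = binding

    def read_property(self, name: str) -> float:
        return self._binding.read(name)

    def write_property(self, name: str, value: float) -> None:
        self._binding.write(name, value)


def raise_setpoint(thing: Thing, delta: float) -> float:
    """Application logic written once, independent of the protocol in use."""
    thing.write_property("setpoint", thing.read_property("setpoint") + delta)
    return thing.read_property("setpoint")


register_device = Thing(FakeRegisterBinding({"setpoint": 40001}))
node_device = Thing(FakeNodeBinding({"setpoint": "device/setpoint"}))
for device in (register_device, node_device):
    print(raise_setpoint(device, 2.5))
```

If one protocol changes or is replaced, only its binding needs to be touched; `raise_setpoint` and everything built on `Thing` stay as they are.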

The idea is inspired by compartmentalization, as known from shipbuilding and fire safety. Loose coupling between architectural decisions limits the impact to a local zone, much like a ship’s watertight compartments contain flooding in one section and a fire door keeps a fire from spreading through the rest of the building. We should be extremely careful about global decisions affecting the whole system. They are tempting because they seem simple and uniform, but they can introduce extreme fragility.
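
One rough way to make the compartment idea measurable is to treat architectural decisions as a dependency graph and ask how far a change spreads. The sketch below is only an illustration under that assumption: the function `blast_radius` and both example graphs are invented for this post and do not describe any real system.

```python
from collections import deque


def blast_radius(dependents, changed):
    """Return every decision that must be revisited when `changed` changes.

    `dependents[d]` lists the decisions that directly depend on decision `d`.
    """
    affected = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, []):
            if dependent != changed and dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


# Hypothetical graph A: the wire protocol is hidden behind an interaction
# model, so only the adapter depends on it directly.
modular = {
    "fieldbus_protocol": ["protocol_adapter"],
    "interaction_model": ["protocol_adapter", "ui", "persistence", "analytics"],
}

# Hypothetical graph B: a global decision, every part speaks the protocol.
global_choice = {
    "fieldbus_protocol": ["ui", "persistence", "analytics", "configuration"],
}

print(sorted(blast_radius(modular, "fieldbus_protocol")))
print(sorted(blast_radius(global_choice, "fieldbus_protocol")))
```

In the first graph a protocol change stays inside one compartment (the adapter); in the second it reaches four parts of the system at once.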

Modular system architecture provides additional advantages — enough to warrant a dedicated post in the future.

Core asset platforms to provide optionality

Certainly, core asset platforms are already a topic for large organizations. Lowering cost via reduced duplication, higher quality, shared expertise, and consistent operations are the usual (and valid) arguments. The pitfalls are also widely recognized: inflexible platforms, cross-team dependencies, ivory tower core assets, cross-team transparency & trust issues… Success depends on steady and competent leadership. I’d like to bring in another perspective in this post: core asset platforms are one of the key tools we have for navigating unforeseen risks and unexpected opportunities.

Taleb’s background is in options trading, so he frequently uses the investment portfolio as an example in his books. He argues against traditional, evenly distributed diversification and recommends a “barbell” strategy: invest mostly in very safe assets and put a smaller fraction into high-risk, high-potential-gain bets.

Unlike financial investments, architectural investments are extremely inflexible. It’s easy to move money from one asset to another; it’s not easy to rewrite features, redesign product and system architecture, or refactor years of accumulated code. Startups can obviously pivot thanks to their small scale, but the situation is different for a large enterprise:

products must be maintained for existing customers
engineering organizations are large and interconnected
codebases are vast and complex
dependencies accumulate across the portfolio

So, what can large enterprises do?

This is where core asset platforms become vital.

A well‑designed platform introduces a healthy separation between:

core capabilities that should remain stable across long time horizons, and
product‑specific implementations that must adapt quickly to changing markets.

This separation creates a kind of barbell structure and brings optionality: the ability to respond to unexpected risks or opportunities (Black Swans) without risking the entire offering.

A core platform should be guided not by individual product needs but by the overall strategy of the organization. It is the long‑term, durable foundation.

Meanwhile, product teams take these capabilities and package them into offerings tailored to specific markets or customer needs. These offerings may need to evolve rapidly or even be replaced or deprecated.

The key assumption here is powerful: Black Swans are unlikely to invalidate your foundational capabilities—but they will absolutely disrupt your product offerings.

For example, in an IoT context, capabilities like connectivity, device provisioning, secure software update, deployment infrastructure, trust establishment, licensing & monetization etc. are slow‑changing and strategically essential.

Meanwhile, market-specific bundling, key features, UX and workflows must adapt quickly — and sometimes drastically. With a strong platform, rewriting a product might risk 10% of your investment instead of 100%. This does not just reduce risk; it increases agility. Experimentation, bet-based approaches, and “fail fast” thinking work well on the product side — but you don’t want to put your entire offering on a bet.
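
The 10% versus 100% comparison is simple arithmetic once we accept the key assumption above, namely that the platform survives and only product-specific work is invalidated. A small sketch, with the function name `share_at_risk` and all investment figures chosen purely for illustration:

```python
def share_at_risk(platform_investment, product_investments, invalidated):
    """Fraction of total investment lost when some products are invalidated.

    Assumes the platform itself survives the event, which is the key
    assumption of this section and not a guarantee.
    """
    total = platform_investment + sum(product_investments)
    if total <= 0:
        raise ValueError("total investment must be positive")
    lost = sum(product_investments[i] for i in invalidated)
    return lost / total


# Hypothetical numbers: 70 units in the platform, three products of 10 each.
with_platform = share_at_risk(70.0, [10.0, 10.0, 10.0], invalidated=[0])

# Same total spend, but everything lives inside one monolithic product.
monolith = share_at_risk(0.0, [100.0], invalidated=[0])

print(f"{with_platform:.0%} vs {monolith:.0%}")  # prints "10% vs 100%"
```

The ratio depends entirely on how much of the investment really sits in capabilities that outlive a given product, so the numbers are worth estimating for your own portfolio.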

As with the modular system architecture described above, these capabilities must be loosely coupled among themselves: high interdependency means a Black Swan can have a global effect and put most of the portfolio at risk. A modular core asset platform that supports a modular system architecture is therefore essential.

Robust to antifragile

My claim in this post is about making system architectures robust against Black Swans. Taleb makes a threefold distinction: fragile (harmed by stress), antifragile (improved by stress) and robust (not influenced by stress).
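
The threefold distinction can be expressed as a toy classifier: compare what a system delivers under stress with what it delivers undisturbed. This is only a sketch of the definitions above, not a measurement method. The function `classify`, the three response functions, and the shock sizes are hypothetical.

```python
from statistics import mean


def classify(response, shocks, tolerance=1e-9):
    """Label a response as fragile, robust, or antifragile.

    `response(shock)` is the value the system delivers under a shock of the
    given size; `response(0.0)` is the undisturbed baseline.
    """
    baseline = response(0.0)
    stressed = mean(response(shock) for shock in shocks)
    if stressed < baseline - tolerance:
        return "fragile"       # harmed by stress
    if stressed > baseline + tolerance:
        return "antifragile"   # improved by stress
    return "robust"            # not influenced by stress


shocks = [0.5, 1.0, 2.0]  # hypothetical shock sizes

print(classify(lambda s: 100.0 - 10.0 * s, shocks))  # value erodes with stress
print(classify(lambda s: 100.0, shocks))             # value is unaffected
print(classify(lambda s: 100.0 + 5.0 * s, shocks))   # value grows with stress
```

In practice the hard part is defining `response`: for architecture it would have to capture something like delivery capability or portfolio value after a disruption, which is far harder to observe than a number in a toy function.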

Can we go beyond robust and make system architectures antifragile? I think in resilience and Chaos Engineering, we’re certainly moving in this direction. For the evolutionary and organizational aspects, on which I focused in this post, I’m not sure yet, but there is certainly reason for hope:

A modular core asset platform that is organically pruned and replenished in relation to evolving technology and market (like a tree adapting to changing sunlight in its environment) will grow stronger over time.
A learning culture that understands architecture not as a fixed set of decisions but as one that needs to evolve and adapt will improve the organization’s decision making: deciding confidently but not prematurely, and revisiting decisions when necessary.
A well-maintained core asset platform will save the organization money and bring new revenue by leveraging new opportunities quickly. If some part of this is invested properly back in the platform, a virtuous cycle is established.

In general, unlocking antifragility requires feedback loops that make architecture, platform, and offerings improve with each iteration. Whether this can be fully achieved remains to be seen — we need more real‑world evidence.

Putting it into practice

To turn these ideas into action:

Treat architecture as a set of interconnected design decisions. Analyze how tightly they are coupled.
Divide software investment into a stable core and a dynamic, experimental layer, and maintain a healthy separation.
Establish feedback loops so architecture, platform, and products evolve together.
Do not try to predict randomness — embrace it.

This is my first attempt to connect Black Swan theory with software development. I look forward to readers’ perspectives and contributions — let’s add a bit of randomness and sharpen the ideas together.

Doğan Fennibay's blog

on software & system architecture
