Last updated 19 Nov 2020

On micro-frontends

My time at Indeed focused on improving front-end development practices and tooling across product teams. It was a journey from stabilization to remediation to modernization to, finally, innovation. In the process, I got to build and grow a team, and gain new skills in managing managers and setting product strategy and vision.

My capstone project, as it were, was a micro-frontend platform. Over the course of just a few quarters, it radically improved the speed with which teams could deliver and iterate on web UI, and radically shifted the ownership model of that UI. I wanted to spend a few minutes reflecting on that effort and some of the lessons I learned.

What the platform does #

The platform enables a “host” page to incorporate user interface and functionality that is provided by services called “providers.”

A host that implements the platform can specify regions where providers can be displayed within a page; at runtime, the platform 1) determines which providers might want to display in a given region, 2) brokers the request to those providers, and 3) converts their response into content that can be incorporated into the HTML response of the host page.

There are a lot of details hidden in those couple of sentences, but fundamentally, the system is incredibly simple: providers are just services that respond to HTTP requests with JSON that includes the initial HTML for the provider, along with URLs for the JavaScript and CSS for the provider. Hosts integrate a Java library that handles load balancing the requests to provider instances, and incorporate the HTML, CSS, and JavaScript from provider responses into the HTML returned by the server.
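To make the shape of that exchange concrete, here is a minimal sketch of the host side. It is written in Python for brevity, although the host library described above was Java. The JSON field names (`html`, `js`, `css`), the `data-region` attribute, and both function names are assumptions made for illustration, not the platform's actual contract.

```python
import json
from dataclasses import dataclass, field
from html import escape


@dataclass
class ProviderResponse:
    """What a provider sends back: initial HTML plus asset URLs."""
    html: str
    js_urls: list[str] = field(default_factory=list)
    css_urls: list[str] = field(default_factory=list)


def parse_provider_response(body: str) -> ProviderResponse:
    # Field names "html", "js" and "css" are hypothetical.
    data = json.loads(body)
    return ProviderResponse(
        html=data["html"],
        js_urls=list(data.get("js", [])),
        css_urls=list(data.get("css", [])),
    )


def render_region(region_id: str, responses: list[ProviderResponse]) -> str:
    """Turn provider responses into markup the host can place in its page."""
    # dict.fromkeys de-duplicates URLs while preserving first-seen order.
    css = list(dict.fromkeys(u for r in responses for u in r.css_urls))
    js = list(dict.fromkeys(u for r in responses for u in r.js_urls))

    lines = [f'<link rel="stylesheet" href="{escape(u, quote=True)}">' for u in css]
    lines.append(f'<div data-region="{escape(region_id, quote=True)}">')
    # Provider HTML is inserted as-is; the host trusts the provider here.
    lines.extend(r.html for r in responses)
    lines.append("</div>")
    lines.extend(f'<script src="{escape(u, quote=True)}" defer></script>' for u in js)
    return "\n".join(lines)
```

A host would call `parse_provider_response` on each provider's reply and pass the results to `render_region` for the region being filled. Load balancing and the HTTP call itself are left out of the sketch.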

Technical challenges #

As we were building the system, some of the bigger technical challenges were around ensuring that product quality and functionality didn't suffer as we facilitated a new UI ownership model. We also needed to make sure that we could change the overall state of the platform without having to touch individual hosts or providers.

One of the most obvious technical challenges harked back to my third-party JavaScript days. The browser’s runtime environment is shared among all of the code that runs within the page. That means that provider JavaScript or CSS can plausibly interfere with other JavaScript or CSS on the page; in an extreme case, provider JavaScript could delete the entire contents of a page. We had to balance legitimate provider needs with the imperative that a provider could not break critical functionality, which sometimes resulted in frustration for provider teams.

Resiliency was another big concern. If the provider that displays job search results is slow or broken, job search results still need to display. The platform introduced the notion of “fallback content,” allowing providers to regularly register content that will be used if the provider fails to respond. This fallback content can include JavaScript that can fetch data and then render HTML in the browser, ensuring that users can still receive a functional UI even in the event of provider failure.
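The fallback idea can be sketched roughly as follows, again in Python for brevity. The `FallbackRegistry` class, the `content_for` function, and the provider id are all hypothetical names, and the sketch treats any exception from the fetch as a provider failure, which is a simplification.

```python
from typing import Callable, Optional


class FallbackRegistry:
    """Holds the most recent fallback content each provider has registered."""

    def __init__(self) -> None:
        self._content: dict[str, str] = {}

    def register(self, provider_id: str, html: str) -> None:
        # Providers would call this regularly so the fallback stays fresh.
        self._content[provider_id] = html

    def get(self, provider_id: str) -> Optional[str]:
        return self._content.get(provider_id)


def content_for(
    provider_id: str,
    fetch: Callable[[str], str],
    registry: FallbackRegistry,
) -> str:
    """Return live provider content, or registered fallback on failure."""
    try:
        return fetch(provider_id)
    except Exception:
        # Assumption: timeouts and errors both surface as exceptions.
        fallback = registry.get(provider_id)
        # With no fallback registered, the region is simply left empty.
        return fallback if fallback is not None else ""
```

The registered fallback can itself contain a script tag, which is how a provider could still fetch data and render in the browser when its server-side response is unavailable.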

To mitigate the need for frequent host changes, we introduced a configuration artifact that could be updated without requiring a host deploy, and hosts would pick up the changes within minutes of the change being published. This artifact specified which providers should appear on a page, and where. With this mechanism, host pages could pre-define “zones” where providers were allowed to appear, and providers could be added and removed from those zones via the configuration. We later added the ability for hosts to revise the configuration at runtime, based on their knowledge of the context for the specific page view.
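As a rough illustration of how zones and the configuration artifact might fit together, consider the sketch below. The zone names, provider names, the dictionary shape of the published configuration, and the `resolve_zones` function are all assumptions; the real artifact's format is not described here.

```python
from typing import Callable, Optional

# Hypothetical shape: zone name -> ordered list of provider ids.
ZoneConfig = dict[str, list[str]]


def resolve_zones(
    published: ZoneConfig,
    host_zones: set[str],
    revise: Optional[Callable[[ZoneConfig, dict], ZoneConfig]] = None,
    context: Optional[dict] = None,
) -> ZoneConfig:
    """Work out which providers appear in each zone for one page view."""
    # Only zones the host has pre-defined are eligible; copy the lists so
    # a per-request revision cannot mutate the shared published config.
    config = {zone: list(published.get(zone, [])) for zone in host_zones}

    if revise is not None:
        # The host may adjust the configuration using request context.
        config = revise(config, context or {})

    # Re-apply the zone filter in case the revision introduced new zones.
    return {zone: list(ids) for zone, ids in config.items() if zone in host_zones}


def hide_salary_for_anonymous(config: ZoneConfig, context: dict) -> ZoneConfig:
    """Example host-side revision: drop one provider for logged-out users."""
    if context.get("logged_in"):
        return config
    return {z: [p for p in ids if p != "salary-widget"] for z, ids in config.items()}
```

For example, calling `resolve_zones(published, {"sidebar", "footer"}, hide_salary_for_anonymous, {"logged_in": False})` would return the published sidebar providers minus `salary-widget`, and an empty list for `footer` if nothing is configured there.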

Non-technical challenges #

Just as we were rolling out the first prototype of the system, I had a realization: the biggest challenges we were going to encounter in securing broad adoption of a micro-frontend platform would be non-technical. Over the following two years, my hunch proved to be right.

Probably the biggest non-technical challenge was governance. When a single product team owned the entirety of a page, every decision about changes to that page went through that team. A micro-frontend platform fractured that ownership model; we had to develop processes that let the old owners feel confident that their pages wouldn’t change radically without their knowledge, while preserving the velocity gains for teams that now owned individual pieces of UI.

We talked a lot about “the pink button problem” — what would happen if a provider team violated a host team’s trust while trying to optimize for their own metrics? There were some technical guardrails we could impose, but it was easy to imagine behavior that would defy easy technical prevention. In reality, this largely proved to be a theoretical concern, not a practical one. Yes, there were some times that providers did surprising things, but because everyone worked at the same company and ultimately shared the same goals and mission, those situations were relatively easy to solve through conversation. Investing in technical solutions might prove necessary someday, but I’m glad we didn’t do it too early.

The other challenge that stands out was coaching provider teams on operations and ownership. From the beginning, we made sure that the platform ecosystem was highly observable: every provider and host team had real-time visibility into errors, latency, requests, cache misses, and more. What we quickly found is that provider teams were unaccustomed to worrying about these things, because the host teams had historically borne this responsibility.

While the platform is quite resilient to provider failure, the platform team continued to spend time escalating failures that provider teams weren’t noticing. It’s hard to know what the platform team can safely ignore: is that spike in provider errors due to a bug in the platform, or the provider? We never wanted a provider team to be the first to discover a bug that was ours, but we also couldn’t afford to be the first line of operational defense for provider teams. It’s a needle we still hadn’t figured out how to thread by the time I left.

To micro-frontend or not? #

The more successful the micro-frontend platform was, the more teams were coming to us and asking if they should use it, especially as they were creating whole new products and experiences. The answer was often “no,” because the Microservice Premium applies to micro-frontends, too:

[M]icroservices introduce complexity on their own account. This adds a premium to a project’s cost and risk – one that often gets projects into serious trouble. – Martin Fowler

In my experience, a micro-frontend architecture has clear benefits when certain things are true.

First and foremost, a micro-frontend architecture seems to be most suited to an environment where web pages are deployed via a number of distinct codebases; that is, I’m not sure that a micro-frontend architecture makes sense in a monorepo. That said, distinct codebases alone aren’t a reason to move to a micro-frontend architecture. Here are some other indications that it might be appropriate:

There is agreement that independent product ownership of pieces of UI is acceptable for reasons other than delivery velocity. A micro-frontend approach makes sense if a product manager can draw a box around a thing and say, “I’m entirely OK if someone else makes decisions about what’s inside this box without talking to me, as long as this box continues to generally be focused on <concept>.”
There is substantial UI tech debt in the “host” codebase, and in-place remediation is broadly understood to be prohibitively expensive or disruptive. For example, in many legacy UI codebases, introducing the test automation required to enable continuous integration and delivery would require a wholesale rewrite of the UI. This argument is most compelling if there’s evidence that people have tried and failed to make progress in the past.
There is a strong case for the reuse of a piece of UI functionality across multiple surfaces. This UI functionality should be non-trivial and substantially similar across those surfaces. It’s possible for reuse to be managed via a library, but outside of a monorepo, this presents challenges of deployment coordination. Managing reuse via a service allows for centralized deployment of changes.

Unless at least one of these things is true, a micro-frontend approach may introduce complexity without benefit. In a greenfield project, perhaps the only reason to incorporate a micro-frontend architecture is for reuse — there is no tech debt yet, and there are other ways to achieve independent UI ownership without incurring the microservice complexity.

I strongly recommend against incorporating a micro-frontend architecture until you can concretely describe the benefits you expect to achieve, and why those benefits are difficult or impossible to obtain otherwise. It can be a hard decision to undo once you head down the path.

Surprises and opportunities #

I always expected that provider teams would see velocity benefits, but I was surprised at the impact on host teams. Today, they’re largely spared from reviewing, releasing, and supporting external contributions, which has given them the bandwidth to invest in velocity improvements of their own. Host team velocity has more than doubled. Some host teams have chosen to author new functionality via providers rather than in the existing codebase, increasing their velocity even further.

One of the biggest unaddressed pain points is the reliance on libraries as a distribution mechanism for the platform itself. Provider and host teams tend to be risk-averse when it comes to updating libraries; “pinning” to a specific version is broadly viewed as acceptable, and there is no Indeed-standard mechanism for triggering upgrades. Driving adoption of a new version requires communication and, sometimes, cajoling. There’s an opportunity to deliver more value faster by shifting the bulk of the client library responsibility to a centralized service that the platform team can iterate on without requiring host deploys. On the other hand, this would introduce a new layer of resiliency risk that might not be acceptable.

While the platform we built was explicitly intended to be incorporated into existing applications, where those applications are ultimately responsible for serving the HTML for a page, I can definitely see a future where the platform is extended so that it serves the pages itself. Those pages would be composed of a lightweight shell (responsible for overall page layout and the page’s header and footer, say), with the rest of the page populated by providers.

In summary #

Two years of working on a micro-frontend platform gave me a ton of lived experience with the tradeoffs of a micro-frontend architecture. There's far more context than I can share here about why and how this approach was so well suited to deliver huge velocity improvements for UI development at Indeed — even small changes in that set of circumstances might have led to a very different result. Regardless, I hope that my experience might inform your own views on whether and how this approach might make sense for you.