Platform Engineering at ASAPP

Platform Engineering at ASAPP serves as the connective tissue between infrastructure and product development, playing a pivotal role in realizing ASAPP’s mission to deliver exceptional AI-driven solutions. By building and maintaining a robust, scalable, and flexible foundation, the team empowers the entire organization to move faster and smarter. Platform Engineering drives impact across three key dimensions:

Product Innovation – Accelerating development cycles and enabling rapid experimentation with new features.
Customer Satisfaction – Ensuring high reliability, security, and performance to deliver a seamless user experience.
Operational Efficiency – Streamlining infrastructure operations to optimize resource utilization and reduce costs.

The team’s vision is clear: to make engineering at ASAPP efficient, fast, and joyful—unlocking creativity and velocity across the company.

How ASAPP built a flexible, reliable, and secure platform for enterprise-grade deployments

Scalable deployment systems for enterprises

Modern SaaS applications must scale not just with usage—but with complexity, security requirements, and diverse customer needs. At ASAPP, where thousands of agents rely daily on our AI-driven systems, we set out to solve a critical challenge: how to deploy quickly, safely, and repeatedly across a diverse and growing product landscape.

Here’s how we designed and implemented a scalable deployment and configuration system—one that powers thousands of deployments per day and helps us meet the needs of enterprise customers without sacrificing developer velocity.

Designing for scale and security

We built our internal deployment platform like a product; our primary users just happen to be our fellow engineers and researchers. We set out to meet a clear set of design goals:

Flexible deployment models: From single-tenant clusters to bin-packed or logically multi-tenant setups, we needed a solution that supports varied customer patterns.
Extensibility: Our engineering teams work on everything from generative AI to telephony. We aimed to cover 90% of use cases out-of-the-box, while making the rest easy to plug in.
Best practices by default: From CI to Kubernetes manifests, we wanted default paths to enforce security, reliability, and efficiency.
Secure and compliant: Enterprise customers expect systems to align with PCI DSS, SOC 2, and internal governance. We built those requirements directly into our platform.
Cost awareness: Centralized patterns allow us to track and attribute usage, and to optimize cloud spend.

Our architecture: Opinionated by design

We structured our deployment system into four interconnected subsystems. Each plays a role in balancing standardization with flexibility.

1. Continuous delivery

After years of wrangling bespoke, hand-crafted CI/CD pipelines, we wanted a golden-path approach to CI/CD. We adopted Codefresh to unify CI/CD with centrally defined pipelines. Rather than having teams handcraft a pipeline for each service, we provide reusable, maintained templates that support different technologies, including Go, Python, TypeScript, and more. We treat these pipelines like open source within the company: teams like Quality Engineering and SRE contribute features, improvements, and fixes. This collaborative approach lets us standardize quickly without being prescriptive.
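
To make the golden-path idea concrete, here is a minimal Python sketch of centrally maintained pipeline templates. It assumes a simple, hypothetical pipeline shape: a service name plus an ordered list of named steps, each with a hypothetical command. It does not follow Codefresh's actual pipeline schema. The sketch shows one property: a shared step, such as a SonarQube scan, is defined once and reaches every pipeline rendered from the templates.

```python
from copy import deepcopy

# Hypothetical golden-path templates; the pipeline shape and commands are
# illustrative assumptions, not Codefresh's pipeline schema.

# Steps every pipeline gets. Adding a step here (e.g. a SonarQube scan)
# rolls it out to every service built from a golden-path template.
SHARED_STEPS = [
    {"name": "checkout", "run": "git clone $REPO_URL ."},
    {"name": "sonarqube_scan", "run": "sonar-scanner"},
]

# Language-specific steps, maintained centrally and open to contributions.
LANGUAGE_STEPS = {
    "go": [{"name": "test", "run": "go test ./..."}],
    "python": [{"name": "test", "run": "pytest"}],
    "typescript": [{"name": "test", "run": "npm test"}],
}


def render_pipeline(service, language, extra_steps=None):
    """Compose a service's pipeline from the shared templates."""
    if language not in LANGUAGE_STEPS:
        raise ValueError(f"no golden-path template for {language!r}")
    # Copy template steps so per-service pipelines never mutate the originals.
    steps = deepcopy(SHARED_STEPS) + deepcopy(LANGUAGE_STEPS[language])
    # Teams append their own steps instead of forking the template.
    steps += deepcopy(extra_steps or [])
    return {"service": service, "steps": steps}


pipeline = render_pipeline("billing-api", "go")
print([step["name"] for step in pipeline["steps"]])
# ['checkout', 'sonarqube_scan', 'test']
```

Teams append `extra_steps` instead of copying the template. As a result, a fix to `SHARED_STEPS` or `LANGUAGE_STEPS` reaches every service the next time its pipeline is rendered.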

Outcome: We consolidated 600+ unique CI pipelines down to just 25 golden-path templates. This has reduced maintenance overhead and made enhancements (like SonarQube scanning) seamless across the company.

2. Kubernetes manifests

Kubernetes offers powerful primitives, but that flexibility can lead teams to diverge. We use Jsonnet and Tanka to generate Kubernetes manifests from a shared internal library.

With just a few lines of code, teams can generate compliant, auto-scaling, observable deployments. Features like pod disruption budgets, autoscalers, and tracing are built-in by default. Developers can override defaults as needed, using wrapper patterns to retain flexibility.
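
The following Python sketch illustrates the shared-library pattern. One function emits a Deployment and a HorizontalPodAutoscaler with defaults baked in. Callers override only the fields they need through a recursive merge. Several details are assumptions for illustration:

- the function names (`service`, `deep_merge`);
- the default autoscaling bounds and CPU target;
- the use of the OpenTelemetry `OTEL_SERVICE_NAME` variable as the tracing hook.

The library described above is Jsonnet-based, and there object composition plays the role of `deep_merge`.

```python
from copy import deepcopy

# Python stand-in for a shared Jsonnet library. Function names and default
# values (autoscaling bounds, CPU target, tracing hook) are assumptions.


def deep_merge(base, override):
    """Recursively merge `override` into a copy of `base`; lists are replaced."""
    result = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = deepcopy(value)
    return result


def service(name, image, overrides=None):
    """Return a Deployment and an HPA with defaults, plus caller overrides."""
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": {"app": name}},
        # No spec.replicas: the HPA below owns the replica count.
        "spec": {
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        # Tracing default: assumes services use an OpenTelemetry
                        # SDK, which reads OTEL_SERVICE_NAME.
                        "env": [{"name": "OTEL_SERVICE_NAME", "value": name}],
                        "resources": {"requests": {"cpu": "100m", "memory": "128Mi"}},
                    }],
                },
            },
        },
    }
    hpa = {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": name},
            "minReplicas": 2,
            "maxReplicas": 10,
            # Scale on average CPU utilization relative to the requests above.
            "metrics": [{
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    "target": {"type": "Utilization", "averageUtilization": 70},
                },
            }],
        },
    }
    overrides = overrides or {}
    return [
        deep_merge(deployment, overrides.get("deployment", {})),
        deep_merge(hpa, overrides.get("hpa", {})),
    ]


# A team raises the autoscaling ceiling; every other default is kept.
deployment, hpa = service(
    "billing-api", "registry.example.com/billing-api:1.4.2",
    overrides={"hpa": {"spec": {"maxReplicas": 30}}},
)
print(hpa["spec"]["minReplicas"], hpa["spec"]["maxReplicas"])  # 2 30
```

Lists such as `containers` are replaced wholesale rather than merged. A production library would need to handle that case more carefully.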

Outcome: Teams no longer need to be Kubernetes experts to deploy robust services. We’ve reduced the setup time for new services from days to hours.

3. Configuration management

As our platform grew, so did the risk of configuration drift and manual errors. To address this, we built an internal configuration management system with a hydration API that injects environment-specific configuration into manifests at build time.

Infrastructure engineers seed a store with canonical values (e.g., database URLs), and our manifest generation tool handles the rest.
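
Here is a minimal sketch of build-time hydration. It assumes a hypothetical in-memory store keyed by environment and a `${key}` placeholder syntax borrowed from Python's `string.Template`. The real system's store, API and placeholder format are not described in this post. Calling `substitute` (rather than `safe_substitute`) means an unknown key raises `KeyError`, so a missing value fails the build instead of reaching a cluster.

```python
from string import Template

# Hypothetical store of canonical values, seeded per environment by
# infrastructure engineers. The ${key} placeholder syntax is an assumption.
CONFIG_STORE = {
    "staging": {"database_url": "postgres://db.staging.internal:5432/app"},
    "production": {"database_url": "postgres://db.prod.internal:5432/app"},
}


def hydrate(node, environment):
    """Return a copy of `node` with ${key} placeholders filled from the store."""
    values = CONFIG_STORE[environment]
    if isinstance(node, str):
        # substitute() raises KeyError for unknown keys, failing the build
        # instead of shipping a manifest with an unresolved value.
        return Template(node).substitute(values)
    if isinstance(node, dict):
        return {key: hydrate(value, environment) for key, value in node.items()}
    if isinstance(node, list):
        return [hydrate(item, environment) for item in node]
    return node  # numbers, booleans, and None pass through unchanged


container_env = [{"name": "DATABASE_URL", "value": "${database_url}"}]
print(hydrate(container_env, "production"))
# [{'name': 'DATABASE_URL', 'value': 'postgres://db.prod.internal:5432/app'}]
```

One caveat of this particular syntax: any literal `$` in a manifest string must be escaped as `$$`.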

Outcome: We’ve automated configuration for thousands of deployments, accelerating new customer onboarding by over 80%.

4. Deployment orchestrator

Using ArgoCD and GitOps, we built a deployment orchestrator that manages over 4,800 deployments across 45+ clusters. The orchestrator:

Groups applications that deploy together
Maps those groups to clusters and environments
Generates Argo ApplicationSets with consistent config
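
The three orchestrator steps can be sketched as a Python function. It turns one deployment group into an Argo CD ApplicationSet using the list generator, with one element per (application, cluster) pair. The following are hypothetical:

- the group definitions;
- the cluster names and URLs;
- the repository URL;
- the `apps/<app>/<cluster>` path layout.

The `generators`/`template` structure and the `{{param}}` placeholders follow Argo CD's ApplicationSet list-generator format.

```python
# Hypothetical deployment groups: applications that deploy together and the
# clusters they target. Names, URLs, and the repo layout are assumptions.
GROUPS = {
    "voice": {
        "apps": ["telephony-gateway", "transcriber"],
        "clusters": [
            {"cluster": "us-east-prod", "url": "https://10.0.0.1"},
            {"cluster": "eu-west-prod", "url": "https://10.0.1.1"},
        ],
    },
}


def application_set(group_name, repo_url="https://git.example.com/deploy.git"):
    """Build one Argo CD ApplicationSet covering every app/cluster pair in a group."""
    group = GROUPS[group_name]
    # List generator: one parameter set per (app, cluster) pair.
    elements = [
        {"app": app, "cluster": c["cluster"], "url": c["url"]}
        for app in group["apps"]
        for c in group["clusters"]
    ]
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "ApplicationSet",
        "metadata": {"name": group_name},
        "spec": {
            "generators": [{"list": {"elements": elements}}],
            # Argo CD substitutes {{app}}, {{cluster}}, and {{url}} per element,
            # so every Application in the group shares this one template.
            "template": {
                "metadata": {"name": "{{app}}-{{cluster}}"},
                "spec": {
                    "project": "default",
                    "source": {
                        "repoURL": repo_url,
                        "targetRevision": "HEAD",
                        "path": "apps/{{app}}/{{cluster}}",
                    },
                    "destination": {"server": "{{url}}", "namespace": "{{app}}"},
                },
            },
        },
    }


appset = application_set("voice")
print(len(appset["spec"]["generators"][0]["list"]["elements"]))  # 4
```

Argo CD renders one Application per element. Adding a cluster to a group's `clusters` list therefore deploys every application in that group there, with the same configuration.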

We layered observability and testing into the system, turning deployment into a software engineering problem—not an ops fire drill.

Outcome: Teams gain real-time visibility into deployment health, and we’ve rolled out advanced capabilities like canary deployments using Argo Rollouts.
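
As a hedged illustration of the canary capability, the sketch below converts a generated Deployment into an Argo Rollouts `Rollout`. It swaps the Deployment's strategy for a canary step list; `setWeight` and `pause` are Argo Rollouts step types. The function name and the specific weights and pause durations are assumptions, not ASAPP's rollout policy.

```python
from copy import deepcopy

# Illustrative canary policy; the weights and pauses are assumptions.
DEFAULT_CANARY_STEPS = [
    {"setWeight": 10},
    {"pause": {"duration": "5m"}},
    {"setWeight": 50},
    {"pause": {"duration": "10m"}},
]


def to_canary_rollout(deployment, steps=None):
    """Return an Argo Rollouts Rollout reusing a Deployment's selector and pod template."""
    rollout = deepcopy(deployment)
    rollout["apiVersion"] = "argoproj.io/v1alpha1"
    rollout["kind"] = "Rollout"
    canary_steps = DEFAULT_CANARY_STEPS if steps is None else steps
    # The canary strategy replaces the Deployment's RollingUpdate strategy.
    rollout["spec"]["strategy"] = {"canary": {"steps": deepcopy(canary_steps)}}
    return rollout


# Hypothetical minimal Deployment standing in for a library-generated one.
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "web"},
    "spec": {
        "selector": {"matchLabels": {"app": "web"}},
        "template": {
            "metadata": {"labels": {"app": "web"}},
            "spec": {"containers": [{"name": "web", "image": "web:2.0"}]},
        },
        "strategy": {"type": "RollingUpdate"},
    },
}
rollout = to_canary_rollout(deployment)
print(rollout["kind"], rollout["spec"]["strategy"]["canary"]["steps"][0])
# Rollout {'setWeight': 10}
```

Once the last step completes, Argo Rollouts promotes the canary to become the new stable version.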

Case study: Node rotations before and after

Compute node rotations are essential for keeping systems secure—but they used to be painful. Our old CI/CD system made these rotations manual, risky, and slow. Pod disruption budgets (PDBs) had to be added manually, service by service.

With our new system, we added PDBs to every service using just 20 lines of code in our Jsonnet library. We deployed this change across all clusters in minutes, with zero disruption.
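
Here is a sketch of what such a central change might look like, written in Python rather than Jsonnet. It is a pass over each service's manifests that appends a `policy/v1` PodDisruptionBudget matching each Deployment's selector. The helper names, the convention of naming each PDB after its Deployment, and the `maxUnavailable: 1` default are assumptions. During a node drain, evictions go through the Kubernetes Eviction API, which honors PDBs. As a result, a drain waits rather than take down more than one pod of a service at a time.

```python
# Python stand-in for a central library change. The helper names, the
# PDB-named-after-Deployment convention, and maxUnavailable=1 are assumptions.


def pdb_for(deployment, max_unavailable=1):
    """Build a policy/v1 PodDisruptionBudget matching a Deployment's pods."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": deployment["metadata"]["name"]},
        "spec": {
            "maxUnavailable": max_unavailable,
            "selector": {
                "matchLabels": dict(deployment["spec"]["selector"]["matchLabels"]),
            },
        },
    }


def with_pdbs(manifests):
    """Return the manifests plus a PDB for each Deployment lacking one."""
    covered = {
        m["metadata"]["name"] for m in manifests if m["kind"] == "PodDisruptionBudget"
    }
    added = [
        pdb_for(m)
        for m in manifests
        if m["kind"] == "Deployment" and m["metadata"]["name"] not in covered
    ]
    return manifests + added


# Because every service renders through the library, one change covers them all.
services = [
    {"kind": "Deployment", "metadata": {"name": name},
     "spec": {"selector": {"matchLabels": {"app": name}}}}
    for name in ["billing-api", "transcriber"]
]
print([m["kind"] for m in with_pdbs(services)])
# ['Deployment', 'Deployment', 'PodDisruptionBudget', 'PodDisruptionBudget']
```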

Result: Rotations are now routine, safe, and fast—freeing up SRE time and improving our security posture.

A robust deployment and configuration system isn’t just infrastructure—it’s leverage. It enables teams to move fast, stay secure, and scale without chaos. By investing early and treating deployment like a product, we’ve laid a foundation that supports rapid innovation and enterprise trust. If you're grappling with scale, compliance, or developer experience, we hope our story offers inspiration—and a path forward.
