AppZen is the leader in autonomous spend-to-pay software. Its patented artificial intelligence accurately and efficiently processes information from thousands of data sources so that organizations can better understand enterprise spend at scale to make smarter business decisions. It seamlessly integrates with existing accounts payable, expense, and card workflows to read, understand, and make real-time decisions based on your unique spend profile, leading to faster processing times and fewer instances of fraud or wasteful spend. Global enterprises, including one-third of the Fortune 500, use AppZen’s invoice, expense, and card transaction solutions to replace manual finance processes and accelerate the speed and agility of their businesses. At AppZen, we value candidates who are actively using AI tools to enhance productivity, automate repetitive tasks, and solve problems more efficiently. Across all roles, we are looking for team members who leverage AI in meaningful ways to drive impact in their work. To learn more, visit our website.
As Manager, DevOps, you will lead a DevOps team responsible for the AWS-based infrastructure, Kubernetes platform, CI/CD systems, production datastores (PostgreSQL, Elasticsearch, Redis, and more), and observability stack that power AppZen. You'll set technical direction, coach engineers, partner closely with Product Engineering and Security, and stay close enough to the work to tune a slow Postgres query, debug an Elasticsearch cluster under load, write Terraform, or review a Helm chart yourself.
This is a builder-manager role. We expect roughly 60% leadership and delivery management, and 40% hands-on technical contribution.
Responsibilities:
- Manage, coach, and grow a team of 3-6 DevOps and platform engineers; own hiring, performance, growth plans, and 1:1s.
- Set quarterly priorities aligned to engineering and business goals; communicate progress and risk clearly to leadership.
- Build a healthy on-call culture: balanced rotations, blameless postmortems, and continuous reduction of toil.
- Own the architecture, cost, and reliability of AppZen's AWS footprint across multiple regions and accounts.
- Drive infrastructure-as-code standards using Terraform; champion modular, reviewable, version-controlled infrastructure.
- Partner with Security and Compliance on SOC 2, ISO 27001, GDPR, and customer audit requirements; harden IAM, network, and secrets management.
- Manage cloud spend: visibility, forecasting, and ongoing optimization (Savings Plans, rightsizing, multi-tenant efficiency).
- Take hands-on ownership of PostgreSQL in production: schema reviews, index and query tuning, vacuum/bloat management, replication, failover, point-in-time recovery, and major-version upgrades (RDS / Aurora).
- Run and scale Elasticsearch / OpenSearch clusters: shard and index design, JVM and heap tuning, snapshot strategy, hot-warm tiers, and incident response under heavy ingest or query load.
- Operate supporting datastores such as Redis (caching, queues), Kafka or SQS/SNS (streaming and async), and S3-backed data lakes; define patterns for high availability, durability, and disaster recovery.
- Partner with engineering on capacity planning, performance benchmarking, data tier cost optimization, backup/restore drills, and customer data isolation for multi-tenant workloads.
- Operate and improve our EKS-based Kubernetes platform: cluster lifecycle, autoscaling, multi-tenancy, and workload isolation.
- Define golden paths for service teams using Helm, Kustomize, and GitOps tooling such as ArgoCD or Flux.
- Set patterns for service mesh, ingress, and zero-downtime deployments.
- Lead the design of internal developer platform capabilities so product teams can ship safely and quickly without infra friction.
- Maintain and improve build, test, and deploy pipelines (e.g., GitHub Actions, Jenkins, ArgoCD); enforce supply-chain security and artifact provenance.
- Drive measurable improvements in DORA metrics: lead time, deploy frequency, change failure rate, and MTTR.
- Own the observability stack (e.g., Datadog, Prometheus, Grafana, OpenTelemetry); ensure consistent metrics, logs, and traces across services.
- Define and operationalize SLOs and error budgets in partnership with service owners.
- Lead incident command for high-severity events and convert learnings into durable systemic fixes.
What You Bring:
- 8+ years of experience in DevOps, SRE, infrastructure, or platform engineering, with at least 2 years leading or managing engineers (formal or tech-lead capacity).
- Deep, hands-on AWS experience across compute, networking, IAM, data, and observability services; comfortable designing for multi-account, multi-region SaaS.
- Strong production experience with Kubernetes (preferably EKS), including upgrades, autoscaling, and securing multi-tenant clusters.
- Demonstrated hands-on operations experience with PostgreSQL at scale (query and index tuning, replication, HA/failover, backups, and version upgrades) and with Elasticsearch / OpenSearch (cluster sizing, shard strategy, ingest tuning, and incident response).
- Working knowledge of additional datastores commonly used in SaaS: Redis, Kafka or other message brokers, and object storage; comfortable evaluating tradeoffs between managed services (RDS, Aurora, ElastiCache, MSK, OpenSearch Service) and self-managed options.
- Proficient with Terraform and modern IaC patterns; clear opinions on module design, state management, and PR-driven workflows.
- Solid scripting and automation skills in at least one of Python, Go, or Bash.
- Track record of designing and operating CI/CD pipelines at scale (GitHub Actions, Jenkins, ArgoCD, or similar).
- Experience running production workloads under SOC 2 or comparable compliance frameworks; comfortable partnering with Security on audits and remediation.
- Excellent communication and stakeholder skills; able to translate infrastructure tradeoffs into language product, finance, and customer teams understand.
Nice-to-Have:
- Experience supporting AI/ML or data-heavy SaaS workloads (GPU fleets, vector stores, large async pipelines).
- Familiarity with service mesh (Istio, Linkerd) and progressive delivery (Argo Rollouts, feature flags).
- Background in scaling FinOps practices and managing cloud spend at $5M+ annual run-rate.
- Experience operating multi-tenant SaaS with strict data isolation requirements for enterprise finance customers.
- Exposure to multi-cloud or hybrid-cloud environments (Azure, GCP).
$240,000 - $280,000 a year
AppZen is committed to fair and equitable compensation practices.
The base pay range for this role is posted above. Actual compensation packages are based on several factors that are unique to each candidate, including but not limited to skill set, depth of experience, certifications, and specific work location. The range may differ in other locations due to differences in the cost of labor.
The total compensation package for this position may also include an annual performance bonus, stock, benefits, and/or other applicable incentive compensation plans.
We are an equal opportunity employer and value diversity. All employment is decided on the basis of qualifications, merit and business need. You can find our Privacy Notice linked on the bottom of our website. We may use artificial intelligence tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans.