I build systems that don’t break.

Building scalable, reliable systems using modern cloud infrastructure, ensuring seamless operations and robust incident readiness.

Current focus: Terraform


System Status: All services operational

Last incident: resolved • postmortem completed • learnings shipped

SLO: 99.9% • Error budget healthy

Platforms handling $4B+ in daily trades

100+ microservices deployed

MTTD (Mean Time to Detect) ↓ 40%

RTO (Recovery Time Objective) < 30 min

About

A bit of story + a bit of philosophy.

I’m Shivang, a DevOps/Site Reliability engineer who thrives on building cloud-native platforms that are not just reliable but also efficient. From designing systems on AWS and Azure to implementing Kubernetes and Terraform solutions, I specialize in creating environments that stay stable under pressure.

My approach blends technical expertise with design thinking: I focus on creating systems that not only perform well but also give teams the tools they need to operate with confidence. Whether it’s through clear, actionable alerts or responsive dashboards, my goal is to make systems both powerful and simple to use.

Cloud Infrastructure • Automation & CI/CD • Incident Management • High Availability & Scalability

What I’m working on now:

Multi-region infrastructure
Advanced SLOs and error budget strategies
Serverless architectures

How I think (and build)

Flip the switches. That’s basically my mindset in production.

Automation > Manual Ops


I turn repeatable work into pipelines, scripts, and IaC—so humans focus on decisions, not clicks.

Observability First


Metrics + logs + alerting before scale. If we can’t see it, we can’t own it.

SLO-driven Reliability


Secure by Default


Projects (the real work)

Short, outcome-focused, and production-relevant.

Modernized a platform handling $4B+ in daily trades, improving runtime stability and performance.
Monolith → 100+ microservices on AWS EKS for scalability and faster releases.
Standardized deployments using ArgoCD + Jenkins across environments.

Toolbox

Tools I’ve used in production (not just “familiar with”).

Cloud

AWS (EKS, VPC, IAM, ALB/NLB, EC2, S3, CloudWatch)
Azure (DevOps, Backup, DR)

Platform

Kubernetes (EKS)
Docker
Helm
Ingress Controllers
Autoscaling

Delivery

Jenkins
ArgoCD
GitHub Actions
GitOps workflows

IaC

Terraform
Modular stacks
State management

Observability

Prometheus
Grafana
Loki
Alertmanager
SLI/SLOs

Security

RBAC/IAM
Audit trails
SIEM monitoring
WORM retention

Let’s build something reliable

If you’re hiring for DevOps/SRE roles, I’m happy to share deeper architecture + incident learnings.

Email LinkedIn GitHub