, logging, monitoring, and observability standards across applications. Accountable to institute non-functional requirements.../support operations work via SRE workflow automation. Be the key architect for the SRE orchestration platform design...
responsibilities for coverage of critical applications. Sees problems as opportunities to improve. Architect and implement... observability platforms and tools for proactive detection and continuous improvement. Lead the design and development of core...
software and systems engineering expertise to architect, build, and run large-scale, massively distributed, fault-tolerant... available, resilient, and scalable infrastructure and application architectures. Advanced Automation & Tooling: Architect and develop...
and Kubernetes. Hands-on experience with observability and monitoring tools (Prometheus, Grafana, AWS CloudWatch, ELK/OpenSearch... APIs for seamless communication across front-end, back-end, microservices, and external platforms. Architect, deploy...
: Design and implement scalable, resilient cloud infrastructure to support AI/ML workloads. Develop and maintain observability... (Docker, Kubernetes). Familiarity with monitoring and logging tools (Prometheus, Grafana, ELK, Datadog). Understanding...
reviews to strengthen platform resilience. Build and optimize observability and monitoring frameworks (CloudWatch, Grafana... Architect – Professional, DevOps Engineer – Professional, or SysOps Administrator. Experience with observability...
with observability and monitoring tools (CloudWatch, Grafana, Loki, Tempo, Prometheus). Familiarity with multi-cloud or hybrid-cloud... analysis, and post-incident reviews to strengthen platform resilience. Build and optimize observability and monitoring...