* Troubleshoot issues across the entire stack: hardware, software, application, and network
* Work to improve the reliability... and performance of the next generation of distributed systems and containerized deployments