Job posting has expired
Senior Software Engineer
![]() | |
![]() United States, Texas, Irving | |
![]() 7000 State Highway 161 (Show on map) | |
![]() | |
OverviewThe Microsoft Azure Artificial Intelligence and High Performance Computing (AI and HPC) team is seeking systems engineers to support customers in deploying, monitoring, profiling, and debugging their applications on hyperscale cloud infrastructure. Azure is enabling some of the largest supercomputing deployments in the public cloud, as demonstrated by its presence in rankings such as Top500, Machine Learning Performance (MLPerf), and Graph500.Operating at supercomputing scale requires specialized tools and techniques to ensure system reliability, runtime performance, and job health, while continuing to meet customer Service Level Agreements (SLAs). In this role, you will develop and apply advanced tools, identify operational gaps, and implement features that support the smooth operation of cloud-native supercomputers.As a Senior Software Engineer, you will help establish best practices, influence architectural decisions, and contribute to the roadmap of key software and hardware components. Your work will directly impact a broad range of users and drive the next wave of innovation in artificial intelligence and high performance computing in the cloud.Microsoft's mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
ResponsibilitiesBe part of a comprehensive systems management team focused on operational excellence and customer success.Build tools and analyze key system metrics and telemetry to proactively identify and debug HPC system issues.Partner with customers, vendors, and other teams within Azure to drive comprehensive solutions for operating world class Supercomputers in the public cloud environment.Help ensure Azure platform is consistent on performance, can scale on-demand, and engineered to withstand the unparalleled computing demand from the customer workloads.Contribute to a test-driven engineering culture to reduce regressions and bugs in production and will set a higher bar for infrastructure quality. |