Runpod, Inc. Technical Program Manager, Data Center Services Remote · Full time

We are in search of a Technical Program Manager to oversee our relationships with data center providers, manage hardware installations, and ensure our data center resources operate smoothly and effectively. This role is highly technical, centered on data center management, and resembles a senior account manager position. At RunPod, you'll engage with the latest technologies, make a global impact, and thrive in a supportive, innovative environment.

Description

Job Title: Technical Program Manager, Data Center Services

Full time, Remote

Reports to: Head of Infrastructure

Salary Range: $110,000 - $150,000 per year


Company Overview:

Join RunPod, a rapidly expanding leader in AI and machine learning. As a well-funded, profitable company in hyper-growth, we operate in a fully remote environment with team members across the United States, Canada, and internationally. We are looking for a Technical Program Manager to manage our partnerships with data center providers, oversee hardware deployments, and ensure efficient and effective operation of our data center resources. This is a highly technical, data center-focused role that resembles a senior account manager position. At RunPod, you can work with cutting-edge technology, contribute to a global impact, and be part of a supportive, innovative culture.


What We Offer:

  • Competitive salary with stock options.
  • The flexibility of remote work with an inclusive, collaborative team.
  • An opportunity to grow with a company that values innovation and user-centric design.
  • Generous vacation policy to ensure work-life balance and well-being.
  • A chance to contribute to a company that is based across multiple continents and has a global impact.


Why You'll Love Working with Us:

  • Your work will directly influence the customer experience and contribute to the success of our product.
  • You'll join a passionate team reshaping the cloud computing landscape.
  • We offer a supportive, challenging environment that fosters personal growth and learning.
  • Our culture is built on trust, innovation, learning, and growth, ensuring a fulfilling and enriching career experience.


Responsibilities:

  • Onboard and Guide Data Centers: Train new data center personnel on operational processes, emphasizing essential considerations.
  • Develop and Maintain Relationships: Foster strong relationships with existing and potential data center staff, meeting regularly to ensure alignment and address any issues.
  • Ensure Quality and Standards: Maintain high-quality servers, uptime, and performance, continuously improving quality and setting new standards for minimum requirements.
  • Plan and Execute Deployments: Decide on and plan the deployment of new network storage locations, and verify that newly onboarded servers meet the required specifications. Develop processes to optimize deployment cadence and reduce defect rates.
  • Coordinate Maintenance and Updates: Help providers plan and execute data center-wide maintenance or downtime, ensuring proper customer communication. Manage server updates to keep CUDA, kernel, and OS versions current.
  • Maintain Documentation: Collaborate with the technical writing team to keep host documentation current and relevant.
  • Legal and Communication: Oversee the signing of contracts with new providers and communicate any downtime or service disruptions to the community.
  • Capacity Planning and Optimization: Collaborate with Core Engineering, Data Science, and Marketing to ensure that the capacity roadmap matches RunPod’s growth curve.


Required Knowledge, Skills, and Experience:

  • Educational Background: A Bachelor’s degree in a relevant field is required. A Master’s degree or higher is an asset.
  • Technical Expertise: In-depth knowledge of data center operations, hardware solutions, and infrastructure design.
  • Experience: Proven experience managing relationships with data centers and overseeing hardware deployments.
  • Tools and Systems: Proficiency with data center management and monitoring tools; expertise in working with Linux GPU servers. SQL or Snowflake experience for high-level reporting is a plus.
  • Leadership and Communication: Strong leadership skills, excellent communication, and the ability to foster provider relationships.
  • Problem-solving and Project Management: Strong problem-solving abilities and the capability to manage multiple infrastructure projects strategically.


How to Apply:

We want to hear from you if you're ready to join a dynamic team and significantly impact the AI and machine learning industry! Please submit your resume and a cover letter detailing your experience and why you're the perfect fit for this role.


Non-Discrimination in Hiring Practices

RunPod is committed to maintaining a workplace free from discrimination and upholding the principles of equality and respect for all individuals. Our hiring practices are designed to ensure fairness, objectivity, and inclusiveness, adhering to all applicable laws and regulations regarding nondiscrimination.


RunPod strictly prohibits discrimination in any aspect of employment, including recruitment, hiring, training, promotion, compensation, benefits, and termination, based on race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability, age, or any other legally protected status. This policy applies to all employees, contractors, consultants, temporary workers, and job applicants.

Salary

$110,000 - $150,000 per year