Date Posted:
7/4/2025
Remote Work Level:
Hybrid Remote
Location:
Hybrid Remote in New Haven, CT
Job Type:
Employee
Job Schedule:
Full-Time
Career Level:
Experienced
Travel Required:
No specification
Education Level:
Bachelor's/Undergraduate Degree, Professional Certification
Salary:
$81,900 - $163,425 Annually
Categories:
IT, Education & Training, Research, Cyber Security, System Administrator
Benefits:
Career Development
About the Role
Title: Senior High Performance Computing Administrator
Location: New Haven, United States
Job Description:
Working at Yale means contributing to a better tomorrow. Whether you are a current resident of our New Haven-based community, eligible for opportunities through the New Haven Hiring Initiative, or a newcomer interested in exploring all that Yale has to offer, your talents and contributions are welcome. Discover your opportunities at Yale!
Salary Range
$81,900.00 - $163,425.00
Overview
The Yale Center for Research Computing (YCRC) is looking for a versatile system administrator/engineer to help ensure that Yale's exceptional faculty and students have the infrastructure they need to propel discovery and scholarship to improve the world. Join our growing team of system specialists, research facilitators, and project administration experts, focusing your work especially on GPU infrastructure enhancements and improvements as part of Yale's comprehensive campus investment in AI.
As an experienced subject matter expert, you will help lead the system design, deployment, and support of YCRC's AI-focused research cluster and storage infrastructure. This role is both systems- and researcher-facing, so frequent interaction with other systems team members, research support specialists, and researchers is a routine part of the job. You will be expected to stay current on developments and trends in accelerator and overall high performance computing technologies, processes, and methodologies. We will look to you for insights on evolving tradeoffs in areas such as accelerator-based memory, precision, interconnects, power consumption, and cost.
This is a hybrid position, with YCRC's office space being on the Yale campus. As part of the systems team, you will be expected to provide on-site equipment maintenance as needed. Infrastructure is hosted at a Yale data center in West Haven, CT, and at the Massachusetts Green High Performance Computing Center (MGHPCC) in Holyoke, MA.
Required Skills and Abilities
1. Expertise in administration of HPC Linux clusters, including managing and configuring cluster provisioning and management tools and the batch scheduler.
2. Experience with high-speed networking such as InfiniBand and high-speed Ethernet. Experience with large storage systems and parallel file systems such as GPFS and Lustre.
3. Expertise in Linux system administration, including managing the operating system, networking, storage, and security. Expertise in automation and scripting in at least one scripting language.
4. Ability to work in a team environment in a fast-moving technology field. Excellent verbal and writing skills. Ability to interact well with team members and end users. Ability to work independently and across units.
5. Attention to detail. Ability to take the care necessary to be entrusted with a system that hundreds of users depend on for research computation and the storage of research data.
Preferred Education, Experience and Skills
Experience with GPUs. Ability to specify new systems, especially for AI and ML. Experience configuring, deploying, and supporting large-scale systems in a research environment. Expertise in computer security in large, multi-user Linux environments. Experience with remote administration, and with installing and troubleshooting hardware. Expertise securing large Linux environments.
Principal Responsibilities
Design, implement and advance core HPC systems such as the HPC provisioning system, the resource-management system, account/user lifecycle management, and user authentication and authorization systems. Design, configure and support HPC clusters. Install, administer and maintain hardware, system software, networking, accounts, and security measures. Design and implement our parallel storage and backup systems in collaboration with team. Diagnose and correct system issues, whether these be issues with correct operation or performance. Reinstate integrity of systems as quickly as possible following an outage in order to minimize downtime. Triage and solve user-submitted tickets, especially when they relate to infrastructure. Track resource usage using monitoring and queuing software. Develop and maintain documentation for team members and end users. Research developments in HPC architecture and new technologies, processes, and methodologies. Patch system firmware and software as needed. Determine specifications for new systems, and tailor these to meet business needs (together with team). Conduct training and user education. Perform other duties as assigned.
Required Education and Experience
Demonstrated ability to design, implement, and maintain a local, customized implementation and configuration of a core HPC system such as the HPC provisioning system, the resource-management system, account/user lifecycle management, or user authentication and authorization systems. Experience with technology in a research environment. Experience with backup of large storage systems. Expertise in computer security, preferably in the context of large, multi-user Linux environments. Experience with remote administration using tools such as IPMI. Experience in a data-center environment, installing and trouble-shooting hardware. Professional certifications related to the above. Graduate degree in a related field.
Background Check Requirements
All candidates for employment will be subject to pre-employment background screening for this position, which may include motor vehicle, DOT certification, drug testing and credit checks based on the position description and job requirements. All offers are contingent upon the successful completion of the background check. For additional information on the background check requirements and process visit "Learn about background checks" under the Applicant Support Resources section of Careers on the It's Your Yale website.
Health Requirements
Certain positions have associated health requirements based on specific job responsibilities. These may include vaccinations, tests, or examinations, as required by law, regulation, or university policy.
Posting Disclaimer
Salary offers are determined by a candidate's qualifications, experience, skills, and education in relation to the position requirements, along with the role's grade profile and current internal and external market conditions.
The intent of this job description is to provide a representative summary of the essential functions that will be required of the position and should not be construed as a declaration of specific duties and responsibilities of the position. Employees will be assigned specific job-related duties through their hiring department.
The University is committed to basing judgments concerning the admission, education, and employment of individuals upon their qualifications and abilities and seeks to attract to its faculty, staff, and student body qualified persons from a broad range of backgrounds and perspectives. In accordance with this policy and as delineated by federal and Connecticut law, Yale does not discriminate in admissions, educational programs, or employment against any individual on account of that individual's sex, sexual orientation, gender identity or expression, race, color, national or ethnic origin, religion, age, disability, status as a special disabled veteran, veteran of the Vietnam era or other covered veteran.
Inquiries concerning Yale's Policy Against Discrimination and Harassment may be referred to the Office of Institutional Equity and Accessibility (OIEA).
Note
Yale University is a tobacco-free campus.