UNIX / HPC Systems Administrator

University of Maryland

Baltimore, MD

ID: 7257470
Posted: June 24, 2024
Application Deadline: Open Until Filled

Job Description

Position Overview:
The UNIX HPC Administrator is part of a team that is responsible for the day-to-day operations of the research computing infrastructure managed by the Division of Information Technology.

The position will utilize modern system administration techniques such as orchestration, automation, and containerization to build software, deploy updates, provision new systems, manage configuration states and troubleshoot existing deployments. Systems will exist in physical, virtual and cloud environments.

Why Work at UMBC?
UMBC offers competitive compensation. This role starts at $100,000 and includes over 4 weeks of vacation for regular full-time roles. Tuition remission is also available.

What is it like to work at UMBC? Check out Glassdoor or Indeed, and read about our recent award: UMBC is a 2023 Great College to Work For... in every category.

Telework:
A hybrid telework schedule is available!

Responsibilities:
Specific responsibilities include:

Work closely with various IT groups and departments, including, but not limited to, Networks, Windows Administration, Security, Application, and Middleware Administration, to assist in overall architectural design, implementation, and troubleshooting
Develop and maintain operational guidelines for the maintenance and support of the HPC/Research environments
Design, troubleshoot, and maintain the following software: NVIDIA Bright Cluster Manager, Red Hat Enterprise Linux, and Slurm
Perform Linux system administration, including security patching, OS upgrades, and troubleshooting, to ensure maximum availability
Work both independently and collaboratively with teams to troubleshoot service issues
Assist researchers with software builds, environment configuration, and technical support
Provide excellent customer service and demonstrate the ability to work with all levels within the organization, ensuring prompt and effective responses to customer needs
Utilize standard communication, reporting and documentation tools to effectively and efficiently communicate with the team, and document technical solutions
Help develop project plans, effectively create/update issues and keep team leads and management informed of changes, impediments, and updates
Perform additional duties as assigned

Required Minimum Qualifications:
Bachelor’s degree with at least 3-5 years of experience in a UNIX system administration or engineering role
Experience in the installation, maintenance, operation, tuning and troubleshooting of Linux and related systems and software
Ability to install, modify, integrate, and configure commercial and open source software applications and utilities
Experience supporting customer requests and working with stakeholders to gather and fulfill project requirements
Capable of managing time effectively, working both independently and as part of a team
Enthusiasm for learning new skills and adapting to a dynamic environment
Strong interpersonal skills, enthusiasm for customer service, and the ability to work with students, staff, and faculty from diverse backgrounds
Excellent written and verbal communication skills

Preferred Qualifications:
Experience with NVIDIA Bright Cluster Manager or other cluster management software
Experience with InfiniBand networking
Experience with versioning tools such as Git or Subversion
Installation and/or configuration of Ceph, parallel, or high-performance file systems
Hypervisor and virtualization technologies including, but not limited to, VMware, KVM, and Docker
Slurm or other cluster computing job management
Experience with GPU and specialized hardware for Artificial Intelligence and Machine Learning
Server-class hardware deployment and remote management
Knowledge of CUDA programming workflows, GPU programming, and GPU support