HPC DevOps Engineer
- Austin, TX, USA
- Full Time
The HPC DevOps Engineer will support advanced scientific computing systems for data scientists and researchers. The successful candidate will work closely with Information Technology and scientific staff to design and maintain high performance computing capabilities that facilitate seamless service workflows and provide solutions for users' computational needs. The HPC DevOps Engineer will also develop cloud computing solutions for research projects using AWS or Azure. The position will be a key provider of computational research support across all business areas of the company.
Essential Duties and Responsibilities:
- Contribute to the development and optimization of advanced scientific computing and data storage systems, integrating local HPC clusters and cloud computing systems across a variety of locations and providers.
- Work with development and data analytics teams across all business areas of the company to provide technical support for using a variety of systems, including HPC clusters, secure scientific computing platforms, parallel file systems and other various storage solutions, virtualized environments, etc.
- Develop and implement cloud solutions in AWS and/or Azure to support computational requirements of researchers that fall outside the scope of local systems while remaining compliant with relevant cybersecurity regulations.
- Contribute to strengthening the corporate culture of best software engineering practices.
Required Knowledge, Skills & Abilities:
- Knowledge of and experience in scientific/research computing infrastructure design, integration, and management, including related components and applications, is required.
- Knowledge of and experience with HPC systems and administration, file system configuration and performance tuning, high-speed networking, virtualization, and stateless system management is required.
- Experience with cluster software stack management (module, Lmod, Spack, etc.) and optimization (compilers, InfiniBand, MPI, numerical libraries and other scientific applications) is required.
- Experience working in a Unix/Linux environment, preferably including Linux Container solutions and system management, is required.
- Experience with virtualization, public cloud computing solutions (AWS, Azure), and private cloud deployment (OpenStack) is preferred.
- Experience integrating hybrid (cloud + on-premises) computing systems across a variety of locations and providers is preferred.
- Experience with a variety of storage technologies and solutions, including on-premises and cloud object storage, and with automating backup processes is preferred.
- Experience maintaining systems in compliance with applicable cybersecurity regulations is preferred.
- Experience with large-scale data processing and data preparation is preferred.
- Experience with containers, such as Docker or Singularity, is preferred.
- Broad knowledge of computer science and information technology trends, and the ability to apply that knowledge effectively to the company's computational environment, is desired.
- Experience programming in Python and R is desired.
- Skill in using Git for version control and collaborative software development is desired.
- Familiarity with developing and maintaining relational databases is desired.
- Proven ability to work across teams to build partnerships, develop stakeholder buy-in and support, and influence without direct authority is required.
- Ability to balance multiple ongoing tasks while operating efficiently and with minimal supervision is required.
- Demonstrated effectiveness in communication and collaboration with scientists, including data scientists, statisticians, biologists, chemists, engineers, etc., is preferred.
- Bachelor's Degree in Information Technology, Computer Science, or a related field is required. Four years' professional experience working in an IT-related role may be substituted for a degree.
- A minimum of four years' demonstrated professional experience programming, operating, and administering heterogeneous scientific computing systems, including, at a minimum, two years' experience administering Linux and HPC platforms. If professional experience is utilized to satisfy the education requirement, a total of at least eight years' experience is required.
Certificates and Licenses:
A certification identified in DoD 8570.01-M Appendix 3 at the IAM Level I or higher is preferred.
For a list of approved certifications, visit https://public.cyber.mil/cwmp/dod-approved-8570-baseline-certifications/.
This position requires that the candidate be willing and able to complete a successful background screening for a security clearance. Candidates with an active security clearance will receive preference.
Working Conditions/Equipment:
- Incumbent will work in a general office environment with general office equipment available. Incumbent will be required to sit for extended periods at a desk and work on a computer for up to eight hours per day.
- Incumbent will be required to work a set schedule negotiated with the supervisor. Occasional after-hours work (i.e., nights or weekends) may be required based on specific circumstances.
- Incumbent must be able to lift up to 20 pounds without assistance and 40 pounds with assistance on a repetitive basis to move or install equipment.
The above job description is not intended to be an all-inclusive list of duties and standards of the position. Incumbents will follow any other instructions, and perform any other related duties, as assigned by their supervisor.