Principal Software Engineer-OpenShift AI

Posted 12 November 2024
Salary Negotiable
Location
Job type Permanent
Discipline IT
Reference BBBH7107_1731431885

Job description

Job Title: Principal Software Engineer-OpenShift AI Model Training
Job type: Permanent
Job Location: Ireland/Remote

Company Overview:
Our client is the world's leading provider of enterprise open-source software solutions, using a community-powered approach to deliver high-performing Linux, cloud, container, and Kubernetes technologies. Spread across 40+ countries, our associates have the flexibility to choose the work environment that suits their needs from in-office to fully remote to office-flex.

The Role:
Our client is looking for a highly motivated individual to join as a Principal Software Engineer with OpenShift AI. You will lead the design and development of tools for next-generation Foundation Models and Large Language Models (LLMs), contributing expertise in Kubernetes and MLOps to give enterprise data science teams distributed model training capabilities across hybrid cloud environments.

What You Will Do:

  • Lead engagement in machine learning upstream communities to ensure seamless integration with OpenShift AI.

  • Architect and implement scalable open-source solutions enabling Data Scientists to utilize distributed computing on OpenShift for training Machine Learning models.

  • Act as an MLOps Subject Matter Expert (SME) in customer discussions and at technical conferences, and promote OpenShift AI internally.

  • Design and implement new features for open-source communities such as KubeRay, PyTorch, Katib, and Kubeflow.

  • Provide technical leadership and vision on critical projects.

  • Mentor and coach a distributed team of engineers.

  • Present at conferences on OpenShift/Kubernetes and AI/ML technologies, both externally and within AI/ML communities.

What You Will Bring:

  • Active contributions to MLOps open-source projects such as Ray/KubeRay, Kubeflow, PyTorch, and Katib.

  • Experience in training and tuning ML models using tools such as Ray, Kubeflow Training Operator, Katib, MLflow, or similar technologies.

  • Advanced proficiency with Kubernetes.

  • Advanced skills in Go or Python for software development.

  • Strong system understanding and troubleshooting abilities.

  • Innovation mindset with a deep passion for technology.

  • Technical leadership experience in a global team environment and proven mentorship capabilities.

  • Commitment to writing and maintaining high-quality code.

  • Excellent written and verbal communication skills in English.

Preferred Qualifications:

  • Bachelor's degree in statistics, mathematics, computer science, operations research, or a related quantitative field; master's or PhD is a plus.

  • Experience in engineering, consulting, or related fields focusing on distributed model training or data processing for customer environments or supporting data science teams.

  • Extensive expertise in OpenShift.

If you are interested in this role or would like to discuss it further, please call Nidhi at +353 1 645 5244 or email [email protected].