The AI Ops Engineer manages the deployment, monitoring, and maintenance of AI models. This role involves ensuring the reliability, scalability, and performance of AI systems, collaborating with cross-functional teams to optimize AI operations, and troubleshooting issues as they arise.
Responsibilities and Duties
- Deploy, monitor, and maintain AI models and systems to ensure optimal performance and reliability.
- Implement and manage CI/CD pipelines for the continuous integration and delivery of AI models.
- Collaborate with data scientists, AI engineers, and other stakeholders to understand model requirements and ensure successful deployment.
- Monitor the performance of AI models and systems, identifying and resolving issues promptly.
- Develop and maintain automated monitoring and alerting systems to ensure the health and performance of AI systems.
- Optimize AI models and infrastructure for scalability and efficiency.
- Ensure compliance with data governance, security, and regulatory standards in AI operations.
- Document deployment procedures, monitoring processes, and maintenance activities.
- Stay updated with the latest advancements in AI operations and infrastructure technologies.
- Provide technical support and guidance to junior team members.
- Participate in project planning and contribute to the development of project timelines and deliverables.
- Perform other duties relevant to the job as assigned by the Sr. AI Ops Engineer or senior management.
Requirements
- Bachelor’s degree in Computer Science, Information Technology, or a related field
- Relevant certifications (e.g., AWS Certified DevOps Engineer, Google Cloud Professional DevOps Engineer) are preferred
- Minimum of 3 years of experience in AI operations, DevOps, or related fields
- Experience in managing the deployment and maintenance of AI models
- Strong programming skills in languages such as Python
- Proficiency in AI and machine learning frameworks (e.g., TensorFlow, PyTorch)
- Experience with CI/CD tools (e.g., Jenkins, GitLab CI)
- Excellent problem-solving and troubleshooting skills
- Strong communication and interpersonal skills
- In-depth knowledge of AI operations and infrastructure management
- Familiarity with cloud platforms (e.g., AWS, Azure, Google Cloud) and their AI services
- Understanding of data governance, security, and regulatory standards
- Ability to manage multiple tasks and prioritize effectively
- Strong attention to detail and commitment to delivering high-quality work
- Ability to work independently and as part of a team
- Programming languages (e.g., Python)
- AI and machine learning frameworks (e.g., TensorFlow, PyTorch)
- CI/CD tools (e.g., Jenkins, GitLab CI)
- Monitoring and logging tools (e.g., Prometheus, ELK Stack)
- Collaboration and communication tools (e.g., Slack, Microsoft Teams)