Mindrift is seeking experienced software engineers to contribute to its AI evaluation initiatives. This freelance Agent Evaluation Engineer role focuses on creating rigorous coding challenges and functional test cases to assess and improve AI coding systems. The engagement is project-based and does not constitute permanent employment.
Responsibilities:
- Review and refine realistic coding tasks based on production-grade codebases and defined requirements.
- Write comprehensive functional and integration tests validating full end-to-end behavior and edge cases.
- Design complex, context-rich coding challenges requiring advanced reasoning across multiple files and sources.
- Analyze AI-generated outputs to identify strengths, weaknesses, and failure patterns.
- Iterate on tasks based on structured feedback from QA reviewers.
- Ensure tasks meet defined quality criteria and submission standards.
Requirements:
- Degree in Computer Science, Software Engineering, or a related field.
- Minimum of 5 years of software development experience, primarily in Python.
- Strong proficiency with pytest, async/await, subprocess management, and file operations.
- Experience in full-stack development, including React-based frontends and backend systems.
- Practical knowledge of Docker containers and CI/CD workflows such as GitHub Actions.
- English proficiency at B2 level or higher.
Benefits:
- Flexible, project-based work structure.
- Competitive compensation based on expertise and project scope.
- Opportunity to contribute to advanced AI system evaluation.
- Remote work with task-based deadlines.
Participation is contingent on successful qualification and project availability.