Mindrift is seeking experienced software engineers to contribute to its AI evaluation initiatives. This freelance Agent Evaluation Engineer role focuses on creating rigorous coding challenges and functional test cases to assess and improve AI coding systems. The engagement is project-based and does not constitute permanent employment.
Responsibilities:
- Review and refine realistic coding tasks based on production-grade codebases and defined requirements.
- Write comprehensive functional and integration tests validating full end-to-end behavior and edge cases.
- Design complex, context-rich coding challenges requiring advanced reasoning across multiple files and sources.
- Analyze AI-generated outputs to identify strengths, weaknesses, and failure patterns.
- Iterate on tasks based on structured feedback from QA reviewers.
- Ensure tasks meet defined quality criteria and submission standards.
Requirements:
- Degree in Computer Science, Software Engineering, or a related field.
- Minimum of 5 years of software development experience, primarily in Python.
- Strong proficiency with pytest, async/await, subprocess management, and file operations.
- Experience in full-stack development, including React-based frontends and backend systems.
- Practical knowledge of Docker containers and CI/CD workflows such as GitHub Actions.
- English proficiency at B2 level or higher.
Benefits:
- Flexible, project-based work structure.
- Competitive compensation based on expertise and project scope.
- Opportunity to contribute to advanced AI system evaluation.
- Remote work with task-based deadlines.
Participation is contingent on successful qualification and project availability.