Mindrift is seeking a detail-oriented professional to support the evaluation and improvement of large language model agents through structured testing and analysis. In this role, you will design realistic scenarios that reflect human workflows and define clear success criteria for assessing AI-generated behavior. You will collaborate on AI-focused projects, ensuring evaluation tasks are precise, repeatable, and aligned with production standards.
Responsibilities:
- Design structured test cases that simulate complex, real-world human tasks.
- Define gold-standard behaviors and scoring logic for agent evaluation (see the sketch after this list).
- Review and analyze agent logs, decision paths, and failure modes.
- Work with repositories and testing frameworks to validate scenarios.
- Refine prompts, instructions, and test cases for clarity and difficulty.
- Ensure evaluation scenarios are reusable, scalable, and production-ready.
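For context, the snippet below is a minimal sketch of what a structured test case with simple scoring logic might look like. The field names, keywords, and scoring rule are illustrative assumptions only, not Mindrift's actual schema or tooling.

```python
# A minimal, hypothetical sketch -- field names, keywords, and the scoring rule
# are illustrative assumptions, not an actual Mindrift schema or framework.
import json

test_case = {
    "id": "calendar-reschedule-001",
    "scenario": "User asks the agent to move a meeting to Tuesday at 10:00.",
    "gold_behavior": "Agent confirms the new time and notifies all attendees.",
    "success_keywords": ["tuesday", "10:00", "notif"],  # illustrative criteria
}

def score(agent_output: str, keywords: list[str]) -> float:
    """Toy scoring logic: fraction of expected keywords found in the agent's output."""
    text = agent_output.lower()
    return sum(kw in text for kw in keywords) / len(keywords)

if __name__ == "__main__":
    print(json.dumps(test_case, indent=2))
    sample = "Meeting moved to Tuesday at 10:00; all attendees have been notified."
    print(f"Score: {score(sample, test_case['success_keywords']):.2f}")  # -> 1.00
```

In practice, test cases of this kind are typically stored in structured formats such as JSON or YAML so they remain reusable and easy to validate within a testing framework.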
Requirements:
- Bachelor’s or Master’s degree in a relevant technical or analytical field.
- Background in quality assurance, software testing, data analysis, or NLP.
- Strong understanding of test design principles and edge case coverage.
- Excellent written English communication skills.
- Ability to work with structured formats such as JSON or YAML.
- Basic experience with Python and JavaScript.
Benefits:
- Flexible, remote freelance work schedule.
- Competitive hourly compensation based on skills and experience.
- Hands-on experience with advanced AI systems.
- Opportunity to build a strong portfolio in AI evaluation.
This role offers a unique opportunity to contribute directly to the development of responsible and high-quality AI technologies.