Tether is a leader in digital finance and advanced technology, combining blockchain innovation with AI-driven solutions to build scalable and secure global systems. Within its QVAC platform, Tether is developing next-generation on-device AI capabilities. The Lead AI Inference Engineer will play a key role in building and optimizing the runtime infrastructure that powers efficient, reliable AI inference across edge devices.
Responsibilities:
- Lead development of AI inference systems optimized for edge device performance
- Deploy machine learning models using frameworks and formats such as llama.cpp, ggml, and ONNX
- Collaborate with researchers to transition models from research to production
- Integrate AI capabilities into products to enhance functionality and performance
- Manage a cross-functional team including C++, JavaScript, QA, and documentation engineers
- Ensure stable releases through structured processes and performance evaluation
Requirements:
- Strong programming expertise in C++
- Experience with inference engines such as llama.cpp and ggml
- Solid understanding of deep learning models including transformers and diffusion models
- Experience working with LLMs and deploying models to production environments
- Proven ability to manage small, specialized engineering teams
- Degree in Computer Science, AI, or a related field, with strong AI R&D experience
Benefits:
- Fully remote work with a globally distributed engineering team
- Opportunity to build state-of-the-art edge AI and peer-to-peer technologies
- Work on high-impact systems powering next-generation AI applications
- Collaborative environment focused on innovation and performance
Join Tether to help define the future of on-device AI inference and scalable machine learning systems.