Work on research and development of tools and techniques for ML dataset curation
What is the key to building the most advanced AI models? Data quality.
Hyperparam’s goal is to build a tool so efficient that a single engineer can curate large ML datasets. We believe the way to accomplish this is to: 1) build a highly scalable, interactive frontend that enables exploration and curation of massive ML datasets in the browser, and 2) run dataset-scale inference, using models to reflect on their own training sets to assist with curation. We're building the next generation of ML dataset curation tools, aiming to make LLM dataset curation orders of magnitude more efficient than current approaches. By creating the highest-quality datasets, we will enable the creation of the world’s most capable models.
This is a hybrid, in-person role in Seattle at a seed-stage startup. You would be one of the very first employees, working side-by-side with an experienced team building a new kind of dataset curation tool. The role requires the intense work ethic, dedication, creativity, and independence that an early-stage startup demands. For the right candidate, this is a unique opportunity to help build a company from its earliest idea stage to a product used by real customers.
Responsibilities:
You might be a great fit if you have:
What We Offer:
The ideal candidate is deeply passionate about accelerating AI progress. You're excited by the potential of using LLMs as tools to improve the quality and efficiency of dataset curation, seeing this as a key lever for advancing ML capabilities. You think critically about dataset quality and how it impacts model performance, and you're motivated by the challenge of building automated systems that can help create better training data at scale. You likely have hands-on experience with ML models and understand firsthand how dataset quality influences model behavior. You don’t need to know frontend development, but you should be excited about the prospect of a more interactive, frontend-centric ML data platform. Most importantly, you're eager to work on systems that could help unlock the next generation of more capable AI models through better training data.
$180,000 - $240,000 per year