Machine Learning Engineering, Training Data Infrastructure
Company: Captions
Location: New York
Posted on: January 23, 2025
Job Description:
Captions is the leading video AI company, building the future of
video creation. Over 10 million creators and businesses have used
Captions to create videos for social media, marketing, sales, and
more. We're on a mission to serve the next billion.

We are a rapidly
growing team of ambitious, experienced, and devoted engineers,
researchers, designers, marketers, and operators based in NYC.
You'll join an early team and have an outsized impact on the
product and the company's culture.

We're very fortunate to have some of
the best investors and entrepreneurs backing us, including Index
Ventures (Series C lead), Kleiner Perkins (Series B lead), Sequoia
Capital (Series A and Seed co-lead), Andreessen Horowitz (Series A
and Seed co-lead), Uncommon Projects, Kevin Systrom, Mike Krieger,
Lenny Rachitsky, Antoine Martin, Julie Zhuo, Ben Rubin, Jaren
Glover, SVAngel, 20VC, Ludlow Ventures, Chapter One, and more.

Press coverage: The Information, Fast Company, The New York Times,
Business Insider, Time.

** Please note that all of our roles will require you to be
in-person at our NYC HQ (located in Union Square) **

Overview

Captions seeks an exceptional Machine
Learning Engineer to drive innovation in training data
infrastructure. You'll conduct research on and develop
sophisticated distributed training workflows and optimized data
processing systems for massive video and multimodal datasets.
Beyond pure performance, you'll develop deep insight into our data
to maximize training effectiveness. As an early member of our ML
Research team, you'll build foundational systems that directly
impact our ability to train models powering video and multimodal
creation for millions of users.

Key Responsibilities

Infrastructure Development:
- Build performant pipelines for processing video and multimodal
training data at scale
- Design distributed systems that scale seamlessly with our
rapidly growing video and multimodal datasets
- Create efficient data loading systems optimized for GPU
training throughput
- Implement comprehensive telemetry for video processing and
training pipelines

Core Systems Development:
- Create foundational data processing systems that intelligently
cache and reuse expensive computations across the training
pipeline
- Build robust data validation and quality measurement systems
for video and multimodal content
- Design systems for data versioning and reproducing complex
multimodal training runs
- Develop efficient storage and compute patterns for
high-dimensional data and learned representations

System Optimization:
- Own and improve end-to-end training pipeline performance
- Build systems for efficient storage and retrieval of video
training data
- Build frameworks for systematic data and model quality
improvement
- Develop infrastructure supporting fast research iteration
cycles
- Build tools and systems for deep understanding of our training
data characteristics

Research & Product Impact:
- Build infrastructure enabling rapid testing of research
hypotheses
- Create systems for incorporating user feedback into training
workflows
- Design measurement frameworks that connect model improvements
to user outcomes
- Enable systematic experimentation with direct user feedback
loops

Preferred Qualifications:

Technical Background:
- Bachelor's or Master's degree in Computer Science, Machine
Learning, or related field
- 3+ years of experience in ML infrastructure development or
large-scale data engineering
- Strong programming skills, particularly in Python and
distributed computing frameworks
- Expertise in building and optimizing high-throughput data
pipelines
- Proven experience with video/image data pre-processing and
feature engineering
- Deep knowledge of machine learning workflows, including model
training and data loading systems

System Development:
- Track record in performance optimization and system
scaling
- Experience with cluster management and distributed
computing
- Background in MLOps and infrastructure monitoring
- Demonstrated ability to build reliable, large-scale data
processing systems

Engineering Approach:
- Love tackling hard technical problems head-on
- Take ownership while knowing when to loop in teammates
- Get excited about improving system performance
- Want to work directly with researchers and engineers who are
equally passionate about building great systems

Team Culture

You'll
work directly alongside our research and engineering teams in our
NYC office. We've intentionally built a culture where
infrastructure and data work is highly valued - your success will
be measured by the reliability and performance of our systems, not
by your ability to navigate politics. We're a team that loves
diving deep into technical problems and emerging with practical
solutions.

Our team values:
- Quick iteration and practical solutions
- Open discussion of technical approaches
- Direct access to decision makers
- Regular sharing of learnings, results, and iterative
work

Benefits:
- Comprehensive medical, dental, and vision plans
- 401(k) with employer match
- Commuter Benefits
- Catered lunch multiple days per week
- Dinner stipend every night if you're working late and want a
bite!
- DoorDash DashPass subscription
- Health & Wellness Perks (Talkspace, Kindbody, One Medical
subscription, HealthAdvocate, Teladoc)
- Multiple team offsites per year with team events every
month
- Generous PTO policy and flexible WFH days

Captions provides
equal employment opportunities to all employees and applicants for
employment and prohibits discrimination and harassment of any type
without regard to race, color, religion, age, sex, national origin,
disability status, genetics, protected veteran status, sexual
orientation, gender identity or expression, or any other
characteristic protected by federal, state or local laws.

Please note benefits apply to full-time employees only.