Senior Software Engineer - AI Hardware
Company: Bloomberg L.P.
Location: New York
Posted on: February 1, 2025
Job Description:
Senior Software Engineer - AI Hardware in New York, New York

The Role:
We are seeking an engineer to join our hardware management
team. This team is responsible for the provisioning, monitoring,
and support of thousands of servers supporting dozens of teams
within Bloomberg, including the entire AI stack!

The ideal candidate will have experience in designing, implementing,
and maintaining system software that enables communication between
GPUs, CPUs, and
storage in scale-out AI and HPC systems. This role will also be
responsible for overseeing the ongoing monitoring, support, and
maintenance of our HPC/AI clusters, ensuring peak performance and
reliability.

We'll trust you to:
- Design, build, and maintain highly reliable, scalable, and
efficient infrastructure platforms that support our engineering
teams and business needs.
- Participate in system design discussions and contribute to
architectural decisions.
- Ensure code quality through standard methodologies, code
reviews, and alignment to clean code principles.
- Produce clear and consumable documentation for a wide
audience.
- Communicate effectively across diverse teams.
- Participate in on-call rotations as arranged.
- Be a self-starter, manage priorities, and work
independently.
- Stay up to date with the latest infrastructure technologies
and industry-standard processes, and evaluate their potential
impact on existing and future solutions.

Who you are?
- You hold yourself to high standards.
- You exude our ambitious, collaborative, and empathetic values.
- You have a self-starter mentality and an eagerness to solve
previously unsolved problems.
- You have excellent collaboration skills and are open to giving
and receiving critical feedback across teams.
- Scalability and reliability are hardwired into your DNA.
- You have publicly available writing samples, blog posts, demos,
or recordings of presentations on technical topics.

What's in it for you?
- A unique opportunity to be part of a rapidly growing team in
one of the most exciting engineering groups at Bloomberg.
- An inclusive and supportive work culture that fosters learning
and growth.
- Continuous professional development, product training, and
career pathing.
- Intra-departmental mentor and buddy program for in-house
networking.
- An inclusive company culture, with the ability to join our
Community Guilds.

You'll need to have:
- 4+ years of hands-on experience in Kubernetes environments
(deployments, storage, services, jobs, ingress, egress, etc.).
- BA, BS, MS, or PhD in Computer Science, Electrical Engineering,
or a related field.
- Hands-on management of GPU-based systems, including kernel and
driver management, and developing software tooling to automate
provisioning and maintenance of these systems.
- Design, implement, and maintain system software that enables
communication between GPUs, CPUs, and storage in scale-out AI and
HPC systems.
- Oversee the ongoing monitoring, support, and maintenance of our
HPC/AI clusters, ensuring peak performance and reliability.
- Drive system upgrades, customization, and seamless integration
with software developers, network operations, and data center
teams.
- Manage and maintain a diverse range of computer systems and
application software, ensuring they meet the highest standards of
functionality and efficiency.
- Develop and maintain expertise in low-latency/high-bandwidth,
interconnected infrastructure (including InfiniBand, Ethernet,
RDMA/RoCE, and others).
- Monitor and evaluate the efficiency and effectiveness of
infrastructure service delivery methods and procedures.
- Partner with internal teams to develop prioritization, metrics,
and processes around capacity planning and infrastructure
availability. Periodically present capacity planning and
performance reports to senior leaders.
- Benchmark, analyze, and make recommendations for improvement of
IT infrastructure.

We'd love to see:
- Expertise with Kubernetes design patterns (operators, Helm
charts, Kustomize, etc.).
- Experience with data center planning, including rack
elevations, cabling plans, and cables/transceivers.
- Experience with data center operations and management.

Bloomberg is an equal opportunity employer, and we value diversity at our
company. We do not discriminate on the basis of age, ancestry,
color, gender identity or expression, genetic predisposition or
carrier status, marital status, national or ethnic origin, race,
religion or belief, sex, sexual orientation, sexual and other
reproductive health decisions, parental or caring status, physical
or mental disability, pregnancy or parental leave, protected
veteran status, status as a victim of domestic violence, or any
other classification protected by applicable law.

Bloomberg is a
disability inclusive employer. Please let us know if you require
any reasonable adjustments to be made for the recruitment process.
If you would prefer to discuss this confidentially, please email
amer_recruit@bloomberg.net