Senior System Software Engineer, NCCL – Partner Enablement

Full-stack Engineer | Software Engineer | Full Time | Remote | Team 10,001+ | Since 1993 | H1B Sponsor

Location

California, Texas

Posted

56 days ago

Salary

$152K - $218.5K / year

Bachelor Degree | 5 yrs exp | English | Ansible | AWS | Azure | Cloud | Docker | Google Cloud Platform | Kubernetes | Linux | Node.js | Python

Job Description

  • Engage with our partners and customers to root cause functional and performance issues reported with NCCL
  • Conduct performance characterization and analysis of NCCL and DL applications on groundbreaking GPU clusters
  • Develop tools and automation to isolate issues on new systems and platforms, including cloud platforms (Azure, AWS, GCP, etc.)
  • Guide our customers and support teams on HPC knowledge and standard methodologies for running applications on multi-node clusters
  • Document and conduct trainings/webinars for NCCL
  • Engage with internal teams in different time zones on networking, GPUs, storage, infrastructure and support

Job Requirements

  • B.S./M.S. degree in CS/CE or equivalent experience with 5+ years of relevant experience.
  • Experience with parallel programming and at least one communication runtime (MPI, NCCL, UCX, NVSHMEM)
  • Excellent C/C++ programming skills, including debugging, profiling, code optimization, performance analysis, and test design
  • Experience working with an engineering or academic research community supporting HPC or AI
  • Practical experience with high-performance networking: InfiniBand/RoCE/Ethernet networks, RDMA, topologies, congestion control
  • Expert in Linux fundamentals and a scripting language, preferably Python
  • Familiar with containers, cloud provisioning and scheduling tools (Docker, Docker Swarm, Kubernetes, SLURM, Ansible)
  • Adaptability and passion to learn new areas and tools
  • Flexibility to work and communicate effectively across different teams and time zones

Benefits

  • Equity
  • Benefits
