Software Engineer, Data Infrastructure

Overview

We’re seeking a talented Software Engineer to strengthen the data operations of our AI team. In this role, you’ll be at the core of our machine learning model training pipeline—finding, processing, and scaling high-quality audio datasets. Your work will directly power the future of Speechify’s next-gen consumer and enterprise AI products.

Key Responsibilities

  • Discover and integrate new sources of large-scale audio data to fuel AI model development
  • Maintain and enhance our ingestion pipeline using GCP, Docker, and Terraform
  • Work alongside researchers to optimize the data collection process for speed, cost, and quality
  • Shape the team’s long-term data roadmap and help push the boundaries of what our models can do
  • Ensure seamless collaboration across engineering, infrastructure, and leadership teams

What We’re Looking For

  • A degree in Computer Science or a related technical field (BS/MS/PhD)
  • 5+ years of experience building production-level software systems
  • Strong scripting abilities in Python and Bash within Linux environments
  • Experience working with Docker and cloud-native infrastructure (we use GCP and Terraform)
  • Bonus: Experience with web scraping, large-scale data collection, or distributed data systems
  • Ability to work independently and shift gears quickly when priorities change
  • Clear and effective communication skills (written and spoken)