Infrastructure/DevOps Engineer (Full-time – Remote – USA or Canada)

Remote
Posted 4 days ago

A rapidly growing cybersecurity company, focused on protecting organizations from sophisticated email attacks such as phishing, business email compromise, and account takeovers, is looking for an Infrastructure/DevOps Engineer to join their IT team. This full-time, remote role is open to candidates living in the United States or Canada. In this high-impact position, you will enable AI software engineers to move fast by building and maintaining reliable, scalable, and secure infrastructure. You will partner closely with IT, security, and AI/ML engineering teams to ensure that foundational systems robustly support the experimentation, deployment, and monitoring of advanced AI tools and solutions.

This role is ideal for someone who thrives at the intersection of systems engineering and AI enablement, and who loves solving complex operational challenges to unlock innovation across the company. You’ll contribute directly to protecting the modern workplace from sophisticated threats with AI-native technology, working alongside talented colleagues in a culture of learning, ownership, and high performance.


Who You Are: A Collaborative, Security-Minded Builder

You’re a skilled Infrastructure/DevOps Engineer with a solid background in cloud infrastructure and platform engineering. You love building scalable systems that empower other engineers to work faster and more efficiently. You thrive in collaborative, cross-functional environments and embody a customer-first mindset: your internal teams are your customers, and your goal is to remove blockers and boost their productivity. You’re passionate about automation, self-service tools, and building highly reliable systems. You deeply value security, observability, and operational excellence, and you communicate clearly through both meticulous documentation and effective collaboration. You stay up to date with the latest trends in DevOps and AI infrastructure, and you’re driven by impact, ownership, and continuous improvement, moving fast without sacrificing quality.


What You Will Do: Accelerating AI/ML Development at Scale

As an Infrastructure/DevOps Engineer, you’ll be at the forefront of designing, implementing, and optimizing the foundational systems that power advanced AI/ML initiatives. Your responsibilities will blend architecture, automation, security, and performance tuning in a dynamic cloud environment.

  • Architect and Manage Infrastructure for AI/ML: You will architect and manage the foundational infrastructure that supports AI/ML pipelines, critical AI tools, and large-scale data platforms. This involves designing scalable, resilient, and high-performance environments that meet the unique compute and data storage demands of machine learning and artificial intelligence workloads.
  • Implement and Maintain Containerization and Orchestration: You will actively implement and maintain containerization technologies like Docker (for packaging applications into portable containers) and orchestration platforms like Kubernetes (for managing, scaling, and deploying containerized applications). Your expertise ensures efficient resource utilization and reliable application delivery for AI/ML solutions.
  • Develop CI/CD Systems for Reproducible AI: You will be instrumental in developing robust CI/CD (Continuous Integration/Continuous Delivery) systems that are meticulously integrated with ML workflows. This ensures reproducible AI experiments by automating the build, test, and deployment processes for machine learning models and applications, promoting consistency and accelerating the iteration cycle.
  • Collaborate on Data Protection Standards: You will collaborate closely with security and compliance teams to ensure that all infrastructure components and configurations meet stringent data protection standards. This involves implementing security controls, adhering to regulatory requirements, and safeguarding sensitive AI/ML data throughout its lifecycle.
  • Automate Provisioning and Deployment with IaC Tools: You will automate provisioning and deployment processes using Infrastructure as Code (IaC) tools such as Terraform or Pulumi. Your proficiency in IaC ensures that infrastructure can be deployed consistently, rapidly, and reliably across environments, reducing manual effort and the potential for errors.
  • Monitor Infrastructure and Troubleshoot Issues: You will continuously monitor infrastructure and troubleshoot issues using industry-standard tools such as Prometheus (for metrics collection), Grafana (for data visualization and dashboards), and the ELK stack (Elasticsearch, Logstash, Kibana) for centralized logging and analysis. Your proactive monitoring and expert troubleshooting will ensure the high availability and performance of foundational systems.
  • Partner with AI and Software Engineers for Optimization: You will partner closely with AI and software engineers to optimize platform performance and resource utilization. This involves understanding their specific needs, identifying bottlenecks in AI/ML workloads, and implementing solutions (e.g., resource allocation adjustments, infrastructure scaling) that enhance the efficiency and speed of AI development and deployment.
  • Maintain Clear, Accessible Documentation: You will meticulously maintain clear, accessible documentation to effectively scale platform knowledge across the organization. This includes creating architectural diagrams, operational runbooks, troubleshooting guides, and best practices, empowering other teams to leverage the infrastructure effectively and ensuring long-term maintainability.

Must Haves: Your Essential Qualifications

To excel as an Infrastructure/DevOps Engineer, you’ll need extensive experience in DevOps, SRE, or infrastructure engineering roles, with strong cloud, containerization, scripting, and automation skills.

  • Extensive DevOps, SRE, or Infrastructure Experience: You possess 4+ years of demonstrable experience in DevOps, SRE (Site Reliability Engineering), or Infrastructure Engineering roles. This background indicates a solid understanding of building and maintaining scalable, reliable, and efficient IT infrastructure.
  • Cloud Providers, Kubernetes, and Docker Proficiency: You have strong proficiency with cloud providers (AWS preferred), indicating hands-on experience with core cloud services. Crucially, you are proficient with Kubernetes (for container orchestration) and Docker (for containerization), demonstrating your ability to manage modern application deployment environments.
  • Infrastructure as Code Tools Experience: You have hands-on experience with infrastructure as code tools such as Terraform, Ansible, or Pulumi. This demonstrates your ability to provision and manage infrastructure as code and to automate deployment processes effectively.
  • Strong Scripting Skills: You possess strong scripting skills in Python, Bash, or similar languages. This proficiency is essential for automating administrative tasks, building custom tools, and streamlining operational workflows.
  • Familiarity with CI/CD Systems: You have familiarity with CI/CD (Continuous Integration/Continuous Delivery) systems such as GitHub Actions, Jenkins, or CircleCI. This indicates your understanding of automated build, test, and deployment pipelines.
  • Understanding of Cloud Networking, Security, and Identity Management: You have a solid understanding of networking, security, and identity management in cloud environments. This includes knowledge of virtual networks, security groups, firewalls, IAM roles, and best practices for securing cloud resources.
  • Experience Supporting ML Workloads and GPU Infrastructure: You have practical experience supporting ML workloads (machine learning models and applications) and GPU-based infrastructure. This demonstrates your ability to manage specialized hardware and software environments required for computationally intensive AI tasks.
  • Ability to Troubleshoot Complex System Issues: You possess the ability to troubleshoot complex system issues effectively within a distributed environment. This indicates strong analytical and problem-solving skills for diagnosing and resolving intricate technical problems.
  • Comfort Working Across Functional Teams: You are comfortable working across functional teams and communicating effectively with both technical and non-technical stakeholders. This includes translating complex technical concepts into understandable terms for various audiences, fostering collaboration.

Nice to Have: Enhancing Your AI Infrastructure Profile

  • MLOps Tools Familiarity: Familiarity with MLOps tools like MLflow, Kubeflow, or SageMaker would be a plus, indicating a deeper understanding of machine learning operations workflows.
  • AI Platform Infrastructure Experience: Experience with AI platform infrastructure, including model serving (deploying trained models for inference) and feature stores (managing reusable features for ML models), would be highly desirable.
  • Logging and Monitoring Frameworks Knowledge: Knowledge of logging and monitoring frameworks such as Fluentd or Loki (for collecting and aggregating log data) would be beneficial.
  • Data Platform Support Background: A background in supporting data platforms like Snowflake, Databricks, or Hadoop would be a plus, indicating broader data engineering capabilities.
  • Startup or High-Growth Tech Experience: Experience working in high-growth startups or tech companies, where ambiguity and rapid change are common, would be advantageous.
  • AWS Certification: Holding an AWS certification (e.g., Solutions Architect, DevOps Engineer) is a significant plus, validating your expertise in AWS cloud services.

Compensation & Benefits: Rewarding Your Impact

Certain roles are eligible for a bonus, restricted stock units (RSUs), and comprehensive benefits. Individual compensation packages are based on factors unique to each candidate, including their skills, experience, qualifications, and other job-related reasons. The base salary range for this position is $127,500–$150,000 USD. For positions based in San Francisco or New York, the base pay range is $140,300–$165,000 USD.

Our client’s compensation and benefits philosophy is designed to attract, motivate, and retain top talent. They pay competitively, and as a pre-IPO startup they treat equity as an important and exciting part of total compensation, guided by the belief that team members should share in the company’s financial success. They offer flexible PTO, observe 12 paid holidays, and provide generous healthcare coverage (100% of employee healthcare premium costs in the US, and up to 100% for dependents, depending on the plan). As a globally distributed, majority-remote company, they balance deep focus time with virtual meetings and regular in-person events. As a fast-growing startup, they continuously review, improve, and personalize their benefits offerings based on team input.


If this Infrastructure/DevOps Engineer role aligns with your passion for building scalable, secure, and reliable infrastructure for AI platforms, your expertise in cloud environments and automation, and your drive to unlock innovation, we encourage you to learn more about this exciting full-time, remote opportunity.

Are you ready to accelerate AI/ML development in a leading cybersecurity company?

Job Features

Job Category: Engineering, IT
