
Senior Site Reliability Engineer

Tulsa, OK 74103


As a Senior Site Reliability Engineer (SRE), you will develop a solid understanding of the full stack offering at Vetsource. You are obsessed with availability and reliability, and excited about helping other engineers adopt automation and system reliability best practices.

You are someone who cares deeply about pets and ensuring they are healthy and happy. At Vetsource, we create a great relationship between veterinarians and their clients by enabling pet parents to get important medications and foods delivered directly to their homes.

We're looking for people who have a background and interest in building successful products or systems, are comfortable dealing with lots of moving pieces, have amazing attention to detail, and are comfortable learning new technologies and systems.

Teamwork & Leadership:

  • Work with the team to design, build, and maintain high-performance Infrastructure
  • Strong team player with a high degree of self-motivation and the ability to learn new systems & manage additional technical resources to meet the project requirements
  • Collaborate with development teams on best practices and infrastructure planning activities with a focus on reliability, performance and security
  • Participate in problem resolution activities; troubleshoot issues across the entire stack: software, database, and infrastructure.
  • Function well in a fast-paced and rapidly-changing environment

Execution and Skills:

  • Diagnose and troubleshoot complex distributed systems handling large volumes of data and develop solutions that have a significant impact at scale.
  • Maintain database systems through patching, reviews of schema modifications, queries, and performance optimizations
  • Participate in building advanced tooling for testing, monitoring, administration, and operations of multiple clusters across multiple geographically distributed data centers
  • Develop innovative ways to smartly measure, monitor & report application and infrastructure health
  • Improve the performance of microservices and solve scaling and performance issues
  • Define and Monitor SLI/SLO Error Budgets
  • Drive efficiencies in systems and processes: capacity planning, configuration management, performance tuning, monitoring and root cause analysis.


  • Be ultimately accountable for the performance, capacity and high availability of the infrastructure
  • Reduce the time it takes to build, deploy, and configure Infrastructure & Applications
  • Facilitate knowledge sharing by creating and maintaining comprehensive documentation & diagrams
  • Write high quality code to deliver automated solutions across the entire stack.
  • Understand the challenges and limitations of the existing system, tooling, and architecture, and contribute to roadmap and design discussions.
  • Partner with the Engineering community to establish metrics, and review and sign off on changes, including the introduction of new services and schema changes


  • BS degree in computer science or related engineering degree.
  • 3+ years of hands-on experience with cloud computing - including infrastructure, storage, platforms and data management
  • Experience with traditional enterprise data-center technologies, including compute, storage appliances, virtual machines, and networking
  • Experience working with scalable networking technologies such as load balancers (e.g., F5) and firewalls, and with web standards (REST APIs, RPC, web security mechanisms).
  • Experience integrating and managing a broader DevOps ecosystem, including container and deployment/orchestration tools such as Docker, Kubernetes, Jenkins, and Artifactory
  • 3+ years of experience in Linux Systems and general programming/scripting (Python, Shell, Java) and automation frameworks.
  • Should be able to quickly identify the root cause and resolve critical issues by looking across multiple layers (storage, OS, network, and application / DB stack)
  • The position requires availability for occasional maintenance and for an on-call rotation during non-business hours and on weekends.

Working Conditions:

  • Environment where dogs are present
  • Able to work at a computer while sitting for long periods of time

What We Offer:

  • Huge company vision and rapid growth opportunities, and teams made of smart, ambitious, and fun colleagues
  • Dog-friendly office, sit-to-stand desks, a free gourmet coffee machine, and an on-site Avanti market
  • Competitive salary and full benefit packages including PTO, medical, dental, vision, FSA, and 401K

Supervisory Responsibilities: None


In addition to the job-specific responsibilities listed above, all employees are expected to support and model Vetsource's Core Value Principles: Do the right thing every time; Treat others the way they want to be treated; Embrace Change; Be innovative; Get it done; Work hard, have fun! Employees will be held accountable for knowledge and effective application of these principles.


The statements herein are intended to describe the general nature and level of work being performed by employees, and are not to be construed as an exhaustive list of responsibilities, duties, and skills required of personnel so classified. Furthermore, they do not establish a contract for employment and are subject to change at the discretion of the employer.

We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.



Posted: 2020-07-17 Expires: 2020-08-17
