AppkoEurope

Sr. DevOps/Site Reliability/Fullstack Engineer

Description

Coverage time: 4pm - 1am CET (Remote)

We are looking for a highly motivated DevOps/Site Reliability Engineer to join our exceptional team. The right candidate is ready to design, automate, and support our GCP and Jira server infrastructure and back-end systems, and to build technical integrations with the client tools and apps that support our autonomous end-to-end software testing processes. The ideal candidate has strong skills in performance tuning, technical design, and full-stack development, along with experience supporting GCP and Jira production environments for a base of 1,000+ end users. Jira is currently used to implement, enhance, and maintain end-to-end automated test plans through Jira structure hierarchies and workstreams (covering both manual and automated processes).

Key Responsibilities: Design, build, and maintain resilient, scalable GCP infrastructure for Jira/Confluence, primarily on GKE. Manage and tune the Cloud SQL databases (MySQL/PostgreSQL) that back the applications. Automate infrastructure provisioning and management with Terraform. Implement and manage comprehensive monitoring, logging, and alerting using tools such as the Elastic Stack, Prometheus, and Grafana. Troubleshoot and perform root cause analysis of performance issues and outages across the application, database, and infrastructure layers. Ensure high availability, plan for disaster recovery, and manage capacity. Collaborate with engineering teams on operational readiness and best practices. Participate in the on-call rotation.

Daily Activities: Take ownership of the test and production infrastructure environments so that performance meets customer and end-user SLA requirements. Ensure the sound design, reliability, availability, and serviceability of the GCP cloud and Jira infrastructure. Improve the product life cycle through inception, design, deployment, operation, and refinement. Engineer and ensure high availability (HA) across all service components, including the application tier (on GKE), databases (Cloud SQL), and shared file systems (GCP Filestore). Implement multi-zone/regional architectures and automated failover processes. Continuously monitor resource utilization (compute, memory, network, storage), forecast future demand, and proactively scale resources, including GKE clusters, Cloud SQL instances, NFS servers, and GCP Filestore capacity, to meet performance SLAs and support growth. Write automation code for provisioning and operating infrastructure at massive scale. Establish end-to-end monitoring and alerting on all critical application components, covering availability, latency, and overall system health. Participate in the on-call rotation supporting the platform and/or the production application. Direct root-cause and corrective-action analysis of critical business and production issues. Develop standard methodologies for infrastructure orchestration and for troubleshooting application services in production. Represent DevOps/SRE in design reviews and work with engineering teams on operational readiness.

Technical Qualifications:

Must-Have: GCP Expertise: Deep understanding of, and hands-on experience with, Google Cloud Platform services, including GCP security. Kubernetes/GKE: Proven experience deploying, managing, and troubleshooting applications on Google Kubernetes Engine. Infrastructure as Code: Strong experience with Terraform for managing cloud infrastructure. Database Administration: Experience with Cloud SQL (MySQL and/or PostgreSQL), including setup, administration, performance tuning, and troubleshooting. Monitoring & Logging: Experience configuring and managing the Elastic Stack (Elasticsearch, Logstash, Kibana) for logging and monitoring, and an understanding of GCP's internal monitoring philosophy.

Jira/Confluence Administration: Significant experience administering Jira and Confluence Data Center, including application-level performance tuning and troubleshooting. Experience migrating from on-premises/Data Center deployments to GCP/cloud is critical. Operating Systems: Strong Unix/Linux internals and administration skills. Scripting: Proficiency in Python or another scripting language for automation. Networking: Solid understanding of TCP/IP, DNS, HTTP, load balancing, and virtual networking in GCP. CI/CD: Understanding of CI/CD principles and tools such as GitHub and Jenkins.

Nice-to-Have: Google Cloud Spanner: Experience with Spanner. Configuration Management: Experience with Puppet. Programming: C++ skills.

Other Qualifications: Ability to communicate effectively and succinctly. Strong, systematic problem-solving skills and the ability to work amid ambiguity. Excellent written and verbal communication; able to collaborate and rally support. Excellent interpersonal skills and the ability to work well in a team. Self-disciplined, self-managed, and self-motivated, with a strong sense of ownership, urgency, and drive; a positive attitude, with the ability to quickly learn new technologies and effectively manage parallel projects. Ability to diagnose and troubleshoot complex distributed systems handling high-volume transactions. Passion for learning, understanding, and dissecting new technologies quickly and independently.

Preferred Qualifications: 5+ years of related experience. BS in Computer Science, Engineering, or a related field, or equivalent professional experience. Proven experience working with customers and vendors. Proven leadership of small, informal teams.

Skills

PostgreSQL, DevOps, Python, C++, Kubernetes, Security, GCP, Jira, Logstash, SQL, Jenkins, MySQL, SRE, Grafana, Confluence, DNS, Kibana, Unix, Prometheus, CI/CD, Elasticsearch, GitHub, Linux, Terraform
