Job Description
Job Title: Site Reliability Engineer (SRE)
Location: New York/EST area (Hybrid/Remote)
Duration: 6+ Months (extendable)
Experience Required: 10+ Years
Certification: AWS Certification is Mandatory (e.g., AWS Certified DevOps Engineer, Solutions Architect, or SysOps Administrator)
Job Summary:
- We are seeking an experienced Senior Site Reliability Engineer (SRE) with a strong background in AWS Cloud, DevOps automation, and system reliability engineering. The ideal candidate brings hands-on expertise in cloud infrastructure, CI/CD, monitoring, and automation, along with proven experience supporting large-scale, high-availability systems in the Telecom, Banking, or Retail industries.
- This role is responsible for ensuring platform stability, reliability, scalability, and continuous improvement of infrastructure through automation and DevOps best practices.
Key Responsibilities:
- Design, implement, and maintain highly available and scalable cloud infrastructure on AWS.
- Build and manage end-to-end CI/CD pipelines to enable efficient and reliable software delivery.
- Develop and maintain Infrastructure as Code (IaC) using Terraform, CloudFormation, or Ansible.
- Monitor, automate, and enhance system reliability, performance, and incident response processes.
- Implement observability solutions (Prometheus, Grafana, ELK/EFK, Splunk, or Datadog).
- Collaborate with development teams to improve application reliability and deployment processes.
- Participate in on-call rotations, incident management, and root cause analysis (RCA).
- Optimize infrastructure costs and ensure cloud security and compliance with enterprise standards.
- Develop automation scripts and tools using Python, Go, or Shell to eliminate manual tasks.
- Prepare and maintain documentation, including architecture diagrams and operational runbooks.
Primary Skills:
- Cloud Platform: AWS (EC2, S3, EKS, Lambda, CloudWatch, RDS, IAM, etc.)
- DevOps & SRE Practices: CI/CD, automation, monitoring, incident response, performance tuning
- Infrastructure as Code (IaC): Terraform, AWS CloudFormation, Ansible
- CI/CD Tools: Jenkins, GitLab CI, GitHub Actions, Argo CD, or Spinnaker
- Containers & Orchestration: Docker, Kubernetes, Helm, EKS, OpenShift
- Monitoring & Logging: Prometheus, Grafana, ELK / EFK, Splunk, Datadog, CloudWatch
- Scripting / Programming: Python, Go, Bash, or Shell
- Version Control: Git, GitHub, Bitbucket
- Networking & Security: VPC, VPN, Load Balancers, DNS, SSL/TLS, Security Groups, IAM
Required Qualifications:
- Bachelor's or Master's degree in Computer Science, Information Technology, or a related field.
- 10+ years of hands-on experience in SRE / DevOps / Cloud Infrastructure roles.
- Mandatory AWS Certification (e.g., AWS Certified DevOps Engineer, Solutions Architect Associate/Professional, or SysOps Administrator).
- Proven experience in Telecom, Banking, or Retail domain infrastructure and platform operations.
- Strong expertise in microservices, distributed systems, and containerized environments.
- Experience in monitoring, alerting, observability, and automated remediation.
- Excellent problem-solving, incident management, and communication skills.
Preferred / Nice-to-Have:
- Experience with Kafka, RabbitMQ, or other messaging platforms.
- Familiarity with service mesh (Istio, Linkerd, Consul) and API Gateway solutions.
- Exposure to data pipeline management and streaming frameworks.
- Knowledge of FinOps and cost optimization strategies on AWS.
- Experience with security compliance frameworks (ISO, PCI-DSS, GDPR, etc.).
Job Tags
Remote work