Job Description:
As a Site Reliability Engineer (SRE), you will focus on ensuring the reliability, scalability, and performance of systems and applications for portfolio companies and clients. This remote role, available part-time or full-time, offers the flexibility to work with global teams and combines software engineering and systems engineering expertise to build and maintain highly available systems. You will work closely with development and operations teams to proactively identify and resolve issues, automate processes, and improve system performance.
Job Responsibilities:
- Design, implement, and maintain systems and tools to ensure high availability, reliability, and performance.
- Monitor and troubleshoot production systems, identifying and resolving issues before they impact users.
- Develop and maintain automation scripts and tools to streamline operational processes.
- Collaborate with development teams to improve system architecture and application performance.
- Implement and manage monitoring, logging, and alerting systems to ensure proactive issue detection.
- Conduct post-incident reviews and implement improvements to prevent future issues.
- Optimize cloud infrastructure for cost efficiency, scalability, and performance.
- Stay updated on emerging technologies and trends, recommending improvements to existing systems.
Job Requirements:
- Education: Bachelor’s degree in Computer Science, Information Technology, or a related field; equivalent experience accepted.
- Experience: 2+ years in site reliability engineering, DevOps, or a related role (open to less experienced candidates with strong potential).
- Skills:
  - Hands-on experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK Stack).
  - Proficiency in scripting languages (e.g., Python, Bash, or PowerShell).
  - Familiarity with containerization technologies (e.g., Docker, Kubernetes).
  - Experience with cloud platforms like AWS, Azure, or Google Cloud.
  - Strong understanding of CI/CD pipelines and automation tools (e.g., Jenkins, GitLab CI).
  - Excellent problem-solving, communication, and collaboration skills.
- Certifications: AWS Certified SysOps Administrator, Google Cloud Professional DevOps Engineer, or similar certifications are a plus but not required.