Senior Site Reliability Engineer (AWS, Security)
Company Description
KMS Technology was established in 2009 as a U.S.-based software services company. With development centers in Vietnam and Mexico, we have been trusted globally for the superlative quality of our software consulting & development services, technology solutions, and engineers' expertise. We pride ourselves on creating brilliant solutions for our clients by leveraging deep expertise, advanced technologies, and delivery excellence for a shared success where everyone can reach their fullest potential. We operate three business lines:
KMS Software: Leverage software domain expertise to help clients make better business decisions in technology platforms, increase speed-to-market, and gain critical development support through innovative technology solutions.
KMS Solutions: Empower BFSI businesses to embrace the digital finance revolution and expedite clients’ journey towards complete digitalization through technology consulting, data analytics, software development, and software quality services.
KMS Healthcare: Build transformative next-gen technologies to solve healthcare’s most challenging problems, providing innovative tools and expertise to providers, payers, life sciences, and medical technology vendors.
Job Description
- Design, deploy, and maintain scalable, reliable AWS infrastructure
- Automate infrastructure management using IaC tools (Terraform, CloudFormation, Ansible)
- Optimize system performance, capacity planning, and incident management through best practices and automation
- Lead incident response, root cause analysis (RCA), and postmortem processes
- Manage and optimize AWS services for performance and cost efficiency
- Develop and manage DataDog dashboards, metrics, and alerts to monitor system health, analyze performance, and support infrastructure optimization
- Work with development, DevOps, and IT teams to boost system reliability and efficiency, and ensure thorough documentation of architecture, monitoring, and incident workflows
Qualifications
General requirements:
- Upper-intermediate level of English
- Ability to effectively consult with clients to understand their needs, propose tailored solutions, and persuasively communicate their value to gain approval
- Ability to obtain deep knowledge of the project technologies and work independently with minimum guidance
- Ability to handle multiple tasks and communicate effectively with team members and clients
- Strong logical thinking and problem-solving skills
- Ability to self-learn and adapt to new technologies quickly
Technical requirements:
- 3+ years of experience as a Site Reliability Engineer, DevOps Engineer, or similar role
- Experience working with complex systems and designing scalable solutions for operational excellence
- Extensive hands-on experience with AWS cloud infrastructure (EC2, S3, RDS, Lambda, CloudWatch, CloudTrail)
- Proficient in using DataDog for APM, infrastructure dashboards, and alert configurations
- Familiarity with NOC environments and incident management protocols
- Strong experience with networking concepts (DNS, Load Balancers, VPNs, etc.) and cloud security best practices
- Capable of deploying detection rules in CloudTrail and configuring logs for enhanced security insights
- Proficiency with Infrastructure as Code (IaC) tools (Terraform, CloudFormation)
- Strong scripting skills in languages such as Python, Bash, or similar
- Solid understanding of monitoring, logging, and alerting tools (Prometheus, Grafana, ELK stack, or similar)
- Experience with containerization (Docker) and orchestration tools (Kubernetes, ECS)
- Able to identify incident types, follow response checklists, escalate appropriately, and document incidents clearly
Additional Information
BENEFITS & PERKS
- Working in one of the Best Places to Work in Vietnam and a Top 10 ICT Company in Vietnam
- Flexible working model: flexible hours and hybrid working from Ho Chi Minh City or Da Nang, or working remotely from any location in Vietnam
- Attractive salary and benefits, full salary during probation, social insurance on full gross salary
- Performance appraisal twice a year, 13th-month salary and performance bonus
- Premium healthcare insurance for you and your loved ones
- Working 5 days/week, from Monday to Friday
- 18+ paid leave days/year
- Diverse career opportunities in Software Services and Software Product Development
- Working and growing in a values-driven, international working environment and standard Agile culture with passionate and talented teams
- Onsite opportunities: short-term and long-term assignments in the U.S.
- Various training on hot-trend technologies, best practices and soft skills
- Company trip, big annual year-end party, team building, etc.
- Fitness & sports activities: football, tennis, table tennis, badminton, yoga, swimming…
- Joining community development activities: 1% Pledge, charity every quarter, blood donation, public seminars, career orientation talks…
- Free in-house entertainment facilities (football, ping pong, gym…), coffee, and snacks (instant noodles, cookies, candies…)
And much more. Join us and explore other fantastic things!
