Salary: As per market standards
Location: Mumbai (Maharashtra), India
Industry: Computer Software
Job Description
A major player in the tech industry, which specializes in retail technology, AI, ML, and big data, is seeking new talent. Established by alumni from a top engineering institute, this organization manages a vast network of brands and stores. Headquartered in Mumbai, it is recognized for its innovation and expertise across multiple tech domains.
What will you do?
- Run the production environment by monitoring availability and taking a holistic view of system health.
- Improve the reliability, quality, and time-to-market of our suite of software solutions.
- Be the first person to report an incident.
- Debug production issues across services and levels of the stack.
- Envision the overall solution for defined functional and non-functional requirements, and define the technologies, patterns and frameworks to realise it.
- Build automated tools in Python / Java / Go / Ruby etc.
- Help Platform and Engineering teams gain visibility into our infrastructure.
- Lead design of software components and systems, to ensure availability, scalability, latency, and efficiency of our services.
- Participate actively in detecting, remediating and reporting on Production incidents, ensuring the SLAs are met and driving Problem Management for permanent remediation.
- Participate in on-call rotation to ensure coverage for planned/unplanned events.
- Perform other tasks such as load testing and generating system health reports.
- Periodically check that all dashboards are ready.
- Engage with other Engineering organizations to implement processes, identify improvements, and drive consistent results.
- Work with your SRE and Engineering counterparts to drive game days, training and other response-readiness efforts.
- Participate in the 24x7 support coverage as needed.
- Troubleshoot and solve complex issues with thorough root cause analysis on customer and SRE production environments.
- Collaborate with Service Engineering organizations to build and automate tooling, implement best practices to observe and manage the services in production, and consistently achieve our market-leading SLA.
- Improve the scalability and reliability of our systems in production.
- Evaluate, design and implement new system architectures.
Some specific Requirements:
- B.E./B.Tech. in Engineering, Computer Science, technical degree, or equivalent work experience
- At least 3 years of experience managing production infrastructure. Leading / managing a team is a huge plus.
- Experience with cloud platforms such as AWS and GCP.
- Experience developing and operating large-scale distributed systems with Kubernetes, Docker and Serverless (Lambdas).
- Experience running real-time, low-latency, highly available applications (Kafka, gRPC, RTP).
- Comfortable with Python, Go, or any relevant programming language.
- Experience with monitoring and alerting using technologies like New Relic / Zabbix / Prometheus / Grafana / CloudWatch / Kafka / PagerDuty etc.
- Experience with one or more orchestration, deployment tools, e.g. CloudFormation / Terraform / Ansible / Packer / Chef.
- Experience with configuration management systems such as Ansible / Chef / Puppet.
- Knowledge of load testing methodologies and tools like Gatling and Apache JMeter.
- Ability to work your way around a Unix shell.
- Experience running hybrid clouds and on-prem infrastructure on Red Hat Enterprise Linux / CentOS.
- A focus on delivering high-quality code through strong testing practices.
What do we offer?
Growth
Growth knows no bounds, as we foster an environment that encourages creativity, embraces challenges, and cultivates a culture of continuous expansion. We are looking at new product lines, international markets and brilliant people to grow even further. We teach, groom and nurture our people to become leaders. You get to grow with a company that is growing exponentially.
Flex University
We help you upskill by organising in-house courses on important subjects.
Learning Wallet: You can also take an external course to upskill and grow, and we reimburse you for it.
Culture
Community and Team building activities
Weekly, quarterly and annual events/parties.
Wellness
Mediclaim policy for you + parents + spouse + kids
Access to an experienced therapist for better mental health, productivity and work-life balance
We work 5 days from the office and we make sure people have everything they need:
Free meals
Snacks, goodies & a lot of fun culture