Site Reliability Engineer (Monitoring)

Company Description

Launched in 1998, this pioneering British-born brand has specialised in creating unforgettable experiences, from city breaks, hotels and holidays to theatre and spa days. It is among the worldwide leaders in the field, helping hundreds of thousands of customers every year find, and do, "whatever makes them pink". It is part of lm group, a publicly traded multinational group that is among the worldwide leaders in the online travel industry. Each month, our websites and mobile apps (available in 17 languages and 40 countries) reach 43 million unique users who search for and book their travel and leisure experiences.

More than 1,200 people enjoy working with us and help provide our audience with a comprehensive and inspiring offering of travel-related products and services.

Job Description

We are looking for a Site Reliability Engineer to join our SRE team based in Chiasso. The successful candidate will be part of the monitoring team specifically.

“Hope is not a strategy. Engineering solutions to design, build, and maintain efficient large-scale systems is a true strategy, and a good one.”

Key Responsibilities

  • Be responsible for the availability, performance, monitoring and incident response of the platform and of the services running across multiple environments.
  • Collaborate closely with architects, developers and database administrators to ensure the reliability and scalability of the infrastructure.
  • Improve the monitoring system and develop metrics that help prevent incidents.
  • Improve the logging system and develop logs that help prevent incidents.
  • Participate in periodic on-call duties.

Qualifications

  • 5+ years of experience in the role at multinational companies
  • Experience with Linux (Ubuntu, RHEL) internals and administration (e.g., filesystems, inodes, system calls) and with networking (e.g., TCP/IP, routing, network topologies)
  • Strong experience with centralized logging solutions (Fluentd, Logstash, Graylog)
  • Strong experience with monitoring stacks (Grafana/Graphite, Prometheus, Nagios and its forks)
  • Familiarity with website monitoring tools (Site24x7)
  • Familiarity with alerting and on-call management tools (Opsgenie)
  • Familiarity with time-series databases, including Whisper
  • Familiarity with distributed tracing (OpenTracing, Zipkin, Jaeger)
  • Familiarity with GCP Stackdriver and AWS CloudWatch
  • Experience monitoring Java EE applications specifically
  • Experience with centralized logging and big data log management tools (data analytics platforms)
  • Good knowledge of Elasticsearch
  • Experience with data discovery
  • Familiarity with scripting languages (Python, Ruby, Bash, Go)
  • Production support experience (systems administration, deployments)
  • Good knowledge of Docker
  • Experience with virtualization technologies (VMware, RHEV)
  • Familiarity with version control systems (preferably Git)
  • Ability to debug and optimise code and automate routine tasks
  • Good exposure to Agile/Scrum practices

Nice to Have

  • Travel domain experience
  • Certifications in the area of expertise (OS and App Server related)
  • Familiarity with microservices
  • Knowledge of Kubernetes
  • Knowledge of machine learning engineering and big data algorithms

Soft Skills

  • Good written and verbal communication skills
  • Enthusiasm for learning new technologies
  • A team-oriented attitude and the ability to work in multi-location teams