OBSERVABILITY ENGINEER

Requisition ID: 78430

City: MAIA

Date: Jul 3, 2026

Brand: MC

Area: STRUCTURES

sonae.pt

Thank you for your interest in this opportunity! Your application will only be considered complete once you have taken a digital assessment. At the end of your application, and provided you meet the role requirements, you will receive a link to the assessment at your registered email address. This step is mandatory, as our recruitment process is based on competency assessment.

We are a company of everyone and for everyone. At MC, we place people at the center: customers, employees, and the community. We create value together, innovate at every step, and lead with proximity, recognizing the unique qualities of each individual. Different stories and ideas unite us with the same purpose: to grow and become the best version of ourselves. Because everything we are, we achieve together.

MC Digital is MC’s Information Technology division. We firmly believe that technology can revolutionize the retail sector, bringing greater convenience and exceeding customer expectations. In this context, we are focused on unlocking the potential of Artificial Intelligence across our products and technology systems, helping accelerate MC’s digital transformation.

Want to learn more about what we do? Visit Caixa Central, MC Digital’s blog and podcast: https://caixacentral.mcdigital.tech/.

For all these reasons, join us!

We are looking for an Observability Engineer to join our Architecture, Platform Engineering & Infrastructure team, and we believe we’ll be #betterwithyou!

The mission of this role is to build and operate the company’s observability platform as a product—ensuring end-to-end management of metrics, logs, traces, and instrumentation, enabling all engineering teams to detect, diagnose, and resolve issues with confidence while truly understanding the behavior of their systems in production.

Within the Observability & Platform team, our mission is to empower all engineering teams across the company to understand their production systems: not just to see that something failed, but to understand why.

We build and operate a company-wide observability platform based on a fully open-source stack: Prometheus and Mimir for metrics, Loki for logs, Tempo for traces, OpenTelemetry for ingestion pipelines, and Grafana for visualization.

We view observability as a product, not a service desk. This means providing self-service tools, golden paths, clear instrumentation standards, and a platform capable of scaling to high telemetry volumes without requiring teams to become observability experts.

We operate at scale across Kubernetes environments and beyond, valuing developer experience, cost control, and reliability equally. You will be part of a senior and specialized team, where your decisions will have a direct impact on how the entire technology organization understands its systems.

We count on you to…

Design, develop, and operate our observability platform—metrics (Prometheus/Mimir), logs (Loki), traces (Tempo), ingestion pipelines (OpenTelemetry), and dashboards (Grafana)—as reliable, self-service products;
Define the company’s observability technical direction by establishing instrumentation standards, golden paths, and reference architectures used by engineering teams;
Automate the entire platform lifecycle—provisioning, configuration, upgrades, and rollouts—through GitOps and Infrastructure as Code;
Collaborate with development teams to understand observability needs and transform them into platform capabilities, avoiding one-off solutions;
Operate distributed systems at scale, managing cardinality, retention, query performance, multi-tenancy, and costs related to large telemetry volumes;
Lead complex end-to-end initiatives such as backend migrations, major version upgrades, and capacity planning, while promoting solution reviews and technical mentoring;
Treat internal teams as customers by monitoring adoption, service levels, and satisfaction, continuously improving the platform experience;
Stay up to date with the CNCF and observability ecosystem (OpenTelemetry, eBPF-based telemetry, Perses, continuous profiling, and emerging trends), incorporating relevant solutions into the platform.

So, bring with you…

Solid hands-on experience operating observability or monitoring platforms in large-scale production environments, ideally within the Grafana ecosystem (Grafana, Prometheus, Mimir, Loki, and Tempo), including operational challenges such as cardinality, query performance, retention, and multi-tenancy;
Strong expertise with OpenTelemetry, including instrumentation, collectors, and the design of pipelines for metrics, logs, and traces;
Advanced Kubernetes experience, including running stateful and high-throughput production workloads (CKA certification or equivalent is valued);
Proven experience with Git, CI/CD (GitHub Actions or equivalent), and GitOps tools such as Argo CD;
Strong programming skills in Go and/or Python, actively contributing to the development and evolution of platform tooling;
Strong Linux knowledge and hands-on experience with at least one major cloud provider (preferably Google Cloud Platform);
More than 8 years of professional experience in Platform Engineering, Infrastructure, or SRE roles;
Degree in Computer Science, Information Systems, or equivalent through a combination of education and professional experience;
Excellent teamwork skills, building trusted relationships and promoting collaboration across different stakeholders;
Artificial Intelligence competencies;
Availability to work within a DevOps operating model, including participation in an on-call rotation.

We also value…

High curiosity, multitasking ability, proactivity, and autonomy;
Excellent written and verbal communication skills, with the ability to explain complex technical concepts to non-technical audiences;
Flexibility, adaptability, and resilience in demanding and ever-changing environments;
Continuous improvement mindset;
Strong problem-solving orientation;
Fluency in English.

What we offer you…

Meal Allowance on Dá Card;
Telecommunications plan including voice, data, and equipment (for permanent employees);
Flex it Up Program – Extra Days Off, Unpaid Leave, Flexible Working Model (where applicable);
Health and Life Insurance (for permanent employees), with the possibility of extending Health Insurance to family members under advantageous conditions;
Flexible Benefits Program (where applicable);
Onboarding and Initial Training Plan, Continuous Learning Platform, and Financial Literacy Program;
School Awards and Merit Scholarships for employees’ children (mainstream and inclusive education), as well as Holiday Programs during school breaks;
New Baby Welcome Kit;
Internal Mobility Programs for talent development;
Flu Vaccination Program including administration (voluntary enrollment);
Somos Sonae Program, providing psychosocial, financial, and legal support to employees;
Ergocoaching sessions;
Mental Health Promotion Programs and Nutrition Consultations;
Discounts and Partnerships Program across more than 300 leading brands;
Free coffee and fruit available at the workplace;
Competitive salary;
Collaborative and dynamic work environment.

#BETTERTOGETHER #BETTERWITHYOU

MC Sonae D&I Commitment:

We work to create a workplace enriched by diverse backgrounds and perspectives, focused on individuality, ensuring that everyone feels respected, valued for their skills, and confident in the organization.