Data Engineer
About Us
Luzia is Europe’s fastest-growing consumer AI company, redefining conversational AI with cutting-edge large language models and an unparalleled user experience. With a strong and rapidly expanding presence in Brazil, we’re on a mission to make Luzia the go-to AI personal assistant, empowering millions of users to simplify and enhance their everyday lives.
At Luzia, we ship fast: keep it simple, learn from every experiment, own outcomes, and build what our users need. We thrive in a fast-paced environment and expect everyone to use AI daily—automating the routine and amplifying creativity—because using AI isn’t optional; it’s how we win.
⚡ Hypergrowth startup with 80M+ users worldwide in under 3 years
🌎 Latin America as a core market with more than 60% of our user base
🚀 Backed by top-tier investors (Khosla Ventures, Prosus Ventures, A-Star, Monashees and more)
Role Overview
As a Data Engineer, you’ll operate at the intersection of data and AI, building the systems that power not only analytics and decision-making, but also LLM-driven product features and internal AI tools.
You’ll play a key role in shaping how data is used across Luzia—from reliable dashboards to evaluation and observability pipelines for LLM systems, and from structured datasets to agentic interfaces that help teams interact with data using natural language.
This is not a traditional data engineering role: you’ll help define how data supports AI, and how AI feeds back into better data and decisions.
Responsibilities
Data Pipelines & Architecture: Design, build, and maintain scalable data pipelines and architectures supporting analytics, experimentation, and AI/ML use cases.
LLM Data Systems & Evaluation: Help design and maintain evaluation frameworks and datasets for LLM-powered features, including pipelines for collecting, labeling, and analyzing model outputs.
LLM Observability: Contribute to building observability systems for LLM workflows, including logging inputs/outputs, tracking prompt and model versions, and monitoring performance over time.
LLM-Enabled Workflows: Develop and productionize data workflows powered by LLMs (e.g. classification, enrichment, summarization of user inputs).
Data Quality & Reliability: Implement best practices to ensure data quality, consistency, and trust across all data workflows.
Internal Data Tools & Agentic Systems: Support and evolve internal tools (e.g. data copilots / agentic systems) that enable teams to query and interpret data using natural language.
Cross-Functional Collaboration: Work closely with product managers, engineers, analysts, and data scientists to align data solutions with business and product needs.
Operational Excellence: Monitor, troubleshoot, and continuously improve data pipelines and AI-related workflows.
Stack Evolution: Help evolve the data and AI stack, introducing new tools and deprecating outdated ones as the company scales.
Requirements
Experience: 4+ years of experience as a Data Engineer, Analytics Engineer, or in a similar role.
Core Skills: Strong proficiency in Python and SQL.
Data Warehousing: Experience with modern data warehouses (e.g. Redshift, BigQuery, Snowflake).
Cloud Platforms: Hands-on experience with cloud environments (AWS preferred, GCP, or Azure).
Data Pipelines: Proven experience building and maintaining end-to-end data pipelines.
Data Modeling: Solid understanding of data modeling concepts (e.g. dimensional modeling, slowly changing dimensions).
LLM / AI Data Experience (Important)
Experience working with LLMs or AI-powered systems in production or near-production environments, including:
Working with unstructured data (e.g. text, logs, user inputs)
Building or supporting LLM-powered workflows (classification, enrichment, etc.)
Understanding of evaluation approaches (e.g. testing outputs, measuring quality, iterating on results)
Familiarity with monitoring or debugging AI systems (e.g. analyzing outputs, identifying failure patterns)
APIs & Integrations: Experience integrating and maintaining data from external APIs, including handling changes and failures over time.
Communication: Strong communication skills and fluency in English.
Product Mindset: Ability to translate business needs into scalable data solutions that drive real impact.
Education: Bachelor’s degree in Computer Science, Engineering, or a related field.
Nice-to-Haves
Experience with vector databases or embeddings
Familiarity with LLM tooling or frameworks (e.g. LangChain, OpenAI APIs, OpenClaw)
Experience with observability tools for data or AI systems
Experience with Infrastructure as Code (e.g. Terraform)
Experience with orchestration tools (Airflow, Prefect)
Experience with transformation frameworks (dbt, SQLMesh)
Exposure to machine learning pipelines or experimentation systems
Experience working with large-scale B2C datasets
Department: Backend
Location: Madrid Office
Remote status: Hybrid