📊 Full opportunity report: AMÁLIA · The Three Hard Questions, on ThorstenMeyerAI.com: validation score, market gap, and execution plan.
TL;DR
Portugal launched AMÁLIA, a €5.5 million European Portuguese language model, which outperforms previous open models on European Portuguese benchmarks but raises three hard questions about its openness, native-language data, and objectives. These questions have broader implications for Europe’s sovereign LLM efforts.
Portugal’s €5.5 million AMÁLIA language model is now operational, with the base version released in late September 2025, marking a significant step in the country’s AI development efforts. However, critical questions about its openness, native-language data, and strategic goals are emerging, raising broader concerns for Europe’s sovereign LLM initiatives.
AMÁLIA is a consortium project involving approximately 60 researchers from Portugal’s leading academic institutions, including NOVA, IST, and IT. It is built as a continuation of the EuroLLM multilingual foundation rather than trained from scratch, and handles Portuguese text only, with multimodal features planned for future updates. The model’s training involved 107 billion tokens, with a small portion (about 5.8 billion tokens) from Portugal’s web archive Arquivo.pt, representing roughly 5.5% of the total pre-training data.
Performance results show that AMÁLIA surpasses previous open models on European Portuguese benchmarks and outperforms Qwen 3-8B on most Portuguese-specific tasks, though it still trails Qwen on ALBA, the project’s primary benchmark. The final version is scheduled for release in June 2026, with ongoing development and evaluation.
AMÁLIA: The three hard questions.
Portugal spent €5.5M to build a European Portuguese LLM. The base version is operational, and it beats Qwen 3-8B on most pt-PT tasks. So why are the most important questions still unanswered?
Last month, Duarte O.Carmo published the sharpest public analysis of AMÁLIA, Portugal’s state-funded European Portuguese large language model. He prefaces his critique with the necessary diplomatic caveats before doing what almost nobody else in the European sovereign-LLM discourse has been willing to do publicly: asking whether the work, as released, actually does what it set out to do. This piece extends his analysis. The AMÁLIA case study exposes three hard questions that every national LLM effort needs to answer publicly, and the broader European sovereign-LLM movement has been operating without explicit answers to any of them.
Three questions every national LLM effort needs to answer publicly.
Duarte O.Carmo’s framing maps cleanly onto this argument, and each of the three questions applies concretely to AMÁLIA.
The three questions form a structural feedback loop. Q3 (optimization target) determines Q2 (data volume needed), which conditions Q1 (openness sufficient for community contribution). The European sovereign-LLM movement as a whole would benefit if answering these questions became standard methodology disclosure rather than exceptional critique.

107 billion tokens. 5.8 billion clearly pt-PT.
The structurally tractable question with a structurally surprising answer. For a model whose entire stated purpose is European Portuguese prioritization, the native-language share of extended pre-training is 5.5%. The implications cascade into every other question.
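As a quick check on the arithmetic, the share can be recomputed from the two token counts quoted above. This is a minimal sketch: the function and variable names are illustrative and do not come from the AMÁLIA project. The ratio works out to about 5.4%, in line with the roughly 5.5% figure cited.

```python
# Token counts as quoted in this article; names are illustrative only.
TOTAL_TOKENS_B = 107.0      # total extended pre-training tokens, in billions
ARQUIVO_PT_TOKENS_B = 5.8   # tokens drawn from Arquivo.pt, in billions


def native_share(native_b: float, total_b: float) -> float:
    """Return the native-language fraction of the training mix."""
    if total_b <= 0:
        raise ValueError("total token count must be positive")
    return native_b / total_b


share = native_share(ARQUIVO_PT_TOKENS_B, TOTAL_TOKENS_B)
other_b = TOTAL_TOKENS_B - ARQUIVO_PT_TOKENS_B

print(f"Arquivo.pt share of training tokens: {share:.1%}")
print(f"Tokens from all other sources: {other_b:.1f}B")
```

Swapping in token counts from another national project gives the same like-for-like figure, which is the kind of data accounting the second question asks for.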

The Olmo standard. AMÁLIA’s current state.
The Allen Institute for AI’s Olmo project defines what “fully open” operationally requires. Olmo doesn’t lead frontier benchmarks. That’s not the point. The point is to be the structural reference for openness. AMÁLIA’s “fully open source” claim should be measured against that operational standard.
Four strategic positions. AMÁLIA between two and three.
More than €100M in publicly disclosed funding has gone to the major European sovereign-LLM initiatives. The structural question every project faces: what competitive position are you actually staking? There are four options, none mutually exclusive, each requiring different commitments.

Three standards. For AMÁLIA and the movement.
The structural critique generalizes beyond AMÁLIA. Italy, France, Germany, Switzerland, the OpenEuroLLM consortium, and every subsequent national project benefit from public discourse holding national LLM efforts to operational standards on openness, data accounting, and strategic positioning.
The European sovereign-AI agenda is a serious strategic project that deserves serious public discourse. O.Carmo’s analysis is what serious public discourse looks like. Appropriately diplomatic. Structurally rigorous. Willing to ask the hard questions in public when the public investment justifies it. More of this is needed — across every European sovereign-LLM project, not just AMÁLIA.
Broader Implications for European Sovereign LLM Strategies
The development of AMÁLIA exemplifies Portugal’s strategic investment in native-language AI, reflecting a broader European push for sovereign LLMs. However, the project raises three critical questions about transparency, native data sufficiency, and strategic objectives, which are central to evaluating the success and direction of Europe’s AI sovereignty efforts. Addressing these questions is vital for shaping future policies and ensuring that national investments deliver meaningful, open, and strategically aligned models.
European Sovereign LLM Efforts and Structural Challenges
Across Europe, countries like Italy, Germany, France, and Norway are investing heavily in developing sovereign language models, often with public funds. These efforts typically involve either training models from scratch or extending multilingual foundations. The discourse has largely focused on technical benchmarks, but experts like Duarte O.Carmo have highlighted the need to scrutinize the underlying structural questions—particularly around openness, native data, and strategic goals—that remain largely unaddressed in public debates. Portugal’s AMÁLIA project is a key case study illustrating these issues, especially given its public funding and national scope.
“The three questions—how open is ‘fully open,’ how much native data is enough, and what should we optimize for—are fundamental to evaluating any sovereign LLM effort.”
— Duarte O.Carmo
Unanswered Questions About AMÁLIA’s Strategic and Technical Foundations
It remains unclear how open AMÁLIA truly is in terms of data and model access, whether the native Portuguese data used is sufficient for long-term performance, and what the strategic priorities are beyond benchmarking success. The final version’s development may address some gaps, but these issues are still under discussion and evaluation.
Next Milestones and Ongoing Evaluations for AMÁLIA
The final version of AMÁLIA is scheduled for release in June 2026, with ongoing assessments of its performance, openness, and strategic alignment. Researchers and policymakers will closely monitor how the model evolves, whether it addresses current gaps, and how it influences broader European sovereign LLM initiatives. Public discussions and transparency efforts are expected to deepen in the coming months.
Key Questions
What are the main concerns about AMÁLIA’s openness?
Critics question whether AMÁLIA’s “fully open source” claim meets an operational standard such as the one set by the Allen Institute for AI’s Olmo project, and it remains unclear how open the project is in terms of both data and model access.
Is the native Portuguese data sufficient for long-term performance?
It is still uncertain whether the approximately 5.8 billion tokens from Arquivo.pt, roughly 5.5% of the 107 billion training tokens, are enough for sustained, high-quality performance across diverse tasks.
What are the strategic goals of Portugal’s AMÁLIA project?
The official aim is to develop a high-performing Portuguese language model, but broader strategic questions about openness, data sovereignty, and AI policy remain under discussion.
How does AMÁLIA compare to other European sovereign models?
While AMÁLIA outperforms previous open models on European Portuguese benchmarks, its approach of building on a multilingual foundation contrasts with models trained from scratch, raising questions about long-term competitiveness and openness.
What is the significance of these questions for Europe’s AI future?
Addressing these structural questions is crucial for ensuring that Europe’s sovereign LLM efforts lead to transparent, capable, and strategically aligned AI systems.
Source: ThorstenMeyerAI.com