TL;DR
Portugal’s government-funded AMÁLIA language model is now live and performs strongly on several Portuguese-language benchmarks. However, critics question its openness, the sufficiency of its native-language data, and its optimization goals, which raises broader concerns about national LLM strategies.
Portugal’s €5.5 million investment in the AMÁLIA large language model has produced a functioning, publicly accessible system that outperforms earlier open models on many Portuguese-language tasks, a significant milestone for the country’s AI efforts.
AMÁLIA was developed by a consortium of approximately 60 researchers across Portugal’s top institutions and officially launched in October 2025. Rather than being trained from scratch, it was built by continued pre-training of the multilingual EuroLLM foundation model. It currently handles text only, and its knowledge extends to the end of 2023. The model is available to 450,000 academic users via the FCT’s IAedu platform.
Technical evaluations show that AMÁLIA outperforms previous open models on European Portuguese benchmarks and surpasses Qwen 3-8B on most Portuguese-specific tasks, although it still trails Qwen on certain benchmarks like ALBA. The project aims for a final version by June 2026, with ongoing assessments of its capabilities and limitations.
AMÁLIA: The three hard questions.
Portugal spent €5.5M to build a European Portuguese LLM. The base version is operational, and it beats Qwen 3-8B on most pt-PT benchmarks. So why are the most important questions still unanswered?
Last month, Duarte O.Carmo published the sharpest public analysis of AMÁLIA — Portugal’s state-funded European Portuguese large language model. He prefaces his critique with the necessary diplomatic apparatus before doing what almost nobody else in the European-sovereign-LLM discourse has been willing to do publicly: asking hard questions about whether the work, as released, actually does what it set out to do. This piece is a structural extension of his analysis. The AMÁLIA case study exposes three hard questions every national LLM effort needs to answer publicly — and the broader European sovereign-LLM movement has been operating without explicit answers to any of them.
Three questions every national LLM effort needs to answer publicly.
Duarte O.Carmo’s framing maps cleanly onto the structural argument. Each question applies directly to AMÁLIA and, by extension, to every other European sovereign-LLM project.
The three questions form a feedback loop: the optimization target (Q3) determines how much native-language data is needed (Q2), which in turn conditions whether the project’s openness is sufficient for community contribution (Q1). The European sovereign-LLM movement as a whole benefits if answering these questions becomes standard methodology disclosure rather than exceptional critique.
As an affiliate, we earn on qualifying purchases.
107 billion tokens. 5.8 billion clearly pt-PT.
This is the most tractable of the three questions, and its answer is the most surprising. For a model whose entire stated purpose is to prioritize European Portuguese, the clearly pt-PT share of extended pre-training is roughly 5.5%: 5.8 billion of 107 billion tokens. The implications cascade into every other question.
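The share can be checked directly from the two token counts in the heading. The sketch below is illustrative only: the function name `native_share` and the constant names are hypothetical, and the inputs are the rounded figures quoted in this article, not exact corpus statistics. With those rounded inputs the ratio comes out at about 5.4%, which matches the quoted figure of roughly 5.5% to within rounding.

```python
# Token counts as quoted in this article, in billions of tokens.
# These are rounded headline figures, not exact corpus statistics.
TOTAL_TOKENS_B = 107.0  # extended pre-training corpus
PT_PT_TOKENS_B = 5.8    # tokens identified as clearly European Portuguese


def native_share(native_b: float, total_b: float) -> float:
    """Return the native-language fraction of a training corpus."""
    if total_b <= 0:
        raise ValueError("total token count must be positive")
    return native_b / total_b


share = native_share(PT_PT_TOKENS_B, TOTAL_TOKENS_B)

# Share of the corpus that is clearly pt-PT.
print(f"pt-PT share: {share:.1%}")

# How many other tokens the model saw for each clearly pt-PT token.
print(f"other tokens per pt-PT token: {(1 - share) / share:.1f}")
```

The second figure is often the more intuitive one: for every clearly pt-PT token in extended pre-training, the model saw roughly seventeen tokens of something else.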
The Olmo standard. AMÁLIA’s current state.
The Allen Institute for AI’s Olmo project sets the practical bar for what “fully open” requires: not only released weights, but also the training data, training code, and intermediate checkpoints. Olmo doesn’t lead frontier benchmarks, and that’s not the point; the point is to serve as the reference for openness. AMÁLIA’s “fully open source” claim should be measured against that operational standard.
Four strategic positions. AMÁLIA between two and three.
More than €100M in publicly disclosed funding has gone to the major European sovereign-LLM initiatives. Every project faces the same structural question: what competitive position is it actually staking out? There are four options. They are not mutually exclusive, but each requires different commitments.
Three standards. For AMÁLIA and the movement.
The structural critique generalizes beyond AMÁLIA. Italy, France, Germany, Switzerland, the OpenEuroLLM consortium, and every subsequent national project benefit from public discourse holding national LLM efforts to operational standards on openness, data accounting, and strategic positioning.
The European sovereign-AI agenda is a serious strategic project that deserves serious public discourse. O.Carmo’s analysis is what serious public discourse looks like. Appropriately diplomatic. Structurally rigorous. Willing to ask the hard questions in public when the public investment justifies it. More of this is needed — across every European sovereign-LLM project, not just AMÁLIA.
Broader Implications of AMÁLIA for European Sovereign LLMs
The development of AMÁLIA underscores critical issues that European countries face in building native-language large language models, particularly around transparency, data sufficiency, and strategic goals. Portugal has achieved a functional model, but critics point out that fundamental questions about openness, the volume of native-language data, and optimization priorities remain unaddressed, reflecting a broader pattern across Europe’s sovereign AI initiatives.
This matters because national investments in LLMs are increasingly seen as strategic assets, influencing language preservation, technological sovereignty, and AI competitiveness. The unresolved questions could impact future policy, funding, and development directions across Europe, making transparent evaluation essential for informed decision-making.
Structural Challenges in European Sovereign LLM Development
Several European countries now have national or nationally associated LLM efforts, such as Italy’s Minerva, Germany’s Aleph Alpha, and France’s Mistral, often with similar structural frameworks. Many of these initiatives involve large public investments, and they face the same recurring questions: how open the models truly are, how much native-language data is enough, and what the models should be optimized for. Portugal’s AMÁLIA fits this pattern: it was built by continued pre-training of a multilingual foundation model rather than from scratch, which raises questions about strategic design choices.
While initial benchmarks show promising results, the broader European discourse has yet to fully grapple with these fundamental issues, which are critical for ensuring that these models serve national interests and language preservation goals effectively.
“AMÁLIA is an impressive piece of work. But the real questions about its openness, native data, and goals remain unanswered.”
— Duarte O.Carmo
Unresolved Questions About AMÁLIA’s Openness and Goals
It remains unclear how open the AMÁLIA model truly is, especially regarding access to training data and model weights. Additionally, questions about whether the current native-language data volume is sufficient for long-term robustness, and what the primary optimization objectives should be, have not been publicly addressed. The final version’s capabilities and strategic direction are still evolving, and further disclosures are expected before June 2026.
Upcoming Evaluations and Policy Discussions on AMÁLIA
The next 12-24 months will see continued technical assessments, including benchmarking and transparency initiatives. Portugal’s research teams and policymakers are likely to face increasing scrutiny regarding the model’s openness, data strategy, and alignment with national AI priorities. The final release in June 2026 will be a key milestone for evaluating whether AMÁLIA addresses these foundational questions or if further clarifications are needed.
Key Questions
What makes AMÁLIA different from other European LLMs?
AMÁLIA was built by continued pre-training of a multilingual foundation model rather than by training from scratch, and it is the first major Portuguese-language model to be publicly available with significant institutional backing and published benchmarks.
Why are questions about openness and native data important?
Open models and native data are critical for transparency, sovereignty, and ensuring the model truly serves the linguistic and cultural needs of the country, rather than being a proprietary or opaque system.
What are the main risks of unresolved questions in national LLM projects?
Unanswered questions can lead to models that are less transparent, less aligned with national interests, and potentially less robust or trustworthy, impacting policy and technological sovereignty.
When will the final version of AMÁLIA be available?
The final version is expected in June 2026, with ongoing evaluations and potential adjustments based on initial assessments.
How does AMÁLIA compare to models like Qwen or Minerva?
AMÁLIA outperforms many open models on Portuguese benchmarks and beats Qwen 3-8B on most tasks, but Qwen still surpasses it on certain benchmarks like ALBA. Its strategic design differs from models trained from scratch, like Minerva, focusing instead on continued pretraining.
Source: ThorstenMeyerAI.com