Teuken-7B: A European Beacon in AI

February 25, 2025 | Fraunhofer IIS presents Teuken-7B, an efficiently trained European language model

AI language models have fundamentally transformed the way people interact and work with artificial intelligence (AI). Yet for many European companies, existing solutions raise concerns regarding data protection and information security. In collaboration with a broad network of academic and industry partners, Fraunhofer IIS has developed Teuken-7B—an efficiently trained European language model that is available as open source and fully compliant with EU data protection regulations.

449 million people, 27 countries, 24 official languages: the European Union is as diverse as its economy. Whether on Bulgaria’s Black Sea coast or in France’s overseas territories in the Pacific, security standards are essential for European enterprises and their workforces. Yet the rapid technological shift driven by AI presents many businesses with a dilemma: while they are keen to adopt language models for their operational benefits, they also require strict adherence to data protection regulations. “Especially in safety-critical sectors such as the automotive industry, healthcare or finance, it is essential that data remains within the company and is handled in accordance with the highest security and compliance standards,” says Fabian Küch, head of the Natural Language Processing group at Fraunhofer IIS. This need was addressed by the OpenGPT-X project, funded by the former German Federal Ministry for Economic Affairs and Climate Action (BMWK), which has released Teuken-7B, a fully European open-source language model designed for secure deployment.

24 languages, seven billion parameters


Teuken-7B was trained from the ground up in all 24 official languages of the European Union and comprises seven billion parameters. “What truly sets it apart is the nearly 50 percent share of non-English pretraining data,” Küch explains. This multilingual foundation ensures stable and consistent performance across a wide range of languages. The model also features a specially developed multilingual tokenizer, optimized for energy and cost efficiency and designed to work equally well across all languages. Tokenizers break words down into smaller units, called tokens, which the AI model can then process. Thanks to this multilingual approach, Teuken-7B handles complex linguistic structures, such as those found in German, with ease, and it was trained more efficiently than comparable multilingual models.
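To illustrate what a tokenizer does in practice, the following minimal Python sketch loads a multilingual tokenizer with the Hugging Face transformers library and splits a long German word into tokens. The model identifier and the exact token split are assumptions for illustration; the published Teuken-7B checkpoints may use a different name and require additional setup.

# Minimal sketch: inspecting a multilingual tokenizer with Hugging Face transformers.
# The model identifier below is an assumption for illustration only.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "openGPT-X/Teuken-7B-instruct-research-v0.4",  # assumed model ID, not confirmed by the article
    trust_remote_code=True,
)

# A long German compound word is broken into a handful of subword tokens
# rather than many single characters, which keeps input sequences short
# and therefore reduces compute cost.
tokens = tokenizer.tokenize("Datenschutzgrundverordnung")
print(tokens)       # the exact split depends on the learned vocabulary
print(len(tokens))  # fewer tokens per sentence generally means cheaper inference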

At its core, Teuken-7B is a technology ready to be put into practice, with a wide range of potential use cases. “By training the model on application-specific data, companies can develop tailored AI solutions that operate without black-box components,” explains Prof. Dr.-Ing. Bernhard Grill, Director of Fraunhofer IIS. The most obvious use case is chat applications, for which Teuken-7B has already been adapted through a process known as instruction tuning: the OpenGPT-X partners have deliberately taught the model to understand and follow user instructions.
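As a hedged illustration of what instruction tuning makes possible, the sketch below shows how an instruction-tuned checkpoint could be prompted with a chat-style message using the transformers library. The model identifier, prompt formatting, and generation settings are assumptions, not details confirmed by the article.

# Minimal sketch: prompting an instruction-tuned chat model with transformers.
# Model ID and chat-template details are assumptions for illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openGPT-X/Teuken-7B-instruct-research-v0.4"  # assumed model ID

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [
    {"role": "user", "content": "Summarize the main ideas of this report in three sentences."},
]

# apply_chat_template formats the conversation the way the model was
# instruction-tuned to expect (the exact template ships with the tokenizer).
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=200)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))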

Deutsche Telekom leads by example


Deutsche Telekom is already demonstrating how this works in practice. The company has integrated Teuken-7B into its Business GPT product, enabling enterprises to operate their own AI systems with high levels of data confidentiality and security. Employees can summarize texts, search for information, or collaborate across language barriers. “And above all, data protection remains paramount: Teuken-7B is a beacon of security built to meet German standards,” says Küch.

Development far from finished 


Work on Teuken-7B marks the next chapter in Fraunhofer IIS’s long history of AI innovation. The model and its successors are being prepared for deployment in public administration, the justice system, and industrial applications. “Together with Friedrich-Alexander-Universität Erlangen-Nürnberg, we’re currently building dedicated AI computing infrastructure at Fraunhofer IIS,” Küch notes. In the coming weeks and months, Teuken-7B will continue to improve. Plans for 2025 include models supporting 48 languages and up to 70 billion parameters. 

 

Written by Julian Hörndlein, freelance journalist and PR copywriter.
