GlossAPI is an open-source infrastructure developed by the Open Technologies Alliance (GFOSS) to transform raw Greek text from public consultation, science, education, literature, and culture into clean, well-documented, AI-ready data. As the Greek language remains underrepresented in large-scale AI datasets, GlossAPI provides the tools and workflows needed to create high-quality linguistic resources that are openly accessible and fully reproducible.
The project builds a foundational infrastructure for Greek Natural Language Processing by combining a robust, modular processing pipeline with a strong commitment to open standards. The pipeline covers every stage of document processing: automated downloading, text extraction, section segmentation, classification, and annotation. It supports multiple file formats while preserving structure and metadata.
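To make the idea of a modular, staged pipeline concrete, here is a minimal Python sketch. It is not GlossAPI's actual API: every name in it (`Document`, `extract_text`, `segment_sections`, `classify_sections`, `annotate`, `run_pipeline`) is hypothetical, the downloading stage is left out so the example runs offline, and the segmentation and classification rules are toy stand-ins. It only illustrates how independent stages can be chained while the document's metadata travels with it.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Document:
    """A document moving through the pipeline, with its metadata kept alongside."""
    text: str
    metadata: dict = field(default_factory=dict)
    sections: list = field(default_factory=list)


# A stage is any function that takes a Document and returns a Document.
Stage = Callable[[Document], Document]


def extract_text(doc: Document) -> Document:
    # Stand-in for format-specific extraction: trim lines and drop blank ones.
    lines = [line.strip() for line in doc.text.splitlines()]
    doc.text = "\n".join(line for line in lines if line)
    return doc


def segment_sections(doc: Document) -> Document:
    # Toy rule (an assumption): a line starting with '#' opens a new section.
    sections = []
    for line in doc.text.splitlines():
        if line.startswith("#"):
            sections.append({"title": line.lstrip("# ").strip(), "body": []})
        elif sections:
            sections[-1]["body"].append(line)
        else:
            sections.append({"title": "", "body": [line]})
    doc.sections = sections
    return doc


def classify_sections(doc: Document) -> Document:
    # Toy classifier (an assumption): label a section by its title alone.
    for section in doc.sections:
        is_bibliography = section["title"].lower() == "βιβλιογραφία"
        section["label"] = "bibliography" if is_bibliography else "content"
    return doc


def annotate(doc: Document) -> Document:
    # Add derived information without touching the existing metadata keys.
    doc.metadata["n_sections"] = len(doc.sections)
    return doc


def run_pipeline(doc: Document, stages: List[Stage]) -> Document:
    # Apply each stage in order; stages can be added, removed, or swapped.
    for stage in stages:
        doc = stage(doc)
    return doc


PIPELINE = [extract_text, segment_sections, classify_sections, annotate]

raw = Document(
    text="# Εισαγωγή\n  Καλημέρα κόσμε.  \n\n# Βιβλιογραφία\nΠηγή 1",
    metadata={"source": "example.txt"},
)
result = run_pipeline(raw, PIPELINE)
```

Because each stage shares the same signature, a stage can be replaced (for example, a different extractor per file format) without changing the rest of the chain, and the original `source` entry in the metadata survives to the end.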
High-quality datasets produced by GlossAPI are already available on Hugging Face, enabling research, education, digital humanities, NLP applications, and the development of Greek language models. GlossAPI is also being used in European projects to improve the understanding and processing of the Greek language in real-world contexts.
More than a tool, GlossAPI is a community: researchers, developers, linguists, and students collaborate in an open, participatory, and ethically aligned ecosystem for Greek language technology. Our goal is to keep that environment sustainable and transparent as Greek NLP grows.
Whether you are training models, building smarter search systems, or exploring Greek digital heritage, GlossAPI provides the foundations to build scalable, transparent, and socially responsible AI applications.
All datasets are released under Creative Commons licenses, and the source code is openly available on GitHub.