We provide a range of models on our Hugging Face profile:
Models based on the GPT-2 architecture (a loading sketch follows this list):
- the first version of the model in the small architecture (we recommend its successor, version two): radlab/polish-gpt2-small
- the second version of the model in the small architecture: radlab/polish-gpt2-small-v2
- the second version of the model in the medium architecture (the first version is no longer publicly available due to low accuracy): radlab/polish-gpt2-medium-v2
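A minimal sketch of loading one of these GPT-2 models with the Hugging Face transformers text-generation pipeline; the prompt and generation settings below are illustrative assumptions, not tuned recommendations.

```python
from transformers import pipeline

# Load the recommended second small GPT-2 model for Polish text generation.
generator = pipeline("text-generation", model="radlab/polish-gpt2-small-v2")

# Example Polish prompt; max_new_tokens is an arbitrary illustrative value.
print(generator("Sztuczna inteligencja to", max_new_tokens=40)[0]["generated_text"])
```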
Classification and extraction models (a usage sketch follows this list):
- a model for extracting answers to questions from any text: radlab/polish-qa-v2
- a model that detects the polarity of information in news articles, running in our playground and available on Hugging Face: radlab/polarity-3c
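A sketch of using both models through standard transformers pipelines. It assumes radlab/polish-qa-v2 is an extractive question-answering model and radlab/polarity-3c a sequence classifier; both assumptions follow from the descriptions above, and the example texts are illustrative.

```python
from transformers import pipeline

# Extractive QA: pull the answer span for a question from a given context.
qa = pipeline("question-answering", model="radlab/polish-qa-v2")
print(qa(question="Kto napisał 'Pana Tadeusza'?",
         context="Pan Tadeusz został napisany przez Adama Mickiewicza w latach 1832-1834."))

# Polarity detection on a news snippet (three classes, per the model name).
polarity = pipeline("text-classification", model="radlab/polarity-3c")
print(polarity("Firma ogłosiła rekordowe zyski w ostatnim kwartale."))
```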
Models based on the T5 architecture (a usage sketch follows this list):
- a model in the t5-base architecture for text cleaning: radlab/polish-denoiser-t5-base
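A sketch of running the denoiser as a text-to-text pipeline; whether the model expects a task prefix or any special input formatting is not confirmed here, and the noisy sample text is made up for illustration.

```python
from transformers import pipeline

# T5-based text cleaning: feed noisy text in, read the cleaned text out.
denoiser = pipeline("text2text-generation", model="radlab/polish-denoiser-t5-base")
noisy = "To   jest  pżykladowy teskt z   błędami ."
print(denoiser(noisy, max_new_tokens=64)[0]["generated_text"])
```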
Encoder models (a retrieve-and-rerank sketch follows this list):
- a bi-encoder for texts written in Polish (we recommend the newer version of this model, described below): radlab/polish-sts-v2
- a newer version of the bi-encoder with a mean pooling layer, which achieved much higher correlation during training: radlab/polish-bi-encoder-mean
- a cross-encoder for re-ranking: radlab/polish-cross-encoder
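A sketch of a retrieve-and-rerank setup combining the bi-encoder and the cross-encoder. It assumes both models are compatible with the sentence-transformers library; the query and passages are illustrative.

```python
from sentence_transformers import SentenceTransformer, CrossEncoder, util

# Bi-encoder: embed the query and candidate passages, rank by cosine similarity.
bi_encoder = SentenceTransformer("radlab/polish-bi-encoder-mean")
query = "Jakie modele udostępnia radlab?"
passages = [
    "Udostępniamy modele GPT-2 oraz pLLama dla języka polskiego.",
    "Dziś w Warszawie pada deszcz.",
]
scores = util.cos_sim(bi_encoder.encode(query), bi_encoder.encode(passages))[0]
print(scores)

# Cross-encoder: re-score the query-passage pairs jointly for finer ranking.
cross_encoder = CrossEncoder("radlab/polish-cross-encoder")
print(cross_encoder.predict([(query, p) for p in passages]))
```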
GenAI models (a chat usage sketch follows this list):
- radlab/pLLama3-8B-creator – a model that provides fairly short, specific answers to user queries.
- radlab/pLLama3-8B-chat – a chattier version that reflects the behavior of the original meta-llama/Meta-Llama-3-8B-Instruct.
- radlab/pLLama3-70B – probably the largest Polish-language model to date!
- radlab/pLLama3.1-8B-content – a model that, after SFT and DPO, provides short and concise answers.
- radlab/pLLama3.1-8B-chat – a more talkative version of the model (after SFT and DPO), ideal for chatting.
- radlab/pLLama3.1-8B-base-ft-16bit – the pLLama3.1-8B model directly after SFT with LoRA.
- radlab/pLLama-L31-adapters-MIX-SFT-DPO – an experimental model with transfer of adapter layers between models.
- radlab/pLLama3.2-1B – the pLLama3.2 model in the 1B architecture, after fine-tuning only.
- radlab/pLLama3.2-1B-DPO – the pLLama3.2 model in the 1B architecture, after fine-tuning and DPO.
- radlab/pLLama3.2-3B – the pLLama3.2 model in the 3B architecture, after fine-tuning only.
- radlab/pLLama3.2-3B-DPO – the pLLama3.2 model in the 3B architecture, after fine-tuning and DPO.
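A sketch of chatting with one of the pLLama models via the transformers chat-template API. The model ID comes from the list above, but the system prompt, sampling settings, dtype, and device placement are illustrative assumptions, not recommended settings.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "radlab/pLLama3-8B-chat"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Build a chat-formatted prompt and generate a reply.
messages = [
    {"role": "system", "content": "Jesteś pomocnym asystentem."},
    {"role": "user", "content": "Opowiedz krótko o polskich modelach językowych."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```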
Other models derived from the training process:
- a fast tokenizer trained on a large volume of data (approx. 30 GB of Polish text): radlab/polish-fast-tokenizer (loading sketch below)
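A sketch of loading the fast tokenizer; it assumes the repository ships a standard tokenizer configuration loadable with AutoTokenizer.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("radlab/polish-fast-tokenizer")
print(tokenizer.tokenize("Przykładowe zdanie w języku polskim."))
```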
Based on word2vec vector models, we have developed a semantic similarity list.