Model Gallery

570 models from 1 repository

huihui-glm-4.6v-flash-abliterated
**Huihui-GLM-4.6V-Flash (Abliterated)**

A text-based large language model derived from the **zai-org/GLM-4.6V-Flash** base model, featuring reduced safety filters and uncensored capabilities. Designed for text generation, it supports conversational tasks but excludes image processing.

**Key Features:**
- **Base Model**: GLM-4.6V-Flash (original author: zai-org).
- **Quantized Format**: GGUF (optimized for efficiency).
- **No Image Support**: Only text-based interactions are enabled.
- **Custom Training**: Abliterated to remove restrictive outputs, prioritizing openness over safety.

**Important Notes:**
- **Risk of Sensitive Content**: Reduced filtering may generate inappropriate or controversial outputs.
- **Ethical Use**: Suitable for research or controlled environments; not recommended for public or commercial deployment without caution.
- **Legal Responsibility**: Users must ensure compliance with local laws and ethical guidelines.

**Use Cases:**
- Experimental text generation.
- Controlled research environments.
- Testing safety filtering mechanisms.

*Note: This model is not suitable for production or public-facing applications without thorough review.*

Repository: localai

qwen3-coder-30b-a3b-instruct-rtpurbo-i1
A quantized version of the **Qwen3-Coder** large language model, tailored for code generation. The base model, **RTP-LLM/Qwen3-Coder-30B-A3B-Instruct-RTPurbo**, is a 30B-parameter variant optimized for instruction-following and code-related tasks. It uses a mixture-of-experts (MoE) architecture, with the **A3B** suffix denoting roughly 3B activated parameters per token, and is trained on diverse data to excel in programming and logical reasoning. This repository provides a quantized (compressed) version of the model, which suits deployment on hardware with limited memory but loses some precision compared to the original. For a high-fidelity version, the unquantized base model is recommended.
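The memory trade-off described above can be roughed out with simple arithmetic. The sketch below is a minimal estimate of weight storage only, assuming the 30B parameter count from the description; the bit widths are illustrative assumptions, not the quantization levels this gallery entry actually ships, and real files also carry metadata and mixed-precision tensors.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate storage for the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


NUM_PARAMS = 30e9  # "30B-parameter" figure taken from the description

# Hypothetical bit widths chosen for illustration only.
ESTIMATES = {
    "16-bit (unquantized)": weight_memory_gb(NUM_PARAMS, 16),
    "8-bit": weight_memory_gb(NUM_PARAMS, 8),
    "4-bit": weight_memory_gb(NUM_PARAMS, 4),
}

for label, gigabytes in ESTIMATES.items():
    print(f"{label}: ~{gigabytes:.0f} GB")
```

Runtime memory will be higher than these figures, since the KV cache and activations are not counted.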

Repository: localai

glm-4.5v-i1
A **quantized version** of the **GLM-4.5V** large language model, originally developed by **zai-org**. This repository provides multiple quantized variants of the model, optimized for different trade-offs between size, speed, and quality. The base model, **GLM-4.5V**, is a multilingual (Chinese/English) large language model, and this quantized version is designed for efficient inference on hardware with limited memory.

Key features include:
- **Quantization options**: IQ2_M, Q2_K, Q4_K_M, IQ3_M, IQ4_XS, etc., with sizes ranging from 43 GB to 96 GB.
- **Performance**: Optimized for inference, with some variants (e.g., Q4_K_M) balancing speed and quality.
- **Vision support**: The model is a vision model, with mmproj files available in the static repository.
- **License**: MIT-licensed.

This quantized version is ideal for applications requiring compact, efficient models while retaining most of the original capabilities of the base GLM-4.5V.

Repository: localai
License: mit

qwen3-vl-30b-a3b-instruct
Meet Qwen3-VL, the most powerful vision-language model in the Qwen series to date. This generation delivers comprehensive upgrades across the board: superior text understanding & generation, deeper visual perception & reasoning, extended context length, enhanced spatial and video dynamics comprehension, and stronger agent interaction capabilities. Available in Dense and MoE architectures that scale from edge to cloud, with Instruct and reasoning-enhanced Thinking editions for flexible, on-demand deployment.

#### Key Enhancements:

* **Visual Agent**: Operates PC/mobile GUIs by recognizing elements, understanding functions, invoking tools, and completing tasks.
* **Visual Coding Boost**: Generates Draw.io/HTML/CSS/JS from images/videos.
* **Advanced Spatial Perception**: Judges object positions, viewpoints, and occlusions; provides stronger 2D grounding and enables 3D grounding for spatial reasoning and embodied AI.
* **Long Context & Video Understanding**: Native 256K context, expandable to 1M; handles books and hours-long video with full recall and second-level indexing.
* **Enhanced Multimodal Reasoning**: Excels in STEM/Math, with causal analysis and logical, evidence-based answers.
* **Upgraded Visual Recognition**: Broader, higher-quality pretraining enables the model to “recognize everything”: celebrities, anime, products, landmarks, flora/fauna, etc.
* **Expanded OCR**: Supports 32 languages (up from 19); robust in low light, blur, and tilt; better with rare/ancient characters and jargon; improved long-document structure parsing.
* **Text Understanding on par with pure LLMs**: Seamless text–vision fusion for lossless, unified comprehension.

#### Model Architecture Updates:

1. **Interleaved-MRoPE**: Full-frequency allocation over time, width, and height via robust positional embeddings, enhancing long-horizon video reasoning.
2. **DeepStack**: Fuses multi-level ViT features to capture fine-grained details and sharpen image–text alignment.
3. **Text–Timestamp Alignment:** Moves beyond T-RoPE to precise, timestamp-grounded event localization for stronger video temporal modeling.

This is the weight repository for Qwen3-VL-30B-A3B-Instruct.

Repository: localai
License: apache-2.0

qwen3-vl-30b-a3b-thinking
Qwen3-VL-30B-A3B-Thinking is the 30B-parameter reasoning-enhanced (Thinking) edition of the Qwen3-VL series.

Repository: localai
License: apache-2.0

qwen3-vl-4b-instruct
Qwen3-VL-4B-Instruct is the 4B parameter model of the Qwen3-VL series.

Repository: localai
License: apache-2.0

qwen3-vl-32b-instruct
Qwen3-VL-32B-Instruct is the 32B parameter model of the Qwen3-VL series.

Repository: localai
License: apache-2.0

qwen3-vl-4b-thinking
Qwen3-VL-4B-Thinking is the 4B-parameter reasoning-enhanced (Thinking) edition of the Qwen3-VL series.

Repository: localai
License: apache-2.0

qwen3-vl-2b-thinking
Qwen3-VL-2B-Thinking is the 2B-parameter reasoning-enhanced (Thinking) edition of the Qwen3-VL series.

Repository: localai
License: apache-2.0

qwen3-vl-2b-instruct
Qwen3-VL-2B-Instruct is the 2B parameter model of the Qwen3-VL series.

Repository: localai
License: apache-2.0

huihui-qwen3-vl-30b-a3b-instruct-abliterated
These are quantizations of the model Huihui-Qwen3-VL-30B-A3B-Instruct-abliterated-GGUF.

Repository: localai
License: apache-2.0

huggingfacetb_smollm3-3b
SmolLM3 is a 3B parameter language model designed to push the boundaries of small models. It supports 6 languages, advanced reasoning and long context. SmolLM3 is a fully open model that offers strong performance at the 3B–4B scale. The model is a decoder-only transformer using GQA and NoPE (with a 3:1 ratio). It was pretrained on 11.2T tokens with a staged curriculum of web, code, math and reasoning data. Post-training included midtraining on 140B reasoning tokens followed by supervised fine-tuning and alignment via Anchored Preference Optimization (APO).

Repository: localai
License: apache-2.0

moondream2-20250414
Moondream is a small vision language model designed to run efficiently everywhere.

Repository: localai
License: apache-2.0

smolvlm-256m-instruct
SmolVLM-256M is the smallest multimodal model in the world. It accepts arbitrary sequences of image and text inputs to produce text outputs. It's designed for efficiency. SmolVLM can answer questions about images, describe visual content, or transcribe text. Its lightweight architecture makes it suitable for on-device applications while maintaining strong performance on multimodal tasks. It can run inference on one image with under 1GB of GPU RAM.

Repository: localai
License: apache-2.0

smolvlm-500m-instruct
SmolVLM-500M is a tiny multimodal model, member of the SmolVLM family. It accepts arbitrary sequences of image and text inputs to produce text outputs. It's designed for efficiency. SmolVLM can answer questions about images, describe visual content, or transcribe text. Its lightweight architecture makes it suitable for on-device applications while maintaining strong performance on multimodal tasks. It can run inference on one image with 1.23GB of GPU RAM.

Repository: localai
License: apache-2.0

smolvlm-instruct
SmolVLM is a compact open multimodal model that accepts arbitrary sequences of image and text inputs to produce text outputs. Designed for efficiency, SmolVLM can answer questions about images, describe visual content, create stories grounded on multiple images, or function as a pure language model without visual inputs. Its lightweight architecture makes it suitable for on-device applications while maintaining strong performance on multimodal tasks.

Repository: localai
License: apache-2.0

smolvlm2-2.2b-instruct
SmolVLM2-2.2B is a lightweight multimodal model designed to analyze video content. The model processes videos, images, and text inputs to generate text outputs - whether answering questions about media files, comparing visual content, or transcribing text from images. Despite its compact size, requiring only 5.2GB of GPU RAM for video inference, it delivers robust performance on complex multimodal tasks. This efficiency makes it particularly well-suited for on-device applications where computational resources may be limited.

Repository: localai
License: apache-2.0

smolvlm2-500m-video-instruct
SmolVLM2-500M-Video is a lightweight multimodal model designed to analyze video content. The model processes videos, images, and text inputs to generate text outputs - whether answering questions about media files, comparing visual content, or transcribing text from images. Despite its compact size, requiring only 1.8GB of GPU RAM for video inference, it delivers robust performance on complex multimodal tasks. This efficiency makes it particularly well-suited for on-device applications where computational resources may be limited.

Repository: localai
License: apache-2.0

smolvlm2-256m-video-instruct
SmolVLM2-256M-Video is a lightweight multimodal model designed to analyze video content. The model processes videos, images, and text inputs to generate text outputs, whether answering questions about media files, comparing visual content, or transcribing text from images. Its compact size requires only 1.38GB of GPU RAM for video inference. This efficiency makes it particularly well-suited for on-device applications that require domain-specific fine-tuning and where computational resources may be limited.

Repository: localai
License: apache-2.0

qwen3-30b-a3b
Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support, with the following key features:

- Unique support for seamless switching between thinking mode (for complex logical reasoning, math, and coding) and non-thinking mode (for efficient, general-purpose dialogue) within a single model, ensuring optimal performance across various scenarios.
- Significantly enhanced reasoning capabilities, surpassing previous QwQ (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) on mathematics, code generation, and commonsense logical reasoning.
- Superior human preference alignment, excelling in creative writing, role-playing, multi-turn dialogues, and instruction following, to deliver a more natural, engaging, and immersive conversational experience.
- Expertise in agent capabilities, enabling precise integration with external tools in both thinking and non-thinking modes and achieving leading performance among open-source models in complex agent-based tasks.
- Support for 100+ languages and dialects, with strong capabilities for multilingual instruction following and translation.

Qwen3-30B-A3B has the following features:

- Type: Causal Language Models
- Training Stage: Pretraining & Post-training
- Number of Parameters: 30.5B in total and 3.3B activated
- Number of Parameters (Non-Embedding): 29.9B
- Number of Layers: 48
- Number of Attention Heads (GQA): 32 for Q and 4 for KV
- Number of Experts: 128
- Number of Activated Experts: 8
- Context Length: 32,768 tokens natively and 131,072 tokens with YaRN

For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our blog, GitHub, and Documentation.
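The figures listed for Qwen3-30B-A3B imply a few ratios that help when comparing it with dense models. The following is a minimal sketch that derives them from the numbers quoted in the description; the class and function names are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Qwen3MoESpec:
    """Values copied from the Qwen3-30B-A3B description (names are hypothetical)."""
    total_params_b: float = 30.5
    active_params_b: float = 3.3
    q_heads: int = 32
    kv_heads: int = 4
    experts: int = 128
    active_experts: int = 8
    native_context: int = 32_768
    yarn_context: int = 131_072


def derived_ratios(spec: Qwen3MoESpec) -> dict:
    """Compute simple ratios implied by the published figures."""
    return {
        # Share of parameters used for each token in the MoE model.
        "active_param_fraction": spec.active_params_b / spec.total_params_b,
        # Share of experts routed to for each token.
        "active_expert_fraction": spec.active_experts / spec.experts,
        # Query heads sharing each key/value head under GQA.
        "q_heads_per_kv_head": spec.q_heads // spec.kv_heads,
        # Context extension factor from native length to the YaRN length.
        "yarn_scaling_factor": spec.yarn_context / spec.native_context,
    }


RATIOS = derived_ratios(Qwen3MoESpec())
for name, value in RATIOS.items():
    print(f"{name}: {value:.4g}")
```

Roughly 11% of the parameters and 1 in 16 experts are active per token, each KV head serves 8 query heads, and the YaRN context is 4 times the native length.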

Repository: localai
License: apache-2.0

qwen3-235b-a22b-instruct-2507
We introduce the updated version of the Qwen3-235B-A22B non-thinking mode, named Qwen3-235B-A22B-Instruct-2507, featuring the following key enhancements:

- Significant improvements in general capabilities, including instruction following, logical reasoning, text comprehension, mathematics, science, coding and tool usage.
- Substantial gains in long-tail knowledge coverage across multiple languages.
- Markedly better alignment with user preferences in subjective and open-ended tasks, enabling more helpful responses and higher-quality text generation.
- Enhanced capabilities in 256K long-context understanding.

Repository: localai
License: apache-2.0
