Generative AI (Gen AI) has advanced significantly since its public launch two years ago. The technology has led to transformative applications that can create text, images, and other media with impressive accuracy and creativity.
Open-source generative models are valuable for developers, researchers, and organizations wanting to leverage cutting-edge AI technology without incurring high licensing fees or restrictive commercial policies. Let's find out more.
Open-source vs. proprietary models
Open-source AI models offer several advantages, including customization, transparency, and community-driven innovation. These models allow users to tailor them to specific needs and benefit from ongoing enhancements. Additionally, they typically come with licenses that permit both commercial and non-commercial use, which enhances their accessibility and adaptability across various applications.
However, open-source solutions are not always the best choice. In industries that demand strict regulatory compliance, data privacy, and specialized support, proprietary models often perform better. They provide stronger legal frameworks, dedicated customer support, and optimizations tailored to industry requirements. Closed-source solutions may also excel in highly specialized tasks, thanks to exclusive features designed for high performance and reliability.
When organizations require real-time updates, advanced security, or specialized functionalities, proprietary models can offer a more robust and secure solution, effectively balancing openness with the rigorous demands for quality and accountability.
The Open Source AI Definition
The Open Source Initiative (OSI) recently introduced the Open Source AI Definition (OSAID) to clarify what qualifies as genuinely open-source AI. To meet OSAID standards, a model must be fully transparent in its design and training data, enabling users to recreate, adapt, and use it freely.
However, some popular models, including Meta's LLaMA and Stability AI's Stable Diffusion, have licensing restrictions or lack transparency around training data, preventing full compliance with OSAID.
As part of the OSAID validation process, OSI assessed the following:
Compliant models: Pythia (Eleuther AI), OLMo (AI2), Amber and CrystalCoder (LLM360), and T5 (Google).
Potentially compliant models: Bloom (BigScience), Starcoder2 (BigCode), and Falcon (TII) could meet OSAID standards with minor adjustments to licensing terms or transparency.
Non-compliant models: LLaMA (Meta), Grok (X/Twitter), Phi (Microsoft), and Mixtral (Mistral) lack the necessary transparency or impose restrictive licensing terms.
LLaMA and other non-compliant architectures
The Meta LLaMA architecture exemplifies noncompliance with OSAID due to its restrictive research-only license and lack of full transparency about training data, limiting commercial use and reproducibility. Derived models, like Mistral's Mixtral and the Vicuna Team's MiniGPT-4, inherit these restrictions, propagating LLaMA's noncompliance across additional projects.
Beyond LLaMA-based models, other widely used architectures face similar issues. For example, Stable Diffusion by Stability AI employs the Creative ML OpenRAIL-M license, which includes ethical restrictions that deviate from OSAID's requirements for unrestricted use. Similarly, Grok by xAI combines proprietary elements with usage limitations, challenging its alignment with open-source ideals.
These examples underscore the difficulty of meeting OSAID's standards, as many AI developers balance open access with commercial and ethical considerations.
Implications for organizations: OSAID compliance vs. non-compliance
Choosing OSAID-compliant models gives organizations transparency, legal security, and full customizability, all of which are essential for responsible and flexible AI use. These compliant models adhere to ethical practices and benefit from strong community support, promoting collaborative development.
In contrast, non-compliant models may limit adaptability and rely more heavily on proprietary resources. For organizations that prioritize flexibility and alignment with open-source values, OSAID-compliant models are advantageous. However, non-compliant models can still be valuable when proprietary features are required.
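One way an organization might apply the OSI assessment is a small lookup that screens candidate models before adoption. The sketch below is illustrative only: the status values mirror the OSI list quoted in this article, while the `OSAID_STATUS` table and the `screen` helper are hypothetical names, not part of any OSI tool.

```python
# OSAID status as reported in the OSI validation list quoted in this article.
OSAID_STATUS = {
    "Pythia": "compliant",
    "OLMo": "compliant",
    "Amber": "compliant",
    "CrystalCoder": "compliant",
    "T5": "compliant",
    "Bloom": "potentially compliant",
    "Starcoder2": "potentially compliant",
    "Falcon": "potentially compliant",
    "LLaMA": "non-compliant",
    "Grok": "non-compliant",
    "Phi": "non-compliant",
    "Mixtral": "non-compliant",
}


def screen(candidates, require_compliant=True):
    """Return the candidates that pass an OSAID screen.

    With require_compliant=True only 'compliant' models pass; otherwise
    'potentially compliant' models pass as well. Models missing from the
    table are treated as unassessed and never pass.
    """
    allowed = {"compliant"}
    if not require_compliant:
        allowed.add("potentially compliant")
    return [name for name in candidates if OSAID_STATUS.get(name) in allowed]


print(screen(["LLaMA", "OLMo", "Falcon", "T5"]))
print(screen(["LLaMA", "OLMo", "Falcon", "T5"], require_compliant=False))
```

Treating unlisted models as failing the screen is a deliberately conservative choice; a team that values proprietary features over openness could relax it.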
Understanding licensing in open-source AI models
Open-source AI models are released under licenses that define usage, modification, and sharing conditions. While some licenses align with traditional open-source standards, others incorporate restrictions or ethical guidelines that prevent full OSAID compliance. Key licenses include:
Apache 2.0: A permissive license that allows free use, modification, and distribution, along with a patent grant. Apache 2.0 is OSI-approved and popular for open-source projects, providing flexibility and legal protection.
MIT: Another permissive license that only requires attribution for reuse. Like Apache 2.0, MIT is OSI-approved, widely adopted, and offers simplicity and minimal restrictions.
Creative ML OpenRAIL-M: A license designed for AI applications, allowing broad use but imposing ethical guidelines to prevent harmful use. OpenRAIL-M is not OSI-approved because it includes usage restrictions that conflict with the OSI's principles of unrestricted freedom. However, it is valued by developers aiming to prioritize ethical use in AI.
CC BY-SA: The Creative Commons Share-Alike license permits free use and requires derivative works to remain open source. While it encourages open collaboration, it's not OSI-approved and is more commonly used for content rather than code, as it lacks some flexibility for software applications.
CC BY-NC 4.0: A Creative Commons license that permits free use with attribution but restricts commercial applications. This license, used for certain model weights (like Meta's MusicGen and AudioGen), limits the models' usability in commercial environments and does not align with OSI's open-source standards.
Custom licenses: Many models on our list, such as IBM's Granite and Nvidia's NeMo, operate under proprietary or custom licenses. These models often impose specific conditions for use or modify traditional open-source terms to align with commercial goals, making them non-compliant with open-source principles.
Research-only licenses: Certain models, such as Meta's LLaMA and Codellama series, are available only under research-use terms. These licenses restrict use to academic or non-commercial purposes and prevent broad community-driven projects, as they do not meet OSI's open-source criteria.
Requirements for running open-source AI models
Running open-source Gen AI models requires specific hardware, software environments, and toolsets for model training, fine-tuning, and deployment tasks. High-performance models with billions of parameters benefit from powerful GPU setups like Nvidia's A100 or H100.
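To see why parameter count drives hardware choice, a rough back-of-the-envelope estimate helps. The sketch below counts only the memory needed to hold the weights, assuming a fixed number of bytes per parameter; activations, caches, and optimizer state during training add more on top, so treat the results as a lower bound. The helper name `weight_memory_gb` is made up for this illustration.

```python
# Bytes needed to store one parameter at common numeric precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(num_params, precision="fp16"):
    """Approximate memory for model weights alone, in gigabytes (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


for billions in (7, 40, 176):
    gb = weight_memory_gb(billions * 1e9, "fp16")
    print(f"{billions}B parameters at fp16: about {gb:.0f} GB of weights")
```

By this estimate a 7B model at 16-bit precision needs about 14 GB for weights and fits on a single data-center GPU, while a 176B model needs about 352 GB and has to be split across several GPUs or stored at lower precision.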
Essential environments typically include Python and machine learning libraries like PyTorch or TensorFlow. Specialized toolsets, including Hugging Face's Transformers library and Nvidia's NeMo, simplify the processes of fine-tuning and deployment. Docker helps maintain consistent environments across different systems, while Ollama allows for the local execution of large language models on compatible systems.
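As an illustration of local execution, the following minimal sketch sends a prompt to an Ollama server over its local HTTP API using only the Python standard library. It assumes Ollama is installed and listening on its default port, and that a model has already been downloaded (for example with `ollama pull mistral`); the model name and the helper names `build_request` and `generate` are assumptions for this example.

```python
import json
import urllib.request

# Default address of a locally running Ollama server (assumed unchanged).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model, prompt):
    """Build a non-streaming generation request for the Ollama HTTP API."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model, prompt, timeout=120):
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]


# Usage, once the server is running and the model is pulled:
# print(generate("mistral", "Summarize the Apache 2.0 license in one sentence."))
```

Setting `stream` to false asks the server for one complete JSON reply rather than a stream of partial tokens, which keeps the client code short at the cost of waiting for the full answer.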
The following table highlights essential toolsets, recommended hardware, and their specific functions for managing open-source AI models:
Toolset | Purpose | Requirements | Use |
Python | Primary programming environment | N/A | Essential for scripting and configuring models |
PyTorch | Model training and inference | GPU (e.g., Nvidia A100, H100) | Widely used library for deep learning models |
TensorFlow | Model training and inference | GPU (e.g., Nvidia A100, H100) | Alternative deep learning library |
Hugging Face Transformers | Model deployment and fine-tuning | GPU (preferred) | Library for accessing, fine-tuning, and deploying models |
Nvidia NeMo | Multimodal model support and deployment | Nvidia GPUs | Optimized for Nvidia hardware and multimodal tasks |
Docker | Environment consistency and deployment | Supports GPUs | Containerizes models for easy deployment |
Ollama | Running large language models locally | macOS, Linux, Windows, supports GPUs | Platform to run LLMs locally on compatible systems |
LangChain | Building applications with LLMs | Python 3.7+ | Framework for composing and deploying LLM-powered applications |
LlamaIndex | Connecting LLMs with external data sources | Python 3.7+ | Framework for integrating LLMs with data sources |
This setup establishes a robust framework for efficiently managing Gen AI models, from experimentation to production-ready deployment. Each toolset has its own strengths, enabling developers to tailor their environments to specific project needs.
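Before starting a project, it can be useful to check which of the Python toolsets from the table are already installed. The sketch below does this without importing the heavy libraries themselves. The import names in `TOOLSETS` are the usual top-level package names and should be treated as assumptions; `installed_toolsets` is a hypothetical helper.

```python
import importlib.util

# Import names for the Python toolsets in the table.
# These are the usual top-level package names; adjust them if your
# installation differs.
TOOLSETS = {
    "PyTorch": "torch",
    "TensorFlow": "tensorflow",
    "Hugging Face Transformers": "transformers",
    "LangChain": "langchain",
    "LlamaIndex": "llama_index",
}


def installed_toolsets(toolsets=TOOLSETS):
    """Map each toolset name to True if its package can be found."""
    status = {}
    for name, module in toolsets.items():
        try:
            status[name] = importlib.util.find_spec(module) is not None
        except (ImportError, ValueError):
            status[name] = False
    return status


for name, found in installed_toolsets().items():
    print(f"{name}: {'installed' if found else 'missing'}")
```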
Choosing the right model
Selecting the right Gen AI model depends on several factors, including licensing requirements, desired performance, and specific functionality. While larger models tend to deliver higher accuracy and flexibility, they require substantial computational resources. Smaller models, on the other hand, are more suitable for resource-constrained applications and devices.
It's important to note that most models listed here, even those with traditionally open-source licenses like Apache 2.0 or MIT, do not meet the Open Source AI Definition (OSAID). This gap is primarily due to restrictions around training data transparency and usage limitations, which OSAID emphasizes as essential for true open-source AI. However, certain models, such as Bloom and Falcon, show potential for compliance with minor adjustments to their licenses or transparency protocols and may achieve full compliance over time.
The tables below provide an organized overview of the leading open-source generative AI models, categorized by type, issuer, and functionality, to help you choose the best option for your needs, whether a fully transparent, community-driven model or a high-performance tool with specific features and licensing requirements.
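The selection criteria described here, license and model size, can be expressed as a simple filter. The sketch below uses a handful of rows copied from the language-model table in this article; the `Model` class, the `shortlist` helper, and the choice of a 20B size budget are illustrative assumptions rather than a recommended procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    params_b: int  # size in billions of parameters
    license: str


# A few single-size rows copied from the language-model table in this article.
CATALOG = [
    Model("xAI Grok-1", 314, "Apache 2.0"),
    Model("EleutherAI GPT-J", 6, "Apache 2.0"),
    Model("EleutherAI GPT-NeoX", 20, "MIT"),
    Model("Mistral AI Mistral 7B", 7, "Apache 2.0"),
    Model("Databricks Dolly 2.0", 12, "CC BY-SA 3.0"),
    Model("Alibaba Qwen Series", 7, "Custom"),
]

# Licenses the article describes as permissive and OSI-approved.
PERMISSIVE = {"Apache 2.0", "MIT"}


def shortlist(catalog, max_params_b, licenses=PERMISSIVE):
    """Keep models with an accepted license that fit the size budget, smallest first."""
    fits = [m for m in catalog if m.license in licenses and m.params_b <= max_params_b]
    return sorted(fits, key=lambda m: m.params_b)


for m in shortlist(CATALOG, max_params_b=20):
    print(f"{m.name} ({m.params_b}B, {m.license})")
```

A permissive license alone does not imply OSAID compliance, so a filter like this is only a first pass before checking training-data transparency.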
Language models
Language models are crucial in text-based applications such as chatbots, content creation, translation, and summarization. They are fundamental to natural language processing (NLP) and continually improve their understanding of language structure and context.
Notable models include Meta's LLaMA, EleutherAI's GPT-NeoX, and Nvidia's NVLM 1.0 family, known respectively for their strengths in multilingual, large-scale, and multimodal tasks.
Model | Size | License | Description |
Google T5 | Small to XXL | Apache 2.0 | High-performance language model, OSAID Compliant |
EleutherAI Pythia | Various | Apache 2.0 | Interpretability-focused, OSAID Compliant |
Allen Institute for AI (AI2) OLMo | Various | Apache 2.0 | Open language research model, OSAID Compliant |
BigScience BLOOM | 176B | OpenRAIL-M | Multilingual, responsible AI, OSAID Potential |
BigCode Starcoder2 | Various | Apache 2.0 | Code generation, OSAID Potential |
TII Falcon | 7B, 40B | Apache 2.0 | Efficient and high-performance, OSAID Potential |
AI21 Labs Jamba Series | Mini to Large | Custom | Language and chat generation |
AI Singapore Sea-Lion | 7B | Custom | Language and cultural representation |
Alibaba Qwen Series | 7B | Custom | Bilingual model (Chinese, English) |
Databricks Dolly 2.0 | 12B | CC BY-SA 3.0 | Open dataset, commercial use |
EleutherAI GPT-J | 6B | Apache 2.0 | General-purpose language model |
EleutherAI GPT-NeoX | 20B | MIT | Large-scale text generation |
Google Gemma 2 | 2B, 9B, 27B | Apache 2.0 | Language and code generation |
IBM Granite Series | 3B, 8B | Custom | Summarization, classification, RAG |
Meta LLaMA 3.2 | 1B to 405B | Research-only | Advanced NLP, multilingual |
Microsoft Phi-3 Series | Mini to Medium | MIT | Reasoning, cost-effective |
Mistral AI Mixtral 8x22B | 8x22B | Apache 2.0 | Sparse model, efficient reasoning |
Mistral AI Mistral 7B | 7B | Apache 2.0 | Dense, multilingual text generation |
Nvidia NVLM 1.0 Family | 72B | Custom | High-performance multimodal LLM |
Rakuten RakutenAI Series | 7B | Custom | Multilingual chat, NLP |
xAI Grok-1 | 314B | Apache 2.0 | Large-scale language model |
Image generation models
Image generation models create high-quality visuals or artwork from text prompts, which makes them invaluable for content creators, designers, and marketers.
Stability AI's Stable Diffusion is widely adopted due to its flexibility and output quality, while DeepFloyd's IF emphasizes generating realistic visuals with an understanding of language.
Model | Size | License | Description |
Stability AI Stable Diffusion 3.5 | 2.5B to 8B | OpenRAIL-M | High-quality image synthesis |
DeepFloyd IF | 400M to 4.3B | Custom | Realistic visuals with language comprehension |
OpenAI DALL-E 3 | Not disclosed | Custom | State-of-the-art text-to-image synthesis |
Google Imagen | Not disclosed | Custom | High-fidelity image generation from text |
Midjourney | Not disclosed | Custom | Artistic and stylized image generation |
Adobe Firefly | Not disclosed | Custom | Integrated AI image generation within Adobe products |
Vision models
Vision models analyze images and videos, supporting object detection, segmentation, and visual generation from text prompts.
These technologies benefit several industries, including healthcare, autonomous vehicles, and media.
Model | Size | License | Description |
Meta SAM 2.1 | 38.9M to 224.4M | Apache 2.0 | Video editing, segmentation |
NVIDIA Consistency | Not disclosed | Custom | Character consistency across video frames |
NVIDIA VISTA-3D | Not disclosed | Custom | Medical imaging, anatomical segmentation |
NVIDIA NV-DINOv2 | Not disclosed | Non-commercial | Image embedding generation |
Google DeepLab | Not disclosed | Apache 2.0 | High-quality semantic image segmentation |
Microsoft Florence | 0.23B, 0.77B | MIT | General-purpose visual model for computer vision |
OpenAI CLIP | 400M | MIT | Text and image comprehension |
Audio models
Audio models process and generate audio data, enabling speech recognition, text-to-speech synthesis, music composition, and audio enhancement.
Model | Size | License | Description |
Coqui.ai TTS | N/A | MPL 2.0 | Text-to-speech synthesis, multi-language support |
ESPnet ESPnet | N/A | Apache 2.0 | End-to-end speech processing toolkit |
Facebook AI wav2vec 2.0 | Base (95M), Large (317M) | Apache 2.0 | Self-supervised speech recognition |
Hugging Face Transformers (Speech Models) | Various | Apache 2.0 | Collection of ASR and TTS models |
Magenta MusicVAE | N/A | Apache 2.0 | Music generation and interpolation |
Meta MusicGen | N/A | MIT / CC BY-NC 4.0 | Music generation from text prompts |
Meta AudioGen | N/A | MIT / CC BY-NC 4.0 | Sound effect generation from text prompts |
Meta EnCodec | N/A | MIT / CC BY-NC 4.0 | High-quality audio compression |
Mozilla DeepSpeech | N/A | MPL 2.0 | End-to-end speech-to-text engine |
NVIDIA NeMo (Speech Models) | Various | Apache 2.0 | ASR and TTS models optimized for Nvidia GPUs |
OpenAI Jukebox | N/A | MIT | Neural music generation with genre/artist conditioning |
OpenAI Whisper | 39M to 1.6B | MIT | Multilingual speech recognition and transcription |
TensorFlow TFLite Speech Models | N/A | Apache 2.0 | Speech recognition models optimized for mobile devices |
Multimodal models
Multimodal models combine text, images, audio, and other data types to create content from various inputs.
These models are effective in applications requiring language, visual, and sensory understanding.
Model | Size | License | Description |
Allen Institute for AI (AI2) Molmo | 1B, 70B | Apache 2.0 | A multimodal AI model that processes text and visual inputs, OSAID-compliant |
Meta ImageBind | N/A | Custom | Integrates six data types: text, images, audio, depth, thermal, and IMU. |
Meta SeamlessM4T | N/A | Custom | Provides multilingual translation and transcription services. |
Meta Spirit LM | N/A | Custom | Combines text and speech to produce natural-sounding outputs. |
Microsoft Florence-2 | 0.23B, 0.77B | MIT | Handles computer vision and language tasks proficiently. |
NVIDIA VILA | N/A | Custom | Processes vision-language tasks effectively. |
OpenAI CLIP | 400M | MIT | Excels in text and image comprehension. |
Vicuna Team MiniGPT-4 | 13B | Apache 2.0 | Capable of understanding both text and images. |
Retrieval-augmented generation (RAG)
RAG models merge generative AI with information retrieval, allowing them to incorporate relevant data from extensive datasets into their responses.
Model | Size | License | Description |
BAAI BGE-M3 | N/A | Custom | Dense and sparse retrieval optimization |
IBM Granite 3.0 Series | 3B, 8B | Custom | Advanced retrieval, summarization, RAG |
Nvidia EmbedQA & ReRankQA | 1B | Custom | Multilingual QA, GPU-accelerated retrieval |
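The retrieve-then-generate pattern behind RAG can be shown with a toy example. The sketch below ranks a few documents against a question and prepends the best match to the prompt that would be sent to a generator. It swaps in plain word-count cosine similarity for the learned embedding and reranking models listed in the table, and every function name and sample document here is an assumption made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(documents, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]


def build_prompt(query, documents, k=1):
    """Prepend retrieved context to the question before it goes to a generator."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


# Sample documents, made up for this example.
DOCS = [
    "Apache 2.0 is a permissive license that includes a patent grant.",
    "Ollama runs large language models locally.",
    "Whisper performs multilingual speech recognition.",
]

print(build_prompt("Which license includes a patent grant?", DOCS))
```

In a real system the similarity step would use dense or sparse embeddings over a much larger corpus, but the flow stays the same: retrieve relevant text first, then let the language model answer with that text in its prompt.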
Specialized models
Specialized models are optimized for specific fields, such as programming, scientific research, and healthcare, offering enhanced functionality tailored to their domains.
Model | Size | License | Description |
Meta Codellama Series | 7B, 13B, 34B | Custom | Code generation, multilingual programming |
Mistral AI Mamba-Codestral | 7B | Apache 2.0 | Focused on coding and multilingual capabilities |
Mistral AI Mathstral | 7B | Apache 2.0 | Specialized in mathematical reasoning |
Guardrail models
Guardrail models ensure safe and responsible outputs by detecting and mitigating biases, inappropriate content, and harmful responses.
Model | Size | License | Description |
NVIDIA NeMo Guardrails | N/A | Apache 2.0 | Open-source toolkit for adding programmable guardrails |
Google ShieldGemma | 2B, 9B, 27B | Custom | Safety classifier models built on Gemma 2 |
IBM Granite-Guardian | 8B | Custom | Detects unethical or harmful content |
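To show where a guardrail sits in an application, the sketch below wraps a text generator with an output check. It is a deliberately simplified stand-in: the guardrail models in the table are trained classifiers, whereas this example only matches a hypothetical list of blocked words, and `fake_model` replaces a real language model so the code runs on its own.

```python
import re

# Hypothetical blocked terms; a real deployment would call a trained safety
# classifier here instead of matching keywords.
BLOCKED_TERMS = {"password", "ssn"}
REFUSAL = "The response was withheld by a safety check."


def is_safe(text, blocked=BLOCKED_TERMS):
    """Return False if any blocked term appears as a whole word in the text."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return words.isdisjoint(blocked)


def guarded(generate, prompt):
    """Call a text generator and replace unsafe output with a refusal."""
    output = generate(prompt)
    return output if is_safe(output) else REFUSAL


def fake_model(prompt):
    """Stand-in generator so the example runs without a real model."""
    return "The admin password is hunter2." if "secret" in prompt else "Hello!"


print(guarded(fake_model, "say hi"))
print(guarded(fake_model, "tell me the secret"))
```

The useful part of the pattern is the wrapper: because `guarded` accepts any callable, the keyword check could be replaced by a call to a dedicated safety model without changing the surrounding application.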
Choose open-source models
The landscape of generative AI is evolving rapidly, with open-source models crucial for making advanced technology accessible to all. These models allow for customization and collaboration, breaking down barriers that have limited AI development to large corporations.
By choosing open-source Gen AI, developers can tailor solutions to their needs, contribute to a global community, and accelerate technological progress. The variety of available models -- from language and vision to safety-focused designs -- ensures options for almost any application.
Supporting open-source AI communities will be essential for promoting ethical and innovative AI developments, benefiting individual projects, and advancing technology responsibly.