What "open" means — weights, data, code, license
"Open" is not one thing. A model release is really four separable artifacts, and a given model can be open on some axes and shut on others. Reading a release correctly means asking which of these you actually get:
| Artifact | What it is | What having it buys you |
|---|---|---|
| Weights | the trained parameters | Run it yourself, fine-tune it, inspect it, keep it forever — the load-bearing piece. |
| Training data | the corpus it learned from | Reproduce training, audit for contamination or bias, understand what it knows. Almost never released. |
| Training code | the recipe (data pipeline, hyperparameters) | Re-train or extend from scratch. Sometimes released, often partially. |
| License | the legal terms on all of the above | Decides whether you may use it commercially, redistribute it, or train other models on its outputs. |
The crucial and widely misunderstood point: "open weights" says nothing about open data. When Meta, Mistral, or DeepSeek ship a model, you get a file of numbers and a license — not the trillions of tokens they trained on. That is why purists distinguish open-weight (weights downloadable, data secret) from genuinely open-source (weights, data, and code all released under an OSI-style license). The Open Source Initiative's 2024 definition of "open-source AI" requires enough information to recreate the system; by that bar, most "open" LLMs are open-weight, not open-source. Truly data-open models — OLMo, Pythia, the SmolLM family — exist but are the exception.
Open-weight: the parameters are downloadable; you can run and fine-tune them. Open-source AI: weights plus the data and code needed to reproduce them, under a recognized open license. Closed / proprietary: reachable only as a hosted API — you rent inference, you never possess the model.
Where a model sits on this spectrum cascades downstream. With the weights in hand you can run offline, send no data to a third party, fine-tune freely, quantize to fit your hardware, and pin a version that will never change under you. Without them, you trade all of that for someone else's operations team, frontier quality, and a metered bill. Neither is "better" in the abstract — the rest of this chapter is about matching the trade to the job.
The closed frontier — GPT, Claude, Gemini
The three labs that most often define the capability frontier — OpenAI (GPT), Anthropic (Claude), and Google DeepMind (Gemini) — ship their flagship models as closed, API-only products. You never see the weights; you send tokens in and get tokens out, billed per token. This is a deliberate posture, motivated by a mix of commercial moat, safety (harder to strip alignment from a model you can't download), and the sheer operational cost of serving models that may exceed a trillion parameters.
What you get for closing the box is real: the strongest available models on hard reasoning, coding, and multimodal tasks; a managed service with no GPUs to babysit; instant access to the newest version; and features — long context, tool use, structured output, content moderation — maintained by a large team. What you give up is everything that requires possession: your prompts leave your premises, you cannot fine-tune the base weights (only the limited adapters the provider exposes), the model can change or be deprecated underneath you, and at high volume you pay forever on a per-token meter.
An honest caveat on "open-ness" at the frontier. The closed labs do release a great deal of research — model cards, system cards, safety evaluations, and detailed technical reports — even when the weights stay private. And the open/closed line is blurring from both sides: OpenAI returned to open weights in 2025 with the gpt-oss family, and frontier-class open-weight models from DeepSeek and others have repeatedly narrowed the gap to the best closed systems. "Closed" describes the weights, not the secrecy of the whole enterprise.
The open ecosystem — Llama, Mistral, DeepSeek, Qwen
The open-weight world is no longer a poor cousin of the frontier; for many tasks it is competitive, and for privacy-sensitive or high-volume deployments it is the obvious default. Four families anchor it:
| Family | Maker | Character | License posture |
|---|---|---|---|
| Llama | Meta | The release that catalyzed the open ecosystem; broad sizes, huge fine-tune community. | Custom community license (commercial OK; an acceptable-use policy applies; a >700M-MAU clause). |
| Mistral | Mistral AI | Efficient dense and mixture-of-experts models; several flagships under a true open license. | Apache-2.0 for the open releases; some models are licensed-research-only. |
| DeepSeek | DeepSeek-AI | Frontier-class MoE and reasoning models trained at a fraction of typical cost. | Permissive (MIT-style) weights — among the most open of the frontier-adjacent releases. |
| Qwen | Alibaba | Very wide size ladder (sub-1B to hundreds of B), strong multilingual and coding variants. | Mostly Apache-2.0 on recent releases; a few older/largest under custom terms. |
Two structural facts make this ecosystem powerful. First, weights compound. Because anyone can download and adapt them, a single base model spawns thousands of community fine-tunes, quantizations, and merges — a base like Llama or Qwen becomes a platform, not a product. Second, open weights set a price floor. When a free, near-frontier model exists, no API can charge frontier prices for commodity work; the open releases discipline the entire market's pricing, which is part of why per-token costs keep collapsing.
"Open" here still means open-weight, not open-data: Llama, Mistral, DeepSeek, and Qwen all ship parameters and a license without the training corpus. DeepSeek's published reports are unusually detailed about method (architecture, training procedure, cost), which is why its releases feel especially transparent — but the data itself remains undisclosed, exactly as §1.1 warns.
Licenses & their fine print
The license is the part teams skim and later regret. With closed APIs the terms are a service agreement; with open weights the license travels with the file and governs what you may build. Four questions cover most of it:
- Commercial use. May you use it in a paid product at all? Apache-2.0 and MIT: unambiguously yes. "Research-only" / non-commercial (CC-BY-NC, some Mistral and Gemma research releases): no.
- Redistribution. May you re-host or re-ship the weights, and under what notices? Permissive licenses allow it with attribution; community licenses add naming and labeling requirements.
- Acceptable use. Nearly every modern release — open or closed — attaches an acceptable-use policy banning specific harms. This is a real, enforceable restriction even on "open" weights.
- Training on outputs / scale clauses. Some licenses restrict using the model's outputs to train competitors; Meta's community license adds a famous clause requiring a separate agreement if your product exceeds 700 million monthly active users.
The headline distinction is between a true open-source license (OSI-approved: Apache-2.0, MIT — minimal conditions, no field-of-use limits) and a community / source-available license (Llama's terms, many "open" releases — commercially usable but with carve-outs, acceptable-use policies, or scale clauses). Both let a startup ship a product. They differ in the edge cases — the hyperscaler-grade scale clause, the can't-train-a-competitor clause — that only bite a minority of users but bite hard when they do.
"Open" is not a synonym for "no rules." A community-licensed model can forbid certain uses, require you to display the model's name, or demand a special agreement at extreme scale — and a non-commercial license forbids the one thing a business needs most. Always read the actual file's LICENSE and acceptable-use policy before you build; the family name (Llama, Mistral, Qwen) does not by itself tell you the terms, because different models in the same family ship under different licenses.
Choosing for a use case
The decision rarely turns on raw benchmark scores. It turns on four forces — privacy, cost, control, capability — and which one your specific job cares about most. A quick map of where each side wins:
| If your job is most about… | Lean | Why |
|---|---|---|
| Data privacy / sovereignty | OPEN | Weights run inside your network; no prompt ever leaves. Decisive for regulated, medical, on-prem, or air-gapped settings. |
| Lowest cost at high volume | OPEN | Self-hosting trades a fixed cost for a tiny marginal cost; past a breakeven volume it is cheaper (EQ OM1.2). |
| Maximum capability, fast | CLOSED | The frontier APIs still lead on the hardest reasoning and multimodal tasks, with zero ops burden. |
| Version stability / customization | OPEN | Pin a version forever; fine-tune the base; quantize to your hardware. The API can change or deprecate under you. |
| Low volume, fast iteration | CLOSED | Below breakeven the per-token bill is trivial and there is nothing to operate — ideal for prototypes and spiky traffic. |
The cost axis has the cleanest math, so it is worth making explicit. Renting (EQ OM1.1) is all marginal: price per token times volume. Self-hosting flips the shape — a large fixed cost (the GPU you rent or buy, running whether you use it or not) plus a very small marginal cost per token (electricity, amortized hardware). The two cost curves cross at a breakeven volume:
# EQ OM1.2: API (pure marginal) vs self-host (fixed + marginal), find breakeven
import numpy as np
p = 3.0 / 1e6 # API price: $3 per 1,000,000 tokens
F = 1500.0 # self-host fixed cost: one GPU node, $/month
m = 0.20 / 1e6 # self-host marginal: $0.20 per 1,000,000 tokens (power + amort.)
n_star = F / (p - m) # breakeven monthly token volume
print(f"breakeven volume n* = {n_star/1e6:,.1f} M tokens / month")
vols = np.array([100, 300, n_star/1e6, 800, 2000]) * 1e6 # tokens/month
print("\n volume(M) API $/mo self-host $/mo cheaper")
for n in vols:
api = p * n
self = F + m * n
who = "API" if api < self else ("self-host" if self < api else "TIE")
print(f" {n/1e6:7.1f} {api:9,.0f} {self:11,.0f} {who}")
print(f"\nbelow {n_star/1e6:,.0f}M tok/mo rent; above it, own the weights.")
plot_xy([v/1e6 for v in vols], [p*v for v in vols]) # API cost curve (linear)
# Score candidates against a weighted use-case rubric (0-10 per criterion)
import numpy as np
criteria = ["privacy", "cost@scale", "capability", "control", "ops_ease"]
weights = np.array([0.30, 0.25, 0.20, 0.15, 0.10]) # this team prizes privacy
assert abs(weights.sum() - 1.0) < 1e-9
candidates = {
"Closed API (GPT/Claude/Gemini)": [2, 4, 10, 3, 9],
"Open self-host (Llama/Qwen)": [10, 9, 7, 9, 4],
"Open via managed endpoint": [5, 6, 7, 6, 8],
}
print(f"weights: {dict(zip(criteria, weights))}\n")
ranked = []
for name, s in candidates.items():
score = float(np.dot(weights, np.array(s, float)))
ranked.append((score, name))
print(f" {name:34s} weighted score = {score:5.2f} / 10")
ranked.sort(reverse=True)
print(f"\nwinner for a privacy-first team: {ranked[0][1]} ({ranked[0][0]:.2f})")
print("flip the weights toward 'capability' and the closed API wins instead.")
You have decided to hold the weights — now you have to run them. Chapter 02 turns the choice into practice: the runtimes (llama.cpp, vLLM, Ollama), how quantization shrinks a model to fit the hardware you actually have, and the throughput-versus-latency knobs that decide what serving really costs.
References
- Touvron, H., Martin, L., Stone, K. et al. (2023). Llama 2: Open Foundation and Fine-Tuned Chat Models.
- Jiang, A. Q., Sablayrolles, A., Mensch, A. et al. (2023). Mistral 7B.
- DeepSeek-AI (2024). DeepSeek-V3 Technical Report.
- Yang, A., Yang, B., Hui, B. et al. (2024). Qwen2 Technical Report.
- Groeneveld, D., Beltagy, I., Walsh, P. et al. (2024). OLMo: Accelerating the Science of Language Models.
- Open Source Initiative (2024). The Open Source AI Definition 1.0.
- Meta (2024). Llama 3 Community License Agreement.