Open vs Closed Weights — AI Encyclopedia

1.1

What "open" means — weights, data, code, license

"Open" is not one thing. A model release is really four separable artifacts, and a given model can be open on some axes and shut on others. Reading a release correctly means asking which of these you actually get:

Artifact	What it is	What having it buys you
Weights	the trained parameters	Run it yourself, fine-tune it, inspect it, keep it forever — the load-bearing piece.
Training data	the corpus it learned from	Reproduce training, audit for contamination or bias, understand what it knows. Almost never released.
Training code	the recipe (data pipeline, hyperparameters)	Re-train or extend from scratch. Sometimes released, often partially.
License	the legal terms on all of the above	Decides whether you may use it commercially, redistribute it, or train other models on its outputs.

The crucial and widely misunderstood point: "open weights" says nothing about open data. When Meta, Mistral, or DeepSeek ship a model, you get a file of numbers and a license — not the trillions of tokens they trained on. That is why purists distinguish open-weight (weights downloadable, data secret) from genuinely open-source (weights, data, and code all released under an OSI-style license). The Open Source Initiative's 2024 definition of "open-source AI" requires enough information to recreate the system; by that bar, most "open" LLMs are open-weight, not open-source. Truly data-open models — OLMo, Pythia, the SmolLM family — exist but are the exception.

TERMS

Open-weight: the parameters are downloadable; you can run and fine-tune them.
Open-source AI: weights plus the data and code needed to reproduce them, under a recognized open license.
Closed / proprietary: reachable only as a hosted API; you rent inference and never possess the model.
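The four-artifact view can be made concrete with a small classifier. This is a minimal sketch: the `Release` dataclass, its field names, and the `classify` function are hypothetical illustrations rather than any standard schema, and the three example entries are generic stand-ins, not claims about specific models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    """Which artifacts a model release actually ships (hypothetical schema)."""
    name: str
    weights: bool         # trained parameters downloadable?
    training_data: bool   # training corpus released?
    training_code: bool   # training recipe released?
    open_license: bool    # recognized open license (e.g. Apache-2.0, MIT)?


def classify(r: Release) -> str:
    """Map a release onto the three terms defined in this section."""
    if not r.weights:
        return "closed / proprietary"
    if r.training_data and r.training_code and r.open_license:
        return "open-source AI"
    return "open-weight"


examples = [
    Release("hosted API model", False, False, False, False),
    Release("typical open-weight LLM", True, False, False, False),
    Release("fully open research model", True, True, True, True),
]

for r in examples:
    print(f"{r.name:28s} -> {classify(r)}")
```

The ordering of the checks mirrors the text: possession of the weights is the first fork, and only a release that also ships data and code under an open license clears the stricter bar.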

Where a model sits on this spectrum determines what you can do with it. With the weights in hand you can run offline, send no data to a third party, fine-tune freely, quantize to fit your hardware, and pin a version that will never change under you. Without them, you trade all of that for someone else's operations team, frontier quality, and a metered bill. Neither is "better" in the abstract; the rest of this chapter is about matching the trade to the job.

True or false: a model being released as "open weights" guarantees that its training data is also public. (Answer true or false.)

Open weights means only the trained parameters are downloadable. The training corpus is a separate artifact and is almost never released — Llama, Mistral, DeepSeek, and Qwen all ship weights without their data. The claim is false.

1.2

The closed frontier — GPT, Claude, Gemini

The three labs that most often define the capability frontier — OpenAI (GPT), Anthropic (Claude), and Google DeepMind (Gemini) — ship their flagship models as closed, API-only products. You never see the weights; you send tokens in and get tokens out, billed per token. This is a deliberate posture, motivated by a mix of commercial moat, safety (harder to strip alignment from a model you can't download), and the sheer operational cost of serving models that may exceed a trillion parameters.

What you get for closing the box is real: the strongest available models on hard reasoning, coding, and multimodal tasks; a managed service with no GPUs to babysit; instant access to the newest version; and features — long context, tool use, structured output, content moderation — maintained by a large team. What you give up is everything that requires possession: your prompts leave your premises, you cannot fine-tune the base weights (only the limited adapters the provider exposes), the model can change or be deprecated underneath you, and at high volume you pay forever on a per-token meter.

EQ OM1.1 — THE COST OF RENTING INFERENCE

$$ C_{\text{api}} \;=\; p \times n $$

$C_{\text{api}}$ is your monthly bill, $p$ is the blended price per token (input and output mixed), and $n$ is your monthly token volume. The defining feature of the closed model is that this cost is purely marginal: there is no fixed cost, but every token costs money forever, so the bill scales linearly and without bound as you grow. Per-token prices for a given level of capability have fallen steeply, by some estimates roughly an order of magnitude per year, but the structure stays linear.
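EQ OM1.1 treats $p$ as one blended price, while providers usually quote input and output tokens separately. The sketch below shows one way to blend them, assuming hypothetical prices of $2 and $8 per million tokens and a hypothetical 75/25 input/output mix; none of these figures come from a real price list, and the function names are illustrative.

```python
def blended_price(p_in: float, p_out: float, input_share: float) -> float:
    """Blended price per token: volume-weighted mean of input and output prices."""
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")
    return input_share * p_in + (1.0 - input_share) * p_out


def api_cost(p: float, n: float) -> float:
    """EQ OM1.1: monthly bill = blended price per token times monthly tokens."""
    return p * n


# Hypothetical figures, chosen only for illustration.
p_in = 2.0 / 1e6    # $ per input token
p_out = 8.0 / 1e6   # $ per output token
p = blended_price(p_in, p_out, input_share=0.75)

for n in (10e6, 100e6, 1000e6):
    print(f"{n / 1e6:7.0f}M tokens/month -> ${api_cost(p, n):,.2f}")
```

Because the bill is linear in $n$, each tenfold step in volume multiplies the cost by ten; nothing in the formula ever flattens the curve.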

An honest caveat on "open-ness" at the frontier. The closed labs do release a great deal of research — model cards, system cards, safety evaluations, and detailed technical reports — even when the weights stay private. And the open/closed line is blurring from both sides: OpenAI returned to open weights in 2025 with the gpt-oss family, and frontier-class open-weight models from DeepSeek and others have repeatedly narrowed the gap to the best closed systems. "Closed" describes the weights, not the secrecy of the whole enterprise.

1.3

The open ecosystem — Llama, Mistral, DeepSeek, Qwen

The open-weight world is no longer a poor cousin of the frontier; for many tasks it is competitive, and for privacy-sensitive or high-volume deployments it is the obvious default. Four families anchor it:

Family	Maker	Character	License posture
Llama	Meta	The release that catalyzed the open ecosystem; broad sizes, huge fine-tune community.	Custom community license (commercial OK; an acceptable-use policy applies; a >700M-MAU clause).
Mistral	Mistral AI	Efficient dense and mixture-of-experts models; several flagships under a true open license.	Apache-2.0 for the open releases; some models ship under research-only (non-commercial) licenses.
DeepSeek	DeepSeek-AI	Frontier-class MoE and reasoning models trained at a fraction of typical cost.	Permissive (MIT) weights on recent releases such as R1, among the most open of the frontier-adjacent releases; some earlier models used a custom model license.
Qwen	Alibaba	Very wide size ladder (sub-1B to hundreds of B), strong multilingual and coding variants.	Mostly Apache-2.0 on recent releases; a few older or largest models under custom terms.

Two structural facts make this ecosystem powerful. First, weights compound. Because anyone can download and adapt them, a single base model spawns thousands of community fine-tunes, quantizations, and merges; a base like Llama or Qwen becomes a platform, not a product. Second, open weights put a ceiling on API prices. When a free, near-frontier model exists, no API can charge frontier prices for commodity work; the open releases discipline the entire market's pricing, which is part of why per-token costs keep collapsing.

"Open" here still means open-weight, not open-data: Llama, Mistral, DeepSeek, and Qwen all ship parameters and a license without the training corpus. DeepSeek's published reports are unusually detailed about method (architecture, training procedure, cost), which is why its releases feel especially transparent — but the data itself remains undisclosed, exactly as §1.1 warns.

1.4

Licenses & their fine print

The license is the part teams skim and later regret. With closed APIs the terms are a service agreement; with open weights the license travels with the file and governs what you may build. Four questions cover most of it:

Commercial use. May you use it in a paid product at all? Apache-2.0 and MIT: unambiguously yes. "Research-only" / non-commercial licenses (CC-BY-NC, Mistral's research-license releases): no.
Redistribution. May you re-host or re-ship the weights, and under what notices? Permissive licenses allow it with attribution; community licenses add naming and labeling requirements.
Acceptable use. Nearly every modern release — open or closed — attaches an acceptable-use policy banning specific harms. This is a real, enforceable restriction even on "open" weights.
Training on outputs / scale clauses. Some licenses restrict using the model's outputs to train competing models; Meta's community license adds a well-known clause requiring a separate agreement with Meta if your products had more than 700 million monthly active users when the model version was released.

The headline distinction is between a true open-source license (OSI-approved: Apache-2.0, MIT — minimal conditions, no field-of-use limits) and a community / source-available license (Llama's terms, many "open" releases — commercially usable but with carve-outs, acceptable-use policies, or scale clauses). Both let a startup ship a product. They differ in the edge cases — the hyperscaler-grade scale clause, the can't-train-a-competitor clause — that only bite a minority of users but bite hard when they do.
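The four questions can be turned into a rough pre-flight check. The sketch below is a teaching simplification under stated assumptions: the `Posture` fields, the three posture entries, and `review_flags` are hypothetical and encode only the coarse distinctions drawn in this section. Acceptable-use policies are left out because nearly every release carries one, and the real LICENSE file always governs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Posture:
    """Coarse license posture (illustrative fields, not a legal schema)."""
    commercial_ok: bool
    redistribution_conditions: str       # notices required when re-shipping weights
    may_restrict_output_training: bool
    scale_clause: bool


# Hypothetical encodings of the postures described in the text.
POSTURES = {
    "permissive (Apache-2.0, MIT)": Posture(True, "attribution", False, False),
    "community (Llama-style)": Posture(True, "attribution, naming, labeling", True, True),
    "non-commercial (CC-BY-NC)": Posture(False, "attribution, non-commercial only", False, False),
}


def review_flags(posture: Posture, *, commercial: bool, redistributes: bool,
                 trains_other_models: bool, hyperscale: bool) -> list[str]:
    """List the points in a plan that need a closer read of the license."""
    flags = []
    if commercial and not posture.commercial_ok:
        flags.append("commercial use is not permitted")
    if redistributes:
        flags.append(f"redistribution requires: {posture.redistribution_conditions}")
    if trains_other_models and posture.may_restrict_output_training:
        flags.append("check limits on training other models with outputs")
    if hyperscale and posture.scale_clause:
        flags.append("scale clause may require a separate agreement")
    return flags


startup = dict(commercial=True, redistributes=False,
               trains_other_models=False, hyperscale=False)
giant = dict(commercial=True, redistributes=True,
             trains_other_models=True, hyperscale=True)

for name, posture in POSTURES.items():
    print(f"{name}")
    print(f"  startup plan: {review_flags(posture, **startup) or 'no flags'}")
    print(f"  giant plan:   {review_flags(posture, **giant) or 'no flags'}")
```

The startup plan raises no flags under either the permissive or the community posture, while the large-scale plan picks up the edge-case clauses only under the community posture.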

FINE PRINT

"Open" is not a synonym for "no rules." A community-licensed model can forbid certain uses, require you to display the model's name, or demand a special agreement at extreme scale — and a non-commercial license forbids the one thing a business needs most. Always read the actual file's LICENSE and acceptable-use policy before you build; the family name (Llama, Mistral, Qwen) does not by itself tell you the terms, because different models in the same family ship under different licenses.

INSTRUMENT OM1.1 — LICENSE COMPARISON MATRIX · PERMITTED USES · OPEN VS CLOSED

[Interactive matrix. Readouts: selected license, typical example, and number of uses permitted (of 6).]

Each column is a license posture; each row is a thing you might want to do. Green ✓ = generally permitted, red ✗ = forbidden, amber ~ = permitted with conditions (attribution, naming, a scale clause, or provider policy). Note that even the most permissive open license still carries an acceptable-use expectation, and the closed API forbids the two things that require the file itself: holding the weights and fine-tuning the base. This is a teaching simplification — the real LICENSE file always governs.

True or false: every model labeled "open" may be used freely in a commercial product with no restrictions. (Answer true or false.)

Some "open" releases are non-commercial (forbidding paid use entirely), and community licenses attach acceptable-use policies and scale clauses. Only OSI-approved licenses like Apache-2.0 and MIT come close to "no restrictions" — and even they expect lawful use. The blanket claim is false.

1.5

Choosing for a use case

The decision rarely turns on raw benchmark scores. It turns on four forces — privacy, cost, control, capability — and which one your specific job cares about most. A quick map of where each side wins:

If your job is most about…	Lean	Why
Data privacy / sovereignty	OPEN	Weights run inside your network; no prompt ever leaves. Decisive for regulated, medical, on-prem, or air-gapped settings.
Lowest cost at high volume	OPEN	Self-hosting trades per-token fees for a fixed cost plus a tiny marginal cost; past a breakeven volume it is cheaper (EQ OM1.2).
Maximum capability, fast	CLOSED	The frontier APIs still lead on the hardest reasoning and multimodal tasks, with zero ops burden.
Version stability / customization	OPEN	Pin a version forever; fine-tune the base; quantize to your hardware. The API can change or deprecate under you.
Low volume, fast iteration	CLOSED	Below breakeven the per-token bill is trivial and there is nothing to operate — ideal for prototypes and spiky traffic.

The cost axis has the cleanest math, so it is worth making explicit. Renting (EQ OM1.1) is all marginal: price per token times volume. Self-hosting flips the shape — a large fixed cost (the GPU you rent or buy, running whether you use it or not) plus a very small marginal cost per token (electricity, amortized hardware). The two cost curves cross at a breakeven volume:

EQ OM1.2 — SELF-HOST BREAKEVEN

$$ p\,n_{\!*} \;=\; F + m\,n_{\!*} \quad\Longrightarrow\quad n_{\!*} \;=\; \frac{F}{\,p - m\,} $$

$p$ is the API price per token, $F$ is the fixed monthly self-host cost, $m$ is the self-host marginal cost per token, and $n_{\!*}$ is the monthly volume at which the two are equal. Below $n_{\!*}$, renting is cheaper; above it, self-hosting wins and keeps winning, because the fixed cost is amortized over ever more tokens. Self-hosting is high fixed cost, low marginal cost — the mirror image of the pure-marginal API. The formula needs $p > m$, which is essentially always true once you are at the volume where this question matters.
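The $p > m$ condition is worth seeing numerically, since falling API prices push the breakeven outward. A minimal sketch, assuming a hypothetical $1,500 per month fixed cost and $0.20 per million tokens marginal cost; `breakeven_volume` is an illustrative helper name, and the price ladder is invented for the example.

```python
import math


def breakeven_volume(p: float, F: float, m: float) -> float:
    """EQ OM1.2: monthly tokens at which the API and self-host bills are equal.

    Returns math.inf when p <= m, because self-hosting then never catches up.
    """
    if p <= m:
        return math.inf
    return F / (p - m)


F = 1500.0        # fixed self-host cost, $/month (hypothetical)
m = 0.20 / 1e6    # self-host marginal cost, $/token (hypothetical)

for price_per_million in (3.00, 1.00, 0.50, 0.25, 0.20):
    n_star = breakeven_volume(price_per_million / 1e6, F, m)
    label = "never" if math.isinf(n_star) else f"{n_star / 1e6:,.0f}M tokens/month"
    print(f"API at ${price_per_million:.2f}/1M -> breakeven {label}")
```

As the API price approaches the self-host marginal cost, the denominator $p - m$ shrinks and the breakeven volume grows without bound, so a breakeven computed at this year's prices should be rechecked when prices move.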

True or false: self-hosting an open model has a high fixed cost but a low marginal cost per token, the opposite of the pure-marginal API bill. (Answer true or false.)

Self-hosting pays for a GPU node up front ($F$, incurred whether or not you send a single token) and then almost nothing per token ($m$, just electricity and amortization). The API has no fixed cost but charges $p$ on every token. So the claim is true — and it is exactly why breakeven volume $n_{\!*}=F/(p-m)$ exists.

PYTHON · RUNNABLE IN-BROWSER

# EQ OM1.2: API (pure marginal) vs self-host (fixed + marginal), find breakeven
import math
import numpy as np

p = 3.0 / 1e6     # API price: $3 per 1,000,000 tokens
F = 1500.0        # self-host fixed cost: one GPU node, $/month
m = 0.20 / 1e6    # self-host marginal: $0.20 per 1,000,000 tokens (power + amort.)

n_star = F / (p - m)                       # breakeven monthly token volume
print(f"breakeven volume n* = {n_star/1e6:,.1f} M tokens / month")

vols = np.array([100, 300, n_star/1e6, 800, 2000]) * 1e6   # tokens/month
print("\n volume(M)   API $/mo   self-host $/mo   cheaper")
for n in vols:
    api  = p * n          # API bill
    host = F + m * n      # self-host bill
    # compare with a tolerance: at n* the two floats rarely match bit-for-bit
    if math.isclose(api, host, rel_tol=1e-9):
        who = "TIE"
    else:
        who = "API" if api < host else "self-host"
    print(f"  {n/1e6:7.1f}  {api:9,.0f}    {host:11,.0f}     {who}")

print(f"\nbelow {n_star/1e6:,.0f}M tok/mo rent; above it, own the weights.")
# plot_xy is the page's in-browser plotting helper; skip it when run elsewhere
if "plot_xy" in globals():
    plot_xy([v/1e6 for v in vols], [p*v for v in vols])   # API cost curve (linear)


INSTRUMENT OM1.2 — COST / CONTROL / PRIVACY EXPLORER · EQ OM1.2 · LIVE BREAKEVEN

[Interactive explorer. Sliders (defaults): monthly volume 300M tokens; API price $3.00 per 1M tokens; self-host fixed cost $1,500 per month. Readouts: API cost per month, self-host cost per month, breakeven volume, cheaper option.]

Drag volume across the breakeven line and watch the verdict flip. Cost is only one axis: self-hosting also keeps prompts on-premises (privacy) and pins a version you control (control), which is why teams sometimes self-host below breakeven and accept a higher bill. The dashed line marks $n_{\!*}=F/(p-m)$; marginal self-host cost is held at $\$0.20$ per 1M tokens.
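The remark that teams sometimes self-host below breakeven can be priced directly: the gap between the two bills is what privacy and control cost each month. A minimal sketch using the instrument's default settings (300M tokens, $3.00 per 1M tokens, $1,500 fixed, $0.20 per 1M marginal); the helper name `monthly_costs` is illustrative.

```python
def monthly_costs(n: float, p: float, F: float, m: float) -> tuple[float, float]:
    """Return (API bill, self-host bill) in dollars for n tokens per month."""
    return p * n, F + m * n


p = 3.0 / 1e6     # API price per token
F = 1500.0        # self-host fixed cost, $/month
m = 0.20 / 1e6    # self-host marginal cost per token
n = 300e6         # monthly volume, below the breakeven for these settings

api, host = monthly_costs(n, p, F, m)
premium = host - api

print(f"API bill:        ${api:,.0f}/month")
print(f"self-host bill:  ${host:,.0f}/month")
print(f"premium paid for privacy and control: ${premium:,.0f}/month")
```

With these settings the premium works out to about $660 a month, which a team can then weigh against whatever a data-residency requirement or a pinned model version is worth to it.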

PYTHON · RUNNABLE IN-BROWSER

# Score candidates against a weighted use-case rubric (0-10 per criterion)
import numpy as np

criteria = ["privacy", "cost@scale", "capability", "control", "ops_ease"]
weights  = np.array([0.30, 0.25, 0.20, 0.15, 0.10])   # this team prizes privacy
assert abs(weights.sum() - 1.0) < 1e-9

candidates = {
    "Closed API (GPT/Claude/Gemini)": [2, 4, 10, 3,  9],
    "Open self-host (Llama/Qwen)":    [10, 9,  7, 9,  4],
    "Open via managed endpoint":      [5, 6,  7, 6,  8],
}

print(f"weights: {dict(zip(criteria, weights.tolist()))}\n")   # plain floats print cleanly
ranked = []
for name, s in candidates.items():
    score = float(np.dot(weights, np.array(s, float)))
    ranked.append((score, name))
    print(f"  {name:34s}  weighted score = {score:5.2f} / 10")

ranked.sort(reverse=True)
print(f"\nwinner for a privacy-first team: {ranked[0][1]} ({ranked[0][0]:.2f})")
print("flip the weights toward 'capability' and the closed API wins instead.")

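The rubric's closing claim, that shifting weight toward capability flips the winner, can be checked with a sweep. This sketch reuses the rubric's scores and baseline weights; the proportional rescaling of the other four weights is an assumption made here so the weights keep summing to 1, and `reweight` and `winner` are illustrative helper names.

```python
import numpy as np

criteria = ["privacy", "cost@scale", "capability", "control", "ops_ease"]
base = np.array([0.30, 0.25, 0.20, 0.15, 0.10])   # baseline rubric weights
cap = criteria.index("capability")

scores = {
    "Closed API": np.array([2, 4, 10, 3, 9], float),
    "Open self-host": np.array([10, 9, 7, 9, 4], float),
    "Open via managed endpoint": np.array([5, 6, 7, 6, 8], float),
}


def reweight(w_cap: float) -> np.ndarray:
    """Set the capability weight to w_cap and rescale the rest to sum to 1."""
    others = base.copy()
    others[cap] = 0.0
    w = others / others.sum() * (1.0 - w_cap)
    w[cap] = w_cap
    return w


def winner(w_cap: float) -> str:
    """Name of the top-scoring candidate under the reweighted rubric."""
    w = reweight(w_cap)
    return max(scores, key=lambda name: float(w @ scores[name]))


for w_cap in (0.2, 0.4, 0.6, 0.7, 0.8):
    print(f"capability weight {w_cap:.1f} -> {winner(w_cap)}")
```

Under this rescaling the verdict flips somewhere between a capability weight of 0.6 and 0.7, so the closed API wins only for a team that weights capability above everything else combined.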

INSTRUMENT OM1.3 — OPEN-VS-CLOSED DECISION TREE · FOUR QUESTIONS · ONE RECOMMENDATION

[Interactive decision tree. Switches: must data stay on-prem? High, steady volume? Need the frontier? Have MLOps capacity? Readouts: recommendation and deciding factor.]

Flip the four switches and the path lights up. Privacy is the hard override — if data cannot leave, you are self-hosting open weights regardless of everything else. Otherwise high steady volume pushes toward open (cross EQ OM1.2's breakeven), a true frontier requirement pulls toward closed, and thin MLOps capacity nudges you to a managed open endpoint as the middle path.
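The four switches can be written as a short function. This is a sketch under stated assumptions: the text fixes privacy as the hard override, while the order in which the remaining three questions are checked, the default for low volume, and the function name `recommend` are choices made here for illustration.

```python
from itertools import product


def recommend(on_prem_required: bool, high_steady_volume: bool,
              needs_frontier: bool, has_mlops: bool) -> tuple[str, str]:
    """Return (recommendation, deciding factor) for the four switches."""
    if on_prem_required:
        # Hard override: data cannot leave, so nothing else matters.
        return "self-host open weights", "privacy"
    if needs_frontier:
        return "closed API", "frontier capability"
    if high_steady_volume:
        if has_mlops:
            return "self-host open weights", "volume above breakeven"
        return "managed open-weight endpoint", "thin MLOps capacity"
    # Low volume, no frontier need: nothing to operate, trivial bill.
    return "closed API", "low volume"


header = ("on-prem", "volume", "frontier", "mlops")
print("  ".join(f"{h:8s}" for h in header), " recommendation (deciding factor)")
for switches in product([False, True], repeat=4):
    rec, why = recommend(*switches)
    flags = "  ".join(f"{str(s):8s}" for s in switches)
    print(f"{flags}  {rec} ({why})")
```

Enumerating all sixteen switch combinations makes the override visible: every row with the on-prem switch set lands on self-hosted open weights, whatever the other three say.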

You have decided to hold the weights — now you have to run them. Chapter 02 turns the choice into practice: the runtimes (llama.cpp, vLLM, Ollama), how quantization shrinks a model to fit the hardware you actually have, and the throughput-versus-latency knobs that decide what serving really costs.

1.R

References

Touvron, H., Martin, L., Stone, K. et al. (2023). Llama 2: Open Foundation and Fine-Tuned Chat Models. arXiv:2307.09288 — the release (and custom community license) that catalyzed the open-weight ecosystem.
Jiang, A. Q., Sablayrolles, A., Mensch, A. et al. (2023). Mistral 7B. arXiv:2310.06825 — an Apache-2.0 dense model that set the efficiency bar for small open models.
DeepSeek-AI (2024). DeepSeek-V3 Technical Report. arXiv:2412.19437 — a frontier-class MoE trained at a fraction of typical cost, with unusually open methodology.
Yang, A., Yang, B., Hui, B. et al. (2024). Qwen2 Technical Report. arXiv:2407.10671 — the Qwen family's wide size ladder and (mostly) Apache-2.0 licensing.
Groeneveld, D., Beltagy, I., Walsh, P. et al. (2024). OLMo: Accelerating the Science of Language Models. arXiv:2402.00838 — a genuinely open-source model: weights, data, and training code all released.
Open Source Initiative (2024). The Open Source AI Definition 1.0. Official text — the bar separating open-source AI from merely open-weight releases.
Meta (2024). Llama 3 Community License Agreement. Primary source — the commercial terms, acceptable-use policy, and 700M-MAU scale clause discussed in §1.4.