Talk:Llama (language model)

Open source

I don't get why there is the GPLv3 reference and the claim of being open source, since it clearly is *NOT* open source, as can be seen in the license https://github.com/facebookresearch/llama/blob/main/LICENSE Bertboerland (talk) 20:40, 30 November 2023 (UTC)[reply]

I've removed this article from Category:Open-source artificial intelligence, as Llama is source-available but its license has restrictions that prevent it from being open-source, per sources such as Ars Technica. — Newslinger talk 21:07, 8 December 2024 (UTC)[reply]

restored 4chan references

@DIYeditor: per your indication at metawiki, I have restored this article to a version that includes the 4chan links. Any subsequent edits may need to be redone. — billinghurst sDrewth 20:49, 6 December 2023 (UTC)[reply]

IP range block applied for this article, for the person who keeps removing the references without communicating about their changes. — billinghurst sDrewth 11:33, 27 December 2023 (UTC)[reply]

Move page to "Llama (language model)"

All versions since Llama 2 no longer use the capitalization we currently see in the article title. Adding "(language model)" is in line with similar articles such as Claude (language model) and Gemini (language model).

Should we move to reflect this change? Nuclearelement (talk) 11:45, 13 May 2024 (UTC)[reply]

ranging between 1B and 405B

Explain this. 1 banana or 1 billion? 1 billion of what, exactly? Parameters? Kr 17387349L8764 (talk) 09:05, 1 October 2024 (UTC)[reply]

context length is not changed during fine-tuning

"Unlike GPT-4 which increased context length during fine-tuning, Llama 2 and Code Llama - Chat have the same context length of 4K tokens."

GPT-4 did not increase context length during fine-tuning. As far as I know, no LLMs change context length like that. vvarkey (talk) 04:49, 16 October 2024 (UTC)[reply]
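
For reference, the context window of a Llama-style model is a fixed configuration value set before training, not something that grows during fine-tuning. A minimal sketch, assuming the Hugging Face transformers library (LlamaConfig and max_position_embeddings are names from that library, not from the article):

```python
# A minimal sketch, assuming the Hugging Face transformers library.
# The context window is a fixed config field (max_position_embeddings);
# fine-tuning a chat variant does not change it.
from transformers import LlamaConfig

# Llama 2's published configs use 4096 tokens; the value is set explicitly
# here for illustration rather than read from a downloaded checkpoint.
config = LlamaConfig(max_position_embeddings=4096)
print(config.max_position_embeddings)  # 4096
```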

Training data cutoff date

This is original research, so it can't be used in the article, but the training data cutoff date is September 7, 2023. Llama can answer questions about events on and before September 7, while events on September 8 are unknown or get entirely hallucinated answers. 91.64.59.91 (talk) 21:00, 21 February 2025 (UTC)[reply]

Concerns

There are quite a lot of concerns out there, including from at least two regulators. Should these have a section of their own or go under "Reception"? 209.93.202.123 (talk) 07:59, 6 June 2025 (UTC)[reply]

If there are notable concerns that are not already covered, I guess they would go under the "Reception" section (or potentially in the section corresponding to the model, e.g. "Llama 4"). Alenoach (talk) 01:48, 7 June 2025 (UTC)[reply]

Awkward is the phrasing

This sentence in the section "Background" seems awkward:

"An empirical investigation of the Llama series was the scaling laws."

Would this be better?

"An empirical investigation was made of Llama's scaling laws."

Or maybe just this...

"The model's scaling laws were investigated."

Hyperlinks omitted here. Very new to LLMs; reluctant to be bold with this. azwaldo (talk) 14:09, 3 December 2025 (UTC)[reply]

There was indeed an issue, thanks for reporting this. I changed the paragraph. Alenoach (talk) 03:23, 4 December 2025 (UTC)[reply]

Notes on readability

This article has been a difficult read, much of it seemingly focused on highly technical aspects of Llama's history and development. Wikipedia encourages that "...every effort should be made to also render articles accessible and pleasant to read for the broadest audience" (from Help:How to write a readable article).

Here are several questions or suggestions stemming from the lead section, alone:

  1. "...from 1 billion to 2 trillion parameters" (What is a parameter in this context, and how is it significant?)
  2. "...released instruction fine-tuned versions" (This seems awkward. Would "instruction" work better as "instructional", or could it be that "instruction" is redundant with "fine-tuned"?)
  3. "Model weights for the first version of Llama..." (What is a model weight in this context, and why is it significant?)
  4. The significance of this line might be lost on many readers: "Subsequent versions of Llama were made accessible outside academia..." (If I'm not mistaken, this implies that anyone can download, install, and operate their own instance of the LLM? I don't think that is possible with ChatGPT or Gemini; see the sketch after this comment.)
  5. The notability of Llama could be highlighted with several items from this article's "Applications" section. (An article at Medium claims: Meta’s Llama AI Models Gain Traction Among Major Corporations, with mention of a number of additional use cases.)
azwaldo (talk) 22:35, 5 December 2025 (UTC)[reply]
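
Regarding point 4: a minimal sketch of what running a local instance looks like, assuming the Hugging Face transformers library and access to one of the license-gated checkpoints on the Hugging Face Hub (the model identifier below is illustrative, not taken from the article):

```python
# A minimal sketch, assuming the Hugging Face transformers library and that
# access to the (license-gated) Llama weights has been granted on the Hub.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-1B"  # illustrative identifier
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The llama is a domesticated", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
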
A lot of articles are too complex for readers who discover the topic, and the second paragraph is indeed quite technical.
1. There should probably be an article to explain what a parameter is in machine learning. The parameters are basically the values that are modified during training (see the short sketch after this reply). 1 billion is very little for an LLM, 2 trillion is quite a lot (it takes nearly a hundred GPUs to run such a model).
2. The term "instruction fine-tuned" is correct. If the term is too complicated, maybe it could be changed to just "fine-tuned".
3. A model weight is the same thing as a parameter, so we could reuse that term instead. Using two different terms for the same thing can be confusing for unfamiliar readers.
4. Yes, that's what it means.
5. Depends on whether there are examples of applications that are particularly notable. Alenoach (talk) 10:15, 6 December 2025 (UTC)[reply]
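
To make points 1 and 3 concrete, a minimal sketch, assuming PyTorch; the single Linear layer below is purely illustrative and is not Llama's actual architecture:

```python
# A minimal sketch, assuming PyTorch. "Parameters" (a.k.a. "model weights")
# are the numeric values inside the model that training adjusts.
import torch

layer = torch.nn.Linear(in_features=4096, out_features=4096)  # one toy layer
n_params = sum(p.numel() for p in layer.parameters())
print(n_params)  # 16,781,312 values; a real LLM stacks many such layers to
                 # reach billions ("1B") or, per the lead, up to ~2 trillion.
```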