What Is ChatGPT Doing? Chapter 2: Where Do the Probabilities Come From?

OK, so ChatGPT always picks its next word based on probabilities. But where do those probabilities come from? Let’s start with a simpler problem. Let’s consider generating English text one letter (rather than word) at a time. How can we work out what the probability for each letter should be?

A very minimal thing we could do is just take a sample of English text, and calculate how often different letters occur in it. So, for example, this counts letters in the Wikipedia article on “cats”. And doing the same thing for “dogs” gives results that are similar, but not the same (“o” is no doubt more common in the “dogs” article because, after all, it occurs in the word “dog” itself). Still, if we take a large enough sample of English text we can expect to eventually get at least fairly consistent results:
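As a concrete illustration of this kind of counting, here is a minimal sketch in Python. It is not from the original article; the short sample string and the names `sample_text` and `letter_probs` are placeholders for illustration only:

```python
from collections import Counter

# A stand-in for "a sample of English text"; in practice this would be
# something much longer, e.g. the text of a whole Wikipedia article.
sample_text = "the quick brown fox jumps over the lazy dog and the cat sat on the mat"

# Count how often each letter occurs, ignoring case and non-letters.
letters = [c for c in sample_text.lower() if c.isalpha()]
counts = Counter(letters)

# Turn the raw counts into probabilities that sum to 1.
total = sum(counts.values())
letter_probs = {letter: n / total for letter, n in counts.items()}

for letter, p in sorted(letter_probs.items(), key=lambda kv: -kv[1]):
    print(f"{letter}: {p:.3f}")
```

Run on a large enough corpus, a table like `letter_probs` is exactly what the independent-letter sampling in the next sketch draws from.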

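To see what text built from such probabilities alone looks like, one can draw characters independently at random. A minimal sketch follows; the truncated letter-frequency table is a rough stand-in for what the counting above would produce, and `space_prob` is an assumed value (roughly the frequency of spaces in ordinary English text):

```python
import random

# Rough English letter frequencies, truncated to a few letters for brevity;
# a real run would use the full `letter_probs` table estimated from a corpus.
letter_probs = {"e": 0.13, "t": 0.09, "a": 0.08, "o": 0.075, "i": 0.07,
                "n": 0.067, "s": 0.063, "h": 0.06, "r": 0.06, "d": 0.043,
                "l": 0.04, "u": 0.028}

def sample_letters(probs, n=200, space_prob=0.18, seed=0):
    """Draw n characters independently: with probability space_prob emit a
    space (so the output breaks into "words"), otherwise draw a letter in
    proportion to its probability."""
    rng = random.Random(seed)
    letters, weights = zip(*probs.items())
    out = []
    for _ in range(n):
        if rng.random() < space_prob:
            out.append(" ")
        else:
            out.append(rng.choices(letters, weights=weights, k=1)[0])
    return "".join(out)

print(sample_letters(letter_probs))
```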
Here’s a sample of what we get if we just generate a sequence of letters with these probabilities: We can break this into “words” by adding in spaces as if they were letters with a certain probability: We can do a slightly better job of making “words” by forcing the distribution of “word lengths” to agree with what it is in English: We didn’t happen to get any “actual words” here, but the results are looking slightly better.

To go further, though, we need to do more than just pick each letter separately at random. And, for example, we know that if we have a “q”, the next letter basically has to be “u”. Here’s a plot of the probabilities for letters on their own: And here’s a plot that shows the probabilities of pairs of letters (“2-grams”) in typical English text. The possible first letters are shown across the page, the second letters down the page: And we see here, for example, that the “q” column is blank (zero probability) except on the “u” row.

OK, so now instead of generating our “words” a single letter at a time, let’s generate them looking at two letters at a time, using these “2-gram” probabilities. Here’s a sample of the result—which happens to include a few “actual words”: With sufficiently much English text we can get pretty good estimates not just for probabilities of single letters or pairs of letters (2-grams), but also for longer runs of letters. And if we generate “random words” with progressively longer n-gram probabilities, we see that they get progressively “more realistic”:

But let’s now assume—more or less as ChatGPT does—that we’re dealing with whole words, not letters. There are about 40,000 reasonably commonly used words in English. And by looking at a large corpus of English text (say a few million books, with altogether a few hundred billion words), we can get an estimate of how common each word is. And using this we can start generating “sentences”, in which each word is independently picked at random, with the same probability that it appears in the corpus. Here’s a sample of what we get: Not surprisingly, this is nonsense. So how can we do better? Just like with letters, we can start taking into account not just probabilities for single words but probabilities for pairs or longer n-grams of words. Doing this for pairs, here are 5 examples of what we get, in all cases starting from the word “cat”: It’s getting slightly more “sensible looking”. And we might imagine that if we were able to use sufficiently long n-grams we’d basically “get a ChatGPT”—in the sense that we’d get something that would generate essay-length sequences of words with the “correct overall essay probabilities”.
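A minimal sketch of the 2-gram idea at the word level (the same scheme works for letters). The tiny corpus and the function names are assumptions for illustration only; real estimates would come from a corpus of billions of words, and this classical n-gram sampling is of course not how ChatGPT itself is implemented:

```python
import random
from collections import Counter, defaultdict

# A toy stand-in corpus; in practice this would be billions of words.
corpus = (
    "the cat sat on the mat and the cat saw the dog "
    "the dog ran after the cat and the cat ran up a tree"
).split()

# Count which word follows which (word 2-grams).
followers = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    followers[w1][w2] += 1

def next_word(word, rng):
    """Sample the next word in proportion to how often it followed `word`;
    fall back to a uniformly random corpus word if `word` was never seen."""
    counts = followers.get(word)
    if not counts:
        return rng.choice(corpus)
    words = list(counts)
    weights = list(counts.values())
    return rng.choices(words, weights=weights, k=1)[0]

def generate(start="cat", length=10, seed=1):
    rng = random.Random(seed)
    words = [start]
    for _ in range(length - 1):
        words.append(next_word(words[-1], rng))
    return " ".join(words)

# Five "sentences", all starting from the word "cat".
for i in range(5):
    print(generate(start="cat", seed=i))
```

Extending the same table to longer and longer contexts is exactly where the data problem described next appears.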
But here’s the problem: there just isn’t even close to enough English text that’s ever been written to be able to deduce those probabilities. In a crawl of the web there might be a few hundred billion words; in books that have been digitized there might be another hundred billion words. But with 40,000 common words, even the number of possible 2-grams is already 1.6 billion—and the number of possible 3-grams is 60 trillion. So there’s no way we can estimate the probabilities even for all of these from text that’s out there. And by the time we get to “essay fragments” of 20 words, the number of possibilities is larger than the number of particles in the universe, so in a sense they could never all be written down.

So what can we do? The big idea is to make a model that lets us estimate the probabilities with which sequences should occur—even though we’ve never explicitly seen those sequences in the corpus of text we’ve looked at. And at the core of ChatGPT is precisely a so-called “large language model” (LLM) that’s been built to do a good job of estimating those probabilities.
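The counting behind those 2-gram and 3-gram figures is easy to check; a quick sketch (the 10^80 figure for particles in the observable universe is the usual rough estimate, and the text’s “60 trillion” is this same quantity, rounded):

```python
vocab = 40_000  # roughly the number of commonly used English words

print(f"possible 2-grams: {vocab**2:.2e}")              # 1.6e9, i.e. 1.6 billion
print(f"possible 3-grams: {vocab**3:.2e}")              # 6.4e13, roughly 60 trillion
print(f"possible 20-word fragments: {vocab**20:.2e}")   # about 1.1e92

# Compare with a rough estimate of the number of particles
# in the observable universe, about 1e80.
print(vocab**20 > 10**80)  # True
```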
