Generative Pre-trained Transformer 3 (GPT-3) is a large language model released by OpenAI in 2020. Like its predecessor, GPT-2, it is a decoder-only transformer-based deep neural network, which supersedes recurrence- and convolution-based architectures with a technique known as "attention".
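"Decoder-only" means each token may attend only to tokens at earlier positions, enforced with a causal mask. A minimal NumPy sketch of this masked scaled dot-product attention (toy dimensions, not GPT-3's actual sizes):

```python
import numpy as np

def causal_self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention with a causal mask, the core
    operation of decoder-only models (illustrative, single head)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                 # queries, keys, values
    scores = Q @ K.T / np.sqrt(Q.shape[-1])          # pairwise attention scores
    # Causal mask: position i must not attend to later positions j > i.
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # weighted sum of values

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8                              # toy sizes
X = rng.normal(size=(seq_len, d_model))              # one vector per token
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out = causal_self_attention(X, Wq, Wk, Wv)           # shape (4, 8)
```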
GPT-4 is a large multimodal model that processes images and text as input and generates text as output. It uses an architecture based on the transformer, a model consisting of stacked decoder blocks that combine feed-forward neural networks with the attention mechanism.
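OpenAI has not published GPT-4's exact block layout, but a standard pre-norm decoder block pairs the attention step with a position-wise feed-forward network, each wrapped in a residual connection and layer normalization. A minimal sketch under those common assumptions, reusing causal_self_attention from the sketch above:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each token vector to zero mean and unit variance."""
    return (x - x.mean(axis=-1, keepdims=True)) / np.sqrt(
        x.var(axis=-1, keepdims=True) + eps)

def feed_forward(x, W1, b1, W2, b2):
    """Position-wise feed-forward network, applied to each token independently."""
    return np.maximum(0.0, x @ W1 + b1) @ W2 + b2    # ReLU for simplicity

def decoder_block(x, attn_fn, ff_params):
    """One pre-norm decoder block: attention and feed-forward sublayers,
    each with a residual (skip) connection. The layout is an assumption."""
    x = x + attn_fn(layer_norm(x))                   # self-attention sublayer
    x = x + feed_forward(layer_norm(x), *ff_params)  # feed-forward sublayer
    return x

rng = np.random.default_rng(1)
seq_len, d_model, d_ff = 4, 8, 32
x = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
ff = (rng.normal(size=(d_model, d_ff)), np.zeros(d_ff),
      rng.normal(size=(d_ff, d_model)), np.zeros(d_model))
attn = lambda h: causal_self_attention(h, Wq, Wk, Wv)  # from the sketch above
y = decoder_block(x, attn, ff)      # stacking many such blocks forms the model
```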
A transformer is a deep learning architecture that was developed by researchers at Google and is based on the multi-head attention mechanism, which was proposed in the 2017 paper "Attention Is All You Need".[1] Text is converted to numerical representations called tokens, and each token is converted into a vector via lookup from a word embedding table.[1] At each layer, each token is then contextualized within the scope of the context window with other (unmasked) tokens via a parallel multi-head attention mechanism, allowing the signal for key tokens to be amplified and less important tokens to be diminished.
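Concretely, the embedding lookup is plain indexing into a learned table, and each attention head computes a per-token softmax whose rows sum to 1, amplifying some positions and diminishing others. A toy single-head sketch (vocabulary, sizes, and weights are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
vocab_size, d_model = 100, 8

# Word-embedding table: one learned vector per token id.
embedding_table = rng.normal(size=(vocab_size, d_model))

token_ids = np.array([17, 4, 92, 4])   # tokenized text (toy ids)
X = embedding_table[token_ids]         # lookup: a (4, 8) matrix of token vectors

# One attention head; a multi-head layer runs several such maps in
# parallel and concatenates the results.
Wq, Wk = (rng.normal(size=(d_model, d_model)) for _ in range(2))
scores = (X @ Wq) @ (X @ Wk).T / np.sqrt(d_model)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

# Each row sums to 1: large entries amplify key tokens, small ones diminish them.
print(weights.round(2))
```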
Transformers have the advantage of having no recurrent units, therefore requiring less training time than earlier recurrent neural architectures (RNNs) such as long short-term memory (LSTM).[2] Later variations have been widely adopted for training large language models (LLMs) on large (language) datasets, such as the Wikipedia corpus and Common Crawl.[3]
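The training-time advantage comes from parallelism: a recurrent network must consume tokens one step at a time, whereas attention contextualizes every position in one batched matrix product. A schematic comparison using a toy recurrence (not an actual LSTM):

```python
import numpy as np

rng = np.random.default_rng(3)
seq_len, d = 6, 8
X = rng.normal(size=(seq_len, d))
Wh, Wx = rng.normal(size=(d, d)), rng.normal(size=(d, d))

# Recurrent processing: each hidden state depends on the previous one,
# forcing seq_len sequential steps (no parallelism across time).
h = np.zeros(d)
for x_t in X:
    h = np.tanh(h @ Wh + x_t @ Wx)

# Attention-style processing: all pairwise interactions in one matrix
# product, which parallelizes across the whole sequence on modern hardware.
scores = X @ X.T / np.sqrt(d)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
out = weights @ X
```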
Transformers were first developed as an improvement over previous architectures for machine translation,[4][5] but have found many applications since. They are used in large-scale natural language processing, computer vision (vision transformers), reinforcement learning,[6][7] audio,[8] multimodal learning, robotics,[9] and even playing chess.[10] The architecture has also led to the development of pre-trained systems, such as generative pre-trained transformers (GPTs)[11] and BERT[12] (bidirectional encoder representations from transformers).