Computer Engineering

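I am an engineer who has worked in computing for many years. I mainly develop financial software of various kinds, and I have also built software for other fields.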

DeepSeek Suspected of Copying ChatGPT: A Comparative Analysis of Technology and Data Sources

(2025-02-04 20:27:56)

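Since DeepSeek appeared, there has been growing discussion of whether it copied ChatGPT's technology during development. Based on side-by-side experiments, this article examines whether DeepSeek borrowed ChatGPT's technology and shows where the two implementations appear similar and where they differ.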

I. Approach to Verifying DeepSeek

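In general, the most direct way to check whether two systems are the same is to compare their outputs for identical inputs. If the two systems give exactly the same answer to the same question, one can infer that they are highly similar in algorithm or architecture, and possibly identical. This study relies on two checks: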

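  1. Information consistency check
    First, retrieve the same information from the two different databases (here, the two models) and compare what each returns. If both return exactly the same results, their underlying structure is very likely the same.
  2. Using the special [MASK] variable
    Use the special [MASK] variable to elicit candidate words and test whether the two algorithms are equivalent. Specifically, [MASK] is a placeholder marking a position where a word must be filled in. The model infers the most suitable word from the other words in the sentence (the context) and substitutes it for [MASK]. Comparing how DeepSeek and ChatGPT fill the [MASK] positions for the same input tests whether their inference mechanisms agree.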
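The [MASK] comparison described above can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used for the tests: the helper names `extract_fills` and `same_fills` are hypothetical, and the sketch assumes each model returns the full sentence with every [MASK] replaced.

```python
import re
from typing import List, Optional

MASK = "[MASK]"


def extract_fills(template: str, output: str) -> Optional[List[str]]:
    """Return the text a model put into each [MASK] slot.

    Returns None when the output does not follow the template,
    for example when the model answers with a single word.
    """
    # Split the template at each mask and match the fixed parts literally.
    fixed_parts = [re.escape(part) for part in template.split(MASK)]
    pattern = "(.+?)".join(fixed_parts)
    match = re.fullmatch(pattern, output.strip())
    return list(match.groups()) if match else None


def same_fills(template: str, output_a: str, output_b: str) -> bool:
    """True when both outputs follow the template and fill it identically."""
    fills_a = extract_fills(template, output_a)
    fills_b = extract_fills(template, output_b)
    return fills_a is not None and fills_a == fills_b


# Hypothetical short template, used only to show the call.
template = "The report is [MASK] to reveal [MASK] failings."
answer = "The report is expected to reveal serious failings."
print(extract_fills(template, answer))
print(same_fills(template, answer, answer))
```

Comparing the extracted fills rather than the raw strings keeps the check focused on the masked positions, so differences in surrounding whitespace do not count as disagreement.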

A random sample of 67 test cases was checked side by side, and DeepSeek and ChatGPT proved highly similar. Specific examples and their results follow.
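One way to turn a side-by-side check like this into a number is an exact-match agreement rate. The sketch below is an assumption about how such a tally could be done, not the script used for the 67 samples; `normalize` and `agreement_rate` are hypothetical helpers, and the three pairs are fills quoted in this article rather than the full sample set.

```python
from typing import List, Tuple


def normalize(text: str) -> str:
    """Lowercase, drop surrounding quotes, and collapse whitespace."""
    return " ".join(text.strip().strip("\"'").lower().split())


def agreement_rate(pairs: List[Tuple[str, str]]) -> float:
    """Fraction of (output_a, output_b) pairs that match after normalization."""
    if not pairs:
        raise ValueError("at least one pair is required")
    matches = sum(normalize(a) == normalize(b) for a, b in pairs)
    return matches / len(pairs)


# (ChatGPT fill, DeepSeek fill) pairs taken from the examples in this article.
pairs = [
    ("journey", "journey"),
    ("cheap", "cheap"),
    ("okay deal", "okay deal"),
]
print(agreement_rate(pairs))  # 1.0 for these three pairs
```

The normalization step is a design choice: it treats differences in case, quoting, and spacing as formatting rather than as a different prediction.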

Example 1

Input sentence:
Up to 30 [MASK] and babies died at Furness General Hospital because of failings by staff and management, a damning report is [MASK] to reveal.

ChatGPT output:
Up to 30 mothers and babies died at Furness General Hospital because of failings by staff and management, a damning report is expected to reveal.

DeepSeek output:
Up to 30 mothers and babies died at Furness General Hospital because of failings by staff and management, a damning report is expected to reveal.

In this example, DeepSeek and ChatGPT filled the [MASK] positions identically, with "mothers" and "expected".

Example 2

Input sentence:
Taking in the sights of [MASK] is nothing short of [MASK], with its famous souk which houses over 60,000 stalls full of colorful handmade wares.

ChatGPT output:
Taking in the sights of Marrakech is nothing short of breathtaking, with its famous souk which houses over 60,000 stalls full of colorful handmade wares.

DeepSeek output:
Taking in the sights of Marrakech is nothing short of breathtaking, with its famous souk which houses over 60,000 stalls full of colorful handmade wares.

In this example too, DeepSeek and ChatGPT predicted exactly the same words, "Marrakech" and "breathtaking", which confirms that their inference and predictions agree.

Example 3

Input sentence:
Ultimately, the goal of life is a [MASK] that each individual must define for themselves based on their own beliefs, values, and experiences.

ChatGPT output:
Ultimately, the goal of life is a journey that each individual must define for themselves based on their own beliefs, values, and experiences.

DeepSeek output:
Ultimately, the goal of life is a journey that each individual must define for themselves based on their own beliefs, values, and experiences.

In this example, the fill word "journey" is exactly the same in the DeepSeek and ChatGPT outputs.

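Examples 4 and 5 compare long passages made up of several parts. A difference in any small part can change the information conveyed, and a different algorithm can also change the result. If the information is exactly the same and the algorithm is the same as well, however, the results should be identical.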
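For long passages, a word-level diff is one way to see whether two outputs differ in any small part. The following is a minimal sketch using Python's standard `difflib` module; the function name `word_differences` is hypothetical, and the `variant` sentence in the usage lines is invented only to show what a mismatch looks like.

```python
import difflib
from typing import List, Tuple


def word_differences(text_a: str, text_b: str) -> List[Tuple[str, str, str]]:
    """List the word-level spans where two texts differ.

    Each entry is (operation, words_from_a, words_from_b), where the
    operation is 'replace', 'delete', or 'insert'. An empty list means
    the two texts contain exactly the same sequence of words.
    """
    words_a = text_a.split()
    words_b = text_b.split()
    matcher = difflib.SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    return [
        (tag, " ".join(words_a[i1:i2]), " ".join(words_b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


original = "For less than $1 per pack, it's an okay deal."
variant = "For less than $1 per pack, it's a good deal."  # invented for illustration

print(word_differences(original, original))  # identical outputs give an empty list
print(word_differences(original, variant))
```

Splitting on whitespace means the comparison ignores spacing differences but still flags any changed, missing, or extra word.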

Example 4

Input sentence:
I purchased the variety pack of Martinsons and the Brown Gold they are both from the same vendor. I did not like the taste of the Martinson product at all. I threw it away, not the product I expected at all. The Brown Gold was more palatable, however certain flavors with this brand also feel short in particular the Costa Rican named product was extremely too bitter. I would agree with most that this does give you a lower cost per serving, but when the coffee is inferior and the taste match is also, I think the best descriptor is [MASK].

ChatGPT output:
cheap

DeepSeek output:
cheap

In this example both models filled the [MASK] position with "cheap", which is further evidence that their algorithms and inference mechanisms are the same.

Example 5

Input sentence:
Not bad. "These are small and very salty. The taste is good, but very strong, so it's a good thing the package contains a small amount. It only takes a few little crisps to cure my salty/crunchy craving. I can snack on one package for an entire day. Of course, these would not be a good snack if you're very hungry, because there isn't enough there to fill you up. For less than $1 per pack, it's an [MASK].

ChatGPT output:
"For less than $1 per pack, it's an okay deal."

DeepSeek output:
"For less than $1 per pack, it's an okay deal."

In this example, DeepSeek and ChatGPT filled the [MASK] position identically: both completed the final sentence as "For less than $1 per pack, it's an okay deal."

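From the comparison experiments and technical analysis above, the conclusion is that in the [MASK] tests DeepSeek and ChatGPT produced exactly the same output on every sample, indicating that they use the same inference algorithm, technical framework, and data sources. Given this high degree of similarity between DeepSeek and ChatGPT, DeepSeek's technology may involve copying.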

Comments
ahhhh: They use the same inference algorithm, technical framework, and data sources? Your understanding of AI is wrong.