DeepSeek's New Model Revealed: Why It Shook the Global AI Community
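DeepSeek's earlier release, DeepSeek-V3, made huge waves across the global AI field: at an extremely low training cost, it delivered performance on par with top models such as GPT-4o and Claude Sonnet, stunning the industry.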
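Unlike last time, the newly released DeepSeek-R1 is not only cheap; it is also a major technical step forward.
What's more, it is an open-source model.
The new model keeps the family's cost-performance advantage, reaching o1-level results at one tenth of the cost.
As a result, many in the industry have even raised the slogan "DeepSeek is taking over from OpenAI." For example, Elvis, a former Meta AI staffer and a well-known commentator on AI papers on Twitter, stressed that the DeepSeek-R1 paper is a gem, because it explores multiple ways to improve the reasoning ability of large language models and finds clearer emergent properties along the way.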
The DeepSeek-R1 paper is a gem!
Highly encourage everyone to read it.
It's clear that LLM reasoning capabilities can be learned in different ways.
RL, if applied correctly and at scale, can lead to some really powerful and interesting scaling and emergent properties.
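Another prominent AI influencer, Yuchen Jin, argues that the finding reported in the DeepSeek-R1 paper, that a model can be guided by pure RL to learn to reflect and reason on its own, is hugely significant.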
This "Aha moment" in the DeepSeek-R1 paper is huge:
Pure reinforcement learning (RL) enables an LLM to automatically learn to think and reflect.
This challenges the prior belief that replicating OpenAI's o1 reasoning models requires extensive CoT data. It turns out you just need to give it the right incentives.
We are so back in the AlphaGo excitement era: by playing countless Go games and maximizing the reward function (winning the game) using pure RL, AlphaGo beat the best human players.
Now we are entering the LLM RL era.
2025 could be the year of RL.
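Jim Fan, who leads Nvidia's GEAR Lab, also noted on Twitter that DeepSeek-R1 uses ground-truth rewards computed by hard-coded rules and avoids any learned reward model that RL could easily hack. This led to the emergence of self-reflection and exploration behavior in the model. Jim Fan even argues that DeepSeek did what OpenAI should have done: open source.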
Jim Fan
@DrJimFan
We are living in a timeline where a non-US company is keeping the original mission of OpenAI alive - truly open, frontier research that empowers all. It makes no sense. The most entertaining outcome is the most likely.
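So the questions are: what exactly is the pure RL training they are talking about? Why does the model's "Aha Moment" prove that the AI has emergent abilities? And what we most want to know is what this important innovation in DeepSeek-R1 means for the future of the AI field.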
The simplest recipe: back to the purest reinforcement learning
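After o1 was released, reasoning enhancement became the approach the industry watched most closely. Generally, a model tries only one fixed training method to improve its reasoning ability.
While training R1, however, the DeepSeek team tried three completely different technical routes in one go: direct reinforcement learning (R1-Zero), multi-stage progressive training (R1), and model distillation, and all three succeeded. The multi-stage training method and model distillation both contain many innovative elements and matter a great deal for the industry.
The most exciting of the three, though, is direct reinforcement learning, because DeepSeek-R1 is the first model to prove that this approach works.
Let us first look at how an AI's reasoning ability is traditionally trained. Typically, large numbers of chain-of-thought (CoT) examples are added during SFT (supervised fine-tuning), and worked examples plus complex neural reward models such as process reward models (PRM) are used to teach the model to think in chains of thought.
Some approaches even add Monte Carlo tree search (MCTS) so that the model can search for the best option among many possibilities.
Figure: the traditional model training path.
DeepSeek-R1-Zero instead chose an unprecedented route, "pure" reinforcement learning. It completely drops preset chain-of-thought templates (Chain of Thought) and supervised fine-tuning (SFT), and optimizes the model's behavior with nothing but simple reward and penalty signals. It is like letting a gifted child learn to solve problems with no examples and no guidance, purely through repeated attempts and feedback.
All DeepSeek-R1-Zero has is the simplest possible reward system to draw out the AI's reasoning ability.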
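There are just two rules:
Accuracy reward: the accuracy reward model evaluates whether a response is correct. Correct answers gain points; wrong answers lose points. The evaluation is simple too. For a math problem with a deterministic result, the model must give its final answer in a specified format (for example between <answer> and </answer>); for programming problems, a compiler can generate feedback from predefined test cases.
Format reward: the format reward model forces the model to place its thinking process between the <think> and </think> tags. Failing to do so loses points; doing so gains points.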
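The two rules above can be sketched in a few lines of Python. This is only an illustrative sketch, not DeepSeek's implementation: the function names, the +1/-1 scores, and the exact-string comparison against a reference answer are assumptions made for the example; the article only says that correct or well-formatted responses gain points and the rest lose points.

```python
import re

# Hypothetical score values; the article only says "gain points" / "lose points".
REWARD_OK = 1.0
REWARD_BAD = -1.0

THINK_BLOCK = re.compile(r"<think>.+?</think>", re.DOTALL)
ANSWER_BLOCK = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def format_reward(response: str) -> float:
    """Reward a response that places its reasoning between <think> tags."""
    return REWARD_OK if THINK_BLOCK.search(response) else REWARD_BAD


def accuracy_reward(response: str, reference: str) -> float:
    """Reward a response whose <answer> content matches the reference answer.

    Exact string matching is an assumption; a real checker for math answers
    would need to normalize equivalent forms.
    """
    found = ANSWER_BLOCK.search(response)
    if found is None:
        return REWARD_BAD
    return REWARD_OK if found.group(1).strip() == reference.strip() else REWARD_BAD


def total_reward(response: str, reference: str) -> float:
    """Sum of the two rule-based rewards; no learned reward model is involved."""
    return accuracy_reward(response, reference) + format_reward(response)
```

Because both checks are plain rules, there is no neural reward model whose preferences the policy could learn to exploit, which is the property the article credits for avoiding reward hacking.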
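To observe the model's natural progression during reinforcement learning (RL) accurately, DeepSeek even deliberately restricted the system prompt to this structural format, avoiding any content-specific bias such as forcing the model to reason reflectively or promoting particular problem-solving strategies.
With rules this simple, the AI samples and compares its own outputs under GRPO (Group Relative Policy Optimization) and improves itself.
GRPO is actually fairly simple: it computes the policy gradient from relative comparisons among the samples within a group, which effectively reduces training instability while improving learning efficiency.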
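The "relative comparison within a group" can be illustrated with a minimal sketch. It assumes the commonly described form of GRPO's advantage, in which each sampled answer's reward is normalized by the mean and standard deviation of its own group; the function name, the use of the population standard deviation, and the small `eps` guard are assumptions for this example. The full GRPO objective also involves a clipped policy ratio and a KL penalty, which are omitted here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the other answers to the same question.

    rewards: scores of several sampled answers to one question.
    eps: hypothetical guard against division by zero when all scores are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one question, scored by rule-based rewards.
advantages = group_relative_advantages([2.0, 0.0, 0.0, -2.0])
```

Answers that score above their group's average get a positive advantage and are reinforced; answers below it get a negative one. When every answer in a group scores the same, all advantages are zero and that question contributes no learning signal.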
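Put simply, picture a teacher setting problems: the model answers each problem several times at once, each answer is scored with the reward and penalty rules above, and the model is updated to chase high scores and avoid low ones.
The process looks roughly like this: input question $\rightarrow$ model generates multiple answers $\rightarrow$ rule system scores them $\rightarrow$ GRPO computes relative advantages $\rightarrow$ update the model.
This direct training method brings several clear advantages. The first is training efficiency: the whole process can finish in less time. The second is lower resource consumption: with no SFT and no complex reward model, the compute requirement drops sharply.
More importantly, this method really did teach the model to think, and the model learned through "aha moments."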
Learning in its own language, through "aha moments"
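How can we tell that, under such a "primitive" method, the model really learned to "think"?
The paper records a striking case: while working on a problem involving the complex mathematical expression $\sqrt{a - \sqrt{a + x}} = x$, the model suddenly stopped and said "Wait, wait. Wait. That's an aha moment I can flag here", then re-examined the entire solution. This human-like epiphany arose entirely spontaneously; it was not preset.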
Question: If $a > 1$, then the sum of the real solutions of $\sqrt{a - \sqrt{a + x}} = x$ is equal to
Response:
To solve the equation $\sqrt{a - \sqrt{a + x}} = x$, let's start by squaring both ...
$\left(\sqrt{a - \sqrt{a + x}}\right)^2 = x^2 \implies a - \sqrt{a + x} = x^2.$
Rearrange to isolate the inner square root term:
$(a - x^2)^2 = a + x \implies a^2 - 2ax^2 + (x^2)^2 = a + x \implies x^4 - 2ax^2 - x + (a^2 - a) = 0$
Wait, wait. Wait. That's an aha moment I can flag here.
Let's reevaluate this step-by-step to identify if the correct sum can be ...
We started with the equation:
$\sqrt{a - \sqrt{a + x}} = x$
First, let's square both sides:
$a - \sqrt{a + x} = x^2 \implies \sqrt{a + x} = a - x^2$
Next, I could square both sides again, treating the equation:
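According to DeepSeek's research, the model's progress is not uniform and gradual. During reinforcement learning, response length shows sudden, significant jumps, and these "jump points" are often accompanied by a qualitative change in problem-solving strategy. The pattern closely resembles a human's sudden insight after long thought and hints at some deeper cognitive breakthrough.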
Figure 3 | The average response length of DeepSeek-R1-Zero on the training set during the RL process. DeepSeek-R1-Zero naturally learns to solve reasoning tasks with more thinking time.
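With these epiphany-driven gains, R1-Zero climbed from an initial accuracy of 15.6% to 71.0% on AIME, a competition with a strong reputation in the mathematics world. When the model was allowed several attempts at the same problem, accuracy even reached 86.7%. This is not a matter of having seen a problem before and therefore being able to do it: AIME problems demand deep mathematical intuition and creative thinking rather than mechanical application of formulas. The model essentially has to be able to reason in order to improve this much.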
Figure 2 | AIME accuracy of DeepSeek-R1-Zero during training. For each question, we sample 16 responses and calculate the overall average accuracy to ensure a stable evaluation.
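The difference between the averaged single-attempt accuracy in the figure caption and the higher multi-attempt figure quoted in the text can be made concrete with a small sketch. It assumes the multi-attempt score comes from majority voting over the sampled answers, as the article describes later; the function names and the sample answers are made up for illustration.

```python
from collections import Counter


def average_accuracy(answers, reference):
    """Fraction of sampled answers that are correct (an averaged pass@1 estimate)."""
    return sum(a == reference for a in answers) / len(answers)


def majority_vote(answers):
    """Return the most frequent final answer among the samples."""
    return Counter(answers).most_common(1)[0][0]


# Hypothetical final answers from five samples of one question.
samples = ["204", "204", "113", "204", "72"]
single_attempt = average_accuracy(samples, "204")
voted_correct = majority_vote(samples) == "204"
```

In this made-up case only three of five individual samples are right, yet the vote still lands on the correct answer, which is why a voted score can sit well above the averaged one.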
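Another core piece of evidence that the model really learned to reason this way is that its response length adjusts naturally to the complexity of the problem. This adaptive behavior shows that it is not simply applying a template; it truly grasps how hard a problem is and invests more "thinking time" accordingly. Just as people naturally spend different amounts of time on simple addition and on a complex integral, R1-Zero shows a similar kind of intelligence.
Perhaps the most convincing evidence is the transfer learning ability the model displays. On Codeforces, a completely different programming competition platform, R1-Zero reached a level above 96.3% of human contestants. This cross-domain performance shows that the model is not memorizing problem-solving tricks for one field but has acquired some general-purpose reasoning ability.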
A genius who is brilliant but inarticulate
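Although R1-Zero showed astonishing reasoning ability, the researchers soon found a serious problem: its thinking process is often hard for humans to understand.
The paper candidly points out that a model trained purely by reinforcement learning suffers from "poor readability" and "language mixing."
The phenomenon is easy to understand: R1-Zero optimizes its behavior entirely through reward and penalty signals, with no human-demonstrated "standard answers" as a reference. It is like a gifted child who invents a personal way of solving problems that works every time, yet becomes incoherent when explaining it to others. While solving a problem it may use several languages at once, or develop some peculiar way of expressing itself, all of which makes its reasoning hard to trace and understand.
To solve exactly this problem, the research team developed an improved version, DeepSeek-R1. By introducing more traditional "cold-start data" and a multi-stage training pipeline, R1 not only keeps its strong reasoning ability but also learns to express its thinking in a way humans can follow. It is like giving that gifted child a communication coach who teaches how to state ideas clearly.
After this tuning, DeepSeek-R1 shows performance comparable to OpenAI o1, and even better in some respects. On the MATH benchmark, R1 reaches 77.5% accuracy, close to o1's 77.3%; on the more challenging AIME, R1's accuracy reaches 71.3%, above o1's 71.0%. In coding, R1's result in the Codeforces evaluation puts it above 96.3% of human participants.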
Table 4 | Comparison between DeepSeek-R1 and other representative models
| Category | Benchmark (Metric) | Claude-3.5-Sonnet-1022 | GPT-4o-0513 | DeepSeek-V3 | OpenAI o1-mini | OpenAI o1-1217 | DeepSeek-R1 |
|---|---|---|---|---|---|---|---|
| | Architecture | | | MoE | | | MoE |
| | # Activated Params | | | 37B | | | 37B |
| | # Total Params | | | 671B | | | 671B |
| English | MMLU (Pass@1) | 88.3 | 87.2 | 88.5 | 85.2 | 91.8 | 90.8 |
| English | MMLU-Redux (EM) | 88.9 | 88.0 | 89.1 | 86.7 | | 92.9 |
| English | MMLU-Pro (EM) | 78.0 | 72.6 | 75.9 | 80.3 | | 84.0 |
| English | DROP (3-shot F1) | 88.3 | 83.7 | 91.6 | 83.9 | 90.2 | 92.2 |
| English | IF-Eval (Prompt Strict) | 86.5 | 84.3 | 86.1 | 84.8 | | 83.3 |
| English | GPQA Diamond (Pass@1) | 65.0 | 49.9 | 59.1 | 60.0 | 75.7 | 71.5 |
| English | SimpleQA (Correct) | 28.4 | | | 7.0 | 47.0 | 30.1 |
| English | FRAMES (Acc.) | 72.5 | 80.5 | 73.3 | 76.9 | | 82.5 |
| English | AlpacaEval2.0 (LC-winrate) | 52.0 | 51.1 | 70.0 | 57.8 | | |
| English | ArenaHard (GPT-4-1106) | 85.2 | 80.4 | 85.5 | 92.0 | | 92.3 |
| Code | LiveCodeBench (Pass@1-COT) | 38.9 | 32.9 | 36.2 | 53.8 | 63.4 | 65.9 |
| Code | Codeforces (Percentile) | 20.3 | 23.6 | 58.7 | 93.4 | 96.6 | 96.3 |
| Code | Codeforces (Rating) | 717 | 759 | 1134 | 1820 | 2061 | 2029 |
| Code | SWE Verified (Resolved) | 50.8 | | 42.0 | 41.6 | 48.9 | 49.2 |
| Code | Aider-Polyglot (Acc.) | 45.3 | 16.0 | 49.6 | 32.9 | 61.7 | |
| Math | AIME 2024 (Pass@1) | 16.0 | 9.3 | 39.2 | 63.6 | 79.2 | 79.8 |
| Math | MATH-500 (Pass@1) | 78.3 | 74.6 | 90.2 | 90.0 | 96.4 | |
| Math | CNMO 2024 (Pass@1) | 13.1 | 10.8 | 43.2 | 67.6 | | |
| Chinese | CLUEWSC (EM) | 85.4 | 87.9 | 90.9 | 89.9 | | 92.8 |
| Chinese | C-Eval (EM) | 76.7 | 76.0 | 86.5 | 68.9 | | 91.8 |
| Chinese | C-SimpleQA (Correct) | 55.4 | 58.7 | 68.0 | 40.3 | | 63.7 |
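However, DeepSeek-R1-Zero's potential seems even greater. The accuracy it reaches on AIME with majority voting even exceeds that of OpenAI's o1. This trait of becoming more accurate with multiple attempts suggests that R1-Zero may have mastered some basic reasoning framework rather than simply memorizing solution patterns. The paper's data show that from MATH to AIME to GSM8K the model performs stably across domains, especially on complex problems that require creative thinking. This broad-spectrum performance suggests that R1-Zero may indeed have developed some fundamental reasoning ability, in sharp contrast to traditional models optimized for specific tasks.
So, inarticulate as it is, perhaps DeepSeek-R1-Zero is the real "genius" that truly understands reasoning.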
Pure reinforcement learning may be the unexpected shortcut to AGI
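The reason DeepSeek-R1's release turned insiders' attention to pure reinforcement learning is that it can fairly be said to have opened a new path for the evolution of AI.
R1-Zero, an AI model trained entirely through reinforcement learning, shows surprisingly general reasoning ability. It achieved striking results in math competitions, and, more importantly, it is not merely imitating thinking: it has genuinely developed some form of reasoning ability.
In earlier training methods, especially when supervised fine-tuning relies on a trained neural network to judge quality, the model may learn to trigger specific patterns in the reward model and generate content that suits the reward model's "taste" rather than truly improving its reasoning. In other words, the AI system finds an opportunistic way to obtain high reward while actually departing from the training objective. This is what we usually call reward hacking. R1-Zero's minimalist reward rules largely rule out reward hacking: the rules are too simple to offer any "taste" to imitate. Reasoning ability developed under these conditions is more credible and more natural.
This finding may change how we think about machine learning: traditional AI training may have been repeating a fundamental mistake all along. We have been too focused on making AI imitate human ways of thinking, and the industry needs to rethink the role of supervised learning in AI development. Through pure reinforcement learning, AI systems seem able to develop more native problem-solving abilities instead of being confined to preset solution frameworks.
Although R1-Zero has obvious flaws in output readability, this "flaw" may itself confirm how distinctive its way of thinking is. It is like a gifted child who invents a personal method of solving problems but struggles to explain it in conventional language. This suggests that true artificial general intelligence may require a mode of cognition entirely different from that of humans.
This is real reinforcement learning. As the theory of the well-known educator Piaget holds, true understanding comes from active construction, not passive acceptance.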
Author: 大模型
Last updated: 2025-03-09