{"id":773,"date":"2024-12-26T21:46:20","date_gmt":"2024-12-26T21:46:20","guid":{"rendered":"https:\/\/naujienaplius.lt\/index.php\/2024\/12\/26\/deepseek-v3-itin-didelis-atvirojo-kodo-ai-pralenkia-llama-ir-qwen-paleidima\/"},"modified":"2024-12-26T21:46:20","modified_gmt":"2024-12-26T21:46:20","slug":"deepseek-v3-itin-didelis-atvirojo-kodo-ai-pralenkia-llama-ir-qwen-paleidima","status":"publish","type":"post","link":"https:\/\/naujienaplius.lt\/index.php\/2024\/12\/26\/deepseek-v3-itin-didelis-atvirojo-kodo-ai-pralenkia-llama-ir-qwen-paleidima\/","title":{"rendered":"DeepSeek-V3, ultra-large open-source AI, outperforms Llama and Qwen on launch"},"content":{"rendered":" \r\n<br><div>\n\t\t\t\t<p>Chinese AI startup DeepSeek, known for challenging leading AI vendors with its innovative open-source technologies, today released a new ultra-large model: DeepSeek-V3.<\/p>\n\n\n\n<p>The new model, available via Hugging Face under the company&#8217;s license agreement, has 671B parameters but uses a mixture-of-experts architecture that activates only a subset of them, so that it can handle a given task accurately and efficiently. 
According to benchmarks shared by DeepSeek, the model already tops the charts, outperforming leading open-source models, including Meta&#8217;s Llama 3.1-405B, and closely matching the performance of closed models from Anthropic and OpenAI.<\/p>\n\n\n\n<p>The release marks another major development in closing the gap between closed and open-source AI. 
Ultimately, DeepSeek, which started as an offshoot of the Chinese quantitative hedge fund High-Flyer Capital Management, hopes these developments will pave the way for artificial general intelligence (AGI), where models will be able to understand or learn any intellectual task that a human can.<\/p>\n\n\n\n<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_83 counter-hierarchy ez-toc-counter ez-toc-grey ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Contents:<\/p>\n<\/div>\n<nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/naujienaplius.lt\/index.php\/2024\/12\/26\/deepseek-v3-itin-didelis-atvirojo-kodo-ai-pralenkia-llama-ir-qwen-paleidima\/#Ka_%E2%80%9EDeepSeek-V3%E2%80%9C_pateikia_prie_stalo\" >What does DeepSeek-V3 bring to the table?<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/naujienaplius.lt\/index.php\/2024\/12\/26\/deepseek-v3-itin-didelis-atvirojo-kodo-ai-pralenkia-llama-ir-qwen-paleidima\/#Stipriausias_siuo_metu_prieinamas_atvirojo_kodo_modelis\" >Strongest open-source model currently available<\/a><\/li><\/ul><\/nav><\/div>\n<h2 class=\"wp-block-heading\" id=\"h-what-does-deepseek-v3-bring-to-the-table\"><span class=\"ez-toc-section\" id=\"Ka_%E2%80%9EDeepSeek-V3%E2%80%9C_pateikia_prie_stalo\"><\/span>What does DeepSeek-V3 bring to the table?<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Like its predecessor DeepSeek-V2, the new ultra-large model uses the same basic architecture, built around multi-head latent attention (MLA) and DeepSeekMoE. 
This approach keeps training and inference efficient: specialized and shared \u201cexperts\u201d (individual, smaller neural networks within the larger model) activate 37B of the 671B parameters for each token.<\/p>\n\n\n\n<p>While the basic architecture ensures robust performance for DeepSeek-V3, the company has also introduced two innovations to raise the bar further. <\/p>\n\n\n\n<p>The first is an auxiliary-loss-free load-balancing strategy. It dynamically monitors and adjusts the load on the experts so that they are used in a balanced way without compromising overall model performance. The second is multi-token prediction (MTP), which lets the model predict several future tokens at once. This innovation not only improves training efficiency but also lets the model run three times faster than DeepSeek-V2, generating 60 tokens per second.<\/p>\n\n\n\n<p>\u201cDuring pre-training, we trained DeepSeek-V3 on 14.8T high-quality and diverse tokens&#8230; Next, we conducted a two-stage context length extension for DeepSeek-V3,\u201d the company wrote in a technical paper detailing the new model. \u201cIn the first stage, the maximum context length is extended to 32K, and in the second stage, it is further extended to 128K. Following this, we conducted post-training, including Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) on the base model of DeepSeek-V3, to align it with human preferences and further unlock its potential. 
During the post-training stage, we distill the reasoning capability from the DeepSeekR1 series of models, and meanwhile carefully maintain the balance between model accuracy and generation length.\u201d<\/p>\n\n\n\n<p>Notably, during training DeepSeek used multiple hardware and algorithmic optimizations, including an FP8 mixed-precision training framework and the DualPipe algorithm for pipeline parallelism, to cut the cost of the process.<\/p>\n\n\n\n<p>Overall, the company claims to have completed all of DeepSeek-V3&#8217;s training in about 2,788,000 H800 GPU hours, or about $5.57 million, assuming a rental price of $2 per GPU hour. That is far less than the hundreds of millions of dollars typically spent on pre-training large language models.<\/p>\n\n\n\n<p>Llama-3.1, for example, is estimated to have been trained with an investment of more than $500 million. <\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-strongest-open-source-model-currently-available\"><span class=\"ez-toc-section\" id=\"Stipriausias_siuo_metu_prieinamas_atvirojo_kodo_modelis\"><\/span>Strongest open-source model currently available<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Despite the economical training, DeepSeek-V3 has emerged as the strongest open-source model on the market.<\/p>\n\n\n\n<p>The company ran multiple benchmarks to compare the AI&#8217;s performance and noted that it convincingly outperforms leading open models, including Llama-3.1-405B and Qwen 2.5-72B. 
It even outperforms the closed-source GPT-4o on most benchmarks, except the English-focused SimpleQA and FRAMES, where the OpenAI model led with scores of 38.2 and 80.5, respectively (versus 24.9 and 73.3).<\/p>\n\n\n\n<p>Notably, DeepSeek-V3&#8217;s performance stood out especially on the Chinese and math benchmarks, where it scored better than all of its peers. On the Math-500 test it scored 90.2, with Qwen&#8217;s score of 80 the next best. <\/p>\n\n\n\n<p>The only model that managed to challenge DeepSeek-V3 was Anthropic&#8217;s Claude 3.5 Sonnet, which beat it with higher scores on MMLU-Pro, IF-Eval, GPQA-Diamond, SWE Verified and Aider-Edit.<\/p>\n\n\n\n<figure class=\"wp-block-embed aligncenter is-type-rich is-provider-twitter wp-block-embed-twitter\"><p>\n\n<blockquote class=\"twitter-tweet\" data-width=\"550\" data-dnt=\"true\"><p lang=\"en\" dir=\"ltr\">\ud83d\ude80 Introducing DeepSeek-V3!<br><br>Biggest leap forward yet:<br>\u26a1 60 tokens\/second (3x faster than V2!)<br>\ud83d\udcaa Enhanced capabilities<br>\ud83d\udee0 API compatibility intact<br>\ud83c\udf0d Fully open-source models &amp; papers<br><br>\ud83d\udc0b 1\/n <a href=\"https:\/\/t.co\/p1dV9gJ2Sd\">pic.twitter.com\/p1dV9gJ2Sd<\/a><\/p>&mdash; DeepSeek (@deepseek_ai) <a href=\"https:\/\/twitter.com\/deepseek_ai\/status\/1872242657348710721?ref_src=twsrc%5Etfw\">December 26, 2024<\/a><\/blockquote><script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script>\n<\/p><\/figure>\n\n\n\n<p>The work shows that open source is closing in on closed-source models, promising nearly equivalent performance across different tasks. 
The development of such systems is very good for the industry, because it can eliminate the chance of one big AI player ruling the game. It also gives enterprises multiple options to choose from and work with as they build out their stacks.<\/p>\n\n\n\n<p>Currently, the DeepSeek-V3 code is available via GitHub under an MIT license, while the model is provided under the company&#8217;s model license. Enterprises can also try the new model via DeepSeek Chat, a ChatGPT-like platform, and access the API for commercial use. 
DeepSeek is offering the API at <a href=\"https:\/\/twitter.com\/deepseek_ai\/status\/1872242663489188088\/photo\/2\">the same price as DeepSeek-V2<\/a> until February 8. After that, it will charge $0.27 per million input tokens ($0.07 per million tokens with cache hits) and $1.10 per million output tokens.<\/p>\n\n\n\n\t\t\t<\/div><script async src=\"\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script>\r\n<br>\r\n<br><a href=\"https:\/\/venturebeat.com\/ai\/deepseek-v3-ultra-large-open-source-ai-outperforms-llama-and-qwen-on-launch\/\">Source link <\/a>","protected":false},"excerpt":{"rendered":"<p>Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn&hellip;<\/p>\n","protected":false},"author":1,"featured_media":774,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[167],"tags":[],"class_list":["post-773","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-technologijos"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/naujienaplius.lt\/index.php\/wp-json\/wp\/v2\/posts\/773","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/naujienaplius.lt\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/naujienaplius.lt\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/naujienaplius.lt\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/naujienaplius.lt\/index.php\/wp-json\/wp\/v2\/comments?post=773"}],"version-history":[{"count":
0,"href":"https:\/\/naujienaplius.lt\/index.php\/wp-json\/wp\/v2\/posts\/773\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/naujienaplius.lt\/index.php\/wp-json\/wp\/v2\/media\/774"}],"wp:attachment":[{"href":"https:\/\/naujienaplius.lt\/index.php\/wp-json\/wp\/v2\/media?parent=773"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/naujienaplius.lt\/index.php\/wp-json\/wp\/v2\/categories?post=773"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/naujienaplius.lt\/index.php\/wp-json\/wp\/v2\/tags?post=773"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}