๐Ÿ› ๏ธAI ๋„๊ตฌ2026-05-28

๋‰ด์Šค - ์›๋ฌธ ๊ธฐ๋ฐ˜ ์š”์•ฝ ํ•„์š”

๐Ÿ’ก ํ•œ์ค„ ์š”์•ฝ|๋‰ด์Šค - ์›๋ฌธ ๊ธฐ๋ฐ˜ ์š”์•ฝ ํ•„์š”


title: "์—”๋น„๋””์•„, ์—์ด์ „ํŠธ RL ํ”„๋ ˆ์ž„์›Œํฌ Polar ๊ณต๊ฐœ" description: "๋‰ด์Šค - ์›๋ฌธ ๊ธฐ๋ฐ˜ ์š”์•ฝ ํ•„์š”" date: 2026-05-28 tags: [vibe-coding, ai-tool] source: "https://www.marktechpost.com/2026/05/27/nvidia-releases-polar-a-token-faithful-rollout-framework-for-grpo-training-across-codex-claude-code-and-qwen-code/" sidebar: order: 0

์ œ๋ชฉ(ํ•œ๊ธ€): ์—”๋น„๋””์•„, ์—์ด์ „ํŠธ RL ํ”„๋ ˆ์ž„์›Œํฌ Polar ๊ณต๊ฐœ ์›๋ฌธ ์ œ๋ชฉ(์˜๋ฌธ): NVIDIA Releases Polar, a Token-Faithful Rollout Framework for GRPO Training Across Codex, Claude Code, and Qwen Code ์›๋ฌธ: NVIDIA Releases Polar, a Token-Faithful Rollout Framework for GRPO Training Across Codex, Claude Code, and Qwen Code ์†Œ์Šค: marktechpost MD ํŒŒ์ผ: content/2026-05-28/marktechpost-nvidia-releases-polar-a-token-faithful-rollout-fra.md

ํ•ต์‹ฌ ๋‚ด์šฉ

์—”๋น„๋””์•„๊ฐ€ Codex CLIยทClaude CodeยทQwen Code ๊ฐ™์€ ๊ธฐ์กด ์—์ด์ „ํŠธ ํ•˜๋„ค์Šค๋ฅผ ๊ฑฐ์˜ ์ˆ˜์ • ์—†์ด GRPO ํ•™์Šต์— ์—ฐ๊ฒฐํ•˜๋Š” Polar๋ฅผ ๊ณต๊ฐœํ–ˆ์–ด์š”.

ํ•ต์‹ฌ์€ ํ•˜๋„ค์Šค ๋‚ด๋ถ€๊ฐ€ ์•„๋‹ˆ๋ผ ๋ชจ๋ธ API ๊ฒฝ๊ณ„์— ํ”„๋ก์‹œ๋ฅผ ๋‘๋Š” ๋ฐฉ์‹์ด์—์š”. ๊ธฐ์กด์ฒ˜๋Ÿผ env.initยทenv.stepยทenv.reset ํ˜•ํƒœ๋กœ ํ•˜๋„ค์Šค๋ฅผ ์žฌ๊ตฌํ˜„ํ•˜์ง€ ์•Š์•„๋„, ๋ฒ ์ด์Šค URL๋งŒ ๊ฒŒ์ดํŠธ์›จ์ด๋กœ ๋ฐ”๊พธ๋ฉด ๋™์ž‘ํ•˜๊ฑฐ๋“ ์š”.

ํ”„๋ก์‹œ๋Š” ์š”์ฒญ๋งˆ๋‹ค 4๋‹จ๊ณ„๋ฅผ ์ฒ˜๋ฆฌํ•ด์š”. API ์œ ํ˜• ๊ฐ์ง€, ์š”์ฒญ ์ •๊ทœํ™”, ํ”„๋กฌํ”„ํŠธ/์‘๋‹ต ํ† ํฐ IDยทfinish reasonยทlogprob ์บก์ฒ˜, ๊ทธ๋ฆฌ๊ณ  ์›๋ž˜ ์ œ๊ณต์ž ์Šคํ‚ค๋งˆ๋กœ ์žฌ๋ณ€ํ™˜๊นŒ์ง€ ๋งก์•„์š”. ์ŠคํŠธ๋ฆฌ๋ฐ๋„ ๋น„์ŠคํŠธ๋ฆฌ๋ฐ ์‘๋‹ต์„ ๋ฐ›์•„ SSE ํ˜•ํƒœ๋กœ ๋‹ค์‹œ ๋‚ด๋ณด๋‚ด ํ˜ธํ™˜์„ฑ์„ ์œ ์ง€ํ•ด์š”.

๊ฒฐ๊ตญ ํ‰๊ฐ€ ํ™˜๊ฒฝ์˜ ๋™์ž‘์„ ์œ ์ง€ํ•œ ์ฑ„ ํ† ํฐ ๋‹จ์œ„ RL ๋ฐ์ดํ„ฐ๋ฅผ ํ™•๋ณดํ•ด, ์—์ด์ „ํŠธ ํ•™์Šต ์žฌํ˜„์„ฑ๊ณผ ํ™•์žฅ์„ฑ์„ ํ•จ๊ป˜ ๋Œ์–ด์˜ฌ๋ฆฌ๋Š” ์ ‘๊ทผ์ด์—์š”.

์žก๋Œ์Œค์˜ ํ•œ๋งˆ๋””

๊ธฐ์กด ํ•˜๋„ค์Šค๋Š” ๋ฒ ์ด์Šค URL๋งŒ ๋ฐ”๊พธ๋ฉด ๋ผ์š”. ์žฌ๊ตฌํ˜„ ์—†์ด ๋„ค์ดํ‹ฐ๋ธŒ ์‹คํ–‰ ๊ฒฝ๋กœ๋ฅผ ์œ ์ง€ํ•ด GRPO ํ•™์Šต์˜ ์žฌํ˜„์„ฑ๊ณผ ํ™•์žฅ์„ฑ์ด ๋†’์•„์ ธ์š”.


์ถœ์ฒ˜: NVIDIA Releases Polar, a Token-Faithful Rollout Framework for GRPO Training Across Codex, Claude Code, and Qwen Code

์ด ๊ธ€์ด ์–ด๋• ๋‚˜์š”?