2026-03-12

Grok 4.20 trails Gemini and GPT-5.4 by a wide margin but sets a new record for not hallucinating

xAI์˜ Grok 4.20์ด ์„ฑ๋Šฅ์€ ๋’ค์ฒ˜์กŒ์ง€๋งŒ, ํ™˜๊ฐ ์–ต์ œ์—์„œ๋Š” ์‹ ๊ธฐ๋ก์„ ์„ธ์› ์–ด์š”.

In reasoning mode, it scored 48 on the Artificial Analysis Intelligence Index. That is well below the 57 posted by both Gemini 3.1 Pro Preview and GPT-5.4, though still 6 points higher than Grok 4.

API๋Š” ์ถ”๋ก  ํฌํ•จยท๋ฏธํฌํ•จยท๋ฉ€ํ‹ฐ์—์ด์ „ํŠธ 3์ข…์œผ๋กœ ๋‚˜์™”๊ณ , ์ปจํ…์ŠคํŠธ ์œˆ๋„๋Š” 200๋งŒ ํ† ํฐ์ด์—์š”. ๊ฐ€๊ฒฉ์€ 100๋งŒ ํ† ํฐ๋‹น 2๋‹ฌ๋Ÿฌ ๋˜๋Š” 6๋‹ฌ๋Ÿฌ๋ผ์„œ ์ด์ „ Grok 4๋ณด๋‹ค ์ €๋ ดํ•˜๊ณ  ์„œ๊ตฌ๊ถŒ ๋ชจ๋ธ ๋Œ€๋น„ ๊ฒฝ์Ÿ๋ ฅ๋„ ์žˆ๋Š” ํŽธ์ด๊ฑฐ๋“ ์š”.

๋Œ€์‹  ์‚ฌ์‹ค์„ฑ์€ ๊ฐ•ํ–ˆ์–ด์š”. AA Omniscience์—์„œ ๋น„ํ™˜๊ฐ๋ฅ  78%๋ฅผ ๊ธฐ๋กํ•ด ์ตœ๊ณ ์น˜๋ฅผ ์ฐ์—ˆ๊ณ , ๋ชจ๋ฅผ ๋•Œ ํ‹€๋ฆฐ ๋‹ต์„ ๋งŒ๋“  ๋น„์œจ์ด ์•ฝ 5๋ฒˆ ์ค‘ 1๋ฒˆ ์ˆ˜์ค€์ด์—ˆ์–ด์š”. ์ด์ œ LLM ๊ฒฝ์Ÿ์˜ ์ถ•์ด โ€œ๋” ๋˜‘๋˜‘ํ•จโ€์—์„œ โ€œ๋œ ์ง€์–ด๋ƒ„โ€์œผ๋กœ๋„ ์ด๋™ํ•˜๋Š” ํ๋ฆ„์ด์—์š”.

์ด ๊ธ€์ด ์–ด๋• ๋‚˜์š”?