🛠️ AI Tools | 2026-05-29




title: "Claude Opus 4.8 Takes the Benchmark Lead"
description: "News - summary based on the original article needed"
date: 2026-05-29
tags: [ai-news]
source: "https://the-decoder.com/anthropic-ships-claude-opus-4-8-as-a-modest-but-tangible-improvement-that-tops-gpt-5-5-in-most-benchmarks/"
sidebar:
  order: 0

제λͺ©(ν•œκΈ€): ν΄λ‘œλ“œ 였퍼슀 4.8, 벀치마크 선두 원문 제λͺ©(영문): Anthropic ships Claude Opus 4.8 as a "modest but tangible improvement" that tops GPT-5.5 in most benchmarks 원문: Anthropic ships Claude Opus 4.8 as a "modest but tangible improvement" that tops GPT-5.5 in most benchmarks μ†ŒμŠ€: the-decoder MD 파일: content/2026-05-29/the-decoder-anthropic-ships-claude-opus-4-8-as-a-modest-but-ta.md

Key points

Anthropic has released Claude Opus 4.8, which beats GPT-5.5 and Gemini 3.1 Pro on most benchmarks.

에이전틱 μ½”λ”©(SWE-Bench Pro)은 69.2%둜 Opus 4.7의 64.3%, GPT-5.5의 58.6%보닀 λ†’μ•˜μ–΄μš”. Humanity's Last Exam은 도ꡬ 없이 49.8%, 도ꡬ μ‚¬μš© μ‹œ 57.9%둜 졜고 점수λ₯Ό κΈ°λ‘ν–ˆμ–΄μš”.

Anthropic put particular emphasis on improved honesty. According to early testers, the model flags uncertainty more often and makes fewer unsupported claims. Anthropic also says that in its internal coding evaluation, the rate at which the model lets bugs slide dropped by about 4x compared with Opus 4.7.

λͺ¨λΈ μ„±λŠ₯도 ν¬μ§€λ§Œ, ν•œ μ„Έμ…˜μ—μ„œ 수백 개 병렬 μ„œλΈŒμ—μ΄μ „νŠΈλ₯Ό λŒλ¦¬λŠ” 동적 μ›Œν¬ν”Œλ‘œμš°κ°€ μ‹€μ œ 업무 μžλ™ν™”μ˜ 체감 λ³€ν™”λ₯Ό ν‚€μšΈ ν¬μΈνŠΈμ˜ˆμš”.

A word from 작돌쌤

Cases where the model misses a bug yet still describes the work as progress are said to be about 4x less frequent. Dynamic workflows also put automated large-scale code migrations within reach.


Source: Anthropic ships Claude Opus 4.8 as a "modest but tangible improvement" that tops GPT-5.5 in most benchmarks

이 글이 μ–΄λ• λ‚˜μš”?