🛠️ AI Tools 2026-05-26

News - summary based on the original article needed

💡 One-line summary | News - summary based on the original article needed


title: "Together AI Releases OSCAR for INT2 KV Cache"
description: "News - summary based on the original article needed"
date: 2026-05-26
tags: [ai-news]
source: "https://www.marktechpost.com/2026/05/25/together-ai-open-sources-oscar-an-attention-aware-2-bit-kv-cache-quantization-system-for-long-context-llm-serving/"
sidebar:
  order: 0

Title (translated from Korean): Together AI Releases OSCAR for INT2 KV Cache
Original title (English): Together AI Open-Sources OSCAR: An Attention-Aware 2-Bit KV Cache Quantization System for Long-Context LLM Serving
Original article: Together AI Open-Sources OSCAR: An Attention-Aware 2-Bit KV Cache Quantization System for Long-Context LLM Serving
Source: marktechpost
MD file: content/2026-05-26/marktechpost-together-ai-open-sources-oscar-an-attention-aware-.md

Key Points

Together AI has open-sourced OSCAR, a 2-bit KV cache quantization system for long-context LLM serving.

The problem: with long contexts on the order of 100K tokens and many concurrent requests, the KV cache eats up a large share of GPU memory. Existing INT2 methods often either fell apart on accuracy or had a structure that did not fit a paged KV-cache.
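To get a feel for the memory pressure, here is a rough back-of-the-envelope sketch. The model shape (32 layers, 8 KV heads, head dimension 128) is a hypothetical example and does not come from the article, and the calculation ignores the scale and zero-point metadata a real quantized cache would also store.

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bits):
    """Bytes for keys plus values of one sequence, ignoring quantization metadata."""
    elements = 2 * layers * kv_heads * head_dim * tokens  # 2 = keys and values
    return elements * bits // 8

# Hypothetical model shape, chosen only for illustration (not from the article).
shape = dict(tokens=100_000, layers=32, kv_heads=8, head_dim=128)

fp16_bytes = kv_cache_bytes(bits=16, **shape)
int2_bytes = kv_cache_bytes(bits=2, **shape)

print(f"FP16: {fp16_bytes / 1e9:.2f} GB per sequence")
print(f"INT2: {int2_bytes / 1e9:.2f} GB per sequence")
print(f"ratio: {fp16_bytes / int2_bytes:.0f}x")
```

Under these assumed numbers, a single 100K-token sequence needs about 13.1 GB at FP16 and about 1.6 GB at 2 bits, and the FP16 figure multiplies with every concurrent request.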

OSCAR picks its rotation matrices from attention statistics rather than from the activation distribution. For keys it uses Q⊤Q (the query covariance) as the criterion, and for values it looks at the output error, so that the important directions are taken into account and the error of INT2, which has only 4 levels, is pushed onto less critical axes.
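The idea rests on two facts: an orthogonal rotation R leaves attention logits unchanged, since (QR)(KR)⊤ = QK⊤, and the mean squared logit error caused by a key error e equals e⊤(Q⊤Q/n)e, so Q⊤Q decides which error directions hurt. The NumPy sketch below only illustrates these building blocks. It is not OSCAR's implementation: the function names, the eigenvector-based rotation, the per-token min/max quantizer, and the random data are all assumptions made for illustration, and the sketch makes no claim about which variant gives the lower error.

```python
import numpy as np

def quantize_int2(x):
    """Asymmetric uniform 2-bit quantization along the last axis (4 levels)."""
    lo = x.min(axis=-1, keepdims=True)
    hi = x.max(axis=-1, keepdims=True)
    scale = np.where(hi > lo, (hi - lo) / 3.0, 1.0)  # 3 steps between 4 levels
    codes = np.clip(np.round((x - lo) / scale), 0, 3).astype(np.uint8)
    return codes, scale, lo

def dequantize_int2(codes, scale, lo):
    """Map 2-bit codes back to floating-point values."""
    return codes * scale + lo

def query_rotation(Q):
    """Orthogonal matrix whose columns are eigenvectors of Q^T Q / n (assumed choice)."""
    cov = Q.T @ Q / Q.shape[0]
    _, eigvecs = np.linalg.eigh(cov)
    return eigvecs

def logit_error(Q, K, K_hat):
    """Mean squared error of the attention logits Q K^T."""
    return float(np.mean((Q @ K.T - Q @ K_hat.T) ** 2))

# Synthetic data: queries with unequal variance across directions.
rng = np.random.default_rng(0)
head_dim = 8
stretch = np.linspace(0.1, 3.0, head_dim)
mix = np.linalg.qr(rng.normal(size=(head_dim, head_dim)))[0]
Q = (rng.normal(size=(256, head_dim)) * stretch) @ mix
K = rng.normal(size=(64, head_dim))

R = query_rotation(Q)
codes, scale, lo = quantize_int2(K @ R)          # quantize the rotated keys
K_hat = dequantize_int2(codes, scale, lo) @ R.T  # rotate back for comparison

print("distinct codes:", np.unique(codes))
print("logit MSE, plain INT2:  ", logit_error(Q, K, dequantize_int2(*quantize_int2(K))))
print("logit MSE, rotated INT2:", logit_error(Q, K, K_hat))
```

Measuring the error in logit space rather than on the raw key values is the point of the exercise: two quantizers with the same per-element error can differ a lot once that error is weighted by Q⊤Q.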

It shows that even at the same 2 bits, where you leave the error is what decides real-world performance, so it could change the cost structure of long-context serving.

A Word from 잡돌쌤

OSCAR makes up for INT2's 4-level limit with rotations based on attention statistics. It is an approach that goes after both accuracy and serving compatibility at the same compression ratio.


Source: Together AI Open-Sources OSCAR: An Attention-Aware 2-Bit KV Cache Quantization System for Long-Context LLM Serving
