bfloat16 floating-point format

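The bfloat16 (brain floating point)^[1]^[2] floating-point format is a computer number format that occupies 16 bits in main memory. It represents a wide dynamic range of numeric values by using a floating radix point. The format is a shortened (16-bit) version of the 32-bit IEEE 754 single-precision floating-point format (binary32), intended to accelerate machine learning and near-sensor computing.^[3] By retaining 8 exponent bits it preserves the approximate dynamic range of 32-bit floating-point numbers, but it supports only 8 bits of precision instead of the 24-bit significand of the binary32 format. bfloat16 numbers are even less suited to integer calculations than single-precision 32-bit floating-point numbers, but that is not their intended use. bfloat16 is used to reduce the storage requirements and increase the calculation speed of machine learning algorithms.^[4]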

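The bfloat16 format was developed by Google Brain, an artificial intelligence research group at Google. It is used in many CPUs, GPUs, and AI processors, including Intel Xeon processors (AVX-512 BF16 extensions), Intel Data Center GPU, Intel Nervana NNP-L1000, Intel FPGAs,^[5]^[6]^[7] AMD Zen, AMD Instinct, Nvidia GPUs, Google Cloud TPUs,^[8]^[9]^[10] AWS Inferentia, AWS Trainium, ARMv8.6-A,^[11] and Apple's M2^[12] and A15 and later chips. Several libraries support bfloat16, including CUDA,^[13] Intel oneAPI Math Kernel Library, AMD ROCm,^[14] AMD Optimizing CPU Libraries, PyTorch, and TensorFlow.^[10]^[15] On these platforms bfloat16 may also be used in mixed-precision arithmetic, where bfloat16 numbers may be operated on and expanded to wider data types.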

bfloat16 floating-point format

bfloat16 has the following format:

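Sign bit: 1 bit
Exponent width: 8 bits
Significand precision: 8 bits (7 explicitly stored, plus an implicit leading bit), as opposed to 24 bits in the classical single-precision floating-point format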
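As a minimal sketch of this field layout, the following Python function splits a 16-bit pattern into its three fields. The function name `split_bfloat16` is hypothetical and introduced only for illustration; it is not part of any library.

```python
def split_bfloat16(bits: int) -> tuple[int, int, int]:
    """Split a 16-bit bfloat16 pattern into (sign, exponent field, fraction field)."""
    if not 0 <= bits <= 0xFFFF:
        raise ValueError("expected a 16-bit pattern")
    sign = bits >> 15              # 1 sign bit (bit 15)
    exponent = (bits >> 7) & 0xFF  # 8 exponent bits (bits 14 to 7)
    fraction = bits & 0x7F         # 7 explicitly stored significand bits (bits 6 to 0)
    return sign, exponent, fraction


print(split_bfloat16(0x3F80))  # (0, 127, 0)
print(split_bfloat16(0xC000))  # (1, 128, 0)
```

The implicit leading bit is not stored, so only 7 of the 8 significand bits appear in the returned fraction field.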

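Because the bfloat16 format is a truncated IEEE 754 single-precision 32-bit float, conversion to and from an IEEE 754 single-precision 32-bit float is fast. In conversion to the bfloat16 format, the exponent bits are preserved, while the significand field can be reduced by truncation (which corresponds to rounding toward zero) or by another rounding mechanism, ignoring the NaN special case. Preserving the exponent bits maintains the 32-bit float's range of about 10⁻³⁸ to about 3 × 10³⁸.^[16]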
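The truncating conversion can be sketched in Python with the standard `struct` module. The helper names below are hypothetical. Like the description above, this sketch ignores the NaN special case: a NaN whose payload lies entirely in the discarded low 16 bits would become an infinity.

```python
import math
import struct


def float32_bits(x: float) -> int:
    """Return the 32-bit IEEE 754 binary32 encoding of x as an integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bfloat16_truncate(x: float) -> int:
    """binary32 -> bfloat16 by dropping the low 16 bits (round toward zero)."""
    return float32_bits(x) >> 16


def bfloat16_to_float(bits: int) -> float:
    """bfloat16 -> binary32 by appending 16 zero bits; this step is exact."""
    return struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))[0]


b = bfloat16_truncate(math.pi)
print(hex(b))                # 0x4049
print(bfloat16_to_float(b))  # 3.140625
```

Both directions are a single shift on the binary32 bit pattern, which is why the conversion is cheap.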

The bits are laid out as follows, compared with related formats:

Format	Sign	Exponent	Fraction (mantissa)
IEEE half-precision 16-bit float	bit 15	5 bits (bits 14 to 10)	10 bits (bits 9 to 0)
bfloat16	bit 15	8 bits (bits 14 to 7)	7 bits (bits 6 to 0)
Nvidia's TensorFloat-32 (19 bits)	bit 18	8 bits (bits 17 to 10)	10 bits (bits 9 to 0)
ATI's fp24 format ^[17]	bit 23	7 bits (bits 22 to 16)	16 bits (bits 15 to 0)
Pixar's PXR24 format	bit 23	8 bits (bits 22 to 15)	15 bits (bits 14 to 0)
IEEE 754 single-precision 32-bit float	bit 31	8 bits (bits 30 to 23)	23 bits (bits 22 to 0)

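Exponent encoding

The bfloat16 binary floating-point exponent is encoded using an offset-binary representation, with the zero offset being 127. This offset is also known as the exponent bias in the IEEE 754 standard.

E_min = 01_H − 7F_H = −126
E_max = FE_H − 7F_H = 127
Exponent bias = 7F_H = 127

Thus, to obtain the true exponent as defined by the offset-binary representation, the offset of 127 has to be subtracted from the value of the exponent field.

The minimum and maximum values of the exponent field (00_H and FF_H) are interpreted specially, as in the IEEE 754 standard formats.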

Exponent	Fraction zero	Fraction non-zero	Equation
00_H	zero, −0	subnormal numbers	(−1)^signbit × 2⁻¹²⁶ × 0.fractionbits
01_H, ..., FE_H	normalized value		(−1)^signbit × 2^{exponentbits−127} × 1.fractionbits
FF_H	±infinity	NaN (quiet, signaling)

The minimum positive normal value is 2⁻¹²⁶ ≈ 1.18 × 10⁻³⁸, and the minimum positive (subnormal) value is 2⁻¹²⁶⁻⁷ = 2⁻¹³³ ≈ 9.2 × 10⁻⁴¹.
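The interpretation rules in the table above can be sketched as a small Python decoder. The function name `decode_bfloat16` and the constant `EXPONENT_BIAS` are hypothetical names chosen for this illustration. Python floats are binary64, which represents every bfloat16 value exactly.

```python
import math

EXPONENT_BIAS = 127  # 7F_H


def decode_bfloat16(bits: int) -> float:
    """Decode a 16-bit bfloat16 pattern into a Python float."""
    sign = -1.0 if bits >> 15 else 1.0
    exponent = (bits >> 7) & 0xFF
    fraction = bits & 0x7F
    if exponent == 0xFF:
        # All exponent bits set: infinity if the fraction is zero, otherwise NaN.
        return sign * math.inf if fraction == 0 else math.nan
    if exponent == 0x00:
        # Zero or subnormal: no implicit leading 1, fixed exponent of -126.
        return sign * (fraction / 128) * 2.0 ** -126
    # Normalized value: implicit leading 1, biased exponent.
    return sign * (1 + fraction / 128) * 2.0 ** (exponent - EXPONENT_BIAS)


print(decode_bfloat16(0x0080) == 2.0 ** -126)  # True: minimum positive normal value
print(decode_bfloat16(0x0001) == 2.0 ** -133)  # True: minimum positive subnormal value
```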

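Rounding and conversion

The most common use case is conversion between IEEE 754 binary32 and bfloat16. This section describes the conversion process and the rounding scheme used in the conversion. Other conversions to or from bfloat16 are also possible, for example between int16 and bfloat16.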

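From binary32 to bfloat16. When bfloat16 was first introduced as a storage format,^[15] the conversion from IEEE 754 binary32 (32-bit floating point) to bfloat16 was truncation (round toward zero). Later, once bfloat16 became the input format of matrix multiplication units, the conversion came to use different rounding mechanisms depending on the hardware platform. For example, on Google TPUs the rounding scheme in the conversion is round-to-nearest-even.^[18] ARM uses the non-IEEE Round-to-Odd mode.^[19] Nvidia supports converting a float number to bfloat16 precision in round-to-nearest-even mode.^[20]
From bfloat16 to binary32. Because binary32 can represent every bfloat16 value exactly, the conversion simply pads the significand with 16 zero bits.^[18]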
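Round-to-nearest-even conversion from binary32 can be sketched with a common integer trick: add a bias of 0x7FFF plus the lowest retained bit, then drop the low 16 bits. This is an illustrative software sketch under that assumption, not the implementation used by any of the hardware platforms mentioned above. The function name is hypothetical, and the choice to force NaN inputs to a quiet NaN is an assumption of this sketch.

```python
import struct


def bfloat16_round_nearest_even(x: float) -> int:
    """binary32 -> bfloat16 bit pattern using round-to-nearest, ties-to-even."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        # NaN: keep the sign and set the top fraction bit so the result
        # stays a NaN instead of collapsing to infinity (assumption of this sketch).
        return ((bits >> 16) | 0x0040) & 0xFFFF
    lsb = (bits >> 16) & 1        # lowest bit that will be kept
    rounded = bits + 0x7FFF + lsb  # exact ties round toward an even kept bit
    return (rounded >> 16) & 0xFFFF


print(hex(bfloat16_round_nearest_even(1 / 3)))       # 0x3eab (truncation would give 0x3eaa)
print(hex(bfloat16_round_nearest_even(1.00390625)))  # 0x3f80 (tie, rounds to even)
print(hex(bfloat16_round_nearest_even(1.01171875)))  # 0x3f82 (tie, rounds to even)
```

Finite inputs just below the binary32 maximum round up to infinity in this sketch, which matches the "overflow to inf" behaviour quoted for TPUs in reference [18].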

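Encoding of special values

Positive and negative infinity

Just as in IEEE 754, positive and negative infinity are represented with the corresponding sign bit, all 8 exponent bits set (FF_hex), and all significand bits zero. Explicitly,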

val    s_exponent_signcnd
+inf = 0_11111111_0000000
-inf = 1_11111111_0000000

Not a Number

Just as in IEEE 754, NaN values are represented with either sign bit, all 8 exponent bits set (FF_hex), and not all significand bits zero. Explicitly,

val    s_exponent_signcnd
+NaN = 0_11111111_klmnopq
-NaN = 1_11111111_klmnopq

where at least one of k, l, m, n, o, p, or q is 1. As with IEEE 754, NaN values can be quiet or signaling, although as of September 2018 there are no known uses of signaling bfloat16 NaNs.
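A short Python sketch of how these encodings might be told apart follows. The function name `classify_bfloat16` is hypothetical. Treating the most significant fraction bit as the quiet flag is an assumption borrowed from the usual IEEE 754-2008 convention; it agrees with the qNaN and sNaN bit patterns listed in the examples of this article.

```python
def classify_bfloat16(bits: int) -> str:
    """Classify a 16-bit bfloat16 pattern as finite, infinite, or NaN."""
    exponent = (bits >> 7) & 0xFF
    fraction = bits & 0x7F
    if exponent != 0xFF:
        return "finite"
    if fraction == 0:
        return "-inf" if bits >> 15 else "+inf"
    # Assumption: top fraction bit set means quiet NaN, clear means signaling NaN.
    return "qNaN" if fraction & 0x40 else "sNaN"


for pattern in (0x7F80, 0xFF80, 0xFFC1, 0xFF81):
    print(f"{pattern:04x} -> {classify_bfloat16(pattern)}")
```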

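Range and precision

Bfloat16 is designed to maintain the number range of the 32-bit IEEE 754 single-precision floating-point format (binary32) while reducing the precision from 24 bits to 8 bits. This means that the precision is between two and three decimal digits, and that bfloat16 can represent finite values up to about 3.4 × 10³⁸.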
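The figure of two to three decimal digits follows from the significand width. As a short worked calculation (the symbol p, for the significand precision in bits, is introduced here only for illustration):

```latex
p = 8:\qquad p \log_{10} 2 \approx 8 \times 0.30103 \approx 2.41 \ \text{decimal digits},
\qquad
\text{spacing of representable values just above } 1:\ 2^{-(p-1)} = 2^{-7} = 0.0078125 .
```

For comparison, the same calculation for binary32 with p = 24 gives about 7.22 decimal digits.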

Examples

These examples show the bit representation of floating-point values in hexadecimal and binary. Each includes the sign, the (biased) exponent, and the significand.

3f80 = 0 01111111 0000000 = 1
c000 = 1 10000000 0000000 = −2

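7f7f = 0 11111110 1111111 = (2⁸ − 1) × 2⁻⁷ × 2¹²⁷ ≈ 3.38953139 × 10³⁸ (maximum finite positive value in bfloat16 precision)
0080 = 0 00000001 0000000 = 2⁻¹²⁶ ≈ 1.175494351 × 10⁻³⁸ (minimum normalized positive value in both bfloat16 precision and single-precision floating point)

The maximum positive finite value of a normal bfloat16 number is 3.38953139 × 10³⁸, slightly below (2²⁴ − 1) × 2⁻²³ × 2¹²⁷ = 3.402823466 × 10³⁸, the maximum finite positive value representable in single precision.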
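The two maxima quoted above can be checked with exact rational arithmetic. This is only a verification sketch; the variable names are illustrative.

```python
from fractions import Fraction

# Largest finite values, written exactly as in the formulas above.
bf16_max = Fraction(2**8 - 1, 2**7) * 2**127
f32_max = Fraction(2**24 - 1, 2**23) * 2**127

print(f"{float(bf16_max):.8e}")  # 3.38953139e+38
print(f"{float(f32_max):.9e}")   # 3.402823466e+38
print(bf16_max < f32_max)        # True
```

The gap arises because binary32 has 16 more significand bits set in its largest finite value, not because of any difference in exponent range.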

Zeros and infinities

0000 = 0 00000000 0000000 = 0
8000 = 1 00000000 0000000 = −0

7f80 = 0 11111111 0000000 = infinity
ff80 = 1 11111111 0000000 = −infinity

Special values

4049 = 0 10000000 1001001 = 3.140625 ≈ π (pi)
3eab = 0 01111101 0101011 = 0.333984375 ≈ 1/3

NaNs

ffc1 = x 11111111 1000001 => qNaN
ff81 = x 11111111 0000001 => sNaN

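See also

Half-precision floating-point format: a 16-bit float with a 1-bit sign, 5-bit exponent, and 11-bit significand, as defined by IEEE 754
ISO/IEC 10967, Language Independent Arithmetic
Primitive data type
Minifloat
Google Brain
Lawsuit against Google over its use of bfloat16 in TPUs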

References

[1] Teich, Paul (10 May 2018). “Tearing Apart Google's TPU 3.0 AI Coprocessor”. The Next Platform. Retrieved 11 August 2020. Google invented its own internal floating point format called "bfloat" for "brain floating point" (after Google Brain).
[2] Wang, Shibo; Kanwar, Pankaj (23 August 2019). “BFloat16: The secret to high performance on Cloud TPUs”. Google Cloud. Retrieved 11 August 2020. This custom floating point format is called "Brain Floating Point Format," or "bfloat16" for short. The name flows from "Google Brain", which is an artificial intelligence research group at Google where the idea for this format was conceived.
[3] Tagliavini, Giuseppe; Mach, Stefan; Rossi, Davide; Marongiu, Andrea; Benin, Luca (2018). “A transprecision floating-point platform for ultra-low power computing”. 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE). pp. 1051–1056. arXiv:1711.10374. doi:10.23919/DATE.2018.8342167. ISBN 978-3-9819263-0-9. S2CID 5067903.
[4] Dr. Ian Cutress (17 March 2020). “Intel': Cooper lake Plans: Why is BF16 Important?”. Archived from the original on 18 March 2020. Retrieved 12 May 2020. The bfloat16 standard is a targeted way of representing numbers that give the range of a full 32-bit number, but in the data size of a 16-bit number, keeping the accuracy close to zero but being a bit more loose with the accuracy near the limits of the standard. The bfloat16 standard has a lot of uses inside machine learning algorithms, by offering better accuracy of values inside the algorithm while affording double the data in any given dataset (or doubling the speed in those calculation sections).
[5] Khari Johnson (23 May 2018). “Intel unveils Nervana Neural Net L-1000 for accelerated AI training”. VentureBeat. Retrieved 23 May 2018. ...Intel will be extending bfloat16 support across our AI product lines, including Intel Xeon processors and Intel FPGAs.
[6] Michael Feldman (23 May 2018). “Intel Lays Out New Roadmap for AI Portfolio”. TOP500 Supercomputer Sites. Retrieved 23 May 2018. Intel plans to support this format across all their AI products, including the Xeon and FPGA lines
[7] Lucian Armasu (23 May 2018). “Intel To Launch Spring Crest, Its First Neural Network Processor, In 2019”. Tom's Hardware. Retrieved 23 May 2018. Intel said that the NNP-L1000 would also support bfloat16, a numerical format that’s being adopted by all the ML industry players for neural networks. The company will also support bfloat16 in its FPGAs, Xeons, and other ML products. The Nervana NNP-L1000 is scheduled for release in 2019.
[8] “Available TensorFlow Ops | Cloud TPU | Google Cloud”. Google Cloud. Retrieved 23 May 2018. This page lists the TensorFlow Python APIs and graph operators available on Cloud TPU.
[9] Elmar Haußmann (26 April 2018). “Comparing Google's TPUv2 against Nvidia's V100 on ResNet-50”. RiseML Blog. Archived from the original on 26 April 2018. Retrieved 23 May 2018. For the Cloud TPU, Google recommended we use the bfloat16 implementation from the official TPU repository with TensorFlow 1.7.0. Both the TPU and GPU implementations make use of mixed-precision computation on the respective architecture and store most tensors with half-precision.
[10] Tensorflow Authors (23 July 2018). “ResNet-50 using BFloat16 on TPU”. Google. Retrieved 6 November 2018.
[11] “BFloat16 extensions for Armv8-A”. community.arm.com. 29 August 2019. Retrieved 30 August 2019.
[12] “AArch64: add support for newer Apple CPUs · llvm/llvm-project@677da09”. GitHub. Retrieved 8 May 2023.
[13] “CUDA Library bloat16 Intrinsics”.
[14] “ROCm version history”. github.com. Retrieved 23 October 2019.
[15] Joshua V. Dillon, Ian Langmore, Dustin Tran, Eugene Brevdo, Srinivas Vasudevan, Dave Moore, Brian Patton, Alex Alemi, Matt Hoffman, Rif A. Saurous (28 November 2017). TensorFlow Distributions (Report). arXiv:1711.10604. Bibcode:2017arXiv171110604D. Accessed 2018-05-23. All operations in TensorFlow Distributions are numerically stable across half, single, and double floating-point precisions (as TensorFlow dtypes: tf.bfloat16 (truncated floating point), tf.float16, tf.float32, tf.float64). Class constructors have a validate_args flag for numerical asserts
[16] “Livestream Day 1: Stage 8 (Google I/O '18) - YouTube”. Google. 8 May 2018. Retrieved 23 May 2018. In many models this is a drop-in replacement for float-32
[17] Buck, Ian (13 March 2005), “Chapter 32. Taking the Plunge into GPU Computing”, in Pharr, Matt, GPU Gems, Addison-Wesley, ISBN 0-321-33559-7. Retrieved 5 April 2018.
[18] “The bfloat16 numerical format”. Google Cloud. Retrieved 11 July 2023. On TPU, the rounding scheme in the conversion is round to nearest even and overflow to inf.
[19] “Arm A64 Instruction Set Architecture”. developer.arm.com. Retrieved 26 July 2023. Uses the non-IEEE Round-to-Odd rounding mode.
[20] “1.3.5. Bfloat16 Precision Conversion and Data Movement” (PDF). docs.nvidia.com. p. 199. Retrieved 26 July 2023. Converts float number to nv_bfloat16 precision in round-to-nearest-even mode and returns nv_bfloat16 with converted value.

