Apache Iceberg

Apache Iceberg is an open table format for huge, slow-changing analytic datasets.

Because Iceberg is designed for large datasets, it can be used in production where a single table is measured in petabytes. Iceberg is also designed to solve correctness problems in eventually consistent cloud object stores. It is cloud-platform agnostic. Table changes are atomic, so readers never see partial changes. When concurrent writers conflict, a writer retries its commit to ensure that its update is compatible with the changes already committed.
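
The retry behavior described above is a form of optimistic concurrency control. The Python below sketches it under simplifying assumptions:

- An in-memory catalog (`InMemoryCatalog`) holds a pointer to the current table metadata.
- A commit replaces that pointer with an atomic compare-and-swap (`swap_if_current`).
- On conflict, the writer reloads the latest metadata and re-applies its change.

Every class and function here is hypothetical and is not Iceberg's API. It only illustrates the idea of atomic commits with retry.

```python
import threading
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass(frozen=True)
class TableMetadata:
    """Hypothetical immutable table state: a version number and its data files."""
    version: int
    files: frozenset = field(default_factory=frozenset)


class CommitConflict(Exception):
    """Raised when the metadata a writer started from is no longer current."""


class InMemoryCatalog:
    """Hypothetical catalog that holds a pointer to the current metadata.

    Readers load a whole TableMetadata object, so they see either the old
    state or the new one, never a partial change.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._current = TableMetadata(version=0)

    def load(self) -> TableMetadata:
        with self._lock:
            return self._current

    def swap_if_current(self, base: TableMetadata, new: TableMetadata) -> None:
        # Atomic compare-and-swap: commit only if nobody else committed since `base`.
        with self._lock:
            if self._current is not base:
                raise CommitConflict(
                    f"base v{base.version} is stale; current is v{self._current.version}"
                )
            self._current = new


Operation = Callable[[TableMetadata], TableMetadata]


def commit_with_retry(
    catalog: InMemoryCatalog, apply: Operation, max_retries: int = 4
) -> TableMetadata:
    """Optimistically commit `apply`, re-applying it to fresh metadata on conflict."""
    for _ in range(max_retries + 1):
        base = catalog.load()
        new = apply(base)  # may raise ValueError if the change no longer fits `base`
        try:
            catalog.swap_if_current(base, new)
            return new
        except CommitConflict:
            continue  # another writer committed first; reload and try again
    raise CommitConflict(f"gave up after {max_retries} retries")


def append(files: Iterable[str]) -> Operation:
    """Adding files is always compatible with newer metadata."""
    added = frozenset(files)
    return lambda base: TableMetadata(base.version + 1, base.files | added)


def delete(files: Iterable[str]) -> Operation:
    """Removing files is only compatible if they are all still present."""
    removed = frozenset(files)

    def apply(base: TableMetadata) -> TableMetadata:
        missing = removed - base.files
        if missing:
            raise ValueError(f"incompatible update: {sorted(missing)} already removed")
        return TableMetadata(base.version + 1, base.files - removed)

    return apply
```

An append re-applies cleanly on top of newer metadata, so on conflict it simply retries. The hypothetical `delete` operation behaves differently. If a file it wants to remove has already been removed by another writer, it raises `ValueError` instead of retrying. This models an update that is no longer compatible with the committed state.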

Schemas

Type          Description                                    Notes
------------  ---------------------------------------------  --------------------------------------------
boolean       True or false
int           32-bit signed integers                         Can promote to long
long          64-bit signed integers
float         32-bit IEEE 754 floating point                 Can promote to double
double        64-bit IEEE 754 floating point
decimal(P,S)  Fixed-point decimal; precision P, scale S      Scale is fixed; precision must be 38 or less
date          Calendar date without timezone or time
time          Time of day without date or timezone           Stored as microseconds
timestamp     Timestamp without timezone                     Stored as microseconds
timestamptz   Timestamp with timezone                        Stored as microseconds
string        Arbitrary-length character sequences           Encoded with UTF-8
fixed(L)      Fixed-length byte array of length L
binary        Arbitrary-length byte array
struct<...>   A record with named fields of any data type
list<E>       A list with elements of any data type
map<K, V>     A map with keys and values of any data type
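
The promotion and storage notes in the table above can be made concrete with a small sketch. This Python is illustrative only. `can_promote`, `parse_decimal`, `time_to_micros`, and `timestamp_to_micros` are hypothetical helpers, not part of any Iceberg library. The sketch covers:

- the two primitive promotions listed in the table (int to long, float to double);
- the 38-digit precision limit for decimals;
- one rule not stated in the table: Iceberg's schema evolution also allows widening a decimal's precision, as long as the scale is unchanged;
- what "stored as microseconds" means for time and timestamp values.

```python
import re
from datetime import datetime, time, timedelta, timezone
from typing import Optional, Tuple

# Promotions listed in the "Notes" column of the types table.
PRIMITIVE_PROMOTIONS = {("int", "long"), ("float", "double")}
MAX_DECIMAL_PRECISION = 38
_DECIMAL_RE = re.compile(r"decimal\(\s*(\d+)\s*,\s*(\d+)\s*\)")
_EPOCH = datetime(1970, 1, 1)
_ONE_MICRO = timedelta(microseconds=1)


def parse_decimal(type_name: str) -> Optional[Tuple[int, int]]:
    """Return (precision, scale) for 'decimal(P,S)', or None for other types."""
    match = _DECIMAL_RE.fullmatch(type_name.strip())
    if match is None:
        return None
    precision, scale = int(match.group(1)), int(match.group(2))
    if precision > MAX_DECIMAL_PRECISION:
        raise ValueError(f"decimal precision must be 38 or less, got {precision}")
    return precision, scale


def can_promote(source: str, target: str) -> bool:
    """Sketch: may a column of type `source` be changed to type `target`?"""
    if source == target or (source, target) in PRIMITIVE_PROMOTIONS:
        return True
    src_dec, dst_dec = parse_decimal(source), parse_decimal(target)
    if src_dec is not None and dst_dec is not None:
        # Scale is fixed; only the precision may widen.
        return src_dec[1] == dst_dec[1] and dst_dec[0] >= src_dec[0]
    return False


def time_to_micros(value: time) -> int:
    """Encode a time of day (no date, no timezone) as microseconds since midnight."""
    seconds = (value.hour * 60 + value.minute) * 60 + value.second
    return seconds * 1_000_000 + value.microsecond


def timestamp_to_micros(value: datetime) -> int:
    """Encode a timestamp as microseconds since 1970-01-01 00:00:00.

    A naive datetime models `timestamp` (no timezone). An aware datetime
    models `timestamptz` and is measured from the UTC epoch.
    """
    epoch = _EPOCH if value.tzinfo is None else _EPOCH.replace(tzinfo=timezone.utc)
    return (value - epoch) // _ONE_MICRO
```

For example, `can_promote("decimal(10,2)", "decimal(12,2)")` is allowed because only the precision grows. Changing the scale is rejected, as is narrowing `long` back to `int`. A naive `datetime` is encoded as given, with no zone attached. An aware `datetime` is effectively normalized to UTC before encoding, matching `timestamptz`.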