Apache Iceberg 1.11: Deletion Vectors and the V3 Upgrade

Apache Iceberg 1.11 graduates V3 to production with deletion vectors and server-side scan planning

Apache Iceberg 1.11.0 shipped May 19, and it does something no previous release managed: it graduates the V3 table spec from experimental to production-ready. The headline change is deletion vectors — a fundamentally different approach to row-level deletes that AWS benchmarks at up to 10x faster DML than the positional delete files most teams are running today. But the upgrade is irreversible, and any engine without a V3-compatible connector breaks the moment you flip the switch. Here’s what changed, what breaks, and the exact steps to take before touching your production tables.

Deletion Vectors Replace Positional Delete Files

If you’ve run a high-write Iceberg table in merge-on-read mode, you’ve seen the problem: every UPDATE and DELETE creates new positional delete files, and those files accumulate between compaction runs. At query time, the engine has to merge-join all of them against your data files, and that work grows with the number of accumulated delete files and deleted positions, so it gets worse with every write. Compaction helps, but only until the next batch of deletes arrives.

Deletion vectors fix this at the design level. Each data file has at most one deletion vector per snapshot: a Roaring bitmap, stored as a blob in a Puffin file, with a set bit at every deleted row position. At read time, the engine checks each row position against the bitmap, a roughly constant-time lookup per row. No file accumulation. No growing merge-join cost. AWS measured up to 10x DML performance improvement on EMR 7.11.
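To make the difference concrete, here is a minimal sketch of the two read paths. It is not Iceberg's implementation: a plain Python integer stands in for the Roaring bitmap, and the file names, row values, and helper names are all hypothetical.

```python
def merge_positional_deletes(delete_files, data_file):
    """V2-style read: scan every delete file and collect positions for data_file."""
    deleted = set()
    for entries in delete_files:          # one list of (path, position) per delete file
        for path, pos in entries:
            if path == data_file:
                deleted.add(pos)
    return deleted


class DeletionVector:
    """V3-style: one bitmap per data file (a Python int stands in for Roaring)."""

    def __init__(self):
        self.bits = 0

    def delete(self, pos):
        self.bits |= 1 << pos             # set the bit at the deleted row position

    def is_deleted(self, pos):
        return (self.bits >> pos) & 1 == 1


def read_live_rows(rows, dv):
    """Apply the bitmap while scanning: one membership check per row."""
    return [row for pos, row in enumerate(rows) if not dv.is_deleted(pos)]


rows = ["a", "b", "c", "d", "e"]          # hypothetical contents of f1.parquet
delete_files = [
    [("f1.parquet", 1)],
    [("f1.parquet", 3), ("f2.parquet", 0)],
]

# Fold the accumulated positional deletes for f1.parquet into a single vector.
dv = DeletionVector()
for pos in merge_positional_deletes(delete_files, "f1.parquet"):
    dv.delete(pos)

live = read_live_rows(rows, dv)
print(live)
```

The V2 path has to touch every delete file on every read, while the V3 path consults a single structure per data file regardless of how many delete operations produced it.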

The catch: deletion vectors only work on V3-format tables, and upgrading is a one-way operation.

-- Check your current format version
SHOW TBLPROPERTIES my_table ('format-version');

-- Upgrade to V3 — this cannot be reversed
ALTER TABLE my_table SET TBLPROPERTIES ('format-version' = '3');

After the upgrade, new writes use deletion vectors automatically. Existing positional delete files are cleaned up on the next compaction run. The hard requirement before you flip that switch: every engine, tool, and connector in your stack must support Iceberg V3. Engines that only understand V2 cannot read V3 tables.
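Because the upgrade cannot be undone, it can help to gate it behind an explicit inventory check. The sketch below assumes you maintain such an inventory yourself; the engine names and version numbers are placeholders, not statements about any real product.

```python
# Hypothetical inventory: reader name -> highest Iceberg format version it can read.
STACK = {
    "batch-engine": 3,
    "streaming-engine": 3,
    "bi-connector": 2,
}


def upgrade_blockers(stack, target_version=3):
    """Return the readers that cannot handle tables at target_version."""
    return sorted(
        name for name, max_version in stack.items() if max_version < target_version
    )


blockers = upgrade_blockers(STACK)
if blockers:
    print("Do not upgrade yet; these readers lack V3 support:", ", ".join(blockers))
else:
    print("All readers support V3; the ALTER TABLE can be scheduled.")
```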

Server-Side Scan Planning Unlocks Cross-Engine Governance

The second major addition is quieter but arguably more important for enterprise deployments. Iceberg 1.11 ships a new REST catalog endpoint — POST …/plan — that lets the catalog server plan a query scan and return only the relevant FileScanTasks. Previously, every engine pulled full manifest files over the network, filtered them locally, and then fetched data file metadata. For large tables, that’s a significant amount of metadata transfer before a single row gets processed.

Server-side scan planning flips that model. The catalog handles manifest traversal and returns a lean, filtered task list. Smaller scans return results immediately. Large scans return a poll ID. Massive scans split into parallel task endpoints.
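A client has to branch on which of those three shapes comes back. The sketch below shows that branching on already-parsed JSON, with no network calls. The field names (`status`, `plan-id`, `plan-tasks`, `file-scan-tasks`) are my reading of the REST catalog specification and should be treated as assumptions to check against the published OpenAPI document; `next_step` is a hypothetical helper.

```python
def next_step(plan_response):
    """Decide what a client should do with a scan-planning response (assumed shape)."""
    status = plan_response.get("status")

    if status == "submitted":
        # Large scan: planning continues on the server, so poll with the returned ID.
        return {"action": "poll", "plan_id": plan_response["plan-id"]}

    if status == "completed":
        plan_tasks = plan_response.get("plan-tasks", [])
        file_scan_tasks = plan_response.get("file-scan-tasks", [])
        if plan_tasks:
            # Very large scan: each plan task is fetched separately, possibly in parallel.
            return {"action": "fetch_tasks", "plan_tasks": plan_tasks,
                    "file_scan_tasks": file_scan_tasks}
        # Small scan: the tasks to read are already in the response.
        return {"action": "read", "file_scan_tasks": file_scan_tasks}

    raise ValueError(f"unexpected planning status: {status!r}")


small = next_step({"status": "completed", "file-scan-tasks": [{"data-file": "f1"}]})
large = next_step({"status": "submitted", "plan-id": "abc123"})
huge = next_step({"status": "completed", "plan-tasks": ["t1", "t2"]})
print(small["action"], large["action"], huge["action"])
```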

The more compelling application is governance. Databricks Unity Catalog is already shipping cross-engine attribute-based access control (ABAC) in beta using this API. Administrators define row filters and column masks once in Unity Catalog. When any Iceberg engine requests a scan, UC enforces those policies server-side and sends back a pre-filtered plan. The engine sees only authorized rows — whether it’s Spark, DuckDB, or Flink. Define your security policy once, enforce it everywhere.

Variant Type Kills the JSON-as-String Workaround

Anyone processing semi-structured data in Iceberg V2 knows the workaround: store JSON event logs as a STRING column. It works for storage. It fails for analytics. The engine cannot push predicates down into a string column that contains JSON, so every query that filters on a nested field reads and parses the entire column.

Iceberg 1.11 introduces the native Variant type as part of V3, solving this at the storage level. Variant uses a binary encoding that supports predicate pushdown. The optional shredding feature, fully supported in Spark 4.1, extracts frequently queried nested fields into separate typed Parquet sub-columns at write time while keeping a residual binary blob for everything else. The result: Parquet column statistics work on your nested fields, and the engine can prune files based on semi-structured content. Your event log pipeline no longer needs separate raw and curated table layers.
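The idea behind shredding can be sketched in a few lines. This is an illustration of the concept only, not the Variant binary encoding or the Parquet layout: plain Python lists stand in for typed sub-columns, a dict stands in for the residual blob, and every name here is hypothetical.

```python
import json


def shred(json_rows, typed_fields):
    """Split each record into typed columns plus a residual for everything else."""
    columns = {name: [] for name in typed_fields}
    residual = []
    for raw in json_rows:
        obj = json.loads(raw)
        for name, expected_type in typed_fields.items():
            if isinstance(obj.get(name), expected_type):
                columns[name].append(obj.pop(name))
            else:
                columns[name].append(None)   # missing or wrong type: stays in residual
        residual.append(obj)
    return columns, residual


def min_max(column, residual, name):
    """Min/max for a shredded column, or None if the stats would be incomplete."""
    if any(name in rest for rest in residual):
        return None                          # some values were not shredded
    present = [v for v in column if v is not None]
    return (min(present), max(present)) if present else None


def may_match(stats, value):
    """Conservative pruning check for an equality predicate."""
    if stats is None:
        return True                          # unknown: the file must be read
    low, high = stats
    return low <= value <= high


rows = [
    '{"user_id": 7, "event": "click", "meta": {"x": 1}}',
    '{"user_id": 12, "event": "view"}',
]
columns, residual = shred(rows, {"user_id": int, "event": str})
stats = min_max(columns["user_id"], residual, "user_id")
print(columns["user_id"], residual, stats)
print(may_match(stats, 99), may_match(stats, 10))
```

With the JSON stored as a string, a filter such as `user_id = 99` forces a full read. With a shredded `user_id` column, the min/max range alone is enough to skip this file.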

Table Encryption Is Now Production-Ready

Iceberg 1.11 ships KMS-backed table encryption using an envelope model. Each metadata file is encrypted with a unique data encryption key (DEK), and those DEKs are wrapped by a master key in your KMS. Crucially, Iceberg encrypts manifest lists too — so even direct bucket access reveals nothing about table statistics or schema structure.

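Storage-layer encryption (S3 SSE, GCS CMEK) does not protect against someone with read access to the bucket, because the storage service decrypts objects transparently for any authorized reader. KMS-backed table encryption does: without access to the master key in the KMS, the files stay unreadable. For teams in regulated industries or handling sensitive schemas, this closes a real gap.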
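The envelope model itself is simple to sketch. The code below shows only the key hierarchy (a fresh DEK per file, wrapped by a master key that never leaves the KMS). It uses a toy hash-based XOR stream so that it runs on the standard library alone; that cipher is not secure and is not what Iceberg uses. A real deployment would rely on an authenticated cipher such as AES-GCM and an actual KMS client. `ToyKms` and the helper names are hypothetical.

```python
import hashlib
import secrets


def _xor_stream(key, nonce, data):
    """Toy stream cipher for illustration only. NOT secure; do not reuse."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class ToyKms:
    """Stands in for a real KMS: the master key never leaves this object."""

    def __init__(self):
        self._master_key = secrets.token_bytes(32)

    def wrap(self, dek):
        nonce = secrets.token_bytes(16)
        return nonce + _xor_stream(self._master_key, nonce, dek)

    def unwrap(self, wrapped):
        return _xor_stream(self._master_key, wrapped[:16], wrapped[16:])


def encrypt_file(kms, plaintext):
    dek = secrets.token_bytes(32)            # fresh data encryption key per file
    nonce = secrets.token_bytes(16)
    return {
        "wrapped_dek": kms.wrap(dek),        # only the wrapped key is stored
        "nonce": nonce,
        "ciphertext": _xor_stream(dek, nonce, plaintext),
    }


def decrypt_file(kms, blob):
    dek = kms.unwrap(blob["wrapped_dek"])    # requires access to the KMS
    return _xor_stream(dek, blob["nonce"], blob["ciphertext"])


kms = ToyKms()
manifest = b"hypothetical manifest bytes with column stats"
blob = encrypt_file(kms, manifest)
print(decrypt_file(kms, blob) == manifest)
```

Someone holding only the stored blob sees ciphertext and a wrapped key. Reading the file requires a successful unwrap call, which is where the KMS enforces access.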

Upgrade Checklist Before Bumping to 1.11

Aside from the V3 format migration, two practical changes affect anyone upgrading:

Artifact rename for Spark 4.1: Iceberg ships separate Maven artifacts per engine version. Update your dependency before upgrading:

<!-- Old: Spark 3.5 -->
<dependency>
  <groupId>org.apache.iceberg</groupId>
  <artifactId>iceberg-spark-3.5_2.12</artifactId>
  <version>1.10.0</version>
</dependency>

<!-- New: Spark 4.1 -->
<dependency>
  <groupId>org.apache.iceberg</groupId>
  <artifactId>iceberg-spark-4.1_2.13</artifactId>
  <version>1.11.0</version>
</dependency>

JDK 17 baseline: Spark 4.1 and Flink 2.1 both require JDK 17. If your cluster or CI environment is on JDK 11, that needs to move first. Flink 2.1 also adds nanosecond timestamp precision (timestamp_ns, timestamptz_ns), which is useful for streaming ingest pipelines that need sub-microsecond event precision.

What’s Next: V4 Foundations Are Already In

Iceberg 1.11 ships early Java interfaces for the V4 spec — not user-facing yet, but the foundation is committed. Likely V4 features include single-file commits and Parquet-native metadata storage, which would further cut the overhead of frequent small writes. No production timeline is announced yet. The official announcement is the right place to track that progress.

For now, if your engines support V3 and you’re managing high-write tables, Iceberg 1.11 is the upgrade that changes the economics of running Iceberg at scale. The deletion vector benchmark alone justifies the migration — just audit your connector versions first.

ByteBot