On SingleStore native integration with Apache Iceberg. Q&A with Dave Eyler

Q1. You just announced a native integration with Apache Iceberg. What is Apache Iceberg and what is it useful for?

Apache Iceberg is the most popular open-source table format for storing massive data sets and making them available to a variety of engines. Now that the biggest lakehouse vendors have embraced Iceberg, it’s finally possible to write data with one system and read the same data with another. This gives developers tremendous flexibility to pick the engine of their choice for various workloads on the same data.

Q2. What does it mean to build real-time intelligent applications on Iceberg data?

Customers expect more from their applications every year. Whereas in the past it was enough to simply perform quick point reads and writes, in 2024 expectations for apps are much higher. They are expected to use generative AI, which requires vector search and full-text search, as well as offer historical views, which need analytical queries and fast ingest. Often all of this must be done on real-time data and on JSON data, which only raises the bar further. At SingleStore, we refer to these apps as “intelligent applications”. Much of the data needed to power these features can be found in lakehouses that support Iceberg, and SingleStore is poised to capitalize on this opportunity with its new bi-directional Iceberg integration.

Q3. Why do you believe enterprises face the critical challenge where an estimated 90% of data remains “frozen” in lakehouses?

The challenge enterprises face is that the data platforms they use to write most of their Iceberg data, while amazing in other respects, most likely do not support the capabilities above at scale, with the high concurrency and price/performance that SingleStore offers. That’s what we mean when we say the data is “frozen” – it can be used for some workloads that don’t have tight SLAs – but it can’t be used to power intelligent applications.

Q4. Is this new integration going to help overcome this challenge? How?

Absolutely. Now that we’ve made it easy to ingest Iceberg data into SingleStore without any ETL tools or costly queries on the source system, there is nothing stopping developers from taking advantage of SingleStore’s cutting-edge performance capabilities on Iceberg data. Also, since SingleStore can write back to the Glue catalog (with more catalogs coming soon), developers will be able to support use cases where real-time data lands in SingleStore first and is shared back with the ecosystem.

Q6. Is this going to replace extensive ETL workflows and compute-intensive Spark jobs, or anything else?

Raw data will still need to be processed by Spark and other technologies before it gets stored in Iceberg, but once it’s there, it can be used by SingleStore or any other Iceberg-compatible data platform. So while those ETL jobs will remain, the jobs that were needed simply to move the same data from one data platform to another will no longer be necessary.

Q7. Some of this offering is available in public preview and some in private preview. What does it mean in practice?

The Zero ETL ingest from Iceberg feature is in public preview, which means it’s available now, in 8.7, for anyone to try on the Glue or Snowflake catalogs.  Writing back to Iceberg tables for the Glue catalog is in private preview, which means customers can contact us if they’d like to try it.  This approach allows us to gather customer feedback in a proactive and controlled manner, so we can improve the user experience and make the integration as seamless as possible. 

Q8. Will you continue to invest in Iceberg after this release?

We are just getting started!  In addition to adding Polaris and Hive support for the above features, later this year we will be adding external tables, which will let customers query Iceberg data directly without creating a local copy. This feature will allow users to access and analyze data stored in Apache Iceberg tables seamlessly from SingleStore, eliminating the need for data duplication and reducing storage overhead.

Q9. Apart from the Iceberg integration, what else did you announce?

In addition to the Iceberg integration, we’re releasing multiple product enhancements that will make enterprise-grade intelligent app development more efficient while saving time and money, including: 

  • Faster vector search capabilities. Our state-of-the-art vector performance continues to set industry-leading standards. We added HNSW vector search that is 40% faster than the earlier version in 8.5, released in January. In addition, our IVF flat index performs 47 to 100x faster than pgvector, and our vector indexing time is 2-3x shorter than with Milvus and pgvector. Our vector capabilities will support developers as they continue to work toward real-time AI. 
  • High-relevance full-text search. In this latest product iteration, we’re improving relevance scoring, phonetic similarity, fuzzy matching and keyword-proximity-based ranking within our data platform. These features are available in public preview, allowing users to simplify their data architecture by bypassing the need to add specialty databases (e.g. Elasticsearch, Pinecone or Milvus) to power gen AI apps. 
  • Autoscaling. Scaling resources effectively for complex app development is not an easy task. When user growth accelerates or complex analytical requirements arise, app performance tends to suffer, which hurts overall revenue growth by delaying timelines and adding licensing costs and management overhead. Our new autoscaling and fast scaling capabilities mitigate these challenges by enabling users to scale compute resources up or down in minutes to adjust effectively to unpredictable workloads. 
  • Helios® — BYOC deployment. Customers using AWS can now access a fully managed cloud offering (Helios®— BYOC) within SingleStore’s real-time data platform, increasing the ease of management and elastic scalability of the platform. This means customers can manage their data within SingleStore in their own virtual private cloud (VPC) while still complying with data residency and governance policies. 
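
For readers unfamiliar with the IVF flat index mentioned under vector search above, the following is a toy Python sketch of the general technique. It is not SingleStore's implementation, and every name in it (build_ivf_flat, search, n_lists, n_probe) is hypothetical. The idea it illustrates: vectors are grouped around a small set of centroids, and a query compares itself only against the vectors in the few nearest groups rather than against the whole data set.

```python
import math
import random


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(vectors, centroids):
    """Bucket each vector index under its nearest centroid."""
    buckets = [[] for _ in centroids]
    for idx, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: l2(v, centroids[c]))
        buckets[nearest].append(idx)
    return buckets


def build_ivf_flat(vectors, n_lists, iters=10, seed=0):
    """Toy index build: a few k-means rounds, then one bucket per centroid."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, n_lists)]
    dim = len(vectors[0])
    for _ in range(iters):
        buckets = assign(vectors, centroids)
        for c, members in enumerate(buckets):
            if members:  # an empty bucket keeps its previous centroid
                centroids[c] = [
                    sum(vectors[i][d] for i in members) / len(members)
                    for d in range(dim)
                ]
    # Final assignment so the buckets match the final centroids.
    return centroids, assign(vectors, centroids)


def search(query, vectors, centroids, buckets, k=3, n_probe=1):
    """Scan only the n_probe buckets whose centroids are closest to the query."""
    probe = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))
    candidates = [i for c in probe[:n_probe] for i in buckets[c]]
    return sorted(candidates, key=lambda i: l2(query, vectors[i]))[:k]
```

In this sketch, a larger n_probe scans more buckets and so trades speed for recall; setting n_probe equal to n_lists degenerates into an exact brute-force scan.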

Qx. Anything else you wish to add?

We’re excited about how these new features will help our customers develop innovative and intelligent applications. To learn more about these products, read this blog post. Another way to keep up with all things SingleStore is to follow us on X, LinkedIn, Facebook and Instagram.

…………………………………….

Dave Eyler, VP of Product at SingleStore.


Sponsored by SingleStore
