Saturday, May 25, 2019

Databricks Open Sources Delta Lake for Data Lake Reliability

BWW News Desk
April 24, 2019

Databricks, the leader in Unified Analytics and founded by the original creators of Apache Spark, today announced a new open source project called Delta Lake to deliver reliability to data lakes. Delta Lake is the first production-ready open source technology to provide data lake reliability for both batch and streaming data. This new open source project will enable organizations to transform their existing messy data lakes into clean Delta Lakes with high-quality data, thereby accelerating their data and machine learning initiatives.

Watch the Spark + AI Summit 2019 keynotes live.

While attractive as an initial sink for data, data lakes suffer from data reliability challenges. Unreliable data in data lakes prevents organizations from deriving business insights quickly and significantly slows down strategic machine learning initiatives. These reliability challenges stem from failed writes, schema mismatches, and data inconsistencies that arise when batch and streaming data are mixed and when multiple writers and readers access the data simultaneously.

"Today nearly every company has a data lake they are trying to gain insights from, but data lakes have proven to lack data reliability. Delta Lake has eliminated these challenges for hundreds of enterprises. By making Delta Lake open source, developers will be able to easily build reliable data lakes and turn them into Delta Lakes," said Ali Ghodsi, cofounder and CEO at Databricks.

Delta Lake delivers reliability by managing transactions across streaming and batch data and across multiple simultaneous readers and writers. Delta Lakes can be easily plugged into any Apache Spark job as a data source, enabling organizations to gain data reliability with minimal change to their data architectures. With Delta Lake, organizations no longer need to spend resources building complex and fragile data pipelines to move data across systems. Instead, developers can have hundreds of applications reliably upload and query data at scale.

With Delta Lake, developers will be able to undertake local development and debugging on their laptops to quickly develop data pipelines. They will be able to access earlier versions of their data for audits, rollbacks, or reproducing machine learning experiments. They will also be able to convert their existing Parquet files (Parquet is a commonly used format for storing large datasets) to Delta Lakes in-place, avoiding the need to read and rewrite large amounts of data.

The Delta Lake project can be found at delta.io and is available under the permissive Apache 2.0 license. This technology is deployed in production by organizations such as Viacom, Edmunds, Riot Games, and McGraw Hill.

"We've believed right from the onset that innovation happens in collaboration, not isolation. This belief led to the creation of the Spark project and MLflow. Delta Lake will foster a thriving community of developers collaborating to improve data lake reliability and accelerate machine learning initiatives," added Ghodsi.

For more information on Delta Lake, follow @DeltaLakeOSS on Twitter.

About Databricks

Databricks' mission is to accelerate innovation for its customers by unifying Data Science, Engineering, and Business. Founded by the original creators of Apache Spark, Databricks provides a Unified Analytics Platform for data science teams to collaborate with data engineering and lines of business to build data products. Users achieve faster time-to-value with Databricks by creating analytic workflows that go from ETL and interactive exploration to production. The company also makes it easier for its users to focus on their data by providing a fully managed, scalable, and secure cloud infrastructure that reduces operational complexity and total cost of ownership. Databricks has secured investments from Andreessen Horowitz, Coatue Management, Microsoft, New Enterprise Associates (NEA), Battery Ventures, Green Bay Ventures, and Geodesic, among others, and has a global customer base that includes Viacom, Shell, and HP.

Apache, Apache Spark, and Spark are trademarks of the Apache Software Foundation.

SOURCE: BUSINESS WIRE. ©2015 Business Wire
