We are proud to announce that Apache Spark won the 2016 CloudSort Benchmark (both Daytona and Indy categories). A joint team from Nanjing University, Alibaba Group, and Databricks Inc. entered the competition using NADSort, a distributed sorting program built on top of Spark, and set a new world record as the most cost-efficient way to sort 100TB of data.
Spark 2.0.2 released
We are happy to announce the availability of Apache Spark 2.0.2! This maintenance release includes fixes across several areas of Spark, as well as Kafka 0.10 and runtime metrics support for Structured Streaming.
Spark 1.6.3 released
We are happy to announce the availability of Spark 1.6.3! This maintenance release includes fixes across several areas of Spark.
Spark 2.0.1 released
We are happy to announce the availability of Apache Spark 2.0.1! Visit the release notes to read about the new features, or download the release today.
Spark 2.0.0 released
We are happy to announce the availability of Spark 2.0.0! Visit the release notes to read about the new features, or download the release today.
Spark 1.6.2 released
We are happy to announce the availability of Spark 1.6.2! This maintenance release includes fixes across several areas of Spark.
Call for Presentations for Spark Summit EU is Open
Call for presentations is now open for Spark Summit EU! The event will take place on October 25-27 in Brussels. Submissions are welcome across a variety of Spark-related topics, including applications, development, data science, enterprise, Spark ecosystem and research. Please submit by July 1 to be considered.
Preview release of Spark 2.0
To enable wide-scale community testing of the upcoming Spark 2.0 release, the Apache Spark team has posted a preview release of Spark 2.0. This preview is not a stable release in terms of either API or functionality, but it is meant to give the community early access to try the code that will become Spark 2.0. If you would like to test the release, simply download it, and send feedback using either the mailing lists or JIRA.
Spark Summit (June 6, 2016, San Francisco) agenda posted
The agenda for Spark Summit 2016 is now available! The summit kicks off on June 6th with a full day of Spark training followed by more than 90 talks featuring speakers from Airbnb, Baidu, Bloomberg, Databricks, Duke, IBM, Microsoft, Netflix, Uber, and UC Berkeley. Check out the full schedule and register to attend!
Spark 1.6.1 released
We are happy to announce the availability of Spark 1.6.1! This maintenance release includes fixes across several areas of Spark, including significant updates to the experimental Dataset API.
Submission is open for Spark Summit San Francisco
Call for presentations is now open for Spark Summit San Francisco! The event will take place on June 6-8 in San Francisco. Submissions are welcome across a variety of Spark-related topics, including applications, development, data science, business value, Spark ecosystem and research. Please submit by February 29th to be considered.
Spark Summit East (Feb 16, 2016, New York) agenda posted
The agenda for Spark Summit East is now posted, with 60 talks from organizations including Netflix, Comcast, Blackrock, Bloomberg and others. The 2nd annual Spark Summit East will run February 16-18th in NYC and feature a full program of speakers along with Spark training opportunities. More details are available on the Spark Summit East website, where you can also register to attend.
Spark 1.6.0 released
We are happy to announce the availability of Spark 1.6.0! Spark 1.6.0 is the seventh release on the API-compatible 1.X line. With this release the Spark community continues to grow, with contributions from 248 developers!
CFP for Spark Summit East 2016 is closing soon!
Call for presentations is closing soon for Spark Summit East! The event will take place on February 16th-18th in New York City. Submissions are welcome across a variety of Spark-related topics, including applications, development, data science, enterprise, and research. Please submit by November 22nd to be considered.
Spark 1.5.2 released
We are happy to announce the availability of Spark 1.5.2! This maintenance release includes fixes across several areas of Spark, including the DataFrame API, Spark Streaming, PySpark, R, Spark SQL, and MLlib.
Submission is open for Spark Summit East 2016
Abstract submissions are now open for the 2nd Spark Summit East! The event will take place on February 16th-18th in New York City. Submissions are welcome across a variety of Spark-related topics, including applications, development, data science, enterprise, and research.
Spark 1.5.1 released
We are happy to announce the availability of Spark 1.5.1! This maintenance release includes fixes across several areas of Spark, including the DataFrame API, Spark Streaming, PySpark, R, Spark SQL, and MLlib.
Spark 1.5.0 released
We are happy to announce the availability of Spark 1.5.0! Spark 1.5.0 is the sixth release on the API-compatible 1.X line. It is Spark’s largest release ever, with contributions from 230 developers and more than 1,400 commits!
Spark Summit Europe agenda posted
The agenda for Spark Summit Europe is now posted, with 38 talks from organizations including Barclays, Netflix, Elsevier, Intel and others. This inaugural Spark conference in Europe will run October 27th-29th 2015 in Amsterdam and feature a full program of speakers along with Spark training opportunities. More details are available on the Spark Summit Europe website, where you can also register to attend.
Spark 1.4.1 released
We are happy to announce the availability of Spark 1.4.1! This is a maintenance release that includes contributions from 85 developers. Spark 1.4.1 includes fixes across several areas of Spark, including the DataFrame API, Spark Streaming, PySpark, Spark SQL, and MLlib.
Spark Summit 2015 Videos Posted
The videos and slides for Spark Summit 2015 are now all available online! The talks include technical roadmap discussions, deep dives on Spark components, and use cases built on top of Spark.
Spark 1.4.0 released
We are happy to announce the availability of Spark 1.4.0! Spark 1.4.0 is the fifth release on the API-compatible 1.X line. It is Spark’s largest release ever, with contributions from 210 developers and more than 1,000 commits!
Announcing Spark Summit Europe
Abstract submissions are now open for the first ever Spark Summit Europe. The event will take place on October 27th to 29th in Amsterdam. Submissions are welcome across a variety of Spark related topics, including use cases and ongoing development.
One month to Spark Summit 2015 in San Francisco
There is one month left until Spark Summit 2015, which will be held in San Francisco on June 15th to 17th. The Summit will contain presentations from over 50 organizations using Spark, focused on use cases and ongoing development.
Spark Summit East 2015 Videos Posted
The videos and slides for Spark Summit East 2015 are now all available online. Watch them to get the latest news from the Spark community as well as use cases and applications built on top.
Spark 1.2.2 and 1.3.1 released
We are happy to announce the availability of Spark 1.2.2 and Spark 1.3.1! These are both maintenance releases that collectively feature the work of more than 90 developers.
Spark 1.3.0 released
We are happy to announce the availability of Spark 1.3.0! Spark 1.3.0 is the fourth release on the API-compatible 1.X line. It is Spark’s largest release ever, with contributions from 174 developers and more than 1,000 commits!
Spark 1.2.1 released
We are happy to announce the availability of Spark 1.2.1! This is a maintenance release that includes contributions from 69 developers. Spark 1.2.1 includes fixes across several areas of Spark, including the core API, Streaming, PySpark, SQL, GraphX, and MLlib.
Spark Summit East agenda posted, CFP open for West
The agenda for Spark Summit East is now posted, with 38 talks from organizations including Goldman Sachs, Baidu, Salesforce, Novartis, Cisco and others. This inaugural Spark conference on the US East Coast will run March 18th-19th 2015 in New York City. More details are available on the Spark Summit East website, where you can also register to attend.
Spark 1.2.0 released
We are happy to announce the availability of Spark 1.2.0! Spark 1.2.0 is the third release on the API-compatible 1.X line. It is Spark’s largest release ever, with contributions from 172 developers and more than 1,000 commits!
Spark 1.1.1 released
We are happy to announce the availability of Spark 1.1.1! This is a maintenance release that includes contributions from 55 developers. Spark 1.1.1 includes fixes across several areas of Spark, including the core API, Streaming, PySpark, SQL, GraphX, and MLlib.
Registration open for Spark Summit East 2015
Registration is now open for Spark Summit East 2015, to be held on March 18th and 19th in New York City. The conference will be a great chance to meet people from throughout the Spark community as well as attend training workshops on Spark. If you haven’t been to previous Spark Summits, you can find content from previous events on the Spark Summit website.
Spark wins Daytona Gray Sort 100TB Benchmark
We are proud to announce that Spark won the 2014 Gray Sort Benchmark (Daytona 100TB category). A team from Databricks, including Spark committers Reynold Xin, Xiangrui Meng, and Matei Zaharia, entered the benchmark using Spark. Spark tied with the Themis team from UCSD, and the two jointly set a new world record in sorting.
Submissions open for Spark Summit East 2015 in New York
After successful events in the past two years, the Spark Summit conference has expanded for 2015, offering both an event in New York on March 18-19 and one in San Francisco on June 15-17. The conference is a great chance to meet people from throughout the Spark community and see the latest news, tips and use cases.
Spark 1.1.0 released
We are happy to announce the availability of Spark 1.1.0! Spark 1.1.0 is the second release on the API-compatible 1.X line. It is Spark’s largest release ever, with contributions from 171 developers!
Spark 1.0.2 released
We are happy to announce the availability of Spark 1.0.2! This release includes contributions from 30 developers. Spark 1.0.2 includes fixes across several areas of Spark, including the core API, Streaming, PySpark, and MLlib.
Spark 0.9.2 released
We are happy to announce the availability of Spark 0.9.2! Apache Spark 0.9.2 is a maintenance release with bug fixes. We recommend that all 0.9.x users upgrade to this stable release. Contributions to this release came from 28 developers.
Spark Summit 2014 videos posted
The videos and slides for Spark Summit 2014 are now all available online. Watch them to see the latest news from the Spark community as well as use cases and applications built on top. In addition, training materials from the Summit, including hands-on exercises, are all available freely as well.
Spark 1.0.1 released
We are happy to announce the availability of Spark 1.0.1! This release includes contributions from 70 developers. Spark 1.0.1 includes fixes across several areas of Spark, including the core API, PySpark, and MLlib. It also includes new features in Spark’s (alpha) SQL library, including support for JSON data and performance and stability fixes.
Two weeks to Spark Summit 2014
There are now two weeks left to Spark Summit 2014, which will be held in San Francisco on June 30th to July 2nd. The Summit will contain presentations from over 50 organizations using Spark, focused on use cases and ongoing development.
Spark 1.0.0 released
We are happy to announce the availability of Spark 1.0.0! Spark 1.0.0 is the first in the 1.X line of releases, providing API stability for Spark’s core interfaces. It is Spark’s largest release ever, with contributions from 117 developers. This release expands Spark’s standard libraries, introducing a new SQL package (Spark SQL) that lets users integrate SQL queries into existing Spark workflows. MLlib, Spark’s machine learning library, is expanded with sparse vector support and several new algorithms. The GraphX and Streaming libraries also introduce new features and optimizations. Spark’s core engine adds support for secured YARN clusters, a unified tool for submitting Spark applications, and several performance and stability improvements.
Spark Summit agenda posted
The agenda for the Spark Summit 2014 conference is now available online. With talks from more than 50 organizations, it will be the biggest Spark event yet, bringing the developer and user communities together. Join us in person or tune in online to learn about the latest happenings in Spark.
Spark 0.9.1 released
We are happy to announce the availability of Spark 0.9.1! Apache Spark 0.9.1 is a maintenance release with bug fixes, performance improvements, better stability with YARN and improved parity of the Scala and Python API. We recommend that all 0.9.0 users upgrade to this stable release. Contributions to this release came from 37 developers.
Submissions and registration open for Spark Summit 2014
After last year’s successful first Spark Summit, registrations and talk submissions are now open for Spark Summit 2014. This will be a 3-day event in San Francisco organized by multiple companies in the Spark community. The event will run June 30th to July 2nd in San Francisco, CA.
Spark becomes top-level Apache project
The Apache Software Foundation announced today that Spark has graduated from the Apache Incubator to become a top-level Apache project, signifying that the project’s community and products have been well-governed under the ASF’s meritocratic process and principles. This is a major step for the community and we are very proud to share this news with users as we complete Spark’s move to Apache. Read more about Spark’s growth during the past year and from contributors and users in the ASF’s press release.
Spark 0.9.0 released
We are happy to announce the availability of Spark 0.9.0! Spark 0.9.0 is a major release and Spark’s largest release ever, with contributions from 83 developers. This release expands Spark’s standard libraries, introducing a new graph computation package (GraphX) and adding several new features to the machine learning and stream-processing packages. It also makes major improvements to the core engine, including external aggregations, a simplified H/A mode for long lived applications, and hardened YARN support.
Spark 0.8.1 released
We’ve just posted Spark Release 0.8.1, a maintenance and performance release for the Scala 2.9 version of Spark. 0.8.1 includes support for YARN 2.2, a high availability mode for the standalone scheduler, optimizations to the shuffle, and many other improvements. We recommend that all users update to this release. Visit the release notes to read about the new features, or download the release today.
Spark Summit 2013 is a Wrap
The Spark Summit 2013, held in early December 2013 in downtown San Francisco, was a success! Over 450 Spark developers and enthusiasts from 13 countries and more than 180 companies came to learn from project leaders and production users of Spark, Shark, Spark Streaming and related projects about use cases, recent developments, and the Spark community roadmap.
Announcing the first Spark Summit: December 2, 2013
We are excited to announce the first Spark Summit on Dec 2, 2013 in Downtown San Francisco. Come hear from key production users of Spark, Shark, Spark Streaming and related projects. Also find out where the development is going, and learn how to use the Spark stack in a variety of applications. The summit is being organized and sponsored by leading organizations in the Spark community.
Spark 0.8.0 released
We’re proud to announce the release of Apache Spark 0.8.0. Spark 0.8.0 is a major release that includes many new capabilities and usability improvements. It’s also our first release under the Apache incubator. It is the largest Spark release yet, with contributions from 67 developers and 24 companies. Major new features include an expanded monitoring framework and UI, a machine learning library, and support for running Spark inside of YARN.
Spark user survey and "Powered By" page
As we continue developing Spark, we would love to get feedback from users and hear what you’d like us to work on next. We’ve decided that a good way to do that is a survey – we hope to run this at regular intervals. If you have a few minutes to participate, fill in the survey here. Your time is greatly appreciated.
Fourth Spark screencast released
We have released the next screencast, A Standalone Job in Scala that takes you beyond the Spark shell, helping you write your first standalone Spark job.
Registration open for AMP Camp training camp in Berkeley
Want to learn how to use Spark, Shark, GraphX, and related technologies in person? The AMP Lab is hosting a two-day training workshop for them on August 29th and 30th in Berkeley. The workshop will include tutorials, talks from users, and over four hours of hands-on exercises. Registration is now open on the AMP Camp website, for a price of $250 per person. We recommend signing up early because last year’s workshop was sold out.
Spark mailing lists moving to Apache
As part of the Spark project's recent move to Apache, we are planning to migrate the mailing lists to Apache infrastructure this month, so that the existing Google groups will become read-only on September 1, 2013. To keep receiving updates about Spark or to participate in development discussions, please subscribe to the following lists:
- user@spark.incubator.apache.org -- for usage questions, help, and announcements. (subscribe) (archives)
- dev@spark.incubator.apache.org -- for people who want to contribute code to Spark. (subscribe) (archives)
Most users will probably want the User list, but individuals interested in contributing code to the project should also subscribe to the Dev list.
Spark 0.7.3 released
We’ve just posted Spark Release 0.7.3, a maintenance release that contains several fixes, including streaming API updates and new functionality for adding JARs to a spark-shell session. We recommend that all users update to this release. Visit the release notes to read about the new features, or download the release today.
Spark featured in Wired
Spark, its creators at the AMP Lab, and some of its users were featured in a Wired Enterprise article a few days ago. Read on to learn a little about how Spark is being used in industry.
Spark accepted into Apache Incubator
Spark was recently accepted into the Apache Incubator, which will serve as the long-term home for the project. While moving the source code and issue tracking to Apache will take some time, we are excited to be joining the community at Apache. Stay tuned on this site for updates on how the project hosting will change.
Spark 0.7.2 released
We’re happy to announce the release of Spark 0.7.2, a new maintenance release that includes several bug fixes and improvements, as well as new code examples and API features. We recommend that all users update to this release. Head over to the release notes to read about the new features, or download the release today.
Spark screencasts published
We have released the first two screencasts in a series of short hands-on video training courses we will be publishing to help new users get up and running with Spark in minutes.
Strata exercises now available online
At this year’s Strata conference, the AMP Lab hosted a full day of tutorials on Spark, Shark, and Spark Streaming, including online exercises on Amazon EC2. Those exercises are now available online, letting you learn Spark and Shark at your own pace on an EC2 cluster with real data. They are a great resource for learning the systems. You can also find slides from the Strata tutorials online, as well as videos from the AMP Camp workshop we held at Berkeley in August.
Spark 0.7.0 released
We’re proud to announce the release of Spark 0.7.0, a new major version of Spark that adds several key features, including a Python API for Spark and an alpha of Spark Streaming. This release is the result of the largest group of contributors yet behind a Spark release – 31 contributors from inside and outside Berkeley. Head over to the release notes to read more about the new features, or download the release today.
Spark/Shark Tutorial for Amazon EMR
This weekend, Amazon posted an article and code that make it easy to launch Spark and Shark on Elastic MapReduce. The article includes examples of how to run both interactive Scala commands and SQL queries from Shark on data in S3. Head over to the Amazon article for details. We’re very excited because, to our knowledge, this makes Spark the first non-Hadoop engine that you can launch with EMR.
Spark 0.6.2 released
We recently released Spark 0.6.2, a new version of Spark. This is a maintenance release that includes several bug fixes and usability improvements (see the release notes). We recommend that all users upgrade to this release.
Spark tips from Quantifind
Quantifind, one of the Bay Area companies that has been using Spark for predictive analytics, recently posted two useful entries on working with Spark in their tech blog:
Thanks for sharing this; we look forward to seeing others!
Video up from first Spark development meetup
On December 18th, we held the first of a series of Spark development meetups, for people interested in learning the Spark codebase and contributing to the project. There was quite a bit more demand than we anticipated, with over 80 people signing up and 64 attending. The first meetup was an introduction to Spark internals. Thanks to one of the attendees, there’s now a video of the meetup on YouTube. We’ve also posted the slides. Look to see more development meetups on Spark and Shark in the future.
Spark in the news
Recently, we’ve seen quite a bit of coverage of Spark in the news. I wanted to list some of the more recent articles, for readers interested in learning more.
- Curt Monash, editor of the popular DBMS2 blog, wrote a great introduction to Spark and Shark, as well as a more detailed technical overview.
- Silicon Angle covered Spark and Shark after our presentation at Amazon re:Invent.
- Datanami highlighted Shark in its survey of big data research projects.
- O'Reilly's Strata blog recently covered Spark, Shark, and the Spark 0.6 release.
- DataInformed interviewed two Spark users and wrote about their applications in anomaly detection, predictive analytics and data mining.
In other news, there will be a full day of tutorials on Spark and Shark at the O’Reilly Strata conference in February. They include a three-hour introduction to Spark, Shark and BDAS Tuesday morning, and a three-hour hands-on exercise session.
Spark 0.6.1 and 0.5.2 out
Today we’ve made available two maintenance releases for Spark: 0.6.1 and 0.5.2. They both contain important bug fixes as well as some new features, such as the ability to build against Hadoop 2 distributions. We recommend that users update to the latest version for their branch; for new users, we recommend 0.6.1.
Spark version 0.6.0 released
Spark version 0.6.0 was released today, a major release that brings a wide range of performance improvements and new features, including a simpler standalone deploy mode and a Java API. Read more about it in the release notes.
Spark wins Best Paper Award at USENIX NSDI
Our paper on Spark won the Best Paper Award at the USENIX NSDI conference. You can see a video of the talk, as well as slides, online on the NSDI website.
We've started hosting a Bay Area Spark User Meetup
We’ve started hosting a regular Bay Area Spark User Meetup. Sign up on the meetup.com page to be notified about events and meet other Spark developers and users.