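Data + AI Summit Europe 2020 (formerly Spark + AI Summit Europe) took place from November 17 to 19, 2020. Because of the COVID-19 pandemic, the conference was held online, like the June edition. It ran for three days: the first day was training, and the second and third days were the main conference. The sessions feature technical content from practitioners who use Apache Spark™, Delta Lake, MLflow, Structured Streaming, BI and SQL analytics, and deep learning and machine learning frameworks to solve tough data problems. The full agenda is available at https://databricks.com/dataaisummit/europe-2020/agenda.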
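Unlike the June conference, the keynotes this time carried no big announcements, but the second and third days still offered solid content worth a look. Over the next few days, this official account will also cover some of the more interesting sessions, so stay tuned.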
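The conference covered the following topics:
• AI use cases and new opportunities
• Best practices and use cases for Apache Spark™, Delta Lake, MLflow, and more
• Data engineering, including streaming architectures
• SQL analytics and BI using data warehouses and data lakes
• Data science, including the Python ecosystem
• Machine learning and deep learning applications
• Productionizing machine learning (MLOps)
• Large-scale data analytics and ML research
• Industry use cases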
How to download
Follow the WeChat official account 过往记忆大数据 or Java与大数据架构 and reply spark-9902 to get the slides.
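Slides available for download
Slides for the following 129 sessions are available for download. Note that all of the slides can also be viewed online at https://www.iteblog.com/archives/9902.html.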
• 3D: DBT using Databricks and Delta
• Accelerated Training of Transformer Models
• Achieving Lakehouse Models with Spark 3.0
• Acoustics & AI for Conservation
• Active Governance Across the Delta Lake with Alation
• Add Historical Analysis of Operational Data with Easy Configurations in Fivetran Automated Data Integration
• Advanced Natural Language Processing with Apache Spark NLP
• Apache Liminal (Incubating)—Orchestrate the Machine Learning Pipeline
• Apache Spark Streaming in K8s with ArgoCD & Spark Operator
• Apply MLOps at Scale
• Arbitrary Stateful Aggregation and MERGE INTO
• Bank Struggles Along the Way for the Holy Grail of Personalization: Customer 360
• Building a Cross Cloud Data Protection Engine
• Building a Distributed Collaborative Data Pipeline with Apache Spark
• Building a MLOps Platform Around MLflow to Enable Model Productionalization in Just a Few Minutes
• Building a Real-Time Supply Chain View: How Gousto Merges Incoming Streams of Inventory Data at Scale to Track Ingredients Throughout its Supply Chain
• Building a SIMD Supported Vectorized Native Engine for Spark SQL
• Building a Streaming Data Pipeline for Trains Delays Processing
• Building a Streaming Microservices Architecture
• Building an ML Tool to predict Article Quality Scores using Delta & MLFlow
• Building Identity Graph at Scale for Programmatic Media Buying Using Apache Spark and Delta Lake
• Building Notebook-based AI Pipelines with Elyra and Kubeflow
• Building the Next-gen Digital Meter Platform for Fluvius
• CI/CD Templates: Continuous Delivery of ML-Enabled Data Pipelines on Databricks
• Cloud-native Semantic Layer on Data Lake
• Common Strategies for Improving Performance on Your Delta Lakehouse
• Comprehensive View on Date-time APIs of Apache Spark 3.0
• Containerized Stream Engine to Build Modern Delta Lake
• Context-aware Fast Food Recommendation with Ray on Apache Spark at Burger King
• Continuous Delivery of Deep Transformer-Based NLP Models Using MLflow and AWS SageMaker for Enterprise AI Scenarios
• Cost Efficiency Strategies for Managed Apache Spark Service
• Data Engineers in Uncertain Times: A COVID-19 Case Study
• Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes Beyond the Data Lake
• Data Privacy with Apache Spark: Defensive and Offensive Approaches
• Data Time Travel by Delta Time Machine
• Data Time Travel by Delta Time Machine
• Data Versioning and Reproducible ML with DVC and MLflow
• Databricks University Alliance Meetup - Data + AI Summit EU 2020
• Databricks Whitelabel: Making Petabyte Scale Data Consumable to All Our Customers
• Delta: Building Merge on Read
• Delta Lake: Optimizing Merge
• Designing and Implementing a Real-time Data Lake with Dynamically Changing Schema
• Detecting and Recognising Highly Arbitrary Shaped Texts from Product Images
• Deterministic Machine Learning with MLflow and mlf-core
• Developing ML-enabled Data Pipelines on Databricks using IDE & CI/CD at Runtastic
• Digital Turbine Adopts A Lakehouse to Scale to Their Analytics Needs
• Distributed and Scalable Model Lifecycle Capabilities
• Diving into Delta Lake: Unpacking the Transaction Log
• eBay’s Work on Dynamic Partition Pruning & Runtime Filter
• Efficient Query Processing Using Machine Learning
• Embedding Insight through Prediction Driven Logistics
• End to End Supply Chain Control Tower
• Extending Apache Spark – Beyond Spark Session Extensions
• Foundations of Data Teams
• Frequently Bought Together Recommendations Based on Embeddings
• From Query Plan to Query Performance: Supercharging your Apache Spark Queries using the Spark UI SQL Tab
• From Zero to Hero with Kafka Connect
• Generalized Pipeline Parallelism for DNN Training
• Getting Started with Apache Spark on Kubernetes
• Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads
• How a Media Data Platform Drives Real-time Insights & Analytics using Apache Spark
• How The Weather Company Uses Apache Spark to Serve Weather Data Fast at Low Cost
• Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and Parquet Reader
• Introducing MLflow for End-to-End Machine Learning on Databricks
• Koalas: Interoperability Between Koalas and Apache Spark
• Leveraging Apache Spark and Delta Lake for Efficient Data Encryption at Scale
• Livestream Economy: The Application of Real-time Media and Algorithmic Personalisation in Urbanism
• Materialized Column: An Efficient Way to Optimize Queries on Nested Columns
• MATS stack (MLFlow, Airflow, Tensorflow, Spark) for Cross-system Orchestration of Machine Learning Pipelines
• MIDAS: Microcluster-Based Detector of Anomalies in Edge Streams
• Migrate and Modernize Hadoop-Based Security Policies for Databricks
• Migrating Airflow-based Apache Spark Jobs to Kubernetes – the Native Way
• ML Production Pipelines: A Classification Model
• ML, Statistics, and Spark with Databricks for Maximizing Revenue in a Delayed Feedback Environment
• MLflow at Company Scale
• MLOps Using MLflow
• Model Experiments Tracking and Registration using MLflow on Databricks
• Monitoring Half a Million ML Models, IoT Streaming Data, and Automated Quality Check on Delta Lake
• Moving to Databricks & Delta
• NLP Text Recommendation System Journey to Automated Training
• Operating and Supporting Delta Lake in Production
• Optimising Geospatial Queries with Dynamic File Pruning
• Optimizing Apache Spark UDFs
• Our Journey to Release a Patient-Centric AI App to Reduce Public Health Costs
• Parallel Ablation Studies for Machine Learning with Maggy on Apache Spark
• Personalization Journey: From Single Node to Cloud Streaming
• Photon Technical Deep Dive: How to Think Vectorized
• Polymorphic Table Functions: The Best Way to Integrate SQL and Apache Spark
• Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch and More!)
• Productionizing Real-time Serving With MLflow
• Project Zen: Improving Apache Spark for Python Users
• Query or Not to Query? Using Apache Spark Metrics to Highlight Potentially Problematic Queries
• Ray and Its Growing Ecosystem
• Real-time Feature Engineering with Apache Spark Streaming and Hof
• Real-Time Health Score Application using Apache Spark on Kubernetes
• Reproducible AI Using PyTorch and MLflow
• Reproducible AI Using PyTorch and MLflow
• Revealing the Power of Legacy Machine Data
• Scale and Optimize Data Engineering Pipelines with Software Engineering Best Practices: Modularity and Automated Testing
• Scale-Out Using Spark in Serverless Herd Mode!
• Scaling Machine Learning Feature Engineering in Apache Spark at Facebook
• Scaling Machine Learning with Apache Spark
• Seamless MLOps with Seldon and MLflow
• SHAP & Game Theory For Recommendation Systems
• Simplifying AI integration on Apache Spark
• Skew Mitigation For Facebook PetabyteScale Joins
• Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metadata Platform
• Spark NLP: State of the Art Natural Language Processing at Scale
• Spark SQL Beyond Official Documentation
• Spark SQL Join Improvement at Facebook
• Speeding Time to Insight with a Modern ELT Approach
• Stateful Streaming with Apache Spark: How to Update Decision Logic at Runtime
• Stories from the Financial Service AI Trenches: Lessons Learned from Building AI Models in EY
• Streaming Inference with Apache Beam and TFX
• TeraCache: Efficient Caching Over Fast Storage Devices
• The Beauty of (Big) Data Privacy Engineering
• The Hidden Value of Hadoop Migration
• The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analytics Engineer
• The Pill for Your Migration Hell
• Transforming GE Healthcare with Data Platform Strategy
• Trust, Context and, Regulation: Achieving More Explainable AI in Financial Services
• Unlocking Geospatial Analytics Use Cases with CARTO and Databricks
• Using Delta Lake to Transform a Legacy Apache Spark to Support Complex Update/Delete SQL Operation
• Using Machine Learning at Scale: A Gaming Industry Experience!
• Using Machine Learning at Scale: A Gaming Industry Experience!
• Using NLP to Explore Entity Relationships in COVID-19 Literature
• Using Redash for SQL Analytics on Databricks
• What is New with Apache Spark Performance Monitoring in Spark 3.0
• X-RAIS: The Third Eye