Emma is a quotation-based Scala DSL for scalable data analysis.
Emma supports state-of-the-art dataflow engines such as Apache Flink and Apache Spark.
DSLs for scalable data analysis are typically embedded through types. In contrast, Emma is based on quotations (similar to Quill). This approach has two benefits that directly affect developer productivity.
First, it allows Scala-native, declarative constructs to be reused in the DSL.
Quoted Scala syntax is thereby lifted to an intermediate representation called Emma Core.
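As a rough illustration of the kind of Scala-native, declarative code this refers to, the sketch below uses a for-comprehension with a pattern-matching generator and a guard. It runs on plain Scala collections only: `Seq` is a stand-in for Emma's `DataBag`, and the names `WordCountSketch` and `Review` are hypothetical. No Emma API or quotation is involved.

```scala
object WordCountSketch {
  // Hypothetical record type, used only for this illustration.
  final case class Review(user: String, text: String)

  // Assumption: Seq stands in for DataBag so the sketch runs without Emma.
  val reviews: Seq[Review] = Seq(
    Review("ann", "fast and declarative"),
    Review("bob", "declarative and concise")
  )

  // A for-comprehension with a pattern-matching generator and a guard.
  val words: Seq[String] = for {
    Review(_, text) <- reviews
    word <- text.split("\\s+").toSeq
    if word.length > 3
  } yield word

  // Group equal words and count the members of each group.
  val counts: Map[String, Int] =
    words.groupBy(identity).map { case (w, ws) => (w, ws.size) }

  def main(args: Array[String]): Unit =
    counts.toSeq.sortBy(_._1).foreach(println)
}
```

In an Emma program, code of this shape would sit inside a quotation and operate on `DataBag` values instead of `Seq`; the Programming Guide describes the actual entry points.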
Second, it allows Emma Core terms to be analyzed and optimized holistically.
Subterms of type DataBag[A] are thereby transformed and off-loaded to a parallel dataflow engine such as Apache Flink or Apache Spark.
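To give a feel for what optimizing a whole term can mean, here is a minimal sketch of one rewrite over a toy intermediate representation. Everything in it is an assumption made for illustration: `Term`, `Source`, `MapT`, `FilterT`, `optimize`, and `eval` are hypothetical names, the IR is not Emma Core, and map fusion is just one familiar example of a rewrite, not a description of the optimizations Emma performs.

```scala
object FusionSketch {
  // Hypothetical miniature IR for bag-like terms; this is NOT Emma Core.
  sealed trait Term
  final case class Source(data: Seq[Int]) extends Term
  final case class MapT(input: Term, f: Int => Int) extends Term
  final case class FilterT(input: Term, p: Int => Boolean) extends Term

  // Rewrite rule, applied bottom-up: fuse adjacent maps into a single map.
  def optimize(t: Term): Term = t match {
    case MapT(in, g) =>
      optimize(in) match {
        case MapT(inner, f) => MapT(inner, f andThen g) // f runs first, then g
        case other          => MapT(other, g)
      }
    case FilterT(in, p) => FilterT(optimize(in), p)
    case s: Source      => s
  }

  // Local interpreter standing in for a parallel backend (assumption).
  def eval(t: Term): Seq[Int] = t match {
    case Source(d)      => d
    case MapT(in, f)    => eval(in).map(f)
    case FilterT(in, p) => eval(in).filter(p)
  }

  // Number of operators in a term, used to observe the effect of the rewrite.
  def size(t: Term): Int = t match {
    case Source(_)      => 1
    case MapT(in, _)    => 1 + size(in)
    case FilterT(in, _) => 1 + size(in)
  }
}
```

Because the rewrite sees the whole term rather than one operator call at a time, it can remove an intermediate step without changing the result: `eval(optimize(plan))` equals `eval(plan)` while `size` shrinks whenever two maps are adjacent.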
For a discussion of the benefits of Emma compared to the Flink and Spark APIs, check the Meet Emma presentation and the emma-tutorial.
For a brief introduction to the core API and its most distinctive features, check the Programming Guide.
For instructions on setting up an Emma-based project, check the Project Setup.
To learn about Emma internals, check the Emma Wiki.
If you discuss this project in a research publication, please cite our SIGMOD 2015 paper “Implicit Parallelism through Deep Language Embedding”.