Emma

Project Setup

Configure an Existing Project

To add Emma to an existing project, add the emma-language dependency

<!-- Core Emma API and compiler infrastructure -->
<dependency>
    <groupId>org.emmalanguage</groupId>
    <artifactId>emma-language</artifactId>
    <version>0.2.3</version>
</dependency>

and either emma-flink or emma-spark, depending on the desired execution backend.

<!-- Emma backend for Flink -->
<dependency>
    <groupId>org.emmalanguage</groupId>
    <artifactId>emma-flink</artifactId>
    <version>0.2.3</version>
</dependency>
<!-- Emma backend for Spark -->
<dependency>
    <groupId>org.emmalanguage</groupId>
    <artifactId>emma-spark</artifactId>
    <version>0.2.3</version>
</dependency>
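You can check that Maven resolves the Emma artifacts you added by printing the relevant part of the dependency tree. This is a minimal sketch for bash. It assumes `mvn` is on your `PATH` and that you run it from the directory that contains your `pom.xml`. The function name `emma_deps` is hypothetical. It wraps the Maven Dependency Plugin's `dependency:tree` goal and uses `-Dincludes` to filter the output to the `org.emmalanguage` group.

```shell
# Hypothetical helper: show only the Emma artifacts (groupId org.emmalanguage)
# in the project's dependency tree. Run it from the directory with pom.xml.
# Any extra arguments are passed through to Maven unchanged.
emma_deps() {
  mvn dependency:tree -Dincludes=org.emmalanguage "$@"
}
```

When you call `emma_deps`, the output should include `emma-language` and the backend artifact you chose.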

Set Up a New Project

To bootstrap a new project org.acme:emma-quickstart from a Maven archetype, use the following command.

mvn archetype:generate -B                  \
    -DartifactId=emma-quickstart           \
    -DgroupId=org.acme                     \
    -Dversion=0.1-SNAPSHOT                 \
    -Dpackage=org.acme.emma                \
    -DarchetypeArtifactId=emma-quickstart  \
    -DarchetypeGroupId=org.emmalanguage    \
    -DarchetypeVersion=0.2.3

Build the project with one of the following commands.

mvn package # without tests
mvn verify  # with tests
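Before you submit jobs, you can confirm that the build produced the jar for your backend. This is a minimal sketch. It assumes you run it from the root of the generated `emma-quickstart` project and that the project keeps the default version `0.1-SNAPSHOT`. The helper name `check_backend_jar` is hypothetical. The Flink jar path is an assumption: it follows the same `emma-quickstart-<backend>/target/emma-quickstart-<backend>-<version>.jar` pattern as the Spark jar used in the commands further down.

```shell
# Hypothetical helper: check that the jar for one backend module exists.
# ASSUMPTION: the jar name follows the pattern of the Spark jar in this guide:
#   emma-quickstart-<backend>/target/emma-quickstart-<backend>-<version>.jar
# Usage: check_backend_jar <backend> [version]
check_backend_jar() {
  local backend="$1"
  local version="${2:-0.1-SNAPSHOT}"
  local jar="emma-quickstart-$backend/target/emma-quickstart-$backend-$version.jar"
  if [ -f "$jar" ]; then
    echo "$jar"                      # print the path on success
  else
    echo "missing: $jar (did the build succeed?)" >&2
    return 1
  fi
}
```

For example, `check_backend_jar spark` prints the jar path if the file exists and returns a non-zero status if it does not.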

HDFS Setup

If you are not familiar with Hadoop, check the “Getting started with Hadoop” guide.

To run the algorithms on a Flink or Spark cluster, copy the input files to HDFS.

Assuming a variable that points to bin/hdfs

export HDFS=/path/to/hadoop-2.x/bin/hdfs
export HDFS_ADDR="$HOSTNAME:9000"

you can run the following commands.

$HDFS dfs -mkdir -p /tmp/output
$HDFS dfs -mkdir -p /tmp/input
$HDFS dfs -copyFromLocal emma-quickstart-library/src/test/resources/* /tmp/input/.
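To confirm the copy succeeded, you can test for the input files that the example commands in this guide read. This is a minimal sketch that assumes `HDFS` is set as above. The helper name `check_hdfs_inputs` is hypothetical, and the list of paths comes from the Spark example commands. The check relies on `hdfs dfs -test -e <path>`, which exits with status 0 when the path exists.

```shell
# Hypothetical helper: check that the inputs read by the example commands
# exist in HDFS. Prints every missing path and returns 1 if any are missing.
check_hdfs_inputs() {
  if [ -z "${HDFS:-}" ]; then
    echo "HDFS is not set; export it as shown above" >&2
    return 1
  fi
  local status=0 path
  for path in \
      /tmp/input/text/jabberwocky.txt \
      /tmp/input/graphs/trans-closure/edges.tsv \
      /tmp/input/ml/clustering/kmeans/points.tsv; do
    # "dfs -test -e" exits with 0 if the path exists
    if ! "$HDFS" dfs -test -e "$path"; then
      echo "missing in HDFS: $path" >&2
      status=1
    fi
  done
  return "$status"
}
```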

Running the Examples on Flink

If you are not familiar with Flink, check the “Getting started with Flink” guide.

Assuming a variable that points to bin/flink

export FLINK=/path/to/flink-1.2.x/bin/flink

and a local filesystem path shared between all nodes in your Flink cluster

export CODEGEN=/tmp/emma/codegen

you can run the algorithms in your quickstart project with one of the following commands.

Running the Examples on Spark

If you are not familiar with Spark, check the “Getting started with Spark” guide.

Assuming a variable that points to bin/spark-submit

export SPARK=/path/to/spark-2.1.x/bin/spark-submit

and the host and port of the Spark master

export SPARK_ADDR="$HOSTNAME:7077"

you can run the algorithms in your quickstart project with one of the following commands.

$SPARK --master "spark://$SPARK_ADDR" \
  emma-quickstart-spark/target/emma-quickstart-spark-0.1-SNAPSHOT.jar \
  word-count \
  hdfs://$HDFS_ADDR/tmp/input/text/jabberwocky.txt \
  hdfs://$HDFS_ADDR/tmp/output/wordcount-output.tsv \
  --master "spark://$SPARK_ADDR"
$SPARK --master "spark://$SPARK_ADDR" \
  emma-quickstart-spark/target/emma-quickstart-spark-0.1-SNAPSHOT.jar \
  transitive-closure \
  hdfs://$HDFS_ADDR/tmp/input/graphs/trans-closure/edges.tsv \
  hdfs://$HDFS_ADDR/tmp/output/trans-closure-output.tsv \
  --master "spark://$SPARK_ADDR"
$SPARK --master "spark://$SPARK_ADDR" \
  emma-quickstart-spark/target/emma-quickstart-spark-0.1-SNAPSHOT.jar \
  k-means 2 4 0.001 10 \
  hdfs://$HDFS_ADDR/tmp/input/ml/clustering/kmeans/points.tsv \
  hdfs://$HDFS_ADDR/tmp/output/kmeans-output.tsv \
  --master "spark://$SPARK_ADDR"
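The three commands above differ only in the algorithm name and its arguments, so a small shell function can wrap them. This is a minimal sketch. It assumes `SPARK` and `SPARK_ADDR` are set as above and that you call the function from the project root. The name `run_on_spark` is hypothetical. The function keeps both occurrences of `--master` from the original commands. The first is a `spark-submit` option. The trailing one comes after the application jar, so `spark-submit` forwards it to the application as an argument.

```shell
# Hypothetical wrapper around the spark-submit invocations shown above.
# Usage: run_on_spark <algorithm> [algorithm arguments...]
# Example (equivalent to the first command above):
#   run_on_spark word-count \
#     "hdfs://$HDFS_ADDR/tmp/input/text/jabberwocky.txt" \
#     "hdfs://$HDFS_ADDR/tmp/output/wordcount-output.tsv"
run_on_spark() {
  if [ -z "${SPARK:-}" ] || [ -z "${SPARK_ADDR:-}" ]; then
    echo "SPARK and SPARK_ADDR must be set" >&2
    return 1
  fi
  if [ "$#" -lt 1 ]; then
    echo "usage: run_on_spark <algorithm> [args...]" >&2
    return 1
  fi
  local algorithm="$1"
  shift
  # The first --master configures spark-submit; everything after the jar,
  # including the trailing --master, is passed to the application.
  "$SPARK" --master "spark://$SPARK_ADDR" \
    emma-quickstart-spark/target/emma-quickstart-spark-0.1-SNAPSHOT.jar \
    "$algorithm" "$@" \
    --master "spark://$SPARK_ADDR"
}
```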