Project Setup
Configure an Existing Project
To add Emma to an existing project, add the emma-language
dependency
<!-- Core Emma API and compiler infrastructure -->
<dependency>
<groupId>org.emmalanguage</groupId>
<artifactId>emma-language</artifactId>
<version>0.2.3</version>
</dependency>
and either emma-flink or emma-spark, depending on the desired execution backend.
<!-- Emma backend for Flink -->
<dependency>
<groupId>org.emmalanguage</groupId>
<artifactId>emma-flink</artifactId>
<version>0.2.3</version>
</dependency>
<!-- Emma backend for Spark -->
<dependency>
<groupId>org.emmalanguage</groupId>
<artifactId>emma-spark</artifactId>
<version>0.2.3</version>
</dependency>
Set Up a New Project
To bootstrap a new project org.acme:emma-quickstart
from a Maven archetype, use the following command.
mvn archetype:generate -B \
-DartifactId=emma-quickstart \
-DgroupId=org.acme \
-Dversion=0.1-SNAPSHOT \
-Dpackage=org.acme.emma \
-DarchetypeArtifactId=emma-quickstart \
-DarchetypeGroupId=org.emmalanguage \
-DarchetypeVersion=0.2.3
Build the project with one of the following commands.
mvn package # without tests
mvn verify # with tests
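Before moving on, it can help to confirm that the build produced the jars used in the run commands later in this guide. The following is a minimal sketch, not part of the archetype: the helper name `missing_jars` is hypothetical, and the module names and the 0.1-SNAPSHOT version are taken from the commands in this guide.

```shell
# Sketch: print the quickstart jars that are missing under a project root.
# `missing_jars` is a hypothetical helper; paths assume the archetype settings
# shown above (artifactId emma-quickstart, version 0.1-SNAPSHOT).
missing_jars() (
  root="${1:-.}"
  for backend in flink spark; do
    jar="$root/emma-quickstart-$backend/target/emma-quickstart-$backend-0.1-SNAPSHOT.jar"
    # Print the path only when the jar is absent.
    [ -f "$jar" ] || echo "$jar"
  done
)
```

Calling `missing_jars` from the project root prints nothing when both backend jars exist.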
HDFS Setup
If you are not familiar with Hadoop, check the “Getting started with Hadoop” guide.
To run the algorithms on a Flink or Spark cluster, copy the input files to HDFS.
Assuming variables that point to bin/hdfs and to the HDFS NameNode address
export HDFS=/path/to/hadoop-2.x/bin/hdfs
export HDFS_ADDR="$HOSTNAME:9000"
you can run the following commands.
$HDFS dfs -mkdir -p /tmp/output
$HDFS dfs -mkdir -p /tmp/input
$HDFS dfs -copyFromLocal emma-quickstart-library/src/test/resources/* /tmp/input/.
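A quick local check before the copy can catch a wrong working directory early. This is a sketch under stated assumptions: `missing_inputs` is a hypothetical helper name, and the three relative paths are the input files referenced by the run commands in this guide.

```shell
# Sketch: print the example inputs that are missing from a local resources
# directory. `missing_inputs` is a hypothetical helper; the default directory
# is the one copied to HDFS above.
missing_inputs() (
  dir="${1:-emma-quickstart-library/src/test/resources}"
  for f in \
    text/jabberwocky.txt \
    graphs/trans-closure/edges.tsv \
    ml/clustering/kmeans/points.tsv
  do
    # Print the relative path only when the file is absent.
    [ -f "$dir/$f" ] || echo "$f"
  done
)
```

After the copy, `$HDFS dfs -ls -R /tmp/input` should list the same three files on the cluster.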
Running the Examples on Flink
If you are not familiar with Flink, check the “Getting started with Flink” guide.
Assuming a variable that points to bin/flink
export FLINK=/path/to/flink-1.2.x/bin/flink
and a local filesystem path shared between all nodes in your Flink cluster
export CODEGEN=/tmp/emma/codegen
you can run the algorithms in your quickstart project with one of the following commands.
$FLINK run -C "file://$CODEGEN/" \
emma-quickstart-flink/target/emma-quickstart-flink-0.1-SNAPSHOT.jar \
word-count \
hdfs://$HDFS_ADDR/tmp/input/text/jabberwocky.txt \
hdfs://$HDFS_ADDR/tmp/output/wordcount-output.tsv \
--codegen "$CODEGEN"
$FLINK run -C "file://$CODEGEN/" \
emma-quickstart-flink/target/emma-quickstart-flink-0.1-SNAPSHOT.jar \
transitive-closure \
hdfs://$HDFS_ADDR/tmp/input/graphs/trans-closure/edges.tsv \
hdfs://$HDFS_ADDR/tmp/output/trans-closure-output.tsv \
--codegen "$CODEGEN"
$FLINK run -C "file://$CODEGEN/" \
emma-quickstart-flink/target/emma-quickstart-flink-0.1-SNAPSHOT.jar \
k-means 2 4 0.001 10 \
hdfs://$HDFS_ADDR/tmp/input/ml/clustering/kmeans/points.tsv \
hdfs://$HDFS_ADDR/tmp/output/kmeans-output.tsv \
--codegen "$CODEGEN"
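The three commands differ only in the algorithm arguments and in the input and output paths, so they can be wrapped in a small function. The sketch below is one possible wrapper, not part of the quickstart: `run_flink_example` and the `DRY_RUN` switch are hypothetical names, and the function relies on FLINK, CODEGEN, and HDFS_ADDR being set as described above.

```shell
# Sketch: run one quickstart algorithm on Flink.
# Usage: run_flink_example <input path> <output file> <algorithm> [params...]
# Set the (hypothetical) DRY_RUN variable to print the command instead of
# running it.
run_flink_example() (
  input="$1"; output="$2"; shift 2
  # ${DRY_RUN:+echo} expands to `echo` only when DRY_RUN is non-empty.
  ${DRY_RUN:+echo} "$FLINK" run -C "file://$CODEGEN/" \
    emma-quickstart-flink/target/emma-quickstart-flink-0.1-SNAPSHOT.jar \
    "$@" \
    "hdfs://$HDFS_ADDR/tmp/input/$input" \
    "hdfs://$HDFS_ADDR/tmp/output/$output" \
    --codegen "$CODEGEN"
)
```

For example, `run_flink_example text/jabberwocky.txt wordcount-output.tsv word-count` corresponds to the first command above.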
Running the Examples on Spark
If you are not familiar with Spark, check the “Getting started with Spark” guide.
Assuming a variable that points to bin/spark-submit
export SPARK=/path/to/spark-2.1.x/bin/spark-submit
and the host and port of the Spark master
export SPARK_ADDR="$HOSTNAME:7077"
you can run the algorithms in your quickstart project with one of the following commands.
$SPARK --master "spark://$SPARK_ADDR" \
emma-quickstart-spark/target/emma-quickstart-spark-0.1-SNAPSHOT.jar \
word-count \
hdfs://$HDFS_ADDR/tmp/input/text/jabberwocky.txt \
hdfs://$HDFS_ADDR/tmp/output/wordcount-output.tsv \
--master "spark://$SPARK_ADDR"
$SPARK --master "spark://$SPARK_ADDR" \
emma-quickstart-spark/target/emma-quickstart-spark-0.1-SNAPSHOT.jar \
transitive-closure \
hdfs://$HDFS_ADDR/tmp/input/graphs/trans-closure/edges.tsv \
hdfs://$HDFS_ADDR/tmp/output/trans-closure-output.tsv \
--master "spark://$SPARK_ADDR"
$SPARK --master "spark://$SPARK_ADDR" \
emma-quickstart-spark/target/emma-quickstart-spark-0.1-SNAPSHOT.jar \
k-means 2 4 0.001 10 \
hdfs://$HDFS_ADDR/tmp/input/ml/clustering/kmeans/points.tsv \
hdfs://$HDFS_ADDR/tmp/output/kmeans-output.tsv \
--master "spark://$SPARK_ADDR"
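To run all three Spark examples in sequence, the commands can be driven from a small table. This is a sketch only: `run_all_spark_examples` and the `DRY_RUN` switch are hypothetical names, and the function relies on SPARK, SPARK_ADDR, and HDFS_ADDR being set as described above.

```shell
# Sketch: run the three quickstart algorithms on Spark, one after another.
# Set the (hypothetical) DRY_RUN variable to print the commands instead of
# running them.
run_all_spark_examples() (
  jar=emma-quickstart-spark/target/emma-quickstart-spark-0.1-SNAPSHOT.jar
  # Each row: <input path> <output file> <algorithm and its parameters>
  while read -r input output algo; do
    # $algo is left unquoted on purpose so "k-means 2 4 0.001 10" splits into
    # separate arguments; </dev/null keeps the job from consuming the table.
    ${DRY_RUN:+echo} "$SPARK" --master "spark://$SPARK_ADDR" "$jar" \
      $algo \
      "hdfs://$HDFS_ADDR/tmp/input/$input" \
      "hdfs://$HDFS_ADDR/tmp/output/$output" \
      --master "spark://$SPARK_ADDR" </dev/null || exit 1
  done <<'EOF'
text/jabberwocky.txt wordcount-output.tsv word-count
graphs/trans-closure/edges.tsv trans-closure-output.tsv transitive-closure
ml/clustering/kmeans/points.tsv kmeans-output.tsv k-means 2 4 0.001 10
EOF
)
```

The loop stops at the first job that exits with a non-zero status, so a failed run does not silently continue into the next algorithm.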