PlasmaENGINE® Haversine Benchmark

In our Haversine Benchmark test, Apache Spark processed ~1.6 million rows per second, while Plasma Engine processed ~371 million rows per second.

Here are the results for Plasma Engine:

This is the same chart with Spark included: the flat yellow line along the bottom is Spark.

This chart shows Plasma Engine throughput at 6 GB/s (that’s bytes, not bits):

Here are all of the charts together:

The benchmark results speak for themselves: Plasma Engine processed 371 million rows per second, while Apache Spark only processed 1.6 million rows per second.

That is more than two orders of magnitude faster on the same cloud instance.
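A quick back-of-the-envelope check ties the headline figures together. It assumes that each row consists of the four 4-byte FLOAT columns listed in the benchmark schema and that the throughput chart counts decimal gigabytes; both are our assumptions about how the chart was measured, not statements from the benchmark itself.

```latex
\begin{aligned}
\frac{371 \times 10^{6}\ \text{rows/s}}{1.6 \times 10^{6}\ \text{rows/s}} &\approx 232 > 10^{2} \\
371 \times 10^{6}\ \text{rows/s} \times (4 \times 4)\ \text{B/row} &\approx 5.94 \times 10^{9}\ \text{B/s} \approx 6\ \text{GB/s}
\end{aligned}
```

Under those assumptions, the ratio is consistent with "more than two orders of magnitude" and the byte rate is consistent with the 6 GB/s chart.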

What can this do for your business?

Processing data roughly 230x faster not only lets your enterprise grow its existing revenue streams and cut costs but, more importantly, creates opportunities for new lines of revenue previously deemed impossible or uneconomical.

Perhaps the best part for your business is that Plasma Engine’s high efficiency will reduce your total data processing costs by more than 50%, while cutting the power and space requirements in your data center by over 90%.

Finally, for the same energy mix, a 90% reduction in power usage means a 90% reduction in carbon output into our atmosphere.

No matter the industry (Telecom, Retail, IoT, eCommerce, Finance, AdTech, Security, Energy & Utilities), the ability to process data two orders of magnitude faster while cutting infrastructure costs has significant implications for every business.

Ready to go fast with Plasma Engine on your on-premises system or favorite cloud instance? It’s easy!

Sign up today for a Test Flight POC to try our point-and-click demo or test your own data.

Technical Benchmark Parameters

The benchmark runs Plasma Engine v1.1.1 on a call data record (CDR) CSV dataset with the following schema:

LAT1 & LON1 – location of the caller

LAT2 & LON2 – location of the called party

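import org.apache.spark.sql.types.StructType

// Four FLOAT columns: caller and called-party coordinates
val haversineSchema = new StructType()
  .add("LAT1", "float")
  .add("LON1", "float")
  .add("LAT2", "float")
  .add("LON2", "float")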

The dataset consists of a single CSV file generated for a predefined number of calls. 1,000 symlinks to this file emulate a large streaming CSV dataset. Plasma Engine uses the same data in Apache Arrow format. Example CDR rows:

-0.683814,0.973132,1.361955,-0.277550
-0.616051,0.139032,1.250225,1.743503
0.855723,1.392644,1.498878,-1.725920
-0.456180,2.459872,-1.540976,-2.514564
-1.397619,-2.760236,-1.480904,-1.880444
0.879134,-0.135406,0.135917,0.348850
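The dataset setup described above (one generated CSV file exposed through 1,000 symlinks) can be sketched in Scala. This is a hypothetical reconstruction, not the benchmark's actual generator: the object and file names, the random seed, the row count, and the uniform coordinate ranges are all our assumptions. The ranges were chosen only because the sample rows look like radian values (latitudes within ±π/2, longitudes within ±π).

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import java.util.Locale
import scala.util.Random

object CdrDatasetSketch {
  // Fixed locale so the decimal separator is always '.', keeping the CSV parseable.
  private def fmt(x: Double): String = "%.6f".formatLocal(Locale.ROOT, x)

  // Hypothetical generator: one row per call, LAT1,LON1,LAT2,LON2 in radians.
  // Uniform ranges are an assumption; the benchmark does not specify a distribution.
  def writeCsv(file: Path, calls: Int, seed: Long = 42L): Unit = {
    val rnd = new Random(seed)
    def lat(): Double = (rnd.nextDouble() * 2 - 1) * math.Pi / 2
    def lon(): Double = (rnd.nextDouble() * 2 - 1) * math.Pi
    val out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)
    try {
      // Stream rows to disk so large call counts do not need to fit in memory.
      for (_ <- 0 until calls) {
        out.write(Seq(lat(), lon(), lat(), lon()).map(fmt).mkString(","))
        out.write("\n")
      }
    } finally out.close()
  }

  // Creates `copies` symbolic links in `dir`, all pointing at the same `target` file.
  def linkCopies(target: Path, dir: Path, copies: Int): Seq[Path] =
    (0 until copies).map { i =>
      Files.createSymbolicLink(dir.resolve("part-%05d.csv".format(i)), target.toAbsolutePath)
    }

  def main(args: Array[String]): Unit = {
    val root = Files.createTempDirectory("cdr")
    val source = root.resolve("calls.csv") // hypothetical file name
    writeCsv(source, calls = 100000)       // placeholder for the unspecified call count
    val streamDir = Files.createDirectory(root.resolve("stream"))
    val links = linkCopies(source, streamDir, copies = 1000)
    println(s"${links.size} links in $streamDir -> $source")
  }
}
```

Symlinks let a file-based stream reader see 1,000 "files" while only one copy occupies disk. Note that creating symbolic links may require extra privileges on some operating systems, notably Windows.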

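The Spark SQL query keeps only calls where the great-circle (haversine) distance between the two parties exceeds 20,000 km (the constant 12742 is Earth's mean diameter in kilometers):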

SELECT LAT1
FROM STREAM
WHERE (ASIN(POW((SIN((LAT2-LAT1) / CAST(2 AS FLOAT)) * SIN((LAT2-LAT1) / CAST(2 AS FLOAT))
                 + COS((LAT1)) * COS((LAT2)) * SIN((LON2-LON1) / CAST(2 AS FLOAT)) * SIN((LON2-LON1) / CAST(2 AS FLOAT))),
                CAST(0.5 AS FLOAT)))
       * CAST(12742 AS FLOAT)) > CAST(20000 AS FLOAT)
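The WHERE clause is the haversine formula d = 2R·asin(√a), with a = sin²(Δlat/2) + cos(lat1)·cos(lat2)·sin²(Δlon/2) and 2R = 12742 km. A minimal plain-Scala sketch of the same per-row predicate follows; it is for illustration only and is not how Spark or Plasma Engine evaluate the query. The object and method names are hypothetical, and it uses Double where the query uses 32-bit FLOAT.

```scala
object HaversineFilter {
  // The query's constant 12742 is twice Earth's mean radius (2 * 6371 km), so results are in km.
  val EarthDiameterKm = 12742.0
  val ThresholdKm = 20000.0

  // Haversine great-circle distance. Inputs are radians, matching how the query
  // passes LAT/LON straight into SIN and COS without a degree conversion.
  def haversineKm(lat1: Double, lon1: Double, lat2: Double, lon2: Double): Double = {
    val sinHalfDLat = math.sin((lat2 - lat1) / 2)
    val sinHalfDLon = math.sin((lon2 - lon1) / 2)
    val a = sinHalfDLat * sinHalfDLat +
      math.cos(lat1) * math.cos(lat2) * sinHalfDLon * sinHalfDLon
    // Clamp guards against rounding pushing a above 1; the SQL query has no such guard.
    math.asin(math.sqrt(math.min(1.0, a))) * EarthDiameterKm
  }

  // Same predicate as the WHERE clause: keep calls longer than 20000 km.
  def isLongCall(lat1: Double, lon1: Double, lat2: Double, lon2: Double): Boolean =
    haversineKm(lat1, lon1, lat2, lon2) > ThresholdKm

  def main(args: Array[String]): Unit = {
    val rows = Seq(
      (-0.683814, 0.973132, 1.361955, -0.277550), // first sample CDR row
      (0.0, 0.0, 0.0, math.Pi)                    // antipodal pair, added for illustration
    )
    for ((lat1, lon1, lat2, lon2) <- rows) {
      val d = haversineKm(lat1, lon1, lat2, lon2)
      println(f"$d%.1f km, selected = ${isLongCall(lat1, lon1, lat2, lon2)}")
    }
  }
}
```

Because the longest possible great-circle distance is half the circumference, π × 6371 ≈ 20,015 km, the 20,000 km threshold selects only nearly antipodal pairs, so almost every row is filtered out. Rows very close to the threshold could be classified differently in single precision (FLOAT) than in this Double-based sketch.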

Hardware

– AWS p3.2xlarge instance: one NVIDIA Tesla V100 GPU, 8 vCPUs (Broadwell Xeon), 61 GB of RAM, up to 10 Gbps network bandwidth

– Spark driver memory: 16 GB

– Spark executors: 8
