Introducing “CPU-mode”

NO GPU? NO PROBLEM.

PlasmaENGINE® created a new standard for efficiency in data processing by performing a majority of its operations on the GPU. But what if you don’t have GPUs? What if you just want to test PlasmaENGINE® without changing your infrastructure? In PlasmaENGINE® 1.7.1, CPU-mode allows you to do just that. 

CPU-mode uses the same advanced vectorized engine that compiles your Apache Spark jobs to native code. The engine is capable of targeting multiple architectures. Previous versions supported only NVIDIA GPUs, but CPU mode adds support for x86_64. We plan to support AMD GPUs, upcoming Intel GPUs, and Xilinx FPGAs in future versions. 

PlasmaENGINE® CPU mode runs your existing pipeline 2-4 times faster than Apache Spark, with no code changes and no infrastructure changes. Once you try it and see the results, upgrading to even faster processing with NVIDIA GPUs is easy.

Future versions will increase efficiency and automatically offload work to the CPU when the GPU queue is too long, allowing you to take full advantage of all of your system resources and maximize performance.

DEPLOYMENT

Using PlasmaENGINE® CPU-mode is as simple as using Apache Spark. 

We provide you with a Docker image that has everything you need. The image allows you to use standard Spark tools like spark-submit, spark-shell, and start-master.sh. Every Spark pipeline you run inside the container will use PlasmaENGINE® to run each job more efficiently and 2-4 times faster.

Other preferred deployment options include EMR and Qubole. If you have another deployment option in mind, please contact us (support@fastdata.io).

BENCHMARK

We used a haversine benchmark to measure CPU mode’s speed. The benchmark demonstrates the processing power of PlasmaENGINE® by crunching through a large number of rows of latitude/longitude pairs and calculating the great-circle distance between the points.
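For readers unfamiliar with the haversine formula, here is a minimal Python sketch of the per-row calculation. It is an illustration only, not the benchmark code: the function name `haversine_km`, the kilometre unit, and the Earth radius constant are assumptions made for this example.

```python
import math

# Mean Earth radius in kilometres (assumed value for this sketch).
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)

    # Haversine of the central angle between the two points.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)

    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


if __name__ == "__main__":
    # Two example coordinates (roughly San Francisco and New York).
    print(haversine_km(37.7749, -122.4194, 40.7128, -74.0060))
```

Each row needs only a handful of trigonometric operations and no data from other rows, which is why this kind of workload is a natural fit for a vectorized engine.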

For this test we used an AWS c5.12xlarge instance (48 vCPUs and 96 GB RAM). No GPUs were attached, so both Apache Spark and PlasmaENGINE® ran solely on the CPU. Both used exactly the same code and exactly the same settings: all 48 cores (--master local[*]) and 64 GB of driver memory.

FIGURE 1

Apache Spark was able to process 14.1 million rows per second.

FIGURE 2 

Without any changes to the code or settings, PlasmaENGINE® processed 50.4 million rows per second on the same dataset. That is a speedup of roughly 3.6x over Apache Spark (50.4 / 14.1 ≈ 3.57), with no code changes necessary. By processing the data in a vectorized fashion and using native code, CPU mode gets significantly more performance out of the same resources.
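To put the two reported throughputs in perspective, the short Python sketch below computes their ratio and the time each engine would need for a given row count. The one-billion-row figure is a hypothetical workload chosen for illustration, not the size of the benchmark dataset, and the helper name `seconds_to_process` is likewise made up for this example.

```python
# Throughputs reported in the benchmark, in rows per second.
SPARK_ROWS_PER_SEC = 14.1e6
PLASMA_ROWS_PER_SEC = 50.4e6

# Ratio of the two reported throughputs.
speedup = PLASMA_ROWS_PER_SEC / SPARK_ROWS_PER_SEC


def seconds_to_process(rows, rows_per_sec):
    """Wall-clock seconds needed to process `rows` at a steady rate."""
    return rows / rows_per_sec


if __name__ == "__main__":
    rows = 1_000_000_000  # hypothetical workload size, not from the benchmark
    print(f"speedup: {speedup:.2f}x")
    print(f"Apache Spark: {seconds_to_process(rows, SPARK_ROWS_PER_SEC):.1f} s")
    print(f"PlasmaENGINE: {seconds_to_process(rows, PLASMA_ROWS_PER_SEC):.1f} s")
```

This assumes throughput stays constant as the dataset grows, which real workloads may not satisfy.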

FIGURE 3

1.93GB of CSV data processed per second.

If you’re currently relying on Apache Spark for data processing and looking for a faster, more efficient solution, PlasmaENGINE® CPU Mode can be installed in minutes with zero code changes. 

Is PlasmaENGINE® the game-changing cost-savings tool you’ve been looking for? 

Email sales@fastdata.io for a demo and we’ll prove the power of PlasmaENGINE® with a free POC.
