eBay / parallec

Fast Parallel Async HTTP/SSH/TCP/UDP/Ping Client Java Library. Aggregate 100,000 APIs & send anywhere in 20 lines of code. Ping/HTTP Calls 8000 servers in 12 seconds. (Akka) www.parallec.io

Tweeted by the Creator of Akka & Featured in [ This Week in #Scala | OSChina - 2015 Top 100 ]

Overview

Parallec is a fast, parallel, asynchronous HTTP(S)/SSH/TCP/UDP/Ping client Java library based on Akka. Scalably aggregate and handle API responses any way you like, and send them anywhere, by writing 20 lines of code. A convenient response context lets you pass any object in or out when handling the responses. You can conduct scalable API calls, then pass the aggregated data to Elasticsearch, Kafka, MongoDB, Graphite, memcached, and so on. Task-level concurrency control is flexible and does not require creating a 1,000-thread thread pool. Parallec means Parallel Client (pronounced "Para-like"). Visit www.parallec.io

Watch Demo: 8,000 web server HTTP response aggregation to memory in 12 seconds / to ElasticSearch in 16 seconds.

Aggregated error messages - Debug friendly with full visibility: Having trouble debugging in a concurrent environment? Not any more! All exceptions, timeouts, stack traces, and request-sent and response-received times are captured and aggregated in the response map, which is available in ParallelTask for polling right after you execute a task asynchronously. Multi-level (worker/manager) timeouts guarantee that tasks return even for 100,000s of requests.

Production Use Cases: widely used in infrastructure software as the polling and aggregation engine

  1. Application Deployment / PaaS: Parallec has been integrated into eBay's main production application deployment system (PaaS). Parallec orchestrates 10+ API tasks, each targeting 10s to 1,000s of servers, over 1,000+ application pools in production. Parallec has been used with the workflow engine Winder to handle workflows that are similar to, but more complex than, this one.
  2. Data Extraction / ETL: Parallec has been used by eBay Israel's web intelligence team for executing 10k-100k parallel API calls to a single third-party server, with dramatically improved performance and reduced resource usage.
  3. Network Troubleshooting via Probing: In eBay's network / cloud team, Parallec is instrumental in keeping the false alert rate extremely low so that switch soft failures are detected accurately. Parallec serves as the core polling engine in the master component, checking agent health and marking down agents to eliminate noise effectively and promptly.
  4. Agent Management / Agent Master: In eBay's site operation / tools team, Parallec serves as the core engine to manage and monitor an agent (similar to a Puppet agent, Salt minion, or Kubernetes kubelet) on 100,000+ production servers to ensure scalable operations.

Workflow Overview

Get Started

Download the latest JAR or grab it from Maven:

<dependency>
	<groupId>io.parallec</groupId>
	<artifactId>parallec-core</artifactId>
	<version>0.10.6</version>
</dependency>

Snapshots of the development version are available in Sonatype's snapshots repository.

or Gradle:

compile 'io.parallec:parallec-core:0.10.6'

6 Line Example

In the example below, simply changing prepareHttpGet() to prepareSsh(), prepareTcp(), prepareUdp(), or preparePing() lets you conduct parallel SSH/TCP/UDP/Ping. For details, please refer to the Javadoc and Example Code.

import io.parallec.core.*;
import java.util.Map;

ParallelClient pc = new ParallelClient();
pc.prepareHttpGet("").setTargetHostsFromString("www.google.com www.ebay.com www.yahoo.com")
.execute(new ParallecResponseHandler() {
    public void onCompleted(ResponseOnSingleTask res,
        Map<String, Object> responseContext) {
        System.out.println(res.toString());
    }
});

20 Line Example

Now that you have learned the basics, see how easy it is to pass an Elasticsearch client through the response context to aggregate data anywhere you like. You can also pass a hash map as the responseContext, save the processed results to the map during onCompleted, and use the map outside for further work.

...
import org.elasticsearch.client.Client;
import static org.elasticsearch.node.NodeBuilder.*;

ParallelClient pc = new ParallelClient();
org.elasticsearch.node.Node node = nodeBuilder().node(); //elastic client initialize
HashMap<String, Object> responseContext = new HashMap<String, Object>();
responseContext.put("Client", node.client());
pc.prepareHttpGet("")
        .setConcurrency(1000).setResponseContext(responseContext)
        .setTargetHostsFromLineByLineText("http://www.parallec.io/userdata/sample_target_hosts_top100_old.txt", HostsSourceType.URL)
        .execute( new ParallecResponseHandler() {
            public void onCompleted(ResponseOnSingleTask res,
                    Map<String, Object> responseContext) {
                Map<String, Object> metricMap = new HashMap<String, Object>();
                metricMap.put("StatusCode", res.getStatusCode().replaceAll(" ", "_"));
                metricMap.put("LastUpdated",PcDateUtils.getNowDateTimeStrStandard());
                metricMap.put("NodeGroupType", "Web100");
                Client client = (Client) responseContext.get("Client");
                client.prepareIndex("local", "parallec", res.getHost()).setSource(metricMap).execute();
            }
        });
node.close(); pc.releaseExternalResources();
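The hash-map variant mentioned above can be sketched without any external dependency. The block below is a JDK-only illustration of the pattern, not Parallec code: the `Handler` interface, `dispatch`, and `countByStatus` are hypothetical stand-ins, and the responses are simulated from a host-to-status map rather than fetched over the network.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// JDK-only sketch of the response-context pattern; none of these names are Parallec APIs.
class ResponseContextSketch {

    /** Hypothetical stand-in for a per-response callback that receives a shared context. */
    interface Handler {
        void onCompleted(String host, String statusCode, Map<String, Object> responseContext);
    }

    /** Hands every simulated (host, status) pair to the handler together with the same context map. */
    static void dispatch(Map<String, String> statusByHost, Map<String, Object> ctx, Handler handler) {
        statusByHost.entrySet().parallelStream()
                .forEach(e -> handler.onCompleted(e.getKey(), e.getValue(), ctx));
    }

    /** Counts responses per status code into a map that travels inside the context. */
    static Map<String, AtomicInteger> countByStatus(Map<String, String> statusByHost) {
        Map<String, AtomicInteger> counts = new ConcurrentHashMap<>();
        Map<String, Object> ctx = new ConcurrentHashMap<>();
        ctx.put("counts", counts);

        dispatch(statusByHost, ctx, (host, status, context) -> {
            @SuppressWarnings("unchecked")
            Map<String, AtomicInteger> c = (Map<String, AtomicInteger>) context.get("counts");
            // Thread-safe increment, since handlers may run in parallel.
            c.computeIfAbsent(status, k -> new AtomicInteger()).incrementAndGet();
        });
        return counts; // usable outside the handler once dispatch has returned
    }

    public static void main(String[] args) {
        System.out.println(countByStatus(Map.of("h1", "200", "h2", "200", "h3", "404")));
    }
}
```

A concurrent map is used here on the assumption that handlers may be invoked from several threads; if the handler only ever runs on a single thread, a plain HashMap would do.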

Different Requests to the Same Target

Now see how easy it is to use the request template to send multiple different requests to the same target. Variable replacement is allowed in the POST body, URL, and headers. Read more..

pc.prepareHttpGet("/userdata/sample_weather_$ZIP.txt")
    .setReplaceVarMapToSingleTargetSingleVar("ZIP",
        Arrays.asList("95037","48824"), "www.parallec.io")
    .execute(new ParallecResponseHandler() {...}...
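To make the effect of the `$ZIP` variable concrete, here is a minimal JDK-only sketch of template expansion. It only illustrates the idea; how Parallec performs the replacement internally is not described here, and the `expand` helper is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration of "$VAR" replacement in a request template; not Parallec's implementation.
class RequestTemplateSketch {

    /** Produces one concrete string per value by replacing each "$" + varName token. */
    static List<String> expand(String template, String varName, List<String> values) {
        List<String> out = new ArrayList<>();
        for (String value : values) {
            // String.replace does literal (non-regex) replacement, so "$" needs no escaping.
            out.add(template.replace("$" + varName, value));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> paths = expand("/userdata/sample_weather_$ZIP.txt", "ZIP",
                List.of("95037", "48824"));
        for (String path : paths) {
            System.out.println("www.parallec.io" + path);
        }
    }
}
```

With the two ZIP codes from the example above, the single target host receives one request per expanded path.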

What's New

  • 06/2017 Add dynamic response encoding according to response content type.
  • 09/2016 Add option to save response headers in HTTP #24.
  • 08/2016 Support Parallel async UDP (via Netty) #41.
  • 07/2016 Support replacing different ports in different requests.
  • 06/2016 Parallel SSH: add support for running commands with sudo and a password.

For more details, please check the Change Log.

Versions

  • The latest production-ready version is 0.10.x, which we use in production.
  • On async-http-client 2.x: the Parallec version built on the more up-to-date async-http-client (currently AHC 2.0.15) is 0.20.0-SNAPSHOT. This version has passed comprehensive unit tests but has not yet been used in production. It requires JDK 8 because of AHC 2.x and can be used with parallec-plugins of the same version, 0.20.0-SNAPSHOT. For details, please check #37.

More Readings

  • More Examples on setting context, sending to Elasticsearch / Kafka, async running, auto progress polling, tracking progress, and TCP/SSH/Ping. A UDP example is here, with more to come.
  • Set Target Hosts from list, string, line by line text, json path, from local or remote URLs.
  • Full Documentation
  • Javadoc
  • Ping Demo: ping 8000 servers within 11.1 seconds, with a performance test vs. FPing.

User Group

  • Ask a question, and keep up to date on the library development by joining the discussion group / forum: Parallec.io Google Group.
  • Feel free to submit a Github Issue for any questions and suggestions too.
  • Check FAQ.

Use Cases

  1. Scalable web server monitoring, management, and configuration push, ping check.
  2. Asset / server status discovery and remote task execution, either agent-less (parallel SSH) or agent-based (parallel HTTP/TCP).
  3. Scalable API aggregation and processing with a flexible destination: your favorite message queue / storage / alert engine.
  4. Orchestration and workflows on multiple web servers.
  5. Parallel different requests with controlled concurrency to a single server: as a parallel client for CRUD operations on a REST-API-enabled database / web server. Variable replacement is allowed in the POST body, URL, and headers.
  6. Load testing with request template.
  7. Network monitoring with active probing via UDP/Ping etc.

Features

Parallec is built on Akka actors and Async HTTP Client / Netty / Jsch. The library focuses on HTTP while also enabling scalable communication over SSH/Ping/TCP/UDP.

90%+ test coverage means you can always find an example of each feature.

  1. Exceedingly intuitive interface with a builder pattern similar to that in Async HTTP Client, but with concurrency handled behind the scenes.
  2. Generic response handler with context. The response context gives you total freedom and convenience in processing each response your way. Process and aggregate data anywhere: Kafka, Redis, Elasticsearch, MongoDB, CMS, etc.
  3. Flexible on when to invoke the handler: before the aggregation (in the worker thread) or after it (in the master/manager thread).
  4. Flexible input of target hosts: from a list, a string, or a JSON Path, from local files or a remote URL.
  5. Scalable and fast, with built-in concurrency control.
  6. Auto-progress polling to enable task-level concurrency with async APIs for long jobs and orchestrations.
  7. Request template to handle non-uniform requests.
  8. Convenient single-place handling of success and failure cases. Handle both in a single function, where you get the actual response on success, or the stack trace and error details on failure.
  9. Capacity-aware task scheduler that automatically queues tasks and fires them when capacity becomes available. (For example, consecutively submitting 5 tasks, each hitting 100K websites with default concurrency, will result in queuing.)
  10. Fine-grained task progress tracking helps you track the status of each individual task. For a parallel task on 1000 target hosts, you may check the status of any single host task and the percentage of tasks completed.
  11. Fine-grained task cancellation at the whole-task or individual-request level. For a parallel task on 1000 target hosts, you may cancel a subset of target hosts or cancel the whole parallel task at any time.
  12. Status-code aggregation is provided out of the box.
  13. Parallel Ping supports both InetAddress.isReachable() ICMP (requires root) and process-based ping with retries. Performance testing shows it is 2x the speed of best-effort tuned FPing when pinging 1500 targets (2.2 vs 4.5 sec).
  14. Parallel SSH supports both key-based and password-based login, and task cancellation.
  15. Parallel TCP/UDP supports idle timeout based channel closes.
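As an illustration of the "ping with retries" item in the list above, the following JDK-only sketch retries a reachability probe a fixed number of times. It is not Parallec's implementation; the class and method names are hypothetical, and only `InetAddress.getByName` and `InetAddress.isReachable` are real JDK calls. The probe is injected so the retry logic can be exercised without network access.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.util.function.BooleanSupplier;

// Sketch of a retrying reachability check; names are illustrative only.
class PingRetrySketch {

    /** Runs the probe up to (1 + retries) times and returns true on the first success. */
    static boolean probeWithRetries(BooleanSupplier probe, int retries) {
        for (int attempt = 0; attempt <= retries; attempt++) {
            if (probe.getAsBoolean()) {
                return true;
            }
        }
        return false;
    }

    /** Probe backed by InetAddress.isReachable; any I/O error counts as unreachable. */
    static BooleanSupplier reachability(String host, int timeoutMillis) {
        return () -> {
            try {
                return InetAddress.getByName(host).isReachable(timeoutMillis);
            } catch (IOException e) {
                return false;
            }
        };
    }

    public static void main(String[] args) {
        // One initial attempt plus two retries against the loopback address.
        System.out.println(probeWithRetries(reachability("127.0.0.1", 500), 2));
    }
}
```

Whether `isReachable` actually sends ICMP depends on the privileges of the process; without them the JDK typically falls back to a TCP connection attempt, which is consistent with the "requires root" note in the feature list.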

Motivation

  • Flexible response handling and immediate processing embedded in other applications.
  • Handle async APIs with auto progress polling for task level concurrency control.
  • Support of other protocols, and more..
  • Anyone can write 20 lines to make his/her application become REST Commander.

Drawing on the feedback, lessons, and improvements from the past year of internal usage and open sourcing of REST Commander, we have now made the core of REST Commander available as an easy-to-use standalone library. We added 15+ new features and rewrote 70%+ of the code, with 90%+ test coverage for confident usage and contribution. This time we also structured it better so that most internal development can be done directly here.

Watch Parallec in Action

[Watch Demo](https://www.youtube.com/watch?v=QcavegPMDms "Parallec demo - Click to Watch!"): Parallec aggregates the status of 100 websites to Elasticsearch and visualizes it with 20 lines of code.

Watch Demo on HTTP Calls on 8000 Servers: 8,000 web server HTTP response aggregation to memory in 12 seconds / to ElasticSearch in 16 seconds.

[Watch Ping Demo](https://www.youtube.com/watch?v=9m1TFuO1Mys "Parallec Ping vs FPing demo - Click to Watch!"): Parallec is 2x the speed of best-effort tuned FPing with the same accurate results, and pings 8000 servers within 11.1 seconds. For details, check here.

Performance

Note that speed varies based on network speed, API response time, the slowest servers, timeout, and concurrency settings.

HTTP

We ran a remote task execution API call against 3,000 servers, with responses aggregated to Elasticsearch and visualized within 15 seconds, by writing 25 lines of code.

With another, faster API, calls to 8,000 servers in the same datacenter had their responses aggregated in memory in 12 seconds.

Ping

Parallec 2.2 seconds vs FPing 4.5 seconds on 1500 servers. Parallec is 2x the speed of FPing (v3.12, after best-effort tuning: -i 1 -r 0) when pinging 1500 servers, while getting the same ping results. Parallec pings 8000 servers within 11.1 seconds with ease.

As usual, don't rely on these numbers and perform your own benchmarks.

Compare Parallec vs REST Commander vs ThreadPools+Async Client

  • Compared with a Java thread-pool-based solution, Parallec gives you worry-free concurrency control without constraints on thread count. Thread pools do not fit well when you need a concurrency of 1000 (1000 threads..) or a different concurrency setting for each request.
  • Compared with single-threaded Node.js solutions, Parallec enables parallel, computation-intensive response handling on multiple cores.
  • Python has similar issues because of its global interpreter lock: to use multiple CPUs you need costly multi-processing. These solutions are better suited to I/O-only work than to CPU-intensive response processing.
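The first point above, that a concurrency limit need not equal a thread count, can be sketched with JDK primitives alone. The block below is not how Parallec works internally (Parallec is built on Akka actors); it swaps in a plain `Semaphore` to cap the number of in-flight asynchronous calls, with the calls themselves simulated by a delayed executor. All names are hypothetical, and `CompletableFuture.delayedExecutor` requires Java 9 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: cap in-flight async calls with a Semaphore instead of one thread per call.
class BoundedConcurrencySketch {

    /** Starts `tasks` simulated async calls, at most `limit` in flight; returns the peak observed. */
    static int runBounded(int tasks, int limit) {
        Semaphore permits = new Semaphore(limit);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        // Simulated non-blocking I/O: each "call" completes about 20 ms after being started.
        Executor delayed = CompletableFuture.delayedExecutor(20, TimeUnit.MILLISECONDS);
        List<CompletableFuture<Void>> all = new ArrayList<>();

        for (int i = 0; i < tasks; i++) {
            permits.acquireUninterruptibly(); // the submitter waits once the limit is reached
            peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
            all.add(CompletableFuture.runAsync(() -> { }, delayed)
                    .whenComplete((result, error) -> {
                        inFlight.decrementAndGet(); // decrement before releasing the permit
                        permits.release();
                    }));
        }
        CompletableFuture.allOf(all.toArray(new CompletableFuture[0])).join();
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak in-flight calls: " + runBounded(50, 10));
    }
}
```

Because a call occupies a permit rather than a thread while it waits, the limit can be set per task and raised well beyond what would be practical as a thread-pool size.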

In Parallec, you may handle response either in Worker (before aggregation: in parallel) or in Manager (after aggregation: single thread). Read More..

For a review of more related work, please visit here.

Features compared across Parallec, REST Commander, and Thread Pools + Async Client:

  • Embedded library with intuitive builder pattern interface
  • Ready to use application with GUI wizard based request submission and response aggregation
  • Simple concurrency control not limited by thread size
  • Immediate response handler without waiting for all responses to return
  • Capacity aware task scheduler and global capacity control
  • Total freedom of response processing and API aggregation: pluggable and generic response handler and response context
  • 1 line plugin to enable SSL Client auth
  • 90% Test Coverage
  • Load target hosts from CMS query, JSON Path, text, list, string from URL/local
  • Task level concurrency and orchestration for Async APIs: auto polling task progress
  • Task level configuration on timeout and replacing Async HTTP Client
  • Async and sync task control with progress polling and cancellation
  • Scalable Parallel SSH with password and key based login
  • Proven scalability and speed on 100,000+ target hosts in Production environment
  • Generic request template with variable replacement for sending different requests to same/different target hosts
  • Scalable Ping with Retries
  • Scalable TCP/UDP with idle timeout
  • Flexible handler location at either worker (in parallel) or manager thread
  • Out-of-the-box two-level response aggregation on status code
  • Configurable response log trimming on intervals
  • Cancel task on a list of target hosts

Plugins

We deeply thank all contributors for their effort.

Authors

Parallec is served to you by Yuanteng (Jeff) Pei and Teng Song, Cloud Infrastructure & Platform Services (CIPS) at eBay Inc. (original authors)

Credits & Acknowledgement

Contributions

Any helpful feedback is more than welcome. This includes feature requests, bug reports, pull requests, and constructive feedback. You must agree to [this](https://github.com/eBay/parallec/blob/master/CONTRIBUTING.md) before submitting a [pull](https://github.com/eBay/parallec/pulls) request.

Licenses

Code licensed under Apache License v2.0

© 2015-2017 eBay Software Foundation
