Monday, February 20, 2017

Akka with Zipkin - Async Framework with Distributed Tracing for Microservices

The microservices architectural style is an approach to developing a single application as a suite of small services, each running in its own process and communicating through lightweight mechanisms like HTTP or AMQP. These services are built around business capabilities and are independently deployable by fully automated deployment machinery. There is a bare minimum of centralized management of these services, which may be written in different programming languages and use different data storage technologies.

A tracing infrastructure for distributed microservices needs to record information about all the work done in a system on behalf of a given initiator.

This blog explains how to do that for Akka-based applications using Zipkin.

What is Akka Framework?

Akka is a toolkit and runtime for building highly concurrent, distributed, and resilient message-driven applications on the JVM.
It is asynchronous and distributed by design, and provides high-level abstractions like Actors, Streams and Futures.


Actors give you:

  • Simple and high-level abstractions for distribution, concurrency and parallelism.
  • Asynchronous, non-blocking and highly performant message-driven programming model.
  • Very lightweight event-driven processes (several million actors per GB of heap memory).
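
To make this concrete, here is a minimal sketch of an actor in Akka Java (using the Akka 2.4-era UntypedActor API; the actor and message here are made up for illustration):

import akka.actor.ActorRef;
import akka.actor.ActorSystem;
import akka.actor.Props;
import akka.actor.UntypedActor;

// A hypothetical actor: it owns its state exclusively and processes
// one message at a time, so no locks or shared state are needed.
public class GreetingActor extends UntypedActor {
    @Override
    public void onReceive(Object message) {
        if (message instanceof String) {
            System.out.println("Hello, " + message);
        } else {
            unhandled(message);
        }
    }
}

// Usage:
// ActorSystem system = ActorSystem.create("demo");
// ActorRef greeter = system.actorOf(Props.create(GreetingActor.class), "greeter");
// greeter.tell("Akka", ActorRef.noSender());   // asynchronous, non-blocking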

This asynchronous design, however, creates challenges for logging. A request coming to the server can be handled by multiple asynchronous actors, and these actors might not even reside in the same JVM. How do we then track such requests, and how do we calculate or visualize the latency across the actors? This is the problem of distributed tracing.

This problem is solved by Zipkin.

What is Zipkin?

Zipkin is a distributed tracing system. It helps gather timing data needed to troubleshoot latency problems in microservice architectures. It manages both the collection and lookup of this data. 

Applications are instrumented to report timing data to Zipkin. The Zipkin UI also presents a Dependency diagram showing how many traced requests went through each application. If you are troubleshooting latency problems or errors, you can filter or sort all traces based on the application, length of trace, annotation, or timestamp. 

Zipkin is based on the Google Dapper paper (linked in the references below). Dapper describes Google's production distributed-systems tracing infrastructure and how its design goals of low overhead, application-level transparency, and ubiquitous deployment on a very large scale were met.


Using Zipkin system with Akka framework

An Akka-based actor system can be integrated with Zipkin for distributed tracing using an open-source project called akka-tracing, available on Github.

At the time of writing this blog, there were not many examples available of integrating Akka in Java with Zipkin using the akka-tracing library. Hence I set out to write this blog and share my code on Github for reference.


Refer to my Akka Java project with Zipkin implementation on Github here - https://github.com/tuhingupta/akka-sample-tracing-java.git

Look at the Readme file on Github for instructions on installing and running Zipkin and my project.

Zipkin Terminology


Span: The basic unit of work. Spans are identified by a unique 64-bit ID for the span and another 64-bit ID for the trace the span is a part of. Spans also have other data, such as descriptions, timestamped events, key-value annotations (tags), the ID of the span that caused them, and process IDs (normally IP addresses).

Trace: A set of spans forming a tree-like structure.

Annotation: Records the existence of an event in time. Some of the core annotations used to define the start and stop of a request are:
  • cs - Client Sent - The client has made a request. This annotation depicts the start of the span.
  • sr - Server Received - The server side got the request and will start processing it. Subtracting the cs timestamp from this timestamp gives the network latency.
  • ss - Server Sent - Annotated upon completion of request processing (when the response got sent back to the client). Subtracting the sr timestamp from this timestamp gives the time the server side needed to process the request.
  • cr - Client Received - Signifies the end of the span. The client has successfully received the response from the server side. Subtracting the cs timestamp from this timestamp gives the whole time the client needed to receive the response from the server.
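
As a worked example with hypothetical timestamps, suppose cs = 0 ms, sr = 10 ms, ss = 50 ms and cr = 65 ms. Then the request-path network latency is sr - cs = 10 ms, the server processing time is ss - sr = 40 ms, and the total time the client waited is cr - cs = 65 ms.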


Use case implemented in my project


The project I developed is available on Github. It replicates a typical microservice scenario using Akka: a user request is handled by ActorA, which creates a request message and sends it to a second actor, ActorB. ActorB does its processing, updates the request and forwards it to ExternalCallActor, the actor that (hypothetically) makes an HTTP API call to an external system. Once the response is received, it is sent back to ActorA.
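
Here is a simplified sketch of that actor chain in Akka Java (each actor would live in its own file; the Request and Response message classes are illustrative stand-ins, see the Github repo for the actual code):

import akka.actor.ActorRef;
import akka.actor.Props;
import akka.actor.UntypedActor;

class Request { /* payload fields omitted */ }
class Response { /* payload fields omitted */ }

class ActorA extends UntypedActor {
    private final ActorRef actorB =
        getContext().actorOf(Props.create(ActorB.class), "actorB");

    @Override
    public void onReceive(Object message) {
        if (message instanceof Request) {
            actorB.tell(message, getSelf());        // hand the request to ActorB
        } else if (message instanceof Response) {
            // the final response arrives back here after the external call
        } else {
            unhandled(message);
        }
    }
}

class ActorB extends UntypedActor {
    private final ActorRef external =
        getContext().actorOf(Props.create(ExternalCallActor.class), "external");

    @Override
    public void onReceive(Object message) {
        if (message instanceof Request) {
            // ... update the request, then pass it on, preserving the original sender
            external.forward(message, getContext());
        } else {
            unhandled(message);
        }
    }
}

class ExternalCallActor extends UntypedActor {
    @Override
    public void onReceive(Object message) {
        if (message instanceof Request) {
            // would make the external HTTP call here, then reply;
            // because ActorB used forward(), getSender() is still ActorA
            getSender().tell(new Response(), getSelf());
        } else {
            unhandled(message);
        }
    }
}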

Now, whether the actors are running on the same instance, on different instances or in separate JVMs, the request/response model is asynchronous (inherent to Akka), so it is difficult to trace the various requests and their latencies.

This is where Zipkin comes into action. This software aggregates timing data that can be used to track down latency issues. When a request comes in the front door, Zipkin, a Java-based application, traces it as it goes through the system. Each request gets a unique identifier, which is passed along with the request to each microservice. For Zipkin to work, each microservice is instrumented with a Zipkin library that the service then uses to identify the request's entry and exit points. Libraries are available for C#, Java, JavaScript, Python, Go, Scala and Ruby.





Zipkin UI with log tracing from my project

Zipkin comes with a Web interface that shows the amount of traffic each microservice instance is getting. The log data can be filtered by application, length of trace, annotation, or timestamp.





You can drill down to an individual request and see its span and other data:




Reference Sites:
Github project repo - https://github.com/tuhingupta/akka-sample-tracing-java.git
Akka - http://doc.akka.io/docs/akka/2.4/intro/what-is-akka.html
Zipkin - http://zipkin.io/
Zipkin wiki - https://github.com/openzipkin/zipkin/wiki
Dapper - https://research.google.com/pubs/pub36356.html
akka-tracing - https://github.com/levkhomich/akka-tracing




Thursday, April 14, 2016

Kappa Architecture implementation using Apache Flink, Kafka and Cassandra



Refer to the code here - https://github.com/tuhingupta/kappa-streaming
What is Kappa Architecture?
Kappa Architecture is a software architecture pattern. Rather than using a relational database or a key-value store like Cassandra as the canonical data store, a Kappa Architecture system uses an append-only immutable log. From the log, data is streamed through a computational system and fed into auxiliary stores for serving.


(Architecture diagram source: the O'Reilly site.)
It follows a Command Query Responsibility Segregation pattern.

Client Writes

The client sends a stream of data, which could be one record or hundreds of thousands of records, as a byte stream of JSON, CSV or any other data format.

Server

The server is a non-blocking, reactive server written to accept bytes of data, parse/process the bytes into records and put them into an immutable append-only log system like Kafka.

Kafka

Kafka is an immutable, append-only, log-based messaging system to which the server writes the records received as a byte stream. It acts as the log from which different processes can read data and generate client-specific views.
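
As an illustration, the server side might append each parsed record to Kafka using the standard Java producer client (the broker address and the 'incoming-records' topic name are made up for this sketch):

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class LogWriter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // Each parsed record is appended to the log; nothing is ever updated in place.
            producer.send(new ProducerRecord<>("incoming-records", "record-key-1", "{\"value\":42}"));
        }
    }
}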

Processors

Various in-memory stream-processing frameworks like Flink, Storm or Samza can be used to process these logs and generate client-specific views. These processors continuously churn through the data available on Kafka topics and keep the derived views up to date.
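
For example, a minimal Flink job reading the log could look like the sketch below (using the 2016-era FlinkKafkaConsumer08 connector; the topic, group id and the trivial transformation are hypothetical):

import java.util.Properties;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class ViewBuilder {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("zookeeper.connect", "localhost:2181"); // required by the 0.8 consumer
        props.setProperty("group.id", "view-builder");

        // Stream the immutable log out of Kafka...
        DataStream<String> records = env.addSource(
            new FlinkKafkaConsumer08<>("incoming-records", new SimpleStringSchema(), props));

        // ...and derive a client-specific view from it (trivial transformation here).
        records.map(String::toUpperCase).print();

        env.execute("kappa-view-builder");
    }
}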

Client Reads

The client now reads from the materialized views that were created by processing the immutable append log.
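
If the processors materialize their views into Cassandra (as in the project linked above), a client read is a simple query against a view table. A sketch with the DataStax Java driver follows; the keyspace and table names are illustrative:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class ViewReader {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("views")) {
            // The client never touches the log directly, only the materialized view.
            ResultSet rs = session.execute("SELECT count FROM page_counts WHERE page = 'home'");
            for (Row row : rs) {
                System.out.println(row.getLong("count"));
            }
        }
    }
}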

Sample Use cases:

Log processing


Friday, February 26, 2016

Docker Image for JBoss EAP


The following post explains how to create a Docker image for JBoss EAP and expose its management port and web port (8080) for access from the host machine or other applications.

Before proceeding with this article you should have Docker and Docker Machine installed on your machine.

You can also refer to my previous post on how to create a simple Java Docker image.

Dockerfile

Create a similar Dockerfile. This file is available at the following Github repo - tuhingupta/docker-jboss-eap.


If you look at the Dockerfile (a sketch of it is reproduced after this list), you will notice:
at line 1, we start with the FROM keyword. This tells Docker what image to use as the base image.
at line 2, we specify MAINTAINER, which records the name and email of the person maintaining the image.
at line 4, WORKDIR specifies the working directory in the image.
at line 5, the RUN keyword runs whatever command follows it.
at line 6, we copy the jboss-eap zip file to the /usr/software folder in the image.
at line 9, we unzip the jboss-eap zip.
at line 15, we set up a user for the JBoss admin console.
at line 18, we set JAVA_OPTS. We need to bind the public address and the management console address so that EAP will bind on all IP addresses in the container.
at line 21, the EXPOSE command is used to expose ports from the container.
at line 26, the ENTRYPOINT command tells what command will run when the container starts.
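
For reference, here is a rough, compressed sketch of what such a Dockerfile can look like (the line numbers above refer to the actual file in the repo; the base image, paths and the admin password below are illustrative):

FROM java:7
MAINTAINER Tuhin Gupta <your-email>

WORKDIR /usr/software
RUN apt-get update && apt-get install -y unzip
COPY jboss-eap-6.2.4.zip /usr/software/

RUN unzip jboss-eap-6.2.4.zip

# create a management user for the JBoss admin console
RUN /usr/software/jboss-eap-6.2/bin/add-user.sh admin 'Admin#123' --silent

# bind the public and management interfaces to all addresses in the container
ENV JAVA_OPTS="-Djboss.bind.address=0.0.0.0 -Djboss.bind.address.management=0.0.0.0"

EXPOSE 8080 9990

ENTRYPOINT ["/usr/software/jboss-eap-6.2/bin/standalone.sh"]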

Save this file as Dockerfile in a folder that also contains the jboss-eap-6.2.4.zip file.

Building the container 

Now you build the image using the following command:

$ docker build --rm -f Dockerfile -t jboss-eap .

Once the image is built, it is ready to be run.

Run the Docker image

Run the image using the following command:

$ docker run -it -p 0.0.0.0:9990:9990 -p 0.0.0.0:8080:8080 --name jboss-serv -d jboss-eap

You will notice that we used the -p option, which publishes a container's port(s) to the host.

This publishes the addresses we bound at line 18 of the Dockerfile so that they can be accessed from the host using localhost.

Accessing JBoss URLs

Now you can access JBoss management console using 
http://localhost:9990/management



Saturday, February 20, 2016

Multi-platform Hybrid Mobile App Development

Looking to build a hybrid mobile app using Ionic, Cordova and Angular.js?
Refer to a sample restaurant app developed using Ionic, Cordova and Angular JS.
The backend data is provided using json-server.

Refer to the project on my Github site - https://github.com/tuhingupta/ionic-cordova

Post your comments/questions on Github or on this blog for more information about the application or about how to develop a similar mobile app.


Reference Docs:
http://ionicframework.com/docs/components/

Wednesday, February 3, 2016

Amazon EC2 - Install node application on EC2 instance.


Before working through this post you must have the EC2 environment set up. Refer to my older posts for the same.



Get your project

Use git to download your node.js application. 
I have my application on Github - https://github.com/tuhingupta/gitwebhooks

$ git clone https://github.com/tuhingupta/gitwebhooks.git
$ cd gitwebhooks

Configure EC2 to listen for inbound traffic on port 8080

A few things need to be configured in your application so that it can be accessed over the internet.

Have your Node.js application run its server on port 8080.

In my application (gitwebhooks) above I have set it in app.js

app.set('port', process.env.PORT || 8080);

Now configure your EC2 instance to listen for inbound traffic on port 8080.

Set up your security groups in EC2 like the image below, allowing inbound TCP traffic on port 8080:



Now redirect HTTP traffic arriving on port 80 to port 8080:

$ sudo iptables -t nat -A PREROUTING -p tcp --dport 80 -j REDIRECT --to 8080

Access your application from internet

Now you can access your application over the internet using your public DNS.

I can access my application (a Node.js Express REST API) using:

http://ec2-{my_ec2_id}.us-west-2.compute.amazonaws.com/api/name


Amazon AWS - Installing Git, Node.js and npm on EC2 instance

Before we set up Node.js on AWS, you should have an AWS EC2 instance set up and you should be able to log in using Putty or Terminal.

Refer to my previous blog on how to setup EC2 instance -
 http://javaredhot.blogspot.com/2016/02/amazon-ec2-creating-and-setting-up.html

Now log in to the EC2 instance.

Install Git

$ sudo yum install git
This will ask for your confirmation and then install git.


Install Node.js

To install Node and npm, the following steps need to be performed.

To compile Node we need gcc and make, plus git to fetch the Node source code:
sudo yum install gcc-c++ make
sudo yum install openssl-devel
sudo yum install git
Clone the node.js source code:
git clone https://github.com/nodejs/node.git
Compile and install node.js
cd node
./configure 
make
sudo make install
Add the Node binaries location to sudo's secure_path:
sudo su
nano /etc/sudoers
Inside the editor scroll to where you see the line:
Defaults secure_path = /sbin:/bin:/usr/sbin:/usr/bin
Append the value :/usr/local/bin
In some cases node does not install at /usr/local/bin and there is no soft link to the actual location. In that case you can create soft links from the actual node location to /usr/local/bin:

$ cd /usr/local/bin
$ sudo ln -s  /home/ec2-user/node/out/Release/node node
$ sudo ln -s /home/ec2-user/node/out/lib/node_modules/npm/bin/npm npm
$ sudo ln -s /home/ec2-user/node/out/lib/node_modules node_modules

Install npm

git clone https://github.com/npm/npm.git
cd npm
sudo make install

Test if node is working:

node
References:

https://gist.github.com/isaacs/579814
http://iconof.com/blog/how-to-install-setup-node-js-on-amazon-aws-ec2-complete-guide/

Amazon EC2 - Creating and setting up an instance on AWS

Setup AWS account

Set up AWS account at - https://aws.amazon.com/


Launch EC2 instance

On the AWS console:
  • Go to AWS services dropdown and select EC2 instance
  • Create and launch EC2 instance



Select an image (AMI) and configure


  • I selected Amazon Linux
  • Using the configuration available as Free Tier.

Once setup & review are complete, you can provision your instance, and it will be ready to launch.



Login to the instance


You can use the Mac terminal or Putty on Windows to connect to the AWS instance.

Store the .pem file downloaded when creating the instance, and make sure its permissions are restrictive:
$ chmod 400 aws.pem

Now use the ssh command to connect to the instance:

ssh -i /path/my-key-pair.pem ec2-user@ec2-198-51-100-1.compute-1.amazonaws.com

For more details on how to ssh, read here - http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstancesLinux.html