Category Archives: amazon-emr

Apache Flink not able to start in cluster mode

I tried installing Flink on Amazon EMR with 1 master and 2 core nodes. In the flink-conf.yaml file, I set jobmanager.rpc.address=EMR-MASTERDNS and taskmanager.numberOfTaskSlots=2. I then added the 2 public IPs of the core nodes to the slaves file, and made the same changes on the core nodes. But when I started the Flink cluster with ./bin/start-cluster.sh and opened the Flink Dashboard, it showed taskmanagers=0, taskslots=0. When I tried executing a Flink Java program, it threw the error:

org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Not enough free slots available to run the job. You can decrease the operator parallelism or increase the number of slots per TaskManager in the configuration. Task to schedule: < Attempt #0 Resources available to scheduler: Number of instances=0, total number of slots=0, available slots=0

What is the proper configuration required to start Flink on a cluster, i.e. in distributed mode?

Implementing Kafka Connect on EC2 using HDP 2.3

I am following the steps given at http://www.confluent.io/blog/how-to-build-a-scalable-etl-pipeline-with-kafka-connect to install Kafka Connect on an EC2 instance running the HDP 2.3 platform.

But I am getting this error: ERROR Failed to flush WorkerSourceTask{id=test-mysql-jdbc-0}, timed out while waiting for producer to flush outstanding messages, 1 left

The complete error is in the attached image.

Is this a Kafka issue or an HDP issue? I did the same thing on AWS EMR and it worked.

Analyze Apache log files with mrjob?

I want to analyze huge amounts of Apache log files with Python and mrjob. I have the complete environment set up, and everything is working as expected.

Does anybody have any examples or pointers on where to start analyzing (MapReducing) Apache log files with mrjob?

An example log line looks like this:

8047524342 1403256899 11.22.999.0 R domain.subdomain.com 47ac41d2d6566a27 ERR E1 E5 /blank.gif Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0) http://subdomain.domain/directory1/blablabla/othername&param&param
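As one possible starting point, here is a minimal sketch in plain Python of how a line like the one above might be parsed and counted in a map/reduce style. It assumes the fields are whitespace-separated, that the first ten tokens are fixed-position, that the last token is the referrer, and that everything in between is the user agent. The field names (request_id, flag, token, code_a, code_b and so on) are guesses made for illustration, not anything the log format documents. The mapper and reducer here are ordinary generator functions run by a small local driver; with mrjob, the same bodies could presumably be moved into the mapper(self, _, line) and reducer(self, key, values) methods of an MRJob subclass.

```python
from collections import defaultdict

# The sample line from the question, reused so the sketch can be tried locally.
SAMPLE_LINE = (
    "8047524342 1403256899 11.22.999.0 R domain.subdomain.com "
    "47ac41d2d6566a27 ERR E1 E5 /blank.gif "
    "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0) "
    "http://subdomain.domain/directory1/blablabla/othername&param&param"
)

# Hypothetical names for the ten leading fixed-position fields (assumption).
FIXED_FIELDS = [
    "request_id", "timestamp", "ip", "flag", "host",
    "token", "status", "code_a", "code_b", "path",
]


def parse_line(line):
    """Split one log line into a dict, or return None if it is too short."""
    parts = line.split()
    # Need the fixed fields, at least one user-agent token, and the referrer.
    if len(parts) < len(FIXED_FIELDS) + 2:
        return None
    record = dict(zip(FIXED_FIELDS, parts[:len(FIXED_FIELDS)]))
    # The user agent contains spaces, so rejoin everything before the last token.
    record["user_agent"] = " ".join(parts[len(FIXED_FIELDS):-1])
    record["referrer"] = parts[-1]
    return record


def mapper(line):
    """Emit (status, 1) for every line that parses."""
    record = parse_line(line)
    if record is not None:
        yield record["status"], 1


def reducer(key, values):
    """Sum the counts collected for one key."""
    yield key, sum(values)


def run_local(lines):
    """Tiny in-memory stand-in for the shuffle step, for local testing only."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            grouped[key].append(value)
    results = {}
    for key, values in grouped.items():
        for out_key, total in reducer(key, values):
            results[out_key] = total
    return results


if __name__ == "__main__":
    print(run_local([SAMPLE_LINE]))
```

Counting by a different field (for example host or path) only requires changing the key that the mapper yields. If the real logs contain lines with missing fields, the length check in parse_line would need to be adapted to whatever the actual format guarantees.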