Category Archives: apache-beam

java.lang.RuntimeException: Error while staging packages

I am facing RuntimeException when running dataflow (Apache Beam SDK 2.4.0) from AppEngine Cron jobs.

I tried to investigate on dependencies but It leads me nowhere. Does anyone have a clue why this error occurs?

This problem did not happen to me with an older version of Dataflow (like 1.9.1 for example)

Below the stacktrace:

java.lang.RuntimeException: Error while staging packages
at org.apache.beam.runners.dataflow.util.PackageUtil.stageClasspathElements(PackageUtil.java:396)
at org.apache.beam.runners.dataflow.util.PackageUtil.stageClasspathElements(PackageUtil.java:273)
at org.apache.beam.runners.dataflow.util.GcsStager.stageFiles(GcsStager.java:76)
at org.apache.beam.runners.dataflow.util.GcsStager.stageDefaultFiles(GcsStager.java:64)
at org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:661)
at org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:174)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:311)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:297)
at com.apps.loader.dataflow.ContactExtenderDataflow.runDataflow(ContactActualExtenderDataflow.java:80)
at com.apps.loader.dataflow.ContactExtenderDataflow.runDataflow(ContactActualExtenderDataflow.java:74)
at com.apps.loader.servlet.ContactExtenderLoader.doGetPost(ContactActualExtenderLoader.java:51)
at com.apps.loader.servlet.ContactExtenderLoader.doPost(ContactActualExtenderLoader.java:41)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
at com.googlecode.objectify.ObjectifyFilter.doFilter(ObjectifyFilter.java:48)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at com.google.apphosting.utils.servlet.ParseBlobUploadFilter.doFilter(ParseBlobUploadFilter.java:125)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at com.google.apphosting.runtime.jetty9.SaveSessionFilter.doFilter(SaveSessionFilter.java:37)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at com.google.apphosting.utils.servlet.JdbcMySqlConnectionCleanupFilter.doFilter(JdbcMySqlConnectionCleanupFilter.java:60)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at com.google.apphosting.utils.servlet.TransactionCleanupFilter.doFilter(TransactionCleanupFilter.java:48)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at com.google.apphosting.runtime.jetty9.AppVersionHandlerMap.handle(AppVersionHandlerMap.java:297)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.server.Server.handle(Server.java:534)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
at com.google.apphosting.runtime.jetty9.RpcConnection.handle(RpcConnection.java:202)
at com.google.apphosting.runtime.jetty9.RpcConnector.serviceRequest(RpcConnector.java:81)
at com.google.apphosting.runtime.jetty9.JettyServletEngineAdapter.serviceRequest(JettyServletEngineAdapter.java:108)
at com.google.apphosting.runtime.JavaRuntime$RequestRunnable.dispatchServletRequest(JavaRuntime.java:686)
at com.google.apphosting.runtime.JavaRuntime$RequestRunnable.dispatchRequest(JavaRuntime.java:648)
at com.google.apphosting.runtime.JavaRuntime$RequestRunnable.run(JavaRuntime.java:618)
at com.google.apphosting.runtime.JavaRuntime$NullSandboxRequestRunnable.run(JavaRuntime.java:812)
at com.google.apphosting.runtime.ThreadGroupPool$PoolEntry.run(ThreadGroupPool.java:274)
at java.lang.Thread.run(Thread.java:745)

Caused by: java.lang.NullPointerException: Operation not allowed in a thread that is neither the original request thread nor a thread created by ThreadManager
at com.google.apphosting.runtime.ApiProxyImpl$CurrentRequestThreadFactory.newThread(ApiProxyImpl.java:1224)
at java.util.concurrent.ThreadPoolExecutor$Worker.<init>(ThreadPoolExecutor.java:612)
at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:925)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1357)
at org.apache.beam.runners.dataflow.repackaged.com.google.common.util.concurrent.MoreExecutors$ListeningDecorator.execute(MoreExecutors.java:530)
at java.util.concurrent.CompletableFuture.asyncRunStage(CompletableFuture.java:1640)
at java.util.concurrent.CompletableFuture.runAsync(CompletableFuture.java:1858)
at org.apache.beam.sdk.util.MoreFutures.supplyAsync(MoreFutures.java:98)
at org.apache.beam.runners.dataflow.util.PackageUtil.stagePackage(PackageUtil.java:173)
at org.apache.beam.runners.dataflow.util.PackageUtil.lambda$stageClasspathElements$2(PackageUtil.java:358)
at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:952)
at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
at java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:443)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

APACHE BEAM- Issue while reading from hive

Hello i get this error when i try reading from Hive. Can someone please help me with this. Also i have never given configuration details before for my cluster for hive but beam just doesn't allow you to do anything without having the following as config. I tried many options for key and values for mapping but i get this below error every time. Can someone help in spotting what would be the exact problem?

Also to just let you guys know that 2.1.0 Beam which is not yet released resolved this issue for me but very curious why could not achieve this with 2.0.0

Configuration hcatConf = new Configuration();
hcatConf.setClass("mapreduce.job.inputformat.class",HCatInputFormat.class, InputFormat.class);
hcatConf.setClass("key.class", LongWritable.class, Object.class);
hcatConf.setClass("value.class", HCatRecord.class, Object.class);

This is however giving me an error when i try to run.

Caused by: java.lang.IllegalArgumentException: Forbidden IOException when reading from InputStream
at org.apache.beam.sdk.util.CoderUtils.decodeFromSafeStream(CoderUtils.java:133)
at org.apache.beam.sdk.util.CoderUtils.decodeFromByteArray(CoderUtils.java:113)
at org.apache.beam.sdk.util.CoderUtils.decodeFromByteArray(CoderUtils.java:107)
at org.apache.beam.sdk.util.CoderUtils.clone(CoderUtils.java:156)
at org.apache.beam.sdk.io.hadoop.inputformat.HadoopInputFormatIO$HadoopInputFormatBoundedSource$HadoopInputFormatReader.cloneIfPossiblyMutable(HadoopInputFormatIO.java:707)
at org.apache.beam.sdk.io.hadoop.inputformat.HadoopInputFormatIO$HadoopInputFormatBoundedSource$HadoopInputFormatReader.transformKeyOrValue(HadoopInputFormatIO.java:696)
at org.apache.beam.sdk.io.hadoop.inputformat.HadoopInputFormatIO$HadoopInputFormatBoundedSource$HadoopInputFormatReader.getCurrent(HadoopInputFormatIO.java:669)
at org.apache.beam.sdk.io.hadoop.inputformat.HadoopInputFormatIO$HadoopInputFormatBoundedSource$HadoopInputFormatReader.getCurrent(HadoopInputFormatIO.java:584)
at org.apache.beam.runners.spark.io.SourceRDD$Bounded$ReaderToIteratorAdapter.tryProduceNext(SourceRDD.java:189)
... 34 more
Caused by: java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:197)
at java.io.DataInputStream.readLong(DataInputStream.java:416)
at org.apache.hadoop.io.LongWritable.readFields(LongWritable.java:47)
at org.apache.beam.sdk.io.hadoop.WritableCoder.decode(WritableCoder.java:84)
at org.apache.beam.sdk.io.hadoop.WritableCoder.decode(WritableCoder.java:53)
at org.apache.beam.sdk.coders.Coder.decode(Coder.java:160)
at org.apache.beam.sdk.util.CoderUtils.decodeFromSafeStream(CoderUtils.java:130)
... 42 more

Dataflow JAVA SDK : Take the code as an input, process at the backed

Please support me to understand the implementation of following scenario.

Suppose the user types the code written using data flow SDK commands in a text box at the front end. We need to get that code (let's say as a string) and execute at the back end. Does Data flow SDK provide a facility like a execution manager, to do such a thing? Also some resources to get familiar with such an implementation would be much appreciated.

Thanks