Category Archives: apache-spark-2.0

Spark & Scala MultilayerPerceptronClassifier – java.lang.NegativeArraySizeException

I am running into a "NegativeArraySizeException" error when I try to train a MultilayerPerceptronClassifier model in Spark Scala. I am following the example found here. Below is my code - I am aware that the layers typically go from large to small and that they are reversed here; this is experimental, but I have found nothing in the docs that explicitly prevents it.

import org.apache.spark.ml.classification.MultilayerPerceptronClassifier

// Data in
val df = spark.read.option("delimiter", "|").option("inferSchema", "true").format("csv").load("hdfs://master-node:9000//example.csv")

// Split the data into train and test
val splits = df.randomSplit(Array(0.9, 0.1))
val train = splits(0)
val test = splits(1)

// Set input dimensions
val input_dim = 3072
val output_dim = 198000

// Specify layers
val layers = Array[Int](input_dim, 24777, 49555, output_dim)

// create the trainer and set its parameters
val trainer = new MultilayerPerceptronClassifier()
  .setLayers(layers)
  .setBlockSize(128)
  .setSeed(1234L)
  .setMaxIter(100)
  .setFeaturesCol("features")
  .setLabelCol("label")

// train the model
val model = trainer.fit(train)

The error is thrown on "trainer.fit(train)". The data is in the same format as the example, where labels = Double and data = Vector.

Here is the error - any familiarity with it?

scala> val model = trainer.fit(train)
java.lang.NegativeArraySizeException
  at scala.reflect.ManifestFactory$$anon$12.newArray(Manifest.scala:141)
  at scala.reflect.ManifestFactory$$anon$12.newArray(Manifest.scala:139)
  at breeze.linalg.DenseVector$.zeros$mDc$sp(DenseVector.scala:254)
  at org.apache.spark.ml.ann.FeedForwardModel$.apply(Layer.scala:564)
  at org.apache.spark.ml.ann.FeedForwardTopology.model(Layer.scala:397)
  at org.apache.spark.ml.ann.FeedForwardTrainer.train(Layer.scala:807)
  at org.apache.spark.ml.classification.MultilayerPerceptronClassifier.train(MultilayerPerceptronClassifier.scala:260)
  at org.apache.spark.ml.classification.MultilayerPerceptronClassifier.train(MultilayerPerceptronClassifier.scala:145)
  at org.apache.spark.ml.Predictor.fit(Predictor.scala:96)
  ... 52 elided
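
For reference, here is a quick back-of-the-envelope check of the total weight count, assuming the trainer packs all weights and biases into a single flat vector with roughly layers(i) * layers(i + 1) + layers(i + 1) entries per connected layer pair (an assumption about the internals, not something stated above):

// Rough weight count for the layer sizes in the question (sketch only)
val layers = Array[Int](3072, 24777, 49555, 198000)

// Assumed layout: layers(i) * layers(i + 1) weights plus layers(i + 1) biases per pair
val totalWeights: Long = layers.sliding(2).map { case Array(in, out) => in.toLong * out + out }.sum

println(totalWeights)                   // ~11.1 billion
println(Int.MaxValue)                   // 2147483647
println(totalWeights > Int.MaxValue)    // true
println(totalWeights.toInt)             // negative after Int overflow

With these layer sizes the total comes out around 1.1 × 10^10, well past Int.MaxValue, which would explain a negative size reaching breeze's DenseVector.zeros in the trace above; shrinking the layer sizes (in particular the 49555 × 198000 pair) keeps the count within Int range.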

Spark.read.csv Error: java.io.IOException: Permission Denied

I am using Spark v2.0 and trying to read a csv file using:

spark.read.csv("filepath")

But I am getting the below error:

java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: Permission denied
  at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
  at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:171)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
  at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:258)
  at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:359)
  at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:263)
  at org.apache.spark.sql.hive.HiveSharedState.metadataHive$lzycompute(HiveSharedState.scala:39)
  at org.apache.spark.sql.hive.HiveSharedState.metadataHive(HiveSharedState.scala:38)
  at org.apache.spark.sql.hive.HiveSharedState.externalCatalog$lzycompute(HiveSharedState.scala:46)
  at org.apache.spark.sql.hive.HiveSharedState.externalCatalog(HiveSharedState.scala:45)
  at org.apache.spark.sql.hive.HiveSessionState.catalog$lzycompute(HiveSessionState.scala:50)
  at org.apache.spark.sql.hive.HiveSessionState.catalog(HiveSessionState.scala:48)
  at org.apache.spark.sql.hive.HiveSessionState$$anon$1.<init>(HiveSessionState.scala:63)
  at org.apache.spark.sql.hive.HiveSessionState.analyzer$lzycompute(HiveSessionState.scala:63)
  at org.apache.spark.sql.hive.HiveSessionState.analyzer(HiveSessionState.scala:62)
  at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49)
  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
  at org.apache.spark.sql.SparkSession.baseRelationToDataFrame(SparkSession.scala:382)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:143)
  at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:401)
  at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:342)
  ... 48 elided
Caused by: java.lang.RuntimeException: java.io.IOException: Permission denied
  at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:515)
  ... 71 more
Caused by: java.io.IOException: Permission denied
  at java.io.UnixFileSystem.createFileExclusively(Native Method)
  at java.io.File.createTempFile(File.java:2024)
  at org.apache.hadoop.hive.ql.session.SessionState.createTempFile(SessionState.java:818)
  at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:513)
  ... 71 more

I have also tried using .format("csv").csv("filepath"), but that also gives the same result.
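
For reference, the deepest frames above fail while Hive's SessionState creates a local temp file during session start-up, so the "Permission denied" points at a local scratch/temp directory (by default under /tmp or java.io.tmpdir) that the user running Spark cannot write to, not at the CSV path itself. Below is a minimal sketch, assuming the scratch directories can be repointed at writable locations when the session is built; the config keys and paths are illustrative rather than taken from the post:

import org.apache.spark.sql.SparkSession

// Sketch only: point the warehouse and Hive scratch dirs at user-writable locations.
// All paths here are placeholders; adjust for your environment.
val spark = SparkSession.builder()
  .appName("csv-read")
  .config("spark.sql.warehouse.dir", "/tmp/spark-warehouse")
  .config("spark.hadoop.hive.exec.scratchdir", "/tmp/hive")
  .config("spark.hadoop.hive.exec.local.scratchdir", "/tmp/hive-local")
  .getOrCreate()

// Same read as in the question; "filepath" is the original placeholder.
val df = spark.read.csv("filepath")

Whether these settings reach the embedded Hive client depends on the deployment; fixing the permissions on the existing temp directory (or setting java.io.tmpdir to a writable path) is an equivalent route.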