pio eval завершается с ошибкой OutOfMemoryError: превышен лимит накладных расходов GC

Я использую PredictionIO 0.9.6 и шаблон Recommendation. Я пытаюсь настроить гиперпараметры, но pio eval не работает с OutOfMemoryError: GC overhead limit exceeded, что заставляет меня думать, что мне нужно переопределить некоторые настройки памяти по умолчанию. Я уже увеличиваю память драйвера:

pio eval twel.RecommendationEvaluation twel.EngineParamsList -- --driver-memory 12G

Я не использую кластер, только одну машину. Команда выполняется в течение двух часов или около того, а затем прерывается. Что я должен настроить, чтобы избежать ошибки?

Полное сообщение об ошибке:

[ERROR] [Executor] Managed memory leak detected; size = 1344301673 bytes, TID = 1233
[ERROR] [Executor] Exception in task 0.0 in stage 2539.0 (TID 1233)
[ERROR] [SparkUncaughtExceptionHandler] Uncaught exception in thread Thread[Executor task launch worker-15,5,main]
[WARN] [TaskSetManager] Lost task 0.0 in stage 2539.0 (TID 1233, localhost): java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:326)
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1134)
        at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378)
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
        at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
        at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
        at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
        at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
        at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:44)
        at org.apache.spark.serializer.SerializationStream.writeValue(Serializer.scala:147)
        at org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:185)
        at org.apache.spark.util.collection.ExternalAppendOnlyMap.spill(ExternalAppendOnlyMap.scala:206)
        at org.apache.spark.util.collection.ExternalAppendOnlyMap.spill(ExternalAppendOnlyMap.scala:55)
        at org.apache.spark.util.collection.Spillable$class.maybeSpill(Spillable.scala:93)
        at org.apache.spark.util.collection.ExternalAppendOnlyMap.maybeSpill(ExternalAppendOnlyMap.scala:55)
        at org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:158)
        at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:45)
        at org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:89)
        at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:98)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$2.apply(CoGroupedRDD.scala:140)
        at org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$2.apply(CoGroupedRDD.scala:136)
        at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
        at org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:136)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)

[ERROR] [TaskSetManager] Task 0 in stage 2539.0 failed 1 times; aborting job
[ERROR] [Executor] Managed memory leak detected; size = 1342764217 bytes, TID = 1236
[ERROR] [Executor] Managed memory leak detected; size = 1337431506 bytes, TID = 1235
Exception in thread "main" [WARN] [TaskSetManager] Lost task 3.0 in stage 2539.0 (TID 1236, localhost): TaskKilled (killed intentionally)
[WARN] [TaskSetManager] Lost task 2.0 in stage 2539.0 (TID 1235, localhost): TaskKilled (killed intentionally)
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2539.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2539.0 (TI
D 1233, localhost): java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:326)
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1134)
        at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378)
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
        at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
        at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
        at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
        at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
        at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:44)
        at org.apache.spark.serializer.SerializationStream.writeValue(Serializer.scala:147)
        at org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:185)
        at org.apache.spark.util.collection.ExternalAppendOnlyMap.spill(ExternalAppendOnlyMap.scala:206)
        at org.apache.spark.util.collection.ExternalAppendOnlyMap.spill(ExternalAppendOnlyMap.scala:55)
        at org.apache.spark.util.collection.Spillable$class.maybeSpill(Spillable.scala:93)
        at org.apache.spark.util.collection.ExternalAppendOnlyMap.maybeSpill(ExternalAppendOnlyMap.scala:55)
        at org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:158)
        at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:45)
        at org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:89)
        at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:98)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$2.apply(CoGroupedRDD.scala:140)
        at org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$2.apply(CoGroupedRDD.scala:136)
        at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
        at org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:136)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)

Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1952)
        at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:1025)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
        at org.apache.spark.rdd.RDD.reduce(RDD.scala:1007)
        at org.apache.spark.rdd.DoubleRDDFunctions$$anonfun$stats$1.apply(DoubleRDDFunctions.scala:42)
        at org.apache.spark.rdd.DoubleRDDFunctions$$anonfun$stats$1.apply(DoubleRDDFunctions.scala:42)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
        at org.apache.spark.rdd.DoubleRDDFunctions.stats(DoubleRDDFunctions.scala:41)
        at io.prediction.controller.StatsOptionMetricHelper$class.calculateStats(Metric.scala:83)
        at io.prediction.controller.OptionAverageMetric.calculateStats(Metric.scala:121)
        at io.prediction.controller.OptionAverageMetric.calculate(Metric.scala:132)
        at io.prediction.controller.OptionAverageMetric.calculate(Metric.scala:121)
        at io.prediction.controller.MetricEvaluator$$anonfun$5.apply(MetricEvaluator.scala:226)
        at io.prediction.controller.MetricEvaluator$$anonfun$5.apply(MetricEvaluator.scala:224)
        at scala.collection.parallel.AugmentedIterableIterator$class.map2combiner(RemainsIterator.scala:120)
        at scala.collection.parallel.immutable.ParVector$ParVectorIterator.map2combiner(ParVector.scala:67)
        at scala.collection.parallel.ParIterableLike$Map.leaf(ParIterableLike.scala:1057)
        at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply$mcV$sp(Tasks.scala:54)
        at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:53)
        at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:53)
        at scala.collection.parallel.Task$class.tryLeaf(Tasks.scala:56)
        at scala.collection.parallel.ParIterableLike$Map.tryLeaf(ParIterableLike.scala:1054)
        at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.compute(Tasks.scala:165)
        at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:514)
        at scala.concurrent.forkjoin.RecursiveAction.exec(RecursiveAction.java:160)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinTask.doJoin(ForkJoinTask.java:341)
        at scala.concurrent.forkjoin.ForkJoinTask.join(ForkJoinTask.java:673)
        at scala.collection.parallel.ForkJoinTasks$WrappedTask$class.sync(Tasks.scala:444)
        at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.sync(Tasks.scala:514)
        at scala.collection.parallel.ForkJoinTasks$class.executeAndWaitResult(Tasks.scala:492)
        at scala.collection.parallel.ForkJoinTaskSupport.executeAndWaitResult(TaskSupport.scala:64)
        at scala.collection.parallel.ParIterableLike$ResultMapping.leaf(ParIterableLike.scala:961)
        at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply$mcV$sp(Tasks.scala:54)
        at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:53)
        at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:53)
        at scala.collection.parallel.Task$class.tryLeaf(Tasks.scala:56)
        at scala.collection.parallel.ParIterableLike$ResultMapping.tryLeaf(ParIterableLike.scala:956)
        at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.compute(Tasks.scala:165)
        at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:514)
        at scala.concurrent.forkjoin.RecursiveAction.exec(RecursiveAction.java:160)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:326)
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1134)
        at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378)
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
        at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
        at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
        at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
        at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
        at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:44)
        at org.apache.spark.serializer.SerializationStream.writeValue(Serializer.scala:147)
        at org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:185)
        at org.apache.spark.util.collection.ExternalAppendOnlyMap.spill(ExternalAppendOnlyMap.scala:206)
        at org.apache.spark.util.collection.ExternalAppendOnlyMap.spill(ExternalAppendOnlyMap.scala:55)
        at org.apache.spark.util.collection.Spillable$class.maybeSpill(Spillable.scala:93)
        at org.apache.spark.util.collection.ExternalAppendOnlyMap.maybeSpill(ExternalAppendOnlyMap.scala:55)
        at org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:158)
        at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:45)
        at org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:89)
        at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:98)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$2.apply(CoGroupedRDD.scala:140)
        at org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$2.apply(CoGroupedRDD.scala:136)
        at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
        at org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:136)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)

person stholzm    schedule 21.06.2016    source источник
comment
У меня были похожие проблемы. Я думаю, что решил их, запустив меньше сценариев. Вы можете изменить это в Evaluation.scala. Поэтому вместо того, чтобы пытаться ввести 3 разных ранга и 3 разных numIterations, попробуйте по 2 каждого из них.   -  person alex9311    schedule 22.06.2016
comment
@ alex9311 Я уже пробовал это (rank <- Seq(10); numIterations <- Seq(10)), в то время как обучение проходит гладко на моей машине с 30 каждым. Так что я предполагаю, что pio eval управляет памятью по-другому...   -  person stholzm    schedule 22.06.2016
comment
У меня также такая же проблема, кто-нибудь может помочь? +1. Я понятия не имею, о чем вы говорите в комментариях.. :/ Блин, я в отчаянии< /а>!   -  person gsamaras    schedule 01.09.2016