YARN Virtual Memory issue in Hadoop

We often run into YARN container memory errors like the one below. This can happen when running various Hadoop applications such as Hive, Pig, Sqoop, or Spark, or shell scripts, either from the command line (CLI) or from an Oozie workflow.

Current usage: 135.2 MB of 2 GB physical memory used; 6.4 GB of 4.2 GB virtual memory used. Killing container.
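The numbers in that message are related: YARN computes a container's virtual-memory limit as the physical-memory limit multiplied by yarn.nodemanager.vmem-pmem-ratio, which defaults to 2.1. A quick sanity check of the arithmetic behind the error:

```python
# YARN's NodeManager kills a container when its virtual memory exceeds
# physical_limit * yarn.nodemanager.vmem-pmem-ratio (default 2.1).
physical_limit_gb = 2.0   # "2 GB physical memory" from the error message
vmem_pmem_ratio = 2.1     # YARN's default ratio
vmem_limit_gb = physical_limit_gb * vmem_pmem_ratio
print(vmem_limit_gb)      # 4.2, the virtual-memory limit in the error message
```

So the container was killed for blowing past 4.2 GB of virtual memory even though it used only 135.2 MB of physical memory; raising the ratio or disabling the check targets exactly this situation.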

I read many blogs, StackOverflow posts, and the Hadoop/YARN documentation, and they suggest setting one or more of the following parameters.

In mapred-site.xml:
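The exact snippets suggested vary from source to source; a typical one raises the map/reduce container memory and the corresponding JVM heap sizes. The property names below are standard Hadoop ones, but the values are purely illustrative:

```xml
<!-- Typical suggestions; the values here are illustrative, not prescriptive -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>4096</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>8192</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx3072m</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx6144m</value>
</property>
```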





In yarn-site.xml:
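In XML form, the two yarn-site.xml properties (the same ones used in the EMR configuration later in this post) look like this: the ratio raises the virtual-to-physical memory limit, and the check flag disables virtual-memory enforcement entirely.

```xml
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>10</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
```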



I was running my applications on AWS EMR (Elastic MapReduce, AWS's Hadoop distribution) from an Oozie workflow, and none of the above settings helped. I had been setting those parameters only on the master node and restarting the YARN process. But when an application ran on a slave node (especially a shell script, which can be scheduled on any slave node), the YARN settings on the master node had no effect. I had to set those parameters on every slave node (NodeManager) of the cluster.

That can be done with a configuration like the one below, supplied while launching the EMR cluster, either entered directly in the EMR console or loaded from a JSON file. It sets the parameters in yarn-site.xml on all slave nodes.

   "Classification": "yarn-site",
   "Properties": {
     "yarn.nodemanager.vmem-pmem-ratio": "10",
     "yarn.nodemanager.vmem-check-enabled": "false"
