It’s very strange. I can see in the jps output that the NameNode and DataNode have already started. I can open the NameNode web UI (port 50070) and use “hdfs dfs -get” to fetch files, but I can’t download files from the NameNode web UI. Answer The problem is caused by the /etc/hosts file. I had mapped the hostname to the IP 127.0.0.1, so Hadoop uses
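If the answer's diagnosis applies, the usual fix is to map the hostname to the machine's real network address rather than the loopback address in /etc/hosts. A hypothetical sketch (hostname and address are placeholders):

```
# /etc/hosts -- avoid binding the cluster hostname to loopback
127.0.0.1     localhost
192.168.1.10  hadoop-master   # real interface IP, not 127.0.0.1
```

After editing, restart the HDFS daemons so the NameNode advertises a routable address to download clients.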
Tag: hadoop
How to sync configuration between Hadoop worker machines
We have a huge Hadoop cluster on which we installed one Presto coordinator node and 850 Presto worker nodes. Now we want to change the values in the file config.properties, but this has to be done on all the workers! The file looks like this, and we want to change it to this, but this was done only on the first
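The question is truncated, so the exact property is unknown, but this class of problem is usually solved by editing the file once and fanning it out over SSH. A minimal sketch, assuming a hypothetical property `query.max-memory` and a `workers.txt` host list:

```shell
# Create a stand-in config.properties (the real one lives in the Presto etc dir).
cat > /tmp/config.properties <<'EOF'
coordinator=false
http-server.http.port=8080
query.max-memory=50GB
EOF

# Rewrite one property in place.
sed -i 's/^query\.max-memory=.*/query.max-memory=30GB/' /tmp/config.properties

# Fanning the file out to all 850 workers would look like this (not run here):
#   while read -r host; do
#     scp /tmp/config.properties "$host":/etc/presto/config.properties
#   done < workers.txt
grep '^query.max-memory' /tmp/config.properties
```

At this scale, tools such as pdsh, Ansible, or Ambari itself are the more maintainable way to push the same change to hundreds of nodes.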
Linux Hadoop Services monitoring tool and restart if down
I have configured Hadoop 2.7.5 with HBase. It is a 5-machine cluster in fully distributed mode. I have to monitor the Hadoop/HBase daemons and want to trigger some action (e.g. mail) if a daemon goes down. Is there any built-in solution? Also, I want to start Hadoop at boot time. How can I do this? Answer I am
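Plain Apache Hadoop has no single built-in watchdog, so a common approach is a cron-driven shell check over `jps` output. A minimal sketch (the daemon names come from the question; the mail/restart actions are placeholders):

```shell
# check_daemon NAME "JPS_OUTPUT" -> prints "up" or "down"
check_daemon() {
  if echo "$2" | grep -qw "$1"; then echo up; else echo down; fi
}

# Sample jps output used for illustration; a cron job would use "$(jps)".
sample="1234 NameNode
2345 DataNode
3456 HRegionServer"

check_daemon NameNode "$sample"   # prints: up
check_daemon HMaster  "$sample"   # prints: down

# In a cron job, "down" would trigger the action, e.g.:
#   [ "$(check_daemon NameNode "$(jps)")" = down ] \
#     && echo "NameNode down" | mail -s alert admin@example.com
```

For boot-time startup, wrapping the start scripts in a systemd unit (or an rc.local entry on older systems) is the usual route.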
How to update a blueprint/Ambari cluster when increasing disks on worker machines
We have an Ambari cluster with 3 master machines, two Kafka machines, and 3 worker machines; each worker has 5 disks, as follows: We want to add 5 additional disks to each worker machine (/dev/sdf, /dev/sdh, …, etc.). Remark: our worker nodes run both a DataNode and a NodeManager. For now I understand that the following parameters
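On the HDFS side, new disks are typically added by extending the comma-separated `dfs.datanode.data.dir` list (and `yarn.nodemanager.local-dirs` for the NodeManager) in the cluster configuration. A hypothetical fragment with made-up mount points:

```
<!-- hdfs-site.xml: one entry per mounted disk; paths are placeholders -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/grid/sda/hadoop/hdfs/data,/grid/sdf/hadoop/hdfs/data</value>
</property>
```

After the change, the DataNodes and NodeManagers must be restarted so they pick up the new directories.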
Ambari cluster + Service Auto Start Configuration by API
Ambari services can be configured to start automatically on system boot. Each service can be configured to start all components, only masters and workers, or selectively. So how can all services in an Ambari cluster be enabled to start automatically on system boot via the API? Remark: by default all services are disabled. Answer You may use the auto-restart API; refer to the following document
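A hedged sketch of what such a call can look like: Ambari exposes component-level auto-start through a `recovery_enabled` flag set with a PUT against the components resource. Host, cluster name, credentials, and the exact request shape below are assumptions to verify against the Ambari docs:

```shell
# JSON body that flips recovery (auto-start) on; assumed Ambari API shape.
payload='{"RequestInfo":{"context":"Enable auto start"},"Body":{"ServiceComponentInfo":{"recovery_enabled":"true"}}}'
echo "$payload"

# Against a live server it would be sent like this (not run here):
#   curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT \
#     "http://AMBARI_HOST:8080/api/v1/clusters/CLUSTER/components" \
#     -d "$payload"
```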
Why are not all parameters from the Ambari cluster represented in the blueprint JSON file
Why are not all parameters from the Ambari cluster represented in the blueprint JSON file? I generated the blueprint JSON file as follows: but when I access the Ambari GUI, I notice that many parameters do not appear in the blueprint JSON file. An example of parameters from the HDFS config that do not appear in the blueprint JSON file: Answer
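For context, the export itself (likely what the question used) is a single GET with `?format=blueprint`; host, cluster name, and credentials below are placeholders:

```shell
# Build the blueprint-export URL for a hypothetical cluster.
url="http://AMBARI_HOST:8080/api/v1/clusters/CLUSTER?format=blueprint"
echo "$url"
# curl -u admin:admin -H 'X-Requested-By: ambari' "$url" > blueprint.json
```

One plausible explanation, worth verifying against the Ambari documentation, is that properties still at their stack defaults (and secrets, which are masked) are not written into the exported blueprint.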
Ambari + API syntax in order to change the parameters of the Ambari services
In the Ambari cluster GUI (version 2.5.0.3), each service has a Config button, and when we click on it we can see the list of all relevant parameters and their values. For example, the YARN service has the parameter Minimum Container Size (Memory) in MB. Of course we can change from the Ambari GUI the values of
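One way to do the same change from the command line is Ambari's bundled `configs.sh` helper (shipped under /var/lib/ambari-server/resources/scripts). A hedged sketch, assuming Minimum Container Size (Memory) maps to `yarn.scheduler.minimum-allocation-mb` in the `yarn-site` config type; host, cluster name, and credentials are placeholders:

```shell
# Assemble the configs.sh invocation (not executed here; needs a live server).
cmd="/var/lib/ambari-server/resources/scripts/configs.sh -u admin -p admin \
set AMBARI_HOST CLUSTER yarn-site yarn.scheduler.minimum-allocation-mb 1024"
echo "$cmd"
# eval "$cmd"
```

After a successful `set`, the affected service must be restarted from Ambari for the new value to take effect.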
How to store log files of a shell script in HDFS
I have a shell script in HDFS, and I want to collect the logs for this script in HDFS only. The contents of the script are below: The logs are not being appended to the files; only the files are being created. How can I get the result of the function appended to the files in HDFS? Answer The logs
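Since shell `>>` redirection does not work on `hdfs://` paths, a common pattern is to log locally and then push the log with `hdfs dfs -appendToFile`. A minimal sketch (paths and messages are placeholders):

```shell
# Log to a local file first; HDFS is written in one append at the end.
LOG=/tmp/job.log
: > "$LOG"
log() { echo "$(date '+%F %T') $*" >> "$LOG"; }

log "step 1 done"
log "step 2 done"

# Append the local log into HDFS (requires append support; not run here):
#   hdfs dfs -appendToFile "$LOG" /logs/job.log
cat "$LOG"
```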
Get a list of files whose creation date is greater than some date in Linux
I have these files in Hadoop and want the list of all files whose creation date is greater than 2016-11-21. I tried the command below, but it prints all the files. How can I get only the ones that satisfy the condition? Answer Pass the input date as a variable into the awk expression (via the -v option): The output:
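The answer's approach can be sketched as follows, with a fabricated two-line `hdfs dfs -ls` listing standing in for the real output:

```shell
# Stand-in for: hdfs dfs -ls /data
cat > /tmp/ls.txt <<'EOF'
-rw-r--r--   3 hdfs hdfs 1024 2016-11-20 10:00 /data/a.txt
-rw-r--r--   3 hdfs hdfs 2048 2016-11-22 09:30 /data/b.txt
EOF

# $6 is the date column; ISO dates compare correctly as strings.
awk -v d='2016-11-21' '$6 > d {print $8}' /tmp/ls.txt   # prints: /data/b.txt
```

This works because YYYY-MM-DD dates sort lexically in the same order as chronologically, so a plain string comparison is enough.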
AWK usage issue while archiving HDFS files in Hortonworks Distribution
I am trying to move files in an HDFS directory that are more than 3 days old to an archive folder in HDFS. AWK script: Note: the cmd variable will contain a mv command once this script starts working. Issue: the value of variable X is constant; the value of variable Y is constant; unable to get the day difference between the 2 dates. I
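One way to get the day difference the question is missing is via epoch seconds. A minimal sketch using GNU `date` (gawk's `mktime()` works similarly); the dates are examples:

```shell
# Convert both dates to seconds since the epoch, then divide by 86400.
a=$(date -d '2016-11-24' +%s)
b=$(date -d '2016-11-20' +%s)
echo $(( (a - b) / 86400 ))   # prints: 4

# In the archiving script, files older than 3 days would then be moved, e.g.:
#   [ $(( (now - filedate) / 86400 )) -gt 3 ] && hdfs dfs -mv "$f" /archive/
```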