I have NOT installed Hadoop on my Linux file system. I would like to copy a file from the local file system to HDFS WITHOUT installing Hadoop on my Linux file system. I have written some sample code, but it fails with "wrong FS, expected file:///". Any help with this?
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;

import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.net.URI;

/**
 * Created by Ashish on 23/4/15.
 */
public class SampleHadoop {

    public static void main(String[] args) throws Exception {
        try {
            Configuration configuration = new Configuration();
            FileSystem fs = FileSystem.get(new URI("hdfs://192.168.1.170:54310/"), configuration);
            fs.copyFromLocalFile(new Path("./part-m-00000"),
                    new Path("hdfs://192.168.1.170:54310/user/hduser/samplefile"));
            fs.close();
        } catch (Exception ex) {
            System.out.println("Exception " + ex.toString());
        }
    }
}
pom.xml
<dependencies>
    <dependency>
        <groupId>org.postgresql</groupId>
        <artifactId>postgresql</artifactId>
        <version>9.3-1102-jdbc41</version>
    </dependency>
    <dependency>
        <groupId>org.apache.httpcomponents</groupId>
        <artifactId>httpclient</artifactId>
        <version>4.3.4</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>1.0.4</version>
    </dependency>
    <dependency>
        <groupId>org.apache.sqoop</groupId>
        <artifactId>sqoop-client</artifactId>
        <version>1.99.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.sqoop</groupId>
        <artifactId>sqoop</artifactId>
        <version>1.4.0-incubating</version>
    </dependency>
    <dependency>
        <groupId>mysql</groupId>
        <artifactId>mysql-connector-java</artifactId>
        <version>5.1.34</version>
    </dependency>
    <dependency>
        <groupId>org.apache.sqoop</groupId>
        <artifactId>sqoop-tools</artifactId>
        <version>1.99.4</version>
    </dependency>
    <dependency>
        <groupId>commons-httpclient</groupId>
        <artifactId>commons-httpclient</artifactId>
        <version>3.1</version>
    </dependency>
</dependencies>
I looked for all possible solutions and found the following:
...
Configuration conf = new Configuration();
conf.addResource(new Path("/home/user/hadoop/conf/core-site.xml"));
conf.addResource(new Path("/home/user/hadoop/conf/hdfs-site.xml"));
BUT in my case I do not want to install Hadoop on my Linux file system, so I cannot specify a path like "/home/user/hadoop". I would prefer to make it run using only jar files.
Answer
The right choice for your use case is the WebHDFS API. It allows systems running outside the Hadoop cluster to access and manipulate HDFS contents. It does not require the client system to have Hadoop binaries installed; you can manipulate remote HDFS over HTTP using curl itself.
Please refer to:
https://hadoop.apache.org/docs/r1.2.1/webhdfs.html
http://hortonworks.com/blog/webhdfs-%E2%80%93-http-rest-access-to-hdfs/
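As an illustration, here is a minimal sketch of the two-step WebHDFS CREATE call using plain HttpURLConnection, with no Hadoop jars on the client at all. It assumes the namenode's WebHDFS endpoint is reachable on the default HTTP port 50070 and that dfs.webhdfs.enabled is set to true on the cluster; the host, the user.name=hduser, and the file paths are simply reused from your snippet and may need adjusting.

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;

public class WebHdfsUpload {

    public static void main(String[] args) throws Exception {
        // Assumed WebHDFS endpoint: namenode HTTP port 50070 (default), user.name from the question.
        String createUrl = "http://192.168.1.170:50070/webhdfs/v1/user/hduser/samplefile"
                + "?op=CREATE&user.name=hduser&overwrite=true";

        // Step 1: ask the namenode where to write. It replies with a 307 redirect
        // whose Location header points to a datanode; do not follow it automatically.
        HttpURLConnection nn = (HttpURLConnection) new URL(createUrl).openConnection();
        nn.setRequestMethod("PUT");
        nn.setInstanceFollowRedirects(false);
        System.out.println("Namenode response: " + nn.getResponseCode()); // expect 307
        String dataNodeUrl = nn.getHeaderField("Location");
        nn.disconnect();

        // Step 2: send the file bytes to the datanode URL returned above.
        HttpURLConnection dn = (HttpURLConnection) new URL(dataNodeUrl).openConnection();
        dn.setRequestMethod("PUT");
        dn.setDoOutput(true);
        dn.setRequestProperty("Content-Type", "application/octet-stream");
        try (OutputStream out = dn.getOutputStream()) {
            Files.copy(Paths.get("./part-m-00000"), out);
        }
        System.out.println("Datanode response: " + dn.getResponseCode()); // 201 Created on success
        dn.disconnect();
    }
}

With this approach the only runtime requirement on the client is a JVM (or curl); no Hadoop binaries or configuration files are needed locally.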