I have NOT installed Hadoop on my Linux file system. I would like to copy a file from the local file system to HDFS WITHOUT installing Hadoop on my Linux file system. I have written some sample code, but it fails with "wrong FS, expected file:///". Any help with this?
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;

import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.net.URI;

/**
 * Created by Ashish on 23/4/15.
 */
public class SampleHadoop {

    public static void main(String[] args) throws Exception {
        try {
            Configuration configuration = new Configuration();
            FileSystem fs = FileSystem.get(new URI("hdfs://192.168.1.170:54310/"), configuration);
            fs.copyFromLocalFile(new Path("./part-m-00000"),
                    new Path("hdfs://192.168.1.170:54310/user/hduser/samplefile"));
            fs.close();
        } catch (Exception ex) {
            System.out.println("Exception " + ex.toString());
        }
    }
}
pom.xml
<dependencies>
    <dependency>
        <groupId>org.postgresql</groupId>
        <artifactId>postgresql</artifactId>
        <version>9.3-1102-jdbc41</version>
    </dependency>
    <dependency>
        <groupId>org.apache.httpcomponents</groupId>
        <artifactId>httpclient</artifactId>
        <version>4.3.4</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>1.0.4</version>
    </dependency>
    <dependency>
        <groupId>org.apache.sqoop</groupId>
        <artifactId>sqoop-client</artifactId>
        <version>1.99.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.sqoop</groupId>
        <artifactId>sqoop</artifactId>
        <version>1.4.0-incubating</version>
    </dependency>
    <dependency>
        <groupId>mysql</groupId>
        <artifactId>mysql-connector-java</artifactId>
        <version>5.1.34</version>
    </dependency>
    <dependency>
        <groupId>org.apache.sqoop</groupId>
        <artifactId>sqoop-tools</artifactId>
        <version>1.99.4</version>
    </dependency>
    <dependency>
        <groupId>commons-httpclient</groupId>
        <artifactId>commons-httpclient</artifactId>
        <version>3.1</version>
    </dependency>
</dependencies>
I looked for all possible solutions and found the following:
...
Configuration conf = new Configuration();
conf.addResource(new Path("/home/user/hadoop/conf/core-site.xml"));
conf.addResource(new Path("/home/user/hadoop/conf/hdfs-site.xml"));
BUT in my case I do not want to install Hadoop on my Linux file system, so I cannot specify a path like "/home/user/hadoop". I would prefer to make it run using only jar files.
Answer
The right choice for your use case is the WebHDFS API. It allows systems running outside the Hadoop cluster to access and manipulate HDFS contents. It does not require the client system to have Hadoop binaries installed; you can manipulate remote HDFS over HTTP using curl itself.
Please refer to:
https://hadoop.apache.org/docs/r1.2.1/webhdfs.html
http://hortonworks.com/blog/webhdfs-%E2%80%93-http-rest-access-to-hdfs/
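As an illustration, here is a minimal sketch of the two-step WebHDFS CREATE call using plain HttpURLConnection, with no Hadoop jars on the client at all. It assumes the namenode's WebHDFS endpoint is reachable on the default HTTP port 50070 and that dfs.webhdfs.enabled is set to true on the cluster; the host, the user.name=hduser, and the file paths are simply reused from your snippet and may need adjusting.

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;

public class WebHdfsUpload {

    public static void main(String[] args) throws Exception {
        // Assumed WebHDFS endpoint: namenode HTTP port 50070 (default), user.name from the question.
        String createUrl = "http://192.168.1.170:50070/webhdfs/v1/user/hduser/samplefile"
                + "?op=CREATE&user.name=hduser&overwrite=true";

        // Step 1: ask the namenode where to write. It replies with a 307 redirect
        // whose Location header points to a datanode; do not follow it automatically.
        HttpURLConnection nn = (HttpURLConnection) new URL(createUrl).openConnection();
        nn.setRequestMethod("PUT");
        nn.setInstanceFollowRedirects(false);
        System.out.println("Namenode response: " + nn.getResponseCode()); // expect 307
        String dataNodeUrl = nn.getHeaderField("Location");
        nn.disconnect();

        // Step 2: send the file bytes to the datanode URL returned above.
        HttpURLConnection dn = (HttpURLConnection) new URL(dataNodeUrl).openConnection();
        dn.setRequestMethod("PUT");
        dn.setDoOutput(true);
        dn.setRequestProperty("Content-Type", "application/octet-stream");
        try (OutputStream out = dn.getOutputStream()) {
            Files.copy(Paths.get("./part-m-00000"), out);
        }
        System.out.println("Datanode response: " + dn.getResponseCode()); // 201 Created on success
        dn.disconnect();
    }
}

With this approach the only runtime requirement on the client is a JVM (or curl); no Hadoop binaries or configuration files are needed locally.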