Hadoop HDFS API Operations

Basic operations with the Hadoop HDFS API

A brief introduction

Hadoop provides a set of convenient shell commands for HDFS (similar to the Linux file commands). In addition, it exposes an HDFS API so that developers can operate on HDFS programmatically, for example: copying files (from local to HDFS and from HDFS to local), deleting files or directories, reading a file's contents, inspecting a file's metadata, listing all files under a directory, and appending content to a file. (Note: HDFS does not support modifying a line in the middle of a file; it only allows appending content to the end.)

First, initialize the HDFS FileSystem before each test, and close it afterwards:
import java.io.*;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.io.IOUtils;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

	private static final String HDFS_PATH = "hdfs://localhost:8020";
	private Configuration conf = null;
	private FileSystem fs = null;

	// Runs before each test: create the Configuration and connect to HDFS.
	@Before
	public void setUp() throws IOException {
		conf = new Configuration();
		fs = FileSystem.get(URI.create(HDFS_PATH), conf);
	}

	// Runs after each test: release the connection to HDFS.
	@After
	public void tearDown() throws IOException {
		fs.close();
	}
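As a side note (a sketch, not from the original post): instead of passing a URI, you can set fs.defaultFS (fs.default.name on older releases) on the Configuration, so that later Path arguments can omit the hdfs://localhost:8020 prefix:

	// Variant setup (sketch): configure the default file system explicitly.
	Configuration conf = new Configuration();
	conf.set("fs.defaultFS", "hdfs://localhost:8020");
	FileSystem fs = FileSystem.get(conf); // no URI argument needed now
	// From here on, new Path("/user/root/test.txt") resolves against HDFS.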


Copy a file from local to HDFS, or from HDFS to local

@Test
	public void testCopyLocalFileToHDFS() throws IOException {

		String[] args = { "/test.txt1",
				"hdfs://localhost:8020/user/root/test.txt" };
		if (args.length != 2) {
			System.err.println("Usage: filecopy <source> <target>");
			System.exit(2);
		}
		// Stream the local file into a newly created HDFS file. The fs field
		// from setUp() is reused; re-creating a FileSystem here would shadow it.
		InputStream in = new BufferedInputStream(new FileInputStream(args[0]));
		OutputStream out = fs.create(new Path(args[1]));
		IOUtils.copyBytes(in, out, conf); // closes both streams when done

		// The same upload can be done with a single convenience call:
		// fs.copyFromLocalFile(new
		// Path("/eclipse-jee-luna-R-linux-gtk-x86_64.tar.gz"), new
		// Path(HDFS_PATH + "/user/root/"));

		// Copy a file from HDFS back to the local file system.
		fs.copyToLocalFile(
				new Path(
						"hdfs://localhost:8020/user/root/eclipse-jee-luna-R-linux-gtk-x86_64.tar.gz"),
				new Path("/user/"));

	}
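For large uploads it can help to see progress. A minimal sketch (the file names are placeholders, not from the original post) using the create overload that takes a Progressable, which HDFS calls back as data is written:

	// Sketch: report upload progress with a Progressable callback.
	// Requires: import org.apache.hadoop.util.Progressable;
	InputStream in = new BufferedInputStream(new FileInputStream("/test.txt1"));
	OutputStream out = fs.create(new Path("/user/root/test.txt"),
			new Progressable() {
				@Override
				public void progress() {
					System.out.print("."); // invoked as packets are flushed to the cluster
				}
			});
	IOUtils.copyBytes(in, out, 4096, true); // 4096-byte buffer; true = close streams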

Delete a file

@Test
	public void deleteFile() throws IOException {
		// The second argument enables recursive deletion, so directories
		// and their contents can be removed as well.
		fs.delete(new Path("hdfs://localhost:8020/user/root/out1"), true);
	}
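delete simply returns false when the path does not exist. If you want to handle a missing path explicitly, a small sketch (reusing the path above):

	// Sketch: check for the path first so a missing file can be handled explicitly.
	Path target = new Path(HDFS_PATH + "/user/root/out1");
	if (fs.exists(target)) {
		fs.delete(target, true); // true = recursive
	} else {
		System.err.println("Path does not exist: " + target);
	}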

Read a file into an output stream

@Test
	public void readFile() {
		InputStream in = null;
		try {
			in = fs.open(new Path(HDFS_PATH + "/user/root/test.txt"));
			// Copy with a 4 KB buffer; pass false so System.out is not closed.
			IOUtils.copyBytes(in, System.out, 4096, false);
		} catch (IOException e) {
			e.printStackTrace();
		} finally {
			IOUtils.closeStream(in);
		}
	}
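open actually returns an FSDataInputStream, which supports random access. A short sketch (same test file assumed) that reads the file twice by seeking back to the start:

	// Sketch: FSDataInputStream supports seek(), so the file can be re-read.
	FSDataInputStream in = fs.open(new Path(HDFS_PATH + "/user/root/test.txt"));
	IOUtils.copyBytes(in, System.out, 4096, false); // first pass
	in.seek(0); // rewind to the beginning of the file
	IOUtils.copyBytes(in, System.out, 4096, false); // second pass
	IOUtils.closeStream(in);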

Get file metadata

@Test
	public void getFileInfo() throws IllegalArgumentException, IOException {
		FileStatus fSta = fs.getFileStatus(new Path(HDFS_PATH
				+ "/user/root/test.txt"));
		System.out.println(fSta.getAccessTime());       // last access time (epoch ms)
		System.out.println(fSta.getBlockSize());        // HDFS block size in bytes
		System.out.println(fSta.getModificationTime()); // last modification time (epoch ms)
		System.out.println(fSta.getOwner());            // owning user
		System.out.println(fSta.getGroup());            // owning group
		System.out.println(fSta.getLen());              // file length in bytes
		System.out.println(fSta.getPath());             // fully qualified path
		System.out.println(fSta.isSymlink());           // whether this is a symbolic link
	}
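The two time fields are raw epoch milliseconds; a tiny sketch (continuing from fSta above) to print them in readable form:

	// Sketch: convert the epoch-millisecond timestamps to readable dates.
	java.text.SimpleDateFormat fmt = new java.text.SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
	System.out.println(fmt.format(new java.util.Date(fSta.getModificationTime())));
	System.out.println(fmt.format(new java.util.Date(fSta.getAccessTime())));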

List all files under a directory

@Test
	public void listFile() throws FileNotFoundException,
			IllegalArgumentException, IOException {

		// Recursively list every file (not directories) under /user/root/.
		RemoteIterator<LocatedFileStatus> iterator = fs.listFiles(new Path(
				HDFS_PATH + "/user/root/"), true);
		while (iterator.hasNext()) {
			System.out.println(iterator.next());
		}

		// List the direct children of the root directory.
		FileStatus[] fss = fs.listStatus(new Path(HDFS_PATH + "/"));
		Path[] ps = FileUtil.stat2Paths(fss);
		for (Path p : ps) {
			System.out.println(p);
		}

		// Print the block locations of a large file: the topology path and
		// the host name of every datanode that stores each block.
		FileStatus sta = fs
				.getFileStatus(new Path(
						"hdfs://localhost:8020/user/root/eclipse-jee-luna-R-linux-gtk-x86_64.tar.gz"));
		BlockLocation[] bls = fs.getFileBlockLocations(sta, 0, sta.getLen());
		for (BlockLocation b : bls) {
			for (String s : b.getTopologyPaths())
				System.out.println(s);
			for (String s : b.getHosts())
				System.out.println(s);
		}
	}
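Besides listFiles and listStatus, the API can also match paths against shell-style wildcards. A minimal sketch (the *.txt pattern is just an example):

	// Sketch: globStatus expands shell-style wildcards into matching paths.
	FileStatus[] matches = fs.globStatus(new Path(HDFS_PATH + "/user/root/*.txt"));
	for (Path p : FileUtil.stat2Paths(matches)) {
		System.out.println(p);
	}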

Append content to the end of a file

First, configure HDFS to support appending to files by adding the following to hdfs-site.xml:

<property>
	<name>dfs.support.append</name>
	<value>true</value>
</property>

The code is:
@Test
	public void appendFile() {
		// Path of the HDFS file to append to.
		String hdfs_path = "hdfs://localhost:8020/user/root/input/test.txt";

		// The append switch can also be set programmatically:
		// conf.setBoolean("dfs.support.append", true);

		// Local file whose contents will be appended.
		String inpath = "/test.txt1";
		try {
			InputStream in = new BufferedInputStream(
					new FileInputStream(inpath));
			// Open the HDFS file in append mode and copy the bytes over.
			OutputStream out = fs.append(new Path(hdfs_path));
			IOUtils.copyBytes(in, out, 4096, true); // true = close both streams

		} catch (IOException e) {
			e.printStackTrace();
		}
	}
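The appended data does not have to come from a file. A short sketch (continuing with hdfs_path from above; the literal text is just an example) that appends an in-memory string:

	// Sketch: append a small in-memory string directly.
	OutputStream out = fs.append(new Path(hdfs_path));
	out.write("one more line\n".getBytes(java.nio.charset.StandardCharsets.UTF_8));
	out.close();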





