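Basic operations with the Hadoop HDFS API
A brief introduction
Hadoop provides convenient shell commands for HDFS (similar to the Linux file commands). It also provides an HDFS API that lets developers operate on HDFS from code, for example: copying files (from the local file system to HDFS and from HDFS to the local file system), deleting files or directories, reading a file's contents, viewing a file's metadata, listing the contents of a directory, and appending content to the end of a file. (Note: HDFS does not support modifying a particular line inside a file; it only allows appending content to the end of the file.)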
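Because of the append-only restriction in the note above, one common workaround for "changing a line" is to write a new copy of the file and swap it in. The following is only a minimal sketch of that pattern, using the JDK's `java.nio.file` on the local file system as a stand-in for HDFS. The class and method names (`RewriteLineSketch`, `withLineReplaced`, `replaceLine`) are hypothetical and are not part of any Hadoop API.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RewriteLineSketch {

    // Hypothetical helper: returns a copy of `lines` with the entry at `index` replaced.
    static List<String> withLineReplaced(List<String> lines, int index, String replacement) {
        List<String> result = new ArrayList<>(lines);
        result.set(index, replacement);
        return result;
    }

    // Changes one line by rewriting the whole file: write a new copy, then swap it in.
    static void replaceLine(Path file, int index, String replacement) throws IOException {
        List<String> updated = withLineReplaced(
                Files.readAllLines(file, StandardCharsets.UTF_8), index, replacement);
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        Files.write(tmp, updated, StandardCharsets.UTF_8);
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        // Local temp file used purely for demonstration.
        Path file = Files.createTempFile("rewrite-sketch", ".txt");
        Files.write(file, Arrays.asList("a", "b", "c"), StandardCharsets.UTF_8);
        replaceLine(file, 1, "B");
        System.out.println(Files.readAllLines(file, StandardCharsets.UTF_8));
        Files.delete(file);
    }
}
```

On HDFS the same idea would go through the file system API (read the old file, create a new one, then replace the old one) rather than `java.nio.file`; for large files this rewrite is expensive, which is why appending is preferred where it fits.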
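First I initialize HDFS, and close it at the end. All snippets in this post are methods of one JUnit 4 test class and use classes from org.apache.hadoop.conf (Configuration), org.apache.hadoop.fs (FileSystem, Path, FileStatus and related types) and org.apache.hadoop.io (IOUtils):

    private static final String HDFS_PATH = "hdfs://localhost:8020";
    private Configuration conf = null;
    private FileSystem fs = null;

    // Runs before each test: open a handle to the file system.
    @Before
    public void setUp() throws IOException {
        conf = new Configuration();
        fs = FileSystem.get(URI.create(HDFS_PATH), conf);
    }

    // Runs after each test: release the handle.
    @After
    public void tearDown() throws IOException {
        fs.close();
    }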
Copying a file from the local file system to HDFS, or from HDFS to the local file system

    @Test
    public void testCopyLocalFileToHDFS() throws IOException {
        String src = "/test.txt1";                                 // local source file
        String dst = "hdfs://localhost:8020/user/root/test.txt";  // HDFS target file

        // Local -> HDFS by streaming. This copyBytes overload closes both streams when done.
        InputStream in = new BufferedInputStream(new FileInputStream(src));
        OutputStream out = fs.create(new Path(dst));
        IOUtils.copyBytes(in, out, conf);

        // Local -> HDFS in a single call (alternative to the stream copy above):
        // fs.copyFromLocalFile(new Path("/eclipse-jee-luna-R-linux-gtk-x86_64.tar.gz"),
        //         new Path(HDFS_PATH + "/user/root/"));

        // HDFS -> local.
        fs.copyToLocalFile(
                new Path("hdfs://localhost:8020/user/root/eclipse-jee-luna-R-linux-gtk-x86_64.tar.gz"),
                new Path("/user/"));
    }
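To make the stream-based copy less opaque, here is a rough sketch of what a helper such as `IOUtils.copyBytes` does conceptually: read into a fixed-size buffer, write out what was read, and optionally close both streams. This is an illustrative reimplementation using only the JDK, not Hadoop's actual source; the class name `CopyBytesSketch` and the method `copy` are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

final class CopyBytesSketch {

    // Illustrative stand-in, not Hadoop's code: copies everything from `in` to `out`
    // through a buffer of `bufferSize` bytes and returns the number of bytes copied.
    // When `close` is true, both streams are closed afterwards, even on failure.
    static long copy(InputStream in, OutputStream out, int bufferSize, boolean close)
            throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        try {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
            }
            out.flush();
        } finally {
            if (close) {
                try {
                    in.close();
                } finally {
                    out.close();
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // In-memory streams stand in for the local file and the HDFS output stream.
        InputStream in = new ByteArrayInputStream("hello hdfs".getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long copied = copy(in, out, 4, true);
        System.out.println(copied + " bytes: " + out.toString("UTF-8"));
    }
}
```

The `close` flag matters when one of the streams must stay open after the copy, for example `System.out`.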
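Deleting a file

    @Test
    public void deleteFile() throws IOException {
        // The second argument enables recursive deletion, which a non-empty directory requires.
        fs.delete(new Path("hdfs://localhost:8020/user/root/out1"), true);
    }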
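Reading a file into an output stream

    @Test
    public void readFile() {
        InputStream in = null;
        try {
            in = fs.open(new Path(HDFS_PATH + "/user/root/test.txt"));
            // Pass close=false so that System.out is not closed after the copy;
            // the input stream is closed in the finally block.
            IOUtils.copyBytes(in, System.out, 4096, false);
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            IOUtils.closeStream(in);
        }
    }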
Getting file information

    @Test
    public void getFileInfo() throws IllegalArgumentException, IOException {
        FileStatus fSta = fs.getFileStatus(new Path(HDFS_PATH + "/user/root/test.txt"));
        System.out.println(fSta.getAccessTime());       // last access time, ms since the epoch
        System.out.println(fSta.getBlockSize());        // block size in bytes
        System.out.println(fSta.getModificationTime()); // last modification time, ms since the epoch
        System.out.println(fSta.getOwner());
        System.out.println(fSta.getGroup());
        System.out.println(fSta.getLen());              // file length in bytes
        System.out.println(fSta.getPath());
        System.out.println(fSta.isSymlink());
    }
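The access and modification times above are printed as raw millisecond timestamps, and the sizes as raw byte counts. As a small sketch of how such values could be made readable, the helper below formats them using only the JDK. The class `FileInfoFormatSketch` and its methods are hypothetical names introduced for illustration, and the choice of UTC and binary units is an assumption.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

final class FileInfoFormatSketch {

    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss").withZone(ZoneOffset.UTC);

    // Formats milliseconds since the epoch as a UTC date-time string.
    static String formatMillis(long epochMillis) {
        return FORMAT.format(Instant.ofEpochMilli(epochMillis));
    }

    // Formats a byte count with binary units (1 KiB = 1024 B).
    static String formatSize(long bytes) {
        String[] units = {"B", "KiB", "MiB", "GiB", "TiB"};
        double value = bytes;
        int unit = 0;
        while (value >= 1024 && unit < units.length - 1) {
            value /= 1024;
            unit++;
        }
        return String.format(Locale.ROOT, "%.1f %s", value, units[unit]);
    }

    public static void main(String[] args) {
        // Sample values only; in the test above they would come from the FileStatus getters.
        System.out.println(formatMillis(0L));         // 1970-01-01 00:00:00
        System.out.println(formatSize(134217728L));   // 128.0 MiB
    }
}
```

In the test above, calls such as `formatMillis(fSta.getModificationTime())` or `formatSize(fSta.getLen())` would replace the bare `println` arguments.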
Listing all files in a directory

    @Test
    public void listFile() throws FileNotFoundException, IllegalArgumentException, IOException {
        // Recursively list every file under /user/root/.
        RemoteIterator<LocatedFileStatus> iterator =
                fs.listFiles(new Path(HDFS_PATH + "/user/root/"), true);
        while (iterator.hasNext()) {
            System.out.println(iterator.next());
        }

        // List the direct children of the root directory (not recursive).
        FileStatus[] fss = fs.listStatus(new Path(HDFS_PATH + "/"));
        Path[] ps = FileUtil.stat2Paths(fss);
        for (Path p : ps) {
            System.out.println(p);
        }

        // Print where the blocks of one file are stored.
        FileStatus sta = fs.getFileStatus(
                new Path("hdfs://localhost:8020/user/root/eclipse-jee-luna-R-linux-gtk-x86_64.tar.gz"));
        BlockLocation[] bls = fs.getFileBlockLocations(sta, 0, sta.getLen());
        for (BlockLocation b : bls) {
            for (String s : b.getTopologyPaths())
                System.out.println(s);
            for (String s : b.getHosts())
                System.out.println(s);
        }
    }
Appending to the end of a file (append)
First, HDFS must be configured to allow appending to files.
Add the following to hdfs-site.xml:

    <property>
        <name>dfs.support.append</name>
        <value>true</value>
    </property>

The code is:

    @Test
    public void appendFile() {
        String hdfs_path = "hdfs://localhost:8020/user/root/input/test.txt"; // file to append to
        // conf.setBoolean("dfs.support.append", true);
        String inpath = "/test.txt1"; // local file whose contents are appended
        try {
            // Stream of the content to append; inpath is a local file.
            InputStream in = new BufferedInputStream(new FileInputStream(inpath));
            OutputStream out = fs.append(new Path(hdfs_path));
            // Copy with a 4096-byte buffer; true closes both streams afterwards.
            IOUtils.copyBytes(in, out, 4096, true);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }