Huffman编码的原理这里就不说了,网上随处都是。这里来讲讲利用Huffman编码来进行压缩和解压缩的具体实现吧。本工程使用java实现。
编码
1. 流程图
2. 数据结构
CharacterWeight:记录字符值,以及其在待压缩文件中的权重。
Class{ char c; //字符值 int weight; //在文件中权重 String code; //其对应huffman编码 }
HuffmanNode:huffman树中的节点信息。
Class{ Int parent; //父节点 Int lChild; //左儿子 Int rChild; //右儿子 Int weight; //权重 }
3. 程序关键点
3.1 Huffman树的构建
Huffman树的变量:ArrayList<HuffmanNode> list;
创建流程图:
for(int i=0;i<list.size()-1;i++){ //w1 : the first min weight w2: the second min weight //i1 : the first min weight index, i2: the second min weight index int w1 = MAX_VALUE, w2=MAX_VALUE; int i1 = 0, i2 = 0; // find the two node with the minimum weight for(int j=0;j<tree.size();j++){ HuffmanNode node = tree.get(j); if(node.getWeight()< w1 && node.getParent()==-1){ w2 = w1; w1 = node.getWeight(); i2 = i1; i1 = j; } else if(node.getWeight()<w2 && node.getParent()==-1){ w2 = node.getWeight(); i2 = j; } } //set the two node to be the children of a new node, and add the new node to the tree HuffmanNode pNode = new HuffmanNode(w1+w2); pNode.setlChild(i1); pNode.setrChild(i2); tree.add(pNode); tree.get(i1).setParent(tree.indexOf(pNode)); tree.get(i2).setParent(tree.indexOf(pNode));}
3.2 根据Huffman 树获得Huffman编码
从叶子节点开始网上遍历Huffman树,直到到达根节点,根据当前节点为其父节点的左儿子还是右儿子确定这一位值是0还是1。最后将依次获得的0,1字符串反转获得Huffman编码。
代码:
for(int i=0;i<list.size();i++){ HuffmanNode node = tree.get(i); HuffmanNode pNode = tree.get(node.getParent()); String code =""; while(true){ if(pNode.getlChild()==tree.indexOf(node)){ code = "0"+code; } else if(pNode.getrChild() == tree.indexOf(node)){ code = "1"+code; } else { System.out.println("Tree Node Error!!!"); return null; } node=pNode; if(node.getParent()!=-1) pNode=tree.get(node.getParent()); else break; } list.get(i).setCode(new String(code)); }3.3 文件头设计
字符总数 |
Int 四个字节 |
字符种类数 |
Short 两个字节 |
叶子节点 |
char字符 short 父节点 3个字节 |
非叶子节点 |
Short 左儿子 short 右儿子 short父节点 6字节 |
文件头长度(单位: byte)
l= 9n其中n 为字符种类数。
3.4文件内容的编码和写入
while((temp=reader.read())!=-1){ //!= EOF // get the code from the code table String code = codeTable.get((char)temp); c++; if(c>=count/96){ System.out.print("="); c=0; } try{ StringBuilder codeString = new StringBuilder(code); outputStringBuffer.append(codeString); while(outputStringBuffer.length()>8){ out.write(Short.parseShort(outputStringBuffer.substring(0, 8),2)); outputStringBuffer.delete(0, 8); } } catch(Exception e){ e.printStackTrace(); } }
解码
1. 流程图
2. 数据结构
HuffmanNode:huffman树中的节点信息。
Class{ Int parent; //父节点 Int lChild; //左儿子 Int rChild; //右儿子 Int weight; //权重 Char c; //对应的字符值 |
3. 程序关键点
3.1 重建Huffman树。在文件头中存放的原本就是Huffman树的节点信息,所以重建Huffman树是比较简单的。
代码:in = new DataInputStream(new FileInputStream(file)); count = in.readInt(); charNum = in.readShort(); nodeNum = 2*charNum -1; //rebuild the huffman tree for(int i=0;i<charNum;i++){ HuffmanNode node = new HuffmanNode((char)in.readByte()); int parent = in.readShort(); node.setParent(parent); tree.add(node); } for(int i=charNum;i<nodeNum;i++){ HuffmanNode node = new HuffmanNode(' '); int l = in.readShort(); int r = in.readShort(); int p = in.readShort(); node.setlChild(l); node.setrChild(r); node.setParent(p); tree.add(node); }
3.2 解码
解码流程图
while(true){ while(buff.length()<32){ temp = in.readInt(); String codeString = Integer.toBinaryString(temp); while(codeString.length()<32){ codeString='0'+codeString; } buff.append(codeString); } node = tree.get(tree.size()-1); dep = 0; while(!(node.getlChild()==-1&&node.getrChild()==-1)){ if(dep>=buff.length()){ System.out.println( "Buff overflow"); } if(buff.charAt(dep)=='0'){ node = tree.get(node.getlChild()); } else if(buff.charAt(dep)=='1'){ node = tree.get(node.getrChild()); } else{ System.out.println("Coding error"); } dep++; } char c = node.getCH(); num++; if(num>=n/99){ System.out.print("="); num=0; } count++; if(count>=n){ break; } charBuff+=c; if(charBuff.length()>256){ writer.write(charBuff); charBuff=""; } buff.delete(0, dep); } } catch(EOFException e){ //just do nothing } catch(Exception e){ e.printStackTrace(); } finally{ //there may be data released in the buff and charbuff, so we need to process them while(buff.length()>0){ node = tree.get(tree.size()-1); dep = 0; while(!(node.getlChild()==-1&&node.getrChild()==-1)){ if(dep>=buff.length()){ break; } if(buff.charAt(dep)=='0'){ node = tree.get(node.getlChild()); } else if(buff.charAt(dep)=='1'){ node = tree.get(node.getrChild()); } else{ System.out.println("Coding error"); //return; } dep++; } char c = node.getCH(); num++; if(num>=n/99){ System.out.print("="); num=0; } count++; if(count>=n){ break; } charBuff+=c; if(charBuff.length()>256){ try { writer.write(charBuff); } catch (IOException e1) { // TODO Auto-generated catch block e1.printStackTrace(); } charBuff=""; } buff.delete(0, dep); } try { writer.write(charBuff); writer.close(); } catch (IOException e) { // TODO Auto-generated catch block e.printStackTrace(); } } try{ writer.close(); } catch(IOException e){ throw e; }
完整工程就不放出了,以后再更新吧。