数据压缩课程作业——LZW压缩算法完整仿真代码和分析

2023-08-10 15:47:58

前言

关于数字压缩课程的作业记录，附完整代码

一、算法描述

1.1 算法特点
LZW压缩算法是一种无损数据压缩算法。在众多的压缩技术中，LZW算法是一种通用的、性能优良并得到广泛应用的压缩算法，它是一种完全可靠的算法，与其他算法相比，往往具有更高的压缩效率。LZW算法保留了LZ码的自适应功能，压缩比也大致相同，其显著特点是逻辑简单（典型的LZW算法处理1个字符不超过三个时钟周期）、硬件实现廉价、运算速度快，在消息长度为一万字符数量级时可以得到令人满意的压缩效果。
1.2 算法基本原理
LZW压缩算法提取原始文本文件数据中的不同字符，基于这些字符创建一个编译表，然后用编译表中的字符的索引来替代原始文本文件数据中的相应字符，减少原始数据大小[1]。看起来和调色板图象的实现原理差不多，但是应该注意到的是，我们这里的编译表不是事先创建好的，而是根据原始文件数据动态创建的，解码时还要从已编码的数据中还原出原来的编译表.

LZW压缩有三个重要的对象：数据流（CharStream）、编码流（CodeStream）和编译表（String Table）。在编码时，数据流是输入对象（文本文件的据序列），编码流就是输出对象（经过压缩运算的编码数据）；在解码时，编码流则是输入对象，数据流是输出对象；而编译表是在编码和解码时都须要用借助的对象。

字符（Character）：最基础的数据元素，在文本文件中就是一个字节，在光栅数据中就是一个像素的颜色在指定的颜色列表中的索引值；
字符串（String）：由几个连续的字符组成；
前缀（Prefix）：也是一个字符串，不过通常用在另一个字符的前面，而且它的长度可以为0；
根（Root）：一个长度的字符串；
编码（Code）：一个数字，按照固定长度（编码长度）从编码流中取出，编译表的映射值；图案：一个字符串，按不定长度从数据流中读出，映射到编译表条目[2]。

二、算法流程

2.1 编码过程[3]

Step1：将词典初始化为包含所有可能的单字符，当前前缀P初始化为空。
Step2：当前字符C：=字符流中的下一个字符。
Step3：判断P+C是否在词典中，如果“是”，则用C扩展P，即让P：=P+C，返回到step2，如果“否”，则输出与当前前缀P相对应的码字W；将P+C添加到词典中，令P:=C，并返回到step2.
Step4：判断字符流中是否还有字符，如果“是”，就返回step2，如果否，就把代表当前前缀P的码字输出到码字流。
LZW压缩算法流程图:
LZW算法举例

2.2解码过程

数据的解码，其实就是数据编码的逆向过程，要从已经编译的数据（编码流）中找出编译表，然后对照编译表还原数据。
：词典中包含所有的前缀根（即每个字符组成词典）
：译码时先记住先前码字（pW）
：从码字流中读出先前的码字（cW）
：输出当前码字（cW）对应的单词，string（cW）
：string（pW）与string（cW）的第一个字符合在一起，放入词典中。
LZW解码过程流程图

三、实验数据和平台

编程环境：VC++ 6.0
计算机配置：Windows 10
实验数据类型：
实验数据1：字符串类型 “ ababcbababaaaaaaa”
实验数据2：一段英文文献 “ Data compression has an undeserved reputation for being difficult to master, hard to implement, and tough to maintain. In fact, the techniques used in the previously mentioned programs are relatively simple, and can be implemented with standard utilities taking only a few lines of code. This article discusses a good all-purpose data compression technique: Lempel-Ziv-Welch, or LZW compression.”
实验数据3：浮点数组

四、实验结果

4.1实验数据1：

当我们对书上的例子一串由三个字母组成的字符串进行LZW压缩时：

发现和书上得到的结果是一致的，代码的正确性得到保障。
信息熵 H(x) =1.166

4.2实验数据2:

实验数据3:

代码如下：

/*


*	描述：LZW编码译码算法

*
*/

#include<iostream>

#include<cstring>

#define N 1000

using namespace std;

template <class T>

int getArrayLen(T& array){
    //使用模板定义一 个函数getArrayLen,该函数将返回数组array的长度
    return (sizeof(array) / sizeof(array[0]));
}




class LZW{           //LZW算法类

public:

	char encodeStr[N];		//要编码的字符串 输入 

	int decodeList[N];		//要译码的数组

	int firstDictionaryNum; //先前词典的大小  

	int length;				//当前词典的大小  

	char dictionary[N][N];	//先前词典 
	

	LZW()          //构造函数 
	{					
		encodeStr[0] = '\0';
        for(int i=0;i<N;i++)
		{

			this->decodeList[i]=-INT_MAX;  //初始化

		}
		for(int ii=0;ii<N;ii++){

			this->dictionary[ii][0] = '\0'; 

		}
    	firstDictionaryNum = 0;
    	length = 0;

	}



	bool initDictionary() 		//初始化先前词典
	{	

		if(encodeStr[0]=='\0')
		{		             	//若没有要编码的字符串，则不能生成先前词典 

			return false;

		}

		dictionary[0][0] = encodeStr[0];//将要编码的字符串的第一个字符加入先前词典 

		dictionary[0][1] = '\0';

		length = 1;

		int i,j; 

		for(i=1;encodeStr[i]!='\0';i++){//将要编码的字符串中所有不同的字符加入先前词典 

			for(j=0;dictionary[j][0]!='\0';j++){

				if(dictionary[j][0]==encodeStr[i]){

					break;

				}

			}

			if(dictionary[j][0]=='\0'){

				dictionary[j][0] = encodeStr[i];

				dictionary[j][1] = '\0';

				length++;

			}

		}

		firstDictionaryNum = length;			//先前词典的大小

		return true;

	}
	

	void Encode() 	 		 	//编码

	{

		for(int g=0;g<firstDictionaryNum;g++)
		{	//先前词典中的初始编码没有输出编码，故设置为-1 
			decodeList[g]=-1;
		}
		int num = firstDictionaryNum;
		char *q,*p,*c;
		q =  encodeStr;						  	//q为标志指针，用来确认位置的 
    	p =  encodeStr; 						//p指针作为字符串匹配的首字符 
    	c = p;									//通过不断移动c指针实现匹配 
    	while(p-q != strlen(encodeStr))//若还没匹配完所有字符，则循环
		{		 

			int i,j;
			int index=0;
            for(i=0;dictionary[i][0]!='\0'&&c-q!=strlen(encodeStr);i++)//通过不断向后移动c指针实现匹配
			{
                char temp[N]; 
                strncpy(temp,p,c-p+1);			//每添加一个匹配字符，则已匹配字符串temp增加一个字符 
                temp[c-p+1]='\0';
                 if(strcmp(temp,dictionary[i])==0)//字符匹配成功
				 { 
                    c++;
					index = i;
				 }
			}

			decodeList[num++]=index;			//遇到一个不匹配的字符或者已经没有字符可以匹配，则输出已匹配的字符串 
			if(c-q!=strlen(encodeStr))//若到一个不匹配的字符且还有字符未匹配，则说明出现了新的词典字段，加入词典 
			{			
				strncpy(dictionary[length],p,c-p+1);
				dictionary[length][c-p+1]='\0';
				length++;
			}
			p = c;								//匹配下一个时，p指向c的指向 

		}

	} 

	

	void Decode()    			//译码 

	{

		for(int i=1;decodeList[i]!=-INT_MAX;i++){	//根据输入代码来循环 

			if(decodeList[i]<=length){				//若出现输入代码在先前词典可以找到，则输出上一个输出的全部+当前输出的第一个 

				strcpy(dictionary[length],dictionary[decodeList[i-1]-1]);

				char temp[2];

				strncpy(temp,dictionary[decodeList[i]-1],1);

				temp[1]='\0';

				strcat(dictionary[length],temp);

			}

			else{									//若出现输入代码在先前词典找不到，则输出上一个输出的全部+上一个输出的第一个 

				strcpy(dictionary[length],dictionary[decodeList[i-1]-1]);

				char temp[2];

				strncpy(temp,dictionary[decodeList[i-1]-1],1);

				temp[1]='\0';

				strcat(dictionary[length],temp);

			}

			length++;

		}

	}	

};

 

 

int main(){



		cout<<"\n\t1.编码\t\t2.译码\t\t3.退出\n\n";
	    
		cout<<"请选择:";
		char x;
		cin>>x;
		LZW lzw;

		if(x=='1'){

			cout<<"请输入要编码的字符串:";

			char a[] = {"ccccddcccccadccabbccbccacb"}  ;
           
            strcpy(lzw.encodeStr,a);
 

			//cin>>lzw.encodeStr;
			cout<<lzw.encodeStr<<endl<<endl;
			if(lzw.initDictionary()==false)
			{

				cout<<"请先设置要编码的字符串encodeStr属性"<<endl;

			}

			lzw.Encode();	//开始编码

			cout<<endl<<"编码过程为:"<<endl<<endl;

			cout<<"\t索引号\t\t\t词典\t\t\t输出"<<endl;

			for(int i=0;i<lzw.length;i++)
			{

				cout<<"\t"<<i+1<<"\t\t\t"<<lzw.dictionary[i]<<"\t\t\t"; 

				if(i>=lzw.firstDictionaryNum)
				{

				cout<<lzw.decodeList[i]+1;
				

				}

				cout<<endl;

			}

			cout<<"\t- \t\t\t - \t\t\t"<<lzw.decodeList[lzw.length]+1<<endl<<endl<<endl;


			//计算相关参数：
			cout<<"压缩后的序列为：";

			for(i = lzw.firstDictionaryNum;i<=lzw.length;i++)
			{
				cout<<lzw.decodeList[i]+1<<" ";
			}
			cout<<endl;


			

		} 

		/*else if(x=='2'){

			cout<<"请按顺序输入初始先前词典：（例：1 A）(输入0结束)"<<endl;

			int tempNum;

			cin>>tempNum;

			int index = 1; 

			while(tempNum!=0){

				if(tempNum<0){

					cout<<"输入序号错误,重新输入该行"<<endl<<endl;

					getchar();//两个getchar()是删除掉该行已经输入的字符 

					getchar();

					cin>>tempNum;

					continue;

				}

				if(tempNum!=index){

					cout<<"请以递增顺序输入序号,重新输入该行"<<endl<<endl;

					getchar();

					getchar();

					cin>>tempNum;

					continue;

				}

				cin>>lzw.dictionary[tempNum-1];

				cin>>tempNum;

				index++; 

			}

			lzw.firstDictionaryNum = index-1;

			lzw.length = lzw.firstDictionaryNum; 

			

			cout<<endl<<"请输入要译的编码(输入0结束):"<<endl<<endl;

			int temp;

			int j=0;

			cin>>temp;

			while(temp!=0){

				if(temp<0){

					cout<<"输入要译的编码错误,重新输入该编码"<<endl<<endl;

					cin>>temp;

					continue;

				}

				lzw.decodeList[j] = temp;

				j++;

				cin>>temp;

			}

			lzw.Decode();	//开始译码 

			cout<<endl<<"译码过程为:"<<endl<<endl;

			

			cout<<"    输入代码\t\t索引号\t\t词典\t\t输出"<<endl;

			for(int i=0;i<lzw.firstDictionaryNum;i++){

				cout<<"      \t\t\t   "<<i+1<<"  \t\t "<<lzw.dictionary[i]<<endl;

			}

			cout<<"\t"<<lzw.decodeList[0]<<"\t\t   -\t\t -\t\t  "<<lzw.dictionary[lzw.decodeList[0]-1]<<endl;

			for(int i1=1;lzw.decodeList[i1]!=-INT_MAX;i1++){

				cout<<"\t"<<lzw.decodeList[i1]<<"\t\t   "<<i1+3<<"  \t\t "<<lzw.dictionary[i1+3-1]<<"\t\t  "<<lzw.dictionary[lzw.decodeList[i1]-1]<<endl;

			}
	

			cout<<endl<<endl;

		}

		else if(x=='3'){

			

		} */

		else{

			cout<<"请输入正确的选择!"<<endl<<endl<<endl;

		}

	

	return 0;

}

五、实验结论

LZW算法不但速度快，而且对各种计算机文件都有较好的压缩效果。针对文本文件，重复度越高，所达成的编码效率越高，压缩比也越大。相比较而言，LZW算法对浮点数据的压缩比要低于对英文段落的压缩比。

六、参考文献

[1]杨国梁,张光年.无损LZW压缩算法及实现[J].首都师范大学学报(自然科学版),2004(S1):11-13+17.
[2] https://wenku.baidu.com/view/dd4f06980912a216147929d4.html
[3]郭斌. 视频压缩中的熵编码研究[D].浙江大学,2007.
W算法对浮点数据的压缩比要低于对英文段落的压缩比。

码农公寓

前言