Loading... ### 1. LZW基础概念 之前提到的算术编码、霍夫曼编码等技术集中在消除编码的冗余上,而本文要讲的**LZW编码**是一种针对**空间冗余**的无误差压缩方法。 LZW算法o又叫“串表压缩算法”,就是通过建立一个将**字符串和其对应的记号**构成的表(把已经出现过的字符串映射到记号上),用较短的代码来表示较长的字符串来实现压缩。 需要注意的是,LZW算法中字符串和记号的对应关系是在压缩的过程中动态生成的,并且隐含在压缩数据中,解压的时候也是一步一步还原编码并动态生成字典的过程。 ### 2. LZW算法详解 #### 2.1 LZW编码算法 编码器从字符串中不断地读入新的字符,并且将字符或字符串映射到新的记号,以动态地构建编码字典。在编码过程中,我们维护两个变量,即`previous_char`(表示目前已有的,但未被编码的字符)和`current_char`(表示刚刚读入的字符)。编码算法流程如下 - (1)初始状态,`previous_char`和`current_char`都是空的。 - (2)读入新的字符current_char,并于previous合并成新的字符串`p_and_c`(previous_char+current_car) - (3)在编码字典里查找`p_and_c`,如果: - `p_and_c`在字典里,previous_char = p_and_c - `p_and_c`不在字典里,将`previous_char`的标记输出;在字典中建立`p_and_c`的映射;更新`previous_char = current_char`。 - (4)返回步骤(2),重复直至读完字符串。 #### 2.2 Example > ababcababac 初始状态字典里有三个默认的映射 | String | Symbol | | :----: | :----: | | a | 0 | | b | 1 | | c | 2 | 开始编码 | previous_char | current_char | p_and_c | p_and_c in dict | Action | output | | :-----------: | :----------: | :-----: | :-------------: | :-----------------------------------: | :----: | | "" | a | a | Y | previous_char = a | - | | a | b | ab | N | 添加映射{'ab', 3} previous_char = b | 0 | | b | a | ba | N | 添加映射{'ba', 4} | 1 | | a | b | ab | Y | previous_char = ab | - | | ab | c | abc | N | 添加映射{'abc', 5} previous_char = c | 3 | | c | a | ca | N | 添加映射{'ca', 6} previous_char = a | 2 | | a | b | ab | Y | previous_char = ab | - | | ab | a | aba | N | 添加映射{'aba', 7} previsou_char = a | 3 | | a | b | ab | Y | previous_char = ab | - | | ab | a | aba | Y | previous_char = aba | - | | aba | c | abac | N | 添加映射{'abac', 8} previous_char = c | 7 | | c | - | - | - | - | 2 | 编码结束,输出结果为`0132372`,编码过程中生成的字典为: | String | Symbol | | :----: | :----: | | a | 0 | | b | 1 | | c | 2 | | ab | 3 | | ba | 4 | | abc | 5 | | ca | 6 | | aba | 7 | | abac | 8 | #### 2.2 LZW解码算法 解码器输入的是压缩后的数据。同时,我们将维护两个变量`previous_symbol`(目前已有的但未被处理的记号),`current_symbol`(当前读入的记号)。算法流程如下 - (1)初始状态,`previous_symbol`和`current_symbol`都为空。 - (2)读入第一个记号,并赋给`current_symbol`,并解码直接输出。因为第一个字符为单个字符,一定可以直接解码。 - (3)赋值,`previous_symbol = current_symbol` - (4)在字典里查找`current_symbol`,如果: - `current_symbol`在字典里: - a. 将`current_symbol`解码的结果输出 - b. `previous_char = dict.get(previous_symbol)`, `current_char = dict.get(current_symbol)[0]` 。注意,这里`current_char`为`current_symbol`解码的第一个字符。 - c. 将`p_and_c`添加到新的映射 - `current_symbol`不在字典里: - a. `previous_char = dict.get(previous_symbol)`, `current_char = dict.get(previous_symbol)[0]` 。注意,这里`current_char`为`previous_symbol`解码的第一个字符。 - b. 将`p_and_c`添加到新的映射 - c. 输出`p_and_c` - 返回步骤(3),直至读完所有记号。 #### 2.4 Example > 0 1 3 2 3 7 2 初始状态字典里有三个默认的映射 | Symbol | String | | :----: | :----: | | 0 | a | | 1 | b | | 2 | c | 开始解码: 首先读取第一个记号,`current_symbol = 0`,输出解码结果`dict.get(current_symbol)`,即`a`赋值`previous_symbol = current_symbol`,然后循环读取之后的记号。 | previous_symbol | current_symbol | current_symbol in dict | Action | output | | --------------- | -------------- | ---------------------- | ------------------------------------------------------------ | ------ | | 0 | 1 | Y | previous_char = a current_char = b p_and_c = 'ab' 添加映射{3, 'ab'} | b | | 1 | 3 | Y | previous_char = b current_char = a p_and_c = 'ba' 添加映射{4. 'ba'} | ab | | 3 | 2 | Y | previous_char = ab current_char = c p_and_c = 'abc' 添加映射{5, 'abc'} | c | | 2 | 3 | Y | previous_char = c current_char = a p_and_c = ca 添加映射{6. 'ca'} | ab | | 3 | 7 | N | previous_char = ab current_char = a p_and_c = aba 添加映射{7, 'aba'} | aba | | 7 | 2 | Y | previous_char = aba current_char = c P_and_c = abac 添加映射{8, 'abac'} | c | 解码结束,输出结果为`ababcababac`。解码过程中生成的字典为 | Symbol | String | | :----: | :----: | | 0 | a | | 1 | b | | 2 | c | | 3 | ab | | 4 | ba | | 5 | abc | | 6 | ca | | 7 | aba | | 8 | abac | #### 2.5 代码实现(python) ```python def encode_init(): k = 0 char_num_dict = {} for i in range(97,123): char_num_dict[chr(i)] = k k += 1 # print(char_num_dict.items()) return char_num_dict def decode_init(): k = 0 num_char_dict = {} for i in range(97,123): num_char_dict[str(k)] = chr(i) k += 1 # print(char_num_dict.items()) return num_char_dict def LZW_encode(strList, char_num_dict): symbol = 26 # 标记 previous_char = "" # 已有的,但未被编码的字符 current_char = "" # 当前读入的字符 p_and_c = "" output = "" # 输出的编码序列 for s in strList: current_char = s p_and_c = previous_char + current_char # print(previous_char, s) if char_num_dict.get(p_and_c) != None: previous_char = p_and_c else: output += str(char_num_dict.get(previous_char)) + '-' char_num_dict[p_and_c] = symbol symbol += 1 previous_char = current_char output += str(char_num_dict[s[-1]]) print(char_num_dict.items()) return output def LZW_decode(output, num_char_dict): output_list = output.split('-') print(output_list) symbol = 26 previous_symbol = "" current_symbol = "" previous_char = "" current_char = "" decodeList = "" decodeList += num_char_dict.get(output_list[0]) previous_symbol = output_list[0] for o in output_list[1:]: current_symbol = o # print(previous_symbol, current_symbol) if num_char_dict.get(current_symbol) != None: decodeList += num_char_dict.get(current_symbol) previous_char = num_char_dict.get(previous_symbol) current_char = num_char_dict.get(current_symbol)[0] num_char_dict[str(symbol)] = previous_char + current_char symbol += 1 else: previous_char = num_char_dict.get(previous_symbol) current_char = num_char_dict.get(previous_symbol)[0] num_char_dict[str(symbol)] = previous_char + current_char symbol += 1 decodeList += previous_char + current_char previous_symbol = current_symbol # print(num_char_dict.items()) return decodeList if __name__ == '__main__': strList = input("请输入字符序列:").lower() print('字符序列长度:', len(strList)) char_num_dict = encode_init() output = LZW_encode(strList, char_num_dict) print(output) num_char_dict = decode_init() # print(num_char_dict.items()) decodeList = LZW_decode(output, num_char_dict) print(decodeList) print("编码长度:", len(output) / 2) ``` Last modification:June 28th, 2020 at 11:39 am © 允许规范转载