Question
Design an algorithm to encode a list of strings to a string. The encoded string is then sent over the network and is decoded back to the original list of strings.
Machine 1 (sender) has the function:
string encode(vector<string> strs) {
// ... your code
return encoded_string;
}
Machine 2 (receiver) has the function:
vector<string> decode(string s) {
//... your code
return strs;
}
So Machine 1 does:
string encoded_string = encode(strs);
and Machine 2 does:
vector<string> strs2 = decode(encoded_string);
strs2 in Machine 2 should be the same as strs in Machine 1.
Implement the encode and decode methods.
Note:
- The string may contain any possible characters out of 256 valid ascii characters. Your algorithm should be generalized enough to work on any possible characters.
- Do not use class member/global/static variables to store states. Your encode and decode algorithms should be stateless.
- Do not rely on any library method such as eval or serialize methods. You should implement your own encode/decode algorithm.
Solution 1 -- JSON format
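The first approach borrows its rules from JSON.
Encode: we wrap the input list of strings as a JSON-style array. [ (left bracket) and ] (right bracket) mark the start and the end, elements are separated by , (comma), and each string is wrapped in double quotes.
Because a string may itself contain double quotes or backslashes (\), these two characters must be escaped by prefixing each of them with an extra backslash.
For example, the original string \"aafg" becomes \\\"aafg\"
JSON defines more elaborate string handling, but the only goal here is that encoding and then decoding returns the original strings, so that complexity is unnecessary. Note that the code below also writes a comma after the last element, so the output is JSON-like rather than strictly valid JSON.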
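As a quick check of the escaping rule, here is a minimal sketch that escapes a single string and reproduces the example above. The class name EscapeDemo and the helper escape are hypothetical names chosen for illustration; they are not part of the solution's Codec class.

```java
class EscapeDemo {
    // Hypothetical helper: prefixes every double quote and backslash
    // with one extra backslash, leaving all other characters untouched.
    static String escape(String str) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (c == '"' || c == '\\') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "\\\"aafg\"";        // the 7 characters \"aafg"
        System.out.println(original);          // prints \"aafg"
        System.out.println(escape(original));  // prints \\\"aafg\"
    }
}
```

Every escaped character grows by exactly one backslash, so the decoder can undo the step by dropping a backslash and keeping the character that follows it.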
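Decode follows these rules:
1. A boolean variable records whether the next double quote opens a new string or closes the current one.
2. On an unescaped double quote, depending on whether it is an opening or a closing quote, either create a new empty string or store the current string in the result.
3. On a backslash, check whether the next character is a backslash or a double quote. If it is, append that next character to the current string and advance the index by one extra position.
4. A comma that appears between strings (outside the quotes) is skipped.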
import java.util.ArrayList;
import java.util.List;

public class Codec {
    private final char start = '[';
    private final char end = ']';
    private final char include = '"';
    private final char strSplit = ',';

    // Encodes a list of strings to a single string.
    public String encode(List<String> strs) {
        StringBuilder sb = new StringBuilder();
        sb.append(start);
        for (String str : strs) {
            sb.append(include);
            int len = str.length();
            for (int i = 0; i < len; i++) {
                char current = str.charAt(i);
                // Escape double quotes and backslashes with an extra backslash.
                if (current == '"' || current == '\\') {
                    sb.append('\\');
                }
                sb.append(current);
            }
            sb.append(include);
            sb.append(strSplit);
        }
        sb.append(end);
        return sb.toString();
    }

    // Decodes a single string to a list of strings.
    public List<String> decode(String s) {
        List<String> result = new ArrayList<String>();
        if (s == null || s.length() < 1) {
            return result;
        }
        int len = s.length();
        if (s.charAt(0) != start || s.charAt(len - 1) != end) {
            return result;
        }
        // true: the next quote opens a string; false: it closes the current one.
        boolean startSymbol = true;
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i < len - 1; i++) {
            char current = s.charAt(i);
            if (current == include) {
                if (startSymbol) {
                    sb = new StringBuilder();
                } else {
                    result.add(sb.toString());
                }
                startSymbol = !startSymbol;
                continue;
            }
            // Skip the separator between two strings.
            if (current == strSplit && startSymbol) {
                continue;
            }
            // Unescape: keep the character after the backslash and skip past it.
            if (current == '\\') {
                char next = s.charAt(i + 1);
                if (next == '\\' || next == '"') {
                    sb.append(next);
                    i++;
                    continue;
                }
            }
            sb.append(current);
        }
        return result;
    }
}

// Your Codec object will be instantiated and called as such:
// Codec codec = new Codec();
// codec.decode(codec.encode(strs));
Solution 2
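This approach uses Java's String method int indexOf(int ch, int fromIndex).
Each string is stored together with its length: the encoder writes the length, a '/' separator, and then the string itself. Because the decoder reads exactly that many characters after the separator, a '/' inside a string is never mistaken for a separator.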
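To make the encoded format concrete, here is a small sketch that frames a few strings, including one that contains '/' and one that is empty, and then reads one frame back. The names LengthPrefixDemo, frame, and readFrame are illustrative assumptions, not part of the solution's Codec class.

```java
import java.util.Arrays;
import java.util.List;

class LengthPrefixDemo {
    // Hypothetical helper: writes one string as "<length>/<payload>".
    static String frame(String str) {
        return str.length() + "/" + str;
    }

    // Hypothetical helper: reads the frame starting at index i and returns its payload.
    static String readFrame(String s, int i) {
        int slash = s.indexOf('/', i);                      // first '/' at or after i ends the length
        int size = Integer.parseInt(s.substring(i, slash)); // digits between i and the '/'
        return s.substring(slash + 1, slash + 1 + size);    // exactly size characters of payload
    }

    public static void main(String[] args) {
        List<String> strs = Arrays.asList("ab", "c/d", "");
        StringBuilder sb = new StringBuilder();
        for (String str : strs) {
            sb.append(frame(str));
        }
        System.out.println(sb);                            // prints 2/ab3/c/d0/
        System.out.println(readFrame(sb.toString(), 4));   // prints c/d
    }
}
```

The second frame starts at index 4, right after the four characters of "2/ab". The decoder only searches for '/' at the start of a frame, where the length digits are, so the slash inside "c/d" is consumed as payload and never interpreted as a separator.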
import java.util.ArrayList;
import java.util.List;

public class Codec {
    // Encodes a list of strings to a single string.
    public String encode(List<String> strs) {
        StringBuilder sb = new StringBuilder();
        for (String str : strs) {
            // Each string is written as <length>/<string>.
            sb.append(str.length()).append('/').append(str);
        }
        return sb.toString();
    }

    // Decodes a single string to a list of strings.
    public List<String> decode(String s) {
        List<String> result = new ArrayList<String>();
        int length = s.length();
        int i = 0;
        while (i < length) {
            // The first '/' at or after i ends the length prefix.
            int slash = s.indexOf('/', i);
            int size = Integer.parseInt(s.substring(i, slash));
            result.add(s.substring(slash + 1, slash + size + 1));
            // Jump to the start of the next length prefix.
            i = slash + size + 1;
        }
        return result;
    }
}