Given a string representing a code snippet, you need to implement a tag validator to parse the code and return whether it is valid. A code snippet is valid if all the following rules hold:
- The code must be wrapped in a valid closed tag. Otherwise, the code is invalid.
- A closed tag (not necessarily valid) has exactly the following format: <TAG_NAME>TAG_CONTENT</TAG_NAME>. Here, <TAG_NAME> is the start tag and </TAG_NAME> is the end tag. The TAG_NAME in the start and end tags must be the same. A closed tag is valid if and only if its TAG_NAME and TAG_CONTENT are both valid.
- A valid TAG_NAME contains only upper-case letters and has a length in the range [1, 9]. Otherwise, the TAG_NAME is invalid.
- A valid TAG_CONTENT may contain other valid closed tags, cdata, and any characters (see the Note below) EXCEPT an unmatched <, unmatched start and end tags, and unmatched or closed tags with an invalid TAG_NAME. Otherwise, the TAG_CONTENT is invalid.
- A start tag is unmatched if no end tag with the same TAG_NAME exists, and vice versa. You also need to handle tags that are unbalanced when nested.
- A < is unmatched if you cannot find a subsequent >. When you find a < or </, all subsequent characters up to the next > should be parsed as the TAG_NAME (not necessarily valid).
- The cdata has the following format: <![CDATA[CDATA_CONTENT]]>. CDATA_CONTENT is defined as the characters between <![CDATA[ and the first subsequent ]]>.
- CDATA_CONTENT may contain any characters. The purpose of cdata is to stop the validator from parsing CDATA_CONTENT, so even if it contains characters that could be parsed as a tag (valid or invalid), you should treat them as regular characters.
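The TAG_NAME rule and the cdata-range rule are purely lexical checks that can be tested in isolation. A minimal Swift sketch of both, assuming the code has already been converted to a `[Character]` array; `isValidTagName` and `cdataEnd` are hypothetical helper names, not part of the problem's interface:

```swift
// Hypothetical helper: a valid TAG_NAME is 1 to 9 upper-case letters A-Z.
func isValidTagName(_ name: [Character]) -> Bool {
    guard (1...9).contains(name.count) else { return false }
    return name.allSatisfy { $0 >= "A" && $0 <= "Z" }
}

// Hypothetical helper: given the index just after "<![CDATA[", return the
// index just past the FIRST subsequent "]]>", or nil if there is none.
func cdataEnd(_ s: [Character], from start: Int) -> Int? {
    let close: [Character] = ["]", "]", ">"]
    var i = start
    while i + close.count <= s.count {
        if Array(s[i..<i + close.count]) == close {
            return i + close.count
        }
        i += 1
    }
    return nil
}
```

For `"<![CDATA[<div>]>]]>]]>"`, calling `cdataEnd(_:from:)` with index 9 (just past `<![CDATA[`) returns 19, so CDATA_CONTENT is `"<div>]>"`: the first `]]>` wins, as the rule requires.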
Valid Code Examples:
Input: "<DIV>This is the first line <![CDATA[<div>]]></DIV>"
Output: True
Explanation:
The code is wrapped in a closed tag: <DIV> and </DIV>.
The TAG_NAME is valid, and the TAG_CONTENT consists of some characters and cdata.
Although CDATA_CONTENT contains an unmatched start tag with an invalid TAG_NAME, it should be treated as plain text, not parsed as a tag.
So TAG_CONTENT is valid, and therefore the code is valid. Thus return true.
Input: "<DIV>>> ![cdata[]] <![CDATA[<div>]>]]>]]>>]</DIV>"
Output: True
Explanation:
We first separate the code into: start_tag|tag_content|end_tag.
start_tag -> "<DIV>"
end_tag -> "</DIV>"
tag_content can also be separated into: text1|cdata|text2.
text1 -> ">> ![cdata[]] "
cdata -> "<![CDATA[<div>]>]]>", where the CDATA_CONTENT is "<div>]>"
text2 -> "]]>>]"
start_tag is NOT "<DIV>>>" because of rule 6: a TAG_NAME runs only up to the next >. cdata is NOT "<![CDATA[<div>]>]]>]]>" because of rule 7: CDATA_CONTENT ends at the first subsequent ]]>.
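The segmentation above follows directly from the two rules it cites: a tag name stops at the next `>`, and cdata stops at the first `]]>`. A hypothetical Swift tokenizer sketch that reproduces it; `Token` and `tokenize` are illustrative names, and the sketch deliberately does not validate tag names or nesting, it only shows how the string is carved up:

```swift
// Illustrative token kinds (hypothetical type, not part of the problem).
enum Token: Equatable {
    case startTag(String)   // "<NAME>"  -> NAME
    case endTag(String)     // "</NAME>" -> NAME
    case cdata(String)      // "<![CDATA[...]]>" -> content
    case text(String)       // any other run of characters
}

// Splits code into tokens; returns nil on an unmatched "<" or unterminated cdata.
func tokenize(_ code: String) -> [Token]? {
    let s = Array(code)
    var tokens: [Token] = []
    var text = ""
    var i = 0

    // Index of the first occurrence of `pattern` at or after `from`, if any.
    func find(_ pattern: String, from: Int) -> Int? {
        let p = Array(pattern)
        var k = from
        while k + p.count <= s.count {
            if Array(s[k..<k + p.count]) == p { return k }
            k += 1
        }
        return nil
    }
    // Emits the pending run of plain text, if any.
    func flushText() {
        if !text.isEmpty { tokens.append(.text(text)); text = "" }
    }

    while i < s.count {
        if s[i] != "<" { text.append(s[i]); i += 1; continue }
        flushText()
        if s.count - i >= 9 && String(s[i..<i + 9]) == "<![CDATA[" {
            // cdata content ends at the FIRST subsequent "]]>".
            guard let end = find("]]>", from: i + 9) else { return nil }
            tokens.append(.cdata(String(s[(i + 9)..<end])))
            i = end + 3
        } else {
            // Everything up to the NEXT ">" is the tag name, valid or not.
            let isEnd = i + 1 < s.count && s[i + 1] == "/"
            let nameStart = i + (isEnd ? 2 : 1)
            guard let close = find(">", from: nameStart) else { return nil }
            let name = String(s[nameStart..<close])
            tokens.append(isEnd ? .endTag(name) : .startTag(name))
            i = close + 1
        }
    }
    flushText()
    return tokens
}
```

On the example input this yields a start tag `DIV`, the text `">> ![cdata[]] "`, cdata `"<div>]>"`, the text `"]]>>]"`, and an end tag `DIV`, matching the breakdown above. On `"<DIV> unmatched < </DIV>"` it yields a start tag named `" </DIV"`, which is exactly how the "parse up to the next `>`" rule turns a stray `<` into an invalid TAG_NAME.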
Invalid Code Examples:
Input: "<A> <B> </A> </B>"
Output: False
Explanation: Unbalanced. If "<A>" is closed, then "<B>" must be unmatched, and vice versa.
Input: "<DIV> div tag is not closed <DIV>"
Output: False
Input: "<DIV> unmatched < </DIV>"
Output: False
Input: "<DIV> closed tags with invalid tag name <b>123</b> </DIV>"
Output: False
Input: "<DIV> unmatched tags with invalid tag name </1234567890> and <CDATA[[]]> </DIV>"
Output: False
Input: "<DIV> unmatched start tag <B> and unmatched end tag </C> </DIV>"
Output: False
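The first invalid example is why counting opens and closes per tag name is not enough: `<A>` and `<B>` are each opened and closed exactly once, yet they interleave. A stack catches this because every end tag must match the most recently opened tag. A minimal sketch over an already-extracted list of tags; the `(name, isEnd)` tuple representation and the name `tagsAreBalanced` are illustrative assumptions:

```swift
// Returns true if start/end tags nest properly (last opened, first closed).
// Each element is (TAG_NAME, isEndTag); this representation is illustrative only.
func tagsAreBalanced(_ tags: [(name: String, isEnd: Bool)]) -> Bool {
    var stack: [String] = []
    for tag in tags {
        if tag.isEnd {
            // An end tag must close the most recently opened tag.
            guard stack.last == tag.name else { return false }
            stack.removeLast()
        } else {
            stack.append(tag.name)
        }
    }
    return stack.isEmpty
}
```

Fed the tags of `"<A> <B> </A> </B>"`, it fails at `</A>` because the top of the stack is `B`.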
Note:
- For simplicity, you may assume the input code (including any characters mentioned above) contains only letters, digits, '<', '>', '/', '!', '[', ']' and ' '.
Runtime: 20 ms Memory Usage: 19.8 MB
```swift
import Foundation

class Solution {
    func isValid(_ code: String) -> Bool {
        // Stack of TAG_NAMEs that are currently open
        var st: [String] = []
        let n = code.count
        var i = 0
        while i < n {
            // Past position 0 the stack must be non-empty: the whole code has to sit inside one closed tag
            if i > 0 && st.isEmpty {
                return false
            }
            if code.subString(i, 9) == "<![CDATA[" {
                // cdata is only allowed inside a tag; without this check "<![CDATA[x]]>" would be accepted
                if st.isEmpty { return false }
                // Jump to the first subsequent "]]>"
                i = code.find("]]>", i + 9)
                if i < 0 { return false }
                i += 2  // now on the final ">"; the i += 1 below moves past it
            } else if code.subString(i, 2) == "</" {
                // End tag: the name runs up to the next ">" and must close the innermost open tag
                let j = i + 2
                i = code.find(">", j)
                if i < 0 { return false }
                let tag = code.subString(j, i - j)
                if st.last != tag {
                    return false
                }
                st.removeLast()
            } else if code.subString(i, 1) == "<" {
                // Start tag: the name runs up to the next ">" and must be 1-9 upper-case letters
                let j = i + 1
                i = code.find(">", j)
                if i < 0 || i == j || i - j > 9 {
                    return false
                }
                for k in j..<i {
                    if code[k] < "A" || code[k] > "Z" {
                        return false
                    }
                }
                st.append(code.subString(j, i - j))
            }
            i += 1
        }
        return st.isEmpty
    }
}

// String extension
extension String {
    // Returns the character at integer offset i (O(i): String has no random access)
    subscript(_ i: Int) -> Character {
        return self[index(startIndex, offsetBy: i)]
    }

    // Substring by start offset and character count
    // - begin: offset to start at
    // - count: number of characters to take (clamped to the end of the string)
    func subString(_ begin: Int, _ count: Int) -> String {
        let lo = Swift.min(Swift.max(0, begin), self.count)
        let hi = Swift.max(lo, Swift.min(self.count, begin + count))
        let start = index(startIndex, offsetBy: lo)
        let end = index(startIndex, offsetBy: hi)
        return String(self[start..<end])
    }

    // Substring from offset `index` to the end
    // - Parameter index: start offset
    // - Returns: the suffix
    func subStringFrom(_ index: Int) -> String {
        let theIndex = self.index(endIndex, offsetBy: index - count)
        return String(self[theIndex..<endIndex])
    }

    // Searches from offset 0; returns the offset of the first occurrence of `sub`, or -1
    func find(_ sub: String) -> Int {
        guard let r = self.range(of: sub, options: .literal), !r.isEmpty else {
            return -1
        }
        return distance(from: startIndex, to: r.lowerBound)
    }

    // Searches from offset `begin`; returns the offset of the first occurrence of `sub`, or -1
    func find(_ sub: String, _ begin: Int) -> Int {
        let pos = subStringFrom(begin).find(sub)
        return pos == -1 ? -1 : pos + begin
    }
}
```
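The `String` extension above converts an integer offset into a `String.Index` with `index(_:offsetBy:)` on every call. That is O(n) on `String`, so each helper call costs time proportional to the offset and `isValid` ends up roughly quadratic in the input length. A sketch of the same stack algorithm over a `[Character]` array, where each lookup is O(1); `TagValidator`, `matches(_:at:)` and `find(_:from:)` are hypothetical names, and this is an alternative rather than the submitted solution:

```swift
// Stack-based tag validator over a [Character] array (O(1) indexing).
struct TagValidator {
    private let s: [Character]

    init(_ code: String) { s = Array(code) }

    // True if `pattern` occurs in s starting at index i.
    private func matches(_ pattern: String, at i: Int) -> Bool {
        let p = Array(pattern)
        return i + p.count <= s.count && Array(s[i..<i + p.count]) == p
    }

    // Index of the first `pattern` at or after `from`, or nil.
    private func find(_ pattern: String, from: Int) -> Int? {
        var k = from
        while k < s.count {
            if matches(pattern, at: k) { return k }
            k += 1
        }
        return nil
    }

    func isValid() -> Bool {
        var stack: [String] = []
        var i = 0
        while i < s.count {
            // Everything must sit inside one outer closed tag.
            if i > 0 && stack.isEmpty { return false }
            if matches("<![CDATA[", at: i) {
                // cdata is only legal inside a tag; skip to the first "]]>".
                guard !stack.isEmpty, let end = find("]]>", from: i + 9) else { return false }
                i = end + 3
            } else if matches("</", at: i) {
                // An end tag must close the innermost open tag.
                guard let close = find(">", from: i + 2) else { return false }
                let name = String(s[(i + 2)..<close])
                guard stack.last == name else { return false }
                stack.removeLast()
                i = close + 1
            } else if s[i] == "<" {
                // Start tag: the name up to the next ">" must be 1-9 upper-case letters.
                guard let close = find(">", from: i + 1) else { return false }
                let name = s[(i + 1)..<close]
                guard (1...9).contains(name.count),
                      name.allSatisfy({ $0 >= "A" && $0 <= "Z" }) else { return false }
                stack.append(String(name))
                i = close + 1
            } else {
                i += 1
            }
        }
        return stack.isEmpty
    }
}
```

For example, `TagValidator("<DIV>This is the first line <![CDATA[<div>]]></DIV>").isValid()` evaluates to `true`, and `TagValidator("<A> <B> </A> </B>").isValid()` to `false`. Each `find` scan starts where the previous token ended and `i` jumps past what it found, so the main loop touches each character a bounded number of times.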