Counting the sentences in a string with JavaScript

There are already several similar questions:

> Splitting textarea sentences into array and finding out which sentence changed on keyup()
> JS RegEx to split text into sentences
> Javascript RegExp for splitting text into sentences and keeping the delimiter
> Split string into sentences in javascript

My situation is a bit different.

I need to count the number of sentences in a string.

The closest answer to what I need is:

str.replace(/([.?!])\s*(?=[A-Z])/g, "$1|").split("|")

The only problem is that this RegEx assumes every sentence starts with an uppercase letter, which is not always the case.
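To make the limitation concrete, here is a small sketch that wraps the quoted expression in a helper. The name `splitOnUppercase` and the sample string are made up for illustration.

```javascript
// Split only where end punctuation is followed by an uppercase letter,
// exactly as the quoted answer does.
function splitOnUppercase(str) {
  return str.replace(/([.?!])\s*(?=[A-Z])/g, "$1|").split("|");
}

// Four sentences, two of which start with a lowercase letter.
const mixedCase = "It works. but not here. Does it? yes.";

// Only the boundary before "Does" is detected, so two pieces come back.
console.log(splitOnUppercase(mixedCase));
// [ 'It works. but not here.', 'Does it? yes.' ]
```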

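More specifically, I define a sentence as something that:

> Starts with a letter (uppercase or not), a digit, or a symbol (such as $ or €).
> Ends with a punctuation mark such as ".", "?" or "!".

However, if a sentence contains a number that itself contains a "." or a ",", it should still be counted as one sentence, not two.

Last but not least, we can assume that every sentence except the first is preceded by a space.

Given a random string, how can I count the number of sentences it contains in JavaScript (or CoffeeScript)?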

Solution:

One regular expression that solves your problem is:

\w[.?!](\s|$)

The parts are as follows:

\w - Word character
[.?!] - Punctuation as specified.
(\s|$) - Whitespace character OR the end of the string.
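As a minimal sketch of how this expression could be used for counting, the helper below collects all matches with the global flag and returns how many there are. The name `countSentences` and the sample string are assumptions for illustration, not part of the original answer.

```javascript
// Count sentence endings: a word character, then ".", "?" or "!",
// then whitespace or the end of the string.
function countSentences(str) {
  const matches = str.match(/\w[.?!](\s|$)/g);
  // String.prototype.match returns null when nothing matches.
  return matches ? matches.length : 0;
}

// The "." inside "3.50" is followed by a digit, not whitespace,
// so it is not counted as a sentence ending.
console.log(countSentences("it costs $3.50, right? yes. Fine!")); // 3
console.log(countSentences("")); // 0
```

Because the expression requires a word character directly before the punctuation, a sentence that ends in something like `€.` or `).` would not be counted by this sketch.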

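It is tempting to use a character class instead of a group for the last element:

[\s|$]

but this did not work when tried on https://regex101.com/. Inside a character class, | and $ are literal characters rather than alternation and the end-of-string anchor, so a sentence that ends the string is not matched.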
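A quick way to see the difference between the two forms is to test both against short strings. The variable names and sample strings below are made up for illustration.

```javascript
// Group form: whitespace OR the end of the string.
const withGroup = /\w[.?!](\s|$)/;
// Character-class form: whitespace, a literal "|" or a literal "$".
const withCharClass = /\w[.?!][\s|$]/;

console.log(withGroup.test("The end."));       // true  ($ anchors at the end)
console.log(withCharClass.test("The end."));   // false (a character is required after ".")
console.log(withCharClass.test("The end.$"));  // true  (literal dollar sign)
console.log(withCharClass.test("The end.|"));  // true  (literal pipe)
console.log(withCharClass.test("One. Two"));   // true  (whitespace still works)
```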

I tested it against the following text:

Contrary to popular belief, Lorem Ipsum is not simply random text. It
has roots in a piece of classical Latin literature from 45 BC, making
it over 2000 years old. Richard McClintock, a Latin professor at
Hampden-Sydney College in Virginia, looked up one of the more obscure
Latin words, consectetur, from a Lorem Ipsum passage, and going
through the cites of the word in classical literature, discovered the
undoubtable source. Lorem Ipsum comes from sections 1.10.32 and
1.10.33 of “de Finibus Bonorum et Malorum” (The Extremes of Good and Evil) by Cicero, written in 45 BC. This book is a treatise on the
theory of ethics, very popular during the Renaissance. The first line
of Lorem Ipsum, “Lorem ipsum dolor sit amet..”, comes from a line in
section 1.10.32.

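It finds six sentences (in the original post the sentence endings were bolded, not the actual matches). Note that the capturing group at the end could cause problems if you rely on a different grouping for any reason.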
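The count can be reproduced with a short script. This is only a sketch: the names `passage` and `sentenceEndings` are assumptions, and the test text is joined with single spaces where the original has line breaks, which makes no difference to `\s`.

```javascript
// The test passage from above, joined into one string.
const passage = [
  "Contrary to popular belief, Lorem Ipsum is not simply random text. It",
  "has roots in a piece of classical Latin literature from 45 BC, making",
  "it over 2000 years old. Richard McClintock, a Latin professor at",
  "Hampden-Sydney College in Virginia, looked up one of the more obscure",
  "Latin words, consectetur, from a Lorem Ipsum passage, and going",
  "through the cites of the word in classical literature, discovered the",
  "undoubtable source. Lorem Ipsum comes from sections 1.10.32 and",
  "1.10.33 of “de Finibus Bonorum et Malorum” (The Extremes of Good and Evil) by Cicero, written in 45 BC. This book is a treatise on the",
  "theory of ethics, very popular during the Renaissance. The first line",
  "of Lorem Ipsum, “Lorem ipsum dolor sit amet..”, comes from a line in",
  "section 1.10.32."
].join(" ");

// Every match is one sentence ending; section numbers such as 1.10.32
// are skipped unless the final "." is followed by whitespace or the end.
const sentenceEndings = passage.match(/\w[.?!](\s|$)/g) || [];

console.log(sentenceEndings.length); // 6
```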
