Originally posted by tangqingfu on 2010-8-21 10:23
There seem to be some problems.
For example:
I'm OK. I like English. I’d like English. English is my favorite. That's Ok. That is right. Those are buses. These are bikes
Following the steps sylun described in post #8, I got this result:
Found 1 ...
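This issue was actually touched on earlier in the thread. The pattern does match one-letter strings, but it excludes an apostrophe followed by s (I think you can see why such combinations are removed: this way That's is counted as That). As for words like I’d, the apostrophe should be the standard character with code 39 ('), whereas the one you used is probably a non-standard one: the two differ in shape, straight versus curly. The curly one is the right single quotation mark, code 8217.
The code below handles both one-letter words and "words" containing the curly apostrophe. Give it a try, and feel free to adapt it. As for other special cases in deciding what counts as a word, I'd rather not go into them any further. Sorry.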
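To see the apostrophe point outside Word, here is a small Python sketch (my own illustration, not part of the original VBA) that applies the same pattern with Python's re module. The helper name tokens and the sample sentence are hypothetical. One deliberate difference: the group is written as non-capturing (?:...), because re.findall would otherwise return only the group's text instead of the whole match that Match.Value gives in VBScript.

```python
import re

# Straight ASCII apostrophe (code 39) vs. the curly right single quotation mark (U+2019, code 8217).
STRAIGHT = "'"
CURLY = "\u2019"

def tokens(text, apostrophes):
    """Return the raw regex matches, modeled on the VBA pattern.

    After the apostrophe only letters other than S/s are allowed,
    so a trailing 's is never glued onto the word.
    """
    pattern = "[A-Za-z]+(?:[" + apostrophes + "][A-RT-Za-rt-z]+|[A-Za-z]?)"
    return re.findall(pattern, text)

sample = "I'm OK. I\u2019d like English. That's Ok."
print(ord(STRAIGHT), ord(CURLY))         # 39 8217
print(tokens(sample, STRAIGHT))          # the curly I’d splits into "I" and "d"
print(tokens(sample, STRAIGHT + CURLY))  # I’d stays a single token
```

With only the straight apostrophe in the character class, I’d falls apart into I and d, which is the kind of miscount reported in the quoted post; That's gives That plus a lone s in both cases.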
Sub test2()
    Dim a As String, Dic As Object
    Dim myReg As Object, Matches As Object, Match As Object
    Dim k, i As Long, j As Long, temp As String, c As String
    a = ActiveDocument.Content.Text
    ' Case-insensitive dictionary: word -> frequency ("OK" and "Ok" share one entry)
    Set Dic = CreateObject("Scripting.Dictionary")
    Dic.CompareMode = vbTextCompare
    Set myReg = CreateObject("VBScript.RegExp")
    With myReg
        ' Letters, optionally followed by a straight (') or curly (ChrW(8217)) apostrophe
        ' and letters other than S/s, so "That's" yields "That" (plus a lone "s")
        .Pattern = "[A-Za-z]+(['" & ChrW(8217) & "][A-RT-Za-rt-z]+|[A-Za-z]?)"
        .Global = True
        Set Matches = .Execute(a)
    End With
    For Each Match In Matches ' count frequencies
        With Match
            ' Drop one-letter matches, except I, A and a
            If Len(.Value) > 1 Or .Value Like "[AIa]" Then
                If Dic.Exists(.Value) Then Dic(.Value) = Dic(.Value) + 1 Else Dic.Add .Value, 1
            End If
        End With
    Next
    k = Dic.Keys ' get the distinct "words"
    For i = 0 To UBound(k) - 1 ' sort (simple exchange sort)
        For j = i + 1 To UBound(k)
            If k(i) > k(j) Then
                temp = k(i)
                k(i) = k(j)
                k(j) = temp
            End If
        Next
    Next
    For i = 0 To UBound(k) ' build the output: one word and its count per paragraph
        c = c & k(i) & vbTab & Dic(k(i)) & Chr(13)
    Next
    Documents.Add.Content.Text = "Found " & Dic.Count & " English words (with frequencies):" & Chr(13) & c
End Sub
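For anyone who wants to try the counting logic without Word, the Python sketch below roughly mirrors the macro: same pattern, one-letter hits other than A, I and a dropped, case-insensitive counting that keeps the first spelling seen, sorted keys, then the report text. It is an approximation under stated assumptions: the function names word_counts and report are hypothetical, str.lower() stands in for Scripting.Dictionary's vbTextCompare, and Python's built-in sorted() replaces the nested-loop exchange sort (both compare by character code, as k(i) > k(j) does under the default Option Compare Binary).

```python
import re

# Same token rule as the VBA pattern; \u2019 is the curly apostrophe, ChrW(8217) in VBA.
PATTERN = re.compile("[A-Za-z]+(?:['\u2019][A-RT-Za-rt-z]+|[A-Za-z]?)")

def word_counts(text):
    """Count words case-insensitively, keyed by the first spelling seen.

    Approximates a Scripting.Dictionary with CompareMode = vbTextCompare.
    """
    counts = {}    # first-seen spelling -> frequency
    spelling = {}  # lowercased word -> first-seen spelling
    for m in PATTERN.finditer(text):
        word = m.group(0)
        # Drop one-letter matches except A, I and a (e.g. the s left over from 's).
        if len(word) == 1 and word not in ("A", "I", "a"):
            continue
        key = word.lower()
        if key not in spelling:
            spelling[key] = word
            counts[word] = 0
        counts[spelling[key]] += 1
    return counts

def report(text):
    """Build the same kind of listing the macro writes into a new document."""
    counts = word_counts(text)
    lines = [f"{w}\t{counts[w]}" for w in sorted(counts)]  # code-point order
    return f"Found {len(counts)} English words (with frequencies):\n" + "\n".join(lines)

if __name__ == "__main__":
    sample = ("I'm OK. I like English. I\u2019d like English. English is my favorite. "
              "That's Ok. That is right. Those are buses. These are bikes")
    print(report(sample))
```

Because the dictionary keeps the first spelling, OK and Ok are reported once under OK, just as Dic.Exists with vbTextCompare leaves the original key in place.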
[Last edited by sylun on 2010-8-21 12:23]