|
[广告] VBA代码宝 - VBA编程加强工具 · VBA代码随查随用 · 内置多项VBA编程加强工具 ★ 免费下载 ★ ★ 使用手册★
- Sub Test()
- Debug.Print CosineSimilarity("http://excelhome", "http://club.excelhome") '相似度为0.9320546490018
- Debug.Print CosineSimilarity("Excel VBA程序开发", "VBA程序") '相似度为0.645497224367903
-
- '余弦相似度 以 词 为单位, 对距离不敏感,所以下面这两个例子的相似度为 100%
- Debug.Print CosineSimilarity("excelhome", "homeexcel")
- Debug.Print CosineSimilarity("生活好美", "美好生活")
-
- End Sub
- '余弦相似度算法 VBA 实现
- Function CosineSimilarity(strA As String, strB As String) As Double
- Dim objDic_All As Object, objDic_1 As Object, objDic_2 As Object
- Dim lngID As Long, strKey As String
- Dim arrKey As Variant, arrResult As Variant
- Dim dblSum As Double, dblVal_A As Double, dblVal_B As Double
-
- If strA = "" Or strB = "" Then
- CosineSimilarity = 0
- Exit Function
- End If
-
- '==========================================================
- 'VBA下没有合适的 分词API
- '只能分字了,太囧了
- '
- Set objDic_All = CreateObject("Scripting.Dictionary")
- Set objDic_1 = CreateObject("Scripting.Dictionary")
- Set objDic_2 = CreateObject("Scripting.Dictionary")
-
- For lngID = 1 To Len(strA)
- strKey = Trim(Mid(strA, lngID, 1))
- If strKey <> "" Then
- objDic_All(strKey) = ""
- objDic_1(strKey) = Val(objDic_1(strKey)) + 1
- End If
- Next
- For lngID = 1 To Len(strB)
- strKey = Trim(Mid(strB, lngID, 1))
- If strKey <> "" Then
- objDic_All(strKey) = ""
- objDic_2(strKey) = Val(objDic_2(strKey)) + 1
- End If
- Next
- '上面的这部分代码,如果有合适的分词工具,会更加精准
- '==========================================================
-
- arrKey = objDic_All.keys
- ReDim arrResult(LBound(arrKey) To UBound(arrKey), 1 To 3)
-
- For lngID = LBound(arrKey) To UBound(arrKey)
- arrResult(lngID, 1) = arrKey(lngID)
- arrResult(lngID, 2) = objDic_1(arrKey(lngID)) + 0
- arrResult(lngID, 3) = objDic_2(arrKey(lngID)) + 0
- Next
-
- Set objDic_All = Nothing
- Set objDic_1 = Nothing
- Set objDic_2 = Nothing
-
- For lngID = LBound(arrResult) To UBound(arrResult)
- dblSum = dblSum + arrResult(lngID, 2) * arrResult(lngID, 3)
- dblVal_A = dblVal_A + arrResult(lngID, 2) ^ 2
- dblVal_B = dblVal_B + arrResult(lngID, 3) ^ 2
- Next
-
- CosineSimilarity = dblSum / (Sqr(dblVal_A) * Sqr(dblVal_B))
- End Function
复制代码
|
评分
-
5
查看全部评分
-
|