This post was last edited by ypddd on 2018-9-29 11:03
The search conditions are fairly complex, so I am hoping an expert can help. Thanks!
-------------- newbie's divider --------------------------------------------
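I have an English dictionary in Word format. Adjacent words are separated by a (half-width) space.
There are two goals:
1) Find the headword (one or more words) at the start of each paragraph and underline it in the original Word document. (This step is needed because the search conditions below will inevitably miss some cases, so I need one round of manual correction in Word.) I would like this to be a standalone macro.
2) Extract the underlined headwords to Excel, together with the Word page number each one appears on (the document page, not the page number printed in the header). That gives two columns of data. I would like this to be a second standalone macro.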
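Search conditions
1. Deciding where marking starts. Start from the first word of each paragraph; the lines containing these words are all flush left, with no indentation. The first paragraph on any page is not processed if it is indented to the right, whether or not a line break precedes it.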
2. A note: if you encounter brackets ( ) or [ ] (half-width round or square brackets), always ignore the content inside them and keep looking for the end marker.
Example: lag [rare] Another term for perseveration.
Another example: lenition (also weakening) Any phonological change in which a segment becomes less consonant-like than previously. A shift in character from left to right along any of the scales in Table 5 may be regarded as a lenition; a lenition all the way to zero is loss (sense 1) or deletion.
3. Deciding where marking ends. There are five cases (earlier cases take priority).
a) If the paragraph begins with an opening single quote ‘, the closing single quote ’ is the end marker (the headword includes the closing quote).
Example: ‘Yesterday’s syntax is today’s morphology’ A slogan adopted by those who embrace (at least part of) the linguistic cycle hypothesis. It expresses the view that most morphological markers derive from the reduction of syntactic structures. This view was advanced by Givon (1971), but it has been criticized - for example, by Comrie (1980b).
b) Within a paragraph, if one of the following forms occurs before the first (half-width) full stop:
, Law of + a capitalized word
, Law of the + a capitalized word
, Principle of + a capitalized word
Examples:
Waterloo, Law of A famous analogy proposed by Edgar Sturtevant, who
Rising Sonority, Law of Another term for the (Law of) Open Syllables.
Vasiljev and Dolobko, Law of Another term for Dolobko’s Law.
Salience, Principle of A putative principle, put forward by Lemle and Naro (1977), governing the effect of phonological change on inflectional morphology.
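c) Within a paragraph, if the first word begins with a lowercase letter and the first capitalized word after it consists entirely of capital letters, then the second capitalized word is the boundary; take the characters before it (without the trailing space).
Example: accent in PIE The nature and placement of the word-accent in PIE. This issue has long been debated, and no resolution is currently in sight.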
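d) Within a paragraph, if the first word begins with a lowercase letter, the first capitalized word is the boundary; take the words before it (without the trailing space). Examples (the underlined part is the text to extract, here and below; the source text itself has no underlining):
abduction A type of reasoning in which we observe a result, invoke a general law which could
aberrant formation A new lexical item constructed in an anomalous manner
stronger, law of the Another term for Grammont’s Law.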
e) If the first several words of the paragraph are all capitalized, the last of these words is the boundary (ignoring bracketed content, including square brackets [ ]); take the characters before it.
Example: Abkhaz-Adyge (also Northwest Caucasian) A family of five languages spoken in and near the Caucasus to the east of the Black Sea.
Another example: Abnutzung (Ger ‘abrasion’) The phonological reduction of grammatical morphemes of high frequency, which is typically greater than the degree
4. If a paragraph fits none of the cases above, take its first word and mark it red in Word.
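To make the priority order of the rules concrete, here is a rough sketch of rules 2 to 4 applied to one paragraph of plain text. It is written in Python rather than as the requested Word macro, so it does not touch indentation (rule 1), underlining, or page numbers. The names find_headword, is_abbreviation, and the rule labels are made up for illustration. It also rests on assumptions the post does not state: brackets are not nested and are left out of the returned headword, a closing quote counts only when followed by a space or the end of the paragraph (so apostrophes inside words are skipped), and a one-letter word such as "A" is not treated as an all-caps word in rule c.

```python
import re

# Rule 2: one non-nested (...) or [...] group, plus the space before it.
BRACKETED = re.compile(r"\s*(?:\([^()]*\)|\[[^\[\]]*\])")
# Rule b: ", Law of", ", Law of the" or ", Principle of" + capitalized word.
LAW_FORM = re.compile(r"^(.*?,\s*(?:Law of the|Law of|Principle of))\s+[A-Z]")
# Rule a: a closing quote followed by whitespace or the end of the text
# (assumption: this is how it differs from an apostrophe inside a word).
CLOSING_QUOTE = re.compile("\u2019(?=\\s|$)")


def is_abbreviation(word):
    """True for all-caps words of two or more letters, e.g. 'PIE'.

    Assumption: a single capital such as 'A' is an ordinary word.
    """
    core = word.strip(",.;:")
    return len(core) > 1 and core.isupper()


def find_headword(paragraph):
    """Return (headword, rule) where rule is 'a'..'e', 'fallback' or 'empty'."""
    text = paragraph.strip()

    # Rule a: opening single quote ... closing single quote (inclusive).
    if text.startswith("\u2018"):
        match = CLOSING_QUOTE.search(text)
        if match:
            return text[:match.end()], "a"

    # Rule 2: drop bracketed content before looking for the end marker.
    text = BRACKETED.sub("", text)
    words = text.split()
    if not words:
        return "", "empty"

    # Rule b: only look at the text before the first full stop.
    match = LAW_FORM.match(text.split(".", 1)[0])
    if match:
        return match.group(1), "b"

    if words[0][0].islower():
        # Rules c and d: the boundary is a capitalized word.
        caps = [i for i, w in enumerate(words) if w[0].isupper()]
        if caps:
            rule, boundary = "d", caps[0]
            if is_abbreviation(words[caps[0]]) and len(caps) > 1:
                rule, boundary = "c", caps[1]
            return " ".join(words[:boundary]), rule
    else:
        # Rule e: count the leading capitalized words; the last one is the boundary.
        run = 0
        while run < len(words) and words[run][0].isupper():
            run += 1
        if run >= 2:
            return " ".join(words[:run - 1]), "e"

    # Rule 4: nothing matched, so take the first word (to be marked red).
    return words[0], "fallback"
```

With these assumptions, find_headword("accent in PIE The nature and placement") gives ("accent in PIE", "c") and find_headword("lag [rare] Another term for perseveration.") gives ("lag", "d"). A real macro would still need to map the returned text back to a range in the document, and to decide whether skipped brackets belong inside the underlined span.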
Attachment
提取段首的字符串(词典词头).zip [Extract paragraph-initial strings (dictionary headwords)]
(995.65 KB)
Many thanks!
Additional note (2018-10-4 23:07):
Request withdrawn.
Thanks to everyone for your attention!