Util
Class StopWordsChecker
java.lang.Object
Util.StopWordsChecker
public class StopWordsChecker
- extends java.lang.Object
Loads word-list files as needed and provides static methods
to check whether a word or phrase is a verb, a stopword, useless, or a name.
- Stopwords.txt contains stopwords, one word per line. Words are not stemmed when checking for stopwords, and matching is case-insensitive.
- Verbs.txt contains multiple words per line, with the intention that different conjugations of the same verb share a line. As a result, these words are not stemmed either. Matching is case-insensitive.
- names.txt contains names of journals, scientists, and possibly phrases that occur too regularly in scientific publications. Matching is case-insensitive. A phrase is considered a name if it is contained in any of these lines. For example, if names.txt contains only "Childrens Hospital Los Angeles", both "Los Angeles" and "Childrens Hospital" would be considered names and discarded during phrase extraction (this file is not used during SVM). Note that the apostrophe is omitted on purpose. In general, entries should consist of letters only, since the parsing process discards punctuation.
- Useless.txt contains words that should be discarded when finding phrases for general web searches. These words are fixed.
- Author:
- davidc
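The containment rule described for names.txt can be illustrated with a minimal sketch. This is not the actual StopWordsChecker source: the class name NameMatchSketch, the normalize helper, and the two-argument isName signature are hypothetical, and the entries are passed in as a list instead of being read from names.txt. It assumes plain substring containment after lowercasing and dropping punctuation; the real class may match on word boundaries instead.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of the names.txt rule; not the real implementation.
class NameMatchSketch {

    // Lowercase the text, drop everything except ASCII letters and
    // whitespace, and collapse runs of whitespace to a single space.
    static String normalize(String text) {
        return text.toLowerCase(Locale.ROOT)
                   .replaceAll("[^a-z\\s]", "")
                   .trim()
                   .replaceAll("\\s+", " ");
    }

    // A phrase counts as a name if any normalized entry contains it.
    // Assumption: plain substring containment, so partial words also match.
    static boolean isName(String phrase, List<String> nameEntries) {
        String needle = normalize(phrase);
        if (needle.isEmpty()) {
            return false;
        }
        for (String entry : nameEntries) {
            if (normalize(entry).contains(needle)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Childrens Hospital Los Angeles");
        System.out.println(isName("Los Angeles", names));         // true
        System.out.println(isName("Children's Hospital", names)); // true
        System.out.println(isName("Boston", names));              // false
    }
}
```

Because punctuation is stripped before comparison, "Children's Hospital" matches the apostrophe-free entry, which is consistent with the advice to keep entries to letters only.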
Method Summary
static boolean isName(java.lang.String p)
static boolean isStopWord(java.lang.String s)
static boolean isUseless(java.lang.String fixed)
- Added 8/16 to filter out useless results from general web searches
static boolean isVerb(java.lang.String s)
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Constructor Detail
StopWordsChecker
public StopWordsChecker()
Method Detail
isName
public static boolean isName(java.lang.String p)
isStopWord
public static boolean isStopWord(java.lang.String s)
isUseless
public static boolean isUseless(java.lang.String fixed)
- Added 8/16 to filter out useless results from general web searches
- Parameters:
fixed -
- Returns:
isVerb
public static boolean isVerb(java.lang.String s)
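Since the class is described as loading its files as needed and Verbs.txt holds several conjugations per line, a lookup such as isVerb could plausibly be built along the following lines. This is only a sketch under stated assumptions: WordListSketch and readVerbLines are hypothetical names, the sample lines stand in for the contents of Verbs.txt, and whitespace is assumed to be the separator between words on a line (the documentation does not say which delimiter is used).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of an on-demand, case-insensitive verb lookup.
class WordListSketch {

    // Stays null until the first lookup, mirroring "loads files as needed".
    private static Set<String> verbs;

    // Stand-in for reading Verbs.txt; each line holds several conjugations.
    static List<String> readVerbLines() {
        return Arrays.asList("run runs ran running", "be is are was were been");
    }

    static synchronized boolean isVerb(String word) {
        if (verbs == null) {
            verbs = new HashSet<>();
            for (String line : readVerbLines()) {
                // Assumption: words on a line are separated by whitespace.
                for (String form : line.trim().split("\\s+")) {
                    if (!form.isEmpty()) {
                        verbs.add(form.toLowerCase(Locale.ROOT));
                    }
                }
            }
        }
        // No stemming: the word must appear in the file in this exact form.
        return verbs.contains(word.toLowerCase(Locale.ROOT));
    }
}
```

Flattening every conjugation into one set is what makes stemming unnecessary: "Ran" is found directly, while a form missing from the file (for example "walked") is simply not reported as a verb.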