java.lang.Object
  cluster.PhraseSupporter
public class PhraseSupporter
A collection of static methods for inspecting documents. Use PhraseFinder first to
identify the phrases that form the candidate clusters, then check
those clusters against the documents with the methods provided here.
Most clients will not need to call these methods directly and should instead call methods
from TreeHelper, which can create the hierarchy for you.
| Constructor Summary | |
|---|---|
| PhraseSupporter() | |
| Method Summary | | |
|---|---|---|
| static double | calculateRelevance(java.util.Set<ClusterDoc> docs, Phrase originalSet, Phrase combined) | The interpretation is that if P(B\|A) -> 1, then phrase B is seen every time phrase A occurs. |
| static double | calculateRelevanceRelaxed(java.util.Set<ClusterDoc> docs, Phrase originalSet, Phrase combined) | An experimental way of calculating how two clusters are related. |
| static boolean | checkSet(java.util.Set<ClusterDoc> docs, Phrase termSet, double cutoff) | |
| static java.util.List<Phrase> | checkSets(java.util.List<? extends ClusterDoc> docs, java.util.List<Phrase> candidates, int sufficientDocs) | Records in TestDoc the number of terms supported. Records in TermSet the documents that cover each term. |
| static double | findRelevanceRelaxed(ClusterDoc d, Phrase set, Phrase required, int slackNum) | An experiment in calculating the relationship between two phrases. Abandoned in favor of embedding alternative phrases within Phrase. |
| static int | getNumInstances(java.util.Set<ClusterDoc> docs, Phrase set) | |
| static int | getNumInstancesOfCombinedSet(ClusterDoc d, Phrase setA, Phrase setB) | |
| static int | getNumInstancesOfCombinedSet(java.util.Set<ClusterDoc> docs, Phrase thisI, Phrase thisJ) | |
| static int | getNumInstancesOfSet(ClusterDoc d, Phrase set) | Returns how many windows (ie. |
| static int | getNumInstancesOfSetRelaxed(ClusterDoc d, Phrase set, Phrase required, int slackNum) | The set counts as occurring in a sentence if at most slackNum words of the combined phrase are missing and none of the missing words are in required. |
| static int | getNumInstancesOfSetSingle(ClusterDoc d, Phrase set) | A faster implementation for when there is only one set. |
| static int | numDocsWithSet(java.util.Collection<ClusterDoc> docs, Phrase p) | |
| Methods inherited from class java.lang.Object |
|---|
| equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
public PhraseSupporter()
| Method Detail |
|---|
public static java.util.List<Phrase> checkSets(java.util.List<? extends ClusterDoc> docs,
java.util.List<Phrase> candidates,
int sufficientDocs)
throws java.io.IOException
Parameters:
candidates -
sufficientDocs - a term is kept if it appears in at least this many docs
Throws:
java.io.IOException
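The sufficientDocs threshold described above can be illustrated with a minimal sketch. It does not use the real ClusterDoc or Phrase classes: documents and candidate phrases are modeled as plain word sets, and the class name SupportFilterSketch and method keepSupported are hypothetical. It also assumes that a document "contains" a phrase when it contains every word of it, and it only filters; it does not record anything in the documents or phrases the way checkSets is described as doing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for checkSets: documents and phrases are plain word sets.
final class SupportFilterSketch {

    /** Keeps each candidate that is fully contained in at least sufficientDocs documents. */
    static List<Set<String>> keepSupported(List<Set<String>> docs,
                                           List<Set<String>> candidates,
                                           int sufficientDocs) {
        List<Set<String>> kept = new ArrayList<>();
        for (Set<String> candidate : candidates) {
            int hits = 0;
            for (Set<String> doc : docs) {
                // Assumption: a document supports a phrase when it holds every word of it.
                if (doc.containsAll(candidate)) {
                    hits++;
                }
            }
            if (hits >= sufficientDocs) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Set<String>> docs = List.of(
                Set.of("suffix", "tree"),
                Set.of("suffix", "array"),
                Set.of("tree"));
        List<Set<String>> candidates = List.of(
                Set.of("suffix"), Set.of("array"), Set.of("suffix", "tree"));
        // Only candidates found in at least two documents survive.
        System.out.println(keepSupported(docs, candidates, 2));
    }
}
```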
public static double calculateRelevance(java.util.Set<ClusterDoc> docs,
Phrase originalSet,
Phrase combined)
Parameters:
docs -
originalSet -
combined -
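The method summary interprets the result of calculateRelevance as a conditional probability P(B|A). One plausible way to estimate such a value is the ratio of documents containing the combined phrase to documents containing the original phrase. The sketch below shows that estimate only; it is an assumption about the formula, not the actual implementation. Documents and phrases are modeled as word sets, and the names RelevanceSketch, support, and relevance are hypothetical.

```java
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for calculateRelevance, using word sets instead of ClusterDoc and Phrase.
final class RelevanceSketch {

    /** Counts documents whose word set contains every word of the phrase. */
    static int support(List<Set<String>> docs, Set<String> phrase) {
        int n = 0;
        for (Set<String> doc : docs) {
            if (doc.containsAll(phrase)) {
                n++;
            }
        }
        return n;
    }

    /** Estimates P(B|A) as support(A and B) / support(A); returns 0 when A never occurs. */
    static double relevance(List<Set<String>> docs, Set<String> original, Set<String> combined) {
        int supportOriginal = support(docs, original);
        if (supportOriginal == 0) {
            return 0.0; // Assumption: undefined ratios are reported as no relevance.
        }
        return (double) support(docs, combined) / supportOriginal;
    }

    public static void main(String[] args) {
        List<Set<String>> docs = List.of(
                Set.of("neural", "network", "training"),
                Set.of("neural", "network"),
                Set.of("neural", "tissue"));
        // "network" accompanies "neural" in two of the three documents that contain "neural".
        System.out.println(relevance(docs, Set.of("neural"), Set.of("neural", "network")));
    }
}
```

A value close to 1 means the added words appear nearly every time the original phrase does, which matches the interpretation given in the summary.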
public static int getNumInstancesOfCombinedSet(java.util.Set<ClusterDoc> docs,
Phrase thisI,
Phrase thisJ)
public static int getNumInstances(java.util.Set<ClusterDoc> docs,
Phrase set)
Parameters:
docs -
set -
public static double calculateRelevanceRelaxed(java.util.Set<ClusterDoc> docs,
Phrase originalSet,
Phrase combined)
Parameters:
docs -
originalSet -
combined -
public static int numDocsWithSet(java.util.Collection<ClusterDoc> docs,
Phrase p)
Parameters:
docs -
p -
public static boolean checkSet(java.util.Set<ClusterDoc> docs,
Phrase termSet,
double cutoff)
Parameters:
docs -
termSet -
cutoff - a double between 0 and 1: the fraction of the size of docs needed to pass
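The cutoff parameter is described as a fraction of the size of docs. A minimal sketch of such a check follows, again with word sets standing in for ClusterDoc and Phrase. The names CutoffSketch and passesCutoff are hypothetical, and two details are assumptions: the comparison is inclusive, and an empty document collection never passes.

```java
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for checkSet, using word sets instead of ClusterDoc and Phrase.
final class CutoffSketch {

    /** Returns true when the phrase occurs in at least cutoff * docs.size() documents. */
    static boolean passesCutoff(List<Set<String>> docs, Set<String> phrase, double cutoff) {
        if (cutoff < 0.0 || cutoff > 1.0) {
            throw new IllegalArgumentException("cutoff must be between 0 and 1");
        }
        if (docs.isEmpty()) {
            return false; // Assumption: nothing passes when there are no documents.
        }
        long hits = docs.stream().filter(doc -> doc.containsAll(phrase)).count();
        // Assumption: the threshold is inclusive.
        return hits >= cutoff * docs.size();
    }

    public static void main(String[] args) {
        List<Set<String>> docs = List.of(
                Set.of("graph", "search"),
                Set.of("graph", "coloring"),
                Set.of("string", "search"));
        System.out.println(passesCutoff(docs, Set.of("graph"), 0.5));
        System.out.println(passesCutoff(docs, Set.of("graph"), 0.9));
    }
}
```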
public static int getNumInstancesOfSetSingle(ClusterDoc d,
Phrase set)
throws java.io.IOException
Parameters:
d -
set -
Throws:
java.io.IOException
public static int getNumInstancesOfSet(ClusterDoc d,
Phrase set)
throws java.io.IOException
Parameters:
d -
set -
Throws:
java.io.IOException
public static int getNumInstancesOfCombinedSet(ClusterDoc d,
Phrase setA,
Phrase setB)
throws java.io.IOException
Throws:
java.io.IOException
public static int getNumInstancesOfSetRelaxed(ClusterDoc d,
Phrase set,
Phrase required,
int slackNum)
throws java.io.IOException
Parameters:
d -
set -
required -
slackNum -
Throws:
java.io.IOException
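The summary for getNumInstancesOfSetRelaxed states that a set can occur in a sentence if at most slackNum of its words are missing and none of the missing words are in required. The sketch below is one reading of that rule, not the actual implementation. It assumes a document is already split into sentences, models each sentence and phrase as a word set, and treats set as the combined phrase; the names RelaxedMatchSketch and countRelaxed are hypothetical.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for getNumInstancesOfSetRelaxed, using word sets per sentence.
final class RelaxedMatchSketch {

    /** Counts sentences that miss at most slackNum phrase words, none of them required. */
    static int countRelaxed(List<Set<String>> sentences,
                            Set<String> phrase,
                            Set<String> required,
                            int slackNum) {
        int count = 0;
        for (Set<String> sentence : sentences) {
            // Words of the phrase that this sentence does not contain.
            Set<String> missing = new HashSet<>(phrase);
            missing.removeAll(sentence);
            if (missing.size() <= slackNum && Collections.disjoint(missing, required)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<Set<String>> sentences = List.of(
                Set.of("fast", "fourier", "transform"),
                Set.of("fourier", "transform"),
                Set.of("fast", "transform"),
                Set.of("transform"));
        Set<String> phrase = Set.of("fast", "fourier", "transform");
        // "fourier" may never be among the missing words; one other word may be.
        System.out.println(countRelaxed(sentences, phrase, Set.of("fourier"), 1));
    }
}
```

With slackNum set to 0 the rule reduces to exact containment, so the relaxed count can only grow as slackNum increases.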
public static double findRelevanceRelaxed(ClusterDoc d,
Phrase set,
Phrase required,
int slackNum)
throws java.io.IOException
Parameters:
d -
set -
required -
slackNum -
Throws:
java.io.IOException