Util
Class VectorManager

java.lang.Object
  extended by Util.VectorManager

public class VectorManager
extends java.lang.Object

Class that keeps track of how often each word occurs, both within a single document and globally across all documents. Used by the SVM code to build feature vectors from the words.
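As a rough illustration of the idea in the description above, the following sketch counts words per document and turns one document's counts into a sparse vector that maps feature index to count. It is a hypothetical example, not the actual VectorManager code. The class WordVectorSketch and its methods register and vectorFor are made-up names, words are assumed to be already normalized before registration, and any weighting the real SVM code applies is not shown.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not the real VectorManager: words are assumed to be
// already normalized (for example by fix) before they are registered.
class WordVectorSketch {
    // Vocabulary: word -> feature index, assigned in first-seen order.
    private final Map<String, Integer> vocab = new HashMap<>();
    // Per-document counts: document id -> (feature index -> count).
    private final Map<String, Map<Integer, Integer>> docCounts = new HashMap<>();

    void register(String word, String docId) {
        Integer idx = vocab.get(word);
        if (idx == null) {
            idx = vocab.size(); // next unused index
            vocab.put(word, idx);
        }
        docCounts.computeIfAbsent(docId, k -> new HashMap<>()).merge(idx, 1, Integer::sum);
    }

    // Sparse vector for one document: feature index -> count, sorted by index.
    TreeMap<Integer, Integer> vectorFor(String docId) {
        return new TreeMap<>(docCounts.getOrDefault(docId, new HashMap<>()));
    }
}
```

For example, registering "cell" twice and "divis" once for document d1 yields the vector {0=2, 1=1}.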

Author:
davidc

Constructor Summary
VectorManager()
           
 
Method Summary
 boolean containsWord(java.lang.String s)
           
 void deleteWbid(java.lang.String wbid)
          Deletes all references to this wbid's document, to save memory.
 void emptyCounts()
           
static java.lang.String fix(java.lang.String s)
          Normalizes a word: mainly keeps letters, stems it with the Porter stemming algorithm, and lower-cases it.
Also applies some special handling for hyphens.
 int getDocumentCount(int FeatureIdx)
           
 int getFeatureCount(java.lang.String wbid, int idx)
           
 int getFeatureLength()
           
 java.lang.String getFeatureString(int idx)
           
 int getIdxFromFeatureToUse(int k)
           
 int getMaxFeatureCount(java.lang.String wbid)
           
 int getTotalFeatureCount(int idx)
           
 int getVocabularySize()
           
 boolean isVocabularyLocked()
           
 boolean isWord(java.lang.String s)
           
 void registerLocally(int wordIndex, java.lang.String wbid)
           
 void registerWord(java.lang.String s, java.lang.String wbid)
          Adds this word to VectorManager's count of used words and to this document's count.
 void setVocabularyLocked()
           
 int wordIndex(java.lang.String origS)
          Returns the index for this word.
 int wordIndexWithoutFix(java.lang.String s)
           
 int wordIndexWithoutFix(java.lang.String s, java.lang.String origS)
           
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

VectorManager

public VectorManager()
Method Detail

fix

public static java.lang.String fix(java.lang.String s)
Normalizes a word: mainly keeps letters, stems it with the Porter stemming algorithm, and lower-cases it.
Also applies some special handling for hyphens.

Issues I thought of (7/13): "anterior-posterior" gets mapped to "anteriorposterior", which should be acceptable, since it is difficult to split words on hyphens.

Parameters:
s - the word to normalize
Returns:
the normalized word
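A minimal sketch of the letter-keeping and lower-casing part of fix, assuming that every non-letter character (including the hyphen) is simply dropped, which reproduces the anterior-posterior example above. The class FixSketch is hypothetical. Porter stemming is not implemented here: the stem method is only a placeholder that returns its input unchanged, and the real method's other hyphen tricks are not documented, so they are not reproduced.

```java
import java.util.Locale;

// Hypothetical sketch of the non-stemming part of fix(String).
// Assumption: every non-letter character, including '-', is dropped.
class FixSketch {
    static String normalize(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isLetter(c)) {
                sb.append(c); // keep letters only
            }
        }
        String lettersOnly = sb.toString().toLowerCase(Locale.ROOT);
        return stem(lettersOnly);
    }

    // Placeholder: the documented method applies the Porter stemmer here.
    // This sketch returns its input unchanged.
    static String stem(String word) {
        return word;
    }
}
```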

isWord

public boolean isWord(java.lang.String s)

wordIndex

public int wordIndex(java.lang.String origS)
Returns the index for this word.

Parameters:
origS - the word to look up
Returns:
an integer index that belongs only to this word
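One way such an index could be assigned, sketched under stated assumptions: wordIndex normalizes its argument and then delegates to wordIndexWithoutFix, which hands the next unused integer to a new word. How the real class treats an unseen word once setVocabularyLocked has been called is not documented; this hypothetical VocabSketch assumes it returns -1. Its normalize method only lower-cases and merely stands in for fix.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of a vocabulary index; not the real VectorManager.
class VocabSketch {
    private final Map<String, Integer> index = new HashMap<>();
    private boolean locked = false;

    // Stand-in for fix(String): lower-casing only (assumption, no stemming).
    static String normalize(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    int wordIndex(String origS) {
        return wordIndexWithoutFix(normalize(origS));
    }

    // Assumption: a new word gets the next free index unless the
    // vocabulary is locked, in which case -1 is returned.
    int wordIndexWithoutFix(String s) {
        Integer idx = index.get(s);
        if (idx != null) {
            return idx;
        }
        if (locked) {
            return -1;
        }
        int next = index.size();
        index.put(s, next);
        return next;
    }

    void setVocabularyLocked() {
        locked = true;
    }
}
```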

wordIndexWithoutFix

public int wordIndexWithoutFix(java.lang.String s)

wordIndexWithoutFix

public int wordIndexWithoutFix(java.lang.String s,
                               java.lang.String origS)

getFeatureString

public java.lang.String getFeatureString(int idx)
Parameters:
idx - the feature index
Returns:
the first unstemmed version of the string that this index represents
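To keep "the first unstemmed version" of each feature, a vocabulary can record the original spelling only when a normalized form is seen for the first time. In this hypothetical FeatureStringSketch, normalize is a crude stand-in for fix (lower-casing and dropping one trailing "s"), so "Cells" and "cell" share an index and getFeatureString returns "Cells", the spelling seen first.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch; not the real VectorManager.
class FeatureStringSketch {
    private final Map<String, Integer> index = new HashMap<>();
    // firstForm.get(i) is the first original spelling seen for feature i.
    private final List<String> firstForm = new ArrayList<>();

    // Crude stand-in for fix(String): lower-case and strip one trailing "s".
    static String normalize(String s) {
        String t = s.toLowerCase(Locale.ROOT);
        return t.endsWith("s") ? t.substring(0, t.length() - 1) : t;
    }

    int wordIndex(String origS) {
        String key = normalize(origS);
        Integer idx = index.get(key);
        if (idx == null) {
            idx = firstForm.size();
            index.put(key, idx);
            firstForm.add(origS); // only the first spelling is kept
        }
        return idx;
    }

    String getFeatureString(int idx) {
        return firstForm.get(idx);
    }
}
```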

getFeatureLength

public int getFeatureLength()

containsWord

public boolean containsWord(java.lang.String s)
Parameters:
s - the word to look up; it is not passed through fix in this method
Returns:
true if the vocabulary contains s

getDocumentCount

public int getDocumentCount(int FeatureIdx)

getIdxFromFeatureToUse

public int getIdxFromFeatureToUse(int k)

getFeatureCount

public int getFeatureCount(java.lang.String wbid,
                           int idx)

registerWord

public void registerWord(java.lang.String s,
                         java.lang.String wbid)
Adds this word to VectorManager's count of used words and to this document's count.
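The bookkeeping behind registerWord can be sketched with three maps: total occurrences per feature, the number of documents containing each feature, and per-document counts. The getter semantics below are assumptions inferred from the method names in the summary (for example, that getDocumentCount is a document frequency and getMaxFeatureCount is the largest count in one document). CountsSketch is a hypothetical class, not the real implementation.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; getter meanings are inferred from names only.
class CountsSketch {
    private final Map<String, Integer> index = new HashMap<>();
    private final Map<Integer, Integer> totalCount = new HashMap<>(); // idx -> occurrences in all docs
    private final Map<Integer, Integer> docFreq = new HashMap<>();    // idx -> docs containing it
    private final Map<String, Map<Integer, Integer>> perDoc = new HashMap<>(); // wbid -> idx -> count

    void registerWord(String s, String wbid) {
        Integer idx = index.get(s);
        if (idx == null) {
            idx = index.size();
            index.put(s, idx);
        }
        totalCount.merge(idx, 1, Integer::sum);
        Map<Integer, Integer> counts = perDoc.computeIfAbsent(wbid, k -> new HashMap<>());
        if (counts.merge(idx, 1, Integer::sum) == 1) {
            docFreq.merge(idx, 1, Integer::sum); // first use of this word in this doc
        }
    }

    int getFeatureCount(String wbid, int idx) {
        return perDoc.getOrDefault(wbid, Collections.emptyMap()).getOrDefault(idx, 0);
    }

    int getTotalFeatureCount(int idx) {
        return totalCount.getOrDefault(idx, 0);
    }

    int getDocumentCount(int idx) {
        return docFreq.getOrDefault(idx, 0);
    }

    int getMaxFeatureCount(String wbid) {
        int max = 0;
        for (int c : perDoc.getOrDefault(wbid, Collections.emptyMap()).values()) {
            max = Math.max(max, c);
        }
        return max;
    }
}
```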


registerLocally

public void registerLocally(int wordIndex,
                            java.lang.String wbid)

deleteWbid

public void deleteWbid(java.lang.String wbid)
Deletes all references to this wbid's document, to save memory.

Parameters:
wbid - the identifier of the document whose references are deleted
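Because per-document counts are keyed by wbid, freeing them can be as simple as removing that document's map so it can be garbage-collected. This hypothetical DeleteSketch assumes that registerLocally updates only the per-document counts and that deleteWbid leaves the rest of the vocabulary alone; neither assumption is confirmed by this page.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not the real VectorManager.
class DeleteSketch {
    // wbid -> (feature index -> count)
    private final Map<String, Map<Integer, Integer>> perDoc = new HashMap<>();

    // Assumption: counts only the occurrence within this document.
    void registerLocally(int wordIndex, String wbid) {
        perDoc.computeIfAbsent(wbid, k -> new HashMap<>()).merge(wordIndex, 1, Integer::sum);
    }

    // Drops only this document's counts.
    void deleteWbid(String wbid) {
        perDoc.remove(wbid);
    }

    int getFeatureCount(String wbid, int idx) {
        return perDoc.getOrDefault(wbid, Collections.emptyMap()).getOrDefault(idx, 0);
    }

    int documentsTracked() {
        return perDoc.size();
    }
}
```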

getMaxFeatureCount

public int getMaxFeatureCount(java.lang.String wbid)

getTotalFeatureCount

public int getTotalFeatureCount(int idx)

emptyCounts

public void emptyCounts()

getVocabularySize

public int getVocabularySize()

isVocabularyLocked

public boolean isVocabularyLocked()

setVocabularyLocked

public void setVocabularyLocked()