Bag of words is a method for reducing natural-language text to a simple numeric representation, a count of how often each known word occurs with word order ignored, for use with machine learning and natural language processing.
I used it to train a neural network on log messages and then assign scores to known log messages. Given the bag-of-words representation of a message, the network can score how well that test data matches what it was trained on.
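```python
class BagOfWords:
    def __init__(self):
        self.dict = {}  # maps each known word to its index in the score vector
        self.i = 0      # next free index

    def addVocab(self, line):
        # Add every new, non-numeric word in the line to the vocabulary.
        # split() without arguments avoids empty tokens from repeated spaces.
        for word in line.split():
            if word not in self.dict and not word.isdigit():
                self.dict[word] = self.i
                self.i += 1

    def vocabScore(self, line):
        # Count how often each vocabulary word occurs in the line.
        # Words outside the vocabulary (and pure numbers) are ignored.
        score = [0] * len(self.dict)
        for word in line.split():
            if word in self.dict and not word.isdigit():
                score[self.dict[word]] += 1
        return score
```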
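To make the two steps concrete, here is a small usage sketch: build the vocabulary from a few training lines, then turn a new line into a count vector. The log lines are made up for illustration and are not from my actual data, and the class is repeated in compact form only so the snippet runs on its own.

```python
class BagOfWords:
    """Compact copy of the class above so this snippet is self-contained."""

    def __init__(self):
        self.dict = {}
        self.i = 0

    def addVocab(self, line):
        for word in line.split():
            if word not in self.dict and not word.isdigit():
                self.dict[word] = self.i
                self.i += 1

    def vocabScore(self, line):
        score = [0] * len(self.dict)
        for word in line.split():
            if word in self.dict and not word.isdigit():
                score[self.dict[word]] += 1
        return score


# Hypothetical log lines, invented for this example.
training_logs = [
    "connection from host 42 accepted",
    "connection from host 7 refused",
]

bag = BagOfWords()
for line in training_logs:
    bag.addVocab(line)

# Vocabulary: connection=0, from=1, host=2, accepted=3, refused=4
# (the numbers 42 and 7 are skipped by the isdigit() check).
print(bag.dict)

# "unknown" and "99" are not in the vocabulary, so they are ignored.
vector = bag.vocabScore("connection refused from unknown host 99")
print(vector)  # [1, 1, 1, 0, 1]
```

The resulting vector always has one entry per vocabulary word, so every log line maps to an input of the same length, which is what makes it usable as input to a network.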
Questions? Comments?