
Convert NL String To Vector Or Some Numeric Equivalent

I'm trying to convert a string to a numeric equivalent so I can train a neural network to classify the strings. I tried the sum of the ASCII values, but that just results in larger

Solution 1:

I'm sure you've considered assigning each new word you encounter an integer. You'll have to store the word-to-integer mapping somewhere, but that's one option.
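A minimal sketch of that bookkeeping, assuming the words have already been split out of the text. `Vocabulary` and `idOf` are hypothetical names; a `Map` holds the word-to-integer mapping, and each id is simply the order in which the word was first seen.

```javascript
// Hypothetical helper class: gives every new word the next unused integer id.
class Vocabulary {
  constructor() {
    this.ids = new Map(); // word -> integer id
  }

  // Return the word's id, assigning the next free one on first sight.
  idOf(word) {
    if (!this.ids.has(word)) {
      this.ids.set(word, this.ids.size);
    }
    return this.ids.get(word);
  }

  // Number of distinct words seen so far.
  get size() {
    return this.ids.size;
  }
}

const vocab = new Vocabulary();
console.log(vocab.idOf("value"));         // 0
console.log(vocab.idOf("corresponding")); // 1
console.log(vocab.idOf("value"));         // 0 (already seen)
console.log(vocab.size);                  // 2
```

The ids are arbitrary labels, so a network usually receives them as one-hot vectors or through an embedding layer rather than as raw magnitudes.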

You could also run each string through a hash function. Plain JavaScript has no built-in string hash, but a simple one is easy to write.
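A minimal sketch of a hand-rolled string hash, using the same `h = 31*h + charCode` recurrence as Java's `String.hashCode`. The name `stringHash` is hypothetical. Unlike the vocabulary approach, unrelated words can collide. In Node.js, the built-in `crypto` module (`crypto.createHash`) is another option, though its digests are far longer than one integer.

```javascript
// Java-style 32-bit string hash. ECMAScript has no built-in string hash,
// so this is written by hand.
function stringHash(str) {
  let h = 0;
  for (let i = 0; i < str.length; i++) {
    // Math.imul multiplies as 32-bit integers; "| 0" wraps the sum
    // back into the signed 32-bit range, mimicking Java's int overflow.
    h = (Math.imul(31, h) + str.charCodeAt(i)) | 0;
  }
  return h;
}

console.log(stringHash("hello")); // 99162322
```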

If you don't mind a few hash collisions, and the size of the resulting integers doesn't matter, may I recommend a trick I've used a few times before.

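  • Assign each letter a prime number, ordered by the frequency of letters in English (the most common letter gets the smallest prime).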

So e = 2, t = 3, a = 5, and so on, which gives us:

Prime   Letter
2       e
3       t
5       a
7       o
11      i
13      n
17      s
19      h
23      r
29      d
31      l
37      c
41      u
43      m
47      w
53      f
59      g
61      y
67      p
71      b
73      v
79      k
83      j
89      x
97      q
101     z
  • Multiply together the primes for every letter in the word.

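So "value" is 73*5*31*41*2 = 927830, and "corresponding" is 37*7*23*23... Because every integer has a unique prime factorization, each distinct set of letters (counting repeats) gives a distinct product. The only collisions are anagrams (the same letters in a different order), so we've accidentally built an anagram detector.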
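A minimal sketch of the prime-product trick, using the prime/letter table above. `primeProduct` is a hypothetical name, and skipping characters outside a-z is an assumption rather than part of the original trick. It uses `BigInt` because the products quickly exceed 2^53, beyond which plain JavaScript numbers lose precision.

```javascript
// Letters from most to least frequent in English, matching the table.
const LETTERS_BY_FREQUENCY = "etaoinshrdlcumwfgypbvkjxqz";
const PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
                43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101];

// letter -> prime as a BigInt: e -> 2n, t -> 3n, a -> 5n, ...
const PRIME_OF = new Map(
  [...LETTERS_BY_FREQUENCY].map((ch, i) => [ch, BigInt(PRIMES[i])])
);

// Multiply the primes of all letters in the word (case-insensitive).
// Characters without a prime (digits, punctuation, ...) are skipped.
function primeProduct(word) {
  let product = 1n;
  for (const ch of word.toLowerCase()) {
    const p = PRIME_OF.get(ch);
    if (p !== undefined) product *= p;
  }
  return product;
}

console.log(primeProduct("value"));                             // 927830n
console.log(primeProduct("listen") === primeProduct("silent")); // true (anagrams collide)
```

The products grow quickly with word length, which is the "size of the resulting integers" caveat mentioned earlier.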

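There isn't really a linguistically sound way to do this with a fixed formula, though. As far as I know, word2vec also starts by giving each word an arbitrary integer index; the vectors it produces only become meaningful through training.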
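A minimal sketch of the "integer index to vector" lookup step that embedding-based models rely on. Everything here is hypothetical and illustrative: the word list, `makeEmbeddingTable`, the dimension 4, and the initialization scale. Training, which is what makes the vectors meaningful, is deliberately not shown.

```javascript
// Hypothetical word -> integer id mapping (ids are arbitrary: listing order).
const words = ["value", "corresponding", "listen"];
const idOf = new Map(words.map((w, i) => [w, i]));

// Build an embedding table: row `id` holds the vector for that word.
// Rows start as small random numbers; a trained model would adjust them.
function makeEmbeddingTable(vocabSize, dim) {
  const table = [];
  for (let id = 0; id < vocabSize; id++) {
    // Uniform values in [-0.5/dim, 0.5/dim); the scale is an arbitrary choice.
    table.push(Array.from({ length: dim }, () => (Math.random() - 0.5) / dim));
  }
  return table;
}

const dim = 4;
const table = makeEmbeddingTable(words.length, dim);

// Lookup: string -> integer id -> vector.
const vec = table[idOf.get("value")];
console.log(vec.length); // 4
```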

