Convert NL String To Vector Or Some Numeric Equivalent
I'm trying to convert a string to a numeric equivalent so I can train a neural network to classify the strings. I tried the sum of the ASCII values, but that just results in larger
Solution 1:
I'm sure you've considered assigning each new word you encounter an integer. You'll have to keep track somewhere, but that's one option.
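A minimal sketch of that bookkeeping, in case it helps. The names makeVocabulary and encode are made up for illustration, and splitting on whitespace after lowercasing is an assumption about what counts as a "word":

```javascript
// Hypothetical helper: hands out the next unused integer to each new word.
function makeVocabulary() {
  const ids = new Map(); // word -> integer id, in order of first appearance
  return {
    idFor(word) {
      if (!ids.has(word)) {
        ids.set(word, ids.size);
      }
      return ids.get(word);
    },
    size() {
      return ids.size;
    },
  };
}

// Turn a sentence into a list of word ids (lowercased, split on whitespace).
function encode(vocab, text) {
  return text
    .toLowerCase()
    .split(/\s+/)
    .filter((w) => w.length > 0)
    .map((w) => vocab.idFor(w));
}

const vocab = makeVocabulary();
const first = encode(vocab, "the cat sat on the mat"); // [0, 1, 2, 3, 0, 4]
const second = encode(vocab, "the dog sat");           // [0, 5, 2]
```

The ids carry no meaning beyond identity, so a network would normally see them through a one-hot or embedding layer rather than as raw magnitudes.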
You could also use whatever built-in hash method js has.
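As far as I know, JavaScript strings don't actually ship with a hash method, so you'd have to bring your own. Here is a rough sketch of a small one, using the 31-multiplier scheme from Java's String.hashCode; the name hashString is mine, and this is not meant as a strong hash:

```javascript
// Java-style 31-multiplier string hash, wrapped to a signed 32-bit integer.
// Not cryptographic, and collisions are possible.
function hashString(str) {
  let h = 0;
  for (let i = 0; i < str.length; i++) {
    // Math.imul keeps the multiplication in 32-bit integer arithmetic;
    // "| 0" wraps the sum back into the signed 32-bit range.
    h = (Math.imul(h, 31) + str.charCodeAt(i)) | 0;
  }
  return h;
}

const hashAb = hashString("ab"); // 97 * 31 + 98 = 3105
const hashBa = hashString("ba"); // 98 * 31 + 97 = 3135
```

Unlike a plain sum of character codes, the result depends on letter order, so "ab" and "ba" come out different.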
If you don't mind a few hash collisions, and the size of the resulting integers doesn't matter, may I recommend a trick I've used a few times before.
- Assign each letter a prime number based on its frequency:
So, e = 2, t = 3, a = 5, etc., which gives us:
2 e
3 t
5 a
7 o
11 i
13 n
17 s
19 h
23 r
29 d
31 l
37 c
41 u
43 m
47 w
53 f
59 g
61 y
67 p
71 b
73 v
79 k
83 j
89 x
97 q
101 z
- Multiply the values corresponding to each letter in a word:
So, value is 73*5*31*41*2. corresponding is 37*7*23*23... Because prime factorization is unique, each distinct collection of letters gives a distinct product. The order of the letters is lost, though, so it collides for anagrams, which means we've accidentally built an anagram detector.
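The two steps above could be sketched like this. The function name primeProduct is mine, skipping characters outside a-z is an assumption, and BigInt is used because the products for longer words quickly exceed what a regular Number can hold exactly (2^53 - 1):

```javascript
// Letters ordered from most to least frequent, matching the table above.
const LETTERS = "etaoinshrdlcumwfgypbvkjxqz";
const PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
  43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101];

// Lookup table: letter -> prime (as a BigInt).
const PRIME_FOR = new Map();
for (let i = 0; i < LETTERS.length; i++) {
  PRIME_FOR.set(LETTERS[i], BigInt(PRIMES[i]));
}

// Multiply together the primes of every letter in the word.
function primeProduct(word) {
  let product = 1n;
  for (const ch of word.toLowerCase()) {
    const p = PRIME_FOR.get(ch);
    if (p !== undefined) {
      product *= p; // assumption: anything outside a-z is simply skipped
    }
  }
  return product;
}

const valueCode = primeProduct("value"); // 73*5*31*41*2 = 927830n
const sameLetters = primeProduct("listen") === primeProduct("silent"); // true
```

Since multiplication is commutative, any two anagrams get the same number, which is exactly the collision described above.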
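There isn't really a linguistically sound way to do this with a single integer, though. word2vec, for comparison, doesn't assign arbitrary integers to strings; it learns a vector for each word from the contexts the word appears in.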