How To Parse JavaScript Using Nokogiri And Ruby
I need to parse an array out of a website. The part of the JavaScript I want to parse looks like this:
_arPic[0] = 'http://example.org/image1.jpg';
_arPic[1] = 'http://example.org/
Solution 1:
If I read you correctly, you're trying to parse the JavaScript and get a Ruby array of your image URLs, yes?
Nokogiri only parses HTML/XML, so you're going to need a different library. A cursory search turns up the RKelly library, which has a parse function that takes a JavaScript string and returns a parse tree.
Once you have a parse tree, you'll need to traverse it, find the nodes of interest by name (e.g. _arPic), and then take the string on the right-hand side of each assignment.
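RKelly's own node API isn't reproduced here. As a stand-in that runs with only the standard library, this sketch swaps in a hand-rolled StringScanner tokenizer (not RKelly and not a real JavaScript parser) and then walks the token stream for name[index] = 'string' assignments: the same "find it by name, take the right-hand side" idea on a much smaller scale. Token, tokenize, punct? and extract_assignments are hypothetical names.

```ruby
require "strscan"

# Hypothetical token type for a tiny subset of JavaScript.
Token = Struct.new(:type, :value)

# Split JavaScript source into identifiers, numbers, strings and punctuation.
# This is only a sketch: it knows nothing about regex literals, templates, etc.
def tokenize(js)
  scanner = StringScanner.new(js)
  tokens = []
  until scanner.eos?
    # Whitespace, // line comments and /* block comments */ produce no tokens.
    next if scanner.skip(%r{\s+|//[^\n]*|/\*.*?\*/}m)

    if (text = scanner.scan(/[A-Za-z_$][\w$]*/))
      tokens << Token.new(:ident, text)
    elsif (text = scanner.scan(/\d+/))
      tokens << Token.new(:number, text.to_i)
    elsif (text = scanner.scan(/'(?:[^'\\]|\\.)*'|"(?:[^"\\]|\\.)*"/))
      tokens << Token.new(:string, text[1..-2]) # drop the quotes; escapes are left as-is
    else
      tokens << Token.new(:punct, scanner.getch)
    end
  end
  tokens
end

def punct?(token, char)
  token.type == :punct && token.value == char
end

# Collect the string on the right-hand side of every `name[<n>] = <string>`
# assignment and return the values ordered by index.
def extract_assignments(js, name)
  found = {}
  tokenize(js).each_cons(6) do |id, lb, idx, rb, eq, str|
    next unless id.type == :ident && id.value == name
    next unless punct?(lb, "[") && idx.type == :number && punct?(rb, "]")
    next unless punct?(eq, "=") && str.type == :string

    found[idx.value] = str.value
  end
  found.sort.map(&:last)
end

js = "_arPic[0] = 'http://example.org/image1.jpg';\n" \
     "// _arPic[1] = 'http://example.org/old.jpg';\n" \
     "_arPic[1] = 'http://example.org/image2.jpg';"
p extract_assignments(js, "_arPic")
# => ["http://example.org/image1.jpg", "http://example.org/image2.jpg"]
```

Unlike a plain regex, this skips the commented-out assignment. It does not decode escape sequences or handle the rest of JavaScript's grammar, which is where a real parser such as RKelly earns its keep.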
Alternatively, if it doesn't have to be robust (and a regex won't be), you can simply search the JavaScript with a regular expression:
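/_arPic\[\d+\]\s*=\s*'([^']*)'/
might be a good starting regex. It allows indexes above 9, matches the single-quoted strings shown in the question, and captures the URL in group 1.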
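A minimal Ruby sketch of the regex approach, assuming the script's text is already in a string (for example the text of a Nokogiri script node). PIC_PATTERN and pic_urls are hypothetical names; the pattern also captures the index so the URLs come back in array order even if the assignments appear out of order in the source.

```ruby
# Group 1 is the array index, group 2 the single-quoted URL.
PIC_PATTERN = /_arPic\[(\d+)\]\s*=\s*'([^']*)'/

def pic_urls(js)
  js.scan(PIC_PATTERN)                      # pairs of [index, url] strings
    .sort_by { |index, _url| index.to_i }   # numeric sort, so 10 comes after 9
    .map { |_index, url| url }
end

js = "_arPic[1] = 'http://example.org/image2.jpg'; _arPic[0] = 'http://example.org/image1.jpg';"
p pic_urls(js)
# => ["http://example.org/image1.jpg", "http://example.org/image2.jpg"]
```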
Solution 2:
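The easy way is to skip parsing the JavaScript and let URI.extract (from Ruby's standard uri library) pull every URL out of the script's text, where product_page is the Nokogiri document for the page: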
_arPic = URI.extract product_page.css("div#main_column script")[0].text
which can be shortened with at, since at returns only the first matching node:
_arPic = URI.extract product_page.at("div#main_column script").text
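A sketch of the same URI.extract idea applied to a plain string, so it runs without fetching a page; extract_urls is a hypothetical helper. It passes %w[http https] as the optional schemes argument so that text such as a JavaScript key:value pair isn't reported as a URI. It also trims trailing quote and semicolon characters, because URI.extract follows RFC 2396, where ' and ; are legal URL characters, so single-quoted strings like those in the question can leave "';" attached to each URL.

```ruby
require "uri"

# Return the http(s) URLs found in a script's text.
def extract_urls(script_text)
  URI.extract(script_text, %w[http https])
     .map { |url| url.sub(/['";]+\z/, "") } # strip trailing ' " ; left by the JS syntax
end

script = "_arPic[0] = 'http://example.org/image1.jpg'; _arPic[1] = 'http://example.org/image2.jpg';"
p extract_urls(script)
# => ["http://example.org/image1.jpg", "http://example.org/image2.jpg"]
```

The trade-off is that this picks up every http(s) URL in the script, not just the ones assigned to _arPic.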