Coding with Python: How to extract JavaScript dictionaries with Scrapy

Sunday, 3 October 2021

How to extract JavaScript dictionaries with Scrapy

Why would you want to do that?

Well, if you are web scraping with Python and Scrapy, for instance, you may need to extract reviews or comments that are loaded by JavaScript. In that case you cannot use your CSS or XPath selectors the way you can with regular HTML.

Parse

Instead, open the page source in your browser, press Ctrl+F, search for “json”, and track down some JSON that maps onto a Python dictionary. You ‘just’ need to isolate it.

[Figure: using view-source to find occurrences of “JSON” in your page]
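The same search can be done in code once you have the page text. The sketch below is only an illustration: `find_marker_contexts` and `sample_page` are hypothetical names, and the sample string stands in for what `response.text` might contain. It lists each occurrence of a marker such as `JSON.parse` together with the index you would pass to `split(marker)[...]` to reach the text that follows it.

```python
def find_marker_contexts(text, marker="JSON.parse", width=40):
    """Return (split_index, snippet) pairs for each occurrence of marker.

    split_index is the position to use in text.split(marker)[split_index]
    to get the text that follows that occurrence (part 0 is everything
    before the first marker).
    """
    contexts = []
    position = text.find(marker)
    while position != -1:
        split_index = len(contexts) + 1
        snippet = text[position:position + len(marker) + width]
        contexts.append((split_index, snippet))
        position = text.find(marker, position + len(marker))
    return contexts


# Hypothetical stand-in for response.text; the doubled backslashes produce
# the literal text \u0022 as it would appear in the page source.
sample_page = (
    '<script>var a = JSON.parse("{\\u0022id\\u0022: 1}");</script>'
    '<script>var b = JSON.parse("{\\u0022id\\u0022: 2}");</script>'
)

contexts = find_marker_contexts(sample_page, width=20)
for split_index, snippet in contexts:
    print(split_index, snippet)
```

Scanning the snippets should make it easier to decide which index holds the data you are after before you start trimming.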

The raw response is not pretty, but you can gradually shrink it down in the Scrapy shell or a Python shell.

Figure 1: The response

Split, strip, replace

From within Scrapy, or in your own Python code, you can split, strip, and replace with Python's built-in string methods until you are left with just the JSON text, which you can then pass to json.loads.

# The backslashes are doubled so Python matches the literal \u0022 text in the page source
x = response.text.split('JSON.parse')[3].replace("\\u0022","\"").replace("\\u2019m","'").lstrip("(").split(" ")[0].strip().replace("\"","",1).replace("\");","")

Master replace, strip, and split, and you often won’t need regular expressions!
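A chained one-liner is hard to debug, so it can help to apply the same idea one step at a time and inspect each intermediate value. The following is a minimal sketch on made-up data: `sample_text` is a hypothetical fragment (not the real page), and a real page will likely need a different index and different trimming steps.

```python
import json

# Hypothetical page fragment. The doubled backslashes mean the string holds
# the literal text \u0022, as it would appear in the page source.
sample_text = (
    'var data = JSON.parse("{\\u0022doctor\\u0022: '
    '{\\u0022sample_rating_comment\\u0022: \\u0022Great visit\\u0022}}");'
)

# 1. Keep only what follows the marker (index 1 here, as there is one marker).
after_marker = sample_text.split("JSON.parse")[1]

# 2. Turn the literal \u0022 sequences into real double quotes.
unescaped = after_marker.replace("\\u0022", '"')

# 3. Drop the opening parenthesis of the JavaScript call.
no_paren = unescaped.lstrip("(")

# 4. Drop the first double quote, which opened the JavaScript string.
no_open_quote = no_paren.replace('"', "", 1)

# 5. Drop the closing quote, parenthesis and semicolon.
json_text = no_open_quote.replace('");', "").strip()

data = json.loads(json_text)
print(data["doctor"]["sample_rating_comment"])  # Great visit
```

This approach assumes the values themselves contain no escaped quotes or `");` sequences; if they do, the trimming steps would need adjusting.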

With the response text now reduced to a JSON string, you can load it into a dictionary like this:

import json
q = json.loads(x)
comment = q['doctor']['sample_rating_comment']
# str.replace returns a new string, so assign the result back
comment = comment.replace("\u2019", "'")
print(comment)
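Pages change, so the lookup may fail if the text is not valid JSON or a key is missing. A slightly more defensive version might look like the sketch below. The function name `extract_comment` is hypothetical; the keys `doctor` and `sample_rating_comment` are the ones used in this post.

```python
import json


def extract_comment(json_text):
    """Return the sample rating comment, or None if it cannot be read."""
    try:
        data = json.loads(json_text)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    doctor = data.get("doctor")
    if not isinstance(doctor, dict):
        return None
    comment = doctor.get("sample_rating_comment")
    if not isinstance(comment, str):
        return None
    # json.loads has already decoded \u2019 into a curly apostrophe;
    # swap it for a plain one.
    return comment.replace("\u2019", "'")


print(extract_comment('{"doctor": {"sample_rating_comment": "I\\u2019m happy"}}'))
print(extract_comment("not json at all"))
```

In a Scrapy callback you would pass in the trimmed string (the `x` from earlier) and skip the item when the function returns None.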

The key things to remember when parsing the response text are to use the index to pick out the section you want, and to use the backslash (“\”) to escape characters when you are working with quotes and with actual backslashes in the text you’re parsing.
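The backslash point is easy to get wrong, because Python decodes `\u0022` inside an ordinary string literal before your code ever runs. The short sketch below (with a hypothetical `page_fragment`) shows the difference between the decoded character and the literal six-character text you usually need to match in a page source.

```python
escaped_char = "\u0022"   # Python decodes this to one character: "
literal_text = "\\u0022"  # six characters: backslash, u, 0, 0, 2, 2
raw_text = r"\u0022"      # raw string: the same six characters

print(len(escaped_char), len(literal_text), len(raw_text))  # 1 6 6

# Hypothetical fragment as it might appear in a page source.
page_fragment = r"{\u0022name\u0022: \u0022Ann\u0022}"

# Searching for the decoded character finds nothing to replace here.
unchanged = page_fragment.replace("\u0022", '"')

# Searching for the literal text replaces every escape sequence.
fixed = page_fragment.replace("\\u0022", '"')

print(unchanged == page_fragment)  # True
print(fixed)  # {"name": "Ann"}
```

Either the doubled backslash or a raw string works; the raw string tends to be easier to read when there are many escapes.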

Figure 2: The parsed response

Conclusion

Rendering the page with Splash or Selenium, or resorting to regular expressions, is not always essential. Hope this helps illustrate how you can extract values FROM a Python dictionary FROM JSON FROM JavaScript!

You may see a mass of text on your screen to begin with, but persevere and you can arrive at the dictionary contained within…
