Web Scraping For Amazon Prices
Idea:
I wanted to build a basic web scraper that gets the price of a given product on Amazon. Amazon offers a Product Advertising API that does this programmatically, but after watching a few videos on the subject, I wanted to try scraping the page directly with a simple Node app.
What is a web scraper?
From Wikipedia:
Web scraping, web harvesting, or web data extraction is data scraping used for extracting data from websites. Web scraping software may access the World Wide Web directly using the Hypertext Transfer Protocol, or through a web browser. While web scraping can be done manually by a software user, the term typically refers to automated processes implemented using a bot or web crawler. It is a form of copying, in which specific data is gathered and copied from the web, typically into a central local database or spreadsheet, for later retrieval or analysis. (For the full Wikipedia article on web scraping, see note 5.)
Steps to solve this problem
1: Set up the project.
2: Manually inspect the page (for example with the browser's developer tools) to find the element that holds the price, and note its class or id. In this case, the price is in the element with the id priceblock_ourprice (CSS selector #priceblock_ourprice).
3: Fetch the page's HTML with axios, inside the getHTML() function.
4: Once we have the HTML, extract the price from it with cheerio, inside the getAmazonPrice() function.
Node packages used
cheerio - Essentially jQuery for Node. Lets you easily select elements from a page's HTML
axios - Promise-based HTTP client for the browser and Node.js
esm - ECMAScript module loader, so we can use import syntax in Node
nodemon - Reruns the app whenever a file is saved (installed as a dev dependency)
Setup
mkdir simpleWebScraper
cd simpleWebScraper
npm init -f (-f accepts the defaults)
Install packages
npm i cheerio axios esm
npm i nodemon --save-dev
Running npm init creates a package.json file, and installing the packages records them in it as dependencies. Once it exists, open package.json and add a command to the scripts object to run the app (see line 8 of the package.json file below).
package.json
With the project skeleton set up, we can now add the two source files, index.js and scrape.js.
index.js scrape.js
Finally, to run the app, open a terminal in the project folder and enter:
npm run dev
Because I am using nodemon, the app reruns automatically every time you save a change.
Note that in scrape.js (lines 6-8) I had to pass request headers; without them, Amazon returned a 503 error. See notes 2, 3, and 4 below.
Notes
1: Based on this scraping tutorial:
https://www.youtube.com/watch?v=rWc0xqroY4U&t=1757s
2: Error research (a Python thread):
https://www.reddit.com/r/learnpython/comments/4eaz7v/error_503_when_trying_to_get_info_off_amazon/
3: Sending headers with axios:
https://stackoverflow.com/questions/45578844/how-to-set-header-and-options-in-axios
4: Where I found the headers to send:
https://www.scrapehero.com/tutorial-how-to-scrape-amazon-product-details-using-python/
5: Wikipedia definition of web scraping:
web scraping