If you’re managing a shop, or simply monitoring a list of Amazon items so you can get notified whenever a price changes, this quick tutorial is for you.
Writing a simple but effective scraper in a few lines of code is extremely easy today, and we’re here to prove it.
Let’s start!
Kickstart
We are going to use NodeJS and scrape-it to write a simple script that fetches Amazon prices, and SQLite to store items and their prices. We’re using SQLite because of its simplicity and because, fundamentally, it’s just a plain file sitting on the disk that requires no configuration whatsoever! And you can easily copy it around or export data as CSV.
Before starting: make sure you have NodeJS and Yarn (NodeJS package manager) installed on your system.
1. Prepare the Environment
Let’s create a folder holding our project and install our dependencies:
yarn init
yarn add scrape-it sqlite3
Also, let’s install sqlite3 on our system in order to prepare our database. For Linux / Mac:
sudo apt install sqlite3 # Ubuntu
sudo pacman -S sqlite3 # Arch Linux
brew install sqlite # macOS
If you’re on Windows, follow this guide.
2. Create our Database
Let’s create a folder keeping the database itself, its schema, and a seeder to store some dummy items for testing.
mkdir db
touch db/schema.sql
touch db/seed.sql
We now edit schema.sql to define our db schema:
CREATE TABLE items (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  name TEXT NOT NULL,
  asin TEXT NOT NULL,
  price INTEGER,
  created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);
In my case, I want to scrape Amazon in order to monitor guitar pedal prices, so I will seed the database (in seed.sql) with some dummy items (you can use whatever you want):
INSERT INTO items (name, asin)
VALUES
  ('Boss BD-2', 'B0002CZV6E'),
  ('Ibanez TS-808', 'B000T4SI1K'),
  ('TC Flashback', 'JB02HXMRST4R');
The asin is basically the Amazon product id. You can easily get it from an Amazon product URL, where it follows the /dp/ segment:
https://www.amazon.com/TC-Electronic-Flashback-Delay-Effects/dp/B06Y42MJ4N
Now, let’s create the database using the sqlite3 command-line tool:
cd db
sqlite3 database.sqlite < schema.sql
sqlite3 database.sqlite < seed.sql
And that’s it! We now have some sample products to play with :)
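If you prefer not to copy the asin by hand, here is a minimal sketch of a helper that reads it from a product URL. The name extractAsin is hypothetical (it is not part of the original script), and it assumes the asin is the 10-character uppercase alphanumeric code that follows /dp/ in the path.

```javascript
// Hypothetical helper (not part of the original script): pull the asin
// out of an Amazon product URL by looking for the "/dp/<asin>" segment.
const extractAsin = (productUrl) => {
  // URL is a Node global; pathname excludes the query string
  const { pathname } = new URL(productUrl)
  // Assumption: an asin is 10 uppercase letters or digits
  const match = pathname.match(/\/dp\/([A-Z0-9]{10})(?:\/|$)/)
  return match ? match[1] : null
}

console.log(extractAsin('https://www.amazon.com/TC-Electronic-Flashback-Delay-Effects/dp/B06Y42MJ4N'))
// prints "B06Y42MJ4N"
```

URLs without a /dp/ segment make the helper return null, so it is worth checking the result before inserting anything into the database.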
3. The Scraper
Now, create a file index.js in the root of your project, fire up your favorite text editor and let’s put some code in it!
The scraper will perform the following steps:
- Retrieve the list of items from our database
- For every item, scrape the corresponding Amazon page (using the asin of each product)
- Update the item price on the database
First, we declare dependencies and instantiate our database connector:
const scrapeIt = require('scrape-it')
const sqlite3 = require('sqlite3').verbose()
const db = new sqlite3.Database('./db/database.sqlite')
We don’t want to send too many requests to Amazon in a short period of time (let’s be polite), so let’s add a waiting function which we’ll use after every scrape action:
const wait = (time) => new Promise(resolve => setTimeout(resolve, time))
Here we have our scrape function:
// scrape Amazon by asin (product id)
const scrapeAmazon = (asin) => {
  const url = `https://amazon.com/dp/${asin}`
  console.log('Scraping URL', url)
  return scrapeIt(url, {
    price: {
      selector: '#price_inside_buybox',
      convert: p => {
        // "$1,099.99" -> 109999 (cents); NaN when no price is found
        const amount = parseFloat((p.split('$')[1] || '').replace(/,/g, ''))
        return Math.round(amount * 100)
      }
    }
  })
}
It simply accepts the asin and calls scrape-it to collect the information we need from the page.
Specifically, we want to scrape the price appearing on the right column of an Amazon product page (red box):
The process is very simple:
- Go to the Amazon product page URL in your browser
- Open the Developer Tools
- Pick the HTML price element
- Get whatever selector / id / class we might need to identify that element
In our case, the id will be just enough (#price_inside_buybox).
Keep in mind that you might need to check if Amazon changes the element id from time to time. There are better solutions to this (such as looking for the dollar sign “$” in the page), but let’s keep it simple for now.
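As a rough illustration of the dollar-sign idea, here is a minimal sketch that searches a chunk of page text for the first "$" amount. The function name findFirstDollarPrice is hypothetical, and the approach assumes the first dollar amount in the text is the one you want, which is not guaranteed on a real product page.

```javascript
// Hypothetical fallback (not part of the original script): return the
// first "$<amount>" found in a chunk of page text, or null if none.
const findFirstDollarPrice = (text) => {
  // "$", optional space, digits, optional ",ddd" groups, optional decimals
  const match = text.match(/\$\s?\d+(?:,\d{3})*(?:\.\d{1,2})?/)
  return match ? match[0] : null
}

console.log(findFirstDollarPrice('Buy new: $1,099.99 & FREE Shipping'))
console.log(findFirstDollarPrice('Currently unavailable'))
```

In practice you would probably restrict the search to a smaller part of the page (for instance the right column), otherwise shipping costs or prices of related items may be picked up first.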
We also specify a convert function to parse the price as an integer number of cents (to avoid storing floats in the database).
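To see the conversion in isolation, here is a small sketch of a standalone parser. The name toCents is hypothetical, and it assumes US-style price strings such as "$1,099.99" (comma as thousands separator, dot for decimals).

```javascript
// Hypothetical standalone parser: "$1,099.99" -> 109999 (integer cents).
// Returns null when the text contains no number at all.
const toCents = (priceText) => {
  // Drop everything except digits and the decimal point
  const amount = parseFloat(String(priceText).replace(/[^0-9.]/g, ''))
  // Math.round guards against floating point noise after multiplying
  return Number.isNaN(amount) ? null : Math.round(amount * 100)
}

console.log(toCents('$19.99'))
console.log(toCents('$1,099.99'))
console.log(toCents(''))
```

Storing cents as an integer means prices can be compared exactly in SQL; dividing by 100 is only needed when displaying them.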
And that’s it! At this point, scrape-it will do the rest, returning all the elements it finds (in our case, just one) under a property called price :)
Also, we return a scrape-it Promise so that we can conveniently use async / await.
In order to update the price on our database we implement the following function:
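// update pedal price
const updatePrice = (asin, price) => {
  console.log('Updating item:', asin, price)
  // use ? placeholders instead of interpolating values into the SQL string
  db.run(
    'UPDATE items SET price = ? WHERE asin = ?',
    [price, asin],
    (err) => {
      if (err) {
        console.log(err)
      }
    }
  )
}
Pretty easy: we simply use the asin to look up the entry we want to update, and perform an update on the database. The ? placeholders let SQLite bind the values itself, which keeps the price stored as a number and avoids building the query by string concatenation.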
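If you would rather await the update than fire and forget it, one option is to wrap the callback in a Promise. The sketch below is only an illustration: runAsync and updatePriceAsync are hypothetical names, and db is assumed to be any object exposing run(sql, params, callback) the way the sqlite3 package's Database does.

```javascript
// Hypothetical wrapper: turn a callback-style db.run call into a Promise.
// `db` is assumed to expose run(sql, params, callback) like sqlite3's Database.
const runAsync = (db, sql, params = []) =>
  new Promise((resolve, reject) => {
    // A regular function (not an arrow) is used because sqlite3 exposes
    // the number of affected rows on `this.changes` inside the callback.
    db.run(sql, params, function (err) {
      if (err) return reject(err)
      resolve(this && this.changes)
    })
  })

// Same UPDATE as before, but returning a Promise that can be awaited
const updatePriceAsync = (db, asin, price) =>
  runAsync(db, 'UPDATE items SET price = ? WHERE asin = ?', [price, asin])
```

With this in place, a caller could write await updatePriceAsync(db, item.asin, price) and handle failures with try / catch instead of checking err inside a callback.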
Finally, here’s our main function which goes through all the items in the database and calls the function we just prepared:
const scrape = async () => {
  db.all('SELECT * FROM items', [], async (err, items) => {
    if (err) {
      // bail out if the items cannot be read
      return console.log(err)
    }
    console.log(items)
    for (const item of items) {
      const scraped = await scrapeAmazon(item.asin)
      if (scraped.response.statusCode === 200 && scraped.data.price) {
        updatePrice(item.asin, scraped.data.price)
      } else {
        // no usable price: non-200 response or missing price element
        console.log('Out of stock')
      }
      await wait(2000)
    }
  })
}
scrape()
Now simply head to your terminal and run it with node index.js.
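If you want to try the loop and its pacing without hitting Amazon or touching the database, the scrape and update steps can be passed in as functions. This is only a sketch: processItems, scrapeOne and onPrice are hypothetical names, and the shape of the scraped result (response.statusCode, data.price) is assumed to match what the main function above reads.

```javascript
const wait = (time) => new Promise(resolve => setTimeout(resolve, time))

// Hypothetical runner: `scrapeOne` and `onPrice` are injected so the loop
// can be exercised with stubs instead of real network and database calls.
const processItems = async (items, scrapeOne, onPrice, delayMs = 2000) => {
  const skipped = []
  for (const item of items) {
    try {
      const scraped = await scrapeOne(item.asin)
      if (scraped.response.statusCode === 200 && scraped.data.price) {
        await onPrice(item.asin, scraped.data.price)
      } else {
        // No usable price (non-200 response or missing element)
        skipped.push(item.asin)
      }
    } catch (err) {
      // One failing page should not stop the rest of the list
      skipped.push(item.asin)
    }
    // Pause between requests, one item at a time
    await wait(delayMs)
  }
  return skipped
}

// Usage with a stub scraper and a short delay
const fakeScrape = async () => ({ response: { statusCode: 200 }, data: { price: 9999 } })
processItems([{ asin: 'B0002CZV6E' }], fakeScrape, (asin, price) => console.log(asin, price), 10)
  .then(skipped => console.log('Skipped:', skipped))
```

Unlike the main function, this variant also catches errors thrown by a single scrape, so one unreachable page is recorded as skipped and the remaining items are still processed.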
You can find the code on my GitHub repo.
That’s it!
Wrappin’ Up
Whether you need a plain and simple web scraper or you’re planning to build the next-gen Cloud scraping platform scaling to millions of users, this little guide should give you the basic building block for anything you might want to achieve.
Actually, this is exactly what I’ve used to find prices for my website rigfoot.com.
As an exercise, you might try adding a new scraping function targeting a different website, such as eBay or Thomann!
I really hope you found it useful. I would love to hear your comments and suggestions on features you’d like to see.
Thanks!