The Student Room Group

A level NEA Computer Science Web Scraping

Hi, I am trying to scrape several different websites. Does anyone have any idea how to automate it so that the program knows what to look for in each site's HTML? I am building a web scraping project that searches for jobs, and I would like it to work across many different sites.
Thanks,
Sam

Reply 1

So why don't you replicate the Google search engine? Pick a starting web page, and whenever you find an href, log it and scrape that page at some point in the future. Eventually you will have a massive list of every web page on the internet and be ready to start competing with Google itself! Google also ranks pages partly on how many other pages link to them. It's a start, and by the time you have perfected it you will have worked out how to find the title of each page (clue: <title>).
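If it helps, here is a rough sketch of that idea in Python using only the standard library. It is one possible way to do it, not the definitive one. The names `PageParser`, `crawl`, `fetch` and `PAGES` are all made up for illustration, and the example.test pages are a fake in-memory site so the code runs without touching the network. For real sites you would pass in your own `fetch` function that downloads the HTML.

```python
from collections import Counter, deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Collects the <title> text and every href found in <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def crawl(start_url, fetch, max_pages=50):
    """Breadth-first crawl from start_url.

    fetch(url) is supplied by the caller (an assumption of this sketch)
    and must return the page's HTML as a string, or None on failure.
    Returns (titles, inbound): page titles by URL, and a count of how
    many crawled pages link to each URL.
    """
    queue = deque([start_url])   # pages logged for a future visit
    seen = {start_url}           # never queue the same URL twice
    titles = {}
    inbound = Counter()
    while queue and len(titles) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = PageParser(url)
        parser.feed(html)
        titles[url] = parser.title.strip()
        for link in set(parser.links):   # count each linking page once
            inbound[link] += 1
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return titles, inbound


# Fake three-page site (hypothetical data) standing in for real downloads.
PAGES = {
    "http://example.test/": '<title>Home</title><a href="/jobs">Jobs</a><a href="/about">About</a>',
    "http://example.test/jobs": '<title>Jobs board</title><a href="/">Home</a><a href="/about">About</a>',
    "http://example.test/about": '<title>About us</title><a href="/jobs">Jobs</a>',
}

titles, inbound = crawl("http://example.test/", PAGES.get)
for url, title in titles.items():
    print(title, url, "linked from", inbound[url], "page(s)")
```

The inbound link count is only a crude stand-in for how a real search engine ranks pages, and the `max_pages` limit is there because otherwise the crawl never really ends.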
