Python – How to download image from Google Search?

I’m teaching myself machine learning and A.I. When I started with TensorFlow for transfer learning, I needed some images for my training set. Such images are easy to find with Google Image Search, but I found no way to download all of the results to my local machine. There are some add-ons for browsers such as Google Chrome and Firefox, but they don’t really work: most of the time they crash when I start the download. So I had to write a small tool to do this job for me. In this blog post, I would like to show you how I built it with Python.

1. Use Google Custom Search API

1.1 Prerequisites

Google provides the Google Custom Search API for integrating its search engine into our apps. What we need is an API key and a Search Engine Id, which I already mentioned in this post.

1.2 Install Google API Python Client

Once you have an API key and a Search Engine Id for your Google Custom Search, install the Google API Python Client package. I’m using Windows and the Anaconda package manager. On other systems, just use standard pip to install the package.

conda install -c conda-forge google-api-python-client=1.6.2

1.3 Supported arguments

The Google Custom Search script supports three arguments: query text, destination folder, and number of images.

def main(argv):
    """Main"""
    gcs = GoogleCustomSearch()
    gcs.count, gcs.folder, gcs.query = gcs.parse_args(argv)
    gcs.search()

For example, to download the first 10 images matching “Darth Vader” to the folder C:\temp\StarWars\DarthVader, the syntax is

python gcse.py -q "darth vader" -f "C:\temp\StarWars\DarthVader" -c 10

1.4 Constructor

The class GoogleCustomSearch is initialized with the API key and Search Engine Id obtained in the step above.

class GoogleCustomSearch(HDBase):
    """Google Custom Search"""

    def __init__(self, usage_text=""):
        usage_text = "python gcse.py -q <query> -f <destination folder> -c 100"
        self.api_key = "AIzaSyDQ92Dx35mWmYWEmBdCqBQnkfgdxpCKF-w"
        self.search_engine_id = "003470263288780838160:ty47piyybua"
        HDBase.__init__(self, usage_text)
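
Hard-coding the key in the constructor is fine for a quick experiment. As a minimal sketch of an alternative, the two values could be read from environment variables instead. The function name load_credentials and the variable names GCS_API_KEY and GCS_ENGINE_ID are made up for this illustration and are not part of the original tool.

```python
import os


def load_credentials(environ=os.environ):
    """Return (api_key, search_engine_id) read from environment variables.

    GCS_API_KEY and GCS_ENGINE_ID are hypothetical names chosen for this sketch.
    """
    try:
        return environ["GCS_API_KEY"], environ["GCS_ENGINE_ID"]
    except KeyError as missing:
        # Fail early with a readable message instead of a bare KeyError.
        raise RuntimeError("Missing environment variable: {}".format(missing))
```

The constructor could then assign self.api_key and self.search_engine_id from the returned pair.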

1.5 Search and download

After initializing GoogleCustomSearch with a valid API key and Search Engine Id, we can run the query page by page.

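from googleapiclient.discovery import build

def download_links(self, response):
    """Download the image referenced by each search result, if any."""
    # "items" is missing from the response when a page has no results.
    for item in response.get("items", []):
        page_map = item.get("pagemap", {})
        if "cse_image" in page_map:
            link = page_map["cse_image"][0]["src"]
            self.download_link(link)

def search(self):
    """Query the API page by page until self.count results were requested."""
    page_size = 10  # the API returns at most 10 results per request
    start = 1       # result indices are 1-based
    service = build("customsearch", "v1", developerKey=self.api_key)

    while start <= self.count:
        # Ask only for as many results as are still needed.
        num = min(page_size, self.count - start + 1)
        response = service.cse().list(
            q=self.query,
            cx=self.search_engine_id,
            start=start,
            num=num
        ).execute()

        self.download_links(response)
        start += num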
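
The paging arithmetic is easy to get wrong by one, so it can help to check it in isolation. The following is a minimal sketch, not part of the original tool: the helper name page_requests is made up, and it assumes the API's 1-based start index and its limit of 10 results per request.

```python
def page_requests(count, page_size=10):
    """Yield (start, num) pairs that together cover results 1..count."""
    start = 1  # result indices are 1-based
    while start <= count:
        # The last page may be shorter than page_size.
        num = min(page_size, count - start + 1)
        yield start, num
        start += num


for start, num in page_requests(25):
    print("request results {}..{}".format(start, start + num - 1))
```

For 25 images this gives three requests: two full pages and a last page of five results.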

The response is JSON, so we can access it as nested key-value items. Just loop through the list of items, extract the link of each image, and download it to your local folder.
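
The download_link method itself lives in the HDBase class and is not shown in this post. As a rough sketch of what such a download step might look like, using only the standard library: the function names target_path and download_image, the numbered file names, and the ".jpg" fallback are all assumptions made for this illustration.

```python
import os
import urllib.request
from urllib.parse import urlparse


def target_path(url, folder, index):
    """Build a local file path for url, for example folder/0003.jpg."""
    # Take the extension from the URL path; fall back to ".jpg" if there is none.
    extension = os.path.splitext(urlparse(url).path)[1].lower() or ".jpg"
    return os.path.join(folder, "{:04d}{}".format(index, extension))


def download_image(url, folder, index, timeout=10):
    """Download url into folder; return the saved path, or None on failure."""
    os.makedirs(folder, exist_ok=True)
    path = target_path(url, folder, index)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            data = response.read()
    except (OSError, ValueError):
        # Network errors, HTTP errors, timeouts, or a malformed URL.
        return None
    with open(path, "wb") as out:
        out.write(data)
    return path
```

Numbering the files avoids name clashes when two results end in the same file name, at the cost of losing the original names.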

However, the problem with Google Custom Search is that the images we get are not the same as those we see when searching in the browser, because the browser and the Custom Search Engine use different settings, and Google returns different results depending on them. I don’t know how to configure the Custom Search Engine so that it returns the same results as the browser, so this approach doesn’t give me what I want. In the next section, I will show you how to use Selenium to simulate browser behavior: automating the search, scrolling, and collecting the links to the images.

2. Use Selenium with Firefox

2.1 Prerequisites

Download the latest version of Firefox from its homepage and get the latest Gecko driver. Then copy the Gecko driver to the same folder as your Python file.

2.2 Install Selenium

Install the Selenium package.

conda install -c conda-forge selenium=3.4.2

2.3 Supported arguments

The Selenium script supports four arguments: query text, destination folder, number of images, and file extensions.

def main(argv):
    """Main"""
    gcs = Selenium()
    gcs.count, gcs.extension, gcs.folder, gcs.query = gcs.parse_args(argv)
    gcs.search()

For example, to download the first 10 JPEG images matching “Darth Vader” to the folder C:\temp\StarWars\DarthVader, the syntax is

python gsel.py -q "darth vader" -f "C:\temp\StarWars\DarthVader" -c 10 -e ".jpg;.jpeg"
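
The parse_args method is part of HDBase and is not shown here. The following is a minimal sketch of how these four options might be parsed with the standard getopt module; the function name parse_cli_args and the default values are assumptions made for this illustration. Every value gets a default before the options are read, so none of them is left unassigned when an option such as -e is omitted.

```python
import getopt


def parse_cli_args(argv):
    """Parse -q, -f, -c and -e; return (count, extensions, folder, query)."""
    # Assign defaults first so every name is bound even if its option is missing.
    # The default values here are hypothetical.
    count, extensions, folder, query = 10, [], ".", ""
    options, _ = getopt.getopt(argv, "q:f:c:e:")
    for flag, value in options:
        if flag == "-q":
            query = value
        elif flag == "-f":
            folder = value
        elif flag == "-c":
            count = int(value)
        elif flag == "-e":
            # ".jpg;.jpeg" becomes [".jpg", ".jpeg"]
            extensions = [ext for ext in value.split(";") if ext]
    return count, extensions, folder, query
```

An empty extension list can then be treated as "accept every file type".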

2.4 Search and download

We’ll use Selenium to reproduce what happens in the browser: browse to Google Image Search, run the query, scroll down to load all images, and click the “Show more results” button to view the full search result.

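import json
import time
from urllib.parse import quote_plus

from selenium import webdriver
from selenium.common.exceptions import ElementNotInteractableException

def search(self):
    """Search Google Images and download up to self.count results."""
    # quote_plus escapes spaces and special characters in the query.
    url = ("https://www.google.com/search?q=" + quote_plus(self.query)
           + "&source=lnms&tbm=isch")
    # caps = webdriver.DesiredCapabilities().FIREFOX
    # caps["marionette"] = False
    # driver = webdriver.Firefox(capabilities=caps)
    driver = webdriver.Firefox()
    driver.get(url)
    self.count_downloaded = 0
    seen = set()  # image URLs already handled, so no image is fetched twice
    while self.count_downloaded < self.count:
        # Scroll repeatedly so that lazily loaded thumbnails are added to the page.
        for scroll in range(10):
            driver.execute_script("window.scrollBy(0,1000000)")
            time.sleep(0.2)
        time.sleep(0.5)

        found_new = False
        images = driver.find_elements_by_xpath("//div[@class='rg_meta']")
        for image in images:
            if self.count_downloaded >= self.count:
                break
            image_url = json.loads(image.get_attribute("innerHTML"))["ou"]
            if image_url in seen:
                continue
            seen.add(image_url)
            found_new = True
            self.download_link(image_url)

        if not found_new:
            break  # nothing new was loaded; stop instead of looping forever

        # find_elements_* returns an empty list when the button is absent.
        for button_smb in driver.find_elements_by_xpath("//input[@id='smb']"):
            try:
                button_smb.click()
            except ElementNotInteractableException:
                pass

    driver.quit()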

When the code runs, Firefox is launched, goes to Google Image Search, runs the query, scrolls down, and clicks the button automatically. Finally, the links to the images are extracted and the images are downloaded to the folder you specified. The download may take a while, depending on how fast your internet connection is.
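
How the extension argument is applied is not shown in the code above. As a rough sketch, the link extraction and an extension filter could be combined in one helper that works on the JSON text of the rg_meta elements and needs no browser. The function name image_urls is made up for this illustration, and treating an empty extension list as "accept everything" is an assumption.

```python
import json
from urllib.parse import urlparse


def image_urls(meta_blobs, extensions=()):
    """Return the "ou" URLs found in rg_meta JSON strings, filtered by extension."""
    allowed = tuple(ext.lower() for ext in extensions)
    urls = []
    for blob in meta_blobs:
        try:
            url = json.loads(blob)["ou"]
        except (ValueError, KeyError, TypeError):
            continue  # skip entries that are not JSON or have no "ou" key
        if not isinstance(url, str):
            continue
        # Compare against the URL path only, so a query string does not interfere.
        path = urlparse(url).path.lower()
        if not allowed or path.endswith(allowed):
            urls.append(url)
    return urls
```

In the Selenium loop, meta_blobs would be the innerHTML strings of the rg_meta elements, and extensions the list parsed from the -e argument.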

3. Source code

The full source code is available at Bitbucket: https://bitbucket.org/hintdesk/python-google-search-image

One thought on “Python – How to download image from Google Search?”

  1. Having run the code, I get an error.
    UnboundLocalError: local variable ‘extension’ referenced before assignment

    Traceback (most recent call last):
    File “gsel.py”, line 63, in
    main(sys.argv[1:])
    File “gsel.py”, line 57, in main
    gcs.count, gcs.extension, gcs.folder, gcs.query = gcs.parse_args(argv)
    File “C:\Development\Codebase\hdbase.py”, line 106, in parse_args
    return count, extension, folder, query
    UnboundLocalError: local variable ‘extension’ referenced before assignment
