爬虫 | ChromeDriverManager().install()异常处理

使用ChromeDriverManager().install()安装ChromeDriverManager失败

Posted by Haauleon on November 6, 2023

异常分析

  最近的爬虫项目使用到了 selenium 模块来驱动浏览器进行爬虫,但是今天在公司的服务器上去执行的时候,因为网络的原因,使用 ChromeDriverManager().install() 检查并安装最新版本驱动的时候抛出了以下异常:

1
2
3
4
5
http: error: ConnectionError: HTTPSConnectionPool(
host='chromedriver.storage.googleapis.com', port=443): Max retries exceeded with url: /LATEST_RELEASE_107.0.5304 
(Caused by NewConnectionError( '<urllib3.connection.HTTPSConnection object at 0x7fd46be37730>: Failed to 
establish a new connection: [Errno 101] Network is unreachable')) while doing a GET request to URL: 
https://chromedriver.storage.googleapis.com/LATEST_RELEASE_107.0.5304

  我在我自己的电脑本地直接去访问 https://chromedriver.storage.googleapis.com/LATEST_RELEASE_107.0.5304 是没问题的,但是服务器就一直访问异常。



异常处理

  家人们谁懂啊,公司的网络无法访问http://chromedriver.storage.googleapis.com/index.html。考虑到被驱动的 Chrome 浏览器版本是固定的不需要更新,所以驱动也不需要更新,解决方法就是直接指定本地已有的驱动版本和驱动存放的目录即可。以下是 107.0.5304.62 版本的驱动 chromedriver_linux64.zip 存放的绝对路径 src/driver/107.0.5304.62

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
.
|-- Dockerfile
|-- README.md
|-- cache
|-- chatgpt_t.py
|-- common
|-- config
|   |-- __init__.py
|   `-- setting.ini
|-- db_model
|-- deploy.yaml
|-- files
|-- job
|-- logs
|-- model
|-- request
|-- requirements.txt
|-- response
|-- run_amazan_dog_spider.py
|-- run_amazon_product_spider.py
|-- run_keepa_spider.py
|-- sources.list
|-- src
|   |-- driver
|   |   |-- 107.0.5304.62
|   |   |   |-- chromedriver
|   |   |   |-- chromedriver_linux64.zip
|   |   |   |-- drivers
|   |   |   |   `-- chromedriver
|   |   |   |       `-- linux64
|   |   |   |           `-- 107.0.5304.62
|   |   |   |               |-- chromedriver
|   |   |   |               `-- driver.zip
|   |   |   `-- drivers.json
|   |   |-- 114.0.5735.90
|   |   |   `-- chromedriver_linux64.zip
|   |   |-- drivers
|   |   |   `-- chromedriver
|   |   |       `-- linux64
|   |   |           |-- 107.0.5304.62
|   |   |           |   |-- chromedriver
|   |   |           |   `-- driver.zip
|   |   |           `-- 114.0.5735.90
|   |   |               |-- LICENSE.chromedriver
|   |   |               |-- chromedriver
|   |   |               `-- driver.zip
|   |   `-- drivers.json
|   `-- google-chrome-stable_deb_rpm_107.0.5304.122
|       |-- google-chrome-stable_current_amd64.deb
|       `-- google-chrome-stable_current_x86_64.rpm
|-- task
|   |-- __init__.py
|   `-- keepa_comment_get.py
|-- url_t.py
`-- utils

43 directories, 228 files


Linux系统

实现方法:
(1)将文件 chromedriver_linux64.zip 放在目录 src/driver/107.0.5304.62
(2)进入该目录:

1
> cd src/driver/107.0.5304.62

(3)解压该文件:

1
> unzip chromedriver_linux64.zip

(4)拷贝到 bin 目录下:

1
> cp chromedriver /usr/bin/

(5)在代码中指定该驱动的版本号和存放的绝对路径:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
import os


_SLASH = os.sep
# chromedriver存放的绝对路径(在项目中存放的位置)
driver_path = os.path.join(root_path, 'src' + _SLASH + 'driver' + _SLASH + '107.0.5304.62' + _SLASH + 'chromedriver')


def start_driver():
    chrome_options = webdriver.ChromeOptions()
    # 禁止浏览器加载图片,提高运算速度
    chrome_options.add_argument('blink-settings=imagesEnabled=false')
    chrome_options.add_experimental_option("excludeSwitches", ['enable-automation'])
    chrome_options.add_argument('--disable-dev-shm-usage')
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--no-sandbox')
    # 服务器运行方式一:将chromedriver拷贝到/usr/bin下
    # driver = webdriver.Chrome(executable_path='/usr/bin/chromedriver', options=chrome_options)
    # 服务器运行方式二:指定chromedriver在项目中存放的位置
    driver = webdriver.Chrome(executable_path=driver_path, options=chrome_options)
    return driver


  最后执行就会提示从缓存中加载已有的驱动:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
[WDM] - Current google-chrome version is 107.0.5304
[WDM] - Driver [/home/xxx/Spider/keepa-test/src/driver//drivers/chromedriver/linux64/107.0.5304.62/chromedriver] found in cache

[WDM] - Current google-chrome version is 107.0.5304
[WDM] - Driver [/home/xxx/Spider/keepa-test/src/driver//drivers/chromedriver/linux64/107.0.5304.62/chromedriver] found in cache

[WDM] - Current google-chrome version is 107.0.5304
[WDM] - Driver [/home/xxx/Spider/keepa-test/src/driver//drivers/chromedriver/linux64/107.0.5304.62/chromedriver] found in cache

[WDM] - Current google-chrome version is 107.0.5304
[WDM] - Driver [/home/xxx/Spider/keepa-test/src/driver//drivers/chromedriver/linux64/107.0.5304.62/chromedriver] found in cache

[WDM] - Current google-chrome version is 107.0.5304
[WDM] - Driver [/home/xxx/Spider/keepa-test/src/driver//drivers/chromedriver/linux64/107.0.5304.62/chromedriver] found in cache

.......



windows系统

在代码中指定该驱动的版本号和存放的绝对路径:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
import os


_SLASH = os.sep
# chromedriver.exe存放的绝对路径(在项目中存放的位置)
driver_path = os.path.join(root_path, 'src' + _SLASH + 'driver' + _SLASH + '107.0.5304.62' + _SLASH + 'chromedriver.exe')


def start_driver():
    chrome_options = webdriver.ChromeOptions()
    # 禁止浏览器加载图片,提高运算速度
    chrome_options.add_argument('blink-settings=imagesEnabled=false')
    chrome_options.add_experimental_option("excludeSwitches", ['enable-automation'])
    chrome_options.add_argument('--disable-dev-shm-usage')
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--no-sandbox')
    # windows系统运行方式一:如果能正常访问网络,就直接使用ChromeDriverManager进行安装
    # driver = webdriver.Chrome(ChromeDriverManager().install(), options=chrome_options)
    # windows系统运行方式二:指定chromedriver在项目中存放的位置(公司网络垃圾就使用以下代码)
    driver = webdriver.Chrome(executable_path=driver_path, options=chrome_options)
    return driver




相关链接:
Web自动化框架selenium的介绍与使用
“Webdrivers”可执行文件可能具有错误的权限。请看https://sites.google.com/a/chromium.org/chromedriver/home
我无法使用ChromeDriverManager().install()安装ChromeDriverManager
当webdriver遇到“’ executable may have wrong permissions. ”