博客
关于我
强烈建议你试试无所不能的chatGPT,快点击我
python3下multiprocessing、threading和gevent性能对比
阅读量:5810 次
发布时间:2019-06-18

本文共 6524 字,大约阅读时间需要 21 分钟。

  hot3.png

转自: http://blog.csdn.net/littlethunder/article/details/40983031

目前计算机程序一般会遇到两类I/O:硬盘I/O和网络I/O。我就针对网络I/O的场景分析下python3下进程、线程、协程效率的对比。进程采用multiprocessing.Pool进程池,线程是自己封装的进程池,协程采用gevent的库。用python3自带的urlllib.request和开源的requests做对比。代码如下:

import urllib.request  import requests  import time  import multiprocessing  import threading  import queue    def startTimer():      return time.time()    def ticT(startTime):      useTime = time.time() - startTime      return round(useTime, 3)    #def tic(startTime, name):  #    useTime = time.time() - startTime  #    print('[%s] use time: %1.3f' % (name, useTime))    def download_urllib(url):      req = urllib.request.Request(url,              headers={'user-agent': 'Mozilla/5.0'})      res = urllib.request.urlopen(req)      data = res.read()      try:          data = data.decode('gbk')      except UnicodeDecodeError:          data = data.decode('utf8', 'ignore')      return res.status, data    def download_requests(url):      req = requests.get(url,              headers={'user-agent': 'Mozilla/5.0'})      return req.status_code, req.text    class threadPoolManager:      def __init__(self,urls, workNum=10000,threadNum=20):          self.workQueue=queue.Queue()          self.threadPool=[]          self.__initWorkQueue(urls)          self.__initThreadPool(threadNum)        def __initWorkQueue(self,urls):          for i in urls:              self.workQueue.put((download_requests,i))        def __initThreadPool(self,threadNum):          for i in range(threadNum):              self.threadPool.append(work(self.workQueue))        def waitAllComplete(self):          for i in self.threadPool:              if i.isAlive():                  i.join()    class work(threading.Thread):      def __init__(self,workQueue):          threading.Thread.__init__(self)          self.workQueue=workQueue          self.start()      def run(self):          while True:              if self.workQueue.qsize():                  do,args=self.workQueue.get(block=False)                  do(args)                  self.workQueue.task_done()              else:                  break    urls = ['http://www.ustchacker.com'] * 10  urllibL = []  requestsL = []  multiPool = []  threadPool = []  N = 20  PoolNum = 100    for i in range(N):      print('start %d try' % i)      urllibT = startTimer()      jobs = [download_urllib(url) for url in urls]      #for status, data in jobs:      #    print(status, data[:10])      #tic(urllibT, 'urllib.request')      urllibL.append(ticT(urllibT))      print('1')            requestsT = startTimer()      jobs = [download_requests(url) for url in urls]      #for status, data in jobs:      #    print(status, data[:10])      #tic(requestsT, 'requests')      requestsL.append(ticT(requestsT))      print('2')            requestsT = startTimer()      pool = multiprocessing.Pool(PoolNum)      data = pool.map(download_requests, urls)      pool.close()      pool.join()      multiPool.append(ticT(requestsT))      print('3')        requestsT = startTimer()      pool = threadPoolManager(urls, threadNum=PoolNum)      pool.waitAllComplete()      threadPool.append(ticT(requestsT))      print('4')    import matplotlib.pyplot as plt  x = list(range(1, N+1))  plt.plot(x, urllibL, label='urllib')  plt.plot(x, requestsL, label='requests')  plt.plot(x, multiPool, label='requests MultiPool')  plt.plot(x, threadPool, label='requests threadPool')  plt.xlabel('test number')  plt.ylabel('time(s)')  plt.legend()  plt.show()

运行结果如下:

        从上图可以看出,python3自带的urllib.request效率还是不如开源的requests,multiprocessing进程池效率明显提升,但还低于自己封装的线程池,有一部分原因是创建、调度进程的开销比创建线程高(测试程序中我把创建的代价也包括在里面)。

        下面是gevent的测试代码:

import urllib.request  import requests  import time  import gevent.pool  import gevent.monkey    gevent.monkey.patch_all()    def startTimer():      return time.time()    def ticT(startTime):      useTime = time.time() - startTime      return round(useTime, 3)    #def tic(startTime, name):  #    useTime = time.time() - startTime  #    print('[%s] use time: %1.3f' % (name, useTime))    def download_urllib(url):      req = urllib.request.Request(url,              headers={'user-agent': 'Mozilla/5.0'})      res = urllib.request.urlopen(req)      data = res.read()      try:          data = data.decode('gbk')      except UnicodeDecodeError:          data = data.decode('utf8', 'ignore')      return res.status, data    def download_requests(url):      req = requests.get(url,              headers={'user-agent': 'Mozilla/5.0'})      return req.status_code, req.text    urls = ['http://www.ustchacker.com'] * 10  urllibL = []  requestsL = []  reqPool = []  reqSpawn = []  N = 20  PoolNum = 100    for i in range(N):      print('start %d try' % i)      urllibT = startTimer()      jobs = [download_urllib(url) for url in urls]      #for status, data in jobs:      #    print(status, data[:10])      #tic(urllibT, 'urllib.request')      urllibL.append(ticT(urllibT))      print('1')            requestsT = startTimer()      jobs = [download_requests(url) for url in urls]      #for status, data in jobs:      #    print(status, data[:10])      #tic(requestsT, 'requests')      requestsL.append(ticT(requestsT))      print('2')            requestsT = startTimer()      pool = gevent.pool.Pool(PoolNum)      data = pool.map(download_requests, urls)      #for status, text in data:      #    print(status, text[:10])      #tic(requestsT, 'requests with gevent.pool')      reqPool.append(ticT(requestsT))      print('3')            requestsT = startTimer()      jobs = [gevent.spawn(download_requests, url) for url in urls]      gevent.joinall(jobs)      #for i in jobs:      #    print(i.value[0], i.value[1][:10])      #tic(requestsT, 'requests with gevent.spawn')      reqSpawn.append(ticT(requestsT))      print('4')        import matplotlib.pyplot as plt  x = list(range(1, N+1))  plt.plot(x, urllibL, label='urllib')  plt.plot(x, requestsL, label='requests')  plt.plot(x, reqPool, label='requests geventPool')  plt.plot(x, reqSpawn, label='requests Spawn')  plt.xlabel('test number')  plt.ylabel('time(s)')  plt.legend()  plt.show()

 
运行结果如下:

        从上图可以看到,对于I/O密集型任务,gevent还是能对性能做很大提升的,由于协程的创建、调度开销都比线程小的多,所以可以看到不论使用gevent的Spawn模式还是Pool模式,性能差距不大。

        因为在gevent中需要使用monkey补丁,会提高gevent的性能,但会影响multiprocessing的运行,如果要同时使用,需要如下代码:

gevent.monkey.patch_all(thread=False, socket=False, select=False)

 

        可是这样就不能充分发挥gevent的优势,所以不能把multiprocessing Pool、threading Pool、gevent Pool在一个程序中对比。不过比较两图可以得出结论,线程池和gevent的性能最优的,其次是进程池。附带得出个结论,requests库比urllib.request库性能要好一些 

转载于:https://my.oschina.net/sukai/blog/649793

你可能感兴趣的文章
集合内查找的四种方式
查看>>
制作首页的显示列表
查看>>
区块链入门级别认知(blockchain)
查看>>
Linux系统——28个命令行下的工具
查看>>
dubbox、zookeeper BUG记录
查看>>
com.sun.org.apache.xerces.internal.impl.dv.util.Base64出现的问题
查看>>
Problem A 你会定义类吗?
查看>>
COM是一个更好的C++
查看>>
基于SQL Serve“.NET技术”r 2008 Service Broker构建企业级消息系统
查看>>
充分利用系统的组策略 保障共享目录安全
查看>>
PHP将死。何以为继?
查看>>
php学习的第8天
查看>>
android项目的的目录结构
查看>>
阿里云服务器线上环境配置
查看>>
iOS 路由
查看>>
HDU-1016-素数环
查看>>
[翻译]HLS实践
查看>>
js的闭包
查看>>
Q97 交错字符串
查看>>
电脑上调试手机网站的几种方法
查看>>