Processes and Threads
What is a process?
An executing instance of a program is called a process.
Each process provides the resources needed to execute a program. A process has a virtual address space, executable code, open handles to system objects, a security context, a unique process identifier, environment variables, a priority class, minimum and maximum working set sizes, and at least one thread of execution. Each process is started with a single thread, often called the primary thread, but can create additional threads from any of its threads.
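A program cannot run by itself. It runs only after it has been loaded into memory and the system has allocated resources to it, and a program in execution is called a process. The difference between a program and a process is this: a program is a collection of instructions, a static description of what the process will run; a process is one execution of a program, which makes it a dynamic concept.
In multiprogramming, several programs are loaded into memory at the same time and execute concurrently under the operating system's scheduling. This design greatly improves CPU utilization. Processes let each user feel that they have the CPU to themselves, so the process was introduced in order to implement multiprogramming on the CPU.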
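Why do we need threads when we already have processes?
Processes have many advantages. They provide multiprogramming, make each of us feel that we own the CPU and other resources, and improve the utilization of the computer. So if processes are this good, why do we need threads? On closer inspection a process (with a single flow of control) has two main shortcomings:
A process can do only one thing at a time. If it needs to do two or more things at once, it cannot.
If a process blocks while executing, for example while waiting for input, the whole process is suspended. Work in the process that does not depend on the input cannot run either.
For example, consider chatting with QQ. If QQ, as an independent process, could do only one thing at a time, how could it listen for keyboard input, listen for messages other people send you, and display those messages on screen all at once? You might say: doesn't the operating system do time sharing? But time sharing happens between processes: the operating system works on your QQ task for a while and then switches to your Word document. Within each CPU time slice given to QQ, QQ can still do only one thing at a time.
To put it more plainly, an operating system is like a factory with many workshops, each producing a different product. Each workshop is a process. The factory is poor and short of electricity, so it can power only one workshop at a time. To keep every workshop producing, the electrician powers them in turn (time sharing). But when it is the QQ workshop's turn, there is only one worker in it, so output is very low. How do you fix that? You guessed it: add more workers and let them work in parallel. Each worker is a thread!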
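What is a thread?
A thread is the smallest unit of execution that the operating system can schedule. It is contained in a process and is the unit that actually does the work in the process. A thread is a single sequential flow of control within a process. One process can run several threads concurrently, each performing a different task.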
A thread is an execution context, which is all the information a CPU needs to execute a stream of instructions.
Suppose you're reading a book, and you want to take a break right now, but you want to be able to come back and resume reading from the exact point where you stopped. One way to achieve that is by jotting down the page number, line number, and word number. So your execution context for reading a book is these 3 numbers.
If you have a roommate, and she's using the same technique, she can take the book while you're not using it, and resume reading from where she stopped. Then you can take it back, and resume it from where you were.
Threads work in the same way. A CPU is giving you the illusion that it's doing multiple computations at the same time. It does that by spending a bit of time on each computation. It can do that because it has an execution context for each computation. Just like you can share a book with your friend, many tasks can share a CPU.
On a more technical level, an execution context (therefore a thread) consists of the values of the CPU's registers.
Last: threads are different from processes. A thread is a context of execution, while a process is a bunch of resources associated with a computation. A process can have one or many threads.
Clarification: the resources associated with a process include memory pages (all the threads in a process have the same view of the memory), file descriptors (e.g., open sockets), and security credentials (e.g., the ID of the user who started the process).
What are the differences between processes and threads?
- Threads share the address space of the process that created them; processes have their own address space.
- Threads have direct access to the data segment of their process; processes have their own copy of the data segment of the parent process.
- Threads can communicate directly with other threads of their process; processes must use interprocess communication to communicate with sibling processes.
- New threads are easily created; new processes require duplication of the parent process.
- Threads can exercise considerable control over threads of the same process; processes can only exercise control over child processes.
- Changes to the main thread (cancellation, priority change, etc.) may affect the behavior of the other threads of the process; changes to the parent process do not affect child processes.
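The first two points can be seen in a few lines. This is a minimal sketch, not part of the original notes; the names shared and worker are made up for illustration. Every thread appends to the same list object because all of them live in one address space.

```python
import threading

shared = []  # one list object in the process's address space, visible to every thread

def worker(tag):
    # Each thread appends to the very same list; no copy is made per thread
    shared.append(tag)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(shared))
```

A child process started with multiprocessing would instead work on its own copy of the list, which is why the multiprocessing section later needs Queue, Pipe, Array, or Manager.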
Python GIL(Global Interpreter Lock)
In CPython, the global interpreter lock, or GIL, is a mutex that prevents multiple native threads from executing Python bytecodes at once. This lock is necessary mainly because CPython’s memory management is not thread-safe. (However, since the GIL exists, other features have grown to depend on the guarantees that it enforces.)
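The core of the passage above is that no matter how many threads you start or how many CPUs you have, CPython calmly lets only one thread execute Python bytecode at any given moment. So how is that multithreading? Do not jump to conclusions yet; I will explain in class.
The first thing to make clear is that the GIL is not a feature of the Python language. It is a concept introduced by one implementation of the Python interpreter, CPython. Compare C++: it is a language (syntax) standard, and different compilers such as GCC, Intel C++, and Visual C++ can compile it into executable code. Python is the same: the same code can be run by different implementations such as CPython, PyPy, and Jython, and Jython, for example, has no GIL. But because CPython is the default Python implementation in most environments, many people equate CPython with Python and take the GIL to be a defect of the Python language. So to be clear: the GIL is not a feature of Python, and Python does not have to depend on the GIL.
This article analyzes the effect of the GIL on Python multithreading in depth and is strongly recommended: http://www.dabeaz.com/python/UnderstandingGIL.pdf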
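The Python threading module
I/O operations do not occupy the CPU.
Computation occupies the CPU, for example 1+1.
Python multithreading is not suited to CPU-bound tasks; it is suited to I/O-bound tasks.
Multiprocessing is suited to CPU-bound tasks.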
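A rough way to see why threads suit I/O-bound work is to let several threads wait at the same time. In this sketch (not from the original notes) time.sleep stands in for a blocking I/O call, and fake_io is a made-up name; the waits overlap because a sleeping thread does not hold the GIL.

```python
import threading
import time

def fake_io(delay):
    # time.sleep stands in for a blocking I/O call; the GIL is released while waiting
    time.sleep(delay)

start = time.perf_counter()
threads = [threading.Thread(target=fake_io, args=(0.2,)) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# The five waits overlap, so the total is close to 0.2 s rather than 5 * 0.2 s
print("elapsed: %.2f s" % elapsed)
```

Replacing the sleep with a pure-Python computation would not show the same speed-up in CPython, because only one thread can execute bytecode at a time.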
A thread can be used in two ways, as follows:
Calling it directly
import threading
import time

def sayhi(num):  # the function each thread will run
    print("running on number:%s" % num)
    time.sleep(3)

if __name__ == '__main__':
    t1 = threading.Thread(target=sayhi, args=(1,))  # create a thread instance
    t2 = threading.Thread(target=sayhi, args=(2,))  # create another thread instance
    t1.start()  # start the thread
    t2.start()  # start the other thread
    print(t1.name)  # get the thread name
    print(t2.name)
Subclassing Thread
import threading
import time

class MyThread(threading.Thread):
    def __init__(self, num):
        threading.Thread.__init__(self)
        self.num = num

    def run(self):  # the function each thread will run
        print("running on number:%s" % self.num)
        time.sleep(3)

if __name__ == '__main__':
    t1 = MyThread(1)
    t2 = MyThread(2)
    t1.start()
    t2.start()
Join & Daemon
Some threads do background tasks, like sending keepalive packets, or performing periodic garbage collection, or whatever. These are only useful when the main program is running, and it's okay to kill them off once the other, non-daemon, threads have exited.
Without daemon threads, you'd have to keep track of them, and tell them to exit, before your program can completely quit. By setting them as daemon threads, you can let them run and forget about them, and when your program quits, any daemon threads are killed automatically.
join() waits for a thread to finish. threading.current_thread() returns the current thread object (the main thread when called from the main program).
threading.active_count() returns the number of threads currently alive.
import threading
import time

# function-based version
def run(n):
    print('running task', n)
    time.sleep(2)

# t1 = threading.Thread(target=run, args=('t1',))
# t2 = threading.Thread(target=run, args=('t2',))
start_time = time.time()
t_objects = []  # holds the thread instances
for i in range(50):
    t = threading.Thread(target=run, args=('t%s' % i,))
    # Mark the thread as a daemon; this must be done before start().
    # The program waits for non-daemon threads before exiting, but not for daemon threads.
    t.daemon = True
    t.start()
    t_objects.append(t)
for t in t_objects:
    t.join()
print('---all threads have finished...')
print('cost:', time.time() - start_time)
# t1.start()
# t2.start()

# class-based version
# class MyThread(threading.Thread):
#     def __init__(self, n):
#         super(MyThread, self).__init__()
#         self.n = n
#
#     def run(self):
#         print('running task', self.n)
#
# t1 = MyThread('t1')
# t2 = MyThread('t2')
#
# t1.start()
# t2.start()
# t1.join()  # the lines above run straight through; here we wait for the first thread to finish
Course exercise code
# -*- coding: utf-8 -*-
__author__ = 'Alex Li'

import time
import threading

def run(n):
    print('[%s]------running----\n' % n)
    time.sleep(2)
    print('--done--')

def main():
    for i in range(5):
        t = threading.Thread(target=run, args=[i,])
        t.start()
        t.join(1)
        print('starting thread', t.name)

m = threading.Thread(target=main, args=[])
# Make m a daemon thread. When the program's main thread exits, m exits too, and so do
# the threads m started (they inherit the daemon flag), whether or not they have finished.
m.daemon = True
m.start()
m.join(timeout=2)
print("---main thread done----")
Note: Daemon threads are abruptly stopped at shutdown. Their resources (such as open files, database transactions, etc.) may not be released properly. If you want your threads to stop gracefully, make them non-daemonic and use a suitable signalling mechanism such as an Event.
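One way the note's advice might look in practice is sketched below. It is an illustration under assumptions, not the course's code: stop_requested and ticks are made-up names, and the final append stands in for real cleanup such as closing files.

```python
import threading
import time

stop_requested = threading.Event()  # hypothetical name for the shutdown flag
ticks = []

def worker():
    # Loop until the flag is set; wait() doubles as a sleep that can be interrupted
    while not stop_requested.is_set():
        ticks.append(time.time())
        stop_requested.wait(0.05)
    ticks.append("cleaned up")  # stands in for closing files, committing transactions, etc.

t = threading.Thread(target=worker)  # deliberately not a daemon thread
t.start()
time.sleep(0.2)
stop_requested.set()  # ask the worker to stop
t.join()              # returns once the worker has finished its cleanup
print(ticks[-1])
```

Because the thread is not a daemon, the interpreter waits for it, and the cleanup line always runs before the program exits.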
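Thread locks (mutex)
A process can start several threads, and they all share the memory of the parent process, which means every thread can access the same data. What happens if two threads try to modify the same data at the same time?
import time
import threading

def addNum():
    global num  # every thread works on this global variable
    print('--get num:', num)
    time.sleep(1)
    num -= 1  # decrement the shared variable

num = 100  # a shared variable
thread_list = []
for i in range(100):
    t = threading.Thread(target=addNum)
    t.start()
    thread_list.append(t)

for t in thread_list:  # wait for all threads to finish
    t.join()

print('final num:', num)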
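Normally the final num should be 0, but if you run this several times on Python 2.7 you will find that it is not always 0. Why does the result differ from run to run? Suppose threads A and B both want to decrement num. Because the two threads run concurrently, both may read the same value, say num=100, before either has written its result back. A computes 99 and B also computes 99, so after both assign their result num is 99 rather than 98. What can be done? Before modifying shared data, each thread takes a lock on it, so that any other thread that wants to modify the data must wait until the change is complete and the lock has been released.
Note: on Python 3.x this example almost always prints 0. That is not because a lock is added automatically. num -= 1 is still not atomic, but since Python 3.2 the interpreter switches threads on a time interval (5 ms by default) instead of after a fixed number of bytecodes, so the race is much harder to trigger. The lock is still needed.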
Version with a lock
import time
import threading

def addNum():
    global num  # every thread works on this global variable
    print('--get num:', num)
    time.sleep(1)
    lock.acquire()  # take the lock before modifying the data
    num -= 1  # decrement the shared variable
    lock.release()  # release it afterwards

num = 100  # a shared variable
thread_list = []
lock = threading.Lock()  # create a global lock
for i in range(100):
    t = threading.Thread(target=addNum)
    t.start()
    thread_list.append(t)

for t in thread_list:  # wait for all threads to finish
    t.join()

print('final num:', num)
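The acquire()/release() pair can also be written with a with statement, which releases the lock even if the protected code raises an exception. The sketch below is an illustration only; counter and add_many are made-up names.

```python
import threading

counter = 0
lock = threading.Lock()

def add_many(n):
    global counter
    for _ in range(n):
        # "with lock" acquires on entry and always releases on exit
        with lock:
            counter += 1

threads = [threading.Thread(target=add_many, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000
```

With the lock in place the total is always 4 * 10000, whatever the interleaving of the threads.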
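Another example:
import threading
import time

NUM = 10

def func(l):
    global NUM
    # acquire the lock
    l.acquire()
    NUM -= 1
    time.sleep(2)
    print(NUM)
    # release the lock
    l.release()

# lock = threading.Lock()  # can be acquired only once
lock = threading.RLock()  # can be acquired once or several times by the same thread

for i in range(10):
    t = threading.Thread(target=func, args=(lock,))
    t.start()

Without the lock, all ten threads decrement NUM before any of them wakes from the 2-second sleep, so every thread prints 0.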
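GIL vs. Lock
A sharp reader may ask: you said earlier that Python already has the GIL to make sure only one thread executes at a time, so why do we still need a lock here? Note that the lock here is a user-level lock and has nothing to do with the GIL. The GIL only guarantees that one thread executes bytecode at a time; a thread can still be switched out between reading num and writing it back. The accompanying figure, together with my explanation in class, walks through the details.
Then you may ask: since user programs have their own locks, why does CPython still need the GIL? The GIL was added mainly to reduce the complexity of development. For example, when you write Python you do not have to worry about reclaiming memory, because the interpreter does it for you automatically. You can picture the interpreter as having its own thread that wakes up periodically and scans for memory that can be freed. That thread runs concurrently with the threads of your program. Suppose one of your threads deletes a variable. While the interpreter's collector is in the middle of clearing it, another thread may assign a new value to the memory that has not yet been cleared, and the newly assigned data may then be deleted. To solve problems of this kind, the interpreter simply and bluntly takes a lock: while one thread runs, nobody else may touch anything. This can be seen as a legacy of early Python versions. (This picture is a simplification: CPython has no separate collector thread. It reclaims most objects through reference counting, and the reference counts are among the interpreter internals that the GIL protects.)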
RLock (re-entrant lock)
Put simply, when an outer lock contains nested locks, that is, when the same thread acquires the lock several times in a row, you must use a re-entrant lock.
import threading, time

def run1():
    print("grab the first part data")
    lock.acquire()
    global num
    num += 1
    lock.release()
    return num

def run2():
    print("grab the second part data")
    lock.acquire()
    global num2
    num2 += 1
    lock.release()
    return num2

def run3():
    lock.acquire()
    res = run1()
    print('--------between run1 and run2-----')
    res2 = run2()
    lock.release()
    print(res, res2)

if __name__ == '__main__':
    num, num2 = 0, 0
    lock = threading.RLock()
    for i in range(10):
        t = threading.Thread(target=run3)
        t.start()

    while threading.active_count() != 1:
        print(threading.active_count())
    else:
        print('----all threads done---')
        print(num, num2)
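Lock can be acquired only once.
RLock can be acquired once or several times by the thread that holds it, and must be released the same number of times.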
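The difference between the two lock types can be checked directly. In this small sketch (names such as plain and reentrant are made up), blocking=False is used so that the second acquire reports failure instead of hanging forever.

```python
import threading

plain = threading.Lock()
reentrant = threading.RLock()

plain.acquire()
# A second blocking acquire by the same thread would deadlock; blocking=False just reports it
second_plain = plain.acquire(blocking=False)
plain.release()

reentrant.acquire()
second_reentrant = reentrant.acquire(blocking=False)  # same thread, so this succeeds
reentrant.release()
reentrant.release()  # one release per acquire

print(second_plain, second_reentrant)  # False True
```

This is why run3 above, which holds the lock while calling run1 and run2, only works with RLock.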
Semaphore
A mutex lets only one thread modify data at a time, whereas a Semaphore lets a fixed number of threads do so at the same time. For example, if a restroom has 3 stalls, at most 3 people can use it at once; everyone else waits until someone comes out.
import threading, time

def run(n):
    semaphore.acquire()
    time.sleep(1)
    print("run the thread: %s\n" % n)
    semaphore.release()

if __name__ == '__main__':
    num = 0
    semaphore = threading.BoundedSemaphore(5)  # at most 5 threads run at the same time
    for i in range(20):
        t = threading.Thread(target=run, args=(i,))
        t.start()

    while threading.active_count() != 1:
        pass  # print(threading.active_count())
    else:
        print('----all threads done---')
        print(num)
import threading, time

def run(n):
    semaphore.acquire()
    time.sleep(1)
    print('run the thread:%s\n' % n)
    semaphore.release()

if __name__ == '__main__':
    semaphore = threading.BoundedSemaphore(5)
    for i in range(22):
        t = threading.Thread(target=run, args=(i,))
        t.start()

    while threading.active_count() != 1:
        pass  # print(threading.active_count()); do nothing while the count is not 1
    else:
        print('----all threads done-------')
Class example
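To check that a semaphore really caps concurrency, one can record how many threads are inside the guarded section at once. The sketch below is an illustration only; slots, state_lock, running, and peak are made-up names.

```python
import threading
import time

slots = threading.BoundedSemaphore(3)  # three "stalls"
state_lock = threading.Lock()          # protects the two counters below
running = 0
peak = 0

def visit():
    global running, peak
    with slots:  # blocks while 3 threads are already inside
        with state_lock:
            running += 1
            peak = max(peak, running)
        time.sleep(0.05)
        with state_lock:
            running -= 1

threads = [threading.Thread(target=visit) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("peak concurrency:", peak)
```

The recorded peak never exceeds the value passed to BoundedSemaphore, even though ten threads were started.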
import time, threading

NUM = 10

def func(a, l):
    global NUM
    # acquire
    l.acquire()
    NUM -= 1
    time.sleep(2)
    print(NUM, a)
    # release
    l.release()

# lock = threading.Lock()  # can be acquired only once
# lock = threading.RLock()  # can be acquired once or several times by the same thread
lock = threading.BoundedSemaphore(5)  # semaphore: lets in at most 5 threads at a time

for i in range(10):
    t = threading.Thread(target=func, args=(i, lock,))
    t.start()
Timer
A timer runs an action after n seconds.
This class represents an action that should be run only after a certain amount of time has passed.
Timers are started, as with threads, by calling their start() method. The timer can be stopped (before its action has begun) by calling the cancel() method. The interval the timer will wait before executing its action may not be exactly the same as the interval specified by the user.
from threading import Timer

def hello():
    print("hello, world")

t = Timer(30.0, hello)
t.start()  # after 30 seconds, "hello, world" will be printed
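A short sketch of cancel(), using two timers with the same delay; fired and record are made-up names. Timer is a subclass of Thread, so join() can be used to wait for it.

```python
import threading

fired = []

def record(tag):
    fired.append(tag)

kept = threading.Timer(0.1, record, args=("kept",))
cancelled = threading.Timer(0.1, record, args=("cancelled",))
kept.start()
cancelled.start()
cancelled.cancel()  # stop it before the interval has elapsed
kept.join()         # wait for the first timer's action to run
cancelled.join()    # a cancelled timer thread ends without calling its action

print(fired)  # ['kept']
```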
Event
All waiting threads are released together, like cars at a traffic light.
An event is a simple synchronization object; the event represents an internal flag, and threads can wait for the flag to be set, or set or clear the flag themselves.
event = threading.Event()
# a client thread can wait for the flag to be set
event.wait()
# a server thread can set or reset it
event.set()
event.clear()
If the flag is set, the wait method doesn’t do anything.
If the flag is cleared, wait will block until it becomes set again.
Any number of threads may wait for the same event.
An Event lets two or more threads interact. Below is a traffic-light example: one thread acts as the traffic light and several threads act as cars, which stop on red and go on green.
import threading, time
import random

def light():
    if not event.is_set():
        event.set()  # wait() will not block: the light is green
    count = 0
    while True:
        if count < 10:
            print('\033[42;1m--green light on---\033[0m')
        elif count < 13:
            print('\033[43;1m--yellow light on---\033[0m')
        elif count < 20:
            if event.is_set():
                event.clear()
            print('\033[41;1m--red light on---\033[0m')
        else:
            count = 0
            event.set()  # turn the green light on
        time.sleep(1)
        count += 1

def car(n):
    while 1:
        time.sleep(random.randrange(10))
        if event.is_set():  # green light
            print("car [%s] is running.." % n)
        else:
            print("car [%s] is waiting for the red light.." % n)

if __name__ == '__main__':
    event = threading.Event()
    Light = threading.Thread(target=light)
    Light.start()
    for i in range(3):
        t = threading.Thread(target=car, args=(i,))
        t.start()
# traffic light
import threading, time

event = threading.Event()

def lighter():
    count = 0
    event.set()  # start with a green light
    while True:
        if count > 5 and count < 10:  # switch to red
            event.clear()  # clear the flag so that wait() blocks
            print('\033[41;1mred light is on...\033[0m')
        elif count > 10:  # switch back to green
            event.set()  # set the flag: green light
            count = 0  # restart the count
        else:
            print('\033[42;1mgreen light is on...\033[0m')
        time.sleep(1)
        count += 1

def car(name):
    while True:
        if event.is_set():  # green light
            print('[%s] running' % name)
            time.sleep(1)
        else:
            print('[%s] red light,waiting...' % name)
            event.wait()
            print('\033[34;1m[%s] green light is on,start...\033[0m' % name)

light = threading.Thread(target=lighter,)
light.start()

car1 = threading.Thread(target=car, args=('tesla',))
car1.start()
Class example
Here is another Event example. Employees must swipe a card to enter the office. One thread is the "door" and several threads are "staff". When a staff member sees that the door is closed, they swipe the card; the door then opens and they can pass.
# -*- coding: utf-8 -*-
__author__ = 'Alex Li'
import threading
import time
import random

def door():
    door_open_time_counter = 0
    while True:
        if door_swiping_event.is_set():
            print("\033[32;1mdoor opening....\033[0m")
            door_open_time_counter += 1
        else:
            print("\033[31;1mdoor closed...., swipe to open.\033[0m")
            door_open_time_counter = 0  # reset the timer
            door_swiping_event.wait()
        if door_open_time_counter > 3:  # the door has been open for 3 s; time to close it
            door_swiping_event.clear()
        time.sleep(0.5)

def staff(n):
    print("staff [%s] is coming..." % n)
    while True:
        if door_swiping_event.is_set():
            print("\033[34;1mdoor is opened, passing.....\033[0m")
            break
        else:
            print("staff [%s] sees door got closed, swiping the card....." % n)
            print(door_swiping_event.set())  # set() returns None
            door_swiping_event.set()
            print("after set ", door_swiping_event.set())
        time.sleep(0.5)

door_swiping_event = threading.Event()  # create the event

door_thread = threading.Thread(target=door)
door_thread.start()

for i in range(5):
    p = threading.Thread(target=staff, args=(i,))
    time.sleep(random.randrange(3))
    p.start()
Another example:
import threading

def func(i, e):
    print(i)
    e.wait()
    print(i + 100)

event = threading.Event()

for i in range(10):
    t = threading.Thread(target=func, args=(i, event))
    t.start()

event.clear()  # the threads stop at wait()
inp = input('>>>')
if inp == '1':
    event.set()  # let the stopped threads continue

Output:
C:\Python35\python3.exe E:/python34foexam/threading_test.py
0
1
2
3
4
5
6
7
8
9
>>>1
102
108
105
100
104
107
103
109
101
106

Process finished with exit code 0
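An Event starts with its flag cleared, so wait() blocks by default. When execution reaches wait(), the flag is checked: if it is cleared the thread stops, otherwise it continues.
event.clear() switches the light to red, that is, blocks.
event.set() switches the light to green, that is, lets threads through.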
Condition
Makes threads wait; n threads are released only when some condition is met.
import threading

def run(n):
    con.acquire()
    con.wait()  # every thread blocks here
    print("run the thread: %s" % n)
    con.release()

if __name__ == '__main__':
    con = threading.Condition()
    for i in range(10):
        t = threading.Thread(target=run, args=(i,))
        t.start()

    while True:
        inp = input('>>>')
        if inp == 'q':
            break
        con.acquire()
        con.notify(int(inp))  # release as many threads as the number entered
        con.release()

Output:
C:\Python35\python3.exe E:/python34foexam/threading_test.py
0
1
2
3
4
5
6
7
8
9
>>>:1
>>>:100
2
>>>:102
101
import threading

def condition_func():
    ret = False
    inp = input('>>>')
    if inp == '':
        ret = True
    return ret

def run(n):
    con.acquire()
    con.wait_for(condition_func)
    print("run the thread: %s" % n)
    con.release()

if __name__ == '__main__':
    con = threading.Condition()
    for i in range(10):
        t = threading.Thread(target=run, args=(i,))
        t.start()
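The two Condition examples above rely on keyboard input. A variant that runs without input is sketched below, assuming a made-up shared counter called tickets: each waiter uses wait_for with a predicate, and the main thread releases a chosen number of waiters by changing the counter and calling notify_all().

```python
import threading
import time

con = threading.Condition()
tickets = 0    # hypothetical shared state guarded by the condition
released = []

def waiter(n):
    global tickets
    with con:
        # wait_for re-checks the predicate every time the thread is notified
        con.wait_for(lambda: tickets > 0)
        tickets -= 1
        released.append(n)

threads = [threading.Thread(target=waiter, args=(i,)) for i in range(5)]
for t in threads:
    t.start()

with con:
    tickets = 2        # let exactly two waiters through
    con.notify_all()

deadline = time.time() + 2
while len(released) < 2 and time.time() < deadline:
    time.sleep(0.01)
first_batch = len(released)

with con:
    tickets = 3        # release the remaining three
    con.notify_all()
for t in threads:
    t.join()

print(first_batch, len(released))  # 2 5
```

Using a predicate also protects against a waiter that starts late: it checks the counter before it ever blocks.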
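A custom thread pool:
Limit the number of threads in order to get the best performance.
The pool is a container that holds a fixed number of threads (the maximum). Taking one leaves one fewer; when none are left, the caller waits; when a thread finishes its work, it is handed back.
# simple version
import queue
import threading
import time

class ThreadPool(object):
    def __init__(self, maxsize=5):  # at most 5 threads
        self.maxsize = maxsize
        self._q = queue.Queue(maxsize)
        for i in range(maxsize):
            self._q.put(threading.Thread)  # put the Thread class into the queue

    def get_thread(self):
        """Get a thread"""
        return self._q.get()

    def add_thread(self):
        """Put a slot back into the queue once a task has finished"""
        self._q.put(threading.Thread)

pool = ThreadPool(5)  # at most 5 threads

def task(arg, p):
    print(arg)
    time.sleep(1)  # at most maxsize tasks (5 here) run at the same time
    p.add_thread()

for i in range(100):
    t = pool.get_thread()  # t is the threading.Thread class
    obj = t(target=task, args=(i, pool))
    obj.start()
Naive version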
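In the version above, a thread that has finished becomes garbage: the original threads are not reused, and new ones are created instead.
Also, if fewer than 5 threads are actually needed, preparing 5 is unnecessary.
In the advanced thread pool the queue holds tasks, not threads. Each task's function and arguments are stored as a tuple and put into the queue as one element, so the queue contains nothing but tasks.
Each worker then takes tasks in a while True loop and runs them, and separate lists keep track of the threads that have been created.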
import queue
import threading
import contextlib
import time

StopEvent = object()  # sentinel placed in the queue to tell a worker thread to stop

class ThreadPool(object):

    def __init__(self, max_num, max_task_num=None):
        if max_task_num:
            self.q = queue.Queue(max_task_num)  # the queue that holds tasks
        else:
            self.q = queue.Queue()
        self.max_num = max_num
        self.cancel = False
        self.terminal = False
        self.generate_list = []  # threads that have been created
        self.free_list = []  # threads that are currently idle

    def run(self, func, args, callback=None):
        """
        Run one task in the pool
        :param func: the task function
        :param args: arguments for the task function
        :param callback: called after the task succeeds or fails, with two arguments: 1. the task's status; 2. the task's return value (default None: no callback)
        :return: None (the task is dropped if the pool has been closed)
        """
        if self.cancel:
            return
        # no idle thread and fewer than max_num threads created: start a new thread
        if len(self.free_list) == 0 and len(self.generate_list) < self.max_num:
            self.generate_thread()
        w = (func, args, callback,)  # pack the task into a tuple and put it in the queue
        self.q.put(w)

    def generate_thread(self):
        """
        Create one thread
        """
        t = threading.Thread(target=self.call)
        t.start()

    def call(self):
        """
        Loop: fetch a task from the queue and run it
        """
        current_thread = threading.current_thread()
        self.generate_list.append(current_thread)

        event = self.q.get()
        while event != StopEvent:
            func, arguments, callback = event  # unpack the task
            try:
                result = func(*arguments)
                success = True
            except Exception as e:
                success = False
                result = None

            if callback is not None:
                try:
                    callback(success, result)
                except Exception as e:
                    pass

            with self.worker_state(self.free_list, current_thread):
                if self.terminal:
                    event = StopEvent
                else:
                    event = self.q.get()
        else:
            self.generate_list.remove(current_thread)

    def close(self):
        """
        Stop all threads once every queued task has run
        """
        self.cancel = True
        full_size = len(self.generate_list)
        while full_size:
            self.q.put(StopEvent)
            full_size -= 1

    def terminate(self):
        """
        Stop the threads whether or not tasks remain
        """
        self.terminal = True

        while self.generate_list:
            self.q.put(StopEvent)

        self.q.queue.clear()

    @contextlib.contextmanager
    def worker_state(self, state_list, worker_thread):
        """
        Record the thread as idle while it waits for the next task
        """
        state_list.append(worker_thread)
        try:
            yield
        finally:
            state_list.remove(worker_thread)

# How to use

pool = ThreadPool(5)

def callback(status, result):
    # status, execute action status
    # result, execute action return value
    pass

def action(i):
    print(i)

for i in range(30):  # 30 tasks
    ret = pool.run(action, (i,), callback)

time.sleep(5)
print(len(pool.generate_list), len(pool.free_list))
print(len(pool.generate_list), len(pool.free_list))
# pool.close()
# pool.terminate()
Advanced version
Reference: http://www.cnblogs.com/wupeiqi/articles/4839959.html
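For comparison, the standard library already ships a pool that queues tasks and reuses worker threads: concurrent.futures.ThreadPoolExecutor. The sketch below is not the course's implementation, only an illustration of the same idea; action here is a made-up task that returns a square and the name of the worker that ran it.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

def action(i):
    # Return the result together with the name of the worker thread that ran it
    return i * i, threading.current_thread().name

with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(action, range(30)))  # results come back in input order

squares = [r[0] for r in results]
workers = {r[1] for r in results}
print(squares[:5], len(workers))
```

Thirty tasks are handled by at most five threads, which is the reuse the naive version above lacks.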
The queue module
Decoupling: it keeps the parts of a program loosely coupled.
It improves processing efficiency.
queue is especially useful in threaded programming when information must be exchanged safely between multiple threads.
- class queue.Queue(maxsize=0)  # first in, first out
- class queue.LifoQueue(maxsize=0)  # last in, first out
- class queue.PriorityQueue(maxsize=0)  # a queue whose items are stored with a priority
  Constructor for a priority queue. maxsize is an integer that sets the upperbound limit on the number of items that can be placed in the queue. Insertion will block once this size has been reached, until queue items are consumed. If maxsize is less than or equal to zero, the queue size is infinite.
  The lowest valued entries are retrieved first (the lowest valued entry is the one returned by sorted(list(entries))[0]). A typical pattern for entries is a tuple in the form: (priority_number, data).
- exception queue.Empty
  Exception raised when non-blocking get() (or get_nowait()) is called on a Queue object which is empty.
- exception queue.Full
  Exception raised when non-blocking put() (or put_nowait()) is called on a Queue object which is full.
- Queue.qsize()
- Queue.empty()  # return True if empty
- Queue.full()  # return True if full
- Queue.put(item, block=True, timeout=None)
  Put item into the queue. If optional args block is true and timeout is None (the default), block if necessary until a free slot is available. If timeout is a positive number, it blocks at most timeout seconds and raises the Full exception if no free slot was available within that time. Otherwise (block is false), put an item on the queue if a free slot is immediately available, else raise the Full exception (timeout is ignored in that case).
- Queue.put_nowait(item)
  Equivalent to put(item, False).
- Queue.get(block=True, timeout=None)
  Remove and return an item from the queue. If optional args block is true and timeout is None (the default), block if necessary until an item is available. If timeout is a positive number, it blocks at most timeout seconds and raises the Empty exception if no item was available within that time. Otherwise (block is false), return an item if one is immediately available, else raise the Empty exception (timeout is ignored in that case).
- Queue.get_nowait()
  Equivalent to get(False).
Two methods are offered to support tracking whether enqueued tasks have been fully processed by daemon consumer threads.
- Queue.task_done()
  Indicate that a formerly enqueued task is complete. Used by queue consumer threads. For each get() used to fetch a task, a subsequent call to task_done() tells the queue that the processing on the task is complete.
  If a join() is currently blocking, it will resume when all items have been processed (meaning that a task_done() call was received for every item that had been put() into the queue).
  Raises a ValueError if called more times than there were items placed in the queue.
- Queue.join()  # blocks until every item in the queue has been consumed
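The non-blocking calls and the task_done()/join() pair described above can be tried on a queue of size 2. This is a small sketch; the variable names overflow and underflow are made up.

```python
import queue

q = queue.Queue(maxsize=2)
q.put("a")
q.put("b")

try:
    q.put_nowait("c")  # the queue already holds maxsize items
except queue.Full:
    overflow = True
else:
    overflow = False

first = q.get()   # FIFO: "a" comes out first
q.task_done()
second = q.get()
q.task_done()

try:
    q.get_nowait()  # nothing is left
except queue.Empty:
    underflow = True
else:
    underflow = False

q.join()  # returns at once: every get() was followed by task_done()
print(first, second, overflow, underflow)
```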
import queue

q = queue.PriorityQueue()

q.put((-1, 'chenronghua'))
q.put((3, 'hanyang'))
q.put((10, 'alex'))
q.put((6, 'wangsen'))

print(q.get())
print(q.get())
print(q.get())
print(q.get())

The items come out sorted:
(-1, 'chenronghua')
(3, 'hanyang')
(6, 'wangsen')
(10, 'alex')
Priority queue
Other examples:
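import queue
# FIFO queue
# put() adds an item; options: whether to block, and the timeout while blocked
# get() removes an item; it also blocks by default
# maximum queue length
# other methods: .empty(), .full(), .qsize()
# .join(), .task_done()  # join() blocks until every task has been marked done
q = queue.Queue(5)  # holds at most 5 items
# q.put(11)
# q.put(22)
# print(q.qsize())
# q.put(33, block=False, timeout=2)  # block=False: do not wait; raise queue.Full at once if there is no free slot (timeout is then ignored). With block=True, timeout=2 waits up to 2 seconds for a free slot before raising.
#
# print(q.qsize())
# print(q.get())
# print(q.get(block=False, timeout=2))  # block=False: raise queue.Empty at once if the queue is empty; with block=True, timeout=2 waits up to 2 seconds.

q.put(123)
q.put(456)

q.get()
q.task_done()

q.get()
q.task_done()  # tell the queue that the task is done

q.join()  # waits while tasks taken from the queue are unfinished; after handling an item, tell the queue with task_done()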
FIFO queue: queue.Queue
LIFO queue: queue.LifoQueue
import queue

q = queue.LifoQueue()
q.put(123)
q.put(456)
print(q.get())

C:\Python35\python3.exe E:/python2.7pro/test_que.py
456
Priority queue: stores each item together with a priority, queue.PriorityQueue
import queue
q = queue.PriorityQueue()
q.put((1, 'alex1'))
q.put((0, 'alex2'))
q.put((2, 'alex3'))

print(q.get())

C:\Python35\python3.exe E:/python2.7pro/test_que.py
(0, 'alex2')
Double-ended queue: items can be added and removed at both ends, collections.deque
import collections
q = collections.deque()
q.append(123)
q.append(333)
q.appendleft(456)  # insert on the left
q.pop()
q.popleft()
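The same operations, with the state of the deque traced in comments and the removed values kept so they can be printed. This is an illustrative sketch; the names right and left are made up.

```python
from collections import deque

d = deque()
d.append(123)      # right end: [123]
d.append(333)      # [123, 333]
d.appendleft(456)  # left end: [456, 123, 333]

right = d.pop()      # removes 333 from the right
left = d.popleft()   # removes 456 from the left
print(right, left, list(d))  # 333 456 [123]
```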
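The producer-consumer model
In concurrent programming, the producer-consumer pattern solves a large share of concurrency problems. It raises the overall data-processing speed of a program by balancing the work of producer threads and consumer threads.
Why use the producer-consumer pattern
In the world of threads, a producer is a thread that produces data and a consumer is a thread that consumes data. In multithreaded development, if the producer is fast and the consumer is slow, the producer has to wait for the consumer to finish before it can produce more data. By the same reasoning, if the consumer is faster than the producer, the consumer has to wait for the producer. The producer-consumer pattern was introduced to solve this problem.
What is the producer-consumer pattern
The producer-consumer pattern uses a container to remove the tight coupling between producer and consumer. The two do not talk to each other directly; they communicate through a blocking queue. After producing data the producer does not wait for the consumer to process it but drops it into the blocking queue, and the consumer does not ask the producer for data but takes it from the blocking queue. The blocking queue acts as a buffer that balances the processing capacity of producer and consumer.
Here is a very basic example of the producer-consumer model.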
import threading
import queue

def producer():
    for i in range(10):
        q.put("bone %s" % i)

    print("waiting for all the bones to be taken...")
    q.join()
    print("all the bones have been taken...")

def consumer(n):
    while q.qsize() > 0:
        print("%s got" % n, q.get())
        q.task_done()  # report that this task is finished

q = queue.Queue()

p = threading.Thread(target=producer,)
p.start()

c1 = consumer("Li Chuang")  # runs in the main thread; assumes the producer has already queued its items
import time, random
import queue, threading

q = queue.Queue()

def Producer(name):
    count = 0
    while count < 20:
        time.sleep(random.randrange(3))
        q.put(count)
        print('Producer %s has produced %s baozi..' % (name, count))
        count += 1

def Consumer(name):
    count = 0
    while count < 20:
        time.sleep(random.randrange(4))
        if not q.empty():
            data = q.get()
            print(data)
            print('\033[32;1mConsumer %s has eat %s baozi...\033[0m' % (name, data))
        else:
            print("-----no baozi anymore----")
        count += 1

p1 = threading.Thread(target=Producer, args=('A',))
c1 = threading.Thread(target=Consumer, args=('B',))
p1.start()
c1.start()
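Both examples above poll the queue with qsize() or empty(). Another common arrangement, sketched here under assumptions rather than taken from the course, lets get() block and uses a sentinel object to tell the consumer when to stop. DONE and eaten are made-up names.

```python
import queue
import threading

q = queue.Queue(maxsize=3)  # small buffer, so the producer blocks when it gets ahead
DONE = object()             # hypothetical sentinel that tells the consumer to stop
eaten = []

def producer():
    for i in range(10):
        q.put(i)            # blocks while the buffer is full
    q.put(DONE)

def consumer():
    while True:
        item = q.get()      # blocks while the buffer is empty
        if item is DONE:
            break
        eaten.append(item)

p = threading.Thread(target=producer)
c = threading.Thread(target=consumer)
p.start()
c.start()
p.join()
c.join()
print(eaten)
```

Neither side ever polls: the bounded queue slows down whichever thread is faster, which is the balancing role described above.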
Multiprocessing (multiprocessing)
multiprocessing is a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads. Due to this, the multiprocessing module allows the programmer to fully leverage multiple processors on a given machine. It runs on both Unix and Windows.
from multiprocessing import Process
import time

def f(name):
    time.sleep(2)
    print('hello', name)

if __name__ == '__main__':  # on Windows the code must go under this guard; use multiprocessing with care on Windows
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()
To show the individual process IDs involved, here is an expanded example:
from multiprocessing import Process
import os

def info(title):
    print(title)
    print('module name:', __name__)
    print('parent process:', os.getppid())  # ppid: the parent's process ID
    print('process id:', os.getpid())  # pid: this process's own ID
    print("\n\n")

def f(name):
    info('\033[31;1mfunction f\033[0m')
    print('hello', name)

if __name__ == '__main__':
    info('\033[32;1mmain process line\033[0m')
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()
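Sharing data between processes is the key topic here.
Inter-process communication
Processes do not share memory. To exchange data between two processes, use one of the methods below. First, an example showing that an ordinary list is not shared:
#!/usr/bin/env python
# coding: utf-8

from multiprocessing import Process
from multiprocessing import Manager

import time

li = []

def foo(i):
    li.append(i)
    print('say hi', li)

for i in range(10):
    p = Process(target=foo, args=(i,))
    p.start()

print('ending', li)
By default, processes cannot share data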
# Method 1: Array
from multiprocessing import Process, Array
temp = Array('i', [11, 22, 33, 44])

def Foo(i):
    temp[i] = 100 + i
    for item in temp:
        print(i, '----->', item)

for i in range(2):
    p = Process(target=Foo, args=(i,))
    p.start()

# Method 2: share data with manage.dict()
from multiprocessing import Process, Manager

manage = Manager()
dic = manage.dict()

def Foo(i):
    dic[i] = 100 + i
    print(dic.values())

for i in range(2):
    p = Process(target=Foo, args=(i,))
    p.start()
    p.join()
'c': ctypes.c_char, 'u': ctypes.c_wchar,
'b': ctypes.c_byte, 'B': ctypes.c_ubyte,
'h': ctypes.c_short, 'H': ctypes.c_ushort,
'i': ctypes.c_int, 'I': ctypes.c_uint,
'l': ctypes.c_long, 'L': ctypes.c_ulong,
'f': ctypes.c_float, 'd': ctypes.c_double
Type code table
from multiprocessing import Process, Queue

def f(i, q):
    print(i, q.get())

if __name__ == '__main__':
    q = Queue()

    q.put("h1")
    q.put("h2")
    q.put("h3")

    for i in range(10):  # only 3 items are queued, so 7 of the 10 processes stay blocked in q.get()
        p = Process(target=f, args=(i, q,))
        p.start()
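Shared objects are handed to the child process when the process is created (not when they are first used), and the changes the child makes are then visible through the original object.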
#!/usr/bin/env python
# -*- coding:utf-8 -*-

from multiprocessing import Process, Array, RLock

def Foo(lock, temp, i):
    """
    Set element 0 to 100 + i
    """
    lock.acquire()
    temp[0] = 100 + i
    for item in temp:
        print(i, '----->', item)
    lock.release()

lock = RLock()
temp = Array('i', [11, 22, 33, 44])

for i in range(20):
    p = Process(target=Foo, args=(lock, temp, i,))
    p.start()
Process lock example
Queues
Used in much the same way as the queue module that is used with threading.
from multiprocessing import Process, Queue

def f(q):
    q.put([42, None, 'hello'])

if __name__ == '__main__':
    q = Queue()
    p = Process(target=f, args=(q,))
    p.start()
    print(q.get())  # prints "[42, None, 'hello']"
    p.join()
Pipes
The Pipe() function returns a pair of connection objects connected by a pipe which by default is duplex (two-way). For example:
from multiprocessing import Process, Pipe

def f(conn):
    conn.send([42, None, 'hello'])
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()
    p = Process(target=f, args=(child_conn,))
    p.start()
    print(parent_conn.recv())  # prints "[42, None, 'hello']"
    p.join()
The two connection objects returned by Pipe() represent the two ends of the pipe. Each connection object has send() and recv() methods (among others). Note that data in a pipe may become corrupted if two processes (or threads) try to read from or write to the same end of the pipe at the same time. Of course there is no risk of corruption from processes using different ends of the pipe at the same time.
A manager object returned by Manager() controls a server process which holds Python objects and allows other processes to manipulate them using proxies.
A manager returned by Manager() will support types list, dict, Namespace, Lock, RLock, Semaphore, BoundedSemaphore, Condition, Event, Barrier, Queue, Value and Array. For example,
from multiprocessing import Process, Manager

def f(d, l):
    d[1] = ''
    d[''] = 2
    d[0.25] = None
    l.append(1)
    print(l)

if __name__ == '__main__':
    with Manager() as manager:
        d = manager.dict()

        l = manager.list(range(5))
        p_list = []
        for i in range(10):
            p = Process(target=f, args=(d, l))
            p.start()
            p_list.append(p)
        for res in p_list:
            res.join()

        print(d)
        print(l)
Process synchronization
Without using the lock, output from the different processes is liable to get all mixed up.
The screen (standard output) is a resource that the processes compete for.
from multiprocessing import Process, Lock

def f(l, i):
    l.acquire()
    try:
        print('hello world', i)
    finally:
        l.release()

if __name__ == '__main__':
    lock = Lock()

    for num in range(10):
        Process(target=f, args=(lock, num)).start()
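Process locks:
Lock, RLock, and so on work the same way as the thread locks.
Process pool
A process pool maintains a set of processes internally. When a process is needed, one is taken from the pool; if none is available, the program waits until one becomes free.
Two of the pool's methods are:
- apply
- apply_async  # run asynchronously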
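from multiprocessing import Process, Pool
import time

def Foo(i):
    time.sleep(2)
    return i + 100

def Bar(arg):
    print('-->exec done:', arg)

pool = Pool(5)

for i in range(10):
    pool.apply_async(func=Foo, args=(i,), callback=Bar)  # runs in parallel; callback is called with Foo's result after Foo finishes
    # pool.apply(func=Foo, args=(i,))  # runs serially

print('end')
pool.close()
pool.join()  # wait for the processes in the pool to finish before closing; if this line is commented out, the program exits right away
The code above works on Linux. The Windows version of the multiprocessing example follows.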
from multiprocessing import Process, Pool, freeze_support  # import freeze_support for multiprocessing on Windows
import time, os

def Foo(i):
    time.sleep(2)
    print('in process', os.getpid())
    return i + 100

def Bar(arg):
    print('-->exec done:', arg)

if __name__ == '__main__':  # this line is required for multiprocessing on Windows
    # freeze_support()
    pool = Pool(5)
    for i in range(10):
        # pool.apply_async(func=Foo, args=(i,), callback=Bar)
        pool.apply(func=Foo, args=(i,))

    print('end')
    pool.close()
    pool.join()
Windows code
if __name__ == '__main__': the block runs when the script is executed directly; it does not run when another module imports the file.
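Summary:
1. Threads
Basic usage
Thread locks
Custom thread pool
Producer-consumer model (queues)
2. Processes
Basic usage
Process locks
Sharing data between processes
Data is not shared by default
queues
array
Manager.dict
Process pool
PS: I/O-bound work: multithreading (for example, web crawlers)
CPU-bound work: multiprocessing
3. Coroutines
Principle: within a single thread, the work is split into multiple micro-threads.
http://www.cnblogs.com/ld1977/p/6352256.html
They are created by the programmer rather than by the operating system.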