Python multiprocessing with Pool fails on Ubuntu on AWS (Pool, multiprocessing, AWS, Ubuntu)


I have a simple string matching script that tests just fine for multiprocessing with up to 8 Pool workers on my local mac with 4 cores. However, the same script on an AWS c1.xlarge with 8 cores generally kills all but 2 workers, the CPU only works at 25%, and after a few rounds stops with MemoryError.

I'm not too familiar with server configuration, so I'm wondering if there are any settings to tweak?

The pool implementation looks as follows, but doesn't seem to be the issue as it works locally. There would be several thousand targets per worker, and it doesn't run past the first five or so. Happy to share more of the code if necessary.

from multiprocessing import Pool
import itertools

# Python 2 code: izip and xrange would be zip and range in Python 3.
pool = Pool(processes=numProcesses)
totalTargets = len(getTargets('all'))
targetsPerBatch = totalTargets / numProcesses
# Each worker receives a (batchSize, startIndex) tuple.
pool.map_async(runMatch, itertools.izip(itertools.repeat(targetsPerBatch), xrange(0, totalTargets, targetsPerBatch))).get(99999999)
pool.close()
pool.join()

Solution

The MemoryError means you're running out of system-wide virtual memory. How much virtual memory you have is an abstract thing, based on the actual physical RAM plus swapfile size plus stuff that's paged into memory from other files and stuff that isn't paged anywhere because the OS is being clever and so on.

According to your comments, each process averages 0.75GB of real memory, and 4GB of virtual memory. So, your total VM usage is 32GB.

One common reason for this is that each process might peak at 4GB, but spend almost all of its time using a lot less than that. Python rarely releases memory to the OS; it'll just get paged out.

Anyway, 6GB of real memory is no problem on an 8GB Mac or a 7GB c1.xlarge instance.

And 32GB of VM is no problem on a Mac. A typical OS X system has virtually unlimited VM size—if you actually try to use all of it, it'll start creating more swap space automatically, paging like mad, and slowing your system to a crawl and/or running out of disk space, but that isn't going to affect you in this case.

But 32GB of VM is likely to be a problem on linux. A typical linux system has fixed-size swap, and doesn't let you push the VM beyond what it can handle. (It has a different trick that avoids creating probably-unnecessary pages in the first place… but once you've created the pages, you have to have room for them.) I'm not sure what an xlarge comes configured for, but the swapon tool will tell you how much swap you've got (and how much you're using).
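
For example, swap can also be checked programmatically by reading /proc/meminfo on the Linux box; the SwapTotal and SwapFree fields correspond to what swapon -s and free report. A minimal sketch (not part of the original code):

def swap_info():
    # Parse /proc/meminfo into a dict of "Name" -> "value kB" strings.
    info = {}
    with open('/proc/meminfo') as f:
        for line in f:
            key, value = line.split(':', 1)
            info[key] = value.strip()
    return info['SwapTotal'], info['SwapFree']

print(swap_info())  # e.g. ('0 kB', '0 kB') if the instance has no swap configured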

Anyway, the easy solution is to create and enable an extra 32GB swapfile on your xlarge.
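
The usual recipe is the dd / mkswap / swapon sequence, run as root and with enough free disk for the file. The sketch below just drives those same commands from Python to keep the examples in one language; the /swapfile path and 32GB size are placeholders for whatever fits your instance:

import subprocess

SWAPFILE = '/swapfile'      # placeholder path
SIZE_MB = 32 * 1024         # 32GB expressed in 1MB blocks

# Allocate the file, lock down its permissions, format it as swap, and enable it.
subprocess.check_call(['dd', 'if=/dev/zero', 'of=' + SWAPFILE, 'bs=1M', 'count=%d' % SIZE_MB])
subprocess.check_call(['chmod', '600', SWAPFILE])
subprocess.check_call(['mkswap', SWAPFILE])
subprocess.check_call(['swapon', SWAPFILE])

Adding the file to /etc/fstab as well makes the swap survive a reboot.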

However, a better solution would be to reduce your VM use. Often each subprocess is doing a whole lot of setup work that creates intermediate data that's never needed again; you can use multiprocessing to push that setup into different processes that quit as soon as they're done, freeing up the VM. Or maybe you can find a way to do the processing more lazily, to avoid needing all that intermediate data in the first place.
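
One standard-library way to approximate that (a sketch, not the code from the question) is Pool's maxtasksperchild option, which recycles each worker after a fixed number of tasks, so whatever memory a worker ballooned to during setup is handed back to the OS when it exits rather than lingering for the whole run. The build_summary function here is a made-up stand-in for the real per-batch work:

from multiprocessing import Pool

def build_summary(n):
    # Pretend setup: build a large intermediate list, keep only a tiny result.
    intermediate = list(range(n))
    return sum(intermediate)   # the large 'intermediate' dies with the worker

if __name__ == '__main__':
    # maxtasksperchild=1 makes every worker exit after a single task.
    pool = Pool(processes=4, maxtasksperchild=1)
    results = pool.map(build_summary, [10 ** 6] * 8)
    pool.close()
    pool.join()
    print(results)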
