Using AWS for parallel processing with R

Shared by user 有你就好.

I want to take a shot at the Kaggle Dunnhumby challenge by building a model for each customer. I want to split the data into ten groups and use Amazon Web Services (AWS) to build models using R on the ten groups in parallel. Some relevant links I have come across are:

The segue package; a presentation on parallel web services using Amazon.

What I don't understand is:

How do I get the data onto the ten nodes? How do I send R functions to the nodes and execute them there?

I would be very grateful if you could share suggestions and hints to point me in the right direction.

PS I am using the free usage account on AWS but it was very difficult to install R from source on the Amazon Linux AMIs (lots of errors due to missing headers, libraries and other dependencies).

Recommended answer

You can set everything up manually on AWS. You have to build your own Amazon compute cluster out of several instances. There is a nice tutorial video available at the Amazon website: http://www.youtube.com/watch?v=YfCgK1bmCjw

But it will take you several hours to get everything running:

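Start 11 EC2 instances (one instance per group plus one head instance);
install R and MPI on all machines (check for preinstalled images);
configure MPI correctly (and probably add a security layer);
in the best case, set up a file server that is mounted on all nodes (to share the data);
with this infrastructure, the best solution is to use the snow or foreach package (with Rmpi).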
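As a rough illustration of the last step, here is a minimal sketch of splitting data into ten groups and fitting one model per group on cluster workers. It uses the parallel package that ships with R, which provides snow-style functions. The toy data frame, its column names and the fit_group function are hypothetical stand-ins, and two local socket workers take the place of the EC2 nodes; on an MPI setup the cluster would be created with the snow package and Rmpi instead.

```r
library(parallel)  # ships with base R; provides snow-style cluster functions

# Hypothetical toy data standing in for the real customer data.
set.seed(1)
customers <- data.frame(
  customer_id = rep(1:50, each = 4),
  visits      = rpois(200, lambda = 5)
)
customers$spend <- 10 + 3 * customers$visits + rnorm(200)

# Assign each customer to one of ten groups and split the data accordingly.
group  <- (customers$customer_id - 1) %% 10 + 1
groups <- split(customers, group)

# Function executed on the workers: fit one model per group (illustrative only).
fit_group <- function(df) {
  coef(lm(spend ~ visits, data = df))
}

# Local socket workers stand in for the EC2 nodes in this sketch.
cl <- makeCluster(2)

# parLapply ships both the function and the data chunks to the workers,
# runs the function there and collects the results on the head node.
results <- parLapply(cl, groups, fit_group)
stopCluster(cl)

length(results)  # one set of coefficients per group
```

Sending the data through parLapply is convenient for small groups; for large data sets, a shared file server as described above lets each worker read its own group directly instead.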

The segue package is nice but you will definitely get data communication problems!

The simplest solution is cloudnumbers.com (http://www.cloudnumbers.com). This platform gives you easy access to computer clusters in the cloud. You can test a small computer cluster in the cloud for 5 hours for free! Check the slides from the useR conference: http://cloudnumbers.com/hpc-news-from-the-user2011-conference
