Smart caching of webpages using Amazon S3 and CloudFront

I have a website (running within Tomcat on Elastic Beanstalk) that generates artist discographies (a single page for one artist). This can be resource intensive, so as the artist pages don't change over a month period I put a CloudFront Distribution in front of it.

I thought this would mean no artist request ever had to be served more than once by my server; however, it's not quite as good as that. This post explains that every edge location (Europe, US etc.) will get a miss the first time it looks up the resource, and that there is a limit to how many resources are kept in the CloudFront cache, so they could be dropped.

So to counter this I have changed my server code to store a copy of the webpage in a bucket within S3 AND to check this first when a request comes in, so if the artist page already exists in S3 then the server retrieves it and returns its contents as the webpage. This greatly reduces the processing, as it only constructs a webpage for a particular artist once.


However, the request still has to go to the server to check whether the artist page exists. If it does, the page (and these can sometimes be large, up to 20 MB) is first downloaded to the server, and only then does the server return it.

So I wanted to know if I could improve this - I know you can construct an S3 bucket as a redirect to another website. Is there a per-page way I could get the artist request to go to the S3 bucket and then have it return the page if it exists or call server if it does not?

Alternatively could I get the server to check if page exists and then redirect to the S3 page rather than download the page to the server first?


OP says:

they can sometimes be large up-to 20mb

Since the volume of data you serve can be pretty large, I think it is feasible for you to do this in 2 requests instead of one, where you decouple the content generation from the content serving part. The reason to do this is so as to minimize the amount of time/resources it takes on the server to fetch data from S3 and serve it.

AWS supports pre-signed URLs, which can be valid for a short amount of time; we can try using them here to avoid security issues.
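As a minimal sketch of the pre-signed URL idea, the helper below asks an S3 client for a time-limited GET URL. It assumes a boto3-style client (`generate_presigned_url` is the boto3 S3 client method); the function name `presigned_page_url`, the bucket name, the key layout and the 300-second default are illustrative assumptions, not part of the original setup.

```python
from typing import Any


def presigned_page_url(s3_client: Any, bucket: str, key: str,
                       expires_in: int = 300) -> str:
    """Return a time-limited GET URL for one stored artist page.

    `s3_client` is expected to behave like boto3.client("s3").
    The URL is signed locally; no request is sent to S3 here.
    """
    return s3_client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,  # lifetime in seconds
    )


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   url = presigned_page_url(boto3.client("s3"),
#                            "my-artist-pages", "artists/42.html")
```

Passing the client in as an argument keeps the helper easy to test with a stand-in object.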

Currently, your architecture looks something like the diagram below: the client initiates a request, you check whether the requested data exists on S3 and fetch and serve it if so; otherwise you generate the content, save it to S3 and serve it:

                           if exists on S3
client --------> server --------------------> fetch from s3 and serve
                    |------> generate content -------> save to S3 and serve

In terms of network resources, your server always consumes 2X the bandwidth and time here. If the data exists, you first pull it from S3 to the server and then serve it to the customer (so it is 2X). If the data doesn't exist, you send it both to the customer and to S3 (so again it is 2X).

Instead, you can try 2 approaches below, both of which assume that you have some base template, and that the other data can be fetched via AJAX calls, and both of which bring down that 2X factor in the overall architecture.

Serve the content from S3 only. This calls for changes to the way your product is designed, and hence may not be easy to integrate.

Basically, for every incoming request, return the S3 URL for it if the data already exists; otherwise create a task for it in SQS, generate the data and push it to S3. Based on your usage patterns for different artists, you should have an estimate of how long it takes on average to pull the data together, so return a URL that will be valid once the task's estimated_time_for_completion (T) has passed.
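The server side of this flow could look roughly like the sketch below. Everything named here is hypothetical: `handle_artist_request`, `PageTicket`, the `artists/<id>.html` key layout and the 300-second margin are assumptions for illustration. The four callables are injection points; with boto3 they might wrap `head_object`, SQS `send_message` and `generate_presigned_url`.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PageTicket:
    url: str             # pre-signed S3 URL for the page
    wait_seconds: float  # T; 0 when the page is already on S3


def handle_artist_request(
    artist_id: str,
    *,
    exists: Callable[[str], bool],             # is the key already on S3?
    enqueue: Callable[[str], None],            # add a generation task (e.g. SQS)
    sign: Callable[[str, int], str],           # (key, lifetime_s) -> signed URL
    estimate_seconds: Callable[[str], float],  # estimate of T for this artist
    margin_seconds: int = 300,                 # assumed slack for client retries
) -> PageTicket:
    key = f"artists/{artist_id}.html"  # assumed key layout
    if exists(key):
        # Base case: the data is already there, so T = 0.
        return PageTicket(url=sign(key, margin_seconds), wait_seconds=0.0)
    # Otherwise queue the work and tell the client how long to wait.
    enqueue(key)
    wait = estimate_seconds(artist_id)
    # A URL can be signed before the object exists; it only has to
    # stay valid past T so the client's retries still work.
    return PageTicket(url=sign(key, int(wait) + margin_seconds),
                      wait_seconds=wait)
```

The server only ever returns a small ticket here; the page body itself never passes through it on a cache hit.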

The client waits for time T, and then makes the request to the URL returned earlier. It makes up to, say, 3 attempts to fetch this data in case of failure. In fact, the data already existing on S3 can be thought of as the base case where T = 0.
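The wait-then-retry behaviour on the client can be sketched as follows. In practice the client would likely be browser JavaScript; this Python version only illustrates the logic. The function name `fetch_when_ready`, the 2-second retry delay and the injectable `fetch`/`sleep` parameters are assumptions added for illustration.

```python
import time
import urllib.request
from typing import Callable, Optional


def _http_get(url: str) -> bytes:
    # Plain GET; urllib raises URLError/HTTPError (both OSError) on failure.
    with urllib.request.urlopen(url) as response:
        return response.read()


def fetch_when_ready(
    url: str,
    wait_seconds: float,
    attempts: int = 3,
    retry_delay: float = 2.0,
    fetch: Callable[[str], bytes] = _http_get,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Wait for time T, then try the URL up to `attempts` times."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    sleep(wait_seconds)  # T = 0 when the page was already on S3
    last_error: Optional[OSError] = None
    for attempt in range(attempts):
        if attempt:
            sleep(retry_delay)  # pause between retries
        try:
            return fetch(url)
        except OSError as exc:
            last_error = exc
    assert last_error is not None
    raise last_error
```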

In this case, you make 2-4 network requests from the client, but only the first of those requests comes to your server. You transmit the data to S3 once, and only when it doesn't already exist; the client always pulls it from S3.

                           if exists on S3, return URL
client --------> server --------------------------------> s3
                    |else SQS task
                    |---------------> generate content -------> save to S3 
                     return pre-computed url

           wait for time `T`
client  -------------------------> s3
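To make the bandwidth argument concrete, here is a small back-of-the-envelope calculation. The 20 MB page size comes from the question; the request counts (900 hits, 100 misses) are made-up numbers purely for illustration, and `server_traffic_mb` is a hypothetical helper.

```python
def server_traffic_mb(hits: int, misses: int, page_mb: float,
                      transfers_on_hit: int, transfers_on_miss: int) -> float:
    """Megabytes moved through the server for a batch of requests."""
    return page_mb * (hits * transfers_on_hit + misses * transfers_on_miss)


# Current design: every request moves the page twice via the server
# (S3 -> server -> client on a hit, server -> client and S3 on a miss).
current = server_traffic_mb(hits=900, misses=100, page_mb=20,
                            transfers_on_hit=2, transfers_on_miss=2)

# S3-only design: nothing on a hit, one upload to S3 on a miss.
s3_only = server_traffic_mb(hits=900, misses=100, page_mb=20,
                            transfers_on_hit=0, transfers_on_miss=1)

print(current, s3_only)  # 40000 2000
```

Under these assumed numbers the server moves 40,000 MB in the current design versus 2,000 MB when clients read from S3 directly.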

Check if data already exists, and make second network call accordingly.

This is similar to what you currently do when serving data from the server in case it doesn't already exist. Again, we make 2 requests here, however, this time we try to serve data synchronously from the server in the case it doesn't exist.

So, in the first hit, we check if the content had ever been generated previously, in which case, we get a successful URL, or error message. When successful, the next hit goes to S3.

If the data doesn't exist on S3, the client makes a fresh request (to a different POST URL). On receiving it, the server computes the data and serves it, while adding an asynchronous task to push it to S3.

                           if exists on S3, return URL
client --------> server --------------------------------> s3

client --------> server ---------> generate content -------> serve it
                                       |---> add SQS task to push to S3
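The two server endpoints of this second approach can be sketched as below. All names (`lookup_page`, `generate_and_serve`, `page_key`) and the key layout are hypothetical. `schedule_upload` is an assumed hook for the asynchronous push to S3; how it is backed (for example, a temporary file plus an SQS task that references it) is left open.

```python
from typing import Callable, Optional


def page_key(artist_id: str) -> str:
    return f"artists/{artist_id}.html"  # assumed key layout


def lookup_page(artist_id: str, *,
                exists: Callable[[str], bool],
                sign: Callable[[str], str]) -> Optional[str]:
    """First hit: return a signed S3 URL, or None if never generated."""
    key = page_key(artist_id)
    return sign(key) if exists(key) else None


def generate_and_serve(artist_id: str, *,
                       render: Callable[[str], bytes],
                       schedule_upload: Callable[[str, bytes], None]) -> bytes:
    """Second hit (the POST URL): build the page, queue the upload, serve it."""
    body = render(artist_id)
    # Hand the upload off so the response is not delayed by the S3 write.
    schedule_upload(page_key(artist_id), body)
    return body
```

A client would call the lookup first and fall back to the generation endpoint only when it gets no URL back.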


