How do I create a web crawler in Java? Tags: crawler, Java, web

Shared by user 萝莉小仙女.

Hi, I want to create a web crawler in Java that retrieves some data from a web page, such as the title and description, and stores that data in a database.

Recommended answer

If you want to write your own, use the HttpClient included in the Android API.

Example usage of HttpClient (you only need to parse out the links):

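import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Set;

import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

import org.apache.http.HttpResponse;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;

public class HttpTest {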
    public static void main(String... args) 
    throws ClientProtocolException, IOException {
        crawlPage("http://www.google.com/");
    }

    static Set<String> checked = new HashSet<String>();

    private static void crawlPage(String url) throws ClientProtocolException, IOException {

        if (checked.contains(url))
            return;

        checked.add(url);

        System.out.println("Crawling: " + url);

        HttpClient client = new DefaultHttpClient();
        // Fetch the URL that was passed in, not a hard-coded address
        HttpGet request = new HttpGet(url);
        HttpResponse response = client.execute(request);

        Reader reader = null;
        try {
            reader = new InputStreamReader(response.getEntity().getContent());

            Links links = new Links();
            new ParserDelegator().parse(reader, links, true);

            // Follow only absolute links; each one is crawled recursively
            for (String link : links.list)
                if (link.startsWith("http://") || link.startsWith("https://"))
                    crawlPage(link);

        } finally {
            if (reader != null) {
                try {
                    reader.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }



    static class Links extends HTMLEditorKit.ParserCallback {

        List<String> list = new LinkedList<String>();

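        // Called by the parser for every opening tag; collects the href of each <a>
        @Override
        public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
            if (t == HTML.Tag.A) {
                Object href = a.getAttribute(HTML.Attribute.HREF);
                // Anchors without an href (e.g. <a name="...">) would otherwise throw a NullPointerException
                if (href != null)
                    list.add(href.toString());
            }
        }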
    }
}
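
The example above only collects links, while the question also asks for the page title and description. A minimal sketch of how the same ParserCallback approach might be extended is shown here. The class name PageInfo and its methods are hypothetical names chosen for illustration, and the sketch assumes the description lives in a <meta name="description"> tag, which not every page provides.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper (not part of the original answer): collects the
// <title> text and the content of <meta name="description"> from a page.
class PageInfo extends HTMLEditorKit.ParserCallback {

    private final StringBuilder title = new StringBuilder();
    private String description = "";
    private boolean inTitle = false;

    @Override
    public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        if (t == HTML.Tag.TITLE)
            inTitle = true;
    }

    @Override
    public void handleEndTag(HTML.Tag t, int pos) {
        if (t == HTML.Tag.TITLE)
            inTitle = false;
    }

    @Override
    public void handleText(char[] data, int pos) {
        // Keep only text that appears between <title> and </title>
        if (inTitle)
            title.append(data);
    }

    @Override
    public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        // <meta> has no closing tag, so it is reported as a simple tag
        if (t != HTML.Tag.META)
            return;
        Object name = a.getAttribute(HTML.Attribute.NAME);
        Object content = a.getAttribute(HTML.Attribute.CONTENT);
        if (name != null && content != null
                && "description".equalsIgnoreCase(name.toString()))
            description = content.toString();
    }

    String getTitle() {
        return title.toString().trim();
    }

    String getDescription() {
        return description;
    }

    // Parses HTML from any Reader, e.g. the InputStreamReader used in crawlPage
    static PageInfo parse(Reader reader) throws IOException {
        PageInfo info = new PageInfo();
        new ParserDelegator().parse(reader, info, true);
        return info;
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><head><title>Hello</title>"
                + "<meta name=\"description\" content=\"A test page\">"
                + "</head><body><p>Some text</p></body></html>";
        PageInfo info = PageInfo.parse(new StringReader(html));
        System.out.println("Title: " + info.getTitle());
        System.out.println("Description: " + info.getDescription());
    }
}
```

In crawlPage, a callback like this could be passed to the parser in place of, or merged with, the Links callback. The two strings could then be written to a database, for example through JDBC with a PreparedStatement; that part is left out here because it depends on the database and schema chosen.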