{"id":1921,"date":"2020-12-03T14:28:54","date_gmt":"2020-12-03T06:28:54","guid":{"rendered":"https:\/\/www.yusian.com\/blog\/?p=1921"},"modified":"2020-12-03T14:28:54","modified_gmt":"2020-12-03T06:28:54","slug":"xml%e8%a7%a3%e6%9e%90%e5%b7%a5%e5%85%b7jsoup%e7%9a%84%e5%9f%ba%e6%9c%ac%e4%bd%bf%e7%94%a8","status":"publish","type":"post","link":"https:\/\/www.yusian.com\/blog\/java\/2020\/12\/03\/1428541921.html","title":{"rendered":"Basic usage of jsoup, an XML parsing tool"},"content":{"rendered":"<h3>1. Importing the package<\/h3>\n<p>Official site: <a class=\"wp-editor-md-post-content-link\" href=\"https:\/\/jsoup.org\/\">https:\/\/jsoup.org\/<\/a>, where the download link for the jar can be found.<\/p>\n<h3>2. Brief overview<\/h3>\n<ul>\n<li>According to the official site, <code>jsoup<\/code> is a Java library for parsing HTML. XML is stricter than HTML, so jsoup can parse it as well. Note that <code>Jsoup.parse(File, String)<\/code> uses the HTML parser by default; jsoup also provides <code>Parser.xmlParser()<\/code> for parsing input as XML;<\/li>\n<li><code>jsoup<\/code> converts an HTML\/XML document read from a file, a byte stream, a URL, or another source into a <code>Document<\/code> object;<\/li>\n<li>This <code>Document<\/code> object is very similar to the Document object in HTML, and many of its methods are identical, so it can be thought of simply as a Java implementation of the DOM tree;<br 
\/>\n<!--more--><\/li>\n<li>An important part of working with the DOM is retrieving child elements:\n<ul>\n<li><code>Element<\/code>: Document extends Element, so it inherits all of Element's methods for retrieving child elements<\/li>\n<li><code>Selector<\/code>: jsoup's built-in selector class, comparable to CSS selectors in HTML5<\/li>\n<li><code>JsoupXpath<\/code>: a parser that supports XPath syntax<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h3>3. Examples<\/h3>\n<h4>3.0. The XML file<\/h4>\n<pre data-language=XML><code class=\"language-markup line-numbers\">&lt;?xml version=\"1.0\" ?&gt;\n&lt;students&gt;\n    &lt;student&gt;\n        &lt;name number=\"s01\"&gt;\u5f20\u4e09&lt;\/name&gt;\n        &lt;age&gt;19&lt;\/age&gt;\n        &lt;gender&gt;\u7537&lt;\/gender&gt;\n    &lt;\/student&gt;\n    &lt;student&gt;\n        &lt;name number=\"s02\"&gt;\u674e\u56db&lt;\/name&gt;\n        &lt;age&gt;18&lt;\/age&gt;\n        &lt;gender&gt;\u5973&lt;\/gender&gt;\n    &lt;\/student&gt;\n&lt;\/students&gt;\n<\/code><\/pre>\n<h4>3.1. Retrieving child nodes with Element methods<\/h4>\n<pre><code class=\"language-java line-numbers\">\/\/ Imports used by the demo methods, which belong to a class named JsoupDemo\nimport java.io.File;\nimport java.io.IOException;\nimport java.net.URL;\n\nimport org.jsoup.Jsoup;\nimport org.jsoup.nodes.Document;\nimport org.jsoup.nodes.Element;\nimport org.jsoup.select.Elements;\n\nprivate static void demo01() throws IOException {\n    \/\/ Locate the XML file on the classpath via the class loader\n    ClassLoader loader = JsoupDemo.class.getClassLoader();\n    URL url = loader.getResource(\"students.xml\");\n    \/\/ This Document object closely resembles the Document object in a web page\n    Document document = Jsoup.parse(new File(url.getPath()), \"utf-8\");\n    \/\/ System.out.println(document);\n    Elements elements = document.getElementsByTag(\"name\");\n    for (Element element : elements) {\n        System.out.println(element.text());\n    
}\n}\n<\/code><\/pre>\n<h4>3.2. Retrieving child elements with Selector<\/h4>\n<pre><code class=\"language-java line-numbers\">private static void demo03() throws IOException {\n    \/\/ Locate the XML file on the classpath via the class loader\n    ClassLoader loader = JsoupDemo.class.getClassLoader();\n    String path = loader.getResource(\"students.xml\").getPath();\n    \/\/ Get the Document object, which is effectively the root node\n    Document doc = Jsoup.parse(new File(path), \"utf-8\");\n    \/\/ Supports the id selector '#', the class selector '.', tag selectors, attribute selectors, and more\n    Elements elements = doc.select(\"name[number=s01]\");\n    System.out.println(elements);\n}\n<\/code><\/pre>\n<p>Output:<\/p>\n<pre><code class=\"language-cmd line-numbers\">&lt;name number=\"s01\"&gt;\n \u5f20\u4e09\n&lt;\/name&gt;\n<\/code><\/pre>\n<h4>3.3. JsoupXpath<\/h4>\n<ul>\n<li>GitHub repository: <a class=\"wp-editor-md-post-content-link\" href=\"https:\/\/github.com\/zhegexiaohuozi\/JsoupXpath\">https:\/\/github.com\/zhegexiaohuozi\/JsoupXpath<\/a><\/li>\n<li>According to the project, JsoupXpath's syntax parsing was rewritten on top of Antlr4, so a few additional dependency jars are required to use it:\n<ul>\n<li>antlr4-runtime-4.7.2.jar<\/li>\n<li>slf4j-api-1.7.25.jar<\/li>\n<li>commons-lang3-3.3.2.jar<\/li>\n<\/ul>\n<\/li>\n<li>JsoupXpath has its own Document object, <code>JXDocument<\/code>, and its own Node object, <code>JXNode<\/code><\/li>\n<\/ul>\n<pre><code class=\"language-java line-numbers\">import java.io.File;\nimport java.io.IOException;\n\nimport org.jsoup.Jsoup;\nimport org.jsoup.nodes.Document;\nimport org.jsoup.nodes.Element;\n\/\/ Package name as used by the JsoupXpath 2.x releases (assumed here)\nimport org.seimicrawler.xpath.JXDocument;\nimport org.seimicrawler.xpath.JXNode;\n\npublic class JsoupXpathDemo {\n    public static void main(String[] args) throws IOException {\n        \/\/ Get the Document object\n        ClassLoader 
loader = JsoupXpathDemo.class.getClassLoader();\n        String path = loader.getResource(\"students.xml\").getPath();\n        Document doc = Jsoup.parse(new File(path), \"utf-8\");\n        \/\/ Create the JXDocument from the Document object\n        JXDocument jxDoc = JXDocument.create(doc);\n        \/\/ Select a child element using XPath syntax\n        JXNode node = jxDoc.selNOne(\"\/\/name[@number='s01']\");\n        \/\/ Convert between the two object types, much like jQuery objects and DOM objects\n        Element element = node.asElement();\n        System.out.println(element);\n        System.out.println(node);\n    }\n}\n<\/code><\/pre>\n<p>Output:<\/p>\n<pre><code class=\"language-cmd line-numbers\">&lt;name number=\"s01\"&gt;\n \u5f20\u4e09\n&lt;\/name&gt;\n&lt;name number=\"s01\"&gt;\n \u5f20\u4e09\n&lt;\/name&gt;\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>1. Importing the package Official site: https:\/\/jsoup.org\/, where the download link for the jar can be found. 2. Brief overview According to the official site, jsoup is a Java library for parsing HTML. XML is stricter than HTML, so jsoup can parse it as well; jsoup converts an HTML\/XML document read from a file, a byte stream, a URL, or another source into a Document object; 
This Document object is very similar to the Document object in HTML, and many of its methods are identical, so it can be thought of simply as a Java implementation of the DOM tree;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[35],"tags":[338,339,301],"class_list":["post-1921","post","type-post","status-publish","format-standard","hentry","category-java","tag-jsoup","tag-jsoupxpath","tag-xml"],"_links":{"self":[{"href":"https:\/\/www.yusian.com\/blog\/wp-json\/wp\/v2\/posts\/1921","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.yusian.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.yusian.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.yusian.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.yusian.com\/blog\/wp-json\/wp\/v2\/comments?post=1921"}],"version-history":[{"count":0,"href":"https:\/\/www.yusian.com\/blog\/wp-json\/wp\/v2\/posts\/1921\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.yusian.com\/blog\/wp-json\/wp\/v2\/media?parent=1921"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.yusian.com\/blog\/wp-json\/wp\/v2\/categories?post=1921"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.yusian.com\/blog\/wp-json\/wp\/v2\/tags?post=1921"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}