Easy to Learn: 5 Ways to Read .docx Files in Pure Java, No Plugins Required

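Reading .docx files in Java is usually done with a third-party library such as Apache POI, or by driving Microsoft Office through its COM interface. If you would rather not install Office or any native plugin, the following approaches work in pure Java: the first two rely on the Apache POI library, and the last three use only classes that ship with the JDK.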

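Method 1: Using Apache POI's XWPF

Apache POI is an open-source Java library for working with Microsoft Office file formats. XWPF is the POI component for .docx (Word 2007 and later) files. SXSSF, by contrast, is POI's streaming API for writing .xlsx spreadsheets and cannot read .docx files. The example below prints the text of every paragraph in the document body.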

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

import java.io.FileInputStream;
import java.io.IOException;

public class ReadDocx {
    public static void main(String[] args) {
        // Both resources are closed automatically, in reverse order.
        try (FileInputStream fis = new FileInputStream("example.docx");
             XWPFDocument document = new XWPFDocument(fis)) {
            for (XWPFParagraph paragraph : document.getParagraphs()) {
                for (XWPFRun run : paragraph.getRuns()) {
                    // getText(0) returns null for runs without text, such as image-only runs.
                    String text = run.getText(0);
                    if (text != null) {
                        System.out.print(text);
                    }
                }
                System.out.println();
            }
        } catch (IOException e) {
            // Covers FileNotFoundException as well as read and parse failures.
            e.printStackTrace();
        }
    }
}

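Method 2: Using Apache POI's HSSF

HSSF is the POI component for the legacy .xls binary format. It cannot parse a .docx file directly: passing one to HSSFWorkbook fails with an OfficeXmlFileException. This approach therefore only applies when the content of the .docx file (typically its tables) has first been converted to .xls by some other tool; HSSF then reads the converted file. The example below assumes the converted file is named example.xls.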

import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;

import java.io.FileInputStream;
import java.io.IOException;

public class ReadDocxWithHSSF {
    public static void main(String[] args) {
        // example.xls is the .xls file produced by converting the .docx beforehand.
        try (FileInputStream fis = new FileInputStream("example.xls");
             HSSFWorkbook workbook = new HSSFWorkbook(fis)) {
            // DataFormatter renders any cell type as text; getStringCellValue()
            // throws an exception on numeric cells.
            DataFormatter formatter = new DataFormatter();
            Sheet sheet = workbook.getSheetAt(0);
            for (Row row : sheet) {
                for (Cell cell : row) {
                    System.out.print(formatter.formatCellValue(cell) + "\t");
                }
                System.out.println();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

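Method 3: Using Java's ZipFile Class

A .docx file is actually a ZIP archive containing a set of XML files; the main body text is stored in the entry word/document.xml. You can use Java's ZipFile class to read these XML files. The example below extracts word/document.xml and saves it to disk as document.xml.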

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class ReadDocxWithZipFile {
    public static void main(String[] args) {
        try (ZipFile zipFile = new ZipFile("example.docx")) {
            // The main body of the document is stored in this entry.
            ZipEntry entry = zipFile.getEntry("word/document.xml");
            if (entry == null) {
                throw new IOException("word/document.xml not found; is this a valid .docx file?");
            }
            // getInputStream returns a plain InputStream, not a FileInputStream.
            try (InputStream is = zipFile.getInputStream(entry);
                 FileOutputStream fos = new FileOutputStream("document.xml")) {
                byte[] buffer = new byte[1024];
                int length;
                // read() returns -1 at the end of the stream.
                while ((length = is.read(buffer)) != -1) {
                    fos.write(buffer, 0, length);
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
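
If you only need the XML in memory, writing it to disk first is optional. The following is a minimal sketch of that variation, not part of the original listing: the class name DocxXmlToString and the method readDocumentXml are hypothetical, and it assumes the entry is UTF-8 encoded, which is what Word normally writes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

class DocxXmlToString {

    /** Returns the raw XML of word/document.xml as a string (assumes UTF-8). */
    static String readDocumentXml(String docxPath) throws IOException {
        try (ZipFile zip = new ZipFile(docxPath)) {
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IOException("word/document.xml not found in " + docxPath);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                // readAllBytes() requires Java 9 or later.
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "example.docx";
        String xml = readDocumentXml(path);
        System.out.println("Read " + xml.length() + " characters of XML");
    }
}
```

The returned string still contains all of the WordprocessingML markup, so it needs to go through an XML parser (as in Methods 4 and 5) before you have plain text.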

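Method 4: Using Java's DOM Parser

Once word/document.xml has been extracted (see Method 3), you can parse it with the DOM parser built into the JDK. Each paragraph of the document is a w:p element, so printing the text content of every w:p element yields the document text paragraph by paragraph.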

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import java.io.File;
import java.io.IOException;

public class ReadDocxWithDOM {
    public static void main(String[] args) {
        try {
            // document.xml is the file extracted from the .docx archive in Method 3.
            File inputFile = new File("document.xml");
            DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
            Document doc = dBuilder.parse(inputFile);
            doc.getDocumentElement().normalize();

            // The factory is not namespace-aware by default, so elements are
            // matched by their prefixed name: each w:p element is one paragraph.
            NodeList paragraphs = doc.getElementsByTagName("w:p");
            for (int i = 0; i < paragraphs.getLength(); i++) {
                Element paragraph = (Element) paragraphs.item(i);
                // getTextContent() concatenates the text of all descendant nodes.
                System.out.println(paragraph.getTextContent());
            }
        } catch (ParserConfigurationException | SAXException | IOException e) {
            e.printStackTrace();
        }
    }
}
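
Matching on the literal prefix "w:" works for files written by Word, but the prefix is only a convention. As a hedged sketch of a more robust variation, the example below turns on namespace awareness, matches elements by namespace URI, and collects only the text inside w:t elements. The class DocxDomText and its paragraphs method are hypothetical names, and the namespace URI is the one used by the common (transitional) .docx flavor; files saved as Strict Open XML use a different URI.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

class DocxDomText {
    // WordprocessingML main namespace (transitional flavor).
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns one string per w:p element, built from its w:t descendants. */
    static List<String> paragraphs(InputStream documentXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        Document doc = factory.newDocumentBuilder().parse(documentXml);

        List<String> result = new ArrayList<>();
        NodeList paras = doc.getElementsByTagNameNS(W_NS, "p");
        for (int i = 0; i < paras.getLength(); i++) {
            Element para = (Element) paras.item(i);
            NodeList texts = para.getElementsByTagNameNS(W_NS, "t");
            StringBuilder sb = new StringBuilder();
            for (int j = 0; j < texts.getLength(); j++) {
                sb.append(texts.item(j).getTextContent());
            }
            result.add(sb.toString());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String path = args.length > 0 ? args[0] : "example.docx";
        // Parse straight from the archive; no intermediate document.xml file.
        try (ZipFile zip = new ZipFile(path)) {
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IllegalStateException("word/document.xml not found");
            }
            try (InputStream in = zip.getInputStream(entry)) {
                paragraphs(in).forEach(System.out::println);
            }
        }
    }
}
```

One caveat of this sketch: paragraphs nested inside another paragraph (for example, inside a text box) are reported both as part of the outer paragraph and on their own.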

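Method 5: Using Java's SAX Parser

The SAX parser is another way to read the XML. Instead of loading the whole document into memory, it reports elements one at a time through callbacks, which keeps memory usage low for large documents. The example below reads the document.xml file extracted in Method 3, marks where each w:p paragraph starts and ends, and prints all character data it encounters.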

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import java.io.File;
import java.io.IOException;

public class ReadDocxWithSAX {
    public static void main(String[] args) {
        try {
            // document.xml is the file extracted from the .docx archive in Method 3.
            File inputFile = new File("document.xml");
            SAXParserFactory factory = SAXParserFactory.newInstance();
            SAXParser saxParser = factory.newSAXParser();
            DefaultHandler handler = new DefaultHandler() {
                // Called at every opening tag; w:p marks the start of a paragraph.
                @Override
                public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
                    if (qName.equals("w:p")) {
                        System.out.println("Start Element : " + qName);
                    }
                }

                // Called at every closing tag.
                @Override
                public void endElement(String uri, String localName, String qName) throws SAXException {
                    if (qName.equals("w:p")) {
                        System.out.println("End Element : " + qName);
                    }
                }

                // Called for character data; may fire several times per text node.
                @Override
                public void characters(char[] ch, int start, int length) throws SAXException {
                    System.out.print(new String(ch, start, length));
                }
            };
            saxParser.parse(inputFile, handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            e.printStackTrace();
        }
    }
}
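
The handler above prints every piece of character data in the file. A hedged variation is to keep only the text that sits inside w:t elements and to emit one string per paragraph when the closing w:p tag arrives. The sketch below illustrates this; the class DocxSaxText and its paragraphs method are hypothetical names, and like the listing above it assumes the default, non-namespace-aware parser, so elements are matched by their prefixed names.

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

import javax.xml.parsers.SAXParserFactory;
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

class DocxSaxText {

    /** Streams through document.xml and returns one string per w:p element. */
    static List<String> paragraphs(InputStream documentXml) throws Exception {
        List<String> result = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder current = new StringBuilder();
            private boolean inText; // true while inside a w:t element

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                if (qName.equals("w:t")) {
                    inText = true;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times for one text node, so append.
                if (inText) {
                    current.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals("w:t")) {
                    inText = false;
                } else if (qName.equals("w:p")) {
                    // Paragraph finished: store its text and reset the buffer.
                    result.add(current.toString());
                    current.setLength(0);
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(documentXml, handler);
        return result;
    }

    public static void main(String[] args) throws Exception {
        String path = args.length > 0 ? args[0] : "document.xml";
        try (InputStream in = new FileInputStream(path)) {
            paragraphs(in).forEach(System.out::println);
        }
    }
}
```

Because only one paragraph's text is buffered at a time, the handler itself stays small even for long documents; the returned list could be replaced by a callback if the paragraphs should not be held in memory either.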

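Each of these five approaches can play a part in reading a .docx file, and each has its own trade-offs: Apache POI does the most work for you, while the ZipFile, DOM, and SAX approaches need nothing beyond the JDK but leave the interpretation of the XML to you. Which one to choose depends on your requirements and preferences. Hopefully these examples make reading .docx files in pure Java straightforward!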
