Diff word documents with pandoc

Posted on September 27, 2014

Thanks to the work of Jesse Rosenthal, pandoc has recently become capable of reading docx documents. This additional functional now makes it extremely easy to diff two docx files without any additional tools.

Given two documents old.docx and new.docx the following command can calculate the diff. (The choice to convert to markdown is arbritary but works best in my experience)

diff <(pandoc -t markdown old.docx) <(pandoc -t markdown new.docx)

The output from two example documents, the second identical to the first with an additional line:

> diff <(pandoc -t markdown old.docx) <(pandoc -t markdown new.docx )
> This is the new line in the new file

Easy as that!