Python subprocess call to xpdf's pdftotext not working with encoding
I am trying to run pdftotext using Python's subprocess module.
import subprocess
pdf = r"path\to\file.pdf"
txt = r"path\to\out.txt"
pdftotext = r"path\to\pdftotext.exe"
cmd = [pdftotext, pdf, txt, '-enc utf-8']
response = subprocess.check_output(cmd, shell=True, stderr=subprocess.STDOUT)
Traceback:
CalledProcessError: Command '['path\\to\\pdftotext.exe', 'path\\to\\file.pdf', 'path\\to\\out.txt', '-enc utf-8']' returned non-zero exit status 99
When I remove the last argument, '-enc utf-8', from cmd, it works OK in Python.
When I run pdftotext pdf txt -enc utf-8 in cmd, it also works OK.
What am I missing?
Thanks.
subprocess has some complicated rules for handling commands. From the docs:
The shell argument (which defaults to False) specifies whether to use the shell as the program to execute. If shell is True, it is recommended to pass args as a string rather than as a sequence.
More details are explained in the answer here.
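To see why the list form fails here, it may help to look at what a child process actually receives. The sketch below is only an illustration and does not call pdftotext: it uses a hypothetical helper, argv_seen_by_child, which launches the current Python interpreter as a stand-in and reports the arguments it was given. A list element that contains a space, such as '-enc utf-8', arrives as a single argument, so pdftotext presumably sees one unknown option instead of an option followed by its value.

```python
import json
import subprocess
import sys


def argv_seen_by_child(args):
    """Return the argument list a child process receives for `args`.

    Hypothetical helper for illustration: the child is the current Python
    interpreter, standing in for pdftotext.
    """
    # The child prints its own arguments (everything after "-c <code>") as JSON.
    child_code = "import json, sys; print(json.dumps(sys.argv[1:]))"
    out = subprocess.check_output([sys.executable, "-c", child_code] + list(args))
    return json.loads(out.decode("utf-8"))


# One list element with a space stays one argument in the child.
print(argv_seen_by_child(["-enc utf-8"]))
# Two list elements become two arguments, which is what an option parser expects.
print(argv_seen_by_child(["-enc", "utf-8"]))
```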
So, as the docs explain, you should convert your command to a string:
cmd = r"""{} "{}" "{}" -enc utf-8""".format('pdftotext', pdf, txt)
Now, call subprocess as:
subprocess.call(cmd, shell=True, stderr=subprocess.STDOUT)
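An alternative that avoids the shell entirely is to keep the list form and split the option from its value, so that '-enc' and the encoding name are separate list elements. This is a minimal sketch under assumptions: build_pdftotext_cmd and run_pdftotext are hypothetical helper names, the paths are the placeholders from the question, and the options are placed before the file names to follow pdftotext's usual usage line.

```python
import subprocess


def build_pdftotext_cmd(pdftotext, pdf, txt, encoding="utf-8"):
    """Build the argument list for pdftotext (hypothetical helper)."""
    # "-enc" and its value are separate list elements, so each one reaches
    # pdftotext as its own argument.
    return [pdftotext, "-enc", encoding, pdf, txt]


def run_pdftotext(pdftotext, pdf, txt, encoding="utf-8"):
    """Run pdftotext without a shell and return its combined output."""
    cmd = build_pdftotext_cmd(pdftotext, pdf, txt, encoding)
    # shell is left at its default (False); stderr is merged into stdout.
    return subprocess.check_output(cmd, stderr=subprocess.STDOUT)


# Placeholder paths from the question; printing the list does not run pdftotext.
print(build_pdftotext_cmd(r"path\to\pdftotext.exe", r"path\to\file.pdf", r"path\to\out.txt"))
```

With the list form, paths that contain spaces need no manual quoting, because subprocess takes care of that when it starts the process.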