Python subprocess call to xpdf's pdftotext not working with encoding
I am trying to run pdftotext using Python's subprocess module.
```python
import subprocess

pdf = r"path\to\file.pdf"
txt = r"path\to\out.txt"
pdftotext = r"path\to\pdftotext.exe"

cmd = [pdftotext, pdf, txt, '-enc utf-8']
response = subprocess.check_output(cmd, shell=True, stderr=subprocess.STDOUT)
```

Traceback:

```
CalledProcessError: Command '['path\\to\\pdftotext.exe', 'path\\to\\file.pdf', 'path\\to\\out.txt', '-enc utf-8']' returned non-zero exit status 99
```

When I remove the last argument, '-enc utf-8', from cmd, it works OK in Python.
When I run `pdftotext pdf txt -enc utf-8` in cmd, it works OK.
What am I missing?
Thanks.
subprocess has fairly complicated rules for handling commands. From the docs:
The shell argument (which defaults to False) specifies whether to use the shell as the program to execute. If shell is True, it is recommended to pass args as a string rather than as a sequence.
More details are explained in the answer here.
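To see why the list version fails, it helps to look at how subprocess joins a list into a single Windows command line. The sketch below uses `subprocess.list2cmdline`, which exists in CPython's subprocess module on every platform but is an undocumented helper rather than part of the official API. The paths are the placeholders from the question.

```python
import subprocess

pdf = r"path\to\file.pdf"
txt = r"path\to\out.txt"
pdftotext = r"path\to\pdftotext.exe"

# '-enc utf-8' as ONE list element: it contains a space, so list2cmdline
# wraps it in double quotes and the child receives it as a single argument.
bad = [pdftotext, pdf, txt, "-enc utf-8"]

# '-enc' and 'utf-8' as TWO list elements: the child sees the option and
# its value separately, just as when the line is typed into cmd.
good = [pdftotext, pdf, txt, "-enc", "utf-8"]

bad_line = subprocess.list2cmdline(bad)
good_line = subprocess.list2cmdline(good)

print(bad_line)   # ...out.txt "-enc utf-8"
print(good_line)  # ...out.txt -enc utf-8
```

With the quoted form, pdftotext most likely does not see an `-enc` option at all, only an unexpected extra argument, and exits with status 99 ("other error" in the xpdf documentation).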
So, as the docs explain, you should convert the command to a string:

```python
cmd = r'"{}" "{}" "{}" -enc utf-8'.format(pdftotext, pdf, txt)
```

Now, call subprocess as:

```python
subprocess.call(cmd, shell=True, stderr=subprocess.STDOUT)
```
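Another option, which avoids shell=True altogether, is to keep the list form but pass `-enc` and its value as separate elements. This is a minimal sketch, not a definitive implementation: the helper names `build_pdftotext_cmd` and `run_pdftotext` are hypothetical, it assumes Python 3.5+ for `subprocess.run`, and it places the option before the file names as in pdftotext's documented usage (`pdftotext [options] <PDF-file> [<text-file>]`).

```python
import subprocess


def build_pdftotext_cmd(exe, pdf, txt, encoding="utf-8"):
    """Hypothetical helper: return the argument list for pdftotext."""
    # Option and value are separate elements, exactly as cmd splits
    # 'pdftotext file.pdf out.txt -enc utf-8' into arguments.
    return [exe, "-enc", encoding, pdf, txt]


def run_pdftotext(exe, pdf, txt, encoding="utf-8"):
    """Hypothetical helper: run pdftotext without a shell."""
    # No shell=True: the list goes straight to the OS, so cmd.exe never
    # re-parses it. check=True raises CalledProcessError on a non-zero exit.
    return subprocess.run(
        build_pdftotext_cmd(exe, pdf, txt, encoding),
        check=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
```

Usage would be `run_pdftotext(r"path\to\pdftotext.exe", r"path\to\file.pdf", r"path\to\out.txt")`; the combined stdout/stderr is available on the returned object's `.stdout` attribute.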