The pdf format is basically a vector format that can also include bitmapped ("raster") images.
If the original pdf contains a scanned document, it will usually only contain a bitmapped image (often in tiff or jpeg format) and then converting it to png is fine (if you stick to the original resolution of the image).
But if the original contains vector graphics (including text strings), converting those to a bitmap will generally introduce sampling errors. To avoid those, you can use 1-bit color depth ("black-and-white" format) and a resolution that at least matches that of the printer. This will produce quite a large png file, though. Using the tiff format might yield a smaller file. The "tiff-inside-pdf" format is something you often see when large drawings are scanned. According to ImageMagick's identify program, such a tiff file looks something like this:
```
Format: TIFF (Tagged Image File Format)
Class: DirectClass
Geometry: 13231x9355+0+0
Resolution: 400x400
Print size: 33.0775x23.3875
Units: PixelsPerInch
Type: Bilevel
Base type: Bilevel
Endianess: MSB
Colorspace: Gray
Depth: 1-bit
Channel depth:
  gray: 1-bit
```
Despite the huge pixel dimensions, the tiff file is only 144 kB. The tiff2pdf program (part of the tiff package) can convert these to nice and small pdf files.
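To put those numbers in perspective, here is a small back-of-the-envelope calculation in Python. It only uses the values from the identify output above; reading "144 kB" as 144,000 bytes is my assumption.

```python
# Values copied from the identify output above.
width_px, height_px = 13231, 9355
dpi = 400
file_size_bytes = 144 * 1000  # "144 kB", taken here as 144,000 bytes (assumption)

# Print size in inches is simply pixels divided by pixels-per-inch.
print_width_in = width_px / dpi
print_height_in = height_px / dpi

# Uncompressed 1-bit data: each row is padded up to a whole number of bytes.
bytes_per_row = (width_px + 7) // 8
raw_bytes = bytes_per_row * height_px

print(f"print size: {print_width_in:.4f} x {print_height_in:.4f} inch")
print(f"uncompressed: {raw_bytes / 1e6:.1f} MB")
print(f"compression ratio: roughly {raw_bytes / file_size_bytes:.0f}:1")
```

So the uncompressed bitmap would be in the order of 15 MB, and the file on disk is roughly a hundred times smaller. That is plausible for a line drawing, which is mostly white and compresses very well in a bilevel tiff (the identify output above doesn't show which compression scheme was used).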
But the best way to preserve the document's format is to edit the pdf file itself, instead of converting it to another format.
There is a Python module for manipulating pdf documents: PyPDF2. But since you don't specify what you want to do with the document, it is impossible to say whether it can do what you want. There is also ReportLab, but that's more for generating pdf files. If you have the cairo library installed on your system, pycairo is a less heavyweight option for generating pdf documents.
An excellent general-purpose utility for manipulating pdf files is pdftk (written in Java).
Edit: Sampling in grayscale will always introduce sampling artefacts. These are not errors in themselves, just a consequence of the sampling process.
Decompiling the pdf file into PostScript, as Ben Jackson mentions, can be done. There are a couple of utilities that can help you with that: pdftops from the poppler-utils package, and pdf2ps, which comes with ghostscript. In my experience, pdftops tends to produce more usable output.
But I haven't found a good way to automate this process. Below is a fragment from the Numpy User Guide decompiled with pdftops:
```
(At)
[7.192997
0
2.769603
0] Tj
-314 TJm
(the)
[2.769603
0
4.9813
0
4.423394
0] Tj
-313 TJm
(core)
[4.423394
0
4.9813
0
3.317546
0
4.423394
0] Tj
-314 TJm
(of)
[4.9813
0
3.317546
0] Tj
-313 TJm
(the)
[2.769603
0
4.9813
0
4.423394
0] Tj
-314 TJm
(NumPy)
[7.192997
0
4.9813
0
7.750903
0
5.539206
0
4.9813
0] Tj
-314 TJm
(package,)
[4.9813
0
4.423394
0
4.423394
0
4.9813
0
4.423394
0
4.9813
0
4.423394
0
2.49065
0] Tj
-329 TJm
```
This produces the sentence "At the core of the NumPy package,". So if you search the PostScript file for anything between parentheses, you'll get the strings.
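Finding those strings can be scripted. The following is a minimal sketch in Python, assuming the decompiled file looks like the fragment above: a string in parentheses, followed by an array of numbers and `Tj`. The names `TEXT_RUN` and `extract_words` are made up for this example, and strings with nested unescaped parentheses are not handled.

```python
import re

# Matches a PostScript string "(...)" that is followed by a numeric array
# and the Tj procedure, as in the pdftops fragment above.
# Backslash escapes such as "\(" inside the string are allowed.
TEXT_RUN = re.compile(r"\(((?:\\.|[^()\\])*)\)\s*\[[^\]]*\]\s*Tj")

def extract_words(postscript):
    """Return the text of every string that is shown with Tj, in order."""
    return [match.group(1) for match in TEXT_RUN.finditer(postscript)]

sample = """(At)
[7.192997
0
2.769603
0] Tj
-314 TJm
(the)
[2.769603
0
4.9813
0
4.423394
0] Tj
-313 TJm
"""

print(" ".join(extract_words(sample)))
```

Other producers may lay out their PostScript differently, so treat the pattern as a starting point and not as a general parser.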
So changing individual words or removing short pieces is not that hard:
- Find the correct word(s) in the decompiled PostScript.
- Edit them (and the surrounding parameters!).
- Re-compile to pdf (with ghostscript).
But you would have to look at the beginning of the document to see what the procedures Tj and TJm do. If you want to replace text, you'll have to remove the old strings and put in new text and code with the correct parameters for Tj and TJm. This requires an understanding of PostScript. And if you are replacing a sentence, you usually cannot replace it with a longer one; there will not be enough space...
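The decompile and re-compile steps around the manual edit are easy enough to script. Below is a rough sketch in Python, assuming that pdftops (poppler-utils) and ps2pdf (ghostscript) are installed and on your PATH. The helper names and file names are placeholders of mine; the editing itself still has to be done by hand.

```python
import shutil
import subprocess

def decompile_command(pdf_path, ps_path):
    """Command line that turns a pdf into PostScript with poppler's pdftops."""
    return ["pdftops", pdf_path, ps_path]

def recompile_command(ps_path, pdf_path):
    """Command line that turns PostScript back into pdf with ghostscript's ps2pdf."""
    return ["ps2pdf", ps_path, pdf_path]

def run(command):
    """Run one step, failing loudly if the tool is missing or returns an error."""
    if shutil.which(command[0]) is None:
        raise FileNotFoundError(f"{command[0]} not found on PATH")
    subprocess.run(command, check=True)

# Hypothetical usage; the file names are placeholders:
#   run(decompile_command("original.pdf", "editable.ps"))
#   ... edit editable.ps by hand ...
#   run(recompile_command("editable.ps", "changed.pdf"))
```

Keeping the command lines in separate functions makes it simple to add options later without touching the code that runs them.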
Therefore it is generally advisable to try and get the original application to change the output.