Where can I get the Adobe-Identity-UCS cmap file?

Question

I have a PDF file from which neither PDFBox nor iText 7 can extract text. The font is encoded with Identity-H, and its ToUnicode CMap is named Adobe-Identity-UCS. The contents of the ToUnicode stream are given below.


    /CIDInit /ProcSet findresource begin

    12 dict begin

    begincmap

    /CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def
    /CMapName /Adobe-Identity-UCS def
    /CMapType 2 def

    1 begincodespacerange
    <0000><FFFF>
    endcodespacerange

    endcmap
    CMapName currentdict /CMap defineresource pop
    end
    end

The ToUnicode CMap is invalid. Is there any way to fix it?

I tried to find an intact Adobe-Identity-UCS cmap file to replace it with, but after a lot of Google searching I could not find one.

Any help? Thanks.

Edit:

Chinese-cidmap-broken.pdf

https://stackoverflow.com/questions/39485920/how-to-add-unicode-in-truetype0font-on-pdfbox-2-0-0 — Tilman Hausherr, Jul 31 '18 at 12:09
@TilmanHausherr. Thanks. I know the way to rewrite ToUnicode, but can not find Adobe-Identity-UCS cmap. — KlSoft, Aug 01 '18 at 03:15
@KlSoft I thought that maybe the file used the character Unicode codes as glyph IDs and then used this generic cmap name without filling the actual cmap, but that is not the case here. — Mihai Iancu, Aug 02 '18 at 10:44
@KlSoft Your link returns a `{"error_code":31360,"error_msg":"expire time out error","error_info":"","request_id":338316530258407808}` — mkl, Aug 02 '18 at 11:45

score 4 · Answer 1 · answered Jul 31 '18 at 12:49

The ToUnicode CMap you show corresponds to the example ToUnicode CMap in the PDF specification ISO 32000 (either part), merely without any bfrange or bfchar section.

Thus, what you have essentially is a template into which one can put arbitrary mappings.
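A quick way to confirm that a ToUnicode CMap is such an empty template is to look for a non-empty bfchar or bfrange section in its decoded text. The following is only a rough sketch: the function name `has_unicode_mappings` is made up for illustration, and it assumes you have already extracted and decompressed the ToUnicode stream into a string.

```python
import re


def has_unicode_mappings(cmap_text):
    """Return True if the CMap text declares at least one non-empty
    bfchar or bfrange section (e.g. "2 beginbfchar")."""
    pattern = r"\b[1-9]\d*\s+beginbf(?:char|range)\b"
    return re.search(pattern, cmap_text) is not None


# The header-only CMap from the question: a code space range, but no mappings.
template_only = """\
/CMapName /Adobe-Identity-UCS def
/CMapType 2 def
1 begincodespacerange
<0000><FFFF>
endcodespacerange
endcmap
"""

print(has_unicode_mappings(template_only))
```

For the CMap in the question this check finds no mappings, which is why text extraction has nothing to work with.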

Concerning your question, therefore:

Is there any way to fix it?

Yes and no.

Yes, you can fix it by adding the appropriate bfrange or bfchar sections with the correct mappings.

BUT... to do so you need to know which codes map to which Unicode strings for the font at hand. The name Adobe-Identity-UCS by itself usually does not imply the mapping. So also:

No, not without additional information.

@Tilman in his comment to your question referenced one of his answers in which he showed how to add a missing ToUnicode map using information on the actual mappings gathered from different sources.
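To illustrate what the "yes" part looks like once the mappings are known, here is a minimal sketch that fills the template with bfchar sections. Everything specific in it is an assumption: the function name `build_to_unicode_cmap` is made up, the two sample mappings are hypothetical and are not the real mappings of the font in the question, and the header lines follow the ISO 32000 example rather than your file. The sections are split at 100 entries, which is the limit conventionally applied to these operators.

```python
def build_to_unicode_cmap(mapping):
    """Build ToUnicode CMap text from {character code (int): Unicode string}.

    Assumes 2-byte codes, as with Identity-H.
    """
    entries = []
    for code, text in sorted(mapping.items()):
        if not 0 <= code <= 0xFFFF:
            raise ValueError(f"code out of 2-byte range: {code}")
        # Destination strings are UTF-16BE, written as hex.
        dst = text.encode("utf-16-be").hex().upper()
        entries.append(f"<{code:04X}> <{dst}>")

    lines = [
        "/CIDInit /ProcSet findresource begin",
        "12 dict begin",
        "begincmap",
        "/CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def",
        "/CMapName /Adobe-Identity-UCS def",
        "/CMapType 2 def",
        "1 begincodespacerange",
        "<0000> <FFFF>",
        "endcodespacerange",
    ]
    # Emit the mappings in sections of at most 100 entries each.
    for start in range(0, len(entries), 100):
        chunk = entries[start:start + 100]
        lines.append(f"{len(chunk)} beginbfchar")
        lines.extend(chunk)
        lines.append("endbfchar")
    lines += [
        "endcmap",
        "CMapName currentdict /CMap defineresource pop",
        "end",
        "end",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical mappings, for illustration only.
sample = {0x0001: "\u4e2d", 0x0002: "\u6587"}
print(build_to_unicode_cmap(sample))
```

The resulting text would then replace the content of the font's ToUnicode stream. The hard part remains obtaining the real code-to-Unicode mappings in the first place.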
