I have a PDF file from which neither PDFBox nor iText 7 can extract text. The font uses the Identity-H encoding with an Adobe-Identity-UCS ToUnicode CMap. The contents of the ToUnicode stream are given below.
/CIDInit /ProcSet findresource begin
12 dict begin
begincmap
/CIDSystemInfo > def
/CMapName /Adobe-Identity-UCS def
/CMapType 2 def
1 begincodespacerange
<0000><FFFF>
endcodespacerange
endcmap
CMapName currentdict /CMap defineresource pop
end
end
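The ToUnicode CMap is invalid: it declares a codespace range but contains no bfchar or bfrange mappings, so character codes cannot be mapped to Unicode. Is there any way to fix it?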
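One way to confirm that the CMap really carries no mappings is to count the entries it declares. This is only a minimal sketch, assuming the decoded ToUnicode stream is already available as text; the function name `count_tounicode_mappings` and the two sample strings are hypothetical and are not part of PDFBox or iText 7.

```python
import re


def count_tounicode_mappings(cmap_text: str) -> int:
    """Sum the entry counts declared before beginbfchar / beginbfrange."""
    total = 0
    # Each mapping section starts with "<n> beginbfchar" or "<n> beginbfrange",
    # where <n> is the number of entries that follow in that section.
    for match in re.finditer(r"(\d+)\s+beginbf(?:char|range)\b", cmap_text):
        total += int(match.group(1))
    return total


# Abbreviated copy of the stream from this question: codespace range only.
EMPTY_CMAP = """
begincmap
/CMapName /Adobe-Identity-UCS def
/CMapType 2 def
1 begincodespacerange
<0000><FFFF>
endcodespacerange
endcmap
"""

# Hypothetical fragment showing what a usable CMap would additionally contain.
VALID_FRAGMENT = """
1 begincodespacerange
<0000><FFFF>
endcodespacerange
2 beginbfchar
<0003> <0020>
<0024> <0041>
endbfchar
"""

print(count_tounicode_mappings(EMPTY_CMAP))      # 0
print(count_tounicode_mappings(VALID_FRAGMENT))  # 2
```

A count of 0 means there is nothing for a text extractor to map, which matches the behaviour seen here. The sketch trusts the declared counts and does not validate the entries themselves.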
I tried to find an intact Adobe-Identity-UCS CMap file to download and use as a replacement, but after a lot of searching on Google I could not find one.
Any help? Thanks.
Edit: