
Y-coordinates in PDFs

Posted: Sat Jul 13, 2019 12:20 pm
by jPV
Is there a reason why the coordinates you get from pdf.GetRects and pdf.GetPageLinks (and probably elsewhere too) have their origin at the bottom?

For example, with this PDF: sample-link_1.pdf

And this code:

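```
@REQUIRE "polybios", {Version=1, Revision=1}
; open the document and load its first page
pdf.OpenDocument(1, "sample-link_1.pdf")
pdf.LoadPage(1, 1, True)
; print the vertical extent of the first rectangle returned by GetRects()
t = pdf.GetRects(1, 1, 0, 3)
DebugPrint("Top:", Int(t[0].top), "Bottom:", Int(t[0].bottom))
; print the vertical extent of the first link on the page
t = pdf.GetPageLinks(1, 1)
DebugPrint("Top:", Int(t[0].top), "Bottom:", Int(t[0].bottom))
```

You get this output: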
Top: 736 Bottom: 727
Top: 479 Bottom: 464

So the top values are larger than the bottom values.

But there's one funny detail: I've found one PDF that gives the link values (but not the rect values) in a different, more logical order:
Top: 796 Bottom: 782
Top: 723 Bottom: 737

All the others so far have behaved as described above. So I have to check in code which of top and bottom is bigger. Any idea why it's like this?
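The check described above can be sketched as follows. This is written in Python purely for illustration (the same logic translates directly to a Hollywood function); it assumes each rectangle is a dict-like record with `top` and `bottom` keys, mirroring the fields printed above, and `normalize_top_bottom` is a hypothetical helper name. In PDF's bottom-up coordinate space the top edge is the one with the larger y value.

```python
def normalize_top_bottom(rect):
    """Return a copy of rect in which 'top' holds the larger y value.

    Hypothetical helper: rect is assumed to be a dict with 'top' and
    'bottom' keys, like the rectangles printed in the post.
    """
    top = max(rect["top"], rect["bottom"])
    bottom = min(rect["top"], rect["bottom"])
    # Keep any other fields (e.g. left/right) unchanged
    return {**rect, "top": top, "bottom": bottom}


# The unusual link from the post: top 723, bottom 737
fixed = normalize_top_bottom({"top": 723, "bottom": 737})
print(fixed["top"], fixed["bottom"])  # prints "737 723"
```

Taking the max and min avoids having to detect which PDFs use the unusual order; rectangles that are already in the usual order pass through unchanged.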

Re: Y-coordinates in PDFs

Posted: Sun Jul 14, 2019 4:58 pm
by airsoftsoftwair
Yes, that's normal. PDF places the origin in the bottom-left corner instead of the top-left corner (as described here), so y values grow upward. I don't know, though, what's going on with the PDF you mentioned, which seems to have its coordinates in the canonical order. Can you send it so I can take a look?
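If you want screen-style coordinates (origin top-left, y growing downward), the y axis has to be flipped: a point y units above the bottom edge lies `page_height - y` units below the top edge. The sketch below is in Python for illustration; `to_top_left_origin` is a hypothetical helper, and it assumes the page box starts at y = 0 and the page is not rotated. `page_height` is a parameter you'd obtain from the document; the 792 used in the example is simply the height of a US Letter page in points (11 in × 72 pt/in), not necessarily the height of sample-link_1.pdf.

```python
def to_top_left_origin(top, bottom, page_height):
    """Convert a PDF vertical extent (origin bottom-left, y up) into
    screen-style values (origin top-left, y down).

    Assumes the page box starts at y = 0 and the page is unrotated.
    Returns (new_top, new_bottom) with new_top <= new_bottom.
    """
    # Normalize first, in case the PDF lists the edges in the unusual order
    upper, lower = max(top, bottom), min(top, bottom)
    # Flip the axis: distance from the top edge instead of the bottom edge
    return page_height - upper, page_height - lower


# The link from the first post, on a hypothetical 792-pt-high page
print(to_top_left_origin(479, 464, 792))  # prints "(313, 328)"
```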

Re: Y-coordinates in PDFs

Posted: Mon Jul 15, 2019 2:54 am
by PEB
My guess is that the PDF with the strange coordinates has a non-standard page size. If I create a PDF with the standard 8.5 x 11 inch (US Letter) dimensions, the coordinates work as expected, with (0, 0) in the lower left-hand corner. But if I create a PDF with a non-standard size, the coordinate system is way off, as if the document were rotated 90 degrees to the left.

Re: Y-coordinates in PDFs

Posted: Thu Jul 18, 2019 9:52 pm
by airsoftsoftwair
After examining the PDF, I can confirm that it really has the coordinates in the unusual order. But according to the Adobe PDF specification, this is completely legitimate:
Rectangles are used to describe locations on a page and bounding boxes for a variety of objects. A rectangle shall be written as an array of four numbers giving the coordinates of a pair of diagonally opposite corners.

Although rectangles are conventionally specified by their lower-left and upper-right corners, it is acceptable to specify any two diagonally opposite corners. Applications that process PDF should be prepared to normalize such rectangles in situations where specific corners are required. Typically, the array takes the form [llx lly urx ury] specifying the lower-left x, lower-left y, upper-right x, and upper-right y coordinates of the rectangle, in that order. The other two corners of the rectangle are then assumed to have coordinates (llx, ury) and (urx, lly).
So your code really has to be able to handle rectangles given by either pair of diagonally opposite corners.
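The normalization the specification asks for can be sketched like this: take any two diagonally opposite corners `[x1, y1, x2, y2]`, produce the conventional `[llx, lly, urx, ury]` form, and derive the other two corners `(llx, ury)` and `(urx, lly)` exactly as the quoted text describes. The example is Python for illustration only; `normalize_rect` and `corners` are hypothetical helper names, not part of Polybios.

```python
def normalize_rect(r):
    """Normalize a PDF rectangle given by any two diagonally opposite
    corners [x1, y1, x2, y2] into the conventional [llx, lly, urx, ury]."""
    x1, y1, x2, y2 = r
    return [min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2)]


def corners(r):
    """Return all four corners of a normalized rectangle, going
    counterclockwise from the lower-left corner (y axis pointing up)."""
    llx, lly, urx, ury = r
    return [(llx, lly), (urx, lly), (urx, ury), (llx, ury)]


# Upper-left and lower-right corners given instead of lower-left/upper-right
rect = normalize_rect([10, 50, 30, 20])
print(rect)           # prints "[10, 20, 30, 50]"
print(corners(rect))  # prints "[(10, 20), (30, 20), (30, 50), (10, 50)]"
```

Because `min`/`max` are applied per axis, the result is the same whichever diagonal the PDF happens to store, which is exactly the case the specification says readers must be prepared for.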