Scan pages in wrong order #437
Comments
I never noticed this before today, but now that I have, I checked back on the different uploads in the dev system and noticed it there too. For example:
I tested the system by uploading a 60-page PDF file. The pages are numbered from 1 to 60.
The splitting of pages happens on the DFKI side. We do not re-order them in any way, and I assume @anlausch also just passes them through as well. @rtahseen can you tell whether in-order processing is guaranteed for split multi-page scans? Unfortunately, we would have no way to re-order them on our side, since we only have the first_page and last_page of the whole embodiment, and one embodiment can span multiple pages. So at the moment we retain the order in which we receive the scans and just enumerate them.
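To illustrate the point about enumeration, here is a minimal sketch of what "retain the order and just enumerate" implies. The function name `enumerate_scans` and the dictionary keys are hypothetical and not taken from the actual backend code. The page number is derived purely from arrival order, so any reordering upstream ends up in the displayed page numbers.

```python
def enumerate_scans(received_scans):
    """Assign page numbers purely by arrival order (hypothetical helper)."""
    # No sorting happens here: position in the incoming list is the page number.
    return [
        {"page": index, "scan": scan}
        for index, scan in enumerate(received_scans, start=1)
    ]


# If the upstream service delivers the scans out of order, the numbering
# simply follows that delivery order.
pages = enumerate_scans(["scan_a", "scan_c", "scan_b"])
```

If each scan carried its own page index from the splitting step, the backend could sort on that instead of relying on arrival order.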
I have just added code to explicitly sort the results. @LauraErhard Can you please try again now?
I uploaded a new PDF with 40 pages, and pages 13 (should be page 3), 24 (should be page 4), and 35, 36, 37, 38, 39, 40 (should be pages 5-11) were wrong.
Now I am making sure that the page results are sorted in ascending order.
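As a rough sketch of what sorting page results in ascending order could look like, assuming each result carries a page index under a `page` key (a hypothetical field name, not confirmed by the service code): the key point is to compare page indices as integers, because sorting them as strings puts "10" before "2".

```python
def sort_page_results(results):
    """Return page results ordered by numeric page index (hypothetical schema)."""
    # int() guards against page indices that arrive as strings.
    return sorted(results, key=lambda result: int(result["page"]))


results = [{"page": "10"}, {"page": "2"}, {"page": "1"}]

# Numeric ordering: 1, 2, 10
numeric_order = [r["page"] for r in sort_page_results(results)]

# Plain string ordering for comparison: "1", "10", "2"
string_order = sorted(r["page"] for r in results)
```

Whether string ordering is what actually causes the pattern reported in this issue is not established here; the sketch only shows one common way page order can go wrong.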
@LauraErhard We are working on that right now. We will let you know when it is ready.
@LauraErhard Can you check now?
The scan I uploaded is now finished processing. Sadly I have to tell you that the order is still wrong. A few days ago @abdelqader-mohammad mentioned the same pattern:
So there seem to be no changes?!
@LauraErhard When did you upload this file for processing?
I uploaded "Haunberger, Sigrid: Teilnahmeverweigerung in Panelstudien /, VS-Verl.," (5c6c26a8d0704e026a6c1690) with 24 pages 2 days ago, but I only saw the processed scans yesterday. I will upload another PDF right now.
I just uploaded: Hardering, Friedericke: Unsicherheiten in Arbeit und Biographie : zur Ökonomisierung der Lebensführung /, VS Verlag für Sozialwissenschaften (ID: 5c6eab3dee21453a61bd6140)
@LauraErhard I am not sure about this error. It says "error 502". Close the webpage and try uploading the file again; maybe this will fix the problem, but I am not sure.
@LauraErhard This is very strange. It looks like the changes I made in the code were not propagated to the service. I have restarted the service. Please try again.
@abdelqader-mohammad Yesterday I tried Firefox and Chrome, and now I am sitting at a different PC and it still doesn't work. I just tried to upload it to the demo system, but there I get the same error. It is only this one scan, though: I uploaded another PDF and that worked fine.
@rtahseen I just uploaded a new scan and it's still in the wrong order.
After careful and detailed debugging, a sorting check has been added to every possible use case. @LauraErhard Please try one last time. If it still does not work, then I assure you that the problem is somewhere else :)
I have bad news: the scans are still in the wrong order.
@LauraErhard I have tested all interfaces of my web service and verified that the results are sorted correctly in every case. I am not sure where exactly the cause of the problem is. From @lgalke's comment above, I can only guess that something could be happening in the backend. It would be great if someone from the backend could let me know the exact order in which they call the different functions of the Automatic Reference Extraction Service, so that I can re-verify the output of my web service.
We uploaded two PDFs with 18 and 48 pages, and we noticed that in the scan view the order of the pages is not the same as in the PDFs we uploaded. For example, page 3 starts with the letter M even though page 2 only covered B, and the whole section between B and M appears on later pages. Are the scans not sorted or not identified in the right order? This is problematic because we sometimes have references that start on one page and end on the next, and if the order is not kept we have to search for the second part, which is really time-consuming.