Scan pages in wrong order #437

Open
LauraErhard opened this issue Dec 6, 2018 · 21 comments
Comments

@LauraErhard

We uploaded two PDFs with 18 and 48 pages and have now noticed that the order of the pages in the scan view is not the same as in the PDFs we uploaded. For example, page 3 starts with the letter M even though page 2 only covered B, and the whole section between B and M appears on later pages. Are the scans not sorted or identified in the right order? This is a problem because some references start on one page and end on the next; if the order is not kept, we have to search for the second part, which is really time-consuming.

@LauraErhard
Author

I had never noticed this before today, but now that I have, I checked the earlier uploads in the dev system and found it there too. For example:
White, Gregory Whayne: Climate change and migration : security and borders in a warming world = ID: 5bc74d8f4fb7d00dfeff7fa2
Kidd, Dustin: Social media freaks : digital identity in the network society = ID: 5bf429fc4b39d61a6e964ea7
I can't remember when we uploaded these; maybe you can see this? Then you might be able to track down when this issue first arose. There never seems to have been a problem with smaller scans (1-5 pages), but I haven't checked every scan yet.

@abdelqader-mohammad

I tested the system by uploading a 60-page PDF file. The pages are numbered from 1 to 60.
After uploading, they were reordered as follows:
1, 2, 11, 12, 13, ..., 20, 3, 21, 22, 23, ..., 30, 4, 31, 32, ... and so on.
@chah3d is this caused by the numbering system? For example, if the page numbers all had the same number of digits (01 rather than 1), would that solve the problem?
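
To illustrate the zero-padding question, here is a minimal sketch. It assumes that page identifiers are compared as strings somewhere in the pipeline, which nothing in this thread confirms; the variable names are hypothetical. Unpadded numbers sort character by character, while zero-padded ones sort like integers. Sorting with a numeric key gives the same result without renaming anything.

```python
# Hypothetical page identifiers 1..60 as strings (an assumption for
# illustration; the real service may name pages differently).
unpadded = [str(n) for n in range(1, 61)]
padded = [f"{n:02d}" for n in range(1, 61)]

# Plain string sort compares character by character, so "10" < "2".
string_sorted = sorted(unpadded)

# Zero-padded strings all have the same length, so string order
# matches numeric order.
padded_sorted = sorted(padded)

# Alternative that leaves the identifiers untouched: sort by integer value.
numeric_sorted = sorted(unpadded, key=int)

print(string_sorted[:5])   # ['1', '10', '11', '12', '13']
print(padded_sorted[:5])   # ['01', '02', '03', '04', '05']
print(numeric_sorted[:5])  # ['1', '2', '3', '4', '5']
```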

@lgalke
Member

lgalke commented Feb 14, 2019

The splitting of pages happens on the DFKI side. We do not re-order them in any way, and I assume @anlausch also just passes them through as well. @rtahseen can you tell whether in-order processing is guaranteed for split multi-page scans? Unfortunately, we would have no way to restore the order on our side, since we only have the first_page and last_page of the whole embodiment, and one embodiment can span multiple pages. So at the moment we retain the order in which we receive the scans and just enumerate them.

@rtahseen
Member

I have just added the code to explicitly sort the results. @LauraErhard Can you please try again now?

@LauraErhard
Author

I uploaded a new PDF with 40 pages, and pages 13 (should be page 3), 24 (should be page 4), and 35, 36, 37, 38, 39, 40 (should be pages 5-11) were wrong.

@rtahseen
Member

rtahseen commented Feb 19, 2019

Now I am making sure that the page results are sorted in ascending order.
@LauraErhard Please try again now

@LauraErhard
Author

I can't open the dev system right now:
[Screenshot: fehler-dev]

@abdelqader-mohammad

@LauraErhard We are working on that right now. We will inform you when it is ready

@abdelqader-mohammad

@LauraErhard can you check now?

@LauraErhard
Author

The scan I uploaded has now finished processing. Sadly, I have to tell you that the order is still wrong.
I don't know how you count in the backend, so here are the two possibilities:
Page order: 231, 232, 241 - 250, 233, 251 - 254, 234 - 240
Number order: 1, 2, 11-20, 3, 21-24, 4-10

A few days ago @abdelqader-mohammad mentioned the same pattern:

I tested the system by uploading a 60 pages pdf file. The pages are numbered from 1 to 60
They reordered after the uploading as follow:
1, 2, 11, 12, 13, ..., 20, 3, 21, 22, 23, ...., 30, 4, 31, 32,... and so on.

So there seem to be no changes?!
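
The reported order is what a plain string sort of zero-based page indices would produce. The sketch below assumes that pages are numbered from 0 internally and compared as strings at some point. This is a guess based only on the pattern, not something confirmed in this thread, and `string_sorted_pages` is a hypothetical helper.

```python
def string_sorted_pages(page_count):
    """Return 1-based page numbers in the order produced by sorting
    zero-based indices as strings (a hypothetical failure mode)."""
    indices = [str(i) for i in range(page_count)]
    return [int(s) + 1 for s in sorted(indices)]

# "Number order" reported above for the 24-page scan:
# 1, 2, 11-20, 3, 21-24, 4-10
observed_24 = [1, 2, *range(11, 21), 3, *range(21, 25), *range(4, 11)]

print(string_sorted_pages(24) == observed_24)  # True
print(string_sorted_pages(60)[:14])
# [1, 2, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 3, 21]
```

If that assumption holds, the fix would be to compare the indices as integers wherever the page list is built or sorted.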

@rtahseen
Member

@LauraErhard when did you upload this file for processing?
In the logs I can see only 2 files processed yesterday. One of them, with 34 pages, I tested myself yesterday, and the other has only 4 pages.

@LauraErhard
Author

I uploaded "Haunberger, Sigrid: Teilnahmeverweigerung in Panelstudien /, VS-Verl.," (5c6c26a8d0704e026a6c1690) with 24 pages 2 days ago, but I only saw the processed scans yesterday. I will upload another pdf right now.

@LauraErhard
Author

I just uploaded: Hardering, Friedericke: Unsicherheiten in Arbeit und Biographie : zur Ökonomisierung der Lebensführung /, VS Verlag für Sozialwissenschaften (ID: 5c6eab3dee21453a61bd6140)
And the pages are still wrong. I assume it is the same pattern as before.

@LauraErhard
Author

While uploading a different PDF I got an error:
[Screenshot: 201-02-21]
Any idea why the PDF above was no problem, but this one is?

@abdelqader-mohammad

@LauraErhard I am not sure about this error. It says "error 502". Close the webpage and try uploading the file again; maybe this will fix the problem, but I am not sure.

@rtahseen
Member

@LauraErhard This is very strange. It looks like the changes I made in the code were not propagated to the service. I have restarted the service. Please try again.

@LauraErhard
Author

@abdelqader-mohammad yesterday I tried Firefox and Chrome, and now I am sitting at a different PC and it still doesn't work. I just tried to upload it to the demo system, but I get the same error there. It only affects this one scan, though: I uploaded another PDF and that worked fine.

@LauraErhard
Author

@rtahseen I just uploaded a new scan and it's still the wrong order.

@rtahseen
Member

After careful and detailed debugging, a sorting check has been added to every possible use case. @LauraErhard Please try one last time. If it still does not work, then I assure you that the problem is somewhere else :)

@LauraErhard
Author

I have bad news, the scans are still in the wrong order.
If the problem is not with you, can you take a guess where the problem could be? Backend? Frontend? Any ideas?!

@rtahseen
Member

@LauraErhard I have tested all interfaces of my web service and verified that results are sorted correctly in every case. I am not sure where exactly the cause of the problem is. From @lgalke's comment above, I can only guess that something could be happening in the backend. It would be great if someone from the backend could let me know the exact order in which they call the different functions of the Automatic Reference Extraction Service, so that I can re-verify the output of my web service.
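
One way to narrow down where the order breaks would be to run the same small check on the page sequence at each hand-over point (service output, backend, frontend). The sketch below is only an illustration: `first_out_of_order` is a hypothetical helper, not part of any existing service, and it assumes page numbers are available as integers at each point.

```python
def first_out_of_order(pages):
    """Return (position, previous, current) for the first place where the
    sequence stops ascending, or None if it is strictly ascending."""
    for pos in range(1, len(pages)):
        if pages[pos] <= pages[pos - 1]:
            return pos, pages[pos - 1], pages[pos]
    return None

# Page order reported earlier in this thread for the 24-page scan.
reported = [1, 2, *range(11, 21), 3, *range(21, 25), *range(4, 11)]

print(first_out_of_order(reported))             # (12, 20, 3)
print(first_out_of_order(list(range(1, 25))))   # None
```

The first component that reports a non-`None` result is where the ordering is lost.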
