New Search Engine, No Read, Only Post |
|
Nov 8 2022, 08:53
|
Tenboro

|
I've been thinking a lot about pagination for the last, uh, two years or so. And my general feeling on "page-based pagination" for any site that continuously adds new content to the "front" is: it's kinda shit. It's what people are used to, but semantically it has no relevance to the origin of the concept, which is the (dead tree format) book. Because unlike a DTF book, where Jason always kills Alice on page 358, the actual page number has absolutely no association with the content you find on the page, except that "page 358 has the content that was posted immediately prior to the content on page 359". And if you check back tomorrow, the actual content on both page 358 and 359 has been replaced with something completely different.

Using markers based on "content posted before/after $work" as well as "content posted before/after $date" just makes so much more sense for this type of content, because it'll generally be the exact same "page" when you check it again tomorrow. While people tend to deeply dislike change, and while the fundamental reason for the change is primarily performance/scaling, just because no one (?) has done it before does not (necessarily) make it a bad idea. Something about Henry Ford and faster horses.

What people currently seem to find most lacking in the new search engine compared to what they had with "page-based pagination" is a way to see how deep they are into a result, as well as a more visual way to jump deep into a result. So if we can do this without pages (and we can, it'll just take some work), then there shouldn't be a single advantage left for "page-based pagination" outside of "we always used page-based navigation and that's the way we likes it".

QUOTE(FabulousCupcake @ Nov 8 2022, 00:10)
This also kinda makes me question my own assumption(s) on how the current search engine works: how expensive is it to search for a tag that spans over e.g. 2M gid to show 50 results in a page?

The search engine dynamically picks different strategies depending on how sparse and numerous the tag is. For a tag with 300 hits, it just fetches all of them.

QUOTE(FabulousCupcake @ Nov 8 2022, 00:10)
If it's cheap, could pagination perhaps be shown conditionally when the result count is under, say, 10 pages?

The main problem with having pagination only for small results is that while it already fetches all of the inclusion results for tags with few hits, it does not process them for filtering or exclusion until a later stage. This was the main reason for the duplicate results when going backwards with the old pagination, as well as why you had that awkward "you're on page 4-8 of 12" thing if a lot of results were filtered.

QUOTE(FabulousCupcake @ Nov 8 2022, 00:10)
Spanning over ~2.4M gid on 800 pixels, each pixel represents about 3000 GID. When each page can only display max 100 results, it becomes almost meaningless (and this will only get worse as more galleries are uploaded). Assuming that isn't a problem (e.g. because the set filter results in a smaller total), the information shown is also not particularly accurate (and thus impacts the usefulness), since the "density" over GID is dynamic; there could be a lot of results in a given gid range and then very little or nothing in another: how to tell if there are still a lot/a few results left? I'm hopeful to hear that there's a better way to do this on the server side.

You have the same problem with pages, though - while they are smaller fixed segments, the page number itself tells you nothing about what dates or GIDs are on that page, so it would still be a rough guess as to exactly where you land. But yes, the server side should be able to make a bar that is reasonably scaled to the actual density of most results even when the density varies over time, but I do need to build histograms for the various tags and such. (It already builds a form of bloom filter for most things, so it just requires a bit more storage space.)
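The density-scaled bar described above could be backed by an equal-count histogram over each tag's GID index. A minimal sketch of that idea, assuming in-memory sorted GID lists stand in for the real on-disk indexes (bucket count and percentage granularity are illustrative, not the site's actual values):

```python
from bisect import bisect_left

def build_histogram(gids, buckets=100):
    """Split a tag's sorted GID list into equal-count buckets.
    Each boundary is a GID; a bar drawn with one segment per bucket
    stays scaled to the actual result density, however uneven it is."""
    gids = sorted(gids)
    n = len(gids)
    return [gids[min(i * n // buckets, n - 1)] for i in range(buckets + 1)]

def depth_percent(hist, current_gid):
    """Approximate how deep into the result set a given GID lands."""
    i = bisect_left(hist, current_gid)
    return 100 * i // (len(hist) - 1)
```

Since the boundaries are equal-count rather than equal-width, a dense burst of uploads in one GID range just gets more boundaries, so the "7% deep" readout stays honest.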
|
|
|
|
 |
|
Nov 8 2022, 09:36
|
冷 毁
Lurker
Group: Recruits
Posts: 9
Joined: 7-September 15

|
Is it really impossible to search for both undeleted and deleted galleries? Theoretically it should be possible to do two searches, then sort the results by time and select only the top 25 galleries. Is this really a feature that cannot be done on the back end? Can someone answer my question?
This post has been edited by 冷 毁: Nov 8 2022, 09:37
|
|
|
Nov 8 2022, 10:19
|
chibi_author
Lurker
Group: Lurkers
Posts: 1
Joined: 9-December 09

|
Hi, I've been thinking about this, and I haven't read the whole thread, but assuming we agree that pagination isn't necessary for the whole site, could it be possible to keep it for favorites? It's not like they are always in motion like the main site.
|
|
|
|
 |
|
Nov 8 2022, 10:44
|
Tenboro

|
QUOTE(冷 毁 @ Nov 8 2022, 08:36)
Is it really impossible to search for both undeleted and deleted galleries, theoretically it is possible to do two searches and then sort them by time and select only the top 25 galleries, is this really a feature that cannot be done on the back end? Can someone answer my question?

In general, the indexes are now split between visible/expunged and weak tags/strong tags for performance reasons, so combining visible+expunged or weak+strong is unlikely. Though, while not really relevant for your question, I'm considering a different approach for dealing with weak tags, where it would use a separate qualifier instead of the checkbox.

QUOTE(chibi_author @ Nov 8 2022, 09:19)
Hi, I've been thinking about and i havent read all the thread, but assuming we agree that pagination isnt necessary for the whole site, could it be possible to keep it for favorites? it's not like they are always in motion like the main site.

I suppose it isn't technically infeasible for favorites, at least when exclusions aren't used, but you're back to the problem where the result count might need to be limited, since you can't really do "proper" pagination without having a sorted result. Favorites probably wouldn't qualify for the "bar approach", but I have to mull over that one for a bit.
|
|
|
|
 |
|
Nov 8 2022, 10:46
|
EasyDeath
Newcomer
 Group: Recruits
Posts: 10
Joined: 25-December 10

|
QUOTE(Tenboro @ Nov 8 2022, 09:53)  What people currently seem to find most lacking in the new search engine compared to what they had with "page-based pagination" is a way to see how deep they are into a result, as well as a more visual way to jump deep into a result. So if we can do this without pages (and we can, it'll just take some work), then there shouldn't be a single advantage left for "page-based pagination" outside of "we always used page-based navigation and that's the way we likes it".
Yes, this is exactly what most people, and I myself, need. As for me, instead of a CSS progress bar, I'd find a percentage number better, since you have 100 numbers to represent the position. Also, from the backend you can first deploy a rough average count estimate, and then fine-tune it as you like in the future. A CSS bar would be far more limiting.

Also: [ gist.githubusercontent.com] https://gist.githubusercontent.com/Meldiron..._and_cursor.csv
This is the speed comparison for 1M documents, classic pagination vs cursor pagination. Tenboro's choice to use cursor pagination is right for technical reasons; as for user UX, we need to improve things a little bit by adding information to tell users how deep they are into the results.

As for the result count, this also needs to be fine-tuned. For example: female:milf language:english female:"double penetration" female:"prostitution" - the results are 5 pages, so about 125 results, but instead I see "Found many results.", which isn't very helpful. Can this be improved, since we don't have so "many" results? You already have the result count for each single tag: female:milf 72,850 results; language:english 179,895 results; female:"double penetration" 60,413 results; female:"prostitution" 14,381 results. At which point does the speed advantage become irrelevant compared to a normal query with the exact count? For this search you can theoretically get a maximum of 14,381 results, but we all know that is basically impossible since we have 4 filter conditions.

If you can share more details with me or everyone on how the current search engine works, maybe I/other users can help you by giving you some ideas to work with.
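The difference the linked benchmark measures is easy to reproduce. A toy SQLite sketch of classic OFFSET pagination versus the keyset/cursor style the new engine uses (the table and column names here are made up for illustration, not the site's schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE galleries (gid INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO galleries VALUES (?)",
                 [(i,) for i in range(1, 10001)])

def page_offset(page, per_page=25):
    # Classic pagination: cost grows with depth, because all rows
    # skipped by OFFSET still have to be scanned.
    return [r[0] for r in conn.execute(
        "SELECT gid FROM galleries ORDER BY gid DESC LIMIT ? OFFSET ?",
        (per_page, (page - 1) * per_page))]

def page_cursor(next_gid=None, per_page=25):
    # Keyset/cursor pagination: seek directly to the marker GID,
    # so every page costs about the same regardless of depth.
    if next_gid is None:
        return [r[0] for r in conn.execute(
            "SELECT gid FROM galleries ORDER BY gid DESC LIMIT ?",
            (per_page,))]
    return [r[0] for r in conn.execute(
        "SELECT gid FROM galleries WHERE gid < ? ORDER BY gid DESC LIMIT ?",
        (next_gid, per_page))]
```

Both return the same rows; the "?next=<gid>" links on the site correspond to feeding the last GID of the current page into the cursor variant.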
|
|
|
|
 |
|
Nov 8 2022, 11:35
|
Tenboro

|
QUOTE(EasyDeath @ Nov 8 2022, 09:46)
As for me, instead of css progress bar, i find better a percentage number, since you have 100 numbers to represent the position. Also from backend you can first deploy an rough average count estimate, and then fine tune as you like in future.

Most likely it would just latch onto the seek/jump mechanism, so that in addition to going by relative and absolute dates you can also seek to 7% or whatever. But it would necessarily have to include some sort of clickable visual indicator as well.

QUOTE(EasyDeath @ Nov 8 2022, 09:46)
As for results count, this also need to be fine tuned
female:milf language:english female:"double penetration" female:"prostitution"
the results are 5 pages, so about 125 results, instead i see "Found many results." that isn't very helpful.
Can this be improved since we don't have so "many" results? Since you already have the result count for a single tag
female:milf 72,850 results language:english 179,895 results. female:"double penetration" 60,413 results. female:"prostitution" 14,381 results.
at which point the speed is irrelevant to do a normal query with the exact count vs the new search query?
For said search theoretically you can get maximum 14.381 results but we all know that this is basically impossible since we have 4 filter conditions
If you can share with me or everyone more details how the current search engine works, maybe i/other users can help you by giving you some ideas to work with

When you are searching for multiple tags/titles and they all have a lot of results (10K+), and you are also not using user searches, and you are also not using a comment search with reasonably few results, it uses a segmenting strategy where it loads in consecutive chunks of tags segmented by GID ranges until it has enough results. So to make it estimate result counts for those types of searches, it would currently have to guesstimate the result count from the overlap in the current result, which could make it fluctuate (a lot) between pages. So to make it provide a reasonable and stable result estimate, I would probably have to use some form of self-learning to improve the accuracy over time. (The search engine already has some self-learning stuff, so it's not entirely out of the question.)
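The segmenting strategy, and why an overlap-based count would fluctuate, can be sketched roughly like this. Assumptions: in-memory GID sets per tag, a fixed segment size, and naive density extrapolation - the real engine works on on-disk indexes and is certainly more sophisticated:

```python
def segmented_search(indexes, want=25, segment=100000, max_gid=2400000):
    """Intersect several tag indexes one GID range at a time, newest
    first, stopping once enough hits are found. `indexes` is a list of
    GID sets, one per included tag (a toy stand-in for real indexes)."""
    hits, scanned = [], 0
    hi = max_gid
    while hi > 0 and len(hits) < want:
        lo = max(hi - segment, 0)
        window = [g for g in indexes[0] if lo < g <= hi]
        for idx in indexes[1:]:
            window = [g for g in window if g in idx]
        hits.extend(sorted(window, reverse=True))
        scanned += segment
        hi = lo
    # Rough total estimate: extrapolate the hit density of the scanned
    # range over the whole GID space. Because density varies by range,
    # this is exactly the number that would jump around between pages.
    estimate = len(hits) * max_gid // scanned if scanned else 0
    return hits[:want], estimate
```

Since the scan stops as soon as one page is filled, the engine genuinely never learns the true total for large intersections; the estimate is only as good as the density of whatever ranges happened to be scanned.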
|
|
|
|
 |
|
Nov 8 2022, 12:32
|
FabulousCupcake
Group: Gold Star Club
Posts: 495
Joined: 15-April 14

|
QUOTE(Tenboro @ Nov 8 2022, 07:53)  The search engine dynamically picks different strategies depending on how sparse and numerous the tag is. For a tag with 300 hits, it just fetches all of them.
...
The main problem with having pagination only for small results is that while it already fetches all of the inclusion results for tags with few hits, it does not process for filtering or exclusion until a later stage. This was the main reason for the duplicate results when going backwards with the old pagination, as well as the reason for why you had that awkward "you're on page 4-8 of 12" thing if a lot of results were filtered.
It's kinda minor, but I'd like to clarify: when I said pagination, I didn't mean true pagination down to the backend, just some semblance of pagination elements - the ui/visuals. In my mind this could be done by mapping the page numbers to GID (e.g. 2: ?next=2123123, 3: ?next=1712312, …). Or perhaps just send over all the results when the result count is low enough (i.e. the cost for the server is low enough to justify it) and have it be navigable/paginated client-side. The leading question (whether it's cheap to load a few results over a large gid range / me guessing how the search engine works internally) was mostly related to this.

The thought I had was: if a lot of the legwork is already done when loading the first 50/100 results and/or the result count is low (and as you mentioned (if I understand correctly), all results are fetched when the hit count is low), perhaps it may be worth it to bite the bullet and pay the (slightly) extra compute cost to load all the results and show pagination when the result count is low? To clarify, by pagination here I don't mean a ?page=N search param, but just: continue the search and tell me the 50th, 100th, 150th, … GID (assuming 50 results per page) after the current results, which I can click with Next/Prev. Or if that doesn't make sense, just return the whole ~300 results and have them paginated client-side.

…Though this also raises the question of whether showing different navigation methods based on obscure conditions is a good idea 😅
|
|
|
|
 |
|
|
 |
|
Nov 8 2022, 14:05
|
Tenboro

|
QUOTE(FabulousCupcake @ Nov 8 2022, 11:32)
The leading question (of whether it's cheap to load a few results over large gid range / me guessing how the search engine works internally) was mostly related to this. The thought I had was: if a lot of the legwork is already done when loading the first 50/100 results and/or the result number is low (and as you mentioned (if I understand correctly), all results are fetched when hit count is low), perhaps it may be worth it to bite the bullet and pay the (slightly more) extra compute cost to load all the results and show pagination when the result count is low? To clarify, by pagination here I don't mean ?page=N search param, but just continue the search and tell me the 50th, 100th, 150th, … GID (assuming 50 results per page) after the current results that I can click with Next/Prev. Or if that doesn't make sense, just return the whole ~300 results and have it be paginated client side …Though this also raises the question if showing different navigation methods based on obscure conditions is a good idea

It would basically be the same cost as providing all 300 results on one page, except saving the few bytes by not actually listing them, so it's too expensive to consider doing. And as you say, having a pagination element that only pops up if certain conditions are met would feel really inconsistent. I wouldn't add one unless it could at least provide a ballpark figure for all normal searches - that is, everything except for popular, file, gid and favorite searches. (The first three would not generally need one, and the last one is a completely different beast from everything else.)
|
|
|
|
 |
|
Nov 8 2022, 15:23
|
christantoan
Newcomer
 Group: Gold Star Club
Posts: 18
Joined: 1-April 13

|
Sorry if this has been asked before; I tried to search but didn't find anything. Will this change reduce the maximum number of search results per page? In the old version I had mine set to 200, while in the new version the maximum is now 100. Thank you.
|
|
|
Nov 8 2022, 15:23
|
Black-lights
Group: Members
Posts: 165
Joined: 20-September 14

|
Please, please go back to having pages. The new system sucks to use. For instance, I like to look at all galleries with specific tags and go page by page over the course of some months, and the way it is currently, I cannot easily remember which page I was up to. If the issue is server costs, why not add WebP support or other formats that would help with bandwidth?
Or... ask for more donations?
|
|
|
|
 |
|
Nov 8 2022, 15:30
|
Shank
Group: Global Mods
Posts: 9,321
Joined: 19-May 12

|
QUOTE(christantoan @ Nov 8 2022, 13:23)  Sorry if this has been asked before, I've tried to search but didn't find anything. Will this change reduce the maximum number of search result count? In the old version I have mine set to 200 while in the new version it's now set to 100 maximum. Thank you.
Results per page will be capped at 100, but the hath spent on the perk will be refunded to those who had purchased it. As for the total result count for an entire search, the old version was, I believe, capped at 100k results, but it will be uncapped in the new version.
|
|
|
|
 |
|
Nov 8 2022, 15:37
|
EasyDeath
Newcomer
 Group: Recruits
Posts: 10
Joined: 25-December 10

|
QUOTE(Black-lights @ Nov 8 2022, 16:23)  Please, please go back to having pages. The new system sucks to use. I like to for instance look at all galleries with specific tags, and go page by page over the course of some months. And the way it is currently I cannot easily remember which page I was up to. If the issue is server costs, why not add WebP support or other formats that would help bandwidth?
Or.. ask for more donations?
When you search by a specific tag now, just leave the page open where you left off, or bookmark it; the galleries that you see on that page will be the same tomorrow, or even after a month, if they aren't deleted.
|
|
|
|
 |
|
Nov 8 2022, 15:52
|
mundomuñeca
Group: Members
Posts: 4,221
Joined: 14-July 17

|
QUOTE(Black-lights @ Nov 8 2022, 14:23)  Please, please go back to having pages. The new system sucks to use. I like to for instance look at all galleries with specific tags, and go page by page over the course of some months. And the way it is currently I cannot easily remember which page I was up to. If the issue is server costs, why not add WebP support or other formats that would help bandwidth?
Or.. ask for more donations?
The issue with old-style paged searches is not bandwidth; it is RAM caching and CPU cycles growing exponentially while the archive grows linearly. No way around this, the old system just had to go. Besides, your "remember which page I was up to" was already a flawed method, because after a few months (or just a few days for frequently used tags, even a few hours for some) the same page number would no longer take you to the same content. What is now on, say, page 25 would be on page 33 or 44 or whatever after some time. Better to jot down the upload date where you are (say, 25 Dec 2015); next time, search the same tag starting from that date and voilà... you are exactly where you left off (or on the same day, at least).
|
|
|
|
 |
|
Nov 8 2022, 16:06
|
epa
Group: Gold Star Club
Posts: 102
Joined: 23-August 08

|
QUOTE(waringer @ Nov 8 2022, 12:43)
I have implemented relative dynamic pagination to the script EH-Page-Scrobbler. It's far from perfect but gives a little more control while browsing the search results. (IMG:[ github.com] https://github.com/Meldo-Megimi/EH-Page-Scrobbler/raw/main/sample.png)

Looking good. Now, could we do something like converting the GID into a page number (not for all of them, but for a specific range before and after the current "page", like 5 page numbers before and after), since we know the GID at the start of each page (like the numbers shown in the bar)? Could it work that way? This is only from the perspective of a normal, no-search-tag result view, by the way; I don't know how the script actually reacts to or "knows" how many results are returned when searching. I've only glanced at your script a little, since I'm very much not a front-end guy, so it may take me some time to understand it and come up with a proper approach.
This post has been edited by epa: Nov 8 2022, 16:08
|
|
|
|
 |
|
Nov 8 2022, 16:12
|
jadoeman
Group: Members
Posts: 108
Joined: 28-February 15

|
QUOTE(Tenboro @ Nov 8 2022, 01:53)  I've been thinking a lot about pagination for the last uh, two years or so. And my general feelings on "page-based pagination" for all types of sites that continuously add new content to the "front" is: it's kinda shit. It's what people are used to, but semantically it has no relevance for the origin of the semantic concept, which is the (dead tree format) book. Because unlike a DTF-book, where Jason always kills Alice on page 358, the actual page number has absolutely no association with what content you find on the page except that "page 358 has the content that was posted immediately prior to the content on page 359". And if you check back tomorrow, the actual content on both page 358 and 359 has been replaced with something completely different.
Using markers based on "content posted before/after $work" as well as "content posted before/after $date" just makes so much more sense for this type of content, because it'll generally be exact same "page" when you check it again tomorrow. While people tend to deeply dislike change, and while the fundamental reason for the change is primarily performance/scaling, just because no one (?) has done it before does not (necessarily) make it a bad idea. Something about Henry Ford and faster horses.
You know, thinking about it, a lot of the examples saying "how does google do it" (and whether or not they do is another question) actually tie back to this. This site's search is pretty binary - each result is either included or excluded, no middle ground. Either a gallery has a tag or not, matches a title or not, etc. And all the various tags and such produce a single authoritative list in the end. We can display that list as "pages", sure, but the aforementioned problems exist: result windows that shift as time goes on, and the performance cost of calculating the full list just to show a full page counter.

Whereas something like google doesn't try to do that. You search for "bananas" on google and there will be millions of results... but they don't attempt to show you chronological order or anything. They instead try to show you results based on relevance. A site that mentions bananas in passing won't be as relevant as the wikipedia article on tree fruits, or the National Banana Museum, or whatever. Or ads from grocery stores. Odds are, the highest results won't be the newest ones - just the most relevant ones. Unless news stories start covering something, like a guy shooting up a banana plant - but then they're near the front due to relevancy too. In a situation like this, there is no strong mapping based on date, or crawl time, or anything of the sort - just a single fuzzy concept of "relevance". And the sorting can change at any point as websites update or new ones are added. But that's what people want from google - relevant results. So sure, pages work fine there, because while pages are arbitrary and vary, the entire concept of relevance is arbitrary and can vary too.

...That, and probably nobody ever wants the 4,827,492nd page of banana results. It would probably be someone's geocities page from 1997 talking about their high school crush, with a single dancing banana gif at the bottom.
|
|
|
|
 |
|
Nov 8 2022, 16:21
|
EasyDeath
Newcomer
 Group: Recruits
Posts: 10
Joined: 25-December 10

|
QUOTE(Tenboro @ Nov 8 2022, 12:35)  Most likely it would just latch onto the seek/jump mechanism, so that in addition to going by relative and absolute dates you can also seek to 7% or whatever. But it would necessarily have to include some sort of clickable visual indicator as well. When you are searching for multiple tags/titles and they all have a lot of results (10K+), and you also are not using user searches, and you also are not using a comment search with reasonably few result, it uses a segmenting strategy where it loads in consecutive chucks of tags segmented by GID ranges until it has enough results. So to make it estimate result counts for those types of searches, it would currently have to guesstimate the result count from the overlap in the current result, which could make it fluctuate (a lot) between pages. So to make it provide a reasonable and stable result estimate, I would probably have to use some form of self-learning to improve the accuracy over time. (The search engine already has some self-learning stuff, so it's not entirely out of the question.)
For a visual indicator, something minimal like this, without any library, would do the trick: [ www.w3schools.com] https://www.w3schools.com/howto/howto_js_rangeslider.asp

For searching by multiple tags/titles, if the search engine works in chunks until it reaches enough results, do you mean something like this: 25 for the current page, +1 before, +1 after, so it knows when to disable the "next/prev" buttons? So the search engine doesn't have any real number of total results for that search, since it stops the query at 27 results, correct?

As for "self-learning", pre-calculating the search count for the most popular searches would probably make many users happy.
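The "+1 before/+1 after" idea described here is essentially the standard fetch-one-extra-row pattern: request one row past the page so the UI knows whether a Next page exists without ever computing a total. A small sketch (the `fetch` callable is hypothetical, standing in for whatever the backend exposes):

```python
def fetch_page(fetch, cursor=None, per_page=25):
    """Request one extra row past the page so the UI can enable or
    disable the Next button without needing a total result count.
    `fetch(cursor, limit)` returns up to `limit` GIDs after `cursor`."""
    rows = fetch(cursor, per_page + 1)
    has_next = len(rows) > per_page          # the extra row proves a next page
    page = rows[:per_page]
    next_cursor = page[-1] if has_next and page else None
    return page, has_next, next_cursor
```

The extra row is discarded before display; its only job is to prove the result set continues past the current page.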
|
|
|
|
 |
|
Nov 8 2022, 16:48
|
Pharaoh_KM
Lurker
Group: Lurkers
Posts: 2
Joined: 13-April 16

|
Don't know if this belongs here but....
Is there any extremely simplified or dumbed-down explanation of how to use the search, how to navigate, what jumping during a search does, or how to know how deep I am into a search? This whole new search engine is leaving me lost, frustrated, and confused to the point that I am on the verge of going to other sites, because at least there I know what works and where I am. Like, if I jump "76", would that take me to what used to be page 76 - I guess 76 clicks of "Next" - or would it jump me 76 days' worth of galleries, meaning I am 180 clicks of Next in?
Do tags still work the same? Would "Incest" + "english" still give me the same results as in the old system, or would it be different, and do I need to search for something more specific to get what I want? Is this going to give me all the galleries tagged with incest plus every gallery in English, or only galleries that contain both? I honestly can't tell anymore, especially with no indication if the results are even slightly different.
My favorites are sorted by favorited time and not published time, and having page numbers actually helped me navigate them easily and know if any galleries were deleted. With the new search I am now basically blind to what is added or removed, and the advanced search options seem to only remove results now?
I don't mind some changes, but I really don't like how we went from numbers and information we could actually see to being left in the dark, with all the counts basically being "lol idk" amounts of galleries found. This new search engine could be one of the best of its kind and way better than the old one, but because I have no clue whether I am actually going anywhere, and because it makes me unsure if I am even getting correct searches, I would rather just not use e-hentai. It makes me feel like I am taking an exam just to search for something. On "Doujins" or "Nhentai" I can search and see that there are "x results with y pages" and easily navigate from there, while here on e-hentai it is basically "there are results, with a large number of pages to go through", which is meaningless to me if I am always going to be lost in the results anyway. If a bunch of galleries went missing, how would we know? After all, the results would just go from "many results found" to "many results found".
TL;DR: Please make and/or explain the search function so that even the dumbest and laziest of us can understand how to use it.
|
|
|
|
 |
|
Nov 8 2022, 16:49
|
FabulousCupcake
Group: Gold Star Club
Posts: 495
Joined: 15-April 14

|
QUOTE(EasyDeath @ Nov 8 2022, 15:21)  For visual indicator, something like this, minimal without any library would to the trick:
Nice idea to use what's already available by default, but input type=range or input type=progress is insufficient imo, considering what was originally described: the "currently displayed" range information would be lost. Without the range information, it's hard to digest / not very useful. E.g.:

[=====||==============]: I am 25% deep into the search result. How many more times do I have to click next to view all the results? You can extrapolate this from the total result count and the fact that each page contains a fixed number of results (50 or 100), or by hitting next/prev a few times to get some sense of it, but that also means the visual indicator failed its purpose / is not doing a good enough job.

[=====[=====]=========]: I am 25% deep into the results, and I can hit next about twice and I will have seen everything. In pagination parlance: I am on page 2 out of 4.
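The windowed bar sketched above is easy to mock up. A toy sketch that maps the currently displayed window onto a fixed-width text track (track width and glyphs are arbitrary choices, not a proposed final design):

```python
def render_bar(total, start, count, width=20):
    """Draw a text progress bar showing both how deep the current
    window starts and how wide it is, e.g. [=====[=====]==========].
    `start` is the 0-based index of the first displayed result."""
    a = start * width // total                       # cells before the window
    b = max((start + count) * width // total, a + 1) # window end, min 1 cell
    return "[" + "=" * a + "[" + "=" * (b - a) + "]" + "=" * (width - b) + "]"
```

With this, "page 2 of 4" and "25% deep" are both readable from the same glyph string, which is exactly the information a plain range slider thumb throws away.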
|
|
|
|
 |
|
Nov 8 2022, 17:19
|
7th_Tzar
Newcomer
  Group: Members
Posts: 69
Joined: 28-July 11

|
It's getting usable with the script. My issue is still with the separate expunged and low-power tag searches. There aren't many expunged galleries to begin with (relative to the normal gallery index), so I don't understand why they're not treated like other tags. Filters and the excluded-languages solution are already in place.
This post has been edited by 7th_Tzar: Nov 8 2022, 17:20
|
|
|