New Search Engine, No Read, Only Post

 
Posted Nov 2 2022, 12:09 by Tenboro (Admin)

If you want to know more about the rationale for the changes, start by reading the Change Rationale below (original version in this post).

If you think you found a bug, post the EXACT QUERY you were using, not a vague description of it.


FAQ

Q: Why did you change the search engine? (TL;DR version)
A: The way the old search engine worked could no longer scale with the size of the site's index, and was failing on an increasing number of queries. No amount of money or hardware could have fixed this long-term, so the only option was to fundamentally change how it works. The new search engine is the best tradeoff of functionality and performance available.

Q: Can we have page numbers back?
A: No. Read the Change Rationale below.

Q: But I really need page numbers because of reasons. What if I give you money? Can *I* have page numbers back?
A: No. Read the Change Rationale below.

Q: But some rando on the internet told me that it's actually really easy to have page-addressed search results with tens of thousands of pages for database indexes with hundreds of millions of rows and I believe them because I want it to be true which means I think you are lying. Can we have page numbers back?
A: No. Read the Change Rationale below.

Q: What if I threaten to kill your house and burn your dog to the ground? Can we have page numbers back?
A: No. Read the Change Rationale below.

Q: But what if-
A: Just no. Read the Change Rationale below.


Change Rationale

or: (Not having an exact page selector is worse than having an exact page selector / Not having an exact result count is worse than having an exact result count), and the new search engine is therefore worse than the old search engine!

If you ignore all the new and improved functionality and the vastly higher performance, and focus only on "but I want page-addressed results" and "but I want exact result counts", this might be the case. These specific changes were not made because I thought it would be an improvement by itself, but out of necessity.

With the old search engine, because of the ever-increasing size of the index, results were taking longer and longer to generate and required more and more RAM to do so. At the time the search engine was replaced, many queries took on the order of three seconds to generate, a figure that would double within a couple of years at the current index growth rate, leading to (non-controllable) timeouts. Furthermore, RAM usage for generating a result was roughly linear in the size of the result plus the size of the index for each of the queried terms, and there are practical limits to how much RAM can be made available to a single process, so at some point queries would simply start failing in unpredictable ways.

Notably, this was already the case for a non-insignificant number of complex queries.

In other words, if we kept the old search engine, in a few years, if you tried to search for anything with many results, you would inevitably either get a Cloudflare timeout page, or the query would fail with a memory error. And that's if the site itself isn't rendered completely unusable because all its CPU time is tied up trying and failing to create search results. Which, obviously, is bad.

The conclusion is, even with unlimited hardware (which we do not have the necessary unlimited funds for), the old search engine would be effectively unusable for most if not all queries with many results in two to three years without significant changes.

The available options were:

1. Replace the old search engine that uses a naive approach of effectively building the full result for a query (which is necessary for full-range page navigation and exact-ish result counting) with a brand new search engine that is a lot more clever about doing stuff.

2. Put a band-aid on the old search engine by significantly curtailing the maximum size of the search result and/or the range of searchable content.

3. Remove functionality that was expensive in the old search engine, such as hybrid title/tag searching and comment searching.

With the second option, you would have pages, but there might be a maximum of 10 of them with 100 results per page. You'd have "exact" result counts, but they would just be capped at 1000. Many sites use variants of this approach, like Google, Nyaa and most if not all large Boorus, but if you think for a second that this would cause any less of a shitstorm if we changed to it, I guess I should welcome you to the internet, because it's obviously your first time here.

With the third option, you would only be able to search for titles or tags, but not both at the same time. For example, if you searched for "part of title" english you would only be able to find things with those two terms in the title, not galleries tagged with "english". Comment searches and various other functionality would just be removed entirely. Searching would be all around hampered and unintuitive. See; shitstorm.

Alternatively, you might have a curtailment that only galleries posted in the last couple of years are searchable. Shitstorm.

Alternatively alternatively, those things but with donator-only unlocks and higher limits. Shitstorm.

I went with the first option, which involved three months of active development plus an additional month of testing + optimization, and is in my opinion by far the best possible tradeoff of performance and functionality. Even if some people disagree with the changes, this is a hill I'm willing to die on.

You are allowed to both disagree and/or dislike change in general, but if people keep accusing me of lying about the current state of things and the reasoning for the changes because they read a bunch of misinformation and conspiracy theories posted by clueless autists on 4chan, I'll just start handing out bans to preserve my sanity, so stop doing that.


2023-03-10

- When using exclusion terms, it will no longer just flag the search as "about", instead it shows how many results were excluded on that particular page. (If a gallery would have been both excluded and filtered, it is counted as excluded.)

- For consistency reasons, when the search result fits on a single page, excluded galleries are now included in the result count.


2023-03-07

- In some filtered search modes, when using multiple search terms, the search engine would previously use an "exact" count based on the unfiltered index that could be significantly off. It now uses the count estimator in these cases. This just flags it as "about" for now, and may be revised later.


2023-01-09

- Corrected a search indexing issue where some substrings consisting of three characters enclosed in square brackets did not get indexed properly.


2023-01-07

- Corrected an issue that prevented the range bar and range jumping from working correctly with searches involving weak tags.

- Corrected an issue with title exclusions where some characters that are stripped from search queries were not stripped from titles before comparison, causing unexpected behavior.


2022-12-22 - Bugfix

- For index searches, if a very rare combination of internal buffer states occurred, the search would act as if there were no more results when there actually were. This should be fixed now.


2022-12-20 - Update

The search engine will now attempt to give a ballpark estimate for result counts in all standard searches except for searches with inclusive comment terms. The estimate is based on internal stats, index sampling, and a history of the span of results found on each page.

The new estimator is primarily used for complex queries where all terms have many hits, where it would previously only say "many". It is also used whenever a page range filter is set, and for index searches when several categories are unselected.

For any query that has not been searched recently, the initial estimate will usually be a vague and conservative lower bound (like "thousands" or "10,000+"). A more precise estimate may be provided when enough samples have been collected.

It generally prefers to under-estimate rather than over-estimate the count. As such, it should generally be interpreted as "probably more than".

The accuracy of the estimate depends on the accuracy of the range map used by the range indicator, and the same caveats apply. Complex searches with non-dependent terms will generally be less accurate.

Note the estimate can fluctuate a bit as you go between pages. This is expected.

Other Changes:

- Fixed some issues with underscores in favorite searches. Similar to username searches, spaces and underscores should now be equivalent for all favorite search usage.

- You can now use favnote:* or -favnote:* to filter favorites with any favorite note.

- Several caching issues were found with the setting to exclude namespaces by default when searching. A fix would be complicated and essentially make searches uncacheable for everyone using it, and since it's only used by a small fraction of a percent of visitors and a lot of people seem to be confused about what it does, this setting has been disabled for now. It will likely be reintroduced in a slightly different form in the future.

- Some fully updated but less than bleeding-edge browsers were having issues with the JavaScript generated by the JavaScript optimizer we now use. This optimization now targets an older level of compatibility, which should fix this issue.


2022-12-05 - Update

A new result range indicator + range jump mechanism has been added. The range indicator will let you see roughly where you are in a search result and how much of it is found on the current page, while the range jump mechanism will let you jump approximately a given percentage of the way into a search result. Range jumping is done by clicking on the range indicator bar.

This mechanism is almost, but not quite, entirely unlike pages. While it has a fleeting similarity, don't expect it to behave exactly like them.

Most importantly, the range indicator uses various internal statistics to work with basically zero overhead; it does not actually generate pagination for the full search result. This means you should for example not expect the displayed number of pips for each page (or pages per pip for large results) to be fully consistent across the entire result. While it does to a large degree correct for variations in volume and usage over time, there will still be unpredictable natural variations (clusters and gaps) in the distribution of results. This especially applies to comment searches, which cannot make use of any precomputed statistics.

The range indicator is only available for normal searches; that is, not for favorites, watched tags, or in gid/file searches. Favorites will be revisited at a later date, as part of a larger rework of the favorite system.

Other Changes:

- Added the Jump postfix "g" for GID (Gallery ID) jumps. Using this with any GID will jump to the position in the search result with this gallery as the first (or last) result on the page. (If the gallery does not exist in the search result, it will still work, the gallery just won't be there.)

- Added a new setting to disable the new range indicator.

- Corrected a rare edge case where the search UI would act as if a search had no more results even if it did.


2022-11-25 - Minor Fix

- Fixed an issue where some more characters in uploader usernames were not properly searchable.


2022-11-21 - Improvements

- Added some significant optimizations for a frequently used search strategy for when multiple name+tag/comment search terms are used and at least one of the name+tag terms has fewer than 10,000 hits. (In some cases this will reduce processing time by >90%.)

- The search query parser will now handle various cases where repeated or redundant search qualifiers are used, such as weak:tag:foo or tag:tag:tag:bar.
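A minimal sketch of how a parser might collapse repeated or redundant qualifiers on a single term. The qualifier set and the function name `normalize_qualifiers` are hypothetical; the only rule taken from the post is that `weak:` works like `tag:`, which makes a `tag:` next to it redundant. Unrecognized prefixes (such as the namespace in `f:sole_female`) are left alone.

```python
# Qualifiers assumed for this sketch; "weak" already implies a tag search,
# so a "tag" qualifier alongside it is redundant.
KNOWN_QUALIFIERS = {"tag", "weak", "title", "comment", "favnote", "uploader", "gid"}


def normalize_qualifiers(term: str) -> str:
    """Collapse repeated or redundant leading qualifiers on one term."""
    qualifiers: list[str] = []
    rest = term
    # Peel off leading "qualifier:" prefixes while they are recognized.
    while ":" in rest:
        head, tail = rest.split(":", 1)
        if head.lower() not in KNOWN_QUALIFIERS:
            break  # e.g. a tag namespace like "f:", which is kept as-is
        qualifiers.append(head.lower())
        rest = tail
    # Drop duplicates while keeping first-seen order.
    unique = list(dict.fromkeys(qualifiers))
    # weak: implies tag:, so tag: next to weak: adds nothing.
    if "weak" in unique and "tag" in unique:
        unique.remove("tag")
    return "".join(q + ":" for q in unique) + rest
```

With this sketch, `weak:tag:foo` becomes `weak:foo` and `tag:tag:tag:bar` becomes `tag:bar`. Conflicting combinations (like `title:tag:foo`) are passed through untouched rather than guessed at.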


2022-11-18 - Fixes

- The publish date adjustment for galleries created with the old uploaders (predating October 2021) has been completed. This should fix the remaining quirkiness with gallery sort placement as well as with the seek/jump mechanism. Note that these galleries are now considered "published" when the gallery was created rather than when it was actually published, though in most cases this would only shift the date by a few minutes to a few hours.


2022-11-17 - Minor Fixes

- When searching for comments, if the search term was too short after being stripped of non-indexable characters, the term was silently ignored. It now properly fails the search with an error message instead.

- Fixed tags hidden under My Tags not being displayed with search results when filters are disabled.


2022-11-16 - Deployment + Fixes

- This update is now fully deployed.

- Fixed an issue with how some dynamic stats were generated that only manifested under high load.


2022-11-15 - Minor Fixes

- Fixed a bug in favorite searching where, depending on internal state and order of operation, title-only searches could break when multiple terms were used.

- The wording of "default filters" was changed to "custom filters" to make it clearer that it is referring to your personalized/customized tag, uploader and language filters, rather than some global default filter.


2022-11-13 - Minor Fixes

- Fixed some more search issues with uploader usernames with leading or trailing underscores as well as multiple consecutive spaces/underscores.

- We now avoid using the /uploader/ shorthand URLs for uploader usernames containing forward slashes since the resulting URLs are broken.


2022-11-11 - Minor Additions/Tweaks

- When searching for tags (or titles+tags) where there is just one tag match and you have that tag filtered, the system will now specifically ignore that filter. If you actually want the tag filtered, you can use the title: qualifier.

- The search engine will now stop looking for more results for a page if more than 1000 galleries have been filtered. (This is mostly relevant in edge cases where you are intentionally searching for things you heavily filtered.)
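The 1000-filtered cutoff can be pictured as a page-filling loop that gives up early. This is a hypothetical sketch, not the site's code: `candidates` stands in for GIDs coming out of the index in result order, `is_filtered` for the user's tag/uploader/language filters, and the page size of 25 is an assumption.

```python
from typing import Callable, Iterable

PAGE_SIZE = 25       # assumed page size for the sketch
MAX_FILTERED = 1000  # cutoff from the changelog


def fill_page(candidates: Iterable[int], is_filtered: Callable[[int], bool],
              page_size: int = PAGE_SIZE, max_filtered: int = MAX_FILTERED):
    """Collect up to page_size visible GIDs, giving up once more than
    max_filtered candidates have been skipped by the filters."""
    page: list[int] = []
    filtered = 0
    for gid in candidates:
        if is_filtered(gid):
            filtered += 1
            if filtered > max_filtered:
                break  # stop scanning and return a short (possibly empty) page
            continue
        page.append(gid)
        if len(page) == page_size:
            break
    return page, filtered
```

The returned `filtered` count is what a "filters removed XX galleries from this page" message would report.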

- Fixed search warnings not being displayed for favorite searches.

- Added a setting to remove the "Your default filters removed XX galleries from this page" message.

- Added a new qualifier "weak:" to search for weak tags. This replaces the "Search Low-Power Tags" checkbox. Using weak: in front of a keyword works the same as using tag: except it will search weak tags (<10 power) instead of active (10+) ones.

This change allows for some additional flexibility, since you can now search for various combinations of weak tags and active tags - for example, all galleries with an active parody tag from a particular series, and weak character tags from said series.

Weak tags cannot be used for exclusions or searched in favorites. Additionally, if you are using OR searches, either all or none of the OR terms must use the weak: qualifier.

It is not possible to search for both active and weak instances of the same tag at the same time, or mix normal and weak OR terms in general, since they use different indexes. These are not artificial limitations. The weak tag search is there to aid in tagging and cleanup in order to either get rid of them or make them into active tags, not to get "more results" in casual browsing.
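The weak/active split and the restrictions above can be summarized as a couple of checks. A hypothetical sketch: the power threshold of 10 is from the post, while the assumption that a weak OR term is written `~weak:foo` and the function names are illustrative only.

```python
WEAK_POWER_THRESHOLD = 10  # below 10 is weak, 10+ is active (from the post)


def is_weak(power: int) -> bool:
    """A tag instance with power below 10 lives in the weak index."""
    return power < WEAK_POWER_THRESHOLD


def check_weak_terms(terms: list[str]) -> None:
    """Reject weak: usages the changelog rules out (hypothetical validator)."""
    or_terms = [t[1:] for t in terms if t.startswith("~")]
    weak_or = [t for t in or_terms if t.startswith("weak:")]
    # Weak and active tags use different indexes, so an OR group cannot mix them.
    if weak_or and len(weak_or) != len(or_terms):
        raise ValueError("either all or none of the OR terms must use weak:")
    if any(t.startswith("-weak:") for t in terms):
        raise ValueError("weak tags cannot be used for exclusions")
```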


2022-11-07 - Bugfixes

- Corrected an issue with tag/name searching in uploader results.

- Corrected glitchy behavior with the new jump/seek selector on the favorite page, as well as an issue with the favorite checkbox selector positioning.

- Corrected seek/jump offsets not being kept if you switched display mode (minimal/compact/etc) right after using it.

- Corrected an issue where some characters weren't properly stripped for name index lookups.

- Corrected an issue with terms that were long enough to search but contained characters that are not valid in tags. The parser would still attempt to parse such a term as a tag with those characters stripped, and if fewer than 3 characters remained after stripping, it would then fail the term as being too short. Terms with characters that cannot be used in tags are now instead parsed as title-only unless a different qualifier is used.


2022-11-06 - Minor Addition

- Incorporated a clickable jump/seek selector based on a suggested code addition from FabulousCupcake.

Note that the date selector uses the browser's built-in date picker, so it will use your browser's locale for the date format. (Your browser automatically translates this to the site's date format.)


2022-11-05 - Update

New Feature: Seek/Jump Navigation

You can now do arbitrary jumps (number of days/weeks/months/years) backwards and forwards in search results, as well as arbitrary seeks to a specific date in the search results, by clicking the new Jump/Seek button in the navigation bar and entering a number or date in the box that appears.

Entering a number will make it jump backwards or forwards by the specified number of days, aligned to the start or end of each day. Adding w, m or y to the number will make it jump by that number of weeks, months or years instead. When jumping forwards (Jump >), the jump is based off the posted time of the oldest (bottom-most) gallery on the current page. When jumping backwards (< Jump), the jump is based off the posted time of the newest (topmost) gallery on the current page.
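A sketch of how a jump amount could be turned into a target time, assuming results are sorted newest first. The base gallery (oldest on the page for Jump >, newest for < Jump) follows the post; the exact day alignment (end of day going forward, start of day going back) and the calendar-month clamping are assumptions, and `jump_target` is a hypothetical name. A bare number between 2007 and 2099 is rejected here because the post treats it as a year seek instead.

```python
import calendar
import re
from datetime import datetime, timedelta

# A number with an optional unit: d(ays, default), w(eeks), m(onths), y(ears).
JUMP_RE = re.compile(r"^(\d+)([dwmy]?)$", re.IGNORECASE)


def shift_months(dt: datetime, months: int) -> datetime:
    """Move dt by a signed number of calendar months, clamping the day
    (e.g. March 31 minus one month gives the last day of February)."""
    total = dt.year * 12 + (dt.month - 1) + months
    year, month0 = divmod(total, 12)
    month = month0 + 1
    day = min(dt.day, calendar.monthrange(year, month)[1])
    return dt.replace(year=year, month=month, day=day)


def jump_target(text: str, forward: bool, page_newest: datetime,
                page_oldest: datetime) -> datetime:
    """Hypothetical helper: where a Jump lands, given the current page."""
    m = JUMP_RE.match(text.strip())
    if not m:
        raise ValueError(f"not a jump amount: {text!r}")
    amount, unit = int(m.group(1)), (m.group(2) or "d").lower()
    if not m.group(2) and 2007 <= amount <= 2099:
        raise ValueError("a bare 2007-2099 is a year seek, not a day jump")
    # Jump > moves toward older results, starting from the oldest gallery on
    # the page; < Jump moves toward newer ones from the newest gallery.
    base = page_oldest if forward else page_newest
    sign = -1 if forward else 1
    if unit == "d":
        target = base + timedelta(days=sign * amount)
    elif unit == "w":
        target = base + timedelta(weeks=sign * amount)
    elif unit == "m":
        target = shift_months(base, sign * amount)
    else:  # "y"
        target = shift_months(base, sign * 12 * amount)
    # Assumed alignment: end of the day going forward, start of it going back.
    if forward:
        return target.replace(hour=23, minute=59, second=59, microsecond=0)
    return target.replace(hour=0, minute=0, second=0, microsecond=0)
```

For example, with the oldest gallery on the page posted 2022-11-01 08:30, entering `3` and pressing Jump > lands at the end of 2022-10-29.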

Entering a date in YYYY-MM-DD format will make it seek to that date in the search result (inclusive). Note that the semantics of < Seek and Seek > are somewhat different from those of < Next/Jump and Next/Jump >: specifically, which button you use determines whether the date is used as the starting point or the ending point.

You can also use the YYYY-MM shorthand date. In this case, it will start from the first day in the month when going backwards and the last day in the month when going forward. (In other words, in either case it will include that entire month.)

If you only enter a number (not followed by d, w, m or y) and it is between 2007 and 2099, it will be interpreted as a year. In this case, it will seek to the last day of the year when going forwards and the first day of the year when going backwards.

With the YYYY-MM-DD and YYYY-MM formats, the first two Ys can be left out; in other words, 22-11-05 will be interpreted as 2022-11-05.
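The seek formats above can be resolved to a single anchor date. This is a hypothetical sketch (`seek_anchor` is not a real site function): `forward=True` stands for Seek > (toward older results) and `forward=False` for < Seek. Partial dates pick the day that makes the whole month or year fall inside the result, as described above.

```python
import calendar
import re
from datetime import date

# YYYY, YYYY-MM, YYYY-MM-DD, or the same with a two-digit year.
SEEK_RE = re.compile(r"^(\d{2}|\d{4})(?:-(\d{1,2})(?:-(\d{1,2}))?)?$")


def seek_anchor(text: str, forward: bool) -> date:
    """Resolve seek input to the date the seek is anchored on."""
    m = SEEK_RE.match(text.strip())
    if not m:
        raise ValueError(f"not a seek date: {text!r}")
    year_s, month_s, day_s = m.groups()
    year = int(year_s)
    if len(year_s) == 2:
        if month_s is None:
            raise ValueError("a bare two-digit number is a day jump, not a seek")
        year += 2000  # "22-11-05" -> 2022-11-05
    if month_s is None:
        # Bare number: only 2007-2099 is read as a year.
        if not 2007 <= year <= 2099:
            raise ValueError("bare numbers outside 2007-2099 are day jumps")
        return date(year, 12, 31) if forward else date(year, 1, 1)
    month = int(month_s)
    if day_s is None:
        # YYYY-MM: pick the end of the month going forward and the start
        # going backward, so the whole month is included either way.
        last = calendar.monthrange(year, month)[1]
        return date(year, month, last) if forward else date(year, month, 1)
    return date(year, month, int(day_s))
```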

Seeks and Jumps to galleries posted before October 2021 or so will be wonky until I run a script to make some fixes to the publish timestamps to match the behavior of newer galleries. This correction will happen shortly after the update is fully deployed.


Bugfixes

- Corrected an issue where galleries were no longer displayed under favorites if they were unavailable.

- Corrected an issue where, when using the /tag/ URLs (such as when clicking tags from the gallery page), it would keep adding additional quotes if you clicked the navigation links.

- Corrected some issues with uploader usernames with underscores and spaces. Note that for syntax and visual ambiguity reasons, underscores and spaces are now considered equivalent in uploader username searches.

- Corrected excluded categories still appearing on the Popular Pane. (They are still supposed to appear with file, gid and favorite searches.)

- Corrected a potential issue where the file/gid searches weren't including expunged galleries even though they were supposed to.

- Corrected an issue with dashes/hyphens in name searches where they weren't properly stripped for the index lookup.

- Corrected an issue where if you were using advanced search and *only* picked a minimum rating, the navigation wouldn't include it, so it would reset between pages.


2022-11-01 - Original Post

This update is a complete rewrite of the gallery search engine, meaning that the usage and behavior of searches has changed in a number of more or less significant ways.

The most significant and visible fundamental change is that the internal segmenting of search results is now done by gallery ID (GID) ranges rather than "pages". While this means jumping to an arbitrary "page" in the result is no longer supported, this is arguably an improvement since you can now jump to an arbitrary GID instead. This also means each page of results will be fixed on the same set of galleries even if it is refreshed after new galleries are added. The page navigation has been reworked to reflect this.

This also fundamentally fixes a long-standing issue where going backwards in the results via the page navigation (as opposed to the browser back button) would often include results from the following page if you were using any form of filtering.
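Segmenting by GID ranges is what is usually called keyset (or seek) pagination. The sketch below is illustrative only: a sorted Python list stands in for an index, and `next_page`/`prev_page` are hypothetical names. Each page is defined by "GIDs below (or above) this GID", so no earlier pages need to be built, and new uploads with higher GIDs cannot shift an existing page.

```python
import bisect
from typing import Optional


def next_page(gids_asc: list[int], before: Optional[int], size: int) -> list[int]:
    """Newest-first page of up to `size` GIDs strictly older than `before`.

    bisect finds the cut point in O(log n), so no earlier pages have to be
    materialized (unlike offset-based page numbers)."""
    end = len(gids_asc) if before is None else bisect.bisect_left(gids_asc, before)
    start = max(0, end - size)
    return gids_asc[start:end][::-1]


def prev_page(gids_asc: list[int], after: int, size: int) -> list[int]:
    """Newest-first page of up to `size` GIDs strictly newer than `after`."""
    start = bisect.bisect_right(gids_asc, after)
    return gids_asc[start:start + size][::-1]
```

For example, with GIDs `[1, 3, 5, 7, 9, 11, 13]` and a page size of 3, the first page is `[13, 11, 9]` and the next one is `[7, 5, 3]`; that second page stays `[7, 5, 3]` even after a gallery with GID 15 is added.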

Overall, these changes allow for massive performance improvements (three orders of magnitude in some common cases) as well as significant new functionality (keep reading), and there are no longer any limits to how large a search result can be. Search terms that were previously capped at 100,000 results (such as "big breasts", which is tagged on 350K+ galleries) can now be browsed in their entirety.


OR Tag Searching

OR searching is now supported for tags. (Probably the most requested feature of all time.)

To use OR tag searching, prefix the keyword with ~

Example: ~yuri ~"females only" ~f:sole_female$

Specifically, if you have at least two keywords with the OR operator, the search will return all galleries that contain at least one of the tags in question. Using the OR operator will imply the tag: qualifier. If you use it with any other qualifier that isn't a tag namespace, the OR operator is ignored and the keyword will run as a standard AND search.

Using OR searching will "consume" one of the allowed inclusion search terms. If you only specify one OR term, it will be treated as an AND tag-only term. There are no specific limits to how many OR terms you can specify, though they will still be practically limited by the search string length cap. The search will also bail out if the overall OR search matches more than 1000 tags internally, so consider using exact tags to allow for more terms.

Wildcards cannot be used for OR terms.
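The OR rules above can be read as a small term classifier. A hypothetical sketch: it assumes quoted phrases already arrive as single tokens, and the list of non-tag qualifiers is illustrative. The 1000-tag bail-out happens later, when the OR group is matched against the tag index, and is not modeled here.

```python
# Qualifiers that are not tag namespaces (illustrative list).
NON_TAG_QUALIFIERS = ("title:", "comment:", "favnote:", "uploader:", "gid:")


def split_or_terms(terms: list[str]) -> tuple[list[str], list[str]]:
    """Split terms into (or_group, and_terms) following the ~ rules."""
    or_group: list[str] = []
    and_terms: list[str] = []
    for term in terms:
        if not term.startswith("~"):
            and_terms.append(term)
            continue
        body = term[1:]
        if body.startswith(NON_TAG_QUALIFIERS):
            and_terms.append(body)  # ~ is ignored with non-tag qualifiers
        elif body.endswith("*"):
            raise ValueError("wildcards cannot be used for OR terms")
        else:
            # ~ implies tag: unless a tag namespace is already given.
            or_group.append(body if ":" in body else "tag:" + body)
    if len(or_group) == 1:
        and_terms.append(or_group.pop())  # a lone OR term is an AND tag term
    return or_group, and_terms
```

With this sketch, the post's example `~yuri ~"females only" ~f:sole_female$` becomes one OR group of three tag terms, which would then count as a single inclusion term.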


Exclude-Only Searching

You can now do exclude-only searches. (Probably the other most requested feature of all time.)

Example: -yaoi -m:footjob -"glory hole" -sole_male$ -title:"novel ai" -comment:pixiv -uploader:BigDickDave69

You can use up to 10 comment+favnote exclusion terms and 10 tag (or hybrid tag+name) exclusion terms in a search.

The gid, uploader, uploaduid and title qualifiers are not specifically limited for exclusions, though they will still be practically limited by the search string length cap.


Tag Watching

The time cutoff for the tag watching page has been significantly increased:

- For non-donators, the cutoff was increased from one week to at least one month. The exact cutoff depends on internal segmenting, the rate new galleries are added, and the total index count for your watched tags. It will generally be somewhere between one and six months.

- For donators (gold star+), there are no longer any cutoffs. In other words, you can browse and search watched tags back to the launch of the site if you want. Note however that searching for terms that have few matches in your watched tags may produce fewer than expected results per page.


UI => Search Syntax Changes

The "Search Gallery Name", "Search Gallery Tags" and "Search Gallery Description" checkboxes as well as the corresponding search checkboxes on the Favorite page have all been removed; this functionality is now part of the search syntax instead.

By default, each search term will be interpreted as a hybrid tag+title search, and will match the gallery name (both english/romaji and japanese) as well as the gallery tags.

To only match gallery names, prefix the term with the title: qualifier
* Example: title:keyword -title:"string of keywords"

To only match gallery tags, prefix the term with a tag namespace, or tag: for all namespaces, or use the exact tag operator $, or use the OR operator ~
* Example: f:"big breasts" tag:group -futanari$ ~twintails

To search uploader gallery comments, prefix the term with the comment: qualifier
* Example: comment:"insightful uploader musings" -comment:"less insightful ones"

Favorite searches only: To search favorite notes, prefix the term with the favnote: qualifier
* Example: favnote:"this is my favorite gallery" -favnote:"on the citadel"

Note that this means combined tag+name+comment/favnote search terms are no longer supported.


Search Parsing Changes

- When doing unquoted searches with unqualified short and/or non-indexable words (a, an, ai, to, the, and, so, on, and so on), as well as some common adjectives (small, big, huge, gigantic), they will now be automatically appended or combined with the following priority:

* If there is a non-qualified search term immediately following the short word, it will be combined with that one.

For example, searching for "a dick in a box" without quotes will be searched as "a dick" "in a box". Everyone's new favorite "ai generated" without quotes will be searched as if it had quotes.

* If there is a non-qualified search term immediately preceding the short word, it will be combined with that one.

For example, searching for "novel ai" without quotes will be searched as if it had quotes.

* If there are only short words, they will be combined into one quoted word if there is more than one.

For example, searching for "ex on the ox" without quotes will be searched as if it had quotes.

* If there is just one short word, or the short words are between qualified search terms, it will be searched as an exact tag. A warning is printed in this case.

For example, searching for "9s c:a2 2b" without quotes will be searched as "tag:9s$" "character:a2$" "tag:2b$"

To combine short words with a different priority, use quotes or underscores. ("word1 word2 word3" and word1_word2_word3 are equivalent.)

To avoid combining short words when searching tags, use the tag: or tag namespace qualifiers.

Note that there is a single two-character word "3d" that was specifically whitelisted for title searches, but it is not an indexable word for comment searches so it cannot be used for that.

- Support for single-character wildcarding was dropped, and the * wildcard can now only be used at the end of keywords. Title, comment and favnote searches are implicitly wildcarded for indexing reasons, so adding a wildcard will only affect tag searching.
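The short-word priorities can be sketched as a single pass over the unquoted tokens. This is an illustration under stated assumptions, not the site's parser. The word lists are derived only from the examples above (the real non-indexable list is unknown), "3d" is treated as not short, quoted phrases are assumed to arrive as single tokens, qualified/exclusion/OR/exact terms are never combined, and a run of several short words stuck between qualified terms is turned into one exact tag per word (an interpretation). Tag namespace expansion (c: to character:) is not modeled, so `c:a2` passes through unchanged.

```python
# Derived from the post's examples; the site's real word lists are unknown.
STOPWORDS = {"the", "and"}                         # 3-letter non-indexable words
ADJECTIVES = {"small", "big", "huge", "gigantic"}  # always combined
WHITELIST = {"3d"}                                 # short but indexable for titles


def is_qualified(tok: str) -> bool:
    """Assumption: qualified, exclusion, OR and exact-tag terms never combine."""
    return ":" in tok or tok.startswith(("-", "~")) or tok.endswith("$")


def is_short(tok: str) -> bool:
    word = tok.lower()
    if is_qualified(tok) or word in WHITELIST:
        return False
    return len(word) < 3 or word in STOPWORDS or word in ADJECTIVES


def combine_short_words(tokens: list[str]) -> list[str]:
    """Apply the short-word priorities to a list of unquoted tokens."""
    if tokens and all(is_short(t) for t in tokens):
        # Only short words: one combined term, or an exact tag if just one.
        return [" ".join(tokens)] if len(tokens) > 1 else [f"tag:{tokens[0]}$"]
    out: list[str] = []
    run: list[str] = []   # short words waiting for a partner
    last_plain = False    # is out[-1] an unqualified, non-short term?

    def flush() -> None:
        # Called when the run was not followed by a plain term.
        if not run:
            return
        if last_plain:
            out[-1] += " " + " ".join(run)        # join the preceding term
        else:
            out.extend(f"tag:{w}$" for w in run)  # exact tags (site warns)
        run.clear()

    for tok in tokens:
        if is_short(tok):
            run.append(tok)
        elif not is_qualified(tok):
            out.append(" ".join(run + [tok]))     # join the following term
            run.clear()
            last_plain = True
        else:
            flush()
            out.append(tok)
            last_plain = False
    flush()
    return out
```

Run on the examples from the post, this gives `["a dick", "in a box"]`, `["novel ai"]`, `["ai generated"]`, `["ex on the ox"]` and `["tag:9s$", "c:a2", "tag:2b$"]`.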


Search Term Limits

Exclusions and inclusions now have separate limits. A query can have up to 5 name+tag inclusion terms, 10 name+tag exclusion terms, and 10 comment+favnote inclusion+exclusion terms.

For both inclusions and exclusions, uploader:, uploadid: and gid: terms aren't specifically limited, but would still be limited by the max length of the search string (200 chars).

For exclusions, title: terms are also not limited.
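A hypothetical validator for these limits, assuming the query has already been split into terms (so a quoted phrase is one term). The post spells the uploader-ID qualifier both `uploaduid:` and `uploadid:`, so the sketch accepts both. As a simplification, every `~` term is counted as part of one OR group occupying a single inclusion slot.

```python
MAX_QUERY_CHARS = 200
MAX_INCLUDE = 5      # name+tag inclusion terms (an OR group counts as one)
MAX_EXCLUDE = 10     # name+tag exclusion terms
MAX_TEXT_TERMS = 10  # comment:/favnote: terms, inclusions and exclusions combined
UNLIMITED = ("uploader:", "uploaduid:", "uploadid:", "gid:")


def check_limits(query: str, terms: list[str]) -> None:
    """Raise ValueError if a tokenized query exceeds the per-type limits."""
    if len(query) > MAX_QUERY_CHARS:
        raise ValueError("search string is longer than 200 characters")
    include = exclude = text_terms = 0
    or_group = False
    for term in terms:
        negated = term.startswith("-")
        body = term.lstrip("-~")
        # Only capped by the string length; title: exclusions likewise.
        if body.startswith(UNLIMITED) or (negated and body.startswith("title:")):
            continue
        if body.startswith(("comment:", "favnote:")):
            text_terms += 1
        elif negated:
            exclude += 1
        elif term.startswith("~"):
            or_group = True  # simplification: all ~ terms form one group
        else:
            include += 1
    include += int(or_group)
    if include > MAX_INCLUDE:
        raise ValueError(f"at most {MAX_INCLUDE} name+tag inclusion terms")
    if exclude > MAX_EXCLUDE:
        raise ValueError(f"at most {MAX_EXCLUDE} name+tag exclusion terms")
    if text_terms > MAX_TEXT_TERMS:
        raise ValueError(f"at most {MAX_TEXT_TERMS} comment/favnote terms")
```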


GID Searching

You can now use the gid: search qualifier to search (publicly visible) galleries by Gallery ID. If you search a GID that has been replaced, it will list the current gallery instead.

Inclusion gid: terms cannot be combined with keyword searches or used in watch mode. This does not apply to exclusion terms. If used for exclusion, it will not exclude any galleries that replaced the provided GID.

You can specify multiple gid: terms in the same query for an implicit OR search.

This search mode will show both normal and expunged galleries. Default tag, language and uploader filters are automatically disabled for these searches.


Result Counting

For performance reasons, the search engine will no longer count the exact number of results in large result sets; instead result counts will usually be approximated based on various metrics. It will say "about" if the count is an estimate.

For complex multi-term searches with large result sets, it may not have enough information to give a reasonable estimate. In these cases, rather than showing a potentially wildly inaccurate one, it will just show "many". This only affects the count readout, navigation for these search results works the same as for smaller ones.

Smaller result sets (i.e. those that fit on one page) should return the exact count in all cases. Filtered galleries are included in this count, to match the behavior for estimates.

The page range filter, exclusion search terms and default language/uploader/tag filters will not generally be reflected in approximate result count estimates.

If you use the category, rating or torrent filters, it will use precomputed adjustment factors to correct the estimate. This estimate may be fairly inaccurate for some searches, for example if you search for terms that mostly apply to specific categories and then unselect other categories.
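Why that case goes wrong is easiest to see with the simplest possible adjustment: scale the unfiltered estimate by the share of the whole index in the selected categories. This sketch is an assumption about what "precomputed adjustment factors" could look like, not the site's formula; the category names and shares are made up for the example.

```python
def adjusted_estimate(raw_estimate: float, selected: set[str],
                      category_share: dict[str, float]) -> float:
    """Scale an unfiltered count estimate by the site-wide share of the
    selected categories (hypothetical precomputed factors)."""
    return raw_estimate * sum(category_share[c] for c in selected)
```

With shares `{"a": 0.5, "b": 0.3, "c": 0.2}` and a raw estimate of 1000, selecting only `a` and `b` gives about 800. If every hit for the searched term is actually in category `c`, the true count is 0, but the estimate still says 800: a site-wide factor cannot know that a term is concentrated in one category.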

Result counts are not displayed in favorite searches or on the popular page. In the former case, it would only be able to display one for small result sets, and in the latter, it's all one page of results anyway. You can however still see the total for each favorite category.


Tag Search Behavior

- Tag searching now defaults to matching on word boundaries to reduce unwanted matches. In other words, searching for "tag:mana" will still match all tags that have "mana" as one of the words (like "secret of mana" [=> seiken densetsu] or "mana inuyama"), but it does not match "manabe", "manatsu", "manami" and so on. Searching for "tag:mana*" will restore the previous behavior.

- If there are too many tag matches for a term, it will now automatically rerun the term as an exact search instead of erroring out.

- Selecting "Search Low-Power Tags" will now only search low-power tags. This mode will also not do hybrid title/tag searches, so if a term is left unqualified (i.e. "big breasts") it will only search the tag. You can still search titles by using the title: qualifier.

- The "Search Downvoted Tags" option was removed.
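Word-boundary matching can be expressed with a pair of whitespace lookarounds. A minimal sketch using Python's `re`: the boundary is "start of string or a space" on each side. Treating a trailing `*` as "the last word may continue" is an assumption about what the restored behavior of `tag:mana*` looks like, and `tag_matches` is a hypothetical name.

```python
import re


def tag_matches(term: str, tag: str) -> bool:
    """Word-boundary tag match; a trailing * lets the last word continue."""
    wildcard = term.endswith("*")
    core = re.escape(term.rstrip("*").lower())
    # (?<!\S) / (?!\S): the match must start/end at a space or the string edge.
    pattern = r"(?<!\S)" + core + ("" if wildcard else r"(?!\S)")
    return re.search(pattern, tag.lower()) is not None
```

So `mana` matches "secret of mana" and "mana inuyama" but not "manabe", while `mana*` matches "manabe" too.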


Comment Search Behavior

Uploader comments and favorite notes are now searched using the comment: and favnote: qualifiers. favnote: is only available in favorite searches.

The way comments are indexed has been fundamentally changed, and there will be some subtle differences between normal text searches and favorite + exclusion-only text searches, since the former will usually use indexes while the latter do not.

Most notably, some otherwise-searchable common words (like "this" and "with") are not comment-searchable when the index is used but will be searchable when it is not. Also, when the index is used, words starting with these short words will not be matched unless you search for that exactly (like "with" and "withhold").

Furthermore, when the index is used it will only find word matches that start with the string, but when it's not it will also find matches that have the string as part of a word.

The index is only used for normal inclusion comment searches, but even for those it may not be used for some words and searches depending on various internal factors and thresholds, so you should not rely on this behavior.
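The two matching paths can be contrasted in a few lines. A hypothetical sketch: the unindexed word list contains only the two examples from the post, and tokenizing on `\w+` is an assumption. The substring path also reflects how the post describes exclusions, where -"laughter" excludes "slaughter".

```python
import re

# Example words from the post; the real list of unindexed words is unknown.
UNINDEXED_WORDS = {"this", "with"}


def indexed_match(term: str, text: str) -> bool:
    """Index path (sketch): unindexed common words never match; otherwise
    some word in the text must start with the term."""
    term = term.lower()
    if term in UNINDEXED_WORDS:
        return False
    return any(w.startswith(term) for w in re.findall(r"\w+", text.lower()))


def scan_match(term: str, text: str) -> bool:
    """Non-index path (sketch): the term may appear anywhere, even mid-word."""
    return term.lower() in text.lower()
```

For the comment "I withhold judgment", `hold` is found by the scan but not by the index, while `withh` is found by both. `with` is found by neither in this text, since "withhold" is its only containing word and the index skips the short word entirely.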


Other Changes

- Various issues and limitations with favorite searches have been resolved. Searches in favorites should now behave the same as normal searches except for the noted comment/favnote search behavior.

- Exclusion searches for titles, tags (except for exact tags), comments and favnotes will now match any part of a word; i.e. -"laughter" will exclude "slaughter".

- Indexes are now generally updated immediately when the underlying data changes, which should reduce the delay until changes are reflected in searches. (Due to caching, there can still be some delay.)

- Whenever a gallery title had a mixed string of unicode and latin characters without any spaces or other breakable characters, like romaji漢字moreromaji, it would previously only be searchable with terms starting with "rom...", "漢字..." and "字mo..". It is now also searchable with terms starting with "mor...".

- The "Your default filters removed..." message is now more consistent and specifically counts all galleries filtered by your default uploader, tag and language search filter settings. (When using both filters and exclusions and a gallery would have been removed by both, it is counted as an exclusion.)

- Selecting "Search Expunged Galleries" will now only search expunged galleries in normal searches. (File searches, GID searches and favorite searches will always display both normal and expunged galleries.)

- File searches can no longer be combined with keyword searches or other filters. This search mode will show both normal and expunged galleries. Default tag, language and uploader filters are now automatically disabled for these searches.

- Excessively narrow page range filters (min > 1000, max < 10, min/max > 0.5, or max - min < 20) are no longer allowed.

- The max number of results per page is now 100. Paging Enlargement III was removed and will be refunded Soon™.
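The page range rules above can be read as a simple validity check. The sketch below is a hypothetical Python rendering of those four conditions, assuming that an empty bound is represented as `None` and that the ratio and span rules only apply when both bounds are set; the site's actual validation code is not shown in these notes.

```python
from typing import Optional


def page_range_allowed(min_pages: Optional[int], max_pages: Optional[int]) -> bool:
    """Return False for page range filters the notes describe as too narrow.
    None means that bound was left empty (an assumption of this sketch)."""
    if min_pages is not None and min_pages > 1000:
        return False
    if max_pages is not None and max_pages < 10:
        return False
    if min_pages is not None and max_pages is not None:
        # max_pages >= 10 at this point, so the division is safe.
        if min_pages / max_pages > 0.5:
            return False
        if max_pages - min_pages < 20:
            return False
    return True
```

Under this reading, 10-30 passes, 10-29 fails the span rule, and 60-100 fails the ratio rule.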


Known Issues/Quirks/Complaints/Workingasintendedisms

- You may sometimes see galleries appear out-of-order when going from one page to the next - in other words, going by the posted date, you would have expected the gallery to be on another page. This mostly applies to older galleries that predated the latest uploader update. This is because, prior to said update, a gallery could have been assigned a GID long before it was actually posted. This might eventually be addressed after a future redesign of the gallery metadata tables by renumbering galleries that are significantly out of order.

- If you are browsing from the end of a search result (backwards browsing mode) all the way to the start, the "last" page in the result (the one with the oldest results) will have a full page of results and the "first" page in the result (with the most recent ones) will have the remainder. This is working as intended.

- If you go backwards in a search result and get to the "first" page (with the most recent results), the "<< First" link will be lit up to flip back to the first page in forwards browsing mode even if there are no further pages and "< Prev" is disabled. This is working as intended.

- If you search for several AND inclusion tag terms (or hybrid title+tag terms), where every term has many results (~10K+) and some have a lot of results (~100K+), and there is a low degree of overlap between the tags, you may see fewer than expected results per page. You can usually use exact tags to avoid this.

- In general, "results per page" should be considered a target rather than a guarantee. For example, as an internal optimization, if a result page is at least 95% full after a search cycle, it may return with a couple of results "missing" instead of starting another search cycle (which can be expensive). This does not mean it's withholding results from you; you'll find them on the next page.

- "But $tool/$script needs the ability to access arbitrary pages in search results and/or accurate search result counts" is out of scope/wontfix. Update it to use the new gid-based navigation. And no, the old search engine was not "working just fine the way it was", it was failing on an ever-increasing number of searches due to running out of RAM when building results and badly needed a fundamental redesign to cope with the ever-increasing size of the index.
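The backwards-browsing quirk noted above falls out of how pages are cut. This toy Python model assumes a fixed, unchanging result list ordered newest-first by GID and an exact page size (the real engine treats page size as a target): forward pages are cut from the newest end, backward pages from the oldest end, so the leftover partial page lands at opposite ends.

```python
def forward_pages(gids_desc, per_page):
    """Forward browsing: cut pages starting from the newest result."""
    return [gids_desc[i:i + per_page] for i in range(0, len(gids_desc), per_page)]


def backward_pages(gids_desc, per_page):
    """Backward browsing: cut pages starting from the oldest result.
    Pages are returned in visiting order (oldest page first)."""
    pages = []
    end = len(gids_desc)
    while end > 0:
        start = max(0, end - per_page)
        pages.append(gids_desc[start:end])
        end = start
    return pages


results = list(range(103, 0, -1))  # 103 hypothetical GIDs, newest first
print([len(p) for p in forward_pages(results, 25)])   # partial page is the oldest one
print([len(p) for p in backward_pages(results, 25)])  # partial page is the newest one
```

With 103 results and 25 per page, both directions produce four full pages and one page of 3, but forward browsing ends on the 3 oldest galleries while backward browsing ends on the 3 newest.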


This is likely the most complicated update in the site's history, so there will probably be bugs and other subtle behavioral changes. Please don't hesitate to ask whether something is intentional if it's not noted in these patch notes.

 
post Nov 2 2022, 12:22
Post #2
negavamas



Newcomer
*
Group: Members
Posts: 33
Joined: 23-August 13
Level 56 (Expert)


I may sound like an idiot, but I don't understand the GID search, doesn't removing the paging effectively kill what's between the first and last pages?

 
post Nov 2 2022, 12:28
Post #3
Nobodycaresaboutme



Lurker
Group: Lurkers
Posts: 3
Joined: 27-March 11


QUOTE(negavamas @ Nov 2 2022, 13:22) *

I may sound like an idiot, but I don't understand the GID search, doesn't removing the paging effectively kill what's between the first and last pages?


This.

Is 'page' a technical term in this context? Because I don't care much how a collection of galleries is handled in the backend, but I do care very much about the ability to jump forward x amount of galleries. So remove pages if that gives you performance improvements, but for the love of god, please give us a 'Next (x times)' button.

 
post Nov 2 2022, 12:32
Post #4
Azya22



Lurker
Group: Lurkers
Posts: 2
Joined: 15-July 15


How exactly do you "jump to an arbitrary GID"?

 
post Nov 2 2022, 12:33
Post #5
Vulkandrache



Lurker
Group: Lurkers
Posts: 3
Joined: 30-October 09
Level 19 (Novice)


"While this means jumping to an arbitrary "page" in the result is no longer supported, this is arguably an improvement since you can now jump to an arbitrary GID instead."

How could that possibly be any sort of improvement?
The ability to see the number of pages and jump to any of them was the most important thing in the entire search function.



I usually use "Tankoubon English" as a standard search term.
I know that this only has a few hundred result pages with 25 each.
Now it shows as "many".
How is a search that can't count to triple digits of any use?

If I go to "Last" in that search, the browser shows a 1 in the address bar.
Every click on "Prev" jumps by a different amount.
How am I supposed to find anything specific, like, say, "the middle"?


This post has been edited by Vulkandrache: Nov 2 2022, 12:39

 
post Nov 2 2022, 12:36
Post #6
astral02



Regular Poster
*****
Group: Members
Posts: 696
Joined: 18-May 13
Level 254 (Godslayer)


Is the page selector gone for good? Or are you in the middle of updating it? Is there a reason it went away?

This post has been edited by astral02: Nov 2 2022, 12:39

 
post Nov 2 2022, 12:36
Post #7
Nrj Gangsta Rap



Newcomer
*
Group: Recruits
Posts: 13
Joined: 25-October 14
Level 242 (Godslayer)


Some nonsense. How are we supposed to navigate the search results now? Let's say there are 100 search pages; how do you think I should go to page 50 now? Or do you think that clicking the next button 50 times to get to that page is much more convenient?

This post has been edited by Nrj Gangsta Rap: Nov 2 2022, 12:40

 
post Nov 2 2022, 12:38
Post #8
Tenboro

Admin




Fixed some instances where it would produce invalid queries and error out if intermediary results were empty. (There would be no results in this case anyway.)

QUOTE(negavamas @ Nov 2 2022, 11:22) *
I may sound like an idiot, but I don't understand the GID search, doesn't removing the paging effectively kill what's between the first and last pages?


Concisely, instead of navigating to "results on <pagenumber>" you navigate to "results immediately following <galleryid>" or "results immediately preceding <galleryid>" depending on whether you are going forwards (from newer to older) or backwards (from older to newer). So you can navigate all of the galleries from start to end, you just can't go to "pagenumber=364" because the system doesn't know what that is.
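The "results immediately following/preceding a gallery id" idea is what is usually called keyset (or cursor) pagination. The sketch below illustrates it with an in-memory SQLite table; it is not how the site implements it, and the table name `galleries` and the helper functions are hypothetical. Each page is an index range scan starting at a known GID, so its cost does not grow with how deep into the result you are, whereas a page number would require counting past every earlier result.

```python
import sqlite3
from typing import Optional


def make_db(gids) -> sqlite3.Connection:
    """Build a toy in-memory table of gallery IDs (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE galleries (gid INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO galleries (gid) VALUES (?)", [(g,) for g in gids])
    return conn


def page_after(conn, next_gid: Optional[int], per_page: int) -> list:
    """Forward browsing (newer -> older): results immediately following next_gid.
    With next_gid=None, return the first (newest) page."""
    if next_gid is None:
        rows = conn.execute(
            "SELECT gid FROM galleries ORDER BY gid DESC LIMIT ?", (per_page,))
    else:
        rows = conn.execute(
            "SELECT gid FROM galleries WHERE gid < ? ORDER BY gid DESC LIMIT ?",
            (next_gid, per_page))
    return [gid for (gid,) in rows]


def page_before(conn, prev_gid: int, per_page: int) -> list:
    """Backward browsing (older -> newer): results immediately preceding prev_gid,
    fetched oldest-first, then reversed so they display newest-first."""
    rows = conn.execute(
        "SELECT gid FROM galleries WHERE gid > ? ORDER BY gid ASC LIMIT ?",
        (prev_gid, per_page)).fetchall()
    return [gid for (gid,) in reversed(rows)]


conn = make_db(range(1, 101))
first = page_after(conn, None, 5)        # newest page
second = page_after(conn, first[-1], 5)  # "Next" uses the last GID on the page
back = page_before(conn, second[0], 5)   # "Prev" uses the first GID on the page
```

Here `back` comes out identical to `first`: going "Next" and then "Prev" returns you to the same galleries without any page number being involved.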

QUOTE(Azya22 @ Nov 2 2022, 11:32) *
How exactly do you "jump to an arbitrary GID"?


If you look at the URL, you'll see "next=number" or "prev=number" at the end of it. This number is the gallery id, and you can change it to whatever to go to the results immediately following (next) or preceding (prev) that gallery.
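Editing the URL by hand works, but for a script the same jump can be done by rewriting the query string. This is a minimal sketch using only the Python standard library; the example URL and its `f_search` parameter are illustrative assumptions, and only the `next`/`prev` parameters come from the explanation above.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def jump_to_gid(url: str, gid: int, direction: str = "next") -> str:
    """Return url with its next=/prev= cursor replaced by the given gallery id,
    keeping all other search parameters unchanged."""
    if direction not in ("next", "prev"):
        raise ValueError("direction must be 'next' or 'prev'")
    parts = urlsplit(url)
    # Drop any existing cursor so only one of next/prev is present.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k not in ("next", "prev")]
    query.append((direction, str(gid)))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical search URL; substitute the one from your address bar.
url = "https://example.org/?f_search=tankoubon+english&next=2345678"
print(jump_to_gid(url, 1000000))
```

Only the cursor parameter changes, so the search terms and filters in the URL stay the same.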

QUOTE(astral02 @ Nov 2 2022, 11:36) *
Is the page selector gone for good? or are you in the middle of updating that? is there a reason it goes away?


It's gone as the index has grown to a size where it's not computationally feasible to have it anymore.

QUOTE(Nrj Gangsta Rap @ Nov 2 2022, 11:36) *
Some nonsense, how to navigate the search results now? Let's say there are 100 search pages, how do you think I should now go to page 50? Or do you think that clicking on the next button 50 times to get to this page is much more convenient?


If you explain why you need to go to page 50, I'll tell you how it's intended to work with the new search engine.

 
post Nov 2 2022, 12:41
Post #9
Crystalium



Casual Poster
****
Group: Members
Posts: 284
Joined: 14-August 13
Level 75 (Champion)


I apologize if this has been addressed somewhere in the post (I did try to see if it was mentioned), but on the sad panda version of the site, the thumbnails in galleries appear broken for me (they are just blank). This is the case both on desktop and mobile. I assume this is somehow related to the changes made.

Edit: Switching to "Large"-size thumbnails gets them to show up, but the "Normal"-size thumbnails (which is what I use by default) are broken for me in the manner I described earlier.

This post has been edited by Crystalium: Nov 2 2022, 13:40

 
post Nov 2 2022, 12:41
Post #10
astral02



Regular Poster
*****
Group: Members
Posts: 696
Joined: 18-May 13
Level 254 (Godslayer)


QUOTE(Nrj Gangsta Rap @ Nov 2 2022, 06:36) *

Some nonsense, how to navigate the search results now? Let's say there are 100 search pages, how do you think I should go to page 50 now? Or do you think that clicking the next button 50 times to get to this page is much more convenient?


Please at least bring back the page selector :D

the rest of the update is awesome

 
post Nov 2 2022, 12:44
Post #11
billtt



Lurker
Group: Lurkers
Posts: 1
Joined: 10-May 12
Level 18 (Novice)


I'm sorry, but it's the worst update ever.
Without page numbers I can't skip to a specific page, like when I've read 10 pages and need to go to the 11th.
And scripts tied to page numbers (like page number + 1) are now unusable.
Also, I can't see thumbnails in galleries; it's just a blank picture until I click it.
My browser version is Firefox 56.
e-hentai was great until yesterday.

Edit: "Normal" thumbnails won't show, but "Large" ones are fine.

This post has been edited by billtt: Nov 2 2022, 13:15

 
post Nov 2 2022, 12:46
Post #12
Vulkandrache



Lurker
Group: Lurkers
Posts: 3
Joined: 30-October 09
Level 19 (Novice)


Why would I need to jump to page 50?
Maybe I did a search for some tags and want to go through the results one by one.
But there are too many for one sitting, so I just remember or bookmark the last page I checked.
Sure, it might shift by a few results next time, but that's way easier to deal with than whatever this new thing is supposed to be.

 
post Nov 2 2022, 12:49
Post #13
negavamas



Newcomer
*
Group: Members
Posts: 33
Joined: 23-August 13
Level 56 (Expert)


QUOTE(Tenboro @ Nov 2 2022, 11:38) *

Concisely, instead of navigating to "results on <pagenumber>" you navigate to "results immediately following <galleryid>" or "results immediately preceding <galleryid>" depending on whether you are going forwards (from newer to older) or backwards (from older to newer). So you can navigate all of the galleries from start to end, you just can't go to "pagenumber=364" because the system doesn't know what that is.

I don't see how that improves the search engine from a user standpoint; I doubt many people know about the GID or how to work with it.

 
post Nov 2 2022, 12:49
Post #14
blehrg



Lurker
Group: Lurkers
Posts: 4
Joined: 25-October 15


Might as well ask Tenboro since he might see it, mostly to sate my curiosity as someone who is working on a local gallery system for offline organization.

Was E/EH using limit/offset prior to this change?

Is/Was "Tag search" being done through FTS, or does E/EH use a form of Toxi Tri-Table for tagging, or something else?

One of the original ways of pagination that still threw me for a loop is/was the support of semi-stable random page jumping even with complex search And/Exclude.

From the surface it seems the change now has shifted to use keyset pagination based on gallery id and forgo support for random access pages.

I don't know what database E/EH uses, but have you considered looking into/toying around with relational division supported through abuse of join collapses? It requires the DB query planner to support folding multiple self-joins by smallest index size (works with PostgreSQL).
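For readers unfamiliar with the term, "relational division" here means finding galleries that have all of a set of tags. A minimal sketch of the self-join form, using SQLite for convenience: the `gallery_tags` schema is hypothetical, this is not necessarily how E/EH does it, and whether a planner reorders the joins by selectivity depends on the database.

```python
import sqlite3


def build_tag_db(rows) -> sqlite3.Connection:
    """Toy gallery_tags table of (gid, tag) pairs; hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE gallery_tags (gid INTEGER, tag TEXT, PRIMARY KEY (gid, tag))")
    conn.execute("CREATE INDEX idx_tag_gid ON gallery_tags (tag, gid)")
    conn.executemany("INSERT INTO gallery_tags (gid, tag) VALUES (?, ?)", rows)
    return conn


def galleries_with_all_tags(conn, tags) -> list:
    """Relational division via self-joins: one extra join per additional tag."""
    tags = list(dict.fromkeys(tags))  # drop duplicate tags, keep order
    if not tags:
        raise ValueError("need at least one tag")
    joins = "".join(
        f" JOIN gallery_tags t{i} ON t{i}.gid = t0.gid AND t{i}.tag = ?"
        for i in range(1, len(tags)))
    sql = (f"SELECT t0.gid FROM gallery_tags t0{joins} "
           "WHERE t0.tag = ? ORDER BY t0.gid DESC")
    # The join placeholders appear before the WHERE placeholder in the SQL text.
    params = tags[1:] + tags[:1]
    return [gid for (gid,) in conn.execute(sql, params)]


conn = build_tag_db([(1, "english"), (1, "tankoubon"), (2, "english"),
                     (3, "english"), (3, "tankoubon"), (3, "color")])
print(galleries_with_all_tags(conn, ["english", "tankoubon"]))
```

Only tag values go through `?` placeholders; the aliases `t0`, `t1`, ... are generated from integers, so no user input is interpolated into the SQL text.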

 
post Nov 2 2022, 12:52
Post #15
Liadis



Lurker
Group: Lurkers
Posts: 1
Joined: 25-April 11
Level 59 (Expert)


The removal of the page selector is understandable if it's not feasible, but it does stop me from browsing the way that I like. Often I just go to page, say, 6000ish just to find stuff from a certain era, as an example. I am not always sure what I am looking for, just "something from roughly 5 years ago". That has become exceedingly difficult to do now.

But well, understandable. I just hate it.

 
post Nov 2 2022, 12:52
Post #16
astral02



Regular Poster
*****
Group: Members
Posts: 696
Joined: 18-May 13
Level 254 (Godslayer)


QUOTE(Tenboro @ Nov 2 2022, 06:38) *

Fixed some instances where it would produce invalid queries and error out if intermediary results were empty. (There would be no results in this case anyway.)
Concisely, instead of navigating to "results on <pagenumber>" you navigate to "results immediately following <galleryid>" or "results immediately preceding <galleryid>" depending on whether you are going forwards (from newer to older) or backwards (from older to newer). So you can navigate all of the galleries from start to end, you just can't go to "pagenumber=364" because the system doesn't know what that is.
If you look at the URL, you'll see "next=number" and "prev=number" at the end of it. This number is the gallery id, and you can change it to whatever to go to the results immediately following (next) or preceding (prev) that gallery.
It's gone as the index has grown to a size where it's not computationally feasible to have it anymore.
If you explain why you need to go to page 50, I'll tell you how it's intended to work with the new search engine.


Wouldn't it be fine to keep the page selector? Sometimes, for nostalgia's sake, people go through galleries from specific years by clicking on a page number. There are also times when one might search for an old gallery where we only remember a few of the tags, and searching those tags still results in 50+ pages. With a page selector we can at least try to pinpoint the gallery by page, whereas now we have to either search from the beginning or all the way from the back.

Please, Tenboro... I hope you can reconsider the removal of the page selector.

 
post Nov 2 2022, 12:52
Post #17
Tenboro

Admin




I'm aware that removing the pagenumber addressing breaks your favorite 3rd party downloading or browsing tool, it even says so in the patch notes, but there is no world line in which the site still has pagenumber addressing and also doesn't just keel over in the next year or two, so there really is no point in demanding its return.

QUOTE(Vulkandrache @ Nov 2 2022, 11:46) *
Why i would need to jump to page 50?
Maybe i did a search for some tags and want to go throught the results one by one.
But there are too many for one sitting so i just remember or bookmark the last page i checked.
Sure it might slide by a few results next time but that way easier to deal with than whatever this new thing is supposed to be.


In that case, if you leave the search page open or bookmark it, the results on that page will conveniently be exactly where you left it when you continue reading no matter how long it takes.
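Why a bookmarked GID position stays put while a bookmarked page number drifts can be shown with a small simulation. This Python sketch is a simplification under stated assumptions: new uploads always receive higher GIDs than everything already in the result (the Known Issues notes describe exceptions for older galleries), and deletions or expunges are ignored.

```python
def offset_page(gids_desc, page_index, per_page):
    """Page-number addressing: skip page_index * per_page results."""
    start = page_index * per_page
    return gids_desc[start:start + per_page]


def keyset_page(gids_desc, next_gid, per_page):
    """GID addressing: the results immediately following next_gid."""
    return [g for g in gids_desc if g < next_gid][:per_page]


before = list(range(100, 0, -1))          # GIDs 100..1, newest first
bookmarked = offset_page(before, 2, 10)   # the third page when you stopped reading
cursor = offset_page(before, 1, 10)[-1]   # last GID on the previous page, i.e. the next= value

after = list(range(105, 0, -1))           # five new uploads arrive, GIDs 101..105

print(offset_page(after, 2, 10))          # the "same" page number now shows different galleries
print(keyset_page(after, cursor, 10))     # the next= bookmark shows the same galleries as before
```

The page-number bookmark shifts by the five new uploads, while the `next=` bookmark returns exactly the galleries you left off at.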

QUOTE(blehrg @ Nov 2 2022, 11:49) *
Was E/EH using limit/offset prior to this change?

Is/Was "Tag search" being done through the usage of FTS or
does E/EH use a form of Toxi Tri-Table for tagging or something else?


The search engine is fully custom, both before and after the change. It was never using limit/offset.

 
post Nov 2 2022, 12:54
Post #18
negavamas



Newcomer
*
Group: Members
Posts: 33
Joined: 23-August 13
Level 56 (Expert)


Actually, I've never thought of this: could we search within an upload date range? Like between March 2020 and June 2020? If this is possible, it might be better than the new ID system.
User is offlineProfile CardPM
Go to the top of the page
+Quote Post

 
post Nov 2 2022, 12:55
Post #19
astral02



Regular Poster
*****
Group: Members
Posts: 696
Joined: 18-May 13
Level 254 (Godslayer)


QUOTE(Tenboro @ Nov 2 2022, 06:52) *

I'm aware that removing the pagenumber addressing breaks your favorite 3rd party downloading or browsing tool, it even says so in the patch notes, but there is no world line in which the site still has pagenumber addressing and also doesn't just keel over in the next year or two, so there really is no point in demanding its return.
In that case, if you leave the search page open or bookmark it, the results on that page will conveniently be exactly where you left it when you continue reading no matter how long it takes.


Is there really no chance you'll reconsider? Or can you at least keep the page selector on favorites or when we search for something?

This post has been edited by astral02: Nov 2 2022, 12:57

 
post Nov 2 2022, 12:55
Post #20
Gorince



Lurker
Group: Lurkers
Posts: 2
Joined: 5-July 14
Level 230 (Lord)


QUOTE(Tenboro @ Nov 2 2022, 04:38) *

If you explain why you need to go to page 50, I'll tell you how it's intended to work with the new search engine.

If one has a specific gallery one is looking for, and is aware of the general timeframe it was uploaded, but doesn't recall the tags used, it would be much easier to jump to page XX and search around there than it would be to... I don't even know what you might do with the new search in that scenario, honestly.

