The Rise of [AI Generated] Galleries on e-hentai, Analysis and Food for Discussion |
|
Apr 17 2023, 01:48
|
booleantrue1
Newcomer
 Group: Recruits
Posts: 12
Joined: 10-October 20

|
While [AI Generated] galleries have been around in significant numbers since DALL-E 2 was released to the public on September 28, 2022, the number of [AI Generated] posts has risen alarmingly in recent weeks. Over this weekend it surpassed 5% of new posts on an ongoing basis, with no sign of slowing down.

(IMG: https://i.imgur.com/wUKHIbf.png)
(IMG: https://i.imgur.com/ZSaqkB0.png)

As I mentioned in a previous thread about the e-hentai API, there's a trimodal distribution in the ratings of these posts, which I assume is down to some quirk of the rating system.

(IMG: https://i.imgur.com/9aHTgLE.png)

Looking at the most prolific uploaders of these galleries, much of this spike is driven by new activity. While some uploaders are consistently rated poorly, the higher-rated uploaders still aren't consistently rated highly. For example, Drizz46MaleWV is always rated low for photorealistic images, but KEYLUN isn't always rated high for their content; it's a mixed bag.

(IMG: https://i.imgur.com/uV3AImq.png)

Looking at the ratings impact of some selected tags on [AI Generated] posts, I'd like to point out the worrying bias toward toddlercon / small breasts / lolicon tags, which I've confirmed is across a large number of unique uploaders, not just a few individuals. While I don't have issue with lolicon in Doujinshi for people who can distinguish art from reality, the semi-realistic nature of some of these images does worry me.

The reality is that AI Generated Content is a Pandora's Box that isn't getting closed anytime soon, and we should have some constructive community discussion and understanding of what it will look like going forward on this site. It will only become faster and cheaper to generate with time, and who knows where this percent of new posts curve will end up topping off - 5%? 10%? 25%?

I think AI Generated content can be artistically valuable and fills the want for certain niche content that would be challenging to commission otherwise, but I would also suggest that we expect a certain amount of minimum effort and quality for AI Generated posts, whether that be in creativity, nicheness, beauty, or some other guidelines.

Anyway, with that I think I've plated out enough food for discussion - if anyone has more specific questions about the data or wants to see code, feel free to DM me or comment here. Imgur images are downscaled, full-size pngs are attached.

This post has been edited by booleantrue1: Apr 17 2023, 02:05
|
|
|
|
|
 |
|
Apr 17 2023, 02:34
|
Meight
Group: Gold Star Club
Posts: 190
Joined: 11-December 19

|
Would it be possible to separate Ehentai user generated AI stuff from stuff scraped from another website, or at least which website it came from if in title? I'm noticing a fair chunk are titled with PIXIV, Fanbox, or Patreon. Some match with the user that uploaded, but other users are clearly uploading stuff they found on another website. I think page length might also be a useful metric to look at. A final thing, I would compare the number of AI posts to total posts for the users. The top poster, for example, posts a lot of stuff in general and AI makes up very little overall.
This post has been edited by Meight: Apr 17 2023, 02:42
|
|
|
|
|
 |
|
Apr 17 2023, 02:37
|
Glovelove.
Group: Members
Posts: 4,258
Joined: 11-June 17

|
QUOTE(booleantrue1 @ Apr 17 2023, 00:48)  Looking at the ratings impact of some selected tags on [AI Generated] posts, I'd like to point out the worrying bias toward toddlercon / small breasts / lolicon tags, which I've confirmed is across a large number of unique uploaders, not just a few individuals. While I don't have issue with lolicon in Doujinshi for people who can distinguish art from reality, the semi-realistic nature of some of these images does worry me.
Yeah for stuff like this, I really wish I could just blacklist a specific tag combination without turning it into a whole puzzle sorting out the right weights to make everything work just right for watched tags, but as with realistic 3d, it mostly comes down to users reporting the most realistic ones for investigation and administration making the final call.

QUOTE(booleantrue1 @ Apr 17 2023, 00:48)  I think AI Generated content can be artistically valuable and fills the want for certain niche content that would be challenging to commission otherwise, but I would also suggest that we expect a certain amount of minimum effort and quality for AI Generated posts, whether that be in creativity, nicheness, beauty, or some other guidelines.

Do you have any objective way of measuring quality, or a way to quantify effort for that matter? Ratings definitely aren't going to work for that, because a lot of people will just rate 1/5 as soon as they see a Guro, Scat, Rough Grammar, etc. tag instead of just blacklisting the shit they don't want to see. Most discussions on quality control ultimately loop back to "ok, so how do we define 'low quality' in a way a majority agrees on where to draw the line, and how do we prevent some random discord brigade from hijacking this system to push their own agenda?"

I could definitely do with less AI garbage, and I think most people could agree with that at the very least. Getting a majority to agree on what is and isn't garbage, and then ensuring that people stick to that definition (even if they weren't part of that majority) when cleaning it up, is the hard part.

This post has been edited by Glovelove.: Apr 17 2023, 03:00
|
|
|
|
|
 |
|
Apr 17 2023, 03:07
|
Meight
Group: Gold Star Club
Posts: 190
Joined: 11-December 19

|
QUOTE(Meight @ Apr 17 2023, 02:34)  Would it be possible to separate Ehentai user generated AI stuff from stuff scraped from another website, or at least which website it came from if in title? I'm noticing a fair chunk are titled with PIXIV, Fanbox, or Patreon. Some match with the user that uploaded, but other users are clearly uploading stuff they found on another website. I think page length might also be a useful metric to look at. A final thing, I would compare the number of AI posts to total posts for the users. The top poster, for example, posts a lot of stuff in general and AI makes up very little overall.
One last thing I forgot to ask, could "other" and "disowned" be put into its own charts? I'm interested in the role of power users.
|
|
|
|
|
 |
|
Apr 17 2023, 03:16
|
booleantrue1
Newcomer
 Group: Recruits
Posts: 12
Joined: 10-October 20

|
QUOTE(Meight @ Apr 16 2023, 19:07)  One last thing I forgot to ask, could "other" and "disowned" be put into its own charts? I'm interested in the role of power users.
Unsure what is meant by this - regenerate a new set of plots for just galleries not uploaded by the most prolific uploaders? Since the most active uploaders are posting more recently, it will probably artificially drop the curve?

-- Edit Some Time Later --

Gave it a shot, here's what the rolling percent of new posts looks like without the top 14 most prolific uploaders I showed previously. Still shows a steep increase in the last few weeks, meaning it's a larger group of uploaders who are starting to post more AI Generated galleries, rather than a select few posting more often.

This post has been edited by booleantrue1: Apr 17 2023, 03:47
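For anyone wanting to reproduce this kind of curve, here is a minimal sketch of a rolling "percent of new posts" calculation that can leave out the top N uploaders. It is not the code behind the plots above. The record keys ('day', 'uploader', 'is_ai') are hypothetical, and dropping every gallery from an excluded uploader (rather than only their AI galleries) is an assumption.

```python
from collections import Counter
from datetime import date, timedelta


def rolling_ai_share(galleries, window_days=7, exclude_top=0):
    """Return [(day, percent_ai)] over a trailing window of `window_days`.

    `galleries` is an iterable of dicts with hypothetical keys:
    'day' (datetime.date), 'uploader' (str) and 'is_ai' (bool).
    """
    galleries = list(galleries)

    # Rank uploaders by number of AI galleries and drop the top N entirely.
    ai_counts = Counter(g["uploader"] for g in galleries if g["is_ai"])
    excluded = {name for name, _ in ai_counts.most_common(exclude_top)}
    kept = [g for g in galleries if g["uploader"] not in excluded]
    if not kept:
        return []

    total_per_day = Counter(g["day"] for g in kept)
    ai_per_day = Counter(g["day"] for g in kept if g["is_ai"])

    results = []
    day = min(total_per_day)
    last_day = max(total_per_day)
    while day <= last_day:
        # Trailing window: this day plus the (window_days - 1) days before it.
        window = [day - timedelta(days=offset) for offset in range(window_days)]
        n_total = sum(total_per_day[d] for d in window)
        n_ai = sum(ai_per_day[d] for d in window)
        percent = 100.0 * n_ai / n_total if n_total else 0.0
        results.append((day, percent))
        day += timedelta(days=1)
    return results
```

Comparing the output for exclude_top=0 and exclude_top=14 gives the kind of with/without comparison described above.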
|
|
|
|
|
 |
|
Apr 17 2023, 03:29
|
booleantrue1
Newcomer
 Group: Recruits
Posts: 12
Joined: 10-October 20

|
QUOTE(Meight @ Apr 16 2023, 18:34)  Would it be possible to separate Ehentai user generated AI stuff from stuff scraped from another website, or at least which website it came from if in title? I'm noticing a fair chunk are titled with PIXIV, Fanbox, or Patreon. Some match with the user that uploaded, but other users are clearly uploading stuff they found on another website.
As requested, here's the source of posts as pulled from the title. There's a significant amount of content reposting, mainly from Pixiv. 
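A rough sketch of how a source label could be pulled from a gallery title, for anyone who wants to try the same breakdown. This is an illustration only, not the code used for the chart: the keyword list and the "Unknown" fallback are assumptions, and plain keyword matching will miss reposts that don't name the source in the title.

```python
import re
from collections import Counter

# Hypothetical keyword list. Fanbox is checked before Pixiv so that a title
# containing "Pixiv Fanbox" is counted as Fanbox rather than Pixiv.
SOURCE_PATTERNS = {
    "Fanbox": re.compile(r"fanbox", re.IGNORECASE),
    "Pixiv": re.compile(r"pixiv", re.IGNORECASE),
    "Patreon": re.compile(r"patreon", re.IGNORECASE),
}


def source_from_title(title):
    """Return the first matching source name, or 'Unknown' if none match."""
    for name, pattern in SOURCE_PATTERNS.items():
        if pattern.search(title):
            return name
    return "Unknown"


def count_sources(titles):
    """Tally how many titles mention each source."""
    return Counter(source_from_title(title) for title in titles)
```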
|
|
|
|
|
 |
|
Apr 17 2023, 03:33
|
Meight
Group: Gold Star Club
Posts: 190
Joined: 11-December 19

|
QUOTE(booleantrue1 @ Apr 17 2023, 03:16)  Unsure what is meant by this - regenerate a new set of plots for just galleries not uploaded by the most prolific uploaders? Since the most active uploaders are posting more recently, it will probably artificially drop the curve?
Yes, but mostly just for the 3rd graph. Like you said, the most prolific have been posting mostly recently, but it makes it hard to tell what the trend is with the various random users. Also, thank you for providing these graphs in general.

This post has been edited by Meight: Apr 17 2023, 03:42
|
|
|
|
|
 |
|
Apr 17 2023, 04:08
|
decidua
Group: Members
Posts: 389
Joined: 9-July 20

|
This is kind of neat, OP. I've read your other thread before and wish there were a more thorough explanation of that trimodal distribution. Having certain tags does contribute, but it's still just part of a bigger picture.
Do you mind DM-ing me the code? I'm interested in trying it with Cosplay category, where it seems like the nature of cosplayer race/ethnicity matters more to the galleries' rating rather than the effort/quality of the cosplay itself or how much skin they're showing (surprising for this site). Although, frankly, that seems to require a lot more manual cataloging, since the cosplay tagging system here is a bit lackluster.
|
|
|
|
|
 |
|
Apr 17 2023, 04:22
|
pork:zero
Group: Catgirl Camarilla
Posts: 2,887
Joined: 10-August 13

|
QUOTE(booleantrue1 @ Apr 17 2023, 01:29)  As requested, here's the source of posts as pulled from the title. There's a significant amount of content reposting, mainly from Pixiv.
You should be able to better identify the source with the uploader description and filenames.
|
|
|
|
|
 |
|
Apr 17 2023, 04:39
|
booleantrue1
Newcomer
 Group: Recruits
Posts: 12
Joined: 10-October 20

|
QUOTE(decidua @ Apr 16 2023, 20:08)  Do you mind DM-ing me the code? I'm interested in trying it with Cosplay category, where it seems like the nature of cosplayer race/ethnicity matters more to the galleries' rating rather than the effort/quality of the cosplay itself or how much skin they're showing (surprising for this site).
I didn't consider it might be of general interest so I'll throw it up on Github: https://github.com/ChloeElo/e-hentai_processing

Steps to use:

Use the regular front page search to filter for whatever you want to look at. Use Ctrl-S to save the entire webpage as an html file, and repeat this for however many pages you want to process data from. This is clunky and slow, but as far as I know there isn't currently an API to search with tags, and dealing with cookies and tokens to do it via browser emulation (eg Selenium) is too much of a pain in the ass.

Once you have html files for each page of search results, you can use process.py to pull the gid and token of each gallery you want to get metadata for. This gets saved as idtokens.p.

Next, request.py uses the metadata JSON API to grab metadata for the galleries in batches of 25 at a time, since that's the max allowed. It waits a second between each request to avoid spamming the server; it's generally good practice to spread out requests if you don't want to get IP banned. The metadata for all the galleries gets saved to metadata.p.

Finally, visualize.py compiles all the data and draws all the fancy charts from the metadata.

I've included an example html doc in the repo to show what the saved search results should look like. If anyone knows a better way to do any of these steps, please for the love of good programming practice let me know. Like I mentioned in my previous thread, I couldn't find a way to get favorited and view counts without scraping each page individually, which is bad practice and which I don't want to do unless I have to, since for thousands of galleries I'd probably have to spread out those requests over a day or more.
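The batching step can be sketched roughly as below. This is not the repo's request.py. The endpoint URL and the payload fields ("method": "gdata", "gidlist", "namespace") and the "gmetadata" response key reflect my understanding of the API page on EHWiki and should be checked against it before use; the function names are made up for illustration.

```python
import json
import time
import urllib.request

# Assumed endpoint and limits; verify against the EHWiki API page.
API_URL = "https://api.e-hentai.org/api.php"
BATCH_SIZE = 25  # the "max allowed" per request mentioned above


def batches(id_tokens, size=BATCH_SIZE):
    """Yield consecutive slices of at most `size` (gid, token) pairs."""
    for start in range(0, len(id_tokens), size):
        yield id_tokens[start:start + size]


def build_payload(batch):
    """Build the JSON body for one metadata request (assumed field names)."""
    return {
        "method": "gdata",
        "gidlist": [[gid, token] for gid, token in batch],
        "namespace": 1,
    }


def fetch_metadata(id_tokens, delay_seconds=1.0):
    """Request metadata batch by batch, pausing between requests."""
    results = []
    for index, batch in enumerate(batches(id_tokens)):
        if index:
            time.sleep(delay_seconds)  # spread requests out, as described above
        body = json.dumps(build_payload(batch)).encode("utf-8")
        request = urllib.request.Request(
            API_URL, data=body, headers={"Content-Type": "application/json"}
        )
        with urllib.request.urlopen(request) as response:
            results.extend(json.load(response).get("gmetadata", []))
    return results
```

Only fetch_metadata touches the network; batches and build_payload can be checked offline first.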
|
|
|
|
|
 |
|
Apr 17 2023, 07:16
|
decidua
Group: Members
Posts: 389
Joined: 9-July 20

|
QUOTE(booleantrue1 @ Apr 17 2023, 11:39)  Use the regular front page search to filter for whatever you want to look at. Use Ctrl-S to save the entire webpage as an html file, and repeat this for however many pages you want to process data from. This is clunky and slow, but as far as I know there isn't currently an API to search with tags, and dealing with cookies and tokens to do it via browser emulation (eg Selenium) is too much of a pain in the ass.
Hmm, I'm not familiar with scraping EH, but if there's no API for the search function, can't you just use Requests instead of Selenium or manually saving each page? I didn't try it with cookies, but the response to a GET request to 'https://e-hentai.org/tag/other:ai+generated' gave me the navbar URL for the next search page and the resulting gallery URLs to parse for gid & token, which I think is enough to automate the scraping with an interval.
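As a rough illustration of the parsing half of that idea, the sketch below pulls (gid, token) pairs out of a page of HTML, whether it came from a saved file or a response body. It assumes gallery links have the form https://e-hentai.org/g/<gid>/<token>/ with a hexadecimal token; the function name is hypothetical, and fetching pages and following the next-page link are left out.

```python
import re

# Assumed gallery link shape: https://e-hentai.org/g/<gid>/<token>/
GALLERY_LINK = re.compile(r"https://e-hentai\.org/g/(\d+)/([0-9a-f]+)/")


def extract_id_tokens(html):
    """Return unique (gid, token) pairs in the order they first appear."""
    seen = set()
    pairs = []
    for gid, token in GALLERY_LINK.findall(html):
        pair = (int(gid), token)
        if pair not in seen:  # result pages can link the same gallery twice
            seen.add(pair)
            pairs.append(pair)
    return pairs
```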
|
|
|
|
|
 |
|
Apr 17 2023, 09:34
|
blood vash 1
Newcomer
 Group: Members
Posts: 20
Joined: 3-August 15

|
Perhaps a way to filter out some low quality content without being too vulnerable to brigading would be to require a certain number of 3+ star ratings after the first week or the content is automatically deleted. That way it doesn't matter how many people rate it low, as long as some people think it's good enough.
Some of the AI stuff is good, particularly for niches that are otherwise a little sparse. Even then, it seems people have a tendency to find a prompt that gives a good output, then run it a hundred times and upload an entire gallery of almost identical images...
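To make the proposed rule concrete, here is a minimal sketch. The thresholds (5 good ratings, a 7-day grace period) and the function name are placeholders, not anything the site implements. The point it illustrates is that only the count of ratings at 3 stars or above matters, so piling on low ratings can't trigger a deletion.

```python
def should_auto_delete(ratings, age_days, min_good=5, good_threshold=3, grace_days=7):
    """Return True if a gallery fails the 'enough good ratings' rule.

    ratings: iterable of star ratings (numbers from 1 to 5).
    age_days: days since the gallery was posted.
    """
    if age_days < grace_days:
        return False  # still inside the first week, never deleted yet
    good = sum(1 for rating in ratings if rating >= good_threshold)
    return good < min_good
```

For example, a gallery with five 4-star ratings survives no matter how many 1-star ratings are added afterwards.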
|
|
|
|
Apr 21 2023, 17:09
|
weakref
Group: Members
Posts: 198
Joined: 8-February 14

|
I doubt the DALL-E 2 public release had much impact, since most content creators are using Stable Diffusion. I would consider the milestone to be the NovelAI leak (October 6).
|
|
|
|
|
 |
|
Apr 22 2023, 17:19
|
Necromusume
Group: Catgirl Camarilla
Posts: 7,419
Joined: 17-May 12

|
Now you have galleries like this, [WetLady] - Sisters in no panties and dresses [AI Generated]

This would be a realporn expunge. It's completely generic photography, except you can still barely tell it's 'AI', and so it's allowed. I think the reason realporn isn't allowed is that there's so much of it out there that it would just flood the site, and overwhelm the content the site is actually intended to host. Just tagging realporn and telling people to blacklist the tag isn't considered a sufficient solution. So, the prediction is that you are probably going to have to disallow AI content for the same reason.

Excessive future-proofing of the rule aside, at this point, we can still tell what is AI with no drama about 99.9% of the time. And for content like this, if it wasn't AI, it wouldn't be allowed anyway.

There's fierce competition for new uploadable material, especially material which is available for free, without having to pay DLsite or buy physical doujinshi and scan them. Even doing a quality photoshoot costs hundreds or thousands of dollars, to hire a model and a photographer and rent costumes and a venue. AI, by contrast, is limitless. It's a pool from which you can draw another 600 pictures at any time.

We had thought that people not getting much GP would make AI uploading self-limited, and so far, for whatever reason, that hasn't occurred. Either they do get enough views, or people just upload out of OCD, or hope, or wanting to be an uploader.

There were what-ifs about artists using AI as part of their process. There has turned out to be far less of that, because it requires actual work. Those could be treated separately.

I wonder how much interest people in the future will have in viewing endless archived AI generations from 2023. I'd much rather have all the hand-drawn Touhou fan work that used to be on Pixiv, but never got uploaded anywhere else because they were black and white sketches, or something weird, or a 4-panel comic that someone couldn't read. A lot of those were great, but they're probably lost forever, and they can't be recreated. The zeitgeist and the mind-state abstract of all the artists can't be stored.

For AI, if someone saves a copy of the software, all they need to store is the prompt string and the random seed and they can generate the same 20,000 pictures 10 years from now. Ideally, people wanting to share large quantities of interesting stablediffusion output would have a specialized site where they can store and share that information more efficiently. It would also make it more resistant to censorship of the outputs, since anyone could recreate it.

This post has been edited by Necromusume: Apr 22 2023, 17:47
|
|
|
|
|
 |
|
Apr 22 2023, 18:03
|
Meight
Group: Gold Star Club
Posts: 190
Joined: 11-December 19

|
QUOTE(Necromusume @ Apr 22 2023, 17:19)  Now you have galleries like this, [WetLady] - Sisters in no panties and dresses [AI Generated]
Ideally, people wanting to share large quantities of interesting stablediffusion output would have a specialized site where they can store and share that information more efficiently. It would also make it more resistant to censorship of the outputs, since anyone could recreate it.

There's at least one for furry stuff in e6ai, though furry AI is relatively rare on eh anyway. I wonder what the rate of AI looks like now because it seems like some of the power users have stopped posting the stuff. Maybe they saw the threads people made or just got bored. Pixiv seems to have a good amount of AI stuff considering how much gets reposted here.

This post has been edited by Meight: Apr 22 2023, 18:04
|
|
|
|
|
 |
|
Apr 22 2023, 18:21
|
Necromusume
Group: Catgirl Camarilla
Posts: 7,419
Joined: 17-May 12

|
Well, Pixiv is for profit, so they're not likely to care so long as they still get Pixiv Premium subscriptions, ad views, and can track everyone's browsing behind the login wall to build a targeted advertising profile on them.
|
|
|
|
|
 |
|
Apr 22 2023, 20:54
|
fletchlives
Newcomer
 Group: Members
Posts: 24
Joined: 1-May 08

|
QUOTE(booleantrue1 @ Apr 17 2023, 01:48)  The reality is that AI Generated Content is a Pandora's Box that isn't getting closed anytime soon, and we should have some constructive community discussion and understanding of what it will look like going forward on this site. It will only become faster and cheaper to generate with time, and who knows where this percent of new posts curve will end up topping off - 5%? 10%? 25%?
I think AI Generated content can be artistically valuable and fills the want for certain niche content that would be challenging to commission otherwise, but I would also suggest that we expect a certain amount of minimum effort and quality for AI Generated posts, whether that be in creativity, nicheness, beauty, or some other guidelines.
Anyway, with that I think I've plated out enough food for discussion - if anyone has more specific questions about the data or wants to see code, feel free to DM me or comment here. Imgur images are downscaled, full-size pngs are attached.
The forum circlejerkers and mod asskissers don't want to do anything, so nothing will happen. Every step of the way they've had to be dragged kicking and screaming: admitting it's actually different from actual art and warrants placement in misc, making a tag at all while crying about it the whole time, requiring it in the title instead of userbase added like virtually every other tag, making presence required over 50% unlike every other tag, etc, etc. All ignoring the fact literally anybody could just shit this stuff out at home on their own computer without any effort with no logical reason to upload it elsewhere. It's as if people started flooding video host sites with their modded Skyrim jerk sessions; who cares about something that took no effort and anybody could make on their own?

QUOTE(Necromusume @ Apr 22 2023, 17:19)  Now you have galleries like this, [WetLady] - Sisters in no panties and dresses [AI Generated]
This would be a realporn expunge. It's completely generic photography, except you can still barely tell it's 'AI', and so it's allowed. I think the reason realporn isn't allowed is that there's so much of it out there that it would just flood the site, and overwhelm the content the site is actually intended to host. Just tagging realporn and telling people to blacklist the tag isn't considered a sufficient solution. So, the prediction is that you are probably going to have to disallow AI content for the same reason. Excessive future-proofing of the rule aside, at this point, we can still tell what is AI with no drama about 99.9% of the time. And for content like this, if it wasn't AI, it wouldn't be allowed anyway. There's fierce competition for new uploadable material, especially material which is available for free, without having to pay DLsite or buy physical doujinshi and scan them. Even doing a quality photoshoot costs hundreds or thousands of dollars, to hire a model and a photographer and rent costumes and a venue.
AI, by contrast, is limitless. It's a pool from which you can draw another 600 pictures at any time. We had thought that people not getting much GP would make AI uploading self-limited, and so far, for whatever reason, that hasn't occurred. Either they do get enough views, or people just upload out of OCD, or hope, or wanting to be an uploader. There were what-ifs about artists using AI as part of their process. There has turned out to be far less of that, because it requires actual work. Those could be treated separately. I wonder how much interest people in the future will have in viewing endless archived AI generations from 2023. I'd much rather have all the hand-drawn Touhou fan work that used to be on Pixiv, but never got uploaded anywhere else because they were black and white sketches, or something weird, or a 4-panel comic that someone couldn't read. A lot of those were great, but they're probably lost forever, and they can't be recreated. The zeitgeist and the mind-state abstract of all the artists can't be stored. For AI, if someone saves a copy of the software, all they need to store is the prompt string and the random seed and they can generate the same 20,000 pictures 10 years from now. Ideally, people wanting to share large quantities of interesting stablediffusion output would have a specialized site where they can store and share that information more efficiently. It would also make it more resistant to censorship of the outputs, since anyone could recreate it.

I personally think it's really damn strange that there is so much pushback from the admins towards this completely innocuous attitude. As if we'll be missing out on sooooo much "art" (that a trained chimp could make) if any sort of restrictions were placed on AI generated content.
QUOTE(Necromusume @ Apr 22 2023, 18:21)  Well, Pixiv is for profit, so they're not likely to care so long as they still get Pixiv Premium subscriptions, ad views, and can track everyone's browsing behind the login wall to build a targeted advertising profile on them.
Pixiv also at least requires it to be properly tagged and offers users the ability to ignore and hide it, while EH seems intent on making that as difficult as possible and fought the ability to do so the entire way.

QUOTE(peterson123 @ Apr 22 2023, 23:28)  Actually it's not that difficult:
- go to https://e-hentai.org/mytags
- in the search field there, enter "ai generated" and select that
- check the checkbox left of "Hidden"
- click the "Save" button on the very right
Hope this helps.

No shit really?! I definitely didn't know this and hadn't turned it on the second you whining retards finally admitted there was a need for a tag after half a year of crying about it. Any other completely unrelated pointless comments you want to make, or do you actually want to contribute to the discussion?

This post has been edited by fletchlives: May 1 2023, 22:07
|
|
|
|
|
 |
|
Apr 22 2023, 21:58
|
cutegyaru
Group: Gold Star Club
Posts: 219
Joined: 21-June 13

|
QUOTE(fletchlives @ Apr 22 2023, 14:54)  The forum circlejerkers and mod asskissers don't want to do anything, so nothing will happen.
Every step of the way they've had to be dragged kicking and screaming: admitting it's actually different from actual art and warrants placement in misc, making a tag at all while crying about it the whole time, requiring it in the title instead of userbase added like virtually every other tag, making presence required over 50% unlike every other tag, etc, etc.
All ignoring the fact literally anybody could just shit this stuff out at home on their own computer without any effort with no logical reason to upload it elsewhere. It's as if people started flooding video host sites with their modded Skyrim jerk sessions; who cares about something that took no effort and anybody could make on their own? I personally think it's really damn strange that there is so much pushback from the admins towards this completely innocuous attitude. As if we'll be missing out on sooooo much "art" (that a trained chimp could make) if any sort of restrictions were placed on AI generated content. Pixiv also at least requires it to be properly tagged and offers users the ability to ignore and hide it, while EH seems intent on making that as difficult as possible and fought the ability to do so the entire way.
Yeah, agreed. Nothing much more to say, to be honest. We've had similar experiences when trying to push for a "MTL" tag (now just compromised to "rough translation").

Thanks to OP for his number crunching, but even without the data laid out, anyone with half a brain saw this flood coming from miles away. It's funny still seeing such a tepid response from the mod team. Maybe once 90% of new content is an endless churn of AI-generated images we'll see a sterner stance being taken. You know, the kind that SHOULD have been taken ages ago and that literally every other site besides EH has taken.

This post has been edited by cutegyaru: Apr 22 2023, 21:59
|
|
|
|
|
 |
|
Apr 22 2023, 22:36
|
JnTo.
Group: Gold Star Club
Posts: 362
Joined: 30-September 15

|
Adding new categories would be great, 3d has already been suggested, it would be the perfect opportunity to add 3d and AI.
If you ban AI content, people will still find ways to play the system, retouching and uploading AI content that is very hard to spot... Some AI content on site hasn't even been detected and "normies" who actually look at the images in the galleries and jerk on them can't add the tag and don't even know you can petition to rename a gallery...
It's a problem that will only grow, and considering the time it took just to add a tag, I seriously doubt that the ideal measures will be taken in time. We'll probably have to wait 4-5 months, until the situation has degenerated badly, to finally see concrete solutions appear...
This post has been edited by JnTo: Apr 22 2023, 22:37
|
|
|
|
|
 |
|
Apr 22 2023, 23:28
|
peterson123
Group: Members
Posts: 3,016
Joined: 22-February 12

|
QUOTE(fletchlives @ Apr 22 2023, 18:54)  Pixiv also at least requires it to be properly tagged and offers users the ability to ignore and hide it, while EH seems intent on making that as difficult as possible and fought the ability to do so the entire way.
Actually it's not that difficult:
- go to https://e-hentai.org/mytags
- in the search field there, enter "ai generated" and select that
- check the checkbox left of "Hidden"
- click the "Save" button on the very right
Hope this helps.

This post has been edited by peterson123: Apr 22 2023, 23:29