Hentai@Home Downloader auto-archiver script, a helpful Windows script to archive finished galleries

 
post Jan 31 2015, 01:11
Post #21
micnorian14





The script can easily be modified to use any command-line archiver, such as 7-Zip, to output whatever extension you favor. People who use this script probably also use it with other collections that follow a similar file scheme, like comic archives or even databases. I personally use a modified script that also searches for a "readme.txt" or "note from uploader.txt" if one exists.
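
For instance, a minimal sketch of that swap in Perl, assuming 7z.exe is on the PATH and that finished galleries sit in a hypothetical D:\hath\finished folder (both are placeholders):
CODE
#!/usr/bin/perl
use strict;
use warnings;

my $src = 'D:\hath\finished';   # hypothetical H@H download folder

opendir(my $dh, $src) or die $!;
my @galleries = grep { !/^\.\.?$/ && -d "$src\\$_" } readdir $dh;
closedir $dh;

for my $gallery (@galleries) {
    # "7z a -tzip" makes a plain zip; swap -tzip for -t7z (and the
    # extension) for whatever format you favor.
    system('7z', 'a', '-tzip', "$src\\$gallery.cbz", "$src\\$gallery\\*") == 0
        or warn "7z failed on $gallery\n";
}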

For simplicity I use CBXShell, as it integrates well with Windows; it also works across networks. The benefit of CBXShell is that you can have it scrape thumbnails from CBZ/CBR files only, and not from 7z/zip/rar files, so you aren't wasting time trying to retrieve media-pane info from multi-gigabyte archives (assuming Explorer doesn't hang first). That said, the file format isn't an issue or argument to be had here.

This script has room to improve, as it dumps all processed files into one big folder. I can have it output files to folders using the first few characters of the given filename, but that would include things like "Comiket" or "(C68)" or other garbage in front of the [artist] string. The program I have been using is called "robobasket", not "robocopy", and it is not free.
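
For what it's worth, a rough Perl sketch of that folder-sorting idea which first strips the leading "(C68)"-style tags (the patterns and layout are assumptions, not what robobasket does):
CODE
use strict;
use warnings;
use File::Copy qw(move);
use File::Path qw(make_path);

for my $file (glob '*.cbz') {
    my $name = $file;
    $name =~ s/^\([^)]*\)\s*//;         # drop a leading "(C68)"-style event tag
    if ($name =~ /^\[([^(\]]+)/) {      # first bracketed token, up to '(' or ']'
        (my $folder = $1) =~ s/\s+$//;  # trim trailing spaces
        make_path($folder);
        move($file, "$folder/$file") or warn "move failed: $!";
    }
}

Note that this groups by the first bracketed name, which, as said above, may be the circle rather than the artist.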

The job this script does is applicable to more than just smut. Most cameras include dates as part of the filename, so running a version of this script can sort your entire photo collection by date without the need for metadata, something older cameras either didn't record or that got lost from moving files over the years.
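
The photo case is the same trick with a date pattern; a sketch assuming names like IMG_20130412_153012.jpg (the pattern varies per camera):
CODE
use strict;
use warnings;
use File::Copy qw(move);
use File::Path qw(make_path);

for my $photo (glob '*.jpg') {
    next unless $photo =~ /(\d{4})(\d{2})(\d{2})/;  # pull YYYYMMDD out of the name
    my $folder = "$1-$2";                           # one folder per year-month
    make_path($folder);
    move($photo, "$folder/$photo") or warn "move failed: $!";
}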

 
post Jan 31 2015, 02:57
Post #22
blue penguin





As we are making arguments pro and contra .cbz, .cbr, .cb7 and .cbt anyway, let's add data corruption to the cauldron. Imagine that a random bit flipped in your file on disk (with software disk encryption this is actually common, thanks to RAM faults).

If you use plain .cbt (Unix tar) you're fine, as the corruption is limited to a single file inside the archive.

If you use gzipped .cbt you should be fine as well (for exactly the same reason). Recovering gzip is not easy but is [www.gzip.org] possible.

If you use .cbz you're more dependent on your luck. zlib does not have any functionality to recover corrupted files, yet several tools have been written to recover corrupted zip files. Heck, jar or gunzip sometimes works.

If you use .cbr, be sure to use Microsoft Windows, as the repair tools are compiled only for that OS.

If you use .cb7, be prepared to use a hex editor. Seriously, that's what they suggest on the [www.7-zip.org] 7-Zip page.
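
None of those recovery tools help if you never notice the flip, so whichever format you pick it pays to test your archives now and then. A minimal sketch using 7-Zip's test mode (the path is hypothetical):
CODE
use strict;
use warnings;

# "7z t" decompresses everything in memory and verifies the stored CRCs.
for my $archive (glob '/data/comics/*.{cbz,cb7,cbr}') {
    system('7z', 't', $archive) == 0
        or print "possibly corrupt: $archive\n";
}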

 
post Feb 2 2015, 08:57
Post #23
micnorian14





QUOTE(blue penguin @ Jan 30 2015, 19:57) *

As we are making arguments pro and contra .cbz, .cbr, .cb7 and .cbt anyway, let's add data corruption to the cauldron. [...]


That's good info, thank you. I might consider using .cbr instead from now on. Beyond that, any idea how to sort all these silly files? I've tweaked robobasket to get things done automatically, but it's not ideal, as the server that ran all this crap is no longer secure (Windows XP) and is now offline indefinitely. If it can be done better in a Linux environment, do tell! In fact, I probably should have done that years ago in the first place.

 
post Feb 3 2015, 04:34
Post #24
blue penguin





Well, given that you want to organise things by artist automatically, I can tell you straight away that Linux (or Mac) has one thing MS Windows lacks that will make it a lot easier: soft links.

Galleries are not named consistently: some are [ circle ( artist ) ] doujin name ( parody ), others [ artist ( circle ) ] doujin name ( parody ), and regular expressions (or any other type of matching) will not be capable of distinguishing who is the circle and who is the artist. Therefore you assume that all galleries are named correctly as [ circle ( artist ) ] ... and order your files by circle. This will result in a lot of misordered files, but bear with me. Now you softlink every gallery with the two terms swapped, i.e. [ artist ( circle ) ] ..., and you get everything organised by both artist and circle.

To you it appears as if every file on the filesystem is duplicated, but to the filesystem each such duplicate takes only about 4 KB. Softlinks are great.

An example: say you download these two galleries:
[Side M (Miyamoto Ikusa)] Ura Brave Kingdom 1
Miyamoto Ikusa (SideM) - Ura Brave Kingdom 1 [Translated]

Using this technique (allowing for some clever matching) you end up with four files:
[Miyamoto Ikusa (Side M)] Ura Brave Kingdom 1
[Miyamoto Ikusa (SideM)] - Ura Brave Kingdom 1 [Translated]
[Side M (Miyamoto Ikusa)] Ura Brave Kingdom 1
[SideM (Miyamoto Ikusa)] - Ura Brave Kingdom 1 [Translated]

two of which are just soft links to the actual relevant files.
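
A minimal sketch of that softlink swap, assuming every name really does follow [ A ( B ) ] rest (the clever matching is left as an exercise):
CODE
use strict;
use warnings;

for my $name (glob '*') {
    # match "[A (B)] rest" and softlink it back as "[B (A)] rest"
    next unless $name =~ /^\[\s*([^(\]]+?)\s*\(\s*([^)]+?)\s*\)\s*\]\s*(.*)/;
    my ($a, $b, $rest) = ($1, $2, $3);
    my $swapped = "[$b ($a)] $rest";
    next if -e $swapped;    # already linked (or a real file with that name)
    symlink($name, $swapped) or warn "symlink failed on $name: $!\n";
}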

 
post Feb 3 2015, 04:47
Post #25
Torrentymous





You could also use the galleryinfo.txt file. If only it included tag namespaces, that would make this much easier.

 
post Mar 17 2015, 08:40
Post #26
micnorian14





I hurt myself thinking too hard about this. My spare Raspberry Pi now hosts the server on a flash drive, with incredible performance to boot. It's low power, headless, secure, and by god it's effective, many times more so than any old laptop or frankensteined desktop that you couldn't bring yourself to toss away after all those years.

I never did figure out a solid way to have the files sorted into subfolders based on specific strings in the given file's name.
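
Something like this rough sketch is what I was after (the string-to-folder table is made up):
CODE
use strict;
use warnings;
use File::Copy qw(move);
use File::Path qw(make_path);

# map a substring of the filename to a destination subfolder (made-up examples)
my %route = (
    'Touhou'  => 'parody/touhou',
    '(C8'     => 'events/comiket',
    'Artbook' => 'artbooks',
);

for my $file (glob '*.zip') {
    for my $needle (keys %route) {
        next unless index($file, $needle) >= 0;
        make_path($route{$needle});
        move($file, "$route{$needle}/$file") or warn "move failed: $!";
        last;   # first match wins
    }
}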

But oh well. Maybe the next version of H@H will generate a fancy HTML catalog with all the files in one big folder, much like an offline copy of the main site.

Wait. Why do we even save files locally when they'll always be available online forever?

 
post Jun 8 2015, 22:42
Post #27
dirtyfinger





QUOTE(el h @ Jan 30 2015, 10:04) *

There are multiple file formats that are basically zip files, but with extra rules about the structure inside. They often use their own extension.

Two others that come to mind are OpenOffice documents and Java jar files.


Or Microsoft Excel documents. They're all zip files with stuff inside.
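
Easy to check for yourself with Archive::Zip; the .xlsx name below is just a placeholder:
CODE
use strict;
use warnings;
use Archive::Zip qw(:ERROR_CODES);

my $zip = Archive::Zip->new();
$zip->read('report.xlsx') == AZ_OK or die "not a zip after all\n";
# lists entries such as [Content_Types].xml and xl/workbook.xml
print "$_\n" for $zip->memberNames();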

QUOTE(micnorian14 @ Mar 17 2015, 09:40) *

My spare Raspberry Pi now hosts the server on a flash drive, with incredible performance to boot. [...]

Hmm... got a Raspberry myself...
You can run an H@H client on it? Man, that would be great.
I could run it, port the zip script to Linux and put it in a cron job.
Heck, write a scraper to automatically add new stuff to the client.
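
The cron half would be a single crontab line; the script path here is hypothetical:
CODE
# m h dom mon dow  command
0 3 *   *   *    /usr/bin/perl /home/pi/archive_galleries.pl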




 
post Jun 9 2015, 02:16
Post #28
blue penguin





QUOTE(dirtyfinger @ Jun 8 2015, 21:42) *
Hmm... got a Raspberry myself...
You can run an H@H client on it? Man, that would be great.
It works, but it is nothing great. I've run H@H on a Pi; the issue was not memory, not processing power, and not even the quirks with Java. The issue is that the Pi's network card is slow: enough for H@H's requirements, but it will not produce much hath, nor will it download things at a decent speed.

 
post Nov 12 2015, 20:37
Post #29
JunkManCometh





QUOTE(micnorian14 @ Mar 17 2015, 06:40) *

I never did figure out a solid way to have the files sorted into subfolders based on specific strings in the given file's name. [...]


There is a way to do what you want now that the Tags field in galleryinfo.txt contains namespaces. Instead of parsing the filename, you will need to parse galleryinfo.txt. I have a Perl script that pulls the tag information to build the metadata file used when importing the archived files into Comic Rack. Doing what you want shouldn't take too much extra effort: parse the file for the tags you want and create the directory structure of your choice.

Feel free to mangle my script as you please to get your intended results.
CODE
#!/usr/bin/perl -w

use strict;
use warnings;
use Win32::Console::ANSI;
use Term::ANSIColor qw(:constants);

my $root = 'D:\Temp\hath';
my @comicHead = ("<?xml version=\"1.0\"?>\n", "<ComicInfo xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\">\n", "  <Genre>H Manga</Genre>\n", "  <LanguageISO>en</LanguageISO>\n", "  <AgeRating>Adults Only 18+</AgeRating>\n", "  <Manga>YesAndRightToLeft</Manga>\n");

# Creates ComicInfo.XML from galleryinfo.txt for use by Comic Rack
#  
# Exported info should include the following
#  <Series></Series>                     == title
#  <StoryArc></StoryArc>                  == maps from parody namespace
#  <Writer></Writer>                    == maps from artist namespace
#  <Summary></Summary>                == includes whole galleryinfo.txt blob
#  <Notes> ||| TAGS: 'list_of_tags' ||| </Notes> == used by "Retrieve TAGS from Notes" script in CR

opendir(DIR, $root) or die $!;

while (my $dir = readdir(DIR)) {
    next if $dir =~ /^\.\.?$/;          # skip . and ..
    next unless -d $root.'\\'.$dir;     # only descend into gallery directories
    if (-f $root.'\\'.$dir.'\\galleryinfo.txt') {
        open FILE, $root.'\\'.$dir.'\\galleryinfo.txt' or die $!;
        my @lines = <FILE>;
        close FILE or die $!;
        open COMIC, ">$root\\$dir\\ComicInfo.xml" or die $!;
        print COMIC @comicHead;
        foreach my $line (@lines) {
            $line =~ s/\<3/E3/g;  # a literal '<' would break the XML, so rewrite '<3' hearts as 'E3'
            if($line =~ /^Title:\s*(.*)$/) {
                print GREEN;
                printf '%.35s...', $1;
                print RESET;
                print COMIC "  <Series>$1</Series>\n  <Summary>$1</Summary>\n";
            }
            if($line =~ /^Tags:\s*(.*)$/) {
                print " --> Has Tags\n";
                print COMIC "  <Notes> ||| TAGS: '$1' ||| </Notes>\n";
                if($line =~ /artist:(?<artist>.*?),/) {  # note: assumes another tag follows the artist
                    print COMIC "  <Writer>$+{artist}</Writer>\n";
                    }
                if($line =~ /parody:(?<parody>.*?),/) {
                    print COMIC "  <StoryArc>$+{parody}</StoryArc>\n";
                    }
            }
        }
#        print COMIC "  <Summary>@lines</Summary>\n";
        print COMIC "</ComicInfo>\n";
        close COMIC or die $!;
    }
    else {
        print RED ON_YELLOW "$dir ", RESET;
        print "--> no GALLERY file\n";
    }
}
print WHITE, ON_BLACK, "done", RESET, "\n";
closedir(DIR);
exit 0;
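
For the subfolder part specifically, the same parse can drive the directory layout. A separate sketch (not part of the script above), assuming the Tags line looks like "Tags: artist:somebody, parody:something, ...":
CODE
use strict;
use warnings;
use File::Path qw(make_path);

my $root = 'D:\Temp\hath';

opendir(my $dh, $root) or die $!;
my @dirs = grep { !/^\.\.?$/ && -d "$root\\$_" } readdir $dh;
closedir $dh;

for my $dir (@dirs) {
    my $info = "$root\\$dir\\galleryinfo.txt";
    next unless -f $info;
    open my $fh, '<', $info or die $!;
    my ($tags) = grep { /^Tags:/ } <$fh>;
    close $fh;
    next unless defined $tags and $tags =~ /artist:\s*([^,\r\n]+)/;
    my $dest = "$root\\by-artist\\$1";      # one folder per artist tag
    make_path($dest);
    rename "$root\\$dir", "$dest\\$dir" or warn "rename failed on $dir: $!\n";
}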

