PCJ's Whoniverse Gallery

A Doctor Who gallery specialising in Screencaps

Current total size: 680GB
About

Welcome to PCJ's Whoniverse Gallery!

Introduction

Since around 2010, I've wanted to create a Doctor Who screenshot gallery. There are already a few out there, and some of them are absolutely amazing, but to my mind they have up to four problems: they're New Who only (not even the spin-offs!), they're sometimes out of date, the density of caps is too low (so detail gets missed), and they don't use the highest-quality sources (e.g. Moffat+ is usually taken from a 720p HDTV re-encode). This gallery is set to change that by using as high a quality as possible for as much as possible.

Since then, I've also seen one or two other wonderful galleries come and go that featured other imagery, such as publicity shots, so I would love for this to eventually evolve into a general-purpose gallery. But that's something to consider months in the future. In the meantime, I would highly suggest checking out the amazing Tragical History Tour.

Spreadsheet

In order to help keep track of the sheer numbers involved, I've created a spreadsheet and made it public. It gives stats on the amount of data that has been processed and what the progress is on our way through phase 1.

Phase 1? Phase 2? Phase 3? What is and isn't covered

The Whoniverse is huge. Like, really huge. I'm not even confident I can cover everything, technically speaking. We'll see how things go.

Phase 1 is the main aim of the site, and that is to get screencaps up for all of the main televised episodes of Doctor Who and Spin-offs (as shown in the spreadsheet above). Phase 2 builds on that by adding the extras. Phase 3 is then whatever else in the screencap line. There may be more phases added at a later stage and these phases may merge and mingle a bit, especially when certain things are requested.

System Load Graph

For those interested, here is a system load graph. The load average shows the system load averaged over the last 1, 5 and 15 minutes. The server has 4 cores, so a load average of 4 or more shows the server is under load.
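To make the "4 cores, load of 4" rule of thumb concrete, here is a minimal Python sketch. It assumes a Unix-like server where os.getloadavg() is available; the helper names load_per_core and is_under_load are hypothetical and have nothing to do with how netdata draws the graph.

```python
import os


def load_per_core(load_avg, cores):
    """Load average divided by core count; 1.0 means every core is busy."""
    return load_avg / cores


def is_under_load(load_avg, cores):
    """Rule of thumb from the text: load at or above the core count."""
    return load_avg >= cores


if __name__ == "__main__" and hasattr(os, "getloadavg"):
    cores = os.cpu_count() or 1
    # getloadavg() returns the 1, 5 and 15 minute averages, in that order.
    for label, value in zip(("1 min", "5 min", "15 min"), os.getloadavg()):
        state = "under load" if is_under_load(value, cores) else "ok"
        print(f"{label}: {value:.2f} ({load_per_core(value, cores):.2f} per core, {state})")
```

On the 4-core server described here, a reading of 2.0 would be 0.5 per core, i.e. about half busy.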

The Process

This is pretty much all done on my Windows laptop, which is my only half-decent machine, so the automation is Windows-focused. It could very easily be ported to Linux though.

A lot of this is a go big or go home approach and may seem a little OTT. I'm generally of the mind that "if it's worth doing, it's worth overdoing". (Also, "when in doubt, C4".)

The first step is to get the videos into a compatible format without re-encoding. For DVDs, this basically means running them through MakeMKV, then using VideoReDo Batch to get them into .ts. For BD, it's really just a case of picking the right .m2ts and renaming it. For Recon, this varies.

Then it's stuck into Adobe Media Encoder and every frame is extracted to .JPG. I experimented with extracting to .PNG instead but found the end result matched, so extracting directly to .JPG simply takes less space and is far faster. I also tried FFMPEG but, despite it being a massive favourite tool of mine and used in almost every piece of video software, I was far from impressed with the quality (and quality is important to this).

After that, the frames are branched out to 3 places: barcoding, Visipics and the Gallery itself. This uses Link Shell Extension to create hardlink clones (faster to "copy", and they take up negligible space as opposed to storing the whole file over again).
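For anyone curious what a hardlink clone looks like outside of Link Shell Extension, here is a minimal Python sketch using the standard os.link call. It is not the tool used here; the function name hardlink_tree and the folder layout are hypothetical, and it assumes source and destination sit on the same volume (hard links can't cross volumes) and that the destination files don't exist yet.

```python
import os
from pathlib import Path


def hardlink_tree(src, dst):
    """Mirror every file under src into dst as a hard link; return the file count."""
    src, dst = Path(src), Path(dst)
    count = 0
    for path in sorted(src.rglob("*")):
        target = dst / path.relative_to(src)
        if path.is_dir():
            target.mkdir(parents=True, exist_ok=True)
        elif path.is_file():
            target.parent.mkdir(parents=True, exist_ok=True)
            # No file data is copied: both names point at the same bytes on disk.
            os.link(path, target)
            count += 1
    return count
```

Because every clone shares its data with the original, editing a file in place through one name changes it under all the others, which is why anything that modifies the images should work on a real copy.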

Barcoding: Inspired by the likes of Movie Barcode and this /r/DoctorWho post, I figured it'd be awesome to make barcodes, and there's no better time. A batch script goes through each folder and launches a second script for it. That script uses ppx2 (a port of Linux's xargs) and ImageMagick's magick (née convert) tool to first compress each frame to 1px wide and save the result to a ramdisk (powered by SoftPerfect RAM Disk). The ramdisk is used because this step is very IO intensive, and it speeds things up dramatically. The 1px frames are then montaged (via ImageMagick's montage tool) into a single file, where each vertical 1px line represents 1 frame. The magick tool then uses this file to produce several versions: 2 that are simply squished horizontally (1 second and 2 seconds per horizontal pixel), and another 2 that are smoothed out (i.e. vertically resized to 1px and resized back again), at heights of 576 and 1080. The barcodes also happen to work out well for finding major errors.
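The barcode steps above can be sketched as a toy model in plain Python. This is not the ImageMagick pipeline: images are just nested lists of (r, g, b) tuples indexed [y][x], plain averaging stands in for ImageMagick's resize filters, and all function names are hypothetical.

```python
def average(pixels):
    """Channel-wise integer average of a list of (r, g, b) tuples."""
    return tuple(sum(px[c] for px in pixels) // len(pixels) for c in range(3))


def frame_to_column(frame):
    """Compress a frame to 1px wide: each row of pixels becomes one pixel."""
    return [average(row) for row in frame]


def build_barcode(frames):
    """Montage the 1px columns side by side, one vertical line per frame."""
    columns = [frame_to_column(frame) for frame in frames]
    height = len(columns[0])
    return [[column[y] for column in columns] for y in range(height)]


def squish(barcode, frames_per_pixel):
    """Squish horizontally: average each group of frames_per_pixel columns."""
    return [
        [average(row[i:i + frames_per_pixel]) for i in range(0, len(row), frames_per_pixel)]
        for row in barcode
    ]


def smooth(barcode):
    """Collapse every column to 1px tall, then stretch it back to full height."""
    height, width = len(barcode), len(barcode[0])
    flat = [average([barcode[y][x] for y in range(height)]) for x in range(width)]
    return [list(flat) for _ in range(height)]
```

With 25fps footage, squish(barcode, 25) would correspond to the "1 second per horizontal pixel" version and squish(barcode, 50) to the 2 second one.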

Visipics: Just something a little extra that I may or may not include at the end of the day. I'm sure a lot of people would use the screenshots to find HQ stuff per story, so this is another approach to that. You usually can't sort by size on galleries, and even if you can, you get a shitload of dupes. The idea is basically to set the filter to loose, delete all duplicate images (per story for Classic, per episode for New) and make the few thousand or so that remain downloadable for people to sort by size themselves. It may not be something I'll go through with though. It depends on space.

And finally, the actual thing, The Gallery: A batch script runs a Python script for each folder (i.e. episode). Instead of simply picking every nth frame, this Python script tries to get the best quality. Since the episodes are capped at 2 per second, it looks for the biggest file in each half second and deletes all the rest. An additional check ensures that the selected frames are at least 5 frames apart. There are issues with this approach but, at my skill level, without manually checking or spending a lot longer processing, I think this is a great compromise. Then a copy is made (in order to avoid changing the other hardlinks) and each folder is run through CaesiumPH to losslessly optimise the images and reduce file size. Sadly, this is the step I hate the most: there's no way to automate it right now, it takes forever and, since it's a beta, it likes to crash. But it's the best program I've tested, so it stays. (There's a command-line version almost ready for beta, so there's hope!) These files are then renamed with Bulk Rename Utility to change the numbering (i.e. from frame number to cap number).
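The frame selection described above could look something like the following. This is a minimal sketch, not the actual script: the function name select_caps, its parameters and the tie-breaking rule are all assumptions, and it works on a plain list of file sizes rather than touching any files.

```python
def select_caps(sizes, fps, caps_per_second=2, min_gap=5):
    """Return the indices of the frames to keep.

    sizes[i] is the file size in bytes of frame i. A bigger file is used
    as a stand-in for a more detailed frame.
    """
    window = fps / caps_per_second  # frames per window; may be fractional, e.g. 12.5

    # Group the frame indices into half-second windows.
    buckets = {}
    for index in range(len(sizes)):
        buckets.setdefault(int(index / window), []).append(index)

    kept = []
    for key in sorted(buckets):
        # Biggest file first; on a tie, prefer the earlier frame (an assumption).
        candidates = sorted(buckets[key], key=lambda i: (-sizes[i], i))
        for index in candidates:
            # Skip frames that sit too close to the previously kept one.
            if not kept or index - kept[-1] >= min_gap:
                kept.append(index)
                break
    return kept
```

In this sketch, if the biggest frame of a window is fewer than min_gap frames after the previous pick, the next biggest one that is far enough away is used instead. Everything not returned would be deleted. In practice the sizes could come from os.path.getsize on each extracted frame.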

After the extraction, dedupe and optimise steps, I also use DU (another Linux tool ported to Windows via BOW/CygWin) to calculate the space used at each step and add the figures to the spreadsheet for stats (i.e. gawking at numbers) reasons.
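As a rough illustration of what that measurement involves when hardlink clones are in play, here is a minimal Python sketch that totals a folder while counting each hard-linked file only once, as GNU du does. It is a stand-in for DU rather than the real thing: the name disk_usage is hypothetical, it sums apparent file sizes instead of allocated disk blocks, and it assumes a platform that reports meaningful inode numbers.

```python
import os


def disk_usage(root):
    """Total bytes under root, counting hard-linked files only once."""
    seen = set()
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            info = os.lstat(os.path.join(dirpath, name))
            # Two names with the same device and inode are the same file on disk.
            key = (info.st_dev, info.st_ino)
            if key in seen:
                continue
            seen.add(key)
            total += info.st_size
    return total
```

Counting by inode matters here because the barcoding, Visipics and Gallery branches are hardlink clones of the same frames, so a naive sum of file sizes would count each frame several times over.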

Every day, I run Rsync (an excellent Linux tool ported to Windows, available via BashOnWindows/WSL or CygWin) to transfer my external drive (where this is all done) to the home server, both to back up my progress and to transfer the pictures ready for the gallery.

The gallery itself is running Coppermine Gallery. Sadly, this is far from the most modern script to use and looks pretty terrible now, but it's the best one I found for free. Files are added via "FTP" upload, which basically just prepares thumbnails/intermediate images (via ImageMagick, again) and adds them to the database. The server load at the top is provided by the wonderful netdata and I have a bash script that automatically updates the total disk space used in the top right every day at midnight.

Every now and then, I use WinDirStat to get an overview of the Caps folder. It helps me see if anything has gone wrong and gives me an indication of how much data I'm currently dealing with. Until BashOnWindows works for me again, I'm also using it to manually enter the number of files into the spreadsheet.

So software used: Windows Batch Scripting, MakeMKV, VideoReDo, Adobe Media Encoder, Link Shell Extension, ppx2, ImageMagick, SoftPerfect RAM Disk, Visipics (which uses ImageMagick to back it), Python, CaesiumPH, BashOnWindows/CygWin, Rsync, DU, WinDirStat, Coppermine Gallery on LAMP (again, using ImageMagick too), netdata, Bash.

Use of Content On This Site

Everything I've created is donationware. You're welcome to use it for non-commercial and commercial purposes for free, however credit or donation is encouraged (especially if used commercially). Please don't redistribute as is and claim credit.

I would advise you to also be mindful of publicity photos and screenshots.

Contact

Regarding the Gallery, you can contact me via the Contact page, Reddit, Twitter, Facebook, Tumblr, or Email. Personally, you can also contact me on Twitter and Tumblr.