AI for Pictures: The View From Here
by David Haden
AI SOFTWARE IS PROGRESSING AT A STARTLING PACE. Readers may be familiar with campus chatbots, suggestion bots, auto-summarizers, timeline assemblers, keyword extractors, trend trackers, automated article rewriters, censor bots, style-alignment software such as CQuill Writer, and more. They are useful for some tasks, but they are not about to write Hamlet.
Now a new breed of AI “picture bots” is on the horizon. One key example, DALL•E 2, comes from OpenAI, an R&D lab co-founded and funded by Elon Musk. In my job as editor of Digital Art Live magazine, I’ve seen several extended demos of this forthcoming AI tool. It is evident to me that OpenAI is not, as was the case with its GPT-2 text-writing AI, talking up its technology partly to make a media splash. DALL•E 2 is a huge leap forward, especially because it is fast and trivially easy to operate. Its output goes far beyond the slow methods of creative Photoshop compositing or the nerdy pipeline-wrangling needed for convincing deepfake video clips. It can create convincing new images from a simple typed-text description, and it can also be fed an existing image—of a parrot, for instance—and then nudged by typed commands toward something else, such as a tiger. It is still in closed beta, but take a look at the main public demo. It can do far more than make amusing tiger-parrots, and it should not be mistaken for a digital toy.
Other similar, but cruder, creative AI tools are WOMBO Dream (from Wombo Studios, Inc.), Disco Diffusion, Artbreeder, and NVIDIA Canvas 1.2 (powered by the GauGAN2 AI model). Even Google has revealed a DALL•E 2 competitor, Imagen.
Alongside these powerful creative tools sit the useful AI assistants: software for upscaling images, colorizing black-and-white images with DeOldify.NET, replacing skies, and seamlessly erasing unwanted objects. As many photo librarians and art library staffers will be aware, subject detection, face and texture detection, and auto-tagging are also increasingly mature automated technologies. Over time, standalone tools will be bundled and streamlined into retail software. We will likely also see a blurring of the lines between the utility AIs and the more artistic AIs.
But AI tools will not stop there. AIs are already expanding to recover viable pictures from sources once thought to be beyond any cost-effective recovery—for instance, via methods such as SwinIR-based super-detailing, added to Topaz Labs’ Gigapixel AI 6.1 (updated in May 2022). In time, there will also be trivial ways to transform one shape into another, one style into another, and maybe even one era into another by swapping, say, cars or buildings. Nor will recovery be limited to high-resolution photographs. MediaChance’s new upscaling software, for instance, is trained to upscale small brush-painted canvases. Such art enhancement tools are at a very early stage, but eventually we may reasonably expect to recover painted color versions of lost canvases that survive only as poor black-and-white photographs or engravings.
Furthermore, pose capture and expression capture from pictures will become possible, enabling the transfer of both to other images or to 3D human figures. It will be possible to automatically build a 3D model of an interior room as seen in a 2D painting, then reinsert that data into the source picture and thus relight the scene and figures. It may even become possible for AI to year-date a vintage photo from its car models or hat styles. Automated 3D-like shading of 2D line art is already possible, with Style2Paints. Some of the early capabilities will be unwelcome, such as capturing and reassembling images that are supposedly locked away from pirates as tiles in the zoomable viewers commonly used to show picture collections online. Automated removal of a subtle watermark is a far harder target, but that too may be achieved.
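The tile-reassembly risk mentioned above is mechanically simple, which is why it worries collection managers. As an illustration only, here is a toy sketch in pure Python: the pixels are plain nested lists rather than real image tiles served by any particular viewer, and all names are my own invention.

```python
def stitch(tiles, grid_w, tile_size):
    """Reassemble square tiles (row-major order) into one full pixel grid.

    A zoomable viewer serves an image as many small tiles; anyone who has
    captured them all can paste each tile back at its grid position.
    """
    grid_h = len(tiles) // grid_w
    full = [[0] * (grid_w * tile_size) for _ in range(grid_h * tile_size)]
    for idx, tile in enumerate(tiles):
        gx, gy = idx % grid_w, idx // grid_w  # column and row in the grid
        for y in range(tile_size):
            for x in range(tile_size):
                full[gy * tile_size + y][gx * tile_size + x] = tile[y][x]
    return full

# Four 1x1 "tiles" in a 2x2 grid reassemble into a 2x2 image.
tiles = [[[1]], [[2]], [[3]], [[4]]]
print(stitch(tiles, grid_w=2, tile_size=1))  # [[1, 2], [3, 4]]
```

Real capture tools do the same paste-at-offset loop with decoded image tiles, which is why tiling alone is weak protection.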
THE FUTURE OF PICTURE AI
Image collections will inevitably be seen differently after the release of DALL•E 2 and its rivals. Such rapid change may seem scary to some, but let’s look on the positive side. The following are three ways your institution might engage with picture AI:
- Remixing image collections would go beyond the “scan, add a Creative Commons license, tag, and forget” process of early digitization projects. It could entice potential students to pay for courses, first in AI and then in other topics, especially in partnership with local teachers. Forget picture-licensing fees, and think course fees.
- Specialist image collections might train and test their own picture-based AI, especially if they have a distinct stylistic or historical niche to mine, or particular problems to solve, such as lost-image recovery or year-dating.
- Knowledge workers can encourage social media platforms to prompt users to tag AI-generated images. A simple tag such as “AIgen” costs little effort to type and may suffice for quick identification. Embedding a uniform identifier directly into image file headers would also be useful. Once DALL•E 2 pictures become indistinguishable from the real thing, we may be very glad of such small pointers.
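To make the header-embedding idea concrete, here is a minimal sketch in pure Python. The “Source” keyword and the chunk placement are my own illustration, not any proposed standard; the sketch simply inserts the “AIgen” marker suggested above into a PNG file as a standard tEXt metadata chunk.

```python
import struct
import zlib

def png_chunk(ctype: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, then a CRC over type+data."""
    return (struct.pack(">I", len(data)) + ctype + data
            + struct.pack(">I", zlib.crc32(ctype + data)))

def add_marker(png_bytes: bytes, keyword: str, value: str) -> bytes:
    """Insert a tEXt chunk (e.g. Source=AIgen) right after the IHDR chunk."""
    if png_bytes[:8] != b"\x89PNG\r\n\x1a\n":
        raise ValueError("not a PNG file")
    # IHDR is always the first chunk: 4-byte length + 4-byte type + data + CRC.
    ihdr_len = struct.unpack(">I", png_bytes[8:12])[0]
    ihdr_end = 8 + 4 + 4 + ihdr_len + 4
    text = keyword.encode("latin-1") + b"\x00" + value.encode("latin-1")
    return png_bytes[:ihdr_end] + png_chunk(b"tEXt", text) + png_bytes[ihdr_end:]

# A minimal 1x1 grayscale PNG built in memory, so the sketch is self-contained.
ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
minimal_png = (b"\x89PNG\r\n\x1a\n" + png_chunk(b"IHDR", ihdr)
               + png_chunk(b"IDAT", zlib.compress(b"\x00\x00"))
               + png_chunk(b"IEND", b""))

tagged = add_marker(minimal_png, "Source", "AIgen")
```

Viewers that ignore unknown text chunks display the tagged image unchanged, while any tool that reads PNG metadata can surface the marker.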
LINKS TO THE SOURCES
OpenAI’s DALL•E 2
DeOldify.NET
(colorize vintage pictures and grayscale artwork; Windows desktop version) github.com/ColorfulSoft/DeOldify.NET
Topaz Labs’ Gigapixel AI
(upscale images and, with the latest 6.1 release, also add detail) topazlabs.com/gigapixel-ai
MediaChance AI Photo & Art Enhancer
Style2Paints
(automated shading of line art, due to return in version 5.0) github.com/lllyasviel/style2paints/blob/master/README.md
“The Impact of AI, Machine Learning, Automation and Robotics on the Information Professions: A Report for CILIP”
(This 2021 report is general background and does not specifically address picture AI.) cilip.org.uk/page/researchreport (click through to the PDF)