AI models like Stable Diffusion sometimes need input images of a specific size. Stable Diffusion requires dimensions that are multiples of 64, and (at least version 1.5) works best with images that are 512 pixels in width or height. Take, for example, an image of 599 x 205 pixels. Resizing it to a height of 512 while maintaining the aspect ratio gives 1496 x 512, and 1496 is not a multiple of 64. To obtain a usable image, it needs to be padded to 1536 x 512, since 1536 is the next multiple of 64. To batch resize and pad images, I created a Python script using OpenCV. Why create a script? Doing this by hand for a large number of images quickly becomes a chore, and writing a script for it is fun. Why OpenCV? It is powerful, easy to use and popular.
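The arithmetic in the example above can be sketched in a few lines of Python. This is only an illustration, not the script itself: the function names `scaled_size` and `padded_size` are made up here, and the sketch assumes that the shorter side is the one scaled to 512, as in the 599 x 205 example.

```python
import math


def scaled_size(width, height, target=512):
    """Scale so the shorter side becomes `target`, keeping the aspect ratio."""
    scale = target / min(width, height)
    return round(width * scale), round(height * scale)


def padded_size(width, height, multiple=64):
    """Round both dimensions up to the next multiple of `multiple`."""
    return (
        math.ceil(width / multiple) * multiple,
        math.ceil(height / multiple) * multiple,
    )


w, h = scaled_size(599, 205)
print((w, h))             # (1496, 512)
print(padded_size(w, h))  # (1536, 512)
```

The difference between the two results (40 pixels in width here) is the amount of padding that has to be added.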
First you need to install OpenCV. This can be done with "pip install opencv-python" or "conda install -c conda-forge opencv" (whichever package manager you prefer). The script itself is relatively straightforward and you can view it below and here. It also shows how easy it is to do basic things with OpenCV, like opening a file, resizing, padding and writing the result back to the filesystem. OpenCV didn't support the AVIF file format when I wrote the script (read here), which is why I included an option to exclude extensions. Most other image file formats are supported, though (such as JPEG, PNG, BMP, TIFF and WEBP). Output images are named [original filename without extension]_scaled.png. OpenCV determines the format to save a file in from its extension, so you do not need to specify it explicitly. I chose white as the padding color because the Stable Diffusion WebUI tool uses a black mask, so I can easily see what has been masked and what hasn't. I also pad only to the right and to the bottom since, in my experience, those are the areas where inpainting is usually most useful.