Settings

Animation modes:

Border, translation_x, translation_y, rotation_3d_x, rotation_3d_y, rotation_3d_z, noise_schedule, contrast_schedule, color_coherence, diffusion_cadence, 3D depth warping, midas_weight, fov, padding_mode, sampling_mode, and save_depth_map. Resume_from_timestring is available during 3D mode (more details below).

Animation Parameters:

 

Motion Parameters:

Motion parameters are instructions to move the canvas, in units per frame.

 

Coherence:

Color coherence samples the overall pixel color information of the 0th frame and steers future frames toward those values. LAB is a more linear approach that mimics human perception of color space, and is a good default setting for most users.

 

HSV is a good method for balancing the presence of vibrant colors, but may produce unrealistic results (e.g. blue apples). RGB is good for enforcing unbiased amounts of color in each of the red, green and blue channels; some images may yield colorized artifacts if sampling is too low.



The diffusion cadence follows the 2D or 3D schedule of movement specified in the motion parameters, while enforcing diffusion only on the frames specified. The default setting of 1 causes every frame in the sequence of image outputs to receive diffusion. A setting of 2 diffuses only every other frame, yet motion is still in effect. The images output during the cadence sequence are automatically blended additively and saved to the specified drive. This may improve the illusion of coherence in some workflows, as the content and context of an image will not change or diffuse during frames that were skipped. Higher cadence values of 4-8 skip over a larger number of frames and only diffuse the “Nth” frame, as set by the diffusion_cadence value. This may produce more continuity in an animation, at the cost of little opportunity to add more diffused content. In extreme examples, motion within a frame will fail to produce diverse prompt context, and the space will be filled with lines or approximations of content, resulting in unexpected animation patterns and artifacts. Video Input & Interpolation modes are not affected by diffusion_cadence.
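The cadence rule above can be sketched in a few lines of Python. This is only an illustration, not deforum's code: the helper name diffused_frames is hypothetical, and it assumes that frame 0 is always diffused and that every Nth frame after it follows.

```python
def diffused_frames(max_frames, diffusion_cadence):
    """Return the frame indices that receive a diffusion pass.

    Assumption: frame 0 is diffused, then every Nth frame after it.
    Skipped frames still receive motion; they are simply not re-diffused.
    """
    if diffusion_cadence < 1:
        raise ValueError("diffusion_cadence must be at least 1")
    return [f for f in range(max_frames) if f % diffusion_cadence == 0]


print(diffused_frames(10, 1))  # every frame
print(diffused_frames(10, 2))  # every other frame
print(diffused_frames(10, 4))  # every 4th frame
```

With a cadence of 4, only a quarter of the frames are diffused, which is why high cadence values leave little opportunity to add new content.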

 

3D Depth Warping:

FOV (field of view/vision) in deforum gives specific instructions as to how the translation_z value affects the canvas. The range is -180 to +180. The value follows the inverse square law of a curve in such a way that 0 FOV is undefined and will produce a blank image output. A FOV of 180 will flatten the canvas plane and place it in line with the view, causing no motion in the Z direction. Negative values of FOV will cause the translation_z instructions to invert, moving in the opposite direction along Z, while retaining other normal functions. A FOV value of 30 is the default, whereas a value of 100 would cause transition in the Z direction to be smoother and slower. Each type of art and context will benefit differently from different FOV values (e.g. “Still-life photo of an apple” will react differently than “A large room with plants”).

 

FOV also lends instruction as to how a midas depth map is interpreted. The depth map (a greyscale image) will have its range of pixel values stretched or compressed in accordance with the FOV, in such a fashion that the illusion of 3D is more pronounced at lower FOV values, and more shallow at values closer to 180. At a full FOV of 180, no depth is perceived, as the midas depth map has been compressed to a single value range.



In image processing, bicubic interpolation is often chosen over bilinear or nearest-neighbor interpolation in image resampling, when speed is not an issue. In contrast to bilinear interpolation, which only takes 4 pixels (2×2) into account, bicubic interpolation considers 16 pixels (4×4). Images resampled with bicubic interpolation are smoother and have fewer interpolation artifacts.

 

Video Input:

When using video_input mode, the run will be instructed to write video frames to the drive. If you’ve already populated the frames needed, uncheck this box to skip past redundant extraction, and immediately start the render. If you have not extracted frames, you must run at least once with this box checked to write the necessary frames.

 

Interpolation:

 

Resume Animation:

Currently only available in 2D & 3D modes. The timestamp is saved as the file name of the settings .txt file, as well as in the names of the images produced during your previous run. The format follows:

yyyymmddhhmmss - a timestamp of when the run was started to diffuse.
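As a small illustration of the yyyymmddhhmmss layout, the sketch below builds and parses such a timestring with Python's standard datetime module. The helper names make_timestring and parse_timestring are hypothetical and are not part of the notebook.

```python
from datetime import datetime

# yyyymmddhhmmss expressed as a strftime/strptime pattern
TIMESTRING_FORMAT = "%Y%m%d%H%M%S"


def make_timestring(started):
    """Format the moment a run started as a 14-digit timestring."""
    return started.strftime(TIMESTRING_FORMAT)


def parse_timestring(timestring):
    """Turn a timestring back into a datetime (raises ValueError if malformed)."""
    return datetime.strptime(timestring, TIMESTRING_FORMAT)


example = make_timestring(datetime(2022, 9, 30, 14, 5, 9))
print(example)                    # 20220930140509
print(parse_timestring(example))  # 2022-09-30 14:05:09
```

Parsing the value before pasting it into resume_from_timestring is a quick way to catch a mistyped digit.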

Prompts

[Image: screenshot containing text]

In the above example, we have two groupings of prompts: the still-frame prompts on top, and the animation_prompts below. During the “NONE” animation mode, the diffusion will look to the top group of prompts to produce images. In all other modes (2D, 3D, etc.), the diffusion will reference the second, lower group of prompts.

 

Careful attention to the syntax of these prompts is critical to be able to run the diffusion.

For still-frame image output, numbers are not to be placed in front of the prompt, since no “schedule” is expected during a batch of images. The above prompts will produce and display a forest image and a separate image of a woman as the outputs.

 

During 2D/3D animation runs, the lower group with prompt numbering will be referenced as specified. In the example above, we start at frame 0, where an apple image is produced. As the frames progress, the output remains an apple until frame 20, at which point the diffusion is directed to start including a banana as the main subject, eventually replacing the apple that is no longer referenced.

 

Interpolation mode, however, will “tween” the prompts in such a way that, firstly, one image each is produced from the list of prompts: an apple, a banana, a coconut, and a durian fruit will be drawn. Then the diffusion begins to draw the frames that should exist between the prompts, making hybrids of apples and bananas, then proceeding to fill in the gap between bananas and coconuts, finally resolving and stopping on the last image of the durian as its destination. (Remember that this exclusive mode ignores max_frames and draws the interpolate_key_frame/x_frame schedule instead.)
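The two prompt groups and the 2D/3D keyframe behavior described above might look like the following sketch. The prompt wording and the frame numbers 30 and 40 are assumptions made for illustration (only frames 0 and 20 are given in the text), and active_prompt is a hypothetical helper rather than a deforum function.

```python
# Still-frame prompts: a plain list, no frame numbers in front.
prompts = [
    "a forest",  # placeholder wording
    "a woman",   # placeholder wording
]

# Animation prompts: frame number -> prompt that takes over at that frame.
animation_prompts = {
    0: "an apple",
    20: "a banana",
    30: "a coconut",  # assumed frame number
    40: "a durian",   # assumed frame number
}


def active_prompt(schedule, frame):
    """Return the prompt of the latest keyframe at or before `frame`."""
    start = max(key for key in schedule if key <= frame)
    return schedule[start]


for frame in (0, 19, 20, 35):
    print(frame, active_prompt(animation_prompts, frame))
```

Frames 0 through 19 keep the apple prompt, and the banana prompt takes over from frame 20 onward, matching the description of 2D/3D runs.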

 

Many resources exist for the context of what a prompt should include. It is up to YOU, the dreamer, to select items you feel belong in your art. Currently, prompt weights are not implemented in deforum; however, following a template should yield fair results:

            [Medium]          [Subject]         [Artist]              [Details]                 [Repository]

Ex. “A Sculpture of a Purple Fox by Alex Grey, with tiny ornaments, popular on CGSociety”




Run

 

Load Settings:

Image settings:

Output dimensions must be multiples of 64 pixels; otherwise, the resolution will be rounded down to the nearest compatible value. Proper values are 128, 192, 256, 320, 384, 448, 512, 576, 640, 704, 768, 832, 896, 960, 1024. Values above these recommended settings are possible, yet may yield OOM (out of memory) issues, as well as improper midas calculations. The model was trained on a 512x512 dataset, and therefore must extend its diffusion outside of this “footprint” to cover the canvas size. A wide landscape image may produce 2 trees side-by-side as a result, or perhaps 2 moons on either side of the sky. A tall portrait image may produce faces that are stacked instead of centered.
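The rounding rule can be checked with a one-line calculation. The sketch below is illustrative only; round_down_to_64 is a hypothetical helper name.

```python
def round_down_to_64(value):
    """Round a pixel dimension down to the nearest multiple of 64."""
    return value - value % 64


width, height = 700, 520
print(round_down_to_64(width), round_down_to_64(height))  # 640 512
```

A requested canvas of 700x520 would therefore be rendered at 640x512.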

 

Sampling Settings:

Stable Diffusion outputs are deterministic, meaning you can recreate images using the exact same settings and seed number. Choosing a seed number of -1 tells the code to pick a random number to use as the seed. When a random seed is chosen, it is printed to the notebook and saved in the image settings .txt file.

During one frame, a model will attempt to reach its prompt by the final step in that frame. By adding more steps, the frame is sliced into smaller increments as the model approaches completion. Higher steps will add more defining features to an output at the cost of time. Lower values will cause the model to rush towards its goal, providing vague attempts at your prompt. Beyond a certain value, if the model has achieved its prompt, further steps will have very little impact on the final output, yet time will still be wasted. Some prompts also require fewer steps to achieve an acceptable output.

 

During 2D & 3D animation modes, coherence is important to produce continuity of motion during video playback. The value under Motion Parameters, “strength_schedule”, achieves this coherence by carrying a proportion of the previous frame into the current diffusion. This proportion is on a scale of 0 - 1.0, with 0 meaning there’s no cohesion whatsoever, and a brand new unrelated image will be diffused. A value of 1.0 means ALL of the previous frame will be utilized for the next, and no diffusion is needed. Since the portion carried over from the previous frame already consists of steps diffused previously, a formula was created to compute the remaining steps that make up the difference. That formula is as such:
Target Steps - (strength_schedule * Target Steps)

 

Your first frame will, however, receive all of the steps, as the formula only takes effect afterwards.
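A worked version of the formula, as a minimal sketch: steps_for_frame is a hypothetical helper, and the use of round() to obtain a whole number of steps is an assumption, since the text does not say how fractional results are handled.

```python
def steps_for_frame(target_steps, strength, frame):
    """Steps actually diffused on a frame.

    Frame 0 receives all target steps; later frames receive
    Target Steps - (strength_schedule * Target Steps).
    Rounding to a whole step count is an assumption of this sketch.
    """
    if frame == 0:
        return target_steps
    return round(target_steps - strength * target_steps)


print(steps_for_frame(50, 0.5, 0))  # 50, first frame gets every step
print(steps_for_frame(50, 0.5, 1))  # 25
print(steps_for_frame(50, 1.0, 1))  # 0, nothing left to diffuse
```

This also shows why higher strength values render faster: fewer steps remain to be diffused on each frame after the first.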



A normal scale range of 7-10 is appropriate for most scenes; however, some styles and art will require more extreme values. At scale values below 3, the model will loosely impose a prompt, with many areas skipped and left uninteresting or simply grayed-out. Values higher than 25 may over-enforce a prompt, causing extreme, oversaturated colors, artifacts and unbalanced details. For some use-cases this might be a desirable effect. During some animation modes, a scale that is too high may trend color in a direction that causes bias and overexposed output.



 

Save & Display Settings:

 

Prompt Settings:

 

Batch Settings:

            Iter = incremental change (ex 77, 78, 79 ,80, 81, 82, 83…)

            Fixed = no change in seed (ex 33, 33, 33, 33, 33, 33…)

            Random = random seed (ex 472, 12, 927812, 8001, 724…)

Note: seed -1 will choose a random starting point, following the seed behavior thereafter

Troubleshoot: a “fixed” seed in 2D/3D mode will overbloom your output. Switch to “iter”.
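The three seed behaviors can be sketched as follows. This is an illustration rather than the notebook's code: the helper names and the random seed range (0 to 2**32 - 1) are assumptions.

```python
import random

MAX_SEED = 2**32 - 1  # assumed upper bound for random seeds


def advance_seed(seed, seed_behavior):
    """Return the seed for the next image or frame."""
    if seed_behavior == "iter":
        return seed + 1
    if seed_behavior == "fixed":
        return seed
    if seed_behavior == "random":
        return random.randint(0, MAX_SEED)
    raise ValueError(f"unknown seed_behavior: {seed_behavior!r}")


def seed_sequence(start_seed, seed_behavior, count):
    """List the seeds used for `count` outputs; -1 picks a random start."""
    seed = random.randint(0, MAX_SEED) if start_seed == -1 else start_seed
    seeds = []
    for _ in range(count):
        seeds.append(seed)
        seed = advance_seed(seed, seed_behavior)
    return seeds


print(seed_sequence(77, "iter", 4))   # [77, 78, 79, 80]
print(seed_sequence(33, "fixed", 4))  # [33, 33, 33, 33]
print(seed_sequence(-1, "iter", 4))   # random start, then +1 each time
```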

 

Init_Settings:

Note: even with use_init unchecked, video input is still affected.

Note: in ‘none’ animation mode, a folder of images may be referenced here.

 

Video




Parameters

In deforum, any parameter that accepts a string format of instructions (type = `string`) can be altered using a math expression, a schedule, or a combination of both. These parameters are typically denoted with 0:(0) where the preceding number is the frame, and the parentheses number is the value to be enforced during the designated frame. In the example of 0:(0), the render will reference frame0 and assign 0.0 as its value indefinitely unless instructed otherwise.

Parameters that are controlled by strings are as follows: angle, zoom, translations_xyz, rotations_3D_xyz, perspective_flips_theta, phi, gamma, fv, noise_schedule, strength_schedule, and contrast_schedule.

Scheduled values will “tween” linearly between two instructional elements in a string. In the example of 0:(-2), 100:(4), the render will start at frame0 with a value of -2 and rise over time, increasing its value to 4 by the time it reaches frame100. During frame50 of that render, we would observe a value of 1.0 being enforced, since frame50 is the midpoint between frame0 and frame100, and the line drawn between the two values passes through 1.0 there.
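The linear tween can be reproduced with a short sketch. It is not deforum's parser: parse_schedule and value_at are hypothetical helpers, and this version only understands plain numeric values, not math expressions.

```python
import re


def parse_schedule(schedule):
    """Turn '0:(-2), 100:(4)' into {0: -2.0, 100: 4.0} (numeric values only)."""
    pairs = re.findall(r"(\d+)\s*:\s*\(\s*(-?\d+(?:\.\d+)?)\s*\)", schedule)
    return {int(frame): float(value) for frame, value in pairs}


def value_at(keyframes, frame):
    """Linearly interpolate between the keyframes surrounding `frame`."""
    frames = sorted(keyframes)
    if frame <= frames[0]:
        return keyframes[frames[0]]
    if frame >= frames[-1]:
        return keyframes[frames[-1]]  # last value holds indefinitely
    for left, right in zip(frames, frames[1:]):
        if left <= frame <= right:
            fraction = (frame - left) / (right - left)
            return keyframes[left] + fraction * (keyframes[right] - keyframes[left])


keys = parse_schedule("0:(-2), 100:(4)")
print(value_at(keys, 0))    # -2.0
print(value_at(keys, 50))   # 1.0
print(value_at(keys, 100))  # 4.0
```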

When using math expressions, however, the “tweening” follows an approximation of values within elements of the string in such a way that a curve is drawn between values. Consider the following example: 0:(sin(t)), 100:(4). The function at frame 0 in this case is a sine wave, where “t” represents the frame number. The value at frame 0 will start to calculate sin(t) to produce its initial value, and quickly fluctuate, causing peaks and valleys, while it slowly climbs to a constant value of 4 by frame100. We can observe the sine wave starting at full strength and finally losing all amplitude 100 frames later - a “ripple” effect.

If a math expression is used as the sole element of a string, it will indefinitely calculate and produce its value for as long as it is defined, without interruption. If at any point a parameter falls out of the range of acceptable values, the render will adhere to the next available calculation of that function (e.g. when a value approaches infinity, is asymptotic, or is undefined). This can sometimes be a desired effect if a “pulsing” or “sawtooth” function is to be achieved.





MATH expressions

Many combinations and complex functions can be expressed during an animation schedule to achieve patterns and motion that would otherwise take extremely long strings of manual information to achieve. Consider a sine function, where previously we would have to enter each frame’s respective value to simulate a waving pattern. The longer our animation, the more frame instructions we’d have to manually enter. Now, with MATH functions, we can populate a never-ending list of instructions simply contained in one expression. The method that we use is to reference the variable “t”. When we use that variable in our math statements, a calculation is performed such that “t” = the current frame number. Since the frame number steadily increases in increments of +1, we can now define an “x axis”. With that aspect in place, we can use “t” to alter the value across the “y axis” in sequence. As frames (time) progress forward, the MATHs performed on “t” will allow us to control what values are to be enforced at that exact snapshot in time. In the default notebook of deforumV05, the “translation_x” schedule is defined as: 0:(10*sin(2*3.14*t/10)). We can see “t” along with a sine wave (sin) being performed. This will cause the image to translate left and right over time. We will examine in more detail how this function works.

 

Anatomy MATH expressions

We saw the expression 0:(10*sin(2*3.14*t/10)) being used in the default notebook of deforumV05. Let’s observe how it is “driving” our parameter. When we use the most simple of math expressions, 0:(t), we define the value at any frame to be equal to its frame number. However, this value will soon rise off into unusable values above any recommended range within the animation parameters. At frame0, we start at 0, by frame1, we’re at 1, and by frame 200, we’re at 200 - so on and so forth. So a method of “containing” this value must be expressed somehow, as to prevent the number from flying off into infinity. The 2 best methods are sine/cosine functions and modulus functions (more info on modulus later).

            So, in our example, we can see a “sin( )” being used. If we were to take the sine of our frame number, or “sin(t)”, we’d generate a wave shape. The value would swing up and down quickly as each frame was calculated.

While this does keep our value from ever increasing, it is not enough to control our parameter in a realistic way. A simple sine wave is too fast and too shallow. So our example includes more expressions being performed. We see that a familiar value, 3.14, is multiplied by “t”. This causes the period of our sine wave to fall on integers (approximately). More specifically, the wavelength of sin(3.14*t) is 2. So our example goes further and multiplies that variable by 2 as well. When we take sin(2*3.14*t), we yield a wave that has a period of 1 and an amplitude of 1 (it peaks and valleys between -1 and 1). All that is left is to add math that will control how high the value should bounce (amplitude), and how often (frequency). So our example finally multiplies the whole expression by 10, and also divides “t” by 10. This results in a wave that will alternate between +10 and -10 and repeat every 10 frames.




 

But what if we wanted even MORE control? We notice our example always passes through 0 as its baseline - but what if we wanted the baseline to sit at -3? We just need to take the whole expression and subtract 3 from it, and our new baseline is established: 0:(10*sin(2*3.14*t/10)-3). Now our wave bounces between 7 and -13, keeping its amplitude and frequency intact. More functionality can be added as we build our expression, including exponents, cosine properties, and negative amplitudes.
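To see the numbers behind these two expressions, they can be evaluated frame by frame in plain Python. The function names below are hypothetical; only the expressions themselves come from the text.

```python
import math


def translation_x(t):
    """The notebook's example expression: 10*sin(2*3.14*t/10)."""
    return 10 * math.sin(2 * 3.14 * t / 10)


def translation_x_shifted(t):
    """The same wave with its baseline moved to -3."""
    return 10 * math.sin(2 * 3.14 * t / 10) - 3


for t in range(0, 11):
    print(t, round(translation_x(t), 2), round(translation_x_shifted(t), 2))
```

Because only whole frames are sampled, the printed values top out near ±9.5 rather than exactly ±10: the true peak of the curve falls at t = 2.5, between two frames. The value returns to roughly 0 at frame 10, which confirms the period of 10 frames.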

 

A recap of our example’s anatomy: 0:(10*sin(2*3.14*t/10))

0: = the current frame instruction

10 (multiplying the sine) = the amplitude or “height”

t = frame count

10 (dividing t) = the wavelength, or how many frames one cycle takes

Advanced Expressions

When constructing a complex schedule of effects during your animation, more control and special techniques will yield a better dynamic result. Let’s examine a specific use case. The artist wants to use a constant value of 0.8 as their strength schedule. However, they wish they could have more detail appear in their animation. A value of 0.45 is great for adding new enriched content to a scene, but it causes very little coherency. The artist decides that they should only introduce the value of 0.45 periodically about every 25 frames, yet keep it 0.8 for most of the sequence. How should the artist express this using MATHs?

Let’s observe the following solution, then discuss.

0:(-0.35*(cos(3.141*t/25)**100)+0.8)
A massively powerful function, with a simple elegance to it. Our artist uses this function in their example to achieve the desired result. At frame 0 and all frames after, a value is being calculated. In this expression we’re selecting a cosine function (cos) to allow our wave to have small periodic dips instead of peaks. The double asterisk acts as an exponent function and brings the cosine to the 100th power, tightening the dips into small indents along the timeline. The addition of +0.8 sets the baseline at 0.8 which the artist agreed was desirable for the animation, and starts the function with -0.35 knowing that it will dip below the established baseline from 0.8 down to 0.45 as expected. An approximation of pi is being used again (3.141) to align the frames to integers, and t is being divided by 25 to enforce the dip to occur only at frames that are multiples of 25. Our artist has achieved the schedule using one expression that will be calculated for the duration of the animation frames.
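Evaluating the artist's expression at a few frames shows the periodic dip. The sketch below simply restates the expression in Python; the function name strength is hypothetical.

```python
import math


def strength(t):
    """The artist's schedule: -0.35*(cos(3.141*t/25)**100)+0.8."""
    return -0.35 * (math.cos(3.141 * t / 25) ** 100) + 0.8


for t in (0, 1, 2, 12, 24, 25, 26, 50):
    print(t, round(strength(t), 3))
```

The value sits at about 0.45 on frames 0, 25 and 50, recovers most of the way to 0.8 within two frames on either side, and stays at 0.8 in between. Raising the cosine to an even power also makes the negative half of the wave positive, which is why the dip repeats every 25 frames rather than every 50.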

 

Remember that expressions can be changed along the schedule to “tween” along the frames.
0:(10*sin(2*3.14*t/10)), 50:(20*sin(4*3.14*t/40)), 100:(cos(t/10)) is an acceptable format.

Another useful tool is the modulus function. Represented by “%”, it is typically used to calculate the remainder of a division. In deforum, we use modulus to affect the “t” frame count as a repeating limiter. Consider the following syntax:

translate_3D_z:     0:(0.375*(t%5)+15)

If “t” is the frame count, it would increase indefinitely; however, in our example, we’ve set the modulus to 5. This means that as the frame rises (0,1,2,3,4,5,6,7,8… etc.) the value of “t%5” will repeat a sequence of 0,1,2,3,4,0,1,2,3,4,0,1… etc., without ever reaching 5. This graphically produces a sawtooth wave. In order to bend the “blades” of the sawtooth to stretch over time, we multiply by 0.375. This acts as the slope of each line. A multiplier of 1 would yield a 45° line. Higher multipliers will steepen the line even further, while numbers closer to 0 will lay the line near flat. Since we’re controlling the Z translate in 3D mode, we want our baseline to be at 15, hence our addition of it at the end of the syntax. The overall effect of this parameter causes our animation to consistently zoom forward, yet with pulses, similar to the perspective of nodding your head to music while riding in a car.
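The sawtooth is easy to verify by evaluating the expression over a few frames. This is a plain restatement of the schedule in Python; the function name translation_z is hypothetical.

```python
def translation_z(t):
    """The sawtooth schedule: 0.375*(t%5)+15."""
    return 0.375 * (t % 5) + 15


print([translation_z(t) for t in range(11)])
# [15.0, 15.375, 15.75, 16.125, 16.5, 15.0, 15.375, 15.75, 16.125, 16.5, 15.0]
```

The value climbs from the baseline of 15 to 16.5 over five frames and then snaps back, so the forward zoom never stops but pulses every 5 frames.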

 

Many more clever approaches can be used to create elaborate functions and animations, as well as just simplifying the instruction of long frame counts. Tools such as graphing calculators exist to better help envision what a function would look like when plotted. This can be simulated by the format y = x instead of deforum’s 0:(t), where “x” stands in for the frame number t and “y” is the resulting value.

This calculator can be used to solve similar functions, yet some syntax may vary.

Desmos | Graphing Calculator

 

We encourage users to share their experiences with formulas and expressions, since there will be endless discoveries with how MATHs can work in unique applications.





Gradient Conditioning in Deforum

 

Having trouble finding good gradient conditioning settings? Try loading one of the settings files from the “settings” folder with override_settings_with_file.

[Image: screenshot containing text]

 

 

Exposure/Contrast Conditional Settings

mean_scale 

Pushes the pixel values towards middle gray

Samples have values between -1 and 1. Mean_scale guides pixels toward 0, hence gray

 

var_scale

Pushes pixels towards lower variance

 

exposure_scale 

Targeted mean loss. Uses exposure_target variable. 

When exposure_target == 0, exposure_scale is equivalent to mean_scale

Use it to compensate for very high cfg scale, for example.

 

exposure_target [ -1.0 to 1.0 ]

Used with exposure_scale

Negative values push towards a darker image, positive values toward a brighter image
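The relationship between mean_scale and exposure_scale can be illustrated with a toy loss. This is only a sketch under a stated assumption: it treats the loss as the mean absolute distance of the pixel values from a target, which may not match the actual implementation, and mean_abs_distance is a hypothetical helper.

```python
def mean_abs_distance(pixels, target=0.0):
    """Mean absolute distance of pixel values (range -1..1) from `target`.

    Assumed loss shape, for illustration only. With target == 0 this is
    the middle-gray case described for mean_scale.
    """
    return sum(abs(p - target) for p in pixels) / len(pixels)


bright_pixels = [0.6, 0.8, 0.9, 0.7]

print(mean_abs_distance(bright_pixels))       # far from middle gray
print(mean_abs_distance(bright_pixels, 0.75)) # close to a bright target
```

Under this assumption, a bright image is penalized heavily when the target is 0 and only lightly when exposure_target is positive, which is the sense in which positive targets push toward a brighter image.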

Color Match Conditional Settings

colormatch_scale

Guides the image towards a color palette.

Palette is extracted from an input image colormatch_image

Best matching results when decode_method == “autoencoder”

Works best when gradient_wrt == “x0_pred”, but will still work well with “x”

 

colormatch_image

Image to extract a color palette from

colormatch_n_colors

Number of colors in the palette

ignore_sat_weight (default None)

Amount to ignore the saturation when using colormatch_scale

High ignore_sat_weight will allow a higher colormatch_scale without making the colors look overblown

CLIP/Aesthetics Conditional Settings

clip_name ['ViT-L/14', 'ViT-L/14@336px', 'ViT-B/16', 'ViT-B/32']

Used with aesthetics_scale>0 and clip_scale>0

Recommended to use with gradient_wrt=’x’

 

aesthetics_scale

Recommended to use with gradient_wrt=’x’

 

clip_scale

CLIP Guidance!

 

cutn

Used when clip_scale > 0

Number of times CLIP views the image for each step

This variable only matters under these conditions:

Extremely large images bigger than 2688 x 2688px, or

decode_method = autoencoder

This is because clip looks at the decoded image, which is small when decode_method=linear

 

cut_pow

Used when clip_scale > 0

Affects the size of the view window when CLIP sees the image

When cut_pow is very small, the view window is large, and vice versa

Only matters for extremely large images or autoencoder decode_method, see cutn

Other Conditional Settings

init_mse_scale

init_mse_image

Guide the image to match init_mse_image pixel-by-pixel

Conditional Gradient Settings

gradient_wrt ["x", "x0_pred"]

“x0_pred” 

·         Applies the gradient calculation only to the denoised image at each step

·         Much faster

·         May require higher *_scale values

·         Does not work very well with clip_scale or aesthetics_scale guidance

“x”

·         Applies the gradient calculation through the unet

·         Much slower

·         Takes the content of the image more into consideration because the grad uses the unet

 

gradient_add_to ["cond", "uncond", "both"]

Gradient is added to the cfg cond or uncond or both

"cond" or "uncond" only applies if cond_uncond_sync == False

 

decode_method ["autoencoder","linear"]

Gradient conditioning decodes the latent at every step.

·         Linear is a shortcut that quickly translates the latent to a small (default 64x64) image. Linear decoding is not implemented for every model, so if your results with conditioning look noisy, try “autoencoder”.

·         Autoencoder is much slower, but higher quality. It uses the same decoding method the model already uses after the last step.

 

Clamping

Keeps the gradient from getting too big and making the image all washed out.

If you find that a conditioning setting isn’t changing an image much no matter how high the conditioning scale, try increasing one of these clamp numbers or using grad_threshold_type = "dynamic".

 

grad_threshold_type  ["dynamic", "static", "mean", "schedule"]

The final gradient is clamped before it’s added to the image. This is to avoid the overexposed look.

·         dynamic: thresholding from the Imagen paper (May 2022)

·         static: simple clamping

·         mean: rescales out of range values based on the mean

·         schedule: uses clamp_start and clamp_stop to set a threshold value that changes linearly at each step. Like “mean” but with changing clamp_grad_threshold over time

 

clamp_grad_threshold

used with grad_threshold_type “static”, “mean” and “dynamic”

clamp_start

used with grad_threshold_type “dynamic”

clamp_grad_threshold value at the first step, linearly changes to clamp_stop at the last step

clamp_stop

used with grad_threshold_type “dynamic”

clamp_grad_threshold value at the last step, linearly changes from clamp_start from the first step
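The linear change from clamp_start to clamp_stop can be sketched as below. The helper names, the 0-based step numbering and the example values 0.2 and 0.01 are assumptions for illustration, not values taken from the notebook.

```python
def clamp_threshold_at(step, total_steps, clamp_start, clamp_stop):
    """Threshold at a 0-based `step`, moving linearly from start to stop."""
    if total_steps < 2:
        return clamp_start
    fraction = step / (total_steps - 1)
    return clamp_start + fraction * (clamp_stop - clamp_start)


def clamp(value, threshold):
    """Simple symmetric clamp of one gradient value to [-threshold, threshold]."""
    return max(-threshold, min(threshold, value))


total = 50
for step in (0, 25, 49):
    threshold = clamp_threshold_at(step, total, clamp_start=0.2, clamp_stop=0.01)
    print(step, round(threshold, 4), clamp(0.5, threshold))
```

With these example values the gradient is allowed to be large early on and is squeezed progressively harder toward the last step.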

 

Inject Timing

grad_inject_timing

            Applies the gradient only at a limited number of steps

            Interpreted differently based on the type of grad_inject_timing

            if grad_inject_timing is:

·         int: compute every inject_timing steps, e.g. 2 means apply the gradient every other step

·         list of floats: compute on these decimal fraction steps (eg, [0.5, 1.0] for 50 steps would be at steps 25 and 50)

·         list of ints: compute on these steps

·         None: Compute on all steps

This can be used to generate an image faster, or apply conditioning to different sections of the gen. For example, overall composition is usually decided in the first 20% of steps, so if colormatch conditioning is only applied in the last half of steps it will result in practically the same image but with different colors.
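The four interpretations of grad_inject_timing listed above can be summarized in one small function. This is a sketch, not the notebook's implementation: inject_steps is a hypothetical helper, steps are numbered from 1, and for the int case it assumes the gradient lands on steps divisible by the value.

```python
def inject_steps(grad_inject_timing, total_steps):
    """Return the 1-based step numbers on which the gradient is applied."""
    all_steps = list(range(1, total_steps + 1))
    if grad_inject_timing is None:
        return all_steps  # compute on all steps
    if isinstance(grad_inject_timing, int):
        # every Nth step (assumed to be the steps divisible by N)
        return [s for s in all_steps if s % grad_inject_timing == 0]
    if all(isinstance(v, int) for v in grad_inject_timing):
        # list of ints: exactly these steps
        return sorted(s for s in grad_inject_timing if 1 <= s <= total_steps)
    # list of floats: fractions of the total step count
    return sorted(round(f * total_steps) for f in grad_inject_timing)


print(inject_steps(None, 5))         # [1, 2, 3, 4, 5]
print(inject_steps(2, 6))            # [2, 4, 6]
print(inject_steps([0.5, 1.0], 50))  # [25, 50]
print(inject_steps([1, 10], 50))     # [1, 10]
```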

 

Speed vs VRAM Settings

cond_uncond_sync

Syncs cond and uncond so they are generated in parallel.

Faster if True, Slower if False

False uses less vram

gradient_add_to = "cond" or "uncond" only applies if cond_uncond_sync == False