# 2D Surface Deformation

Increases in the performance of graphics accelerators free additional CPU cycles that can be used for real-time physical world modeling. Modifying the geometry of an object is more effective than mapping a new or animated texture, because correctly deformed objects will look right from any angle and in any lighting conditions. Max I. Fomitchev discusses the implementation of deformable surfaces for real-time 3D games that simulate realistic environments.

February 16, 2000

Author: by Max Fomitchev

The rapid growth of CPU power opens possibilities for development of 3D games that feature realistic environment. The increase in the performance of graphics accelerators frees additional CPU cycles that can be used for real-time physical world modeling.

Game realism can be improved by simulating deformable objects or deformable surfaces. The list of common deformable objects and surfaces include:

water (waves, streams, waterfalls), wind and splashes simulation;

cloth (capes, flags, curtains), wind and wrinkle simulation;

soil (mud, clay), imprint and crater simulation;

vegetation (grass, trees), wind simulation.

Simulation of deformable objects results in unique experience of realistic never-the-same environment. Wind that causes waves and makes grass bend; objects that produce ripples when dropped in water; soil covered with footprints and explosion craters.

The user receives much richer sense of reality when the geometry of an object is modified rather than when anew or animated texture is mapped on to it because correctly deformed objects will look right from any angle and in any lighting conditions. Also deformed objects can block or reveal other objects behind them.

The implementation of deformable surfaces discussed in this article is intended for real-time 3D games that simulate realistic environment. The algorithm is optimized for AMD 3DNow! technology. Different implementations and their performance are discussed.

Physics of 2D surface deformation

Consider a grid that represents a simple 2D surface as shown on figure 1. Each vertex on the surface is connected with 6 neighbors to the north, east, southeast, south, west and northwest. This interconnection defines local topology. The neighboring vertices interact with each other by means of elastic forces (i.e. interconnections between vertices, depicted as solid lines on the figures, act as coil springs). In the initial (relaxed) state vertices are evenly spaced, and vectors S_{RelaxE}, S_{RelaxSE}, S_{RelaxS} specify distances between neighbors. In the relaxed state elastic forces between vertices are equal to zero.

When an external force (F_{ext}) is applied to a vertex on the surface at time t_{0}=Dt, the vertex starts moving and displaces to a new location time t_{1}=2Dt. This displacement produces elastic forces between local topological neighbors. The resulted elastic forces counter the displaced vertex motion and try to return the vertex to its original location. However according to the 3^{rd} Newton’s law the same forces but with opposite direction act upon neighboring vertices. So at time t_{2}=3Dt the neighbors get displaced, and entire cluster of vertices comes in motion. With time more and more vertices get involved in motion, representing wave propagation across the surface.

The 2D surface is characterized by the following parameters:

local topology;

global topology (i.e. plane, cylinder, sphere, etc.)

vertex mass m;

relaxation distances (S

_{RelaxE}, S_{RelaxSE}, S_{RelaxS});elasticity model (linear, exponential, etc.);

elasticity constant (elasticity tensor E for anisotropic surfaces) E, E >= 0;

damping constant (damping tensor D for anisotropic surfaces) d, 0 <= d <= 1.

Greater elasticity corresponds to faster deformation and stiffer body, while low elasticity corresponds to softer bodies.

Smaller damping represents faster relaxation or faster solidification for ductile surfaces.

This paper deals with plane isotropic surfaces combined from identical particles with mass m with six-neighbor local topology and linear elasticity model.

State of a vertex on the surface is characterized by coordinate vector S, velocity V, total internal (elastic) force F_{Total} and external force F_{Ext}.

The surface deformation is calculated for each vertex in the following way. First vertex displacement relative to the east, south and southeast neighbors is calculated:

DS_{E} = S_{E} - S - S_{RelaxE}

DS_{S} = S_{S} - S - S_{RelaxS}

DS_{SE} = S_{SE} - S - S_{RelaxSE}

Then elastic forces between local neighbors are evaluated:

F_{E} = Elasticity · DS_{E}

F_{S} = Elasticity · DS_{S}

F_{SE} = Elasticity · DS_{SE}

Total force acting on the vertex under consideration is:

F_{Total} = F_{E} + F_{S} + F_{SE}

According to the 3^{rd} Newton’s Law the vertex under consideration contributes to the total force acting on its neighbors:

F_{TotalE} = F_{TotalE} - F_{E}

F_{TotalS} = F_{TotalS} - F_{S}

F_{TotalSE} = F_{TotalSE} - F_{SE}

Notice that north, west and northwest elastic forces are not calculated directly. The total force for the vertex F_{Total} is updated automatically when north, west and northwest vertices are processed (3^{rd} Newton’s law).

Finally, when all vertices on the surface are processed (i.e. all internal forces evaluated) new coordinates are calculated for each vertex.

Acceleration is calculated from the 2^{nd} Newton’s law:

a = (F_{Total} + F_{ext})/m

The rest is discrete numerical integration:

DV = a · Dt

V = V + DV

DS = V · Dt

S = S + DS

Lastly vertice velocity is adjusted in order to account for damping:

V = V · d

Effectors

Normally surface does not deform by itself, but rather deforms due to the influence of an external object – effector. Any rigid object that collides with a deformable surface is an effector. Wind can be modeled as a stream of infinitely small rigid particles with Brownian velocities. Spatial (3D) effectors can be modeled as a set of infinitely small particles corresponding to the object’s vertices.

Particle effector is show on figure 3. It is characterized by influence radius R_{Effector} and strength.

Surface vertices located further than R_{Effector} from the center of effector remain unaffected. The effector influence results in external forces acting on in-range vertices. Typically the effector-induced force is inversely proportional to the distance between vertex and the center of effector:

DS = S_{Effector} – S

F_{new} = EffectorStrength · DS/|DS|, |DS| < R_{Effector}

F_{ext} = F_{ext} + F_{new}

When simulating more or less complete physical system one can account for surface resistance by updating total force acting on effector:

F_{TotalEffector} = F_{TotalEffector} – F_{new}

Models of common surfaces

Based on the described 2D surface model different common surfaces can be represented by varying surface parameters:

Water (elastic): elasticity = 1, damping ~ 0.995.

Cloth (elastic): elasticity ~ 0.9, damping ~ 0.9.

Rubber (elastic): elasticity > 1, damping < 1.

Setting damping constant to 1 will produce infinite oscillations generated by a single effector. Deformation pattern will never be the same. This might be a good idea for infinite water waves modeling for seascapes.

Typically a deformable surface has some fixed vertices, i.e. vertices that can not be displaced from their original location. Such fixed vertices correspond to surface edges or to attachment edges / points. Seashore and curtains are examples of fixed edges and fixed attachment points respectively.

Trees and vegetation can be modeled as a hierarchy of surfaces with cylinder / cone global topology and stiff rubber (elasticity > 1, damping < 1) properties, where individual cylinders correspond to branches, stems or trunks. Root and attachment edges must be marked as fixed points. Particle effectors that simulate wind can be applied to such ‘rubber’ trees to produce realistic grass fields or woods experience.

Algorithm implementation

Surface deformation algorithm can be split on two parts:

Initial setup - to collision with effector (external force calculation);

Iterative surface deformation calculation (new vertex position and normal calculation).

The surface deformation algorithm is inherently suitable for SIMD processing and can be implemented in several ways. But before the discussion of the implementation consider the following numerical optimizations or "cheating opportunities" that remove redundant division and multiplication operations:

1/m can be stored to avoid division in a = F/m;

m = 1 can be used to avoid multiplication by 1/m (a · F);

Elasticity = 1 can be used to avoid multiplication in F = Elasticity · DS (F º DS);

dt = 1 can be used to avoid multiplication in DV = a · Dt and DS = V · Dt (DV º a and DS º V);

Uniform relaxation length S

_{RelaxE}X = S_{RelaxS}Y = S_{RelaxSE}X = S_{RelaxSE}Y can be used to maximize register utilization

C code

When targeting to Direct3D the easiest way is to implement surface deformation using D3DVECTOR structure and AoS paradigm. All vectors (forces, displacements, velocities etc.) can be defined as arrays of D3DVECTOR. However keeping in mind SIMD-optimization and porting to 3DNow! it is necessary to choose an alternative path and SoA paradigm.

C-implementation is straightforward. The resulting objective code produced by Visual C (optimized for performance on ‘blended’ CPU) yields truly amazing performance on Athlon CPU. The performance comes from the unique FPU design that features fully pipelined out-of-order execution of floating-point operations. Most common FP-instructions, such as fadd, fsub, fmul take only one cycle to execute. Some instructions such as fxch are virtually free and typically consume zero CPU cycles. Moreover, fadd / fsub, fmul and fst instructions can execute in parallel. The C implementation in Listing1. takes about 2M cycles to process 10,000 vertices on Athlon-600 MHz.

Listing 1. C Implementation of Surface Deformation

void SURFACE::Update(float dt)

{

// Reset Total Force

memset(TotalForceX, 0, NumVertex*sizeof(float));

memset(TotalForceY, 0, NumVertex*sizeof(float));

memset(TotalForceZ, 0, NumVertex*sizeof(float));

// For each vertex

for ( int i = 0, k = 0; i < NumRows - 1; i++ )

{

// Clear first vertex in the next row

TotalForceX[(i + 1)*NumCols] = 0.0f;

TotalForceY[(i + 1)*NumCols] = 0.0f;

TotalForceZ[(i + 1)*NumCols] = 0.0f;

for ( int j = 0; j < NumCols - 1; j++, k++ )

{

// Distance between C-E vertices

float

dx = SX[k + 1] - SX[k],

dy = SY[k + 1] - SY[k],

dz = SZ[k + 1] - SZ[k];

// Less relaxation length

dx -= SRelaxE.x;

dy -= SRelaxE.y;

dz -= SRelaxE.z;

// C-E elastic force

dx *= Elasticity,

dy *= Elasticity,}

dz *= Elasticity;

// Total force for C-vertex

TotalForceX[k] += dx;

TotalForceY[k] += dy;

TotalForceZ[k] += dz;

// Total force for E-vertex (3rd Newton's Law)

TotalForceX[k + 1] -= dx;

TotalForceY[k + 1] -= dy;

TotalForceZ[k + 1] -= dz;

// Distance between C-S vertices

dx = SX[k + NumCols] - SX[k];

dy = SY[k + NumCols] - SY[k];

dz = SZ[k + NumCols] - SZ[k];

// Less relaxation length

dx -= SRelaxS.x;

dy -= SRelaxS.y;

dz -= SRelaxS.z;

// C-S elastic force

dx *= Elasticity;

dy *= Elasticity;

dz *= Elasticity;

// Total force for C-vertex

TotalForceX[k] += dx;

TotalForceY[k] += dy;

TotalForceZ[k] += dz;

// Total force for S-vertex (3rd Newton's Law)

TotalForceX[k + NumCols] -= dx;

TotalForceY[k + NumCols] -= dy;

TotalForceZ[k + NumCols] -= dz;

// Distance between C-SE vertices

dx = SX[k + NumCols + 1] - SX[k];

dy = SY[k + NumCols + 1] - SY[k];

dz = SZ[k + NumCols + 1] - SZ[k];

// Less relaxation length

dx -= SRelaxSE.x;

dy -= SRelaxSE.y;

dz -= SRelaxSE.z;

// C-SE elastic force

dx *= Elasticity;

dy *= Elasticity;

dz *= Elasticity;

// Total force for C-vertex

TotalForceX[k] += dx;

TotalForceY[k] += dy;

TotalForceZ[k] += dz;

// Total force for SE-vertex (3rd Newton's Law)

TotalForceX[k + NumCols + 1] -= dx;

TotalForceY[k + NumCols + 1] -= dy;

TotalForceZ[k + NumCols + 1] -= dz;

}

k++;

}

float Massdt = Mass*dt;

// For each vertex add external force

for ( k = 0; k < NumVertex; k++ )

{

TotalForceX[k] += ExternalForceX[k];

ExternalForceX[k] = 0.0f;

TotalForceY[k] += ExternalForceY[k];

ExternalForceY[k] = 0.0f;

TotalForceZ[k] += ExternalForceZ[k];

ExternalForceZ[k] = 0.0f;

// For each vertex that is not fixed in space

if ( Fixed[k] )

{

// Calculate velocity: V = V + A*dt, A = F/m

VX[k] += TotalForceX[k]*Massdt;

VY[k] += TotalForceY[k]*Massdt;

VZ[k] += TotalForceZ[k]*Massdt;

// Calculate new vertex position: S = S + V*dt

SX[k] += VX[k]*dt;

SY[k] += VY[k]*dt;

SZ[k] += VZ[k]*dt;

// Account for damping factor: V = V*damp

VX[k] *= Damping;

VY[k] *= Damping;

VZ[k] *= Damping;

// Set normals to macth velocity

Normal[k].x = VX[k];

Normal[k].y = VY[k];

Normal[k].z = VZ[k];

}

}

}

3DNow! code

The hand-optimized assembler code that uses 3DNow! instruction set (see AMD SDK amd3dx.h header file) can be fairly easy produced from SoA C-implementation. Vertices are processed in pairs, and the code for x, y and z-components is the same except for displacement on memory operations.

Athlon CPU is less sensitive for instruction scheduling mostly due to the large (72 mops) reorder buffer. However it still a good idea to schedule instructions to avoid possible pipeline stalls due to latencies of both 3DNow! pipelines. In the code fragment below 3DNow! instructions are grouped in packs of 3. Each instruction processes the same vector component (x, y or z) for all three topological neighbors: east, south and southeast. This grouping of instructions hides most of the latency of 3DNow! pipeline since the result of the first instruction is not referenced until 3 cycles later.

The code suffers a little from misaligned data access. Misaligned access comes from the necessity to process east and southeast vertices. However the penalty for misaligned access is only one cycle.

The code heavily uses load-execute operations to minimize register pressure and promote code density. Also branch for fixed vertices was eliminated and replaced by MMX pand instruction that resets total force to zero for fixed vertices.

For storing normal data both halves of MMX registers must be written to memory but to different locations. pswapd instruction that is an Athlon extension 3DNow! instruction set was used to swap upper and lower 32-bit halves of a MMX register. The pswapd instruction offers better performance than punpckhdq MMX instruction since it doesn’t use MMX shifter unit.

PREFETCH instructions were used to optimize memory throughput.

The 3DNow! assembler code takes about 1.5M cycles to process 10,000 vertices on Athlon-600 MHz. The 3DNow! implementation in Listing 2 is ~40% faster than x87 code.

Max I. Fomitchev is a full-time consultant at Systems&Programming Resources Inc., Tulsa, OK. He has a MCS and is getting his Ph.D. in March. He has recently released Gems 3D shareware 3D puzzle game (www.Gems3D.com), and he also does contract work for AMD among other things. Specializing in code optimization, his interests include code optimization, 3D graphics, physics simulation, compiler design and ultrasound imaging. Check out his website at www.webzone.net/maxf.

Listing 2. 3DNow! Implementation of Surface Deformation

void SURFACE::Update3DNow(float dt)

{

float* ptrS, Massdt[2];

int i, j, k;

float

SREX[2] = {SRelaxE.x, SRelaxE.x},

SREY[2] = {SRelaxE.y, SRelaxE.y},

SREZ[2] = {SRelaxE.z, SRelaxE.z},

SRSX[2] = {SRelaxS.x, SRelaxS.x},

SRSY[2] = {SRelaxS.y, SRelaxS.y},

SRSZ[2] = {SRelaxS.z, SRelaxS.z},

SRSEX[2] = {SRelaxSE.x, SRelaxSE.x},

SRSEY[2] = {SRelaxSE.y, SRelaxSE.y},

SRSEZ[2] = {SRelaxSE.z, SRelaxSE.z};

// Reset Total Force

memset(TotalForceX, 0, NumVertex*sizeof(float));

memset(TotalForceY, 0, NumVertex*sizeof(float));

memset(TotalForceZ, 0, NumVertex*sizeof(float));

// Initial setup

__asm {

femms

mov ebx,this // ebx -> this

movd mm7,[ebx]this.Elasticity

punpckldq mm7,mm7 // mm7 = Elasticity

mov eax,[ebx]this.NumVertex

shl eax,2 // eax = NumVertex*sizeof(float)

mov ecx,[ebx]this.NumCols

shl ecx,2 // ecx = NumCols*sizeof(float)

mov edx,eax

shl edx,1 // edx = NumVertex*sizeof(float)*2

lea edi,[ebx]this.SX // [edi] -> SX/SY/SZ

lea esi,[ebx]this.TotalForceX // [esi] ->

TotalForceX/TotalForceY/TotalForceZ

mov i,eax

sub i,ecx

M:

mov j,ecx

sub j,8

M1:

prefetchm(edi,64)

prefetchm(esi,64)

// [edi] -> S

// [esi] -> TotalForce

// mm7 = Elasticity

// mm0 = dx/dy/dz

// Process all vertices X-coordinate

movq mm3,[edi] // mm3 = SX[k]

movq mm0,[edi + 4] // SX[k + 1]

movq mm1,[edi + ecx] // SX[k + NumCols]

movq mm2,[edi + ecx+4] // SX[k + NumCols + 1]

pfsub (mm0,mm3) // dx = SX[k + 1] - SX[k]

pfsub (mm1,mm3) // dx = SX[k + NumCols] - SX[k]

pfsub (mm2,mm3) // dx = SX[k + NumCols + 1] - SX[k]

movq mm4,SREX

movq mm5,SRSX

movq mm6,SRSEX

pfsub (mm0,mm4) // dxE -= SRelaxE.x

pfsub (mm1,mm5) // dxS -= SRelaxS.x

pfsub (mm2,mm6) // dxSE -= SRelaxSE.x

movq mm3,[esi + 4] // TotalForceX[k + 1]

pfmul (mm0,mm7) // dxE *= Elasticity

pfmul (mm1,mm7) // dxS *= Elasticity

pfmul (mm2,mm7) // dxSE *= Elasticity

pfsub (mm3,mm0) // TotalForceX[k + 1] - dxE

movq mm4,[esi + ecx] // TotalForceX[k + NumCols]

movq [esi + 4],mm3 // TotalForceX[k + 1] -= dxE

pfsub (mm4,mm1) // TotalForceX[k + NumCols] - dxS

movq [esi + ecx],mm4 // TotalForceX[k + NumCols] -= dxS

movq mm5,[esi + ecx+4] // TotalForceX[k + NumCols + 1]

pfsub (mm5,mm2) // TotalForceX[k + NumCols + 1] - dxSE

movq [esi + ecx+4],mm5 // TotalForceX[k + NumCols + 1] = dxSE

pfadd (mm0,esi) // TotalForceX[k] += dxE

pfadd (mm0,mm1) // TotalForceX[k] += dxS

pfadd (mm0,mm2) // TotalForceX[k] += dxSE

movq [esi],mm0 // TotalForceX[k] += dx

// Y-coordinate

add edi,eax

add esi,eax

movq mm3,[edi] // mm3 = SY[k]

movq mm0,[edi + 4] // SY[k + 1]

movq mm1,[edi + ecx] // SY[k + NumCols]

movq mm2,[edi + ecx+4] // SY[k + NumCols + 1]

pfsub (mm0,mm3) // dy = SY[k + 1] - SY[k]

pfsub (mm1,mm3) // dy = SY[k + NumCols] - SY[k]

pfsub (mm2,mm3) // dy = SY[k + NumCols + 1] - SY[k]

movq mm4,SREY

movq mm5,SRSY

movq mm6,SRSEY

pfsub (mm0,mm4) // dyE -= SRelaxE.y

pfsub (mm1,mm5) // dyS -= SRelaxS.y

pfsub (mm2,mm6) // dySE -= SRelaxSE.y

movq mm3,[esi + 4] // TotalForceY[k + 1]

pfmul (mm0,mm7) // dyE *= Elasticity

pfmul (mm1,mm7) // dyS *= Elasticity

pfmul (mm2,mm7) // dySE *= Elasticity

pfsub (mm3,mm0) // TotalForceY[k + 1] - dyE

movq mm4,[esi + ecx] // TotalForceY[k + NumCols]

movq [esi + 4],mm3 // TotalForceY[k + 1] -= dyE

pfsub (mm4,mm1) // TotalForceY[k + NumCols] - dyS

movq [esi + ecx],mm4 // TotalForceY[k + NumCols] -= dyS

movq mm5,[esi + ecx+4] // TotalForceY[k + NumCols + 1]

pfsub (mm5,mm2) // TotalForceY[k + NumCols + 1] - dySE

movq [esi + ecx+4],mm5 // TotalForceY[k + NumCols + 1] -= dySE

pfadd (mm0,esi) // TotalForceY[k] += dyE

pfadd (mm0,mm1) // TotalForceY[k] += dyS

pfadd (mm0,mm2) // TotalForceY[k] += dySE

movq [esi],mm0 // TotalForceY[k] += dy

// Z-coordinate

add edi,eax

add esi,eax

movq mm3,[edi] // mm3 = SZ[k]

movq mm0,[edi + 4] // SZ[k + 1]

movq mm1,[edi + ecx] // SZ[k + NumCols]

movq mm2,[edi + ecx+4] // SZ[k + NumCols + 1]

pfsub (mm0,mm3) // dz = SZ[k + 1] - SZ[k]

pfsub (mm1,mm3) // dz = SZ[k + NumCols] - SZ[k]

pfsub (mm2,mm3) // dz = SZ[k + NumCols + 1] - SZ[k]

movq mm4,SREZ

movq mm5,SRSZ

movq mm6,SRSEZ

pfsub (mm0,mm4) // dzE -= SRelaxE.z

pfsub (mm1,mm5) // dzS -= SRelaxS.z

pfsub (mm2,mm6) // dzSE -= SRelaxSE.z

movq mm3,[esi + 4] // TotalForceZ[k + 1]

pfmul (mm0,mm7) // dzE *= Elasticity

pfmul (mm1,mm7) // dzS *= Elasticity

pfmul (mm2,mm7) // dzSE *= Elasticity

pfsub (mm3,mm0) // TotalForceZ[k + 1] - dzE

movq mm4,[esi + ecx] // TotalForceZ[k + NumCols]

movq [esi + 4],mm3 // TotalForceZ[k + 1] -= dzE

pfsub (mm4,mm1) // TotalForceZ[k + NumCols] - dzS

movq [esi + ecx],mm4 // TotalForceZ[k + NumCols] -= dzS

movq mm5,[esi + ecx+4] // TotalForceZ[k + NumCols + 1]

pfsub (mm5,mm2) // TotalForceZ[k + NumCols + 1] - dzSE

movq [esi + ecx+4],mm5 // TotalForceZ[k + NumCols + 1] -= dzSE

pfadd (mm0,esi) // TotalForceZ[k] += dzE

pfadd (mm0,mm1) // TotalForceZ[k] += dzS

pfadd (mm0,mm2) // TotalForceZ[k] += dzSE

movq [esi],mm0 // TotalForceZ[k] += dz

// Update indices

sub edi,edx

sub esi,edx

add edi,8

add esi,8

sub j,8

jnz M1

add edi,8

add esi,8

sub i,ecx

jnz M

lea ecx,[ebx]this.SX

mov ptrS,ecx // ptrS = SX

movd mm5,[ebx]this.Mass

punpckldq mm5,mm5

movd mm6,dt // mm6 = dt

punpckldq mm6,mm6

pfmul (mm5,mm6) // mm5 = Massdt

movq Massdt,mm5

movd mm7,[ebx]this.Damping

punpckldq mm7,mm7 // mm7 = Damping

lea ecx,[ebx]this.Normal // [ecx] -> Normal

lea edx,[ebx]this.Fixed // [edx] -> Fixed

lea edi,[ebx]this.TotalForceX // [edi] -> TotalForce

lea esi,[ebx]this.ExternalForceX// [esi] -> ExternalForce

lea ebx,[ebx]this.VX // [ebx] -> V

mov k,eax

// For each vertex add external force

M2:

prefetchm(edi,64)

prefetchm(esi,64)

prefetchm(ebx,64)

prefetchm(ecx,64)

movq mm6,[edx] // Fixed[k]

movq mm0,[edi] // TotalForceX[k]

movq mm1,[edi + eax] // TotalForceY[k]

movq mm2,[edi + eax*2] // TotalForceZ[k]

movq mm3,[esi] // ExternalForceX[k]

movq mm4,[esi + eax] // ExternalForceY[k]

movq mm5,[esi + eax*2] // ExternalForceZ[k]

pfadd (mm0,mm3) // mm0 = TotalForceX[k] + ExternalForceX[k]

pfadd (mm1,mm4) // mm1 = TotalForceY[k] + ExternalForceY[k]

pfadd (mm2,mm5) // mm2 = TotalForceZ[k] + ExternalForceZ[k]

pxor mm3,mm3

pand mm0,mm6 // TotalForceX[k] & Fixed[k]

pand mm1,mm6 // TotalForceY[k] & Fixed[k]

pand mm2,mm6 // TotalForceZ[k] & Fixed[k]

movq mm6,Massdt

movq [esi],mm3 // ExternalForceX[k] = 0

movq [esi + eax],mm3 // ExternalForceY[k] = 0

movq [esi + eax*2],mm3 // ExternalForceZ[k] = 0

//maskmovq mm0,mm4

movq [edi],mm0 // TotalForceX[k] += ExternalForceX[k] & Fixed[k]

movq [edi + eax],mm1 // TotalForceY[k] += ExternalForceY[k] & Fixed[k]

movq [edi + eax*2],mm2 // TotalForceZ[k] += ExternalForceZ[k] & Fixed[k]

pfmul (mm0,mm6) // mm0 = TotalForceX[k]*Massdt

pfmul (mm1,mm6) // mm1 = TotalForceY[k]*Massdt

pfmul (mm2,mm6) // mm2 = TotalForceZ[k]*Massdt

movq mm6,dt

punpckldq mm6,mm6

// Calculate velocity: V = V + TotalForce*Massdt

pfadd (mm0,ebx) // VX[k] + TotalForceX[k]*Massdt

movq mm4,[ebx + eax] // VY[k]

movq mm5,[ebx + eax*2] // VZ[k]

pfadd (mm1,mm4) // VY[k] + TotalForceY[k]*Massdt

pfadd (mm2,mm5) // VZ[k] + TotalForceZ[k]*Massdt

movq mm3,mm0 // mm3 = VX[k] + TotalForceX[k]*Massdt

movq mm4,mm1 // mm3 = VY[k] + TotalForceY[k]*Massdt

movq mm5,mm2 // mm3 = VY[k] + TotalForceZ[k]*Massdt

pfmul (mm3,mm7) // VX[k]*Damping

pfmul (mm4,mm7) // VY[k]*Damping

pfmul (mm5,mm7) // VZ[k]*Damping

pfmul (mm0,mm6) // mm0 = (VX[k] + TotalForceX[k]*Massdt)*dt

pfmul (mm1,mm6) // mm1 = (VY[k] + TotalForceY[k]*Massdt)*dt

pfmul (mm2,mm6) // mm2 = (VY[k] + TotalForceZ[k]*Massdt)*dt

movq [ebx],mm3 // VX[k] += TotalForceX[k]*Massdt

movq [ebx + eax],mm4 // VY[k] += TotalForceY[k]*Massdt

movq [ebx + eax*2],mm5 // VZ[k] += TotalForceZ[k]*Massdt

movd [ecx],mm3 // Normal[k].x = VX[k]

movd [ecx + 4],mm4 // Normal[k].y = VY[k]

movd [ecx + 8],mm5 // Normal[k].z = VZ[k]

pswapd (mm3,mm3)

pswapd (mm4,mm4)

pswapd (mm5,mm5)

//punpckhdq mm3,mm3

movd [ecx + TYPE D3DVECTOR],mm3 // Normal[k].x = VX[k]

//punpckhdq mm4,mm4

movd [ecx + TYPE D3DVECTOR + 4],mm4 // Normal[k].y = VY[k]

//punpckhdq mm5,mm5

movd [ecx + TYPE D3DVECTOR + 8],mm5 // Normal[k].z = VZ[k]

// Calculate new vertex position: S = S + V*dt

// Save edi

mov i,edi

// [edi] -> S

mov edi,ptrS

movq mm3,[edi] // SX[k]

movq mm4,[edi + eax] // SY[k]

movq mm5,[edi + eax*2] // SZ[k]

pfadd (mm0,mm3) // SX[k] + VX[k]*dt

pfadd (mm1,mm4) // SY[k] + VY[k]*dt

pfadd (mm2,mm5) // SZ[k] + VZ[k]*dt

movq [edi],mm0 // SX[k] += VX[k]*dt

movq [edi + eax],mm1 // SY[k] += VY[k]*dt

movq [edi + eax*2],mm2 // SZ[k] += VZ[k]*dt

mov edi,i

// Update pointers

add ptrS,8

add edi,8

add esi,8

add ebx,8

add ecx,(TYPE D3DVECTOR)*2

add edx,8

sub k,8

jnz M2

femms

}

}

Read more about:

FeaturesYou May Also Like