Saturday, 7 May 2016

Thousand ships with real-time buoyancy



The above scene contains exactly 1000 ships, and all of them run a real-time buoyancy simulation at the same level of quality. There are no visual tricks used to fake the physics; each object has its own physical state. It's real physics, or at least its best approximation, as is always the case in interactive applications.


Your water sucks


I was a bit staggered when, just recently, I came to realize how non-obvious the problem of efficient ocean and buoyancy simulation actually is to the average person. Don't get me wrong, the majority of the blame is on us, developers, for not educating people about it from the start, or in retrospect, for pushing visuals ahead of physics. Although, if you read on, you'll discover that things are not as clear-cut as I made them out to be in the above sentence.
People have gotten used to the visually appealing water in games like Far Cry, but not many actually notice that, at least for the majority of the time, the water stays at pretty much the same level, without dynamically sized waves. In the above video, waves go from ~40 centimeters to their peak at about 50 meters; that's a factor of 125. There are a few different problems developers have to deal with when creating a fully functional dynamic ocean. First, let's look at the most obvious, visual aspect of it. When you have a simple water plane, like in Far Cry, you make it look as good as possible at its static water level and then move on. On the other hand, when wave amplitude (size) changes with time, you have a whole spectrum of water levels, and the visuals should match the current state as closely as possible. That means foam generation, dynamic ripples on waves, subsurface scattering (SSS) on a larger scale etc. There is a lot more work associated with water like this, and not only is it harder to make it look good all the time, it's also harder to make it look as good as similar water in a game with a static water level, because your shader (the thing that makes water look the way it does) is much more complex and eats up more GPU and CPU (more on that later) resources. From a purely technical point of view, disregarding the artistic challenges of dynamic water, a static water plane can always look better than a dynamic one at the same water level. But you may be surprised that this is just the tip of the iceberg.

Waves go from 40 centimeters to 50 meters, a factor of 125.

In fact, there is no fully functional ocean that you can just pull out of the box in Unreal Engine, Unity or CryEngine. There are community efforts, and there are marketplace assets that try to compensate for that with different levels of success. But at the end of the day, every game uses its own version of some kind of water/ocean. The important thing to remember here is that having a water shader and having a fully functional ocean are worlds apart. Every game engine comes with some kind of simple water shader that you can use to make a basic ocean plane. Usually it will display the common tiling artifacts (repeating the same pattern over distance), its water level is static and there is no interaction with the water itself: no buoyancy, which means no ships or other floating objects, no throwing rocks at the water, no splashing, no underwater ambiance with light scattering, no rain ripples etc. All of that has to be added by hand. Another important thing to remember is that not every game actually needs interactive water, simply because the ocean, lake or river may be unreachable or just part of the background. In a situation like this, the development of an ocean plane ends when the artist is happy with the way the ocean's surface looks, because that's the only experience a player will have with it. That becomes even easier if your game is of a linear nature or takes place in a static world, because you only have to tweak it for a certain lighting condition; with a dynamic day and night cycle, you have to make it look good all around: day, night, sunrise, sunset and everything in between. And don't forget that for visuals you may only need artists, but when adding physics, you have to add programmers and engineers into the mix, so more people collaborating to achieve a common goal, which means more overhead: much more time spent on team communication, and on fixing and breaking each other's work.
Now let's look at the hidden beast, buoyancy. In physics, that's the force that pushes an object in a fluid upwards. Remember Archimedes' principle? It goes like this: "Any object, wholly or partially immersed in a fluid, is buoyed up by a force equal to the weight of the fluid displaced by the object." I'm not going to go into the math part, because that's not what we're doing here.

No buoyancy, that means no ships or other floating objects

But imagine you put a plastic ball in a big glass bowl of water. It floats because of the buoyant force; its density is lower than that of water, so it stays on the water surface. This is a very simple scenario: the water surface is constant, there are no waves or other forces besides gravity and the buoyant force (ignoring atmospheric pressure etc.), and additionally, the shape of the floating object is very simple. Try shaking the bowl a bit, and the water will start to move and with it the buoyant object. The state of our system becomes a bit more chaotic, because now there are additional, ever-changing wave forces that greatly complicate the buoyancy calculation. It was nice when you only had to figure out the up and down force, but now you have to deal with all of the side forces. Let's take it one step further and replace our ball with a plane, like a small flat raft. Imagine the same scenario, and all of a sudden you realize that not every part of our raft is experiencing the same forces, because some parts arrive at a certain location before others do, and by the time the other parts arrive at the same location, the forces may have already changed. We usually deal with such complex situations by breaking them down into smaller pieces. That means we start to measure and calculate forces only over a certain area at certain points of the raft, and then figure out how much force we need to apply in order to compensate for the other parts of the raft that aren't measuring forces. This can get very expensive, and when you're working with a fully dynamic ocean, it gets even harder, because you can't make as many assumptions that simplify or improve the accuracy of the calculation, and because some simplifications can get exposed in funny and/or scary ways, such as objects flying off the ocean surface when it gets too wild. We're not done yet. Add another dimension to the raft, height. Let's put a simple column right in the middle of it, like a sail, and make it half the length of the raft. Now it's not totally flat anymore, and at this point you have to deal with the fact that your object has non-uniformly distributed mass and with it an added tendency to flip at certain angles. And there you go, still a simplified scenario of what you have to deal with in a real-time buoyancy simulation; try adding propulsion, wind and collision forces etc. Imagine a complex shape like a ship, simplify it by measuring/calculating forces at only a few points that, if connected, form a simpler shape like a rectangle -- even now, you still have to deal with all the above-mentioned variables, aside from the changing ocean forces. And at this point, you've probably forgotten about the deadlines and limited budget that are part of everyday life in game development. Todd Howard said it best: "We can do anything, but not everything."
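To make the point-sampling idea a bit more concrete, here's a minimal sketch of per-point buoyancy for a raft approximated by a few sample points. This is not the system used in the demo; the function names, the placeholder ocean query, the submersion ramp and the fixed volume share are simplifications for illustration only.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// Placeholder standing in for the real ocean surface query.
float SampleOceanHeight(float x, float y)
{
    return 0.5f * std::sin(0.1f * x) * std::cos(0.1f * y);
}

// Archimedes: buoyant force = fluid density * gravity * displaced volume.
// Each sample point "owns" a share of the object's volume; points above the
// surface contribute nothing, submerged points push their share upwards.
float ComputePointBuoyancy(const Vec3& pointWorldPos, float volumeShare /* m^3 */,
                           float fluidDensity = 1000.0f /* kg/m^3 */,
                           float gravity = 9.81f /* m/s^2 */)
{
    const float waterHeight = SampleOceanHeight(pointWorldPos.x, pointWorldPos.y);
    const float depth = waterHeight - pointWorldPos.z;   // > 0 means the point is underwater
    if (depth <= 0.0f)
        return 0.0f;                                      // above the surface, no lift

    // Ramp the displaced volume in over the first meter of depth so the force
    // doesn't pop on/off as the point crosses the surface.
    const float submersion = std::fmin(depth, 1.0f);
    return fluidDensity * gravity * volumeShare * submersion;
}
```

Each force would then be applied at its sample point's world location, so the physics engine also produces the resulting torque; that's what makes a raft pitch and roll instead of just bobbing up and down.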

Just Cause 3, The Witcher 3, Far Cry 3

We've learned to deal with the fact that not everyone even remotely understands what goes into something like this, much less into the making of a whole game. When people are Kickstarting their MMOs with hundreds of thousands of dollars, an alarmingly small number of people actually know what a joke that is to someone who went through the whole development process of a game like that; it's peanuts. Even that, I can understand. But I'm not okay with the fact that when, for example, the guys at CDPR (The Witcher 3) or Avalanche Studios (Just Cause 3) create amazing, gorgeous games and share their knowledge and enthusiasm with the world, they get hit with a genuine "Your water sucks". You know what, mister anonymous, your water sucks, because you never even tried to understand how complex this problem really is. If you want a more detailed description of what went into Just Cause 3's buoyancy, please read this awesome, free article written by Jacques Kerner of Avalanche Studios. And if you want a simpler overview, without any math, of what went into the system above, you can read on below. I may publish a detailed description if there is demand.


Technical overview


This demo scene was made with a custom build of Unreal Engine that I've been optimizing for VR at my company for about a year. I'm going to give you a few specific highlights of this scene and a few general tech highlights of this build.
Obviously, it uses WaveWorks for the core of the ocean simulation; currently, I have an alternative rendering path for AMD that performs a bit worse. To make it clear, the lower performance has nothing to do with GameWorks (it's not used at all on AMD); it's because I just didn't have the time to work on it as much as NVIDIA did on their algorithms, and even if I did, it's a hell of a lot of work to optimize for specific GPU architectures. I'm pretty confident that NVIDIA will eventually make a compute shader path for rendering on non-NV GPUs, and then everyone could use it as a default UE ocean sub-system on desktops. I do sometimes think about doing that instead of them, as I'm sure their time is very limited, but unfortunately, so is mine; maybe one day.

Moving on, you could say that the most amazing thing about this video is its real-time buoyancy calculation. For that part, I have a five-part system. At the bottom, we have WaveWorks with displacement readback running on DirectX, with exposed functions for wind direction, Beaufort scale etc. On top of that, we have an ocean manager system. Its main role is to control the parameters passed to WaveWorks and, even more importantly, to efficiently calculate the ocean surface height at a given point. Remember, the readback is a vertex displacement from a certain point on the ocean surface, so in order to do a raycast, you have to iteratively search for your point on the surface. Here we have a few optimizations, aside from standard things like comparing squared vector lengths. First, you can specify how accurately you want to search for your surface points. You do that by setting your search area: first you select the step number and step size, then you select at what distance (error margin) you're happy with your search, so that when accurate enough results are found early on, it can stop searching deeper. You can also specify a multiplier for all of those settings relative to the Beaufort scale, if you really want to tweak the performance. Keep in mind that the main goal of this system is to be as dynamic, scalable and accurate as possible, so that you can have any wave size you want. Another optimization is that we take the wind vector into account when searching for our world ocean surface location. Instead of just uniformly looking around the specific point, we can predict where the displacements are larger and then scale/stretch the search area under the hood to find the results faster. The points that the ocean manager reads are simple actors by themselves that track a couple of things, most notably their size, local relative location and absolute world location, along with the current wave height at that world location, and they are stored in a static list that gets processed by the ocean manager, all in one batch.

At the top, there is a three-part system. The buoyancy manager collects and tracks points on a specific object, and then calculates and applies the buoyant force. Then we have the general buoyancy object/point for reading and applying force (which the buoyancy manager tracks and the ocean manager reads from and writes to), and an additional gravity alpha variable that gets modified by the math model of the ship. The equation that does that takes inputs from 4 points (one from each corner of the ship/object) and outputs the gravity alpha that gets processed in the next buoyancy tick.
The output of this equation is based on angles that get calculated from the 4 points, so in reality, it predicts what is happening to the ship based on its rotation, and then that gets applied to the buoyancy.
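Here's a rough sketch of the kind of iterative surface lookup described above. It's not the actual ocean manager code; the placeholder readback function, step count and error margin are illustrative stand-ins.

```cpp
#include <cmath>

struct Vec2 { float x, y; };
struct Vec3 { float x, y, z; };

// Placeholder standing in for the WaveWorks displacement readback: for an
// undisplaced grid position, it returns where that point ends up in world space.
Vec3 GetOceanDisplacement(const Vec2& undisplaced)
{
    const float wave = std::sin(0.05f * undisplaced.x);
    return { undisplaced.x + 2.0f * wave, undisplaced.y, 1.5f * std::cos(0.05f * undisplaced.x) };
}

// The readback answers "where does grid point P move to?", but buoyancy needs
// "how high is the water at world position Q?", so we walk the sample point
// until its displaced XY lands close enough to the query position.
float FindOceanHeightAt(const Vec2& query, int maxSteps = 4, float errorMargin = 0.05f)
{
    Vec2 sample = query;                                // initial guess: no horizontal displacement
    Vec3 displaced = GetOceanDisplacement(sample);

    for (int step = 0; step < maxSteps; ++step)
    {
        const float dx = displaced.x - query.x;
        const float dy = displaced.y - query.y;

        // Compare squared distances to avoid the square root, as mentioned above.
        if (dx * dx + dy * dy < errorMargin * errorMargin)
            break;                                      // accurate enough, stop searching deeper

        sample.x -= dx;                                 // move against the error and try again
        sample.y -= dy;
        displaced = GetOceanDisplacement(sample);
    }
    return displaced.z;                                 // wave height at (roughly) the query point
}
```

A wind-aware version would bias or stretch the initial guess along the wind vector, since that's where the largest horizontal displacements tend to be.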
At the end of the day, buoyancy in the above scene is the result of a whole bunch of techniques that strike quite a nice balance between reading the best ocean surface approximation and running the in-between prediction. Make no mistake, these ships can do more: there is a built-in system to specify a ship's path and engine force, and even more, they don't just move, they really float; the ships get rotated and carried away with the current over time. There are more parts to this, but these five are the main ones. Please note that we've left out all the math that figures out how much force to apply and where, depending on how much of the object is submerged, among other factors. The end result is cheaper buoyancy with higher accuracy. It's not perfect, but it scales really well, so you can make it as accurate as you want for some additional performance cost. Another very important thing to note is that keeping calculations asynchronous is a big win. There are many ways to do this; the two that I use a lot are the built-in async physics scene and manually spreading calculations over multiple frames. There is a caveat to the first: dynamic objects in the async and sync scenes normally can't interact with each other (static objects are in both scenes), so you have to make sure that all objects that should interact are in the same scene, or do the cross tracing/sweeping manually. For the second, you can, for example, put ships into batches and process one batch per frame, and with a bit of tweaking, you can harness some very good results. Now, on to the visual side.
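As a rough illustration of the batching idea (the class and the numbers are mine, not the actual implementation), updating one batch of ships per frame can look something like this:

```cpp
#include <vector>

struct Ship { /* physics state of one floating object */ };

// Stub for the per-ship update described above (sample the ocean, apply forces).
void UpdateBuoyancy(Ship& ship, float deltaTime) { (void)ship; (void)deltaTime; }

// Spread the buoyancy work over several frames: with 1000 ships and 10 batches,
// only 100 ships run their relatively expensive update on any given frame.
class ShipBatcher
{
public:
    explicit ShipBatcher(int numBatches) : mNumBatches(numBatches) {}

    void Add(Ship* ship) { mShips.push_back(ship); }

    void Tick(float deltaTime)
    {
        for (std::size_t i = mCurrentBatch; i < mShips.size(); i += mNumBatches)
        {
            // Pass the time elapsed since this ship's last update,
            // which is roughly numBatches frames, not one.
            UpdateBuoyancy(*mShips[i], deltaTime * mNumBatches);
        }
        mCurrentBatch = (mCurrentBatch + 1) % mNumBatches;
    }

private:
    std::vector<Ship*> mShips;
    int mNumBatches = 1;
    std::size_t mCurrentBatch = 0;
};
```

With a bit of tweaking of batch count and size, this alone buys back a lot of CPU time.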


Performance-heavy point lights on the left side, culled. Depth/stencil lights on the right side: cheap, visible.

The lights in the video are not point lights, and there are multiple reasons for that. First, it would kill the performance; even if you tweak the attenuation radius, there is no way you're getting 1000 lights on the screen, especially with so many other things going on. Even if you could, the lights get culled at a certain distance; you can manually set light culling to a very low number, but you won't get much better visual results, just an additional performance drop. All lights are drawn in a pre- or post-tonemapping phase, depending on how far they are from the camera; a similar system to that of GTA V, but I didn't use tiles/images here, because I didn't want to cover too much of the screen with lights and distract from the ships. Also, you probably wouldn't use that up close anyway; especially in VR, you would spot the 2D nature of the post-process light halo texture. A quick note on the ships: in reality they usually have an additional white light at the back, a green light on their right and a red light on their left. I tried it but decided against it in the end, because of the nicer atmosphere these white lights created. The water splash/foam around the ships is noise, generated in the water shader by reading from the distance fields that the ships are in. There is no way in hell you could use particle effects for each ship here; the particle emitters would totally bottleneck your CPU. Yes, even if you use GPU particles, they still cost you a bit of CPU; in fact, make sure to always measure GPU/CPU performance to properly balance your resource usage.

Take a look at the edges of the screen and the reflections; there is no screen space reflection fading in the right image.

It looks really nice, but performance-wise I don't have a silver bullet for this problem. I made a system where, when you render at a higher screen percentage, you can take part of the render target scale and use it to render at a slightly higher FOV; you can then crop the result in a full screen/post effect shader and remove any screen space reflection fading at the edges. To completely remove it, in my system, you have to render at about 115% resolution, but at 110% there are still almost no artifacts. I was playing with the idea of optimizing this wide reflection so that only a limited set of objects would actually get rendered outside of the final view area, with basic shading, ignoring translucency etc. I'm not sure how well that would hold up against the alternative approaches. For VR, there is now an additional rendering path called "Hybrid stereo" that combines the instanced approach with the more brute-force, two-pass approach (left eye, right eye).
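As a back-of-the-envelope illustration of the overscan idea (my own interpretation and numbers, not the actual implementation), the wider FOV needed for a given linear render-target scale works out roughly like this:

```cpp
#include <cmath>

// Render a slightly wider FOV, then crop back to the original view so the
// screen space reflection fading lands outside the final image. This assumes
// the scale is a linear (per-axis) factor, which is a simplification.
float OverscannedFovDegrees(float baseFovDegrees, float linearScale /* e.g. 1.15f */)
{
    const float pi = 3.14159265f;
    const float halfFovRad = baseFovDegrees * 0.5f * pi / 180.0f;
    // For a perspective projection, the visible half-width on the image plane
    // grows with tan(fov / 2), so scaling the image plane scales that tangent.
    const float newHalfFovRad = std::atan(std::tan(halfFovRad) * linearScale);
    return newHalfFovRad * 2.0f * 180.0f / pi;
}
// Example: OverscannedFovDegrees(90.0f, 1.15f) is roughly 98 degrees.
```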

Left eye pass is blue and right is green; notice that only the sea gets rendered in two passes here.


Unreal Engine, when running in instanced stereo, ideally renders only the left eye and instances everything else, but not every rendering path can yet be instanced, in or outside of the engine, so that's why combining both approaches comes in handy sometimes. The good thing about this is that you can make third party libraries that don't explicitly support instancing work efficiently with UE4's instanced stereo rendering system. It adds a "Hybrid stereo" check option for each component in the scene, so that you can manually select the parts of the scene that you want to render for each eye. Usually I use filmic tonemapping, which doesn't desaturate blacks as much as Reinhard, for most of my work and I have it ready in the post shader, but I've left it off in this video for some reason; the sad thing is that so many little things (maybe not tonemapping) would get overlooked anyway because of the pretty low video quality on YouTube. It looks so much crisper when running in real-time. For VR, there are a couple of really cool things. First and foremost, the instanced stereo that Epic built in is awesome. But you do have to make it work with other things, like the third party libraries I mentioned that issue draw calls on their own. Unreal draws instanced stereo by calling DrawIndexedInstanced on DirectX (10 or higher) or DrawElementsInstanced on OpenGL (3.1 or higher), and if you can't draw meshes/objects by calling that yourself, then you have to render in two passes; a nastier approach would be to hook into DX, intercept, modify and forward the draw calls. The second and arguably the most amazing VR feature is NVIDIA's MultiRes rendering. The MultiRes image below is not rendered in stereo, so that the effect is more apparent.
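Going back to the instanced route for a second, here's a minimal sketch of what that single call looks like at the D3D11 level; the buffer setup and the per-eye logic in the vertex shader (picking a view matrix by SV_InstanceID) are assumed and not shown.

```cpp
#include <d3d11.h>

// Instead of issuing one draw per eye, the mesh is drawn once with two
// instances; the vertex shader is expected to route each instance to the
// correct eye (view/projection and viewport half).
void DrawMeshStereoInstanced(ID3D11DeviceContext* context, UINT indexCount)
{
    const UINT instancesPerDraw = 2;       // one instance per eye

    context->DrawIndexedInstanced(
        indexCount,                        // indices per instance (the mesh itself)
        instancesPerDraw,                  // left eye + right eye
        0,                                 // StartIndexLocation
        0,                                 // BaseVertexLocation
        0);                                // StartInstanceLocation
}
```

A library that only ever issues the plain non-instanced draw for its own meshes can't take this path, which is exactly where the two-pass "Hybrid stereo" fallback comes in.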

The red rectangle represents a pixel density of 1.0; everything outside of it is rendered at 0.2 density (an exaggeration).

This can save a crazy amount of pixels rendered to the screen. It's not used in the video, as it's not running in VR, but let me explain it a bit anyway. HMD lenses distort the image, most noticeably at the edges. What you may not know is that this is the primary reason we have to render at a much higher resolution to get 1:1 pixel density at the lens center. What MultiRes does is exactly that: it renders at multiple resolutions. You can specify the relative resolution of the off-center areas and the relative size of those areas, and with that you can save up to 40% of the pixels that you have to shade without even noticing the difference. Note that you can easily save 60% or something similar, but you may notice the lower resolution at that point, usually in the form of a line that's the result of a super sharp to super blurry transition; it's heavily dependent on the scene you are looking at. Pretty cool, right? Well, it's not free resource-wise, meaning it will cost you some performance to use, so the trick, as always, is to end up with a net performance gain. The more pixels you shade and the more complex your shaders are, the better the performance increase you'll get. Undoubtedly, this is the dawn of the foveated rendering that we'll all be using in the future.
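For a feel of the numbers (my own back-of-the-envelope math, not NVIDIA's), treating the viewport as a full-density center region plus reduced-density borders gives a quick estimate of the savings:

```cpp
#include <cstdio>

// Estimate the fraction of pixels that still need shading with MultiRes-style
// rendering. "outerDensity" is treated here as the fraction of pixels kept in
// the off-center regions, which is a simplification.
float ShadedPixelFraction(float centerWidthFrac, float centerHeightFrac, float outerDensity)
{
    const float centerArea = centerWidthFrac * centerHeightFrac;  // full-density area
    const float outerArea  = 1.0f - centerArea;                   // reduced-density area
    return centerArea + outerArea * outerDensity;
}

int main()
{
    // A center region covering 70% x 70% of the view with the borders at 0.7 density
    // keeps roughly 0.49 + 0.51 * 0.7 = 0.85 of the pixels, i.e. ~15% saved;
    // more aggressive settings push this toward the 40-60% range mentioned above.
    std::printf("%.2f\n", ShadedPixelFraction(0.7f, 0.7f, 0.7f));
    return 0;
}
```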

A quick note on GameWorks. Yes, it uses NVIDIA's GameWorks (WaveWorks and VRWorks), and I do have a couple of other NV features integrated in my branch of the engine, but honestly, GameWorks is really only as good as its implementation; at its core it is a very polished library of algorithms.

For a bit more about it, read NVIDIA's blog post about this scene.

Demo scene information


  • Running at 1920x1080 with a 133% render target (screen percentage).
  • Graphics settings are all on Epic (the max in Unreal Engine).
  • Runs at avg. ~45 FPS, min. ~30 FPS, max. ~60 FPS.
At this point, 95% of the time, my CPU is the bottleneck.

Made and ran on this PC:

Motherboard: Z97-DELUXE(NFC & WLC)
PSU: Cooler Master "V Series" V850
CPU: Intel® Core™ i7-4790K 4.6GHz
CPU cooler: Hydro Series™ H100i Extreme Performance
GPU: GeForce GTX TITAN X
RAM: G.SKILL Trident X Series 32GB (4 x 8GB) DDR3 2400 MHz
SSD0: Samsung SSD 850 PRO 256GB
SSD1: Samsung SSD 850 EVO 500GB
HDD: WD Green 2TB
CASE: NZXT Phantom 410 Black