- Map and Unmap with discard between draw call submissions to update the data on the GPU (see the sketch after this list).
- Double or triple buffering of big buffers used for uploading instances, bones, etc.
- Only the render thread was able to upload data easily, which meant a lot of copying and synchronisation of data from the game thread to the render thread, and then the render thread submitted it to the GPU.
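As a rough illustration of the first pattern, here is a minimal D3D11-style map-with-discard update; the instance struct, buffer and counts are hypothetical and it only shows the classic approach, not anything from the engine:

```cpp
#include <d3d11.h>
#include <cstring>

// Hypothetical per-instance data
struct Instance { float transform[16]; };

// Classic pattern: map the dynamic buffer with DISCARD, fill it, unmap, then draw
void UpdateAndDraw(ID3D11DeviceContext* context, ID3D11Buffer* instance_buffer,
                   const Instance* instances, UINT instance_count, UINT vertex_count)
{
    D3D11_MAPPED_SUBRESOURCE mapped = {};
    if (SUCCEEDED(context->Map(instance_buffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped)))
    {
        std::memcpy(mapped.pData, instances, instance_count * sizeof(Instance));
        context->Unmap(instance_buffer, 0);
    }
    context->DrawInstanced(vertex_count, instance_count, 0, 0);
}
```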
Lifetime handling
Dynamic memory management
It is implemented with an allocator inside a big reserved buffer in GPU memory (an upload heap); that memory can be used as a buffer or a vertex buffer. The allocator reserves a segment of memory for each job working on a frame when it needs one, and returns memory from inside that segment until it is full; then it allocates another segment. Each segment knows in which frame it was used, so the renderer keeps the segments alive until the GPU has finished that frame.
Using this system we can update GPU memory from all the workers (game and render jobs can write directly into GPU memory), and because we know in which frame a segment was used, we know when we can reuse it for another allocation. With this implementation we only need to synchronise inside the allocator when accessing the segments, in case two workers allocate a segment at the same time or the renderer frees segments because the GPU has finished needing them. Once a segment is associated with a worker/frame, it doesn't need to sync with other threads, as the segment cannot be accessed from other threads, so the linear allocator inside a segment doesn't need any synchronisation.
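A minimal sketch of the idea (the names, fixed segment size and free-list handling are my own illustration, not the actual code; the real implementation is linked at the end of the post):

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Illustrative segment allocator over a big reserved GPU buffer
class SegmentAllocator
{
public:
    struct Segment
    {
        size_t offset = 0;        // Offset of the segment inside the big GPU buffer
        size_t current = 0;       // Linear allocation cursor inside the segment
        uint64_t frame_index = 0; // Frame in which the segment was used, for lifetime tracking
    };

    SegmentAllocator(size_t buffer_size, size_t segment_size)
    {
        // Carve the big reserved buffer into fixed-size segments
        for (size_t offset = 0; offset + segment_size <= buffer_size; offset += segment_size)
        {
            Segment segment;
            segment.offset = offset;
            m_free_segments.push_back(segment);
        }
    }

    // Called by a worker when it needs a new segment for the current frame
    // (it has none yet, or its current one is full). Only this and FreeSegments need the lock.
    Segment AllocSegment(uint64_t frame_index)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        // Sketch assumes a free segment is always available
        Segment segment = m_free_segments.back();
        m_free_segments.pop_back();
        segment.current = 0;
        segment.frame_index = frame_index;
        m_live_segments.push_back(segment);
        return segment;
    }

    // Called by the renderer once the GPU has finished a frame, so its segments can be reused
    void FreeSegments(uint64_t completed_frame_index)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        for (auto it = m_live_segments.begin(); it != m_live_segments.end();)
        {
            if (it->frame_index <= completed_frame_index)
            {
                m_free_segments.push_back(*it);
                it = m_live_segments.erase(it);
            }
            else
            {
                ++it;
            }
        }
    }

private:
    std::mutex m_mutex;
    std::vector<Segment> m_free_segments;
    std::vector<Segment> m_live_segments;
};

// Linear allocation inside a worker's own segment: no synchronisation needed,
// as only that worker touches the segment during the frame
inline size_t AllocFromSegment(SegmentAllocator::Segment& segment, size_t size)
{
    const size_t buffer_offset = segment.offset + segment.current;
    segment.current += size;
    return buffer_offset; // Offset inside the big mapped GPU buffer
}
```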
The interface is really simple:
```cpp
//Alloc dynamic gpu memory
void* AllocDynamicGPUMemory(display::Device* device, const size_t size, const uint64_t frame_index);
```
Just allocate the memory and fill it; you can then calculate the offset inside the buffer (to pass to your GPU shaders) from the base pointer of the resource and the returned pointer.
Sample:
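A minimal usage sketch, assuming a hypothetical GPUInstance struct and a GetDynamicGPUMemoryBasePointer helper for the mapped base pointer of the resource (only AllocDynamicGPUMemory comes from the interface above):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

namespace display { class Device; }

// From the interface above
void* AllocDynamicGPUMemory(display::Device* device, const size_t size, const uint64_t frame_index);
// Hypothetical helper returning the mapped base pointer of the big dynamic buffer
void* GetDynamicGPUMemoryBasePointer(display::Device* device);

// Hypothetical per-instance data to upload this frame
struct GPUInstance
{
    float transform[12];
};

void UploadInstances(display::Device* device, uint64_t frame_index,
                     const GPUInstance* instances, size_t count)
{
    const size_t size = count * sizeof(GPUInstance);

    // Allocate dynamic GPU memory for this frame and fill it directly from this worker thread
    void* gpu_memory = AllocDynamicGPUMemory(device, size, frame_index);
    std::memcpy(gpu_memory, instances, size);

    // Offset inside the big dynamic buffer, calculated from the base pointer and the returned pointer
    const size_t offset =
        static_cast<size_t>(reinterpret_cast<uint8_t*>(gpu_memory) -
                            reinterpret_cast<uint8_t*>(GetDynamicGPUMemoryBasePointer(device)));

    // Pass the offset to the shaders (for example as a root constant) and record the draw call
    (void)offset;
}
```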
The clear benefit is that we submit to the GPU from all the worker threads at the same time, copying directly into GPU memory, avoiding an intermediate buffer and with almost no contention. The downside is that we issue more draw calls, as each worker thread needs its own draw call. Of course, if we are drawing a lot of instances it is not a big issue (modern GPUs are good at overlapping draw calls); instead of one draw call we will have at least 6 (on a 6-core CPU), and more if a worker fills its memory block. In the ECS test the benefits outweigh the drawbacks.
Code: https://github.com/JlSanchezB/Cute/blob/master/engine/render_module/render_segment_allocator.h