Compute Shader


Graphics Programming


minseoklee 2015. 10. 4. 04:22

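※ The example code is taken from the OpenGL SuperBible.

- The compute shader stage is a separate pipeline, disconnected from OpenGL's other shader stages
  - It has no fixed inputs or outputs
- From a programming standpoint it is the same as the other shaders
  - Written in GLSL, represented as a shader object, and linked into a program object
- Compute shaders cannot be mixed with other shaders
  - A compute shader cannot be attached to a program that already has vertex or fragment shaders (linking fails)
  - A linked program contains either only a compute shader or only graphics shaders (vertex, tessellation, geometry, fragment)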

// A compute shader that does nothing
#version 430 core

layout(local_size_x = 32, local_size_y = 32) in;

void main() {}

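// Dispatching a compute shader
void glDispatchCompute(GLuint num_groups_x, GLuint num_groups_y, GLuint num_groups_z);
void glDispatchComputeIndirect(GLintptr indirect);

- indirect: byte offset into the buffer object bound to GL_DISPATCH_INDIRECT_BUFFER. The data stored at that offset holds the parameters that would otherwise be passed to glDispatchCompute()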
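For reference, the parameters that glDispatchComputeIndirect() reads from the buffer are three tightly packed unsigned integers, in the same order as the arguments of glDispatchCompute(). The sketch below only models that layout on the CPU. The struct name and the commandOffset() helper are made up for illustration and are not part of the GL API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical CPU-side mirror of the data glDispatchComputeIndirect() reads:
// three 32-bit unsigned integers, in glDispatchCompute() argument order.
struct DispatchIndirectCommand {
    std::uint32_t num_groups_x;
    std::uint32_t num_groups_y;
    std::uint32_t num_groups_z;
};

static_assert(sizeof(DispatchIndirectCommand) == 12,
              "command must be three tightly packed 32-bit values");

// Byte offset of the index-th command when several are stored back to back.
// This is the kind of value that would be passed as the `indirect` argument.
constexpr std::ptrdiff_t commandOffset(std::size_t index) {
    return static_cast<std::ptrdiff_t>(index * sizeof(DispatchIndirectCommand));
}
```

Storing several commands in one buffer and selecting one by offset is one way to let the GPU, or an earlier compute pass, decide how much work a later dispatch performs.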

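Work groups

- Compute shaders execute in units of work groups
- Each call to glDispatchCompute[Indirect]()
  - submits one global work group to GL
  - the global work group is divided into local work groups
  - the number of local work groups in the x, y, z directions is num_groups_x/y/z; the size of each local work group is set in the shader with local_size_x/y/z (default: 1/1/1)
- A work group is a 3D block of work items
  - the compute shader runs once for each work item
  - maximum number of work items per local work group: GL_MAX_COMPUTE_WORK_GROUP_INVOCATIONS (guaranteed to be at least 1024)
  - maximum size in each direction: GL_MAX_COMPUTE_WORK_GROUP_SIZE, guaranteed to be at least (1024, 1024, 64) for (x, y, z)
- Querying the local work group size of a linked program
  - int size[3];
  - glGetProgramiv(program, GL_COMPUTE_WORK_GROUP_SIZE, size);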
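As a rough illustration of how these limits and sizes interact, the following sketch checks a local work group size against the guaranteed minimum limits and computes how many local work groups are needed to cover a given number of work items. It is plain C++ with no GL calls. The names fitsGuaranteedLimits() and groupCount() are hypothetical, and a real application would query the actual limits from the driver instead of hard-coding the minimums.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Minimum values guaranteed for the limits; actual drivers may report more.
constexpr std::uint32_t kMinMaxInvocations = 1024;
constexpr std::uint32_t kMinMaxGroupSize[3] = {1024, 1024, 64};

// True if a local work group size (x, y, z) fits within the guaranteed limits.
bool fitsGuaranteedLimits(const std::array<std::uint32_t, 3>& local) {
    for (int i = 0; i < 3; ++i) {
        if (local[i] == 0 || local[i] > kMinMaxGroupSize[i]) return false;
    }
    const std::uint64_t total =
        static_cast<std::uint64_t>(local[0]) * local[1] * local[2];
    return total <= kMinMaxInvocations;
}

// Number of local work groups needed to cover `items` work items (rounds up).
std::uint32_t groupCount(std::uint32_t items, std::uint32_t localSize) {
    assert(localSize > 0);
    return (items + localSize - 1) / localSize;
}
```

For example, a 32 x 32 x 1 local size uses exactly 1024 invocations and fits, while 64 x 32 x 1 stays within the per-direction limits but exceeds the total invocation limit.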

Inputs and outputs

- There are no built-in outputs
  - Unlike other shaders, user-defined outputs cannot be declared either
- Built-in input variables
  - uvec3 gl_LocalInvocationID
  - uvec3 gl_WorkGroupSize
  - uvec3 gl_NumWorkGroups
  - uvec3 gl_WorkGroupID
  - uvec3 gl_GlobalInvocationID = gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID
- Results must be written to memory directly from the shader code
  - writing to a shader storage block
  - image functions
  - incrementing or decrementing atomic counters
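The gl_GlobalInvocationID formula can be checked on the CPU. The sketch below is an emulation, not GL code: it evaluates the formula for every work item of a small simulated 2D dispatch and counts how often each grid cell is visited. UVec3, globalInvocationID() and coverage() are hypothetical names introduced only for this illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

using UVec3 = std::array<std::uint32_t, 3>;

// gl_GlobalInvocationID = gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID
UVec3 globalInvocationID(const UVec3& groupID, const UVec3& groupSize,
                         const UVec3& localID) {
    UVec3 result{};
    for (int i = 0; i < 3; ++i) {
        result[i] = groupID[i] * groupSize[i] + localID[i];
    }
    return result;
}

// Simulates a 2D dispatch and counts how many times each cell is visited.
std::vector<int> coverage(std::uint32_t groupsX, std::uint32_t groupsY,
                          std::uint32_t sizeX, std::uint32_t sizeY) {
    const std::uint32_t width = groupsX * sizeX;
    std::vector<int> hits(static_cast<std::size_t>(width) * groupsY * sizeY, 0);
    for (std::uint32_t gy = 0; gy < groupsY; ++gy)
        for (std::uint32_t gx = 0; gx < groupsX; ++gx)
            for (std::uint32_t ly = 0; ly < sizeY; ++ly)
                for (std::uint32_t lx = 0; lx < sizeX; ++lx) {
                    const UVec3 g = globalInvocationID({gx, gy, 0u},
                                                       {sizeX, sizeY, 1u},
                                                       {lx, ly, 0u});
                    ++hits[static_cast<std::size_t>(g[1]) * width + g[0]];
                }
    return hits;
}
```

Every cell ends up visited exactly once, which is why the global ID can be used directly as a texel coordinate or array index.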

// Output through image variables: invert every texel
#version 430 core

layout(local_size_x = 32, local_size_y = 32) in;

layout(binding = 0, rgba32f) uniform image2D img_input;
layout(binding = 1, rgba32f) uniform image2D img_output;

void main() {
    vec4 texel;
    ivec2 p = ivec2(gl_GlobalInvocationID.xy);
    texel = imageLoad(img_input, p);
    texel = vec4(1.0) - texel;
    imageStore(img_output, p, texel);
}

// Host side: bind the input and output textures to image units 0 and 1, then dispatch
glBindImageTexture(0, tex_input, 0, GL_FALSE, 0, GL_READ_ONLY, GL_RGBA32F);
glBindImageTexture(1, tex_output, 0, GL_FALSE, 0, GL_WRITE_ONLY, GL_RGBA32F);
glDispatchCompute(IMAGE_WIDTH / 32, IMAGE_HEIGHT / 32, 1);
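The dispatch above assumes that the image dimensions are multiples of 32. A common way to handle other sizes is to round the group counts up and skip the invocations that fall outside the image. The sketch below emulates that on the CPU for a single-channel float image. invertImage() is a hypothetical helper, not taken from the book, and it could also serve as a reference result when validating the shader's output.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// CPU emulation of the inversion shader for a single-channel float image.
// Group counts are rounded up, so some invocations land past the image edge.
std::vector<float> invertImage(const std::vector<float>& input,
                               std::uint32_t width, std::uint32_t height,
                               std::uint32_t localSize = 32) {
    assert(input.size() == static_cast<std::size_t>(width) * height);
    std::vector<float> output(input.size());
    const std::uint32_t groupsX = (width + localSize - 1) / localSize;
    const std::uint32_t groupsY = (height + localSize - 1) / localSize;
    for (std::uint32_t y = 0; y < groupsY * localSize; ++y) {
        for (std::uint32_t x = 0; x < groupsX * localSize; ++x) {
            // Plays the role of an early return in the shader when
            // gl_GlobalInvocationID.xy lies outside the image.
            if (x >= width || y >= height) continue;
            const std::size_t i = static_cast<std::size_t>(y) * width + x;
            output[i] = 1.0f - input[i];
        }
    }
    return output;
}
```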

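Synchronization

Execution (flow control) synchronization: barrier()

Each invocation waits at barrier() until every invocation in the same work group has reached that call

Memory synchronization

memoryBarrier() : applies a memory barrier to all writes issued so far to images, buffers, and shared variables, so that their results become visible to other invocations before any later accesses

memoryBarrierShared(), memoryBarrierImage(), memoryBarrierBuffer() : the same, but each applies only to its own kind of memory

groupMemoryBarrier() : like memoryBarrier(), but the ordering is only guaranteed among invocations in the same work group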

Prefix sum (subsum) example code

// Requires <cassert>, <iostream>, <string>, <vector> and an OpenGL 4.3 context.
// Test compute shader: inclusive prefix sum over 2 * local_size_x elements.
// Each invocation handles two elements, so every array holds gl_WorkGroupSize.x * 2 values.
std::string cshader = R"(
#version 430 core

layout (local_size_x = 128) in;

// std430 keeps the uint arrays tightly packed, matching the host-side std::vector.
layout (binding = 0, std430) coherent buffer block1 {
    uint input_data[gl_WorkGroupSize.x * 2];
};

layout (binding = 1, std430) coherent buffer block2 {
    uint output_data[gl_WorkGroupSize.x * 2];
};

shared uint shared_data[gl_WorkGroupSize.x * 2];

void main() {
    uint id = gl_LocalInvocationID.x;
    uint rd_id, wr_id, mask;
    const uint steps = uint(log2(float(gl_WorkGroupSize.x))) + 1;
    uint step = 0;

    // Copy two elements per invocation into shared memory.
    shared_data[id * 2] = input_data[id * 2];
    shared_data[id * 2 + 1] = input_data[id * 2 + 1];
    barrier();
    memoryBarrierShared();

    for (step = 0; step < steps; step++) {
        mask = (1u << step) - 1u;
        rd_id = ((id >> step) << (step + 1)) + mask;
        wr_id = rd_id + 1 + (id & mask);
        shared_data[wr_id] += shared_data[rd_id];
        barrier();
        memoryBarrierShared();
    }

    output_data[id * 2] = shared_data[id * 2];
    output_data[id * 2 + 1] = shared_data[id * 2 + 1];
}
)";

// createComputeProgram() is not a GL function. It is a user-defined helper,
// assumed here to compile and link the source and return the program object.
GLuint computeProgram = createComputeProgram(cshader);
assert(computeProgram);

// Test data: 256 ones (128 invocations * 2 elements each)
std::vector<GLuint> subsum_data(256, 1);

GLuint buf_in;
glGenBuffers(1, &buf_in);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, buf_in);
glBufferData(GL_SHADER_STORAGE_BUFFER, sizeof(GLuint) * subsum_data.size(), subsum_data.data(), GL_DYNAMIC_COPY);

GLuint buf_out;
glGenBuffers(1, &buf_out);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, buf_out);
glBufferData(GL_SHADER_STORAGE_BUFFER, sizeof(GLuint) * subsum_data.size(), NULL, GL_DYNAMIC_COPY);

// Run the subsum shader. A single work group covers the whole array.
glUseProgram(computeProgram);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, buf_in);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 1, buf_out);
glDispatchCompute(1, 1, 1);
// Make the shader's buffer writes visible to the mapped read below.
glMemoryBarrier(GL_BUFFER_UPDATE_BARRIER_BIT);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, 0);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 1, 0);

// Validation: with all-ones input the output should read 1 2 3 ... 256
glBindBuffer(GL_SHADER_STORAGE_BUFFER, buf_out);
GLuint* subsum_result = reinterpret_cast<GLuint*>(glMapBuffer(GL_SHADER_STORAGE_BUFFER, GL_READ_ONLY));
for (std::size_t i = 0; i < subsum_data.size(); ++i) {
    std::cout << subsum_result[i] << ' ';
}
std::cout << std::endl;
glUnmapBuffer(GL_SHADER_STORAGE_BUFFER);
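To see why the rd_id / wr_id indexing produces a prefix sum, the same steps can be replayed on the CPU. The sketch below is an emulation under stated assumptions, not the book's code: emulateSubsum() is a hypothetical name, the loop over id within a step stands in for the invocations of one work group, and moving on to the next step stands in for barrier(). Running the ids one after another is safe because, within a step, no invocation writes to an element that another invocation reads.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// CPU emulation of the shader's scan. data.size() must be a power of two >= 2;
// each emulated invocation handles one read/write pair per step.
std::vector<std::uint32_t> emulateSubsum(std::vector<std::uint32_t> data) {
    const std::uint32_t invocations = static_cast<std::uint32_t>(data.size() / 2);
    std::uint32_t steps = 0;
    while ((static_cast<std::size_t>(1) << steps) < data.size()) ++steps;  // log2(size)

    for (std::uint32_t step = 0; step < steps; ++step) {
        const std::uint32_t mask = (1u << step) - 1u;
        for (std::uint32_t id = 0; id < invocations; ++id) {
            // Last element of the left half of the current block ...
            const std::uint32_t rd_id = ((id >> step) << (step + 1)) + mask;
            // ... is added to one element of the right half.
            const std::uint32_t wr_id = rd_id + 1 + (id & mask);
            data[wr_id] += data[rd_id];
        }
    }
    return data;
}
```

With the input {3, 1, 4, 1}, the first step yields {3, 4, 4, 5} and the second yields {3, 4, 8, 9}, the inclusive prefix sum.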

License: Attribution-NonCommercial-ShareAlike (CC BY-NC-SA)
