Entries for tag "optimization", ordered from most recent. Entry count: 5.
# Rendering Optimization - My Talk at Warsaw University of Technology
If you happen to be in Warsaw tomorrow (2017-12-13), I'd like to invite you to my talk at Warsaw University of Technology. On the weekly meeting of Polygon group, this time the first talk will be about about artificial intelligence in action games (by Kacpi), followed by mine about rendering optimization. It will be technical, but I think it should be quite easy to understand. I won't show a single line of code. I will just give some tips for getting good performance when rendering 3D graphics on modern GPUs. I will also show some tools that can help with performance profiling. It will be all in Polish. The event starts at 7 p.m. Entrance is free. See also Facebook event. Traditionally after the talks we all go for a beer :)
# When QueryPerformanceCounter call takes long time
QueryPerformanceCounter function is all about measuring time and profiling performance, so I wasn't able to formulate right Google query to find a solution to the problem I had - call to
QueryPerformanceCounter function itself taking too much time. Below I describe what I eventually found out.
It all started from hardware failure. My motherboard stopped working, so I needed to buy a new one (ASRock X370 Killer SLI). I know that normally changing motherboard requires reinstalling Windows, but I tried not to do it. The system didn't want to boot, so I booted the PC using pendrive with Windows installer and launched the repair function. It helped - after that Windows was able to start and everything seemed to work... until I launched the program that I develop on that machine. It was running painfully slow.
I tried different things to find out what is happening. Input/output to hard drive or network was not an issue. GPU performance was also OK. It seemed that the app is just doing its calculations slowly, like the CPU was very slow. I double-checked actual CPU and RAM frequency, but it was OK. Finally I launched sampling profiler (the one embedded in Visual Studio - command: Analyze > Performance Profiler). This way I found that most of the time is spent in function
This WinAPI function is the recommended way to obtain a timestamp in Windows. It's very precise, monotonic, safe to use on multiple cores and threads, it has stable frequency independent of CPU power management or Turbo Boost... It's just great, but in order to meet all these requirements, Windows may use different methods to implement it, as described in article Acquiring high-resolution time stamps. Some of them are fast (just reading TSC register), others are slow (require system call - transition to kernel mode).
I wrote a simple C++ program that tests how long it takes to execute
QueryPerformanceCounter function. You can see the code here: QueryPerformanceCounterTest.cpp and download 64-bit binary here: QueryPerformanceCounterTest.zip. Running this test on two different machines gave following results:
CPU: Intel Core i7-6700K, Motherboard: GIGABYTE Z170-HD3-CF:
> QueryPerformanceCounterTest.exe 1000000000
Executing QueryPerformanceCounter x 1000000000...
According to GetTickCount64 it took 0:00:11.312 (11.312 ns per call)
According to QueryPerformanceCounter it took 0:00:11.314 (11.314 ns per call)
CPU: AMD Ryzen 7 1700X, Motherboard: ASRock X370 Killer SLI (changed from different model without system reinstall):
> QueryPerformanceCounterTest.exe 10000000
Executing QueryPerformanceCounter x 10000000...
According to GetTickCount64 it took 0:00:24.906 (2490.6 ns per call)
According to QueryPerformanceCounter it took 0:00:24.911 (2491.1 ns per call)
As you can see, the function takes 11 nanoseconds on first platform and 2.49 microsenonds (220 times more!) on the second one. This was the cause of slowness of my program. The program calls this function many times.
I tried to fix it and somehow convince Windows to use the fast implementation. I uninstalled and reinstalled motherboard drivers - the latest ones downloaded from manufacturer website. I upgraded and downgraded BIOS to different versions. I booted the system from Windows installation media and "repaired" it again. I restored default settings in UEFI/BIOS and tried to change "ACPI HPET Table" option there to Disabled/Enabled/Auto. Nothing worked. Finally I restored Windows to factory settings (Settings > Update & Security > Recovery > Reset this PC). This solved my problem, but unfortunately it's like reinstalling Windows from scratch - now I need to install and configure all the apps again. After that the function takes 22 ns on this machine.
My conclusions from this "adventure" are twofold:
QueryPerformanceCounterto execute slowly on some platforms, like for 2.5 microseconds. If you call it just once per rendering frame then it doesn't matter, but you shouldn't profile every small portion of your code with it, calling it millions of times.
Update 2017-12-11: A colleague told me that enabling/disabling HPET using "bcdedit" system command could possibly help for that issue.
# Experience with Game in Scheme
Last week I talked with a colleague about optimizing performance of his game project. It's written in Scheme (functional programming language, dialect of Lisp), using SDL and OpenGL. The bottleneck was GUI rendering.
When analyzing code, we found that the function for rendering text renders TTF font to a new surface with function
TTF_RenderUTF8_*, then creates another
SDL_Surface to convert format, then creates and fills OpenGL texture with
glTexImage2D. The texture is then used just once for rendering... to another freshly allocated
SDL_Surface, to perform clipping. All this happens for every GUI control, in every frame! Finally the objects are freed by Scheme garbage collector.
He then asked me whether we should find a way not to render the GUI part of the screen every frame, but only when something changes. He also cited the famous and often misused quote of Donald Knuth: "Premature optimization is the root of all evil".
I explained to him that modern GPU-s are able to render millions of triangles every frame and redrawing whole screen every frame is a standard practice. It's resource creation (allocation and filling of surfaces and textures) what should be avoided and not done every frame. If using SDL_ttf for text rendering, a final texture/surface should be prepared once and used to do just rendering in every frame, until the text changes. Additionally, calls like
glDisable(GL_DEPTH_TEST) also don't have to be made for every object in every frame.
When thinking about it now, the general rule of game optimization could be:
About functional programming and stuff like that, I think such high level concepts are good for software development as long as they are accompanied with understanding of what's under the hood. Some principles of functional programming, like avoiding side effects and just transforming one list of items to the other, are similar to the idea of DOD (Data-Oriented Design), used in game development for efficiency and scalability. But I don't agree that thinking about efficiency should be avoided until necessary or that optimization makes code complex and unreadable. Quite contrary - I believe simple code is both readable and efficient.
As I have little experience with functional programming, for me it was also fascinating to learn basics of Scheme. It's based on simple principles yet it is so powerful. I'm now thinking about how a language like this could be applied everywhere, from program configuration and as a description language, to game scripting and procedural media generation :)
# Writing Efficient C++ Code - my Article in ProgramistaMag
My article - "Writing Efficient C++ Code" - is about achieving best possible efficiency of native C++ code. It describes data-oriented design as an alternative to pure object-oriented philosophy and shows its advantages, mostly related to consciously designed layout of data in memory, which makes good use of CPU cache. It also mentions operations that should be avoided if code is to be efficient, shows some language tricks and Visual C++ project options that help with generating efficient code and mentions parallelization.
Besides, in each issue of the magazine you can find interesting articles about different programming languages, libraries and technologies, as well as interviews, book reviews and other articles related to software development. The magazine is available for subscription in electronic and paper form.
# IGK Conference - Our Slides
8th Polish Game Engineering Conference (VIII Ogólnopolska Konferencja Inżynierii Gier Komputerowych) IGK-8'2011 is over so now we can publish slides from our presentation:
It's in Polish. Here is abstract of the paper:
(Polish) Artykuł opisuje wady programowania obiektowego – zarówno od strony projektowej, jak i ze względu na wydajność kodu. Porusza problem opóźnienia w dostępie do pamięci RAM we współczesnych architekturach komputerowych. Przedstawia programowanie zorientowane na dane (ang. DOD – Data-Oriented Design) jako alternatywne podejście do projektowania i implementowania silnika gry kładące nacisk na optymalizację struktur danych pod kątem szybkości. Porusza także problem wydajności poszczególnych konstrukcji języka C++.
(English) Paper describes pitfalls of object-oriented programming – from the design perspective, as well as regarding code performance. It mentions problem of latency in accessing data in RAM memory on today computer architectures. It shows DOD (Data-Oriented Design) as an alternative approach to design and implementation of a game engine focused on optimizing data structures in terms of performance. It also describes efficiency of different C++ language constructs.