09-26-18, 10:56 AM   #1
Blooblahguy
A Deviate Faerie Dragon
Join Date: Oct 2009
Posts: 18
oUF performance

Hello,

I'm finding that oUF has been a huge CPU hog this expansion compared to previous ones. I think this is in large part due to the way Blizzard has changed its Lua implementation and how we now need to fetch auras and combat log entries. However, I'm wondering whether there are plans to implement performance improvements in the oUF core. I'd be happy to submit pull requests on the official GitHub to that end, but I'm not sure whether that would step on toes or whether such requests are largely ignored with such a large userbase. I'll put some notes below on performance problems I see in oUF.

Things oUF could do better for much faster function cycles:
* Localizing common functions in each script. Things like UnitAura / UnitBuff / UnitDebuff / UnitReaction / UnitThreatSituation and so on. There are easily 100 functions that could be localized, and localized function references are a minimum of 30% faster (I've profiled some cases at up to 300% faster).
* Localizing variables outside of for loops, also a massive performance increase.
* Creating tables or table templates (key sizing) outside of loops and just updating their references inside the loops.
* Result memoizing: I believe this is possible in WoW's implementation. The idea is to store function results so that when the same parameters are given again, the function simply returns the last result rather than recalculating it. This can be useful for common calls such as UnitName or for internal functions that take unique string names as input. `unit` changing frequently might make this difficult to implement for things like nameplates or raid frames. Food for thought.
* I've noticed that OnShow can cause all of a frame's elements to force an update, which seems likely unnecessary and a huge resource hog for frames that are hidden and shown frequently.
* There are also OnUpdate scripts in a number of default elements that do a lot of calculation, which I think should be revisited.
* Default Blizzard addons tend to keep running even when hidden and when their main driver has its events unregistered. Once their frames are created they have subevents registered, and these still seem to fire an unbelievably high number of times. I think when we spawn raid frames we should DisableAddon Blizzard_CompactRaidFrames, when we spawn nameplates DisableAddon Blizzard_Nameplates, and so on. I've been profiling these frames and they are absolutely decimating performance when they should be disabled. I was finding that addons like WAs were taking up > 3s of CPU time over the course of a raid fight, while CompactRaidFrame_Unit1 alone was upwards of 80s. Same goes for nameplates.
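A minimal sketch of the first two suggestions (caching a global function in a local and hoisting a nested table lookup out of the loop), in plain Lua 5.1 so it runs outside the client. `UnitHealth` here is a hypothetical stub standing in for the real API, and `frame` is a made-up table shaped loosely like a frame's color tables; neither is oUF code. It illustrates the pattern only; how much it actually saves is what the rest of the thread argues about.

```lua
-- Hypothetical stub for the WoW API, which only exists inside the client.
UnitHealth = function(unit) return 100 end

-- Made-up frame table with a nested color lookup.
local frame = { colors = { reaction = { [4] = { 0.1, 0.2, 0.3 } } } }

-- Unoptimized: every iteration looks UnitHealth up in the globals table
-- and walks frame.colors.reaction[4] from the top.
local function sumNaive(n)
  local total = 0
  for i = 1, n do
    local hp = UnitHealth("player")
    local r = frame.colors.reaction[4][1]
    total = total + hp * r
  end
  return total
end

-- Cache the global in a local; functions defined after this line see it as
-- an upvalue instead of doing a globals-table lookup.
local UnitHealth = UnitHealth

-- Optimized: local function reference, nested table resolved once.
local function sumHoisted(n)
  local color = frame.colors.reaction[4]
  local total = 0
  for i = 1, n do
    local hp = UnitHealth("player")
    local r = color[1]
    total = total + hp * r
  end
  return total
end
```

Both functions return the same result; the only difference is how much lookup work Lua repeats on each iteration.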

BFA has been a poorly optimized expansion, Uldir has been a poorly optimized raid, and these days more than ever people are running crappy addons or WAs that eat up a ton of CPU. I think oUF can help fix some of that. I have 3 addons that all use oUF, and together they are starting to get kinda CPU-heavy just from their oUF elements.
09-26-18, 02:33 PM   #2
JDoubleU00
A Firelord
Join Date: Mar 2008
Posts: 463
I'm curious, have you tried other layouts to see whether the problem shows up with them too? I don't fully understand whether the issues you're describing come from the core code or could be caused by something in the layout. Again, I'm not the most knowledgeable, so I'm just asking.
09-26-18, 03:49 PM   #3
Blooblahguy
I haven't tried other layouts, but I've tried a few things with my own layouts to try and narrow this down. It seems especially bad with the nameplate implementation, even with my layout function returning on line 1 and the nameplate callback function returning on line 1.

I'm also able to tell oUF elements apart from layout-specific elements, because oUF is technically implemented as a separate addon from my 3 addons that use it. So CPU profiling shows it as its own entry.

I think part of the problem may be that the DisableBlizzard functions are not covering what they used to. I'll investigate more when I get home today.
09-26-18, 07:26 PM   #4
lightspark
A Rage Talon Dragon Guard
Join Date: Sep 2012
Posts: 341
Originally Posted by Blooblahguy
* Localizing common functions in each script. Things like UnitAura / UnitBuff / UnitDebuff / UnitReaction / UnitThreatSituation and so on. There are easily 100 functions that could be localized, and localized function references are a minimum of 30% faster (I've profiled some cases at up to 300% faster).
Originally Posted by Blooblahguy
* Localizing variables outside of for loops, also a massive performance increase.
Oh, that was a hot topic when we were prepping oUF 7.0, which was a major revamp. After discussing it for months, we chose not to do such optimisations. In general, they're not really needed: unless you're running loops w/ literally millions of iterations, the performance gains from using local functions over global ones are kinda marginal. TBQH, in the addon dev community this is seen as a preference, not a necessity.

However, if it's something major, e.g., the 300% cases you've just mentioned, we, at least I personally, would like to hear about those.

Originally Posted by Blooblahguy
* Creating tables or table templates (key sizing) outside of loops and just updating their references inside the loops.
Examples? o_O

Originally Posted by Blooblahguy
* Result memoizing: I believe this is possible in WoW's implementation. The idea is to store function results so that when the same parameters are given again, the function simply returns the last result rather than recalculating it. This can be useful for common calls such as UnitName or for internal functions that take unique string names as input. `unit` changing frequently might make this difficult to implement for things like nameplates or raid frames. Food for thought.
That's a bad idea, we don't have any control over what data Blizz APIs send us, it's entirely possible to get a different result while using the same params because something on the back end decided so.

Originally Posted by Blooblahguy
* I've noticed that OnShow can cause all of a frame's elements to force an update, which seems likely unnecessary and a huge resource hog for frames that are hidden and shown frequently.
It's necessary to update all elements OnShow because there's no way to know whether something about that frame's unit has changed. Even a fairly persistent unit like "player" can sometimes be replaced w/ "vehicle", and units like "target" or "nameplate*" often point at different PCs/NPCs when they reappear on your screen.

Originally Posted by Blooblahguy
* There are also OnUpdate scripts in a number of default elements that do a lot of calculation, which I think should be revisited.
We try to use OnUpdate only when we really have to; if you find problematic ones, feel free to point them out. But please don't try to replace OnUpdates w/ coroutines.

Originally Posted by Blooblahguy
* Default Blizzard addons tend to keep running even when hidden and when their main driver has its events unregistered. Once their frames are created they have subevents registered, and these still seem to fire an unbelievably high number of times. I think when we spawn raid frames we should DisableAddon Blizzard_CompactRaidFrames, when we spawn nameplates DisableAddon Blizzard_Nameplates, and so on. I've been profiling these frames and they are absolutely decimating performance when they should be disabled. I was finding that addons like WAs were taking up > 3s of CPU time over the course of a raid fight, while CompactRaidFrame_Unit1 alone was upwards of 80s. Same goes for nameplates.
We absolutely shouldn't be doing this. You can't disable nameplates completely: Blizz forces the UI to use friendly nameplates in raids/dungs. Moreover, despite various parts of the Blizz UI being located in the AddOns folder, they shouldn't be treated as such. Many of them are referenced w/o any safeguards from the main UI code in the FrameXML folder; Blizz basically expects them to be enabled at all times.

As for Blizz compact raid frames, it's up to layout devs to disable them.

I think Blizzard_ArenaUI is the only exception we make, which is shady enough, and we'll prob rework it.

Last edited by lightspark : 09-27-18 at 06:14 AM.
09-27-18, 04:02 AM   #5
haste
Featured Artist
Premium Member
Featured
Join Date: Dec 2005
Posts: 1,027
Optimizations are always welcome, but they should be backed with proper profiling. You're listing a lot of micro optimizations that probably won't have a lot of impact on the bigger picture.

Originally Posted by Blooblahguy
* Localizing common functions in each script. Things like UnitAura / UnitBuff / UnitDebuff / UnitReaction / UnitThreatSituation and so on. There are easily 100 functions that could be localized, and localized function references are a minimum of 30% faster (I've profiled some cases at up to 300% faster).
* Localizing variables outside of for loops, also a massive performance increase.
You do save some by making globals local, but we're talking about maybe 10ms over 100k calls.

Putting locals outside of loops has an even smaller gain. The only place where it would have made sense is the aura element, and I'd rather take the cleaner code over a minor optimization there.

Originally Posted by Blooblahguy
* Creating tables or table templates (key sizing) outside of loops and just updating their references inside the loops.
We're already re-using (frame) tables in oUF, so there isn't really anything to be gained by this. With the aura element we could split the icons and options into separate tables, so we could benefit from Lua's array table behavior. I don't think it's worth breaking the API over it however.

Originally Posted by Blooblahguy
* Result memoizing: I believe this is possible in WoW's implementation. The idea is to store function results so that when the same parameters are given again, the function simply returns the last result rather than recalculating it. This can be useful for common calls such as UnitName or for internal functions that take unique string names as input. `unit` changing frequently might make this difficult to implement for things like nameplates or raid frames. Food for thought.
We would trade some CPU time for a lot of memory doing this, since most functions return multiple values and take varying input depending on the element/layout/etc.

Originally Posted by Blooblahguy
* I've noticed that OnShow can cause all of a frame's elements to force an update, which seems likely unnecessary and a huge resource hog for frames that are hidden and shown frequently.
With some work we could flag hidden frames as dirty, depending on event/unit combination that was called while it was hidden. Not sure how much of an impact this would make and if it's worth the extra complexity.
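A minimal sketch of the dirty-flag idea in plain Lua, with hypothetical names (`Frame`, `OnEvent`, `MarkAllDirty`) that do not mirror oUF internals. Events that arrive while a frame is hidden only mark the affected elements, and showing the frame refreshes just those. A unit change (lightspark's objection above) still has to mark everything dirty.

```lua
-- Hypothetical dirty-flag frame; illustrative only, not oUF's structure.
local Frame = {}
Frame.__index = Frame

-- elements: name -> { events = { EVENT_NAME = true }, update = function(frame, event) }
function Frame.new(elements)
  return setmetatable({ shown = true, elements = elements, dirty = {} }, Frame)
end

-- Visible frames update right away; hidden frames only record which
-- elements the event would have touched.
function Frame:OnEvent(event)
  for name, element in pairs(self.elements) do
    if element.events[event] then
      if self.shown then
        element.update(self, event)
      else
        self.dirty[name] = true
      end
    end
  end
end

-- A unit token that now points at someone else (e.g. "target") makes
-- every element stale, so everything must be refreshed on show.
function Frame:MarkAllDirty()
  for name in pairs(self.elements) do
    self.dirty[name] = true
  end
end

function Frame:Hide()
  self.shown = false
end

-- On show, refresh only the elements that went stale while hidden.
function Frame:Show()
  self.shown = true
  for name in pairs(self.dirty) do
    self.elements[name].update(self, "OnShow")
  end
  self.dirty = {}
end
```

The extra complexity haste mentions is visible here: every event path needs to know which elements it affects, and anything that changes the unit has to fall back to a full refresh.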

Originally Posted by Blooblahguy
* There are also OnUpdate scripts in a number of default elements that do a lot of calculation, which I think should be revisited.
We could probably throttle the castbar and runes to only update every 16ms, but it needs to be profiled to see if it's worth it or not. Range, tags, and the frame's OnUpdate are already throttled.
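A minimal sketch of a 16ms throttle, in plain Lua. It relies only on the fact that WoW calls an OnUpdate handler with `(self, elapsed)`, elapsed being the seconds since the last frame; `makeThrottled` is a hypothetical helper, not part of oUF.

```lua
-- Wrap an OnUpdate handler so it runs at most once per `interval` seconds.
local function makeThrottled(interval, fn)
  local accumulated = 0
  return function(self, elapsed)
    accumulated = accumulated + elapsed
    if accumulated >= interval then
      -- Pass the full accumulated time so time-based math (e.g. castbar
      -- progress) stays correct even though frames were skipped.
      fn(self, accumulated)
      accumulated = 0
    end
  end
end
```

In-game it would be attached with something like `castbar:SetScript("OnUpdate", makeThrottled(0.016, updateCastbar))`, where `updateCastbar` is a placeholder. At 200 FPS the wrapped function runs on roughly every fourth frame; whether that saving matters is the profiling question haste raises.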

Originally Posted by Blooblahguy
* Default Blizzard addons tend to keep running even when hidden and when their main driver has its events unregistered. Once their frames are created they have subevents registered, and these still seem to fire an unbelievably high number of times. I think when we spawn raid frames we should DisableAddon Blizzard_CompactRaidFrames, when we spawn nameplates DisableAddon Blizzard_Nameplates, and so on. I've been profiling these frames and they are absolutely decimating performance when they should be disabled. I was finding that addons like WAs were taking up > 3s of CPU time over the course of a raid fight, while CompactRaidFrame_Unit1 alone was upwards of 80s.
This is probably where the real meat is. Fixing this is probably a larger gain than all other points combined.
09-27-18, 04:41 PM   #6
Blooblahguy
So I'll reference this document here, because it probably has better examples than what I'll list. What I've profiled so far is incomplete on its own, so I'll try to get more profiling done this weekend.

* Localizing variables outside of for loops
The size of the gain here depends on the complexity of the loop and the calls made within it. Obviously the number of calls matters, but I think that factor is far outweighed by how expensive common WoW functions are.

I tested the following calls in separate 10,000-iteration loops. Depending on the layout, frequent health updates, and the number of units on screen, these calls frequently hit and exceed 10k in a given fight, so I thought it would be a good test number.
The first number is the time without localizing the API call; the second is with the API call localized:

UnitIsConnected 1.775 -> 1.676
UnitExists 1.918 -> 1.826
UnitReaction 4.301 -> 4.254
UnitIsUnit 1.904 -> 1.874
UnitAura 1.925 -> 1.843
UnitIsPlayer 1.657 -> 1.589
UnitIsTapDenied 1.626 -> 1.607
UnitPlayerControlled 1.683 -> 1.596
UnitHealth 4.950 -> 4.872
UnitHealthMax 4.996 -> 4.913

total time: 26.735
total time optimized: 26.050
overall improvement: ~2.6%
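A quick check of the arithmetic in the table above, using the posted timings (the post doesn't state their unit, so they're treated as unitless). It computes the saving relative to both totals; either way it lands at roughly 2.6%.

```lua
-- Timings copied from the post: { call, not localized, localized }.
local results = {
  { "UnitIsConnected",      1.775, 1.676 },
  { "UnitExists",           1.918, 1.826 },
  { "UnitReaction",         4.301, 4.254 },
  { "UnitIsUnit",           1.904, 1.874 },
  { "UnitAura",             1.925, 1.843 },
  { "UnitIsPlayer",         1.657, 1.589 },
  { "UnitIsTapDenied",      1.626, 1.607 },
  { "UnitPlayerControlled", 1.683, 1.596 },
  { "UnitHealth",           4.950, 4.872 },
  { "UnitHealthMax",        4.996, 4.913 },
}

local before, after = 0, 0
for _, row in ipairs(results) do
  before = before + row[2]
  after = after + row[3]
end

-- Time saved as a share of the original total, and of the optimized total.
local vsBefore = (before - after) / before * 100
local vsAfter = (before - after) / after * 100
print(string.format("%.3f -> %.3f (%.2f%% / %.2f%%)", before, after, vsBefore, vsAfter))
```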

So granted, not large, but keep in mind this was localizing what is already a single reference, with no table lookups or anything involved: just `local UnitHealthMax = UnitHealthMax`. I think total time is a really important stat here, but I'll come back to that.

When the call also includes a lookup in a nested table, things look a lot different. Let's analyze the health element, since basically every layout uses it. I can't do a 1:1 comparison right now, but even just looking at the lookup to unpack the reaction color we see a large improvement.

Before my profile I set this table:
Code:
local parent = {}
parent.colors = {}
parent.colors.reaction = {}
parent.colors.reaction[4] = {.1, .2, .3, 1}
Code:
profile("unitreaction_color", function()
	for i = 1, 100 do
		local unitreaction = UnitReaction('nameplate1', 'player')
		local color = unpack(parent.colors.reaction[4])
	end
end)
Takes 0.0602
Code:
profile("optimized_unitreaction_color", function()
	local unpack, UnitReaction = unpack, UnitReaction
	local r_table = parent.colors.reaction
	for i = 1, 100 do
		local unitreaction = UnitReaction('nameplate1', 'player')
		local color = unpack(r_table[4])
	end
end)
Takes 0.0523
improvement: 13%

That table is as simple as it gets. This difference gets more and more pronounced the bigger the table reference is and what else the function does. We unpack colors from the self element in these cases and these self tables can often get really large, especially when layouts use many of the elements available in oUF. I tried unpacking color from my bdCore library table, which is really pretty lean, and that increased the difference to 21%. I'll try and get exact stats on oUF layouts when I get home, right now I don't have an easy way to test.

Again, with all of the above in mind, I think it's important to note just how often these functions are called. Maybe not from just player, target, tot, and pet, but when you have raid frame and nameplate layouts, all of these call counts go up drastically.

* Creating tables or table templates(key sizing) outside of loops and just updating their reference inside of loops
This and some other points are just about potential optimizations, not that I saw it being done blatantly wrong anywhere. oUF does seem to mostly create variables inside of loops, though.
Take the following code as an example

Code:
for i = 1, 1000000 do
	local a = {}
	a[1] = 1; a[2] = 2; a[3] = 3
end
Takes 52.240 seconds to run while
Code:
for i = 1, 1000000 do
	local a = {true, true, true}
	a[1] = 1; a[2] = 2; a[3] = 3
end
Only takes 20.98 seconds to run. It's 60% faster. Obviously total time is exaggerated by a high loop, but I don't think implementing this practice would take much time and the benefits start to add up.

* Memoizing
I'm not the biggest fan of this either, but referring back to how long default calls can take over the course of a fight, the only real way to optimize there is to call these functions and run these loops less often. To pick on the health element again, its color update function could be memoized easily, because we know that given a certain set of inputs the element will always be colored the same way.

If we pass `UnitIsTapDenied(unit)`, the unit's class when it is a player (`UnitIsPlayer(unit) and select(2, UnitClass(unit)) or false`), and `UnitReaction(unit, unit2)`, then we have a unique set of parameters that always returns the same colors, which we could cache and return the next time we call it. I've implemented this on my nameplates because UNIT_THREAT_LIST_UPDATE and UNIT_HEALTH fire so frequently. Memory is far cheaper than processing power, and that is especially true in the case of WoW, so it is absolutely worth trading some off. We could optimize this further by storing self.class, self.reaction, and self.isplayer and updating those variables on the correct events, but that is definitely cumbersome.
It can't be used often, though, since the whole job of oUF is to take a bunch of variable data and make it easily usable. But in terms of memory here, we're talking about creating 100 KB of table caches to save hundreds if not thousands of CPU loops.
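A minimal sketch of the cached color lookup described above, in plain Lua. The color tables and `computeColor` are hypothetical placeholders for the health element's real coloring logic; in-game the three inputs would come from `UnitIsTapDenied`, `UnitIsPlayer`/`UnitClass` and `UnitReaction`.

```lua
-- Hypothetical color tables standing in for a layout's colors.
local colors = {
  tapped = { 0.6, 0.6, 0.6 },
  class = { WARRIOR = { 0.78, 0.61, 0.43 } },
  reaction = { [2] = { 1, 0, 0 }, [4] = { 1, 1, 0 }, [5] = { 0, 1, 0 } },
}

-- Placeholder for the uncached coloring logic; counts how often it runs.
local computeCalls = 0
local function computeColor(tapDenied, class, reaction)
  computeCalls = computeCalls + 1
  if tapDenied then return colors.tapped end
  if class then return colors.class[class] end
  return colors.reaction[reaction]
end

-- Cache keyed on the full input tuple. Lua tables can't be keyed on several
-- values at once, so the inputs are joined into one string key.
local cache = {}
local function getColor(tapDenied, class, reaction)
  local key = tostring(tapDenied) .. ":" .. tostring(class) .. ":" .. tostring(reaction)
  local color = cache[key]
  if color == nil then
    color = computeColor(tapDenied, class, reaction)
    cache[key] = color
  end
  return color
end
```

One caveat: building the string key allocates on every call, so when the uncached branch is as cheap as a few table lookups, the cache can cost more than it saves. A nested table keyed one input at a time avoids that allocation. Either way it needs profiling, which is the CPU/memory trade-off haste raised.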

OnShow / OnUpdate improvements
Yeah, they exist for a reason, and I know it's not as simple as just disabling these things and hoping it all works out. I do think it could be good to give some elements an attribute to opt out of the UpdateAllElements function.
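A minimal sketch of such an opt-out, assuming a hypothetical per-element `skipOnShow` field. The `UpdateAllElements` below is a simplified stand-in written for illustration, not oUF's actual implementation. Elements that opt out are skipped only for the OnShow refresh and still update on other events.

```lua
-- Simplified stand-in for a full refresh; returns the sorted names of the
-- elements it updated so the behavior is easy to inspect.
local function UpdateAllElements(frame, event)
  local updated = {}
  for name, element in pairs(frame.elements) do
    -- Opted-out elements are left to their own event handlers on show.
    if not (event == "OnShow" and element.skipOnShow) then
      element.update(frame, event)
      updated[#updated + 1] = name
    end
  end
  table.sort(updated)
  return updated
end
```

Given lightspark's point that a shown frame's unit may have changed, an element would only be safe to opt out if its data can't go stale while the frame is hidden.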

DisableAddon Blizzard stuff
So I was actually wrong on part of this. I thought I had disabled Blizzard nameplates, relogged, and was still able to use my oUF layout, but I must not have relogged or something. This definitely does not work. However, when tracking frame functions and addon CPU usage, Blizzard nameplates are still clocking in really high, and I think the function in oUF that handles disabling Blizzard frames may be missing something. I'll investigate more this weekend. This happens a lot more with CompactRaidFrames, which definitely can be disabled without breaking raid frame layouts. Note that this is different from the RaidUI addon, which lets players place markers and whatnot. To me it seems reasonable to disable CompactRaidFrames when a raid layout is initialized, but if you feel that is overstepping, I can understand that.

I'll try and get more profiling numbers this weekend and really dig into some of the FPS problems people are reporting to me.
09-28-18, 02:11 AM   #7
lightspark
I'm well aware of that document, I've read the whole book back in the day.

Originally Posted by Blooblahguy
This and some other points are just about potential optimizations, not that I saw it being done blatantly wrong anywhere. oUF does seem to mostly create variables inside of loops, though.
Take the following code as an example

Code:
for i = 1, 1000000 do
	local a = {}
	a[1] = 1; a[2] = 2; a[3] = 3
end
Takes 52.240 seconds to run while
Code:
for i = 1, 1000000 do
	local a = {true, true, true}
	a[1] = 1; a[2] = 2; a[3] = 3
end
Only takes 20.98 seconds to run. It's 60% faster. Obviously total time is exaggerated by a high loop, but I don't think implementing this practice would take much time and the benefits start to add up.
We create a lot of temp vars and upvalues inside of loops, yes, but we never create throwaway tables like this; if we do, that's prob a mistake/typo/whatever. We reuse tables as much as we possibly can.

Moreover, debugprofilestop returns time in milliseconds.

Lua Code:
local lastTime = debugprofilestop()

for i = 1, 1000000 do
	local a = {}
	a[1] = 1; a[2] = 2; a[3] = 3
end

print(debugprofilestop() - lastTime)

This takes 581.90662911534ms or ~0.6s on my machine w/ i5-7500.

Lua Code:
local lastTime = debugprofilestop()

for i = 1, 1000000 do
	local a = {true, true, true}
	a[1] = 1; a[2] = 2; a[3] = 3
end

print(debugprofilestop() - lastTime)

This takes 332.96345540881ms or ~0.3s.

However, in oUF we mainly have this scenario:

Lua Code:
local lastTime = debugprofilestop()

local a_ = {}
for i = 1, 1000000 do
	local a = a_
	a[1] = 1; a[2] = 2; a[3] = 3
end

print(debugprofilestop() - lastTime)

This takes ONLY 60.121539920568ms or 0.06s, the lowest I've seen while benching was 0.05s.

While this

Lua Code:
local lastTime = debugprofilestop()

local a_ = {true, true, true}
for i = 1, 1000000 do
	local a = a_
	a[1] = 1; a[2] = 2; a[3] = 3
end

print(debugprofilestop() - lastTime)

Takes 59.989032864571ms or 0.06s, given that results' fluctuation is ~0.01s, I think you understand what I'm implying here...
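For anyone who wants to rerun these comparisons outside the game, here is a minimal harness in stock Lua, assuming no WoW client is available. `debugprofilestop` only exists in-game, so this swaps in `os.clock()` (CPU seconds, scaled to milliseconds to match); absolute numbers will differ from in-game ones, so only the relative ordering is meaningful.

```lua
-- Run fn(iterations) once and return the elapsed CPU time in milliseconds.
local function benchMs(fn, iterations)
  local start = os.clock()
  fn(iterations)
  return (os.clock() - start) * 1000
end

-- A fresh empty table every iteration.
local function freshTable(n)
  for i = 1, n do
    local a = {}
    a[1] = 1; a[2] = 2; a[3] = 3
  end
end

-- A fresh table every iteration, created with its array part presized.
local function presizedTable(n)
  for i = 1, n do
    local a = { true, true, true }
    a[1] = 1; a[2] = 2; a[3] = 3
  end
end

-- One table created up front and reused inside the loop.
local shared = {}
local function reusedTable(n)
  for i = 1, n do
    local a = shared
    a[1] = 1; a[2] = 2; a[3] = 3
  end
end

local cases = {
  { "fresh", freshTable },
  { "presized", presizedTable },
  { "reused", reusedTable },
}
for _, c in ipairs(cases) do
  print(string.format("%-9s %.1f ms", c[1], benchMs(c[2], 1000000)))
end
```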

I'm still curious about this bit:
There are easily 100 functions that could be localized, and localized function references are a minimum of 30% faster (I've profiled some cases at up to 300% faster)
the 300% thingy in particular.

Last edited by lightspark : 09-28-18 at 02:21 AM.
09-28-18, 02:54 AM   #8
lightspark
Originally Posted by Blooblahguy
So I was actually wrong on part of this. I thought I had disabled Blizzard nameplates, relogged, and was still able to use my oUF layout, but I must not have relogged or something. This definitely does not work. However, when tracking frame functions and addon CPU usage, Blizzard nameplates are still clocking in really high, and I think the function in oUF that handles disabling Blizzard frames may be missing something. I'll investigate more this weekend. This happens a lot more with CompactRaidFrames, which definitely can be disabled without breaking raid frame layouts. Note that this is different from the RaidUI addon, which lets players place markers and whatnot. To me it seems reasonable to disable CompactRaidFrames when a raid layout is initialized, but if you feel that is overstepping, I can understand that.

I'll try and get more profiling numbers this weekend and really dig into some of the FPS problems people are reporting to me.
Regarding this one.

Dunno if you're talking about compact raid frames themselves, or various CompactUnitFrame_* functions, if it's the latter, then their high usage numbers often come from Blizz nameplates. Blizz nameplates reuse a lot of their compact unit frame code.

Nameplates or their driver will clock high regardless, because they're implemented in Lua now. We do disable nameplate health, cast, etc. bars, but we don't stop the Blizz nameplate driver from doing its job because it's a risky thing to do. I even left a comment in our code that explains why:

Lua Code:
-- Because there's no way to prevent nameplate settings updates without tainting UI,
-- and because forbidden nameplates exist, we have to allow default nameplate
-- driver to create, update, and remove Blizz nameplates.
-- Disable only not forbidden nameplates.

On a side note, I'll be adding a way to nuke compact raid frames w/o disabling Blizz raid addons. I'll also rework how we disable arena frames; as I said earlier, the way we do it now is kinda iffy.

Last edited by lightspark : 09-28-18 at 03:09 AM.
09-28-18, 04:42 AM   #9
zork
A Pyroguard Emberseer
Join Date: Jul 2008
Posts: 1,740
Originally Posted by lightspark
On a side note, I'll be adding a way to nuke compact raid frames w/o disabling Blizz raid addons,...
What do you mean by that?

For me fully disabling the Blizzard addons Blizzard_CUFProfiles and Blizzard_CompactRaidFrames is the way to go since I have my own raid manager frame for world markers and such.

They can be reenabled quite easily too. Why the hassle?
09-28-18, 05:58 AM   #10
lightspark
Originally Posted by zork
What do you mean by that?

For me fully disabling the Blizzard addons Blizzard_CUFProfiles and Blizzard_CompactRaidFrames is the way to go since I have my own raid manager frame for world markers and such.

They can be reenabled quite easily too. Why the hassle?
Yeah, no...

Technically, disabling those two addons is enough, but only if you keep them enabled by default AND you provide an option to toggle them via in-game config, so your addon's users do it themselves. That's what Grid2 does.

But in general almost all major UF addons abandoned this approach. For instance, SUF and ElvUI simply disable and hide frames on the fly w/o disabling those two addons.

Some addons, e.g., VuhDo and Grid, do nothing at all, it's up to users to figure out how to disable Blizz raid frames via tutorials and whatnot.

It's not that easy for your average addon user to reenable them, if it actually was, there wouldn't be numerous threads on this topic.

Actually, the more I think about this issue, the less I want to add this raid disabler to oUF. But overall, only the SUF/ElvUI approach is suitable for oUF, because oUF shouldn't leave any traces or affect the UI after it's fully disabled.

Last edited by lightspark : 09-28-18 at 06:10 AM.

WoWInterface » Featured Projects » oUF (Otravi Unit Frames) » oUF performance
