• 🏆 Texturing Contest #33 is OPEN! Contestants must re-texture a SD unit model found in-game (Warcraft 3 Classic), recreating the unit into a peaceful NPC version. 🔗Click here to enter!
  • 🏆 Hive's 6th HD Modeling Contest: Mechanical is now open! Design and model a mechanical creature, mechanized animal, a futuristic robotic being, or anything else your imagination can tinker with! 📅 Submissions close on June 30, 2024. Don't miss this opportunity to let your creativity shine! Enter now and show us your mechanical masterpiece! 🔗 Click here to enter!

[Lua] Process memory leaking at high GC pressure

Status
Not open for further replies.
Level 19
Joined
Jan 3, 2022
Messages
324
Disclaimer: This is not something you would ever encounter under normal circumstances.
TLDR: At very high GC pressure processes' memory will grow. If the GC catches up, the memory will not be released to OS.

I'm currently attempting to debug a memory leak in a big trigger-heavy map I'm porting from Jass to Lua. To do that I hooked every single function: automatic error catching and to count function calls.
Lua:
-- safely call functions from other code and catch any errors, outputting them to all chat
-- Note: By default Reforged doesn't output any errors, they can only be found in memory dumps
function safeCall(...) -- TODO: not needed, GC
    return safeCallHandle(pcall(...))
end
-- This is needed due to how varargs work in Lua
function safeCallHandle(...)
    local arg = {...} -- TODO: not needed, GC
    local ok, err = arg[1], arg[2]
    if ok == true then
        return select(2, ...)
    else
        print("|cffee2222Error occurred: ")
        for line in tostring(err):gmatch("[^\n]+") do
            -- I think errors don't have stack traces here, so only single line :(
            print("|cffee2222".. line)
        end
        -- abort execution and point to above function
        error(tostring(err), 3)
    end
end

---

originalFunctions = {
    -- [functionName] = func
    callCount = {
        -- [functionName] = call count
    }
}

blacklist = {} -- empty

profileArg1 = {
    ["RemoveLocation"] = true,
    ["DestroyLightning"] = true,
    ["DestroyGroup"] = true,
    ["RemoveDestructable"] = true,
    ["KillDestructable"] = true,
    ["DestroyTextTag"] = true,
    ["DestroyEffect"] = true,
    ["RemoveWeatherEffect"] = true,
    ["DestroyFogModifier"] = true,
    ["KillUnit"] = true,
    ["RemoveUnit"] = true,
}

for k,v in pairs(_G) do
    if type(v) == "function" then
        if k ~= "pcall" and k ~= "xpcall"
        and k ~= "print" and k ~= "select" and k ~= "error"
        and k ~= "tostring" and k ~= "type" and k ~= "tonumber"
        and k ~= "next" and k ~= "pairs"
        and k ~= "safeCall" and k ~= "safeCallHandle" then
            print("Overwriting: ".. k)
            originalFunctions[k] = v
            local origFunc = v
            local origName = k
            _G[k] = function (...)
                local profilingName = origName
                -- check if first arg is nil
                if profileArg1[origName] then
                    local args = table.pack(...) -- TODO: not needed, GC
                    profilingName = profilingName .. (args[1] == nil and "+nil" or "")
                end
                if false and not blacklist[origName] then
                    print(profilingName)
                end
                originalFunctions.callCount[profilingName] =
                    1 + (originalFunctions.callCount[profilingName] or 0)
                return safeCall(origFunc, ...)
            end
        end
    end
end

do
    trg_dump = CreateTrigger()
    act_dump = function()
        print("Dumping...")
        local sortedFuncs = {}
        for funcName, calls in pairs(originalFunctions.callCount) do
            table.insert(sortedFuncs, funcName)
        end
        table.sort(sortedFuncs, function(a,b)
            return originalFunctions.callCount[a] < originalFunctions.callCount[b]
        end)
        local out = {}
       
        for i = 1, #sortedFuncs do
            local name = sortedFuncs[i]
            -- TODO: not needed, GC
            table.insert(out, string.format("\0377d: \037s", originalFunctions.callCount[name], name))
        end
       
        PreloadGenClear()
        PreloadGenStart()
        -- Preload has a string limit of 259 chars... Windows' MAX_PATH?
        for i = 1, #out do
            Preload(out[i])
        end
        PreloadGenEnd(os.date("profiling-%Y_%m_%d-%H_%M.txt"))
       
        out = nil -- idk something in the game was causing a 16GB mem leak
        return true
    end
   
    local act_dumpLocal = act_dump
    TriggerAddAction(trg_dump, function() return safeCall(act_dumpLocal) end)
    -- note: this command stops working after 15-20s in-game, I haven't figured out why
    -- maybe it's a rogue trigger doing something
    TriggerRegisterPlayerChatEvent(trg_dump, Player(0), "-dumpprofile", true)
   
    local t = CreateTimer()
    TimerStart(t, 10, true, act_dump)
end
The function intercepts each function call and temporarily saves all input arguments in a local table.
This has caused war3.exe's process memory to bloat continuously... at one point it had reached 17GB before I noticed and on another run only 11GB in Working Set. I thought to shift the blame on the pre-configured Lua GC after the 1.31 desyncs, so I tried to reduce the generation of temporary objects in my intercepting functions:

Diff:
 function safeCallHandle(...)
-       local arg = {...} -- TODO: not needed, GC
-       local ok, err = arg[1], arg[2]
+       local ok, err = ...
        if ok == true then
                return select(2, ...)
        else

@@ -447,8 +446,8 @@ for k,v in pairs(_G) do
                                local profilingName = origName
                                -- check if first arg is nil
                                if profileArg1[origName] then
-                                       local args = table.pack(...) -- TODO: not needed, GC
-                                       profilingName = profilingName .. (args[1] == nil and "+nil" or "")
+                                       local arg1 = ...
+                                       profilingName = profilingName .. (arg1 == nil and "+nil" or "")
                                end
For some reason this solved the issue. I do not understand exactly why, because Lua's garbage collector should just work. Then I made a demo map to replicate garbage creation with a simple function that generates a specified amount of tables with dummy values every second (attached map):
Lua:
function genGarbage(amount)
    local n = math.random(0, 65535)
    for i = 1, amount do
        -- avoid array reallocations
        local t = {n, n+1, n+2, n+3}
    end
end
Testing this at the value of "-gen 100000", 100k tables per second resulted in a steady state of the game's process at 2.7GB memory. No further memory allocations. Smaller values settled at lower memory sizes.
Further testing at 500k tables/s: the process memory reached 21GB and kept growing before I dropped creation speed back to 123/s. The memory was never released by the process. OK I think this was just too much

My best attempt to explain the difference: my tracing functions didn't just create any tables, they created tables holding references to in-game objects: all API functions such as CreateGroup were hooked too. Although my small test shows that Lua GC has no issue with allocation speed and will eventually reach an equilibrium, the game engine must not handle it well when references are released very late, leading to an unnecessary bloat of internal structures holding the data. The original map created a lot of groups, locations and iterated over many units (it was a mix of Jass, vJass and GUI).
Anyway since there's no way to spur the GC or tune it, I'd call it a bug in Reforged. Again, normally you shouldn't encounter it.

Speculating on possible advice: clean saved references ("userdata") as early as possible (e.g. right after calling RemoveLocation etc.) For example if you keep them saved in a global table. This isn't different from Jass. However you do not need to manually null local variables, the GC will do fine. And if anyone knows how the Lua GC works in-depth, please comment if there anything else can be done (collectgarbage and similar are not available to scripts in 1.32+)
 

Attachments

  • Lua GC stress test game1.31.w3m
    17.2 KB · Views: 21
Last edited:

Dr Super Good

Spell Reviewer
Level 64
Joined
Jan 18, 2005
Messages
27,229
Other people have experienced this as well. My theory for it is that the incremental garbage collector rate is not updated until a garbage collection cycle has completed. As such if you keep allocating new objects faster than it iterates through the object it will never complete a cycle and so cannot free the memory. I think Lua normally has mechanisms to combat this but in order for the garbage collector to remain in lock step between all clients those mechanisms might have been compromised.

To avoid this problem I recommend adopting standard practices for languages without explicit freeing functionality. Prefer table reuse over allocating new tables when possible. Minimise the amount of garbage generated per unit time. As long as the client system has enough free memory, there is as good as no overhead to having tables sitting in the Lua heap unused but ready to be reused.
 
Level 19
Joined
Jan 3, 2022
Messages
324
Thanks.
I extended my profiling code to do two things for me: 1) count ALL function calls 2) record a stacktrace to detect leaked objects. Unfortunately the naive implementation of stacktracing always requires the arguments to be saved in a table thus inevitably causing this massive memory leak (2-3 MB/s)
This would've been fine with me, all I needed was to see where the map had leaks. Singleplayer didn't reveal anything: no leaks! Multiplayer? We desynced at 100% success ehm... failure rate :p Yes everyone desynced after about 40-60s in-game
Lua:
-- safely call functions from other code and catch any errors, outputting them to all chat
-- Note: By default Reforged doesn't output any errors, they can only be found in memory dumps
function safeCall(...) -- TODO: not needed, GC: either current vararg approach or table alloc
    return safeCallHandle(pcall(...))
end
-- This is needed due to how varargs work in Lua
function safeCallHandle(...)
    local ok, err = ...
    if ok == true then
        return select(2, ...)
    else
        print("|cffee2222Error occurred: ")
        for line in err:gmatch("[^\n]+") do
            -- I think errors don't have stack traces here, so only single line :(
            print("|cffee2222".. line)
        end
        -- abort execution and point to above function
        error(err, 3)
    end
end

-- overwrite all top-most global functions ONCE at the end of script loading
-- this pairs() could be the only source of indeterminism
for k,v in pairs(_G) do
    if type(v) == "function" then
        -- ugly function filter
        if k ~= "pcall" and k ~= "xpcall"
        and k ~= "print" and k ~= "select" and k ~= "error"
        and k ~= "tostring" and k ~= "type" and k ~= "tonumber"
        and k ~= "next" and k ~= "pairs"
        and k ~= "safeCall" and k ~= "safeCallHandle" then
            print("Overwriting: ".. k)
            originalFunctions[k] = v
            local origFunc = v
            local origName = k
            
            -- each function's wrapped inside this
            -- there're no iterations everything's kept simple
            _G[k] = function (...)
                local profilingName = origName
                -- check if first arg is nil
                if profileArg1[origName] then
                    local arg1 = ...
                    if arg1 == nil then
                        profilingName = profilingName .. "+nil"
                    end
                end
                if false and not blacklist[origName] then
                    print(profilingName)
                end
                originalFunctions.callCount[profilingName] =
                    1 + (originalFunctions.callCount[profilingName] or 0)
                
                table.insert(callStack, origName)
                if profileCreation[origName] then
                    -- record created, returned value
                    -- awful for wc3 gc, but must do that to track creation
                    local ret = {safeCall(origFunc, ...)}
                    trackedObjects[tostring(ret[1])] = table.concat(callStack, ">")
                    
                    table.remove(callStack)
                    return table.unpack(ret)
                    
                elseif profileDestruction[origName] then
                    local arg1 = ...
                    trackedObjects[tostring(arg1)] = nil
                    
                    local ret = {safeCall(origFunc, ...)}
                    table.remove(callStack)
                    return table.unpack(ret)
                    
                else
                    local savedStack = callStack
                    if profileYielding[origName] then
                        -- clean before yielding
                        callStack = {}
                    end
                    
                    local ret = {safeCall(origFunc, ...)}
                    
                    if profileYielding[origName] then
                        -- restore
                        callStack = savedStack
                    end
                    table.remove(callStack)
                    return table.unpack(ret)
                end
            end
        end
    end
end
Today I decided to rule out that function hooking order affects anything, now they're hooked in alphabetical order (instead of pairs):

Lua:
--[[ alphabetic key order ]]
local function compBytewise(a, b)
    for i = 1, math.max(#a, #b) do
        if (string.byte(a, i, i) or 0) < (string.byte(b, i, i) or 0) then
            return true
        else
            return false
        end
    end
    return false
end
function sortedIter(tbl)
    local out = {}
    for k, v in pairs(tbl) do
        table.insert(out, {k, v})
    end
    table.sort(out, function (a,b) return compBytewise(a[1], b[1]) end)
    
    local i = 0
    return function()
        i = i + 1
        if out[i] then
            return out[i][1], out[i][2]
        else
            return
        end
    end
end

for k,v in sortedIter(_G) do
This didn't solve the 100% chance desync between players. One last attempt I can make at avoiding this is less garbage as you said. However after my tests I come to woeful conclusions:
When the GC is behind, it will trigger the detection of desynced state (garbage collection frees references and finalizes game engine's objects? then when it's behind not all objects are GC'ed in a coherent way between players). This desync is sort of a false-positive because game's objects are meant to be no longer accessible to Lua (disregarding __gc finalizers). Therefore I guess any burst in garbage creation that is not collected in time before a desync check is due, will also cause a desync when there shouldn't be any. Possibly any Lua is going to have this element of chance to desync when the gc/desync check cycles align.
 
Level 19
Joined
Jan 3, 2022
Messages
324
I have solved the issue by rewriting everything to avoid heap allocations in the code that intercepted every single function call. I've never had such issues and never thought about stack vs heap in Lua. Broken environments (WC3) are the driving force for ingenuity <sigh> Zero memory leakage with 6-8 players on a heavy map whereas it was leaking in singleplayer before. Sure enough no more desyncs.

As a principle: avoid tables and instead use select, varargs and wrapper functions to accomplish what you need. If you need a table: route it through another function yet! This way I had pre and post-execution wrapper functions that did some stuff before/after execution of a target function.
Lua:
function leakyExample(...)
    -- table allocation only for one variable
    local args = {...}
    if args[1] == "hello" then
        print(args[1])
    end
    return ...
end


function stackBasedExample(...)
    -- everything except the first argument is dismissed
    local firstArg = ...
    if firstArg == "hello" then
        print(firstArg)
    end
    return ...
end

function selectExample(...)
    local ok = ...
    if ok then
        -- returns all args excluding the first
        return select(2, ...)
    else
        print("Error happened")
    end
end
-- e.g. return selectExample(pcall(interceptedFunction, ...))
-- in this example the varargs belong to the intercepted function, we've injected ourselves before its execution
Elusive Blizzard said:
The Lua GC is time sensitive
Some have reported improvements by adding a do/end block around things.
Generally though, don't allocate memory that fast for no reason.
If you're using something real time, code like it, instead of being lazy.
Object reusing is a very simple approach that will alleviate the issue entirely.
For the record, garbage collection is manually triggered 100 times per second.
By the game
To ensure it is collected in a """deterministic""" way.
That said, imagine they ran a single full GC instead of 100x incremental
 
Last edited:

Dr Super Good

Spell Reviewer
Level 64
Joined
Jan 18, 2005
Messages
27,229
That said, imagine they ran a single full GC instead of 100x incremental
The point of incremental collection is to improve frame time consistency. Full GC cycles with a large heap could stall for more than 1 frame resulting in periodic stutter. Incremental GC helps reduce this impact by doing a lot of the work each frame.
 
Status
Not open for further replies.
Top