OpenResty (Lua API: Regular Expressions)

Excerpted from: http://www.daileinote.com/computer/openresty/10

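ngx_lua provides five regex functions in the ngx.re table. They are built on the PCRE library that nginx uses and support PCRE JIT, so they are very fast and can readily replace the pattern-matching functions of Lua's standard string library.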

ngx.re.match

syntax: captures, err = ngx.re.match(subject, regex, options?, ctx?, res_table?)

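Returns the captures of the first match. If nothing matches, it returns nil; if an error occurs (for example a malformed regex), it returns nil plus an error message in err.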

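The results are stored in the captures table: captures[0] holds the whole matched substring, captures[1] the text captured by the first parenthesized subpattern, and so on.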

local m, err = ngx.re.match("hello, 1234", "[0-9]+")
if m then
    -- m[0] == "1234"
else
    if err then  -- an error occurred
        ngx.log(ngx.ERR, "error: ", err)
        return
    end
    -- no match
    ngx.say("match not found")
end
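The three possible outcomes (a match, no match, an error) can be told apart with a small wrapper. This is a minimal sketch assuming it runs under OpenResty, for example with the `resty` command-line tool; the helper name `classify` is hypothetical, and `[0-9` is a deliberately invalid pattern (an unterminated character class) used only to trigger the error branch.

```lua
-- Hypothetical helper: reports which outcome ngx.re.match produced.
local function classify(subject, regex)
    local m, err = ngx.re.match(subject, regex, "jo")
    if m then
        return "match", m[0]
    end
    if err then
        -- bad regex or PCRE runtime error: m is nil and err is a message string
        return "error", err
    end
    -- no error, simply no match
    return "nomatch", nil
end

local r1, v1 = classify("hello, 1234", "[0-9]+")
local r2 = classify("hello", "[0-9]+")
local r3 = classify("hello", "[0-9")   -- invalid pattern on purpose

ngx.say(r1, " ", v1)   -- match 1234
ngx.say(r2)            -- nomatch
ngx.say(r3)            -- error
```

Checking `m` first and `err` second matters: both are nil when the subject simply does not match.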

Named captures are supported as well:

local m, err = ngx.re.match("hello, 1234", "([0-9])(?<remaining>[0-9]+)")
-- m[0] == "1234"
-- m[1] == "1"
-- m[2] == "234"
-- m["remaining"] == "234"
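As a further illustration of named captures, the sketch below parses a `key=value` pair. It assumes OpenResty (for example the `resty` tool); the subject string and the group names `key` and `val` are made up for the example. As in the example above, each named group can also be read through its numeric index.

```lua
-- Parse a "key=value" pair with named captures (illustrative input).
local m, err = ngx.re.match("user=alice", "(?<key>[a-z]+)=(?<val>[a-z0-9]+)", "jo")
if not m then
    ngx.say(err or "no match")
    return
end

ngx.say(m.key, " ", m.val)   -- user alice
-- the same groups are reachable by position
ngx.say(m[1], " ", m[2])     -- user alice
```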

A subpattern that took no part in the match is set to false (not nil), as this alternation shows:

location = /test {
    content_by_lua_block {
        local m, err = ngx.re.match("hello, world", "(world)|(hello)")

        for k,v in pairs(m) do
            ngx.say(k, " ", v)	
        end
    }
}
[root@192 ~]# curl localhost/test
0 hello
1 false
2 hello
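Because unmatched groups hold false, a plain truthiness test tells you which branch of an alternation matched. A minimal sketch, assuming OpenResty (for example the `resty` tool); the variable name `branch` is illustrative.

```lua
-- Which alternative matched? Unmatched groups are false, so "if m[n]" works.
local m = ngx.re.match("hello, world", "(world)|(hello)", "jo")

local branch
if m[1] then
    branch = "world"
elseif m[2] then
    branch = "hello"
end

ngx.say(branch)          -- hello
ngx.say(m[1] == false)   -- true
```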

Commonly used options:

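i case-insensitive mode
j enable PCRE JIT compilation
o compile-once mode: enables the worker-process-level cache of compiled regexes
J PCRE JavaScript-compatible mode
m multi-line mode
s single-line mode
u UTF-8 mode
U UTF-8 mode, but without the UTF-8 validity check on the subject string
x extended mode (unescaped whitespace in the pattern is ignored)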

local m, err = ngx.re.match("hello, world", "HEL LO", "ix")
 -- m[0] == "hello"
local m, err = ngx.re.match("hello, 美好生活", "HELLO, (.{2})", "iu")
 -- m[0] == "hello, 美好"
 -- m[1] == "美好"
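The m option is easy to get wrong, so here is a small sketch contrasting it with the default behavior. It assumes OpenResty (for example the `resty` tool); the subject text is made up.

```lua
local subject = "first line\nsecond line"

-- Without "m", ^ only anchors at the very start of the subject.
local m1 = ngx.re.match(subject, "^second", "j")

-- With "m", ^ also matches right after each newline.
local m2 = ngx.re.match(subject, "^second", "jm")

ngx.say(m1 == nil)   -- true
ngx.say(m2[0])       -- second
```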

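The o option makes each worker process cache compiled regexes, and the lua_regex_cache_max_entries directive sets the maximum number of entries in that cache (1024 by default).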
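The j and o options are typically combined for patterns evaluated on every request, such as input validation. A minimal sketch, assuming OpenResty (for example the `resty` tool); `is_numeric_id` and its 1 to 10 digit rule are hypothetical.

```lua
-- Hypothetical validator for a hot path:
--   "j" turns on PCRE JIT, "o" compiles the pattern once per worker and caches it.
-- "\\z" in the Lua string becomes PCRE's \z (absolute end of subject), so a
-- trailing newline is not accepted the way it would be with "$".
local function is_numeric_id(s)
    return ngx.re.match(s, "^[0-9]{1,10}\\z", "jo") ~= nil
end

ngx.say(is_numeric_id("12345"))   -- true
ngx.say(is_numeric_id("12a45"))   -- false
```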

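The optional ctx argument is a table that may hold a pos field. If pos is set, matching starts at that offset in subject (offsets start at 1). Whether or not pos was given, a successful match sets ctx.pos to the position just past the whole matched substring; if the match fails, ctx is left unchanged.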

local ctx = {}
local m, err = ngx.re.match("1234, hello", "[0-9]+", "", ctx)
-- m[0] == "1234"
-- ctx.pos == 5
local ctx = { pos = 2 }
local m, err = ngx.re.match("1234, hello", "[0-9]+", "", ctx)
-- m[0] == "234"
-- ctx.pos == 5
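Since ctx.pos advances past each match, reusing the same ctx table lets you scan a subject match by match. The sketch below assumes OpenResty (for example the `resty` tool) and uses a made-up subject; it relies on the pattern never matching an empty string, otherwise the loop would not advance. ngx.re.gmatch is the built-in way to do the same thing.

```lua
-- Collect every run of digits by reusing ctx between calls.
local subject = "a1b22c333"
local ctx = { pos = 1 }
local found = {}

while true do
    local m, err = ngx.re.match(subject, "[0-9]+", "jo", ctx)
    if err then
        ngx.log(ngx.ERR, "error: ", err)
        break
    end
    if not m then
        break   -- no more matches; ctx is left unchanged
    end
    found[#found + 1] = m[0]
end

ngx.say(table.concat(found, ","))   -- 1,22,333
```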

ngx.re.find

syntax: from, to, err = ngx.re.find(subject, regex, options?, ctx?, nth?)

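Similar to ngx.re.match, but instead of a captures table it returns the start (from) and end (to) indexes of the matched substring. The indexes are 1-based, so they can be passed straight to string.sub. It returns nil when nothing matches, and two nils followed by an error message on error.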

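When you do not need the captured text, prefer this function: it creates no new Lua strings or tables, which makes it faster than ngx.re.match.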

 local s = "hello, 1234"
 local from, to, err = ngx.re.find(s, "([0-9]+)", "jo")
 if from then
     ngx.say("from: ", from)
     ngx.say("to: ", to)
     ngx.say("matched: ", string.sub(s, from, to))
 else
     if err then
         ngx.say("error: ", err)
         return
     end
     ngx.say("not matched!")
 end
Output:

from: 8
to: 11
matched: 1234
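A typical use of ngx.re.find is slicing a known piece out of a string without building a captures table. The sketch below pulls the port out of a `host:port` string; it assumes OpenResty (for example the `resty` tool), and the input value is made up.

```lua
-- Extract the port from "host:port" using only indexes.
local hostport = "example.com:8080"
local from, to, err = ngx.re.find(hostport, ":[0-9]+$", "jo")

local port
if from then
    -- skip the leading ':' and convert the digits to a number
    port = tonumber(string.sub(hostport, from + 1, to))
elseif err then
    ngx.log(ngx.ERR, "error: ", err)
end

ngx.say(port)   -- 8080
```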

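The fifth argument, nth, selects whose indexes are returned: 0 (the default) means the whole match, 1 the first submatch, 2 the second, and so on. If the requested submatch did not take part in the match, two nil values are returned.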

local str = "hello, 1234"
local from, to = ngx.re.find(str, "([0-9])([0-9]+)", "jo", nil, 2)
if from then
    ngx.say("matched 2nd submatch: ", string.sub(str, from, to))  -- yields "234"
end
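Here nth is used to pull a version segment out of a URI path, followed by the case where the requested group did not participate. A minimal sketch assuming OpenResty (for example the `resty` tool); the `/api/v2/users` path layout is hypothetical.

```lua
local path = "/api/v2/users"

-- nth = 1: indexes of the first capture group, not of the whole match
local from, to = ngx.re.find(path, "^/api/(v[0-9]+)/", "jo", nil, 1)
local version = from and string.sub(path, from, to)
ngx.say(version)   -- v2

-- The optional group (x)? does not participate here, so two nils come back.
local f2, t2 = ngx.re.find("abc", "(x)?abc", "jo", nil, 1)
ngx.say(f2 == nil)   -- true
```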

ngx.re.gmatch

syntax: iterator, err = ngx.re.gmatch(subject, regex, options?)

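Similar to ngx.re.match, but returns a Lua iterator so you can walk through every match in subject. On error it returns nil and an error message. Each call to the iterator returns the captures table of the next match, or nil once there are no more matches.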

local it, err = ngx.re.gmatch("hello, world!", "([a-z]+)", "i")
if not it then
    ngx.log(ngx.ERR, "error: ", err)
    return
end

while true do
    local m, err = it()
    if err then
        ngx.log(ngx.ERR, "error: ", err)
        return
    end

    if not m then
        -- no match found (any more)
        break
    end

    -- found a match
    ngx.say(m[0])
    ngx.say(m[1])
end
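The same loop can fill a Lua table, for example collecting `name=digits` pairs. This is only an illustration of iteration, assuming OpenResty (for example the `resty` tool) and a made-up input; for real request arguments, ngx.req.get_uri_args is the proper tool.

```lua
-- Collect "name=digits" pairs into a table with ngx.re.gmatch.
local args = {}
local it, err = ngx.re.gmatch("a=1&b=22&c=333", "([a-z]+)=([0-9]+)", "jo")
if not it then
    ngx.log(ngx.ERR, "error: ", err)
    return
end

while true do
    local m, err = it()
    if err then
        ngx.log(ngx.ERR, "error: ", err)
        break
    end
    if not m then
        break   -- no more matches
    end
    args[m[1]] = m[2]
end

ngx.say(args.a, " ", args.b, " ", args.c)   -- 1 22 333
```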

ngx.re.sub

syntax: newstr, n, err = ngx.re.sub(subject, regex, replace, options?)

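Replaces the first match of regex in subject with replace and returns the resulting string; n is the number of substitutions made (0 or 1). On failure, for example a syntax error in the regex or in the replace template, it returns nil and an error message.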

local newstr, n, err = ngx.re.sub("hello, 1234", "([0-9])[0-9]", "[$0][$1]")
if newstr then
    -- newstr == "hello, [12][1]34"
    -- n == 1
else
    ngx.log(ngx.ERR, "error: ", err)
    return
end

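In the replace template, $0 or ${0} stands for the whole match and $1 or ${1} for the first submatch; the curly-brace form separates the group number from any digits that follow it. Write $$ for a literal dollar sign.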

local newstr, n, err = ngx.re.sub("hello, 1234", "[0-9]", "${0}00")
     -- newstr == "hello, 100234"
     -- n == 1
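The two escaping rules can be combined in one sketch: `$$` for a literal dollar sign, and `${1}` followed by a literal digit. It assumes OpenResty (for example the `resty` tool); the input strings are made up.

```lua
-- "$$" emits "$"; "$0" then inserts the whole match.
local s1 = ngx.re.sub("price 42", "[0-9]+", "$$$0", "jo")

-- "${1}0" is group 1 followed by a literal "0".
local s2 = ngx.re.sub("v1", "v([0-9])", "v${1}0", "jo")

ngx.say(s1)   -- price $42
ngx.say(s2)   -- v10
```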

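replace may also be a function. It is called with the captures table of the match (the same table ngx.re.match would return), and its return value is used as the replacement string; dollar signs in that return value have no special meaning.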

local func = function (m)
    return "[" .. m[0] .. "][" .. m[1] .. "]"
end
local newstr, n, err = ngx.re.sub("hello, 1234", "( [0-9] ) [0-9]", func, "x")
    -- newstr == "hello, [12][1]34"
    -- n == 1
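A function is useful when the replacement has to be computed from the match. The sketch below doubles the first number and prefixes it with a dollar sign, which stays literal because it comes from the function rather than a template. It assumes OpenResty (for example the `resty` tool); the input is made up.

```lua
-- Replace only the first number with twice its value.
local newstr, n, err = ngx.re.sub("qty=21, max=50", "[0-9]+", function(m)
    -- m[0] is the matched text; "$" here is copied as-is
    return "$" .. tostring(tonumber(m[0]) * 2)
end, "jo")

ngx.say(newstr, " ", n)   -- qty=$42, max=50 1
```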

ngx.re.gsub

syntax: newstr, n, err = ngx.re.gsub(subject, regex, replace, options?)

Same as ngx.re.sub, but replaces every match instead of only the first; n is the total number of substitutions.

local newstr, n, err = ngx.re.gsub("hello, world", "([a-z])[a-z]+", "[$0,$1]", "i")
if newstr then
    -- newstr == "[hello,h], [world,w]"
    -- n == 2
else
    ngx.log(ngx.ERR, "error: ", err)
    return
end
local func = function (m)
    return "[" .. m[0] .. "," .. m[1] .. "]"
end
local newstr, n, err = ngx.re.gsub("hello, world", "([a-z])[a-z]+", func, "i")
    -- newstr == "[hello,h], [world,w]"
    -- n == 2
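ngx.re.gsub with a function and a lookup table can rewrite several characters in a single pass. The sketch below escapes `&`, `<` and `>`; it is an illustration only, not a complete HTML escaper (quotes are not handled), and it assumes OpenResty (for example the `resty` tool).

```lua
-- Map each matched character to its HTML entity.
local escapes = { ["&"] = "&amp;", ["<"] = "&lt;", [">"] = "&gt;" }

local escaped, n, err = ngx.re.gsub("a<b & c>d", "[&<>]", function(m)
    return escapes[m[0]]
end, "jo")

ngx.say(escaped, " ", n)   -- a&lt;b &amp; c&gt;d 3
```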