来自 技术 2019-04-17 00:00 的文章

[20190409]latch get 参数where and why测试.txt

[20190409]latch get 参数where and why测试.txt

--//链接http://blog.itpub.net/267265/viewspace-2640890/大概了解哦latch get参数.做一个测试:

1.环境:

SYS@book> @ ver1

PORT_STRING VERSION BANNER

------------------------------ -------------- --------------------------------------------------------------------------------

x86_64/Linux 2.4.xx 11.2.0.4.0 Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

SYS@book> select addr, name from v$latch_parent where lower(name) like '%'||lower('test excl. parent2 l0')||'%';

ADDR NAME

---------------- ----------------------------------------

0000000060009978 test excl. parent2 l0

--//拿这个拴锁作为测试.

$ cat lwhere.sql

column "where" format a36

column location format a36

column PARENT_NAME format a30

SELECT t1.inst_id inst_id

,t1.indx indx

,t1.ksllasnam PARENT_NAME

,t2.ksllwnam "WHERE"

,t1.kslnowtf NWFAIL_COUNT

,t1.kslsleep SLEEP_COUNT

,t1.kslwscwsl WTR_SLP_COUNT

,t1.kslwsclthg LONGHOLD_COUNT

,t2.ksllwnam LOCATION

FROM x$ksllw t2, x$kslwsc t1

WHERE t2.indx = t1.indx AND lower(t1.ksllasnam) LIKE lower('%&&1%');

SYS@book> @ lwhere 'test excl. parent2 l0'

no rows selected

--//没有查询到,顺便选一个数值作为测试,我测试好像这些参数oracle并不做检查.

2.测试:

--//session 1:

SYS@book> @ spidx

SID SERIAL# PROCESS SERVER SPID PID P_SERIAL# C50

---------- ---------- ------------------------ --------- ------ ------- ---------- --------------------------------------------------

295 5 65437 DEDICATED 65438 21 3 alter system kill session '295,5' immediate;

PRO

-----------------------------------

SYS@book(295.5 spid=65438 pid=21)>

SYS@book(295.5 spid=65438 pid=21)> oradebug setmypid

Statement processed.

SYS@book(295.5 spid=65438 pid=21)> oradebug peek 0x0000000060009978 4

[060009978, 06000997C) = 00000000

SYS@book(295.5 spid=65438 pid=21)> oradebug call kslgetl 0x0000000060009978 1 2 3

Function returned 1

SYS@book(295.5 spid=65438 pid=21)> oradebug peek 0x0000000060009978 4

[060009978, 06000997C) = 00000015

--//持有后写入该会话的PID号,奇怪拴锁写入pid,而mutext写入sid,不知道为什么.

--//session 2:

SYS@book> @ spidx

SID SERIAL# PROCESS SERVER SPID PID P_SERIAL# C50

---------- ---------- ------------------------ --------- ------ ------- ---------- --------------------------------------------------

44 71 65440 DEDICATED 65441 27 26 alter system kill session '44,71' immediate;

PRO

-----------------------------------

SYS@book(44.71 spid=65441 pid=27)>

$ cat ll.sql

select ksuprpid pid,ksuprsid sid ,ksuprlat laddr,ksuprlnm name, ksuprlmd,ksulawhy,ksulawhr, ksulagts gets from x$ksuprlat;

--//https://andreynikolaev.wordpress.com链接有一些介绍,要获得where,why信息,要查询X$视图才行.

--//这个视图是x$ksupr的v$process的基表.x$ksuprlat仅仅包含latch的信息.

SYS@book(44.71 spid=65441 pid=27)> @ll

PID SID LADDR NAME KSUPRLMD KSULAWHY KSULAWHR GETS

------- ---------- ---------------- ---------------------------------------- ---------- -------- ---------- ----------

21 295 0000000060009978 test excl. parent2 l0 EXCLUSIVE 2 3 13

SYS@book(44.71 spid=65441 pid=27)> oradebug setmypid

Statement processed.

SYS@book(44.71 spid=65441 pid=27)> oradebug call kslgetl 0x0000000060009978 1 2 4

--//挂起!!

--//session 3:

SYS@book> @ wait

P1RAW P2RAW P3RAW P1 P2 P3 SID SERIAL# SEQ# EVENT STATUS STATE WAIT_TIME_MICRO SECONDS_IN_WAIT WAIT_CLASS

---------------- ---------------- ----- ---------- ---------- ---------- ---------- ---------- ---------- ---------- -------- ------------------- --------------- --------------- -----------

0000000060009978 0000000000000005 00 1610652024 5 0 44 71 29 latch free INACTIVE WAITING 88607826 89 Other

--//session 1:

SYS@book(295.5 spid=65438 pid=21)> oradebug call kslfre 0x0000000060009978

Function returned 0

--//session 2:

SYS@book(44.71 spid=65441 pid=27)> oradebug call kslgetl 0x0000000060009978 1 2 4

Function returned 1

--//现在成功返回.

SYS@book(44.71 spid=65441 pid=27)> @ ll

PID SID LADDR NAME KSUPRLMD KSULAWHY KSULAWHR GETS

------- ---------- ---------------- ---------------------------------------- ---------- -------- ---------- ----------

27 44 0000000060009978 test excl. parent2 l0 EXCLUSIVE 2 4 14

--//现在是session 2 持有该拴锁.

SYS@book(44.71 spid=65441 pid=27)> oradebug peek 0x0000000060009978 4

[060009978, 06000997C) = 0000001B

--//0x1B=27,与当前会话的pid=27一直.

--//链接:http://andreynikolaev.wordpress.com/2010/08/24/exclusive-latches-in-memory-and-traces-oracle-versions-7-11-2/

Oracle 11g used PID value from v$process to indicate that exclusive latch is busy. Notice that Oracle latch is hold by

process, not session. On the contrary mutexes are hold by sessions.

--//Oracle 11g使用v$process中的PID值来指示独占锁存器正处于繁忙状态。注意,Oracle闩锁是由进程而不是会话保存的。相反,互斥

--//是通过会话进行的。

Oracle not always used PID for this purpose in the past. In fact busy exclusive latch had the value of:

--//Oracle过去并不总是为此目的使用PID。事实上,忙碌的独占锁存器的值是:

1 in version 7

0xFF in versions 8.0-10.2

PID in version 11g

3.继续测试:

--//使用strace 跟踪ipcs也就是信号看看.

$ strace -f -e ipc -p 65438

Process 65438 attached - interrupt to quit

$ strace -f -e ipc -p 65441

Process 65441 attached - interrupt to quit

--//当前是session 2持有该拴锁.

SYS@book(44.71 spid=65441 pid=27)> oradebug peek 0x0000000060009978 4

[060009978, 06000997C) = 0000001B

--//session 1:

SYS@book(295.5 spid=65438 pid=21)> oradebug call kslgetl 0x0000000060009978 1 2 5

--//挂起!!

--//跟踪进程看到阻塞后执行semop睡眠等待.

$ strace -f -e ipc -p 65438

Process 65438 attached - interrupt to quit

semop(310214656, 0x7fff12c040b0, 1

--//session 2:

SYS@book(44.71 spid=65441 pid=27)> oradebug call kslfre 0x0000000060009978

Function returned 0

--//跟踪进程看到释放拴锁时发出信息.session 2进程:

$ strace -f -e ipc -p 65441

Process 65441 attached - interrupt to quit

semctl(310214656, 25, SETVAL, 0x1) = 0

--//session 1进程:

$ strace -f -e ipc -p 65438

Process 65438 attached - interrupt to quit

semop(310214656, 0x7fff12c040b0, 1) = 0

--//执行成功!!也就是oracle是通过semctl控制信号操作.

4.继续测试:

--//也就是如果使用oradebug poke修改信息应该是无效的.

--//当前是session 1持有该拴锁.

SYS@book(295.5 spid=65438 pid=21)> oradebug peek 0x0000000060009978 4

[060009978, 06000997C) = 00000015

--//session 2:

SYS@book(44.71 spid=65441 pid=27)> oradebug call kslgetl 0x0000000060009978 1 2 6

--//挂起!!

--//session 2:

$ strace -f -e ipc -p 65441

Process 65441 attached - interrupt to quit

semctl(310214656, 25, SETVAL, 0x1) = 0

semop(310214656, 0x7fffd5533a10, 1

--//session 1,人为取消看看,使用oradebug poke:

SYS@book(295.5 spid=65438 pid=21)> oradebug poke 0x0000000060009978 4 0x00000000

BEFORE: [060009978, 06000997C) = 00000015

AFTER: [060009978, 06000997C) = 00000000

--//session 2:

SYS@book(44.71 spid=65441 pid=27)> oradebug call kslgetl 0x0000000060009978 1 2 6

Function returned 1

--//session 2:

$ strace -f -e ipc -p 65441

Process 65441 attached - interrupt to quit

semctl(310214656, 25, SETVAL, 0x1) = 0

semop(310214656, 0x7fffd5533a10, 1) = 0

--//session 1:

$ strace -f -e ipc -p 65438

Process 65438 attached - interrupt to quit

semop(310214656, 0x7fff12c040b0, 1) = 0

--//session 1没有执行semctl.不过这样操作后:

SYS@book> @ ll

PID SID LADDR NAME KSUPRLMD KSULAWHY KSULAWHR GETS

------- ---------- ---------------- ---------------------------------------- --------- -------- ---------- ----------

21 295 0000000060009978 test excl. parent2 l0 EXCLUSIVE 2 6 16

27 44 0000000060009978 test excl. parent2 l0 EXCLUSIVE 2 6 16

--//2个会话都持有该拴锁.这样是有问题的.

--//session 1:

SYS@book(295.5 spid=65438 pid=21)> oradebug call kslfre 0x0000000060009978

Function returned 0

--//session 2:

SYS@book(44.71 spid=65441 pid=27)> @ ll

PID SID LADDR NAME KSUPRLMD KSULAWHY KSULAWHR GETS

------- ---------- ---------------- ---------------------------------------- ---------- -------- ---------- ----------

27 44 0000000060009978 test excl. parent2 l0 EXCLUSIVE 2 6 16

SYS@book(44.71 spid=65441 pid=27)> oradebug call kslfre 0x0000000060009978

ORA-03113: end-of-file on communication channel

ORA-24323: value not allowed

--//会话已经中断了.

总结:

--//看了链接https://andreynikolaev.wordpress.com许多文章,才知道10g,11g关于latch发生很大变化,不能再按照以前spin and sleep

--//方式操作.摘抄一段https://fritshoogland.wordpress.com/2015/07/17/oracle-12-and-latches/的内容:

This is how the gdb output looks like when the latch get in willing to wait mode is executed:

kslgetl laddr:60023a80, willing:1, where:0, why:2442

kslges 60023a80, 0, 1, 0

kslwlmod 13311368, -1780327896, 1610758784, 1

skgpwwait 13311608, -1767360, -1780326976, 0

sskgpwwait 13311608, -1767360, -1780326976, 0

semop 360451, 13310840, 1, -1

Interestingly, if the latch is not taken, this is how the latch get sequence looks like:

kslgetl laddr:60023a80, willing:1, where:0, why:2442

In other words, for getting a non shared latch in willing to wait mode:

1-the function kslgetl is called, which tries to fetch the latch.

If the latch can be taken, the function returns, if not:

2-the function kslges (kernel service latch get spinning) is called, which supposedly also tries to take the same latch.

If the latch still can not be taken, the next function is:

3-the function kslwlmod (kernel service latch waiting list modify) is entered.

In this function the process registers itself as waiting in the post/wait list.

4-the function skgpwwait (system kernel generic post/wait wait) is entered.

This function sets up the waiting for the process so it can be posted.

5-the function sskgpwwait (system system kernel generic post/wait wait)

My current understanding is the 'ss' function contain the platform specific code for database functions.

6-the (operating system) function semop (semaphore operation) is called.

This will make the process sleep waiting on a semaphore (operating system signalling mechanism). This way, the process

will not be runnable on the CPU unless the semaphore is posed.

--//这将使进程休眠等待信号量(操作系统信令机制)。这样,该方法除非发出信号量,否则将无法在CPU上运行。

--//顺便问一下https://fritshoogland.wordpress.com/2015/07/17/oracle-12-and-latches/里面提到的地址如何获得的.

Some searching around revealed that a CPU register reveals this information. Add this to the above gdb script:

break *0xc29b51

commands

silent

printf " kslges loop: %d\n", $ecx

c

end

--//有时间再看看.