7 December 2024 / PROGRAMMING, DEBUGGING, PYTHON

소프트웨어공학 퀴즈3 및 기말고사 대비

The Debugging Book에서 핵심 내용만 요약 정리해서 미사카 미코토 공부법으로 만점 쟁취해 봅시다.

Slicer

현재 발생한 에러를 일으킨 과거의 코드를 찾아내는 방법으로, 에러가 발생한 지점에서부터 거꾸로 추적해 나가는 방법입니다. 특히, Data Dependency와 Control Dependency를 명확하게 정리하는 과정이 필요합니다. 이를 위해 Slicer라는 도구를 사용합니다.

This chapter provides a Slicer class to automatically determine and visualize dynamic flows and dependencies. When we say that a variable $x$ depends on a variable $y$ (and that $y$ flows into $x$), we distinguish two kinds of dependencies:

Data dependency: $x$ is assigned a value computed from $y$.
Control dependency: A statement involving $x$ is executed only because a condition involving $y$ was evaluated, influencing the execution path.

y에 대한 Control Dependency가 있다는 것은 다시 말해, y가 포함된 조건문이 참이 되어야만 x에 대한 statement가 실행된다는 것을 의미합니다.

변수는 어떠한 statement에 의해 값이 할당되거나 평가되므로, 변수 간의 data dependency는 statement 간의 data dependency로 해석할 수도 있습니다. 예를 들어 argument로 전달된 값 중 하나를 리턴하는 함수는, return statement가 상위 노드(변수)의 초기화에 대한 data dependency를 가지게 됩니다.

Dependency grpah는 data depdendency, control depedency에 대해 각 하나씩 총 두 개를 그릴 수 있으며, depdendency graph에서 각 노드는 다음과 같이 정의됩니다.

노드: (name, location)으로 표현되며, name은 변수의 이름, location은 변수가 선언된 위치를 의미합니다. location은 (codename, lineno)으로 표현됩니다.

Slicer를 구현하면, 어떤 함수의 dependency를 찾아내고 싶을 때 다음과 같은 형태로 사용할 수 있습니다.

with Slicer() as slicer:
    <Some call to func()>

이후 slicer.graph() 또는 slicer.code()를 호출하면, 해당 함수의 dependency를 시각화하거나 코드(텍스트)로 출력할 수 있습니다.

아마도 시각화하는 것은 구현이 꽤 까다로운 부분이 있으니 텍스트 형태로 출력하는 것만 퀴즈에서는 구현하면 될 것 같습니다.

Slicer는 먼저 함수에 instrumentation을 수행한 후, 해당 함수에서 instrumentation된 코드들이 실행되면서 dependency를 추적하는데, instrumentation을 수행할 함수를 지정하여 argument로 전달할 수도 있습니다.

with 블록을 벗어나면 instrumetnation이 수행 되었던 함수가 원래되로 복원됩니다.

with Slicer(func, func1, func2) as slicer:
    <Some call to func()>

실행이 끝나면 slicer 객체를 출력함으로써 dependency와 flow를 그래프로 그리도록 할 수 있습니다. 또는 .code() 메소드를 호출하거나요.

The method all_vars() returns all variables in the dependency graph. Each variable is encoded as a pair (name, location) where location is a pair (codename, lineno). 앞서 정의한 Dependency graph에서의 노드와 동일합니다.

>>> slicer.dependencies().all_vars()

code() and graph() methods can also be applied on dependencies. The method backward_slice(var) returns a backward slice for the given variable (again given as a pair (name, location)). To retrieve where z in Line 2 came from, use:

>>> _, start_demo = inspect.getsourcelines(demo)
>>> start_demo
>>> slicer.dependencies().backward_slice(('z', (demo, start_demo + 1))).graph()  # type: ignore

Here are the classes defined in this chapter. A Slicer instruments a program, using a DependencyTracker at run time to collect Dependencies.

Dependency를 시각화하면 전체 코드 중 어떤 line number가 잘못된 결과 생성에 기여했는지 확인할 수 있고, 해당하는 코드 subset (=slice)만을 추출하여 분석할 수 있습니다. 이러한 slice는 디버깅에 있어서 매우 핵심적인 역할을 합니다. 이유는 다음과 같습니다.

버그 유발에 전혀 기여하지 않은 코드를 완전히 배제할 수 있습니다.
코드 곳곳(다른 위치, 파일, 라이브러리 등)에 흩어진 가능한 origin들을 하나의 slice로 통합함으로써 한 데 모을 수가 있습니다.

특정 변수에 대한 Slice를 얻는다는 것은, 그 변수에 대해 backward dependency를 추적하여 그 변수 한정의 dependency graph를 그리는 것이라고도 할 수 있습니다.

Dependency graph는 대체로 런타임에 일어나는 실제 값의 평가에 의해 결정되는 dynamic dependency가 그려집니다. 즉, 실행할 때마다 dependency graph가 달라질 수 있습니다. 이러한 dynamic dependency를 추적하는 것은 static analysis로는 불가능합니다.

Instrumentation

Data dependency를 자동으로 추적하기 위해, 모든 데이터에 대한 read 또는 write operation이 발생할 때마다 그것을 추적할 수 있도록 소스 코드에 instrumentation을 수행해 봅시다.

In essence, for every occurrence of a variable x being read, we replace it with

_data.get('x', x)  # returns x

and for every occurrence of a value being written to x, we replace the value with

_data.set('x', value)  # returns value

and let the _data object track these reads and writes.

Hence, an assignment such as

a = b + c

would get rewritten to

a = _data.set('a', _data.get('b', b) + _data.get('c', c))

and with every access to _data, we would track

the current location in the code, and
whether the respective variable was read or written.

For the above statement, we could deduce that b and c were read, and a was written – which makes a data dependent on b and c.

Instrumentation을 체계적으로 수행하기 위해 소스 코드의 표현 방식 중 하나인 AST(Abstract Syntax Tree)를 사용할 수 있습니다. AST는 소스 코드를 트리 구조로 표현한 것으로, 소스 코드의 구조를 분석하거나 변형하는 데 사용됩니다.

AST를 이루는 각 노드들은 다음과 같습니다.

Module: 전체 코드
FunctionDef: 함수 정의
Assign: 변수 할당
BinOp: 이항 연산
Name: 변수 이름
Call: 함수 호출
Return: 리턴

… 등

이 중에서 우리는 Name 노드에 집중할 것입니다. Name 노드는 두 개의 필드를 갖는데, id는 변수 이름을, ctx는 변수가 사용된 context를 나타냅니다. ctx는 다음과 같은 값 중 하나를 가질 수 있습니다.

Load: 변수가 읽히는 경우
Store: 변수가 쓰여지는 경우
Del: 변수가 삭제되는 경우

AST를 이용하기 위해 Python의 ast 모듈, 그리고 해당 모듈이 제공하는 NodeTransformer 클래스를 사용할 것입니다. NodeTransformer 클래스는 AST의 각 노드를 방문하면서 변형을 수행할 수 있도록 해줍니다.

NodeTransformer 클래스의 .visit() 메소드를 호출하면 .visit_<NodeName>() 메소드를 호출하게 되는데, 이 메소드를 오버로딩하여 AST 노드를 변형할 수 있습니다.

여기까지 각 Data flow와 Control flow를 추적하는 DataTracker, 그리고 DataTracker가 사용하는 메소드 .get() .set()를 소스 코드에 instrumentation 하는 NodeTransformer 클래스들을 구현 후, dependency까지 추적하는 DependencyTracker 클래스를 구현했습니다.

이제 본격적으로 Slicer 클래스를 구현해 봅시다.

`Slicer` class Implmentation

먼저 추후 상속 및 overloaded될 base Instrumenter 클래스를 작성합니다.

그 이후 Instrumenter 클래스를 상속받아 Slicer 클래스를 작성합니다.

여기서 instrumentation을 할 item이라는 것은 Python 모듈 또는 함수를 뜻합니다.

네… Slicer 클래스는 앞서 작성한 Dependencies 및 DependencyTraker 클래스를 필요로 합니다. 객체를 생성할 때, argument로 Slicer 전용 DependencyTracker 객체를 전달할 수 있습니다.

Instrumentation을 수행하기 위해, 앞서 정의한 NodeTransformer 클래스들도 필요하니 정독하고 옵시다.

Slicer의 .execute() 메소드는 변형한 트리를 실행하는 것인데, 이는 즉 변형된 트리를 기반으로 실제 Python 함수를 재정의하는 작업입니다.

The Slicer constructor accepts a log argument (default: False), which can be set to show various intermediate results:

log=True (or log=1): Show instrumented source code
log=2: Also log execution
log=3: Also log individual transformer steps
log=4: Also log source line numbers

Dynamic Instrumentation

위에서 정의한 Slicer는 Initialization을 수행할 때, instrumentation을 수행할 함수가 결정되지만, with 블록 내에서도 도중에 instrumentation을 수행하는 Dynamic Instrumentation을 구현하는 것도 가능합니다.

이렇게 하면 굳이 리스트로 대상 함수를 호출하지 않아도 with 블록 내에서 호출하는 함수에 대해 알아서 instrumentation이 수행되니 편리하죠.

다음과 같은 step으로 진행됩니다.

When DynamicSlicer.__init__() is called:
- Use the inspect module to determine the source code of the call
- Analyze the enclosed with block for function calls
- Instrument these functions
Whenever a function is about to be called (DataTracker.call())
- Create an instrumented version of that function
- Have the call() method return the instrumented function instead

Slicer의 다양한 활용

(민감한) 정보의 흐름을 추적하여 취약점이 있는지 확인할 수 있습니다.
Assertion의 coverage를 검사하여 test code의 quality를 평가할 수 있습니다.
여러 번 코드를 실행하여 dependency 간 correlation을 분석하여 Statistical Debugging을 수행할 수 있습니다.

Exercises

Dynamic Instrumentation에서 include와 exclude argument를 받아, instrumentation 수행할 함수를 선택하도록 하기
Forward Slice 구현
Forward Slicing을 이용하여, .code() 메소드에서 각 변수가 이후 영향을 주는 줄을 출력하도록 구현

Flow Assertion 구현: origin이 예상한 것과 같은지 검증 등

  def assert_flow(target: Any, source: List[Any]) -> bool:
      """
      Raise an `AssertionError` if the dependencies of `target`
      are not equal to `source`.
      """
      ...
      return True

  assert_flow(y, [x])  # ensures that `y` depends on `x` only

Checked Coverage: 각 assert 구문이 dependency를 이루는 statement의 개수 출력

Statistical Debugger

참고 문헌

The Debugging Book

소프트웨어공학 퀴즈3 및 기말고사 대비

Slicer

Instrumentation

`Slicer` class Implmentation

Dynamic Instrumentation

Slicer의 다양한 활용

Exercises

Statistical Debugger

참고 문헌

Rust - Inter-Process Communication

Retreival Augmented Language Model

Search 외장대뇌(Exterior)

Slicer

Instrumentation

Slicer class Implmentation

Dynamic Instrumentation

Slicer의 다양한 활용

Exercises

Statistical Debugger

참고 문헌

Search 외장대뇌(Exterior)

`Slicer` class Implmentation