WeDefense - How to debug?

1. Command line

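(1) Create a new conf/config_debug.yaml for debugging with dedicated data loader settings, in particular num_workers: 0 and an empty prefetch_factor (data is then loaded in the main process, so breakpoints in the data pipeline are reachable):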

dataloader_args:
  batch_size: 10
  num_workers: 0
  pin_memory: False
  prefetch_factor:
  drop_last: True
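
Why prefetch_factor is left empty can be sketched in a few lines. PyTorch's DataLoader rejects a non-empty prefetch_factor when num_workers is 0, because prefetching only applies to worker processes. The helper below is a hypothetical, standard-library-only check (check_debug_dataloader_args is not part of WeDefense) that flags settings likely to get in the way of a pdb session. The rules it applies are assumptions about what makes debugging easier, not project requirements.

```python
from typing import Any, Dict, List


def check_debug_dataloader_args(args: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means debug-friendly.

    Hypothetical helper for illustration only, not part of WeDefense.
    """
    problems = []
    num_workers = args.get("num_workers", 0)
    prefetch_factor = args.get("prefetch_factor")  # empty YAML value -> None

    # With worker processes, breakpoints in dataset code are hit in a
    # subprocess that the interactive pdb prompt cannot easily reach.
    if num_workers != 0:
        problems.append("num_workers should be 0 so data loads in the main process")

    # DataLoader raises ValueError for this combination.
    if num_workers == 0 and prefetch_factor is not None:
        problems.append("prefetch_factor must be empty when num_workers is 0")
    return problems


# Same values as the dataloader_args block above.
debug_args = {
    "batch_size": 10,
    "num_workers": 0,
    "pin_memory": False,
    "prefetch_factor": None,
    "drop_last": True,
}
print(check_debug_dataloader_args(debug_args))  # prints []
```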

(2) Create a run_debug.sh for debugging. You can find a sample in

  • In run_debug.sh, you can add -m pdb, just as you would run python -m pdb script.py [args]; the only difference is that the launcher here is torchrun:

    # The "-m pdb" line below is the only addition needed for pdb debugging.
    torchrun --rdzv_backend=c10d --rdzv_endpoint=$(hostname):$((RANDOM)) --nnodes=1 --nproc_per_node=1 \
      -m pdb \
      wedefense/bin/train.py \
      # other parameters.
  • If you do not want to use run_debug.sh, you can launch torchrun directly with the proper parameters and variables:

source ./path.sh
config="conf/config_debug.yaml"
exp_dir="exp/debug"
gpus="[0]"
num_avg=2
data="data/partialspoof"
data_type="raw"  # or "shard"

torchrun --rdzv_backend=c10d --rdzv_endpoint=$(hostname):$((RANDOM)) --nnodes=1 --nproc_per_node=1 \
      wedefense/bin/train.py --config $config \
        --exp_dir ${exp_dir} \
        --gpus $gpus \
        --num_avg ${num_avg} \
        --data_type "${data_type}" \
        --train_data ${data}/train/${data_type}.list \
        --train_label ${data}/train/utt2lab \
        --val_data ${data}/dev/${data_type}.list \
        --val_label ${data}/dev/utt2lab \
        ${checkpoint:+--checkpoint $checkpoint}
  # other parameters.

In -m pdb mode, you can use the commands below.

  • Within Pdb interpreter:

    Command       Description
    l / list      List 11 lines surrounding the current line
    w / where     Display the file and line number of the current line
    n / next      Execute the current line
    s / step      Step into functions called at the current line
    r / return    Execute until the current function's return is encountered

  • Controlling Execution and Debugging

    Command         Description
    b [#]           Create a breakpoint at line #
    b               List breakpoints and their indices
    c / continue    Execute until a breakpoint is encountered
    clear [#]       Clear breakpoint with index #

  • Changing Variables / Interacting with Code

    Command       Description
    p <name>      Print value of the variable <name>
    !<expr>       Execute the expression <expr> (acts like a Python interpreter)
    run [args]    Restart the debugger with sys.argv arguments [args]
    q / quit      Exit the debugger
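
To get a feel for the commands above without launching training, a session can be scripted. The sketch below feeds p, n, and c to pdb.Pdb through an in-memory stdin and captures everything the debugger prints. The names accumulate and run_scripted_session are illustrative only and are not WeDefense code.

```python
import io
import pdb


def accumulate(values):
    """Toy function to step through."""
    total = 0
    for v in values:
        total += v
    return total


def run_scripted_session(commands):
    """Run accumulate([1, 2, 3]) under pdb, typing `commands` for the user."""
    fake_stdin = io.StringIO("\n".join(commands) + "\n")
    captured = io.StringIO()
    # readrc=False ignores any personal .pdbrc; nosigint=True leaves
    # the Ctrl-C handler untouched.
    debugger = pdb.Pdb(stdin=fake_stdin, stdout=captured,
                       readrc=False, nosigint=True)
    # runcall stops at the first line of the function and reads commands.
    result = debugger.runcall(accumulate, [1, 2, 3])
    return result, captured.getvalue()


# p values -> print the argument; n -> execute "total = 0";
# p total  -> print the new variable; c -> run to completion.
result, transcript = run_scripted_session(["p values", "n", "p total", "c"])
print(transcript)
print("returned:", result)
```

In a real session the same commands are typed at the (Pdb) prompt instead of being passed as a list.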

For more information, see the following references:

[1] pdb — The Python Debugger

[2] Stanford physics91si - Pdb Commands

2. VS Code / Cursor

If you prefer an IDE, you can use its Run > Start Debugging tool.

2.1 Option 1: Launch debugging from the debug icon

  • You may choose the Python debugger with arguments

    • then enter the arguments when prompted

    • or write them into a config file

      {
        "version": "0.2.0",
        "configurations": [
          {
            "name": "Debug WeDefense train (args)",
            "type": "python",
            "request": "launch",
            "program": "${workspaceFolder}/debug_train.py",
            "console": "integratedTerminal",
            "justMyCode": false,
            "args": [
              "--config", "conf/config_debug.yaml",
              "--exp_dir", "exp/debug",
              "--gpus", "[0]",
              "--num_avg", "2",
              "--data", "data/partialspoof",
              "--data_type", "raw"
            ]
          }
        ]
      }
      
  • Or create a debug() function inside the script you launch, and change the entry point to call it:

    def debug():
      # Put your testing code here.
      pass
    
    if __name__ == "__main__":
      debug()
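
Editing the entry point back and forth is easy to forget, so one variation is to choose the entry point from an environment variable. This is a minimal sketch under stated assumptions: WEDEFENSE_DEBUG is a made-up variable name, and main() and debug() are stand-ins for the real functions in the script.

```python
import os


def main():
    """Stand-in for the script's normal entry point."""
    return "main"


def debug():
    """Stand-in for the debug entry point; put testing code here."""
    return "debug"


def select_entry(environ=os.environ):
    # WEDEFENSE_DEBUG is a hypothetical variable used only in this sketch.
    return debug if environ.get("WEDEFENSE_DEBUG") == "1" else main


if __name__ == "__main__":
    select_entry()()
```

With this layout, setting WEDEFENSE_DEBUG=1 in the debugger's environment runs debug(), and a normal launch still runs main().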
    

2.2 Option 2: Prepare a debugging startup script.

(Generated with ChatGPT; some issues remain when using multiple GPUs.)

# debug_train.py
# Run under the VS Code/Cursor debugger; mirrors the torchrun command
# (single node, single process).
import sys
import socket
from pathlib import Path


def find_free_port() -> int:
    """Pick an available local TCP port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def main():
    # ---- align with your bash variables ----
    config = "conf/config_debug.yaml"
    exp_dir = "exp/debug"
    gpus = "[0]"
    num_avg = "2"
    data = "data/partialspoof"
    data_type = "raw"  # "shard"

    # ---- resolve paths (optional but safer) ----
    repo_root = Path(__file__).resolve().parent
    train_py = repo_root / "wedefense" / "bin" / "train.py"
    if not train_py.exists():
        raise FileNotFoundError(f"Cannot find: {train_py}")

    train_data = repo_root / data / "train" / f"{data_type}.list"
    train_label = repo_root / data / "train" / "utt2lab"
    val_data = repo_root / data / "dev" / f"{data_type}.list"
    val_label = repo_root / data / "dev" / "utt2lab"

    # ---- torchrun-equivalent args ----
    master_addr = socket.gethostname()
    master_port = find_free_port()
    rdzv_endpoint = f"{master_addr}:{master_port}"

    # If you need checkpoint, set it here; else keep None
    checkpoint = None  # e.g. "exp/debug/ckpt.pt"

    # Build argv exactly like your command
    argv = [
        "torchrun",
        "--rdzv_backend=c10d",
        f"--rdzv_endpoint={rdzv_endpoint}",
        "--nnodes=1",
        "--nproc_per_node=1",
        str(train_py),
        "--config", config,
        "--exp_dir", exp_dir,
        "--gpus", gpus,
        "--num_avg", num_avg,
        "--data_type", data_type,
        "--train_data", str(train_data),
        "--train_label", str(train_label),
        "--val_data", str(val_data),
        "--val_label", str(val_label),
    ]
    if checkpoint:
        argv += ["--checkpoint", checkpoint]

    # ---- IMPORTANT: call torch.distributed.run (what torchrun uses) ----
    # This keeps everything in a normal Python process so the debugger can attach.
    from torch.distributed.run import main as torchrun_main

    print("Launching with args:\n", " ".join(argv))
    # torchrun_main reads sys.argv
    sys.argv = argv
    torchrun_main()


if __name__ == "__main__":
    main()
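
For single-GPU debugging, a possible alternative to going through torchrun is to run the training script inside the debugger's own process, so that breakpoints are hit without any worker subprocess. The sketch below sets a subset of the environment variables that torchrun normally provides and then executes the script with runpy. It assumes that wedefense/bin/train.py needs nothing from torchrun beyond these variables, which has not been verified here. single_process_env and run_in_process are hypothetical helper names.

```python
import os
import runpy
import sys


def single_process_env(port: int) -> dict:
    """A subset of the variables torchrun sets, for one process on one node."""
    return {
        "RANK": "0",
        "LOCAL_RANK": "0",
        "WORLD_SIZE": "1",
        "MASTER_ADDR": "127.0.0.1",
        "MASTER_PORT": str(port),
    }


def run_in_process(script: str, script_args: list, port: int = 29500) -> dict:
    """Run `script` as __main__ in the current process; return its globals."""
    os.environ.update(single_process_env(port))
    old_argv = sys.argv
    sys.argv = [script, *script_args]  # the script sees its usual argv
    try:
        return runpy.run_path(script, run_name="__main__")
    finally:
        sys.argv = old_argv  # restore the caller's argv
```

A call would look like run_in_process("wedefense/bin/train.py", ["--config", "conf/config_debug.yaml", "--exp_dir", "exp/debug"]), followed by the remaining arguments from the torchrun command.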

[!NOTE] The .ignore file skips all files that start with “debug.” Remember to comment out that rule if you want to keep track of such files.