Thinking Craftsman Toolkit

Version : 0.6.0

Introduction

Thinking Craftsman Toolkit is a set of tools for analyzing source code in various ways. Currently there are 3 tools:

  1. Code Duplication Detector (CDD):
    Code Duplication Detector is similar to Copy Paste Detector (CPD) or Simian. It uses Pygments lexers to tokenize the source files and the Rabin-Karp algorithm to detect the duplicates. Hence it supports all languages supported by Pygments.

  2. Token Tag Cloud (TTC) :
    Some time back I read the blog article 'See How Noisy Your Code Is'. TTC is a tool for creating various tag clouds based on token types (e.g. keywords, names, class names etc).

  3. Treemap Visualization for Source Monitor Metrics data (SMTreemap) :
    Source Monitor is an excellent tool to generate various metrics from the source code (e.g. maximum complexity, average complexity, line count, block depth etc). However, it is difficult to quickly analyse this data for large code bases. Treemaps are excellent for visualizing hierarchical data in two dimensions (as size and color). This tool uses Tkinter to display the Source Monitor data as a treemap. You have to export the Source Monitor data as CSV or XML. smtreemap.py can then use this CSV or XML file as input to display the treemap.
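
To illustrate the Rabin-Karp idea that CDD relies on, here is a minimal sketch, not the actual cdd.py code. It assumes the source has already been turned into a list of token strings (CDD uses Pygments for that step; the example below simply splits on whitespace). The names find_duplicate_windows, BASE and MOD are hypothetical and chosen only for this illustration.

```python
from collections import defaultdict

BASE = 257             # assumed hash base, not taken from cdd.py
MOD = (1 << 61) - 1    # assumed large prime modulus


def find_duplicate_windows(tokens, window):
    """Return sorted groups of start indices whose `window`-token runs are identical."""
    n = len(tokens)
    if window <= 0 or n < window:
        return []

    # Give every distinct token a small positive integer code.
    ids = {}
    codes = [ids.setdefault(tok, len(ids) + 1) for tok in tokens]
    high = pow(BASE, window - 1, MOD)  # weight of the token that leaves the window

    # Hash the first window, then roll the hash one token at a time.
    buckets = defaultdict(list)
    h = 0
    for i in range(window):
        h = (h * BASE + codes[i]) % MOD
    buckets[h].append(0)
    for start in range(1, n - window + 1):
        leaving = codes[start - 1]
        entering = codes[start + window - 1]
        h = ((h - leaving * high) * BASE + entering) % MOD
        buckets[h].append(start)

    # Equal hashes are only candidates; compare the tokens to rule out collisions.
    groups = []
    for starts in buckets.values():
        if len(starts) < 2:
            continue
        exact = defaultdict(list)
        for s in starts:
            exact[tuple(tokens[s:s + window])].append(s)
        groups.extend(g for g in exact.values() if len(g) > 1)
    return sorted(groups)


if __name__ == "__main__":
    sample = "a = b + c ; x = 1 ; a = b + c ;".split()
    print(find_duplicate_windows(sample, 6))
```

Each group lists the token offsets where the same run of tokens starts. A real detector would additionally map token offsets back to file names and line numbers and merge overlapping windows into longer duplicate blocks.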

Using the Tools

  1. Using Code Duplication Detector

    cdd.py [options] <directory name>
    Duplication results are displayed in descending order of the number of duplicate lines found.

    Options:

    -h, --help                     : show this help message and exit
    -p PATTERN, --pattern=PATTERN  : find duplications in files matching the pattern.
                                     If file pattern is not specified, CDD will check all the extensions supported by Pygments.
    -t, --treemap                  : display the duplication as treemap

    The treemap option shows the entire directory tree as a 'treemap' and gives a 'big picture' view of how far duplication has spread.

    • Green rectangles : These files don't have any duplication
    • White rectangles : These files have a low number of duplicated lines (around 10)
    • Red rectangles : These files have a large number of duplicated lines
    • Magenta lines inside the rectangles : show the relative location of duplicate lines in the file.
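
To make the treemap idea concrete, the following is a minimal sketch of a slice-and-dice layout, one of the simplest treemap algorithms. It is an assumption for illustration only; the toolkit may use a different layout. The directory tree is modelled as nested dicts whose leaves are line counts, and the names total_size and slice_and_dice are hypothetical.

```python
def total_size(node):
    """Size of a leaf (a number) or the sum of all sizes inside a directory (a dict)."""
    if isinstance(node, dict):
        return sum(total_size(child) for child in node.values())
    return node


def slice_and_dice(node, x, y, w, h, horizontal=True, path=""):
    """Return a list of (path, x, y, w, h) rectangles, one per file under `node`."""
    if not isinstance(node, dict):
        return [(path, x, y, w, h)]
    rects = []
    total = total_size(node)
    if total == 0:
        return rects
    offset = 0.0  # fraction of the parent rectangle already used
    for name, child in node.items():
        share = total_size(child) / total
        child_path = f"{path}/{name}" if path else name
        if horizontal:
            # Split the parent left to right; children are then split top to bottom.
            rects += slice_and_dice(child, x + offset * w, y, share * w, h,
                                    False, child_path)
        else:
            rects += slice_and_dice(child, x, y + offset * h, w, share * h,
                                    True, child_path)
        offset += share
    return rects


if __name__ == "__main__":
    tree = {"src": {"a.py": 30, "b.py": 10}, "README": 40}
    for rect in slice_and_dice(tree, 0, 0, 80, 40):
        print(rect)
```

Each file gets a rectangle whose area is proportional to its size, so large files dominate the picture. Coloring each rectangle by its number of duplicated lines would then give the green/white/red view described above.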

  2. Using Token Tag Cloud

    ttc.py [options] <directory name>
    Token Tag Cloud parses the source code files and displays three tag clouds:
    1. Tag cloud of keywords
    2. Tag cloud of class names and variable names
    3. Tag cloud of class names and function names
    The size of a word is based on the number of occurrences of that 'token' in the various source code files. The color of a word is based on the number of files in which that 'token' is found.
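
The two quantities behind size and color can be sketched as follows. This is a rough illustration, not the ttc.py implementation: it uses a simple regular expression as a stand-in for the Pygments lexer, and the names TOKEN_RE and tag_cloud_weights are hypothetical.

```python
import re
from collections import Counter

# Assumed stand-in for a real lexer: matches identifier-like words only.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*")


def tag_cloud_weights(sources):
    """Map each token to (total occurrences, number of files containing it).

    `sources` is a dict of file name -> source text.
    """
    occurrences = Counter()   # drives the size of the word
    file_counts = Counter()   # drives the color of the word
    for text in sources.values():
        tokens = TOKEN_RE.findall(text)
        occurrences.update(tokens)
        file_counts.update(set(tokens))  # count each file at most once per token
    return {tok: (occurrences[tok], file_counts[tok]) for tok in occurrences}


if __name__ == "__main__":
    demo = {"a.c": "int x; int y;", "b.c": "int z;"}
    print(tag_cloud_weights(demo))
```

A token that appears many times in a single file gets a large size but a low file count, while a token spread across many files gets a high file count even if it is rare in each of them.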

    Options:

    -h, --help                     : show this help message and exit
    -p PATTERN, --pattern=PATTERN  : create tag cloud of files matching the pattern. Default is '*.c
    -o OUTFILE, --outfile=OUTFILE  : output file name. Output goes to stdout if not specified

  3. Using SMTreemap