This document is very much a work in progress.

1. Intro

c-xrefactory is a software tool and a project: an effort to restore the sources of this old, but very useful, tool to a state where it can be enhanced and serve as the foundation for a highly capable refactoring browser.

It is currently in excellent working condition, so you can use it in your daily work. I do. For information about how to do that, see the README.md.

1.1. Caution

As indicated by the README.md, this is a long-term restoration project. So anything you find in this document might be old, incorrect, guesses, or temporary placeholders for thoughts or theories. Or actually true and useful.

Especially the names of variables, functions and modules are prone to change as the understanding of them increases. They might also be refactored into something entirely different.

This document has progressed from non-existing, to a collection of unstructured thoughts, guesses, historic anecdotes, ideas and pre-existing wiki pages, and is now quite useful. Perhaps it will continue to be improved and "refactored" into something valuable for anyone who ventures into this project.

The last part of this document is an Archive where completely obsolete descriptions have been moved for future software archeologists to find.

1.2. Background

You will find some background about the project in the README.md.

This document tries to collect the knowledge and understanding of how c-xrefactory actually works, as well as plans for making it better, both in terms of working with the source and in terms of its structure and features.

Hopefully over time this will be the design documentation of c-xrefactory, which, at that time, will be a fairly well structured and useful piece of software.

1.3. Goal

Ultimately c-xrefactory could become the refactoring browser for C, the one that everybody uses. As suggested by @tajmone in GitHub issue #39, by switching to a general protocol we could possibly plug this into many editors.

However, to do that we need to refactor out the protocol parts. And to do that we need a better structure, and to dare to change that, we need to understand more of the intricacies of this beast, and we need tests. So the normal legacy code catch-22 applies…

Test coverage is starting to look good, coming up to slightly above 80% at the time of writing. Many "tests" are just "application level" executions rather than actual tests, but this too is improving.

2. Context

c-xrefactory is designed to be an aid for programmers as they write, edit, inspect, read and improve the code they are working on.

The editor is used for the usual manual manipulation of the source code. C-xrefactory interacts with the editor to provide navigation and automated edits (refactorings) through the editor.

@startuml
title <size:24>System Context View: C-xrefactory</size>

set separator none
left to right direction
skinparam ranksep 60
skinparam nodesep 30
hide stereotype

<style>
  root {
    BackgroundColor: #ffffff;
    FontColor: #444444;
  }
  // Element,DB
  .Element-RWxlbWVudCxEQg== {
    BackgroundColor: #ffffff;
    LineColor: #444444;
    LineStyle: 0;
    LineThickness: 2;
    FontColor: #444444;
    FontSize: 24;
    HorizontalAlignment: center;
    Shadowing: 0;
    MaximumWidth: 450;
  }
  // Element,Person,Ext
  .Element-RWxlbWVudCxQZXJzb24sRXh0 {
    BackgroundColor: #686868;
    LineColor: #484848;
    LineStyle: 0;
    LineThickness: 2;
    FontColor: #ffffff;
    FontSize: 24;
    HorizontalAlignment: center;
    Shadowing: 0;
    MaximumWidth: 450;
  }
  // Element,Software System
  .Element-RWxlbWVudCxTb2Z0d2FyZSBTeXN0ZW0= {
    BackgroundColor: #1168bd;
    LineColor: #0b4884;
    LineStyle: 0;
    LineThickness: 2;
    RoundCorner: 20;
    FontColor: #ffffff;
    FontSize: 24;
    HorizontalAlignment: center;
    Shadowing: 0;
    MaximumWidth: 450;
  }
  // Element,Software System,Editor,Ext
  .Element-RWxlbWVudCxTb2Z0d2FyZSBTeXN0ZW0sRWRpdG9yLEV4dA== {
    BackgroundColor: #686868;
    LineColor: #484848;
    LineStyle: 0;
    LineThickness: 2;
    RoundCorner: 20;
    FontColor: #ffffff;
    FontSize: 24;
    HorizontalAlignment: center;
    Shadowing: 0;
    MaximumWidth: 450;
  }
  // Relationship
  .Relationship-UmVsYXRpb25zaGlw {
    LineThickness: 2;
    LineStyle: 10-10;
    LineColor: #444444;
    FontColor: #444444;
    FontSize: 24;
  }
</style>

person "==Developer\n<size:16>[Person]</size>\n\nEdits source code using an editor" <<Element-RWxlbWVudCxQZXJzb24sRXh0>> as Developer
rectangle "==Editor\n<size:16>[Software System]</size>\n\nAllows Developer to modify source code and perform refactoring operations" <<Element-RWxlbWVudCxTb2Z0d2FyZSBTeXN0ZW0sRWRpdG9yLEV4dA==>> as Editor
database "==Source Code\n<size:16>[a set of files stored on disk]</size>" <<Element-RWxlbWVudCxEQg==>> as 3
rectangle "==C-xrefactory\n<size:16>[Software System]</size>\n\nAnalyses source code, receives and processes requests for navigation and refactoring" <<Element-RWxlbWVudCxTb2Z0d2FyZSBTeXN0ZW0=>> as Cxrefactory

Editor --> Cxrefactory <<Relationship-UmVsYXRpb25zaGlw>> : "extends functionality using\n<size:16>[plugin]</size>"
Cxrefactory --> 3 <<Relationship-UmVsYXRpb25zaGlw>> : "reads source code from\n<size:16>[file I/O]</size>"
Developer --> Editor <<Relationship-UmVsYXRpb25zaGlw>> : "usual editor/IDE operations"
Developer --> Cxrefactory <<Relationship-UmVsYXRpb25zaGlw>> : "configuration and command line invocations"
Editor --> 3 <<Relationship-UmVsYXRpb25zaGlw>> : "normal editing operations"
Editor --> Cxrefactory <<Relationship-UmVsYXRpb25zaGlw>> : "navigation and refactoring requests"
Cxrefactory --> Editor <<Relationship-UmVsYXRpb25zaGlw>> : "positioning and editing responses"
Cxrefactory --> 3 <<Relationship-UmVsYXRpb25zaGlw>> : "read/analyze"

@enduml

3. Functional Overview

The c-xref program is actually a mish-mash of a multitude of features baked into one program. This is the major cause of the mess that it is source-wise.

It was

  • a generator for persistent cross-reference data

  • a reference server for editors, serving cross-reference, navigational and completion data over a protocol

  • a refactoring server (the world's first to cross the Refactoring Rubicon)

  • an HTML cross-reference generator (probably the root of the project) (REMOVED)

  • a C macro generator for structure fill (and other) functions (REMOVED)

It is the first three that are unique and constitute the great value of this project. The last two have been removed from the source, the last one because it was a hack that prevented modern, tidy building, coding and refactoring. The HTML cross-reference generator has been superseded by modern alternatives like Doxygen and is not at the core of the goal of this project.

One might surmise that the HTML cross-reference generator was the initial purpose of what the original Xrefactory was built upon. Once that was in place the others followed, basically just bolted on top without much re-architecting of the C sources.

What we’d like to do is partition the project into separate parts, each having a clear usage.

The following sections are aimed at describing various features of c-xrefactory.

3.1. Options, option files and configuration

The current version of C-xrefactory allows only two possible sets of configuration/options.

The primary storage is (currently) the file $HOME/.c-xrefrc which stores the "standard" options for all projects. Each project has a separate section which is started with a section marker, the project name surrounded by square brackets, [project1].
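As an illustration, a minimal $HOME/.c-xrefrc could be organised like this. The section markers follow the format described above; the actual option lines are project-specific and elided here:

```
[project1]
  ... options for project1 ...

[project2]
  ... options for project2 ...
```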

When you start c-xref you can use the command line option -xrefrc to request that a particular option file should be used instead of the "standard options".

When running the edit server there seems to be no way to indicate different option files for different projects/files. Although you can start the server with -xrefrc, you will be stuck with that file for the whole session and for all projects.

3.2. LSP

The LSP protocol is a common protocol for language servers such as clangd and c-xrefactory. It allows an editor (client) to interface to a server to request information, such as reference positions, and operations, such as refactorings, without knowing exactly which server it talks to.

Recent versions of c-xrefactory have an initial implementation of a very small portion of the LSP protocol. The plan is to fully integrate the functionality of c-xrefactory into the LSP protocol. This will allow use of c-xrefactory from not only Emacs but also Visual Studio Code or any other editor that supports the LSP protocol.

3.2.1. LSP Protocol Limitations

The LSP protocol was designed for single-shot, non-interactive operations. This creates constraints for c-xrefactory’s advanced refactorings:

Interactive Refactorings: C-xrefactory’s extract/parameter operations require multi-step user input (names, positions, declarations). LSP’s textDocument/codeAction doesn’t support interactive dialogs.

Symbol Browsing: C-xrefactory provides interactive symbol browsers with filtering and keyboard navigation. LSP returns flat reference lists with no standard for interactive UI.

Strategy: The LSP implementation aims to:

  • Provide basic IDE features (definition, completion, simple refactorings) to modern editors

  • Expose c-xrefactory’s advanced refactoring capabilities where possible

  • Keep the Emacs client as the primary interface for full interactive features

LSP serves to make c-xrefactory more accessible while the Emacs client probably will remain the gateway to its complete refactoring power.

4. Quality Attributes

The most important quality attributes are

  • correctness - a refactoring should never alter the behaviour of the refactored code

  • completeness - no reference to a symbol should ever be missed

  • performance - a refactoring should be sufficiently quick so the user keeps focus on the task at hand

5. Constraints

TBD.

6. Principles

6.1. Reference Database and Parsing

The reference database is used only to hold externally visible identifiers to ensure that references to an identifier can be found across all files in the used source.

All symbols that are only visible inside a unit are handled by reparsing the file of interest.

6.2. TBD.

7. Software Architecture

7.1. Container View

@startuml
title <size:24>Container View: C-xrefactory</size>

set separator none
left to right direction
skinparam ranksep 60
skinparam nodesep 30
hide stereotype

<style>
  root {
    BackgroundColor: #ffffff;
    FontColor: #444444;
  }
  // Element,Container
  .Element-RWxlbWVudCxDb250YWluZXI= {
    BackgroundColor: #438dd5;
    LineColor: #2e6295;
    LineStyle: 0;
    LineThickness: 2;
    RoundCorner: 20;
    FontColor: #ffffff;
    FontSize: 24;
    HorizontalAlignment: center;
    Shadowing: 0;
    MaximumWidth: 450;
  }
  // Element,Container,DB
  .Element-RWxlbWVudCxDb250YWluZXIsREI= {
    BackgroundColor: #438dd5;
    LineColor: #2e6295;
    LineStyle: 0;
    LineThickness: 2;
    FontColor: #ffffff;
    FontSize: 24;
    HorizontalAlignment: center;
    Shadowing: 0;
    MaximumWidth: 450;
  }
  // Element,DB
  .Element-RWxlbWVudCxEQg== {
    BackgroundColor: #ffffff;
    LineColor: #444444;
    LineStyle: 0;
    LineThickness: 2;
    FontColor: #444444;
    FontSize: 24;
    HorizontalAlignment: center;
    Shadowing: 0;
    MaximumWidth: 450;
  }
  // Element,Person,Ext
  .Element-RWxlbWVudCxQZXJzb24sRXh0 {
    BackgroundColor: #686868;
    LineColor: #484848;
    LineStyle: 0;
    LineThickness: 2;
    FontColor: #ffffff;
    FontSize: 24;
    HorizontalAlignment: center;
    Shadowing: 0;
    MaximumWidth: 450;
  }
  // Element,Software System,Editor,Ext
  .Element-RWxlbWVudCxTb2Z0d2FyZSBTeXN0ZW0sRWRpdG9yLEV4dA== {
    BackgroundColor: #686868;
    LineColor: #484848;
    LineStyle: 0;
    LineThickness: 2;
    RoundCorner: 20;
    FontColor: #ffffff;
    FontSize: 24;
    HorizontalAlignment: center;
    Shadowing: 0;
    MaximumWidth: 450;
  }
  // Relationship
  .Relationship-UmVsYXRpb25zaGlw {
    LineThickness: 2;
    LineStyle: 10-10;
    LineColor: #444444;
    FontColor: #444444;
    FontSize: 24;
  }
  // C-xrefactory
  .Boundary-Qy14cmVmYWN0b3J5 {
    BackgroundColor: #ffffff;
    LineColor: #0b4884;
    LineStyle: 0;
    LineThickness: 2;
    FontColor: #1168bd;
    FontSize: 24;
    HorizontalAlignment: center;
    Shadowing: 0;
  }
</style>

person "==Developer\n<size:16>[Person]</size>\n\nEdits source code using an editor" <<Element-RWxlbWVudCxQZXJzb24sRXh0>> as Developer
rectangle "==Editor\n<size:16>[Software System]</size>\n\nAllows Developer to modify source code and perform refactoring operations" <<Element-RWxlbWVudCxTb2Z0d2FyZSBTeXN0ZW0sRWRpdG9yLEV4dA==>> as Editor
database "==Source Code\n<size:16>[a set of files stored on disk]</size>" <<Element-RWxlbWVudCxEQg==>> as 3

rectangle "C-xrefactory\n<size:16>[Software System]</size>" <<Boundary-Qy14cmVmYWN0b3J5>> {
  rectangle "==cxrefCore\n<size:16>[Container: Refactoring Browser core]</size>\n\nC Language program" <<Element-RWxlbWVudCxDb250YWluZXI=>> as Cxrefactory.cxrefCore
  database "==settingsStore\n<size:16>[Container: Configuration file for project settings]</size>\n\nNon-standard format settings file" <<Element-RWxlbWVudCxDb250YWluZXIsREI=>> as Cxrefactory.settingsStore
  rectangle "==editorExtension\n<size:16>[Container: Plugin]</size>\n\nExtends the Editor with c-xref operations and interfaces to the c-xrefactory API" <<Element-RWxlbWVudCxDb250YWluZXI=>> as Cxrefactory.editorExtension
  database "==referencesDb\n<size:16>[Container: Persistent symbol database storing references, definitions, and metadata for all project symbols. Created via -create/-update operations, indexed by symbol hash for fast lookup]</size>\n\nCross-Reference Database (.cx files)" <<Element-RWxlbWVudCxDb250YWluZXIsREI=>> as Cxrefactory.referencesDb
}

Editor --> Cxrefactory.editorExtension <<Relationship-UmVsYXRpb25zaGlw>> : "extends functionality using\n<size:16>[plugin]</size>"
Cxrefactory.cxrefCore --> 3 <<Relationship-UmVsYXRpb25zaGlw>> : "reads source code from\n<size:16>[file I/O]</size>"
Cxrefactory.cxrefCore --> Cxrefactory.referencesDb <<Relationship-UmVsYXRpb25zaGlw>> : "builds and queries\n<size:16>[hash-based lookup]</size>"
Cxrefactory.editorExtension --> Cxrefactory.settingsStore <<Relationship-UmVsYXRpb25zaGlw>> : "writes\n<size:16>[new project wizard]</size>"
Cxrefactory.cxrefCore --> Cxrefactory.settingsStore <<Relationship-UmVsYXRpb25zaGlw>> : "read"
Cxrefactory.editorExtension --> Cxrefactory.cxrefCore <<Relationship-UmVsYXRpb25zaGlw>> : "API\n<size:16>[requests information and gets commands to modify source code]</size>"
Cxrefactory.cxrefCore --> Cxrefactory.referencesDb <<Relationship-UmVsYXRpb25zaGlw>> : "read/write"
Cxrefactory.cxrefCore --> 3 <<Relationship-UmVsYXRpb25zaGlw>> : "read/analyze"
Developer --> Editor <<Relationship-UmVsYXRpb25zaGlw>> : "usual editor/IDE operations"
Editor --> 3 <<Relationship-UmVsYXRpb25zaGlw>> : "normal editing operations"
Editor --> Cxrefactory.editorExtension <<Relationship-UmVsYXRpb25zaGlw>> : "extends\n<size:16>[Editor extension protocol]</size>"
Developer --> Cxrefactory.settingsStore <<Relationship-UmVsYXRpb25zaGlw>> : "edit"
Cxrefactory.editorExtension --> 3 <<Relationship-UmVsYXRpb25zaGlw>> : "extended c-xrefactory operations"

@enduml

7.2. Containers

At this point the description of the internal structure of the containers is tentative. The actual interfaces are not particularly clean; most code files can and do include pretty much every other module.

One focus area for the ongoing work is to try to pry out modules/components from the code mess by moving functions around, renaming and hiding functions, where possible.

7.2.1. CxrefCore

@startuml
title <size:24>Component View: C-xrefactory - cxrefCore</size>

set separator none
left to right direction
skinparam ranksep 60
skinparam nodesep 30
hide stereotype

<style>
  root {
    BackgroundColor: #ffffff;
    FontColor: #444444;
  }
  // Element,Component
  .Element-RWxlbWVudCxDb21wb25lbnQ= {
    BackgroundColor: #85bbf0;
    LineColor: #5d82a8;
    LineStyle: 0;
    LineThickness: 2;
    RoundCorner: 20;
    FontColor: #ffffff;
    FontSize: 24;
    HorizontalAlignment: center;
    Shadowing: 0;
    MaximumWidth: 450;
  }
  // Element,Container,DB
  .Element-RWxlbWVudCxDb250YWluZXIsREI= {
    BackgroundColor: #438dd5;
    LineColor: #2e6295;
    LineStyle: 0;
    LineThickness: 2;
    FontColor: #ffffff;
    FontSize: 24;
    HorizontalAlignment: center;
    Shadowing: 0;
    MaximumWidth: 450;
  }
  // Element,DB
  .Element-RWxlbWVudCxEQg== {
    BackgroundColor: #ffffff;
    LineColor: #444444;
    LineStyle: 0;
    LineThickness: 2;
    FontColor: #444444;
    FontSize: 24;
    HorizontalAlignment: center;
    Shadowing: 0;
    MaximumWidth: 450;
  }
  // Relationship
  .Relationship-UmVsYXRpb25zaGlw {
    LineThickness: 2;
    LineStyle: 10-10;
    LineColor: #444444;
    FontColor: #444444;
    FontSize: 24;
  }
  // C-xrefactory
  .Boundary-Qy14cmVmYWN0b3J5 {
    BackgroundColor: #ffffff;
    LineColor: #0b4884;
    LineStyle: 0;
    LineThickness: 2;
    FontColor: #1168bd;
    FontSize: 24;
    HorizontalAlignment: center;
    Shadowing: 0;
  }
  // cxrefCore
  .Boundary-Y3hyZWZDb3Jl {
    BackgroundColor: #ffffff;
    LineColor: #2e6295;
    LineStyle: 0;
    LineThickness: 2;
    FontColor: #438dd5;
    FontSize: 24;
    HorizontalAlignment: center;
    Shadowing: 0;
  }
</style>

database "==Source Code\n<size:16>[a set of files stored on disk]</size>" <<Element-RWxlbWVudCxEQg==>> as 3

rectangle "C-xrefactory\n<size:16>[Software System]</size>" <<Boundary-Qy14cmVmYWN0b3J5>> {
  rectangle "cxrefCore\n<size:16>[Container: Refactoring Browser core]</size>" <<Boundary-Y3hyZWZDb3Jl>> {
    rectangle "==main\n<size:16>[Component: C]</size>\n\nMain program" <<Element-RWxlbWVudCxDb21wb25lbnQ=>> as Cxrefactory.cxrefCore.main
    rectangle "==xref\n<size:16>[Component: C]</size>\n\nCross-referencer" <<Element-RWxlbWVudCxDb21wb25lbnQ=>> as Cxrefactory.cxrefCore.xref
    rectangle "==server\n<size:16>[Component: C]</size>\n\nEditor Server" <<Element-RWxlbWVudCxDb21wb25lbnQ=>> as Cxrefactory.cxrefCore.server
    rectangle "==refactory\n<size:16>[Component: C]</size>\n\nRefactory" <<Element-RWxlbWVudCxDb21wb25lbnQ=>> as Cxrefactory.cxrefCore.refactory
    rectangle "==lsp\n<size:16>[Component: Language Server Protocol implementation for IDE integration - currently limited by symbol database architecture]</size>\n\nLSP Server" <<Element-RWxlbWVudCxDb21wb25lbnQ=>> as Cxrefactory.cxrefCore.lsp
    rectangle "==lspAdapter\n<size:16>[Component: Converts LSP requests to c-xrefactory operations - needs symbol database abstraction]</size>\n\nLSP Adapter" <<Element-RWxlbWVudCxDb21wb25lbnQ=>> as Cxrefactory.cxrefCore.lspAdapter
    rectangle "==cxref\n<size:16>[Component: C]</size>\n\nReference handler" <<Element-RWxlbWVudCxDb21wb25lbnQ=>> as Cxrefactory.cxrefCore.cxref
    rectangle "==cxfile\n<size:16>[Component: Manages persistent cross-reference database with symbol indexing and file-based storage]</size>\n\nReference Storage (.cx files)" <<Element-RWxlbWVudCxDb21wb25lbnQ=>> as Cxrefactory.cxrefCore.cxfile
    rectangle "==browserStack\n<size:16>[Component: Runtime symbol context stack for navigation, containing current references and symbol menus]</size>\n\nBrowser Stack" <<Element-RWxlbWVudCxDb21wb25lbnQ=>> as Cxrefactory.cxrefCore.browserStack
    rectangle "==symbolResolver\n<size:16>[Component: Converts cursor positions to symbols, loads references, and finds definitions]</size>\n\nSymbol Resolution Pipeline" <<Element-RWxlbWVudCxDb21wb25lbnQ=>> as Cxrefactory.cxrefCore.symbolResolver
    rectangle "==referenceTable\n<size:16>[Component: Runtime symbol cache loaded from .cx files for active session]</size>\n\nIn-Memory Reference Table" <<Element-RWxlbWVudCxDb21wb25lbnQ=>> as Cxrefactory.cxrefCore.referenceTable
    rectangle "==yylex\n<size:16>[Component: C]</size>\n\nLexical Analyser" <<Element-RWxlbWVudCxDb21wb25lbnQ=>> as Cxrefactory.cxrefCore.yylex
    rectangle "==parser\n<size:16>[Component: C]</size>\n\nParser" <<Element-RWxlbWVudCxDb21wb25lbnQ=>> as Cxrefactory.cxrefCore.parser
  }

  database "==referencesDb\n<size:16>[Container: Persistent symbol database storing references, definitions, and metadata for all project symbols. Created via -create/-update operations, indexed by symbol hash for fast lookup]</size>\n\nCross-Reference Database (.cx files)" <<Element-RWxlbWVudCxDb250YWluZXIsREI=>> as Cxrefactory.referencesDb
}

Cxrefactory.cxrefCore.main --> Cxrefactory.cxrefCore.xref <<Relationship-UmVsYXRpb25zaGlw>> : "dispatches to\n<size:16>[call]</size>"
Cxrefactory.cxrefCore.main --> Cxrefactory.cxrefCore.server <<Relationship-UmVsYXRpb25zaGlw>> : "dispatches to\n<size:16>[call]</size>"
Cxrefactory.cxrefCore.main --> Cxrefactory.cxrefCore.refactory <<Relationship-UmVsYXRpb25zaGlw>> : "dispatches to\n<size:16>[call]</size>"
Cxrefactory.cxrefCore.main --> Cxrefactory.cxrefCore.lsp <<Relationship-UmVsYXRpb25zaGlw>> : "dispatches to\n<size:16>[call]</size>"
Cxrefactory.cxrefCore.refactory --> Cxrefactory.cxrefCore.server <<Relationship-UmVsYXRpb25zaGlw>> : "uses\n<size:16>[call]</size>"
Cxrefactory.cxrefCore.cxref --> Cxrefactory.cxrefCore.symbolResolver <<Relationship-UmVsYXRpb25zaGlw>> : "delegates symbol lookup to\n<size:16>[call]</size>"
Cxrefactory.cxrefCore.symbolResolver --> Cxrefactory.cxrefCore.cxfile <<Relationship-UmVsYXRpb25zaGlw>> : "loads symbol data via\n<size:16>[call]</size>"
Cxrefactory.cxrefCore.symbolResolver --> Cxrefactory.cxrefCore.referenceTable <<Relationship-UmVsYXRpb25zaGlw>> : "populates and queries\n<size:16>[call]</size>"
Cxrefactory.cxrefCore.symbolResolver --> Cxrefactory.cxrefCore.browserStack <<Relationship-UmVsYXRpb25zaGlw>> : "builds navigation context in\n<size:16>[call]</size>"
Cxrefactory.cxrefCore.cxfile --> Cxrefactory.cxrefCore.referenceTable <<Relationship-UmVsYXRpb25zaGlw>> : "loads symbols into\n<size:16>[batch loading]</size>"
Cxrefactory.cxrefCore.lsp --> Cxrefactory.cxrefCore.lspAdapter <<Relationship-UmVsYXRpb25zaGlw>> : "delegates to\n<size:16>[call]</size>"
Cxrefactory.cxrefCore.lspAdapter --> Cxrefactory.cxrefCore.refactory <<Relationship-UmVsYXRpb25zaGlw>> : "uses for refactoring\n<size:16>[call]</size>"
Cxrefactory.cxrefCore.lspAdapter --> Cxrefactory.cxrefCore.cxref <<Relationship-UmVsYXRpb25zaGlw>> : "uses for navigation\n<size:16>[call]</size>"
Cxrefactory.cxrefCore.lspAdapter --> Cxrefactory.cxrefCore.symbolResolver <<Relationship-UmVsYXRpb25zaGlw>> : "attempts symbol lookup via\n<size:16>[call]</size>"
Cxrefactory.cxrefCore.xref --> Cxrefactory.cxrefCore.cxref <<Relationship-UmVsYXRpb25zaGlw>> : "handles references using\n<size:16>[call]</size>"
Cxrefactory.cxrefCore.server --> Cxrefactory.cxrefCore.cxref <<Relationship-UmVsYXRpb25zaGlw>> : "handles references using\n<size:16>[call]</size>"
Cxrefactory.cxrefCore.refactory --> Cxrefactory.cxrefCore.cxref <<Relationship-UmVsYXRpb25zaGlw>> : "handles references using\n<size:16>[call]</size>"
Cxrefactory.cxrefCore.parser --> Cxrefactory.cxrefCore.yylex <<Relationship-UmVsYXRpb25zaGlw>> : "reads tokenized source using\n<size:16>[buffering]</size>"
Cxrefactory.cxrefCore.xref --> Cxrefactory.cxrefCore.parser <<Relationship-UmVsYXRpb25zaGlw>> : "parses source code using\n<size:16>[call]</size>"
Cxrefactory.cxrefCore.server --> Cxrefactory.cxrefCore.parser <<Relationship-UmVsYXRpb25zaGlw>> : "parses source code using\n<size:16>[call]</size>"
Cxrefactory.cxrefCore.yylex --> 3 <<Relationship-UmVsYXRpb25zaGlw>> : "reads source code from\n<size:16>[file I/O]</size>"
Cxrefactory.cxrefCore.cxfile --> Cxrefactory.referencesDb <<Relationship-UmVsYXRpb25zaGlw>> : "builds and queries\n<size:16>[hash-based lookup]</size>"

@enduml

cxrefCore is the core container. It does all the work when it comes to finding and reporting references to symbols and communicating refactoring requests, as well as storing reference information for long-term storage and caching.

Although c-xref can be used as a command line tool, which can be handy when debugging or exploring, it is normally used in "server" mode. In server mode the communication between the editor extension and the cxrefCore container is a back-and-forth communication using a non-standard protocol over standard pipes.

The responsibilities of cxrefCore can largely be divided into

  • parsing source files to create and maintain the reference database, which stores all inter-module references

  • parsing source files to get important information, such as the positions for a function's beginning and end

  • managing editor buffer state (as it might differ from the file on disk)

  • performing symbol navigation

  • creating and serving completion suggestions

  • performing refactorings such as renames, extracts and parameter manipulation

At this point it seems like refactorings are performed as separate invocations of c-xref rather than through the server interface.

7.2.2. EditorExtension

@startuml
title <size:24>Component View: C-xrefactory - editorExtension</size>

set separator none
left to right direction
skinparam ranksep 60
skinparam nodesep 30
hide stereotype

<style>
  root {
    BackgroundColor: #ffffff;
    FontColor: #444444;
  }
  // Element,Component
  .Element-RWxlbWVudCxDb21wb25lbnQ= {
    BackgroundColor: #85bbf0;
    LineColor: #5d82a8;
    LineStyle: 0;
    LineThickness: 2;
    RoundCorner: 20;
    FontColor: #ffffff;
    FontSize: 24;
    HorizontalAlignment: center;
    Shadowing: 0;
    MaximumWidth: 450;
  }
  // Element,Software System,Editor,Ext
  .Element-RWxlbWVudCxTb2Z0d2FyZSBTeXN0ZW0sRWRpdG9yLEV4dA== {
    BackgroundColor: #686868;
    LineColor: #484848;
    LineStyle: 0;
    LineThickness: 2;
    RoundCorner: 20;
    FontColor: #ffffff;
    FontSize: 24;
    HorizontalAlignment: center;
    Shadowing: 0;
    MaximumWidth: 450;
  }
  // Relationship
  .Relationship-UmVsYXRpb25zaGlw {
    LineThickness: 2;
    LineStyle: 10-10;
    LineColor: #444444;
    FontColor: #444444;
    FontSize: 24;
  }
  // C-xrefactory
  .Boundary-Qy14cmVmYWN0b3J5 {
    BackgroundColor: #ffffff;
    LineColor: #0b4884;
    LineStyle: 0;
    LineThickness: 2;
    FontColor: #1168bd;
    FontSize: 24;
    HorizontalAlignment: center;
    Shadowing: 0;
  }
  // editorExtension
  .Boundary-ZWRpdG9yRXh0ZW5zaW9u {
    BackgroundColor: #ffffff;
    LineColor: #2e6295;
    LineStyle: 0;
    LineThickness: 2;
    FontColor: #438dd5;
    FontSize: 24;
    HorizontalAlignment: center;
    Shadowing: 0;
  }
</style>

rectangle "==Editor\n<size:16>[Software System]</size>\n\nAllows Developer to modify source code and perform refactoring operations" <<Element-RWxlbWVudCxTb2Z0d2FyZSBTeXN0ZW0sRWRpdG9yLEV4dA==>> as Editor

rectangle "C-xrefactory\n<size:16>[Software System]</size>" <<Boundary-Qy14cmVmYWN0b3J5>> {
  rectangle "editorExtension\n<size:16>[Container: Plugin]</size>" <<Boundary-ZWRpdG9yRXh0ZW5zaW9u>> {
    rectangle "==cxref.el\n<size:16>[Component: elisp]</size>" <<Element-RWxlbWVudCxDb21wb25lbnQ=>> as Cxrefactory.editorExtension.cxrefel
    rectangle "==cxrefactory.el\n<size:16>[Component: elisp]</size>" <<Element-RWxlbWVudCxDb21wb25lbnQ=>> as Cxrefactory.editorExtension.cxrefactoryel
    rectangle "==cxrefprotocol.el\n<size:16>[Component: elisp]</size>" <<Element-RWxlbWVudCxDb21wb25lbnQ=>> as Cxrefactory.editorExtension.cxrefprotocolel
  }

}

Editor --> Cxrefactory.editorExtension.cxrefactoryel <<Relationship-UmVsYXRpb25zaGlw>> : "extends functionality using\n<size:16>[plugin]</size>"

@enduml

The EditorExtension container is responsible for plugging into an editor of choice, handling the user interface and buffer management, and executing the refactoring edit operations.

Currently only one such extension is supported, for Emacs. Code for a jEdit extension also existed and is still available in the repo history, but it has not been updated, modified or checked for a long time and is no longer part of this project.

7.2.3. ReferencesDB

The references database stores cross-referencing information for symbols that are visible outside the module in which they are defined. Information about local/static symbols is not stored but gathered by parsing the particular source file on demand.

Currently this information is stored in a somewhat cryptic, optimized text format.

This storage can be divided into multiple files, probably for faster access. Symbols are hashed to determine which of the "database" files they are stored in. As all cross-referencing information for a symbol is stored in the same "record", this allows reading only a single file when a symbol is looked up.

8. Code

8.1. Commands

The editorExtension calls the server using command line options. These are first converted to a command enum whose values start with OLO ("on-line operation") or AVR ("available refactoring").

Sometimes the server needs to call the cross-referencer. This is performed in the same manner, using command line options, but since this call is internal, the wanted arguments are stored in a vector which is passed to xref() in the same way that main() passes the actual argc/argv.

Many of the commands require extra arguments, such as positions/markers, which are passed as additional options. E.g. a rename requires the new name, which is sent in the renameto= option, where it is parsed and stored in the option structure.

Some of these extra arguments are fairly random, like -olcxparnum= and -olcxparnum2=. This should be cleaned up.

A move towards "events" with arguments would be helpful. This would mean that we need to:

  • List all "events" that c-xref need to handle

  • Define the parameters/extra info that each of them need

  • Clean up the command line options to follow this

  • Create event structures to match each event and pass this to server, xref and refactory

  • Rebuild the main command loop to parse command line options into event structures

8.2. Passes

There is a variable in main() called firstPassing which is set and passed down through mainEditServer() until it is reset in mainFileProcessingInitialisations().

This is probably related to the fact that c-xref allows for making multiple passes over the analyzed source in case you compile the project sources with different C defines. Variables in the c-xref sources indicate this, e.g. the loops in mainEditServerProcessFile() and mainXrefProcessInputFile() (which are both strangely limited by setting the maxPass variable to 1 before entering the loop…).

8.3. Parsers

C-xref uses a patched version of Berkeley yacc (byacc) to generate parsers. There are a number of parsers:

  • C

  • Yacc

  • C pre-processor expressions

There might also exist small traces of the Java parser, which was previously a part of the free c-xref, and of the C++ parser, which existed but was proprietary.

The patch to byacc mainly affects the skeleton and seems to relate mostly to error handling and to adding a recursive parsing feature that was required for Java, which was supported previously. The patch might no longer be necessary now that Java parsing is gone, but this has not been tried.

Some changes were also made to accommodate multiple parsers in the same executable, mostly solved by CPP macros renaming the parser data structures so that they can be accessed using the standard names in the parsing skeleton. The Makefile generates the parsers and renames the generated files as appropriate.
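The renaming trick can be sketched as below. The prefixes are illustrative; the actual macro names used by the Makefile and skeleton may differ:

```c
#include <assert.h>

/* Sketch of hosting several yacc parsers in one binary: before each
   generated parser is compiled, CPP macros rename the global symbols
   byacc emits (yyparse, yylval, the tables, ...).  Here two dummy
   functions stand in for two generated parsers. */

#define yyparse c_yyparse
static int yyparse(void) { return 1; }   /* stands in for the C parser */
#undef yyparse

#define yyparse yacc_yyparse
static int yyparse(void) { return 2; }   /* stands in for the Yacc parser */
#undef yyparse
```

Both "parsers" define yyparse in their own source, but after preprocessing the linker sees two distinct symbols, c_yyparse and yacc_yyparse.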

8.4. Integrated Preprocessor

C-xrefactory includes its own integrated C preprocessor implementation rather than using the system’s preprocessor (cpp, clang preprocessor, etc.). This is a crucial architectural decision that enables core functionality.

8.4.1. Why Not Use the System Preprocessor?

Using an external preprocessor would mean that all macros would be expanded before c-xrefactory sees the code. This would make it impossible to:

  • Navigate to macro definitions

  • Show macro usage and references

  • Refactor macro names and parameters

  • Complete macro identifiers

  • See macro arguments as distinct symbols

  • Understand the code structure as the programmer wrote it

By parsing the source at the macro level, c-xrefactory operates as a source-level tool rather than a post-preprocessed tool. This allows it to work with the code as developers see and write it, preserving all macro information for navigation and refactoring.
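A trivial, hypothetical example of what is at stake:

```c
#include <assert.h>

/* After an external preprocessor, the MAX(x, y) call below would
   already be expanded to ((x) > (y) ? (x) : (y)) and the identifier
   MAX - the thing a user wants to navigate to or rename - would be
   gone from the token stream.  Working at the macro level keeps the
   reference intact. */
#define MAX(a, b) ((a) > (b) ? (a) : (b))

static int largest(int x, int y) {
    return MAX(x, y);   /* a macro reference c-xrefactory can index */
}
```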

8.4.2. Implementation Details

The integrated preprocessor is implemented in yylex.c and includes:

  • Macro definition handling (processDefineDirective())

  • Macro expansion with argument substitution

  • Conditional compilation (#if, #ifdef, #ifndef, #elif, #else, #endif)

  • Include file processing (#include, #include_next)

  • Pragma directives (limited support)

  • Expression evaluation for #if directives (cppexp_parser.y)

8.4.3. Limitations

The integrated preprocessor does not support all modern preprocessor features:

  • Platform-specific predefined macros (like __arm64__, __x86_64__) are not automatically defined

  • Some compiler-specific extensions may not be recognized (e.g. __has_feature(), __building_module())

These limitations mean that c-xrefactory may report syntax errors when encountering modern platform headers that use these features, even though the code compiles correctly with standard compilers.

8.4.4. Trade-offs

This design choice represents a fundamental trade-off:

  • Gained: Full macro-level navigation and refactoring capabilities

  • Lost: Perfect compatibility with all preprocessor extensions and platform-specific features

The benefit of macro-level refactoring is considered more valuable than perfect preprocessor compatibility, as it is a key differentiator for c-xrefactory.

8.5. Refactoring and the parsers

Some refactorings need more detailed information about the code, maybe all do?

One example, at least, is parameter manipulation. Then the refactorer calls the appropriate parser (serverEditParseBuffer()) which collects information in the corresponding semantic actions. This information is stored in various global variables, like parameterBeginPosition.

The parser is filling out a ParsedInfo structure which conveys information that can be used e.g. when extracting functions etc.

At this point I don’t understand exactly how this interaction is performed; there seems to be no way to parse only the appropriate parts, so the whole file needs to be re-parsed.

Findings:

  • some global variables are set as a result of command line and arguments parsing, depending on which "command" the server is acting on

  • the semantic rules in the parser(s) contain code that matches these global variables and then inserts special lexems in the lexem stream

One example is how a Java 'move static method' was performed. It requires a target position. That position is transferred from command line options to global variables. When the Java parser was parsing a class or similar, it (or rather the lexer) looked at that target position information and inserted an OL_MARKER_TOKEN in the stream.

TODO: What extra "operation" the parsing should perform and return data for should be packaged into some type of "command" or parameter object that should be passed to the parser, rather than relying on global variables.

8.6. Reading Files

Here are some speculations about how the complex file reading is structured.

Each file is identified by a filenumber, which is an index into the file table, and seems to have a lexBuffer tied to it so that you can just continue from wherever you were. That in turn contains a CharacterBuffer that handles the actual character reading.

And there is also an "editorBuffer"…​

The intricate interactions between these are hard to follow as the code here is littered with short variable names which are copies of fields in the structures, and infested with many macros, probably in a misguided attempt at optimization. ("Premature optimization is the root of all evil" and "Make it work, make it right, make it fast".)

It seems that everything starts in initInput() in yylex.c where the only existing call to fillFileDescriptor() is made. But you might wonder why this function does some initial reading; this should be pushed down to the buffers in the file descriptor.

8.6.1. Lexing/scanning

Lexing/scanning is performed in two layers. The first, in lexer.c, seems to be doing the actual lexing into lexems which are put in a lexembuffer. This contains a sequence of encoded and compressed symbols, each starting with a LexemCode followed by extra data, like a Position. These seem to always be added but are not always necessary.

The higher level "scanning" is performed, as per usual, by yylex.c. lexembuffer defines some functions to put and get lexems, chars (identifiers and file names?) as well as integers and positions.

At this point the put/get lexem functions take a pointer to a pointer to chars (presumably into the lexem stream in the lexembuffer) which they also advance. This requires the caller to manage the LexemBuffer’s internal pointers externally and finally set them right when done.

It would be much better to call the "putLexem()" functions with a LexemBuffer, but there seem to be a few cases where the destination (often dd) is not a lexem stream inside a LexemBuffer. These might be related to macro handling.

This is a work in progress. Currently most of the "normal" usages are prepared to use the LexemBuffer’s pointers, but the handling of macros and defines are cases where the lexems are not put in a LexemBuffer. See TODO.org for the current status of this Mikado sequence.
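The calling convention described above can be sketched like this, with simplified, guessed types (not the real lexembuffer API):

```c
#include <assert.h>

/* Sketch of the current put/get style: the functions take a pointer
   into the lexem stream and advance it, so the caller must write the
   final position back into the LexemBuffer afterwards.  Types and
   names are simplified guesses. */
typedef unsigned char LexemCode;

static void putLexemCode(char **streamP, LexemCode code) {
    **streamP = (char)code;
    (*streamP)++;                 /* caller's pointer is advanced */
}

static LexemCode getLexemCode(char **streamP) {
    LexemCode code = (LexemCode)**streamP;
    (*streamP)++;
    return code;
}
```

Passing a LexemBuffer instead would let these functions maintain the read/write positions themselves, which is the direction of the ongoing Mikado sequence.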

8.6.2. Semantic information

As the refactoring functions need some amount of semantic information, in the sense of information gathered during parsing, this information is collected in various ways when c-xref calls the "sub-task" to do the parsing required.

Two structures hold information about various things, among which is the memory index at certain points of the parsing. This makes it possible to verify, e.g., that an editor region does not cover a break in block or function structure. This structure is, at the time of writing, called parsedInfo and definitely needs to be tidied up.

8.7. Reference Database

c-xref run in "xref" mode creates, or updates, a database of references for all externally visible symbols it encounters.

A good design would have a clean and generic interface to the reference database, but chiseling this out is still a work in progress.

8.7.1. Architecture Overview

c-xrefactory’s core functionality relies on a sophisticated symbol database that stores cross-references, definitions, and usage information for all symbols in a project. The database is implemented as a collection of binary .cx files and supports:

  • Creation via -create operations that parse all project files

  • Updates via -update operations (incremental or full)

  • Queries during symbol lookup operations

8.7.2. Database Structure and Format

The symbol database uses a hash-partitioned file structure:

cxrefs/
├── files        # File metadata and paths
├── 0000         # Symbol data for hash bucket 0
├── 0001         # Symbol data for hash bucket 1
└── ...          # Additional hash buckets based on referenceFileCount

Each .cx file contains structured records with specific keys:

  • CXFI_FILE_NAME: File paths and metadata

  • CXFI_SYMBOL_NAME: Symbol names, types, and attributes

  • CXFI_REFERENCE: Individual symbol references with positions and usage types

8.7.3. Symbol Resolution Pipeline

The symbol lookup process follows this pipeline:

1. Cursor Position Input
   ↓
2. Parse current file to identify symbol at position
   ↓
3. scanReferencesToCreateMenu(symbolName)
   ↓
4. Load symbol data from .cx files
   ↓
5. Populate sessionData.browserStack
   ↓
6. Find best definition via olcxOrderRefsAndGotoDefinition()
   ↓
7. Navigate to definition position

8.7.4. Key Data Structures

Browser Stack (sessionData.browserStack)

The browser stack is the runtime data structure for symbol navigation:

typedef struct {
    OlcxReferences *top;           // Current symbol being browsed
    OlcxReferences *root;          // Stack base
} OlcxReferencesStack;

typedef struct {
    Reference *current;            // Current reference
    Reference *references;         // All references for symbol
    SymbolsMenu *symbolsMenu;      // Available symbols for selection
    Position callerPosition;       // Where lookup was initiated
    ServerOperation operation;     // Type of operation (OLO_PUSH, etc.)
} OlcxReferences;
Reference Information
typedef struct {
    Position position;             // File, line, column
    Usage usage;                   // UsageDefined, UsageDeclared, UsageUsed
    struct Reference *next;        // Linked list of references
} Reference;

typedef struct {
    char *linkName;               // Symbol name with scope information
    Type type;                    // Symbol type (function, variable, etc.)
    Storage storage;              // Storage class
    Scope scope;                  // Visibility scope
    int includedFileNumber;       // File containing symbol
    Reference *references;        // All references to this symbol
} ReferenceItem;

8.7.5. Database Operations

Database Creation (-create)
  1. Parse all project files using C/Yacc parsers

  2. Generate ReferenceItem entries for each symbol

  3. Create Reference entries for each symbol usage

  4. Write structured records to .cx files

  5. Build hash-based index for fast lookup

Database Updates (-update)
  • Full Update: Rebuild entire database

  • Fast Update: Only process modified files based on timestamps

  • Incremental: Smart detection of changed dependencies

Symbol Lookup (OLO_PUSH operations)
  1. scanReferencesToCreateMenu(symbolName) loads symbol data

  2. createSelectionMenu() builds navigation menus

  3. olProcessSelectedReferences() populates browser stack

  4. olcxOrderRefsAndGotoDefinition() finds best definition

8.7.6. LSP Integration Challenges

Architectural Mismatch

The current architecture was designed for batch processing and persistent databases, while LSP requires on-demand processing and responsive queries.

Current LSP Problems:

  1. Persistent File Dependency: LSP findDefinition() calls scanReferencesToCreateMenu() which expects pre-existing .cx files

  2. Project-Wide Requirement: Full project analysis required before individual file operations

  3. Batch Processing Model: No support for incremental, per-request symbol resolution

  4. Cold Start Problem: New projects have no .cx files, causing all LSP operations to fail

Error Flow in LSP Mode:

LSP textDocument/definition request
↓
findDefinition() called
↓
scanReferencesToCreateMenu() called
↓
No .cx files exist → empty results
↓
browserStack remains empty
↓
All definition requests return same default position
Performance Characteristics
Operation File-Based (.cx) On-Demand Parsing

Cold Start

Requires -create first

Parse file immediately

Warm Queries

O(1) hash lookup

O(file_size) parsing

Memory Usage

Low (streaming)

High (in-memory cache)

Incremental Updates

Smart file tracking

Per-file invalidation

Multi-project

Separate databases

Workspace-scoped

A proposal for a unified symbol database architecture has been documented. See Proposed Unified Symbol Database in the Proposed Refactorings chapter.

8.7.7. CXFILE

The current implementation of the reference database is file based, with an optimized storage format.

There is limited support to automatically keep these updated during an edit-compile cycle, you might have to update manually now and then.

The project settings (or command line options) indicate where the file(s) are created and one option controls the number of files to be used, -refnum.

This file (or files) contains compact, but textual representations of the cross-reference information. Format is somewhat complex, but here are somethings that I think I have found out:

  • the encoding has single character markers which are listed at the top of cxfile.c

  • the coding often starts with a number followed by a character, such that '4l' (4 ell) means line 4 and '23c' means column 23

  • references seem to be optimized to not repeat information if it would be a repetition, such that '15l3cr7cr' means that there are two references on line 15, one in column 3 and the other in column 7

  • so there is a notion of "current" for all values which need not be repeated

  • e.g. references all use 'fsulc' fields, i.e. file, symbol index, usage, line and column, but do not repeat a 'fsulc' as long as it is the same

  • some "fields" have a length indicator before them, such as filenames ('6:/abc.c') indicated by ':' and version information ('34v file format: C-xrefactory 1.6.0 ') indicated by 'v'.

So a line might say

12205f 1522108169p m1ia 84:/home/...

The line identifies the file with id 12205. The file was last included in an update of refs at a time identified by 1522108169 (mtime), has not been part of a full update of xrefs, and was mentioned on the command line. (I don’t know what the 'a' means…​) Finally, the file name itself is 84 characters long.

TODO: Build a tool to decipher this so that tests can query the generated data for expected data. This is now partly ongoing in the 'utils' directory.
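As a starting point, a toy decoder for the basic "number followed by marker" records described above could look like this (the real format in cxfile.c has many more record types and stateful "current" values):

```c
#include <assert.h>
#include <ctype.h>

/* Scan one "<digits><marker>" record from the stream, returning the
   single-character marker (e.g. 'l' for line, 'c' for column) and
   storing the decoded number.  A marker with no preceding digits
   yields the value 0. */
static char readRecord(const char **p, int *value) {
    int v = 0;
    while (isdigit((unsigned char)**p))
        v = v * 10 + (*(*p)++ - '0');
    *value = v;
    return *(*p)++;
}
```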

8.7.8. Reference Database Reading

All information about an externally visible symbol is stored in one, and only one reference file, determined by hashing the linkname of the symbol. So it will always suffice to read one reference file when consulting the reference database (in the form of CXFILE) for a symbol.
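The consequence is that the reference file for a symbol is a pure function of its linkname, so a lookup touches exactly one file. A sketch of that bucket selection (the hash function here is illustrative, not the one actually used in c-xref):

```c
#include <assert.h>

/* Illustrative string hash (djb2-style); c-xref's actual hash differs. */
static unsigned hashLinkName(const char *linkName) {
    unsigned h = 5381;
    while (*linkName)
        h = h * 33 + (unsigned char)*linkName++;
    return h;
}

/* The reference file (e.g. "0000", "0001", ...) holding a symbol is
   determined only by its linkname and the configured file count. */
static int referenceFileFor(const char *linkName, int referenceFileCount) {
    return (int)(hashLinkName(linkName) % (unsigned)referenceFileCount);
}
```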

The reading of the CXFILE format is controlled by `scanFunctionTable`s. These consist of a list of entries, one for each key/tag/recordCode (see the format description above) that the scan will process.

As the reference file reader encounters a key/tag/recordCode it consults the table to see if there is an entry pointing to a handler function for that key/tag/recordCode. If so, the handler is called.
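A sketch of that table-driven dispatch, with guessed field names (the real scanFunctionTable entries carry more context than a plain value):

```c
#include <assert.h>
#include <stddef.h>

/* Each entry maps a record key character to a handler; records with
   no matching entry are skipped. */
typedef struct {
    char key;                    /* the record's tag character */
    void (*handler)(int value);  /* called with the decoded value */
} ScanTableEntry;

static int lastLine;
static void handleLineRecord(int value) { lastLine = value; }

static ScanTableEntry scanTable[] = {
    {'l', handleLineRecord},
    {0, NULL}                    /* sentinel */
};

/* Returns 1 if a handler was found and called, 0 if the record was
   skipped. */
static int dispatchRecord(char key, int value) {
    for (ScanTableEntry *e = scanTable; e->key != 0; e++)
        if (e->key == key) {
            e->handler(value);
            return 1;
        }
    return 0;
}
```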

8.8. Editor Plugin

The editor plugin has three different responsibilities:

  • serve as the UI for the user when interacting with certain c-xref related functions

  • query c-xref server for symbol references and support navigating these in the source

  • initiate source code operations ("refactorings") and execute the resulting edits

Basically Emacs (and probably other editors) starts c-xref in "server-mode" using -server which connects the editor with c-xref through stdout/stdin. If you have (setq c-xref-debug-mode t) this command is logged in the *Messages* buffer with the prefix "calling:".

Commands are sent from the editor to the server on its standard input. They look very much like normal command line options, and in fact c-xref parses that input in the same way using the same code. When the editor sends an end-of-options line, the server starts executing whatever was sent, and returns some information in the file given as the -o option when the editor starts the c-xref server process. The file is named and created by the editor and usually resides in /tmp. With c-xref-debug-mode set to on this is logged as "sending:". If you (setq c-xref-debug-preserve-tmp-files t) Emacs will also not delete the temporary files it creates, so that you can inspect them afterwards.

When the server has finished processing the command and placed the output in the output file it sends a <sync> reply.

The editor can then pick up the result from the output file and do what it needs to do with it ("dispatching:").

8.8.1. Invocations

The editor invokes a new c-xref process for the following cases:

  • Refactoring

    Each refactoring operation calls a new instance of c-xref?

  • Create Project

    When a c-xref function is executed in the editor and there is no project covering that file, an interactive "create project" session is started, which is run by a separate c-xref process.

8.8.2. Buffers

There is some magical editor buffer management happening inside of c-xref which is not clear to me at this point. Basically it looks like the editor-side tries to keep the server in sync with which buffers are opened with what file…​

At this point I suspect that -preload <file1> <file2> means that the editor has saved a copy of <file1> in <file2> and requests the server to set up a "buffer" describing that file and use it instead of the <file1> that resides on disk.

This is essential when doing refactoring since the current version of the file most likely exists only in the editor, so the editor has to tell the server the current content somehow; this is the -preload option.

8.9. Editor Server

When serving an editor, the c-xrefactory application is divided into the server, c-xref, and the editor part. At this point only Emacs is supported, so that is implemented in the editor/Emacs packages.

8.9.1. Interaction

The initial invocation of the edit server creates a process with which communication is over stdin/stdout using a protocol which from the editor is basically a version of the command line options.

When the editor has delivered all information to the server it sends 'end-of-option' as a command; the edit server processes whatever it has and responds with <sync>, which means that the editor can fetch the result from the file it named as the output file using the '-o' option.

As long as the communication between the editor and the server is open, the same output file will be used. This makes it hard to catch some interactions, since an editor operation might result in multiple interactions, and the output file is then re-used.

Setting the emacs variable c-xref-debug-mode forces the editor to copy the content of such an output file to a separate temporary file before re-using it.

For some interactions the editor starts a completely new and fresh c-xref process, see below. And actually you can’t do refactorings using the server; they have to be separate calls. (Yes?) I have yet to discover why this design choice was made.

There are many things in the sources that handle refactorings separately, such as refactoring_options, which is a separate copy of the options structure used only when refactoring.

8.9.2. Protocol

Communication between the editor and the server is performed using text through standard input/output to/from c-xref. The protocol is defined in src/protocol.tc and must match editor/emacs/c-xrefprotocol.el.

The definition of the protocol only caters for the server→editor direction; the editor→server direction consists of command lines resembling the command line options and arguments, and is actually handled by the same code.

The file protocol.tc is included in protocol.h and protocol.c, which generate definitions and declarations for the elements using some macros.

There is a similar structure with c-xrefprotocol.elt which includes protocol.tc to wrap the PROTOCOL_ITEMs into defvars.

There is also some Makefile trickery that ensures that the C and elisp implementations are in sync.

One notable detail of the protocol is that it carries strings in their native format, UTF-8. This means that lengths need to indicate characters, not bytes.
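A minimal counter illustrating the difference (assuming well-formed UTF-8, where continuation bytes have the bit pattern 10xxxxxx and do not start a new character):

```c
#include <assert.h>

/* Count UTF-8 characters, not bytes: a byte whose top two bits are
   10 is a continuation byte and belongs to the previous character. */
static int utf8CharacterCount(const char *s) {
    int count = 0;
    for (; *s; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    return count;
}
```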

8.9.3. Invocation of server

The editor fires up a server and keeps talking over the established channel (elisp function 'c-xref-start-server-process'). This probably puts extra demands on the memory management in the server, since it might need to handle multiple information sets and options (as read from a .cxrefrc-file) for multiple projects simultaneously over a longer period of time. (E.g. if the user enters the editor starting with one project and then continues to work on another then new project options need to be read, and new reference information be generated, read and cached.)

TODO: Figure out and describe how this works by looking at the elisp-sources.

FINDINGS:

  • c-xref-start-server-process in c-xref.el

  • c-xref-send-data-to-running-process in c-xref.el

  • c-xref-server-call-refactoring-task in c-xref.el

8.9.4. Communication Protocol

The editor server is started using the appropriate command line option and then it keeps the communication over stdin/stdout open.

The editor part sends command line options to the server, which looks something like (from the read_xrefs test case):

-encoding=european -olcxpush -urldirect  "-preload" "<file>" "-olmark=0" "-olcursor=6" "<file>" -xrefrc ".c-xrefrc" -p "<project>"
end-of-options

In this case the "-olcxpush" is the operative command which results in the following output

<goto>
 <position-lc line=1 col=4 len=66>CURDIR/single_int1.c</position-lc>
</goto>

As we can see from this interaction, the server handles (all?) input as a command line and manages the options as if it were a command line invocation.

This explains the intricate interactions between the main program and the option handling.

The reason behind this might be that a user of the editor might be editing files on multiple projects at once, so every interrogation/operation needs to clearly set the context of that operation, which is what a user would do with the command line options.

8.9.5. OLCX Naming

It seems that all on-line editing server functions have an olcx prefix, "On-Line C-Xrefactory", maybe…​

8.10. Refactoring

This is, of course, the core of why I want to restore this: to get at its refactoring capabilities. So far, much is not understood, but here are some bits and pieces.

8.10.1. Editor interface

One thing that really confused me in the beginning was that the editor, primarily Emacs, doesn’t use the actual server that it has started for refactoring operations (and perhaps for other things too?). Instead it creates a separate instance with which it talks about one refactoring.

I’ve just managed to create the first automatic test for refactorings, olcx_refactory_rename. It was created by running the sandboxed emacs to record the communication and thus finding the commands to use.

Based on this learning it seems that a refactoring typically is a single invocation of c-xref with appropriate arguments (start & stop markers, the operation, and so on) and the server then answers with a sequence of operations, like

<goto>
 <position-off off=3 len=<n>>CURDIR/test_source/single_int1.c</position-off>
</goto>
<precheck len=<n>> single_int_on_line_1_col_4;</precheck>
<replacement>
 <str len=<n>>single_int_on_line_1_col_4</str>  <str len=<n>>single_int_on_line_1_col_44</str>
</replacement>

8.10.2. Interactions

I haven’t investigated the internal flow of such a sequence, but it is starting to look like c-xref internally re-reads the initialization. I’m not at this point sure what this means; I hope it’s not internal recursion…​

8.10.3. Extraction

Each type of refactoring has its own little "language". E.g. extracting a method/function using -refactory -rfct-extract-function will return something like

<extraction-dialog type=newFunction_> <str len=20>	newFunction_(str);
</str>
 <str len=39>static void newFunction_(char str[]) {
</str>
 <str len=3>}

</str>
  <int val=2 len=0></int>
</extraction-dialog>

So there is much logic in the editor for this. I suspect that the three <str> parts are

  • what to replace the current region with

  • what to place before the current region

  • what to place after the current region

If this is correct then all extractions copy the region verbatim, and the server only has to figure out how to "glue" that to a semantically correct call/argument list.

As a side note the editor asks for a new name for the function and then calls the edit server with a rename request (having preloaded the new source file(s) of course).

8.10.4. Protocol

Deciphering the interaction between an editor and the edit server in c-xrefactory isn’t easy. The protocol isn’t very clear or concise. Here I’m starting to collect the important bits of the invocation, the required and relevant options, and the returned information.

The test cases for various refactoring operations should give you some more details.

All of these require a -p (project) option to know which c-xref project options to read.

General Principles

Refactorings are done using a separate invocation, the edit server mode cannot handle refactorings. At least that is how the Emacs client does it (haven’t looked at the Jedit version).

I suspect that it once was a single server that did both the symbol management and the refactoring, as there are remnants of a separate instance of the option structure named "refactoringOptions". Also, the check for refactoring mode is done using options.refactoringRegime == RegimeRefactory, which seems strange.

Anyway, if the refactoring succeeds the suggested edits are, as per usual, in the communications buffer.

However, there are a couple of cases where the communication does not end there, possibly because the client needs to communicate some information back before the refactoring server can finish the job, like presenting some menu selection.

My guess at this point is that it is the refactoring server that closes the connection when it is done…​

Rename

Invocation: -rfct-rename -renameto=NEW_NAME -olcursor=POSITION FILE

Semantics: The symbol under the cursor (at POSITION in FILE) should be renamed (replaced at all occurrences) by NEW_NAME.

Result: sequence of

<goto>
 <position-off off=POSITION len=N>FILE</position-off>
</goto>
<precheck len=N>STRING</precheck>

followed by sequence of

<goto>
 <position-off off=POSITION len=N>FILE</position-off>
</goto>
<replacement>
 <str len=N>ORIGINAL</str>  <str len=N>REPLACEMENT</str>
</replacement>
Protocol Messages
<goto>{position-off}</goto> → editor

Request the editor to move cursor to the indicated position (file, position).

<precheck len={int}>{string}</precheck> → editor

Requests that the editor verifies that the text under the cursor matches the string.

<replacement>{str}{str}</replacement>

Requests that the editor replaces the string under the cursor, which should be 'string1', with 'string2'.

<position-off off={int} len={int}>{absolute path to file}</position-off>

Indicates a position in the given file. 'off' is the character position in the file.

8.11. Memory handling

c-xrefactory uses custom memory management via arena allocators rather than malloc/free for performance-critical operations.

See the Modules chapter for the design and architecture of the Memory module, and the Data Structures chapter for details on the arena allocator data structure and allocation model.

For debugging memory issues, especially arena lifetime violations, see the Development Environment chapter.

8.11.1. The Memory Type

Memory allocation is managed through the Memory structure, which implements an arena/bump allocator. Different memory arenas serve different purposes:

  • cxMemory - Cross-reference database (with overflow handling)

  • ppmMemory - Preprocessor macro expansion

  • macroBodyMemory - Macro definition storage

  • macroArgumentsMemory - Macro argument expansion

  • fileTableMemory - File table entries

See Modules chapter for detailed description of each arena’s purpose and lifetime.
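A toy model of such an arena (names illustrative, not the real Memory type, which also carries a name and an overflow handler):

```c
#include <assert.h>
#include <stddef.h>

/* Minimal arena/bump allocator: allocation just advances an index,
   and "freeing" means resetting the index to a saved marker. */
typedef struct {
    char *area;     /* backing storage */
    size_t size;    /* capacity of area */
    size_t index;   /* next free byte */
} ToyMemory;

static void *toyAlloc(ToyMemory *m, size_t n) {
    if (m->index + n > m->size)
        return NULL;             /* the real arenas call an overflow handler */
    void *p = m->area + m->index;
    m->index += n;
    return p;
}

/* Bulk deallocation: everything allocated after marker is discarded. */
static void toyFreeUntil(ToyMemory *m, void *marker) {
    m->index = (size_t)((char *)marker - m->area);
}
```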

8.11.2. Option Memory

The optMemory arena requires special handling because Options structures are copied during operation. When copying, all pointers into option memory must be adjusted to point into the target structure’s memory area, not the source’s.

Functions like copyOptions() perform this pointer adjustment through careful memory arithmetic, traversing a linked list of all memory locations that need updating.

The linked list nodes themselves are allocated in the Options structure’s dynamic memory.
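A simplified model of that rebasing (not the real copyOptions(), which walks a linked list of all registered pointer locations rather than a single known field):

```c
#include <assert.h>
#include <string.h>

/* Toy Options structure with one string living in its own memory
   area; all names here are invented. */
typedef struct {
    char area[64];       /* the option memory */
    char *projectName;   /* points into area */
} ToyOptions;

/* After a raw struct copy the pointer still aims into the source's
   area, so it must be rebased by the same offset into the copy. */
static void copyToyOptions(ToyOptions *dst, const ToyOptions *src) {
    *dst = *src;
    dst->projectName = dst->area + (src->projectName - src->area);
}
```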

8.12. Configuration

The legacy c-xref normally, in "production", uses a common configuration file in the user’s home directory, .c-xrefrc. When a new project is defined its options are stored in this file as a new section.

It is possible to point to a specific configuration file using the command line option -xrefrc, which is used extensively in the tests to isolate them from the user’s configuration.

Each "project" or "section" requires a name for the "project", which is the argument to the -p command line option. It may also contain most other command line options, one per line. These are always read before anything else, unless -no-stdop is used. This allows for different "default" options for each project.

8.12.1. Options

There are three possible sources for options.

  • Configuration files (~/.c-xrefrc)

  • Command line options at invocation, including server

  • Piped options sent to the server in commands

Not all options are relevant in all cases.

All option sources use exactly the same format, so the same decoding code can be used for all of them.

8.12.2. Logic

When the editor has a file open, it needs to "belong" to a project. The logic for finding which one is very intricate and complicated.

In this code there are also checks for things like whether the file is already in the index, or whether the configuration file has changed since last time, indicating there are scenarios that are more complicated (the server, obviously).

But I also think this code should be simplified a lot.

9. Modules

The current state of c-xrefactory is not such that clean modules can easily be identified and located. This is obviously one important goal of the continuing refactoring work.

To be able to do that we need to understand the functionality enough so that clusters of code can be refactored to be more and more clear in terms of responsibilities and interfaces.

This section makes a stab at identifying some candidate modules, as illustrated by the component diagram for cxrefCore.

9.1. Yylex

9.1.1. Responsibilities

  • Transform source text to sequences of lexems and additional information

  • Register and apply C pre-processor macros and defines as well as defines made as command line options (-D)

  • Handle include files by pushing and popping read contexts

9.1.2. Interface

The yylex module has the standard interface required by any yacc-based parser, which is a simple yylex(void) function.

9.2. Parser

9.3. Xref

9.4. Server

9.5. Refactory

9.6. Cxref

9.7. Main

9.8. Memory

9.8.1. Responsibilities

The Memory module provides arena-based allocation for performance-critical and request-scoped operations:

  • Fast allocation for macro expansion and lexical analysis

  • Bulk deallocation for request-scoped cleanup

  • Multiple specialized arenas for different data lifetimes

  • Overflow detection and optional dynamic resizing

9.8.2. Design Rationale

Historical Context

In the 1990s when c-xrefactory originated, memory was scarce. The design had to:

  1. Minimize allocation overhead (no malloc/free per token)

  2. Support large projects despite limited RAM

  3. Allow overflow recovery via flushing and reuse

  4. Enable efficient bulk cleanup

Most memory arenas use statically allocated areas. Only cxMemory supports dynamic resizing to handle out-of-memory situations by discarding, flushing and reusing memory. This forced implementation of a complex caching strategy since overflow could happen mid-file.

Modern Benefits

Even with abundant modern memory, arena allocators provide:

  • Performance: Bump pointer allocation is typically an order of magnitude faster than malloc

  • Cache locality: Related data allocated contiguously

  • Automatic cleanup: Bulk deallocation prevents leaks

  • Request scoping: Natural fit for parsing/expansion operations

9.8.3. Arena Types and Lifetimes

Each arena, with its purpose and lifetime:

  • cxMemory - Symbol database, reference tables, cross-reference data (lifetime: file or session)

  • ppmMemory - Preprocessor macro expansion buffers (temporary allocations) (lifetime: per macro expansion)

  • macroBodyMemory - Macro definition storage (lifetime: session)

  • macroArgumentsMemory - Macro argument expansion (lifetime: per macro invocation)

  • fileTableMemory - File metadata and paths (lifetime: session)

  • optMemory - Command-line and config option strings (with special pointer adjustment) (lifetime: session)

9.8.4. Key Design Patterns

Marker-Based Cleanup

Functions save a marker before temporary allocations:

char *marker = ppmAllocc(0);   // Save current index
// ... temporary allocations ...
ppmFreeUntil(marker);          // Bulk cleanup

Buffer Growth Pattern

Long-lived buffers that may need to grow:

// Allocate initial buffer
bufferDesc.buffer = ppmAllocc(initialSize);

// ... use buffer, may need growth ...

// Free temporaries FIRST
ppmFreeUntil(marker);

// NOW buffer can grow (it's at top-of-stack)
expandPreprocessorBufferIfOverflow(&bufferDesc, writePointer);

Overflow Handling

The cxMemory arena supports dynamic resizing:

bool cxMemoryOverflowHandler(int n) {
    // Attempt to resize arena
    // Return true if successful
}

memoryInit(&cxMemory, "cxMemory", cxMemoryOverflowHandler, initialSize);

When overflow occurs, the handler can:

  1. Resize the arena (if within limits)

  2. Flush old data and reset

  3. Signal failure (fatal error)
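As a hedged sketch of options 1 and 3 above (the Arena type, the ARENA_MAX_SIZE limit and the arenaOverflowHandler name are all hypothetical; the real logic lives in memory.c), an overflow handler might look like:

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical arena mirroring the essentials of the Memory structure. */
typedef struct {
    char  *area;
    size_t size;
    int    index;
} Arena;

#define ARENA_MAX_SIZE (64 * 1024 * 1024)   /* assumed upper limit */

/* Grow the arena up to a limit (option 1), otherwise signal failure
   so the caller can flush or abort (option 3). */
bool arenaOverflowHandler(Arena *arena, int neededBytes) {
    size_t newSize = arena->size * 2;
    while (newSize < arena->size + (size_t)neededBytes)
        newSize *= 2;
    if (newSize > ARENA_MAX_SIZE)
        return false;               /* option 3: signal failure */
    char *newArea = realloc(arena->area, newSize);
    if (newArea == NULL)
        return false;
    arena->area = newArea;          /* option 1: resized */
    arena->size = newSize;
    return true;
}
```

Option 2 (flush and reset) is what makes the real cxMemory handler so much more involved, since flushing can happen mid-file.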

9.8.5. Interface

Key functions (see memory.h):

// Initialization
void memoryInit(Memory *memory, char *name,
                bool (*overflowHandler)(int n), int size);

// Allocation
void *memoryAlloc(Memory *memory, size_t size);
void *memoryAllocc(Memory *memory, int count, size_t size);

// Reallocation (only for most recent allocation)
void *memoryRealloc(Memory *memory, void *pointer,
                    size_t oldSize, size_t newSize);

// Bulk deallocation
size_t memoryFreeUntil(Memory *memory, void *marker);

// Guards
bool memoryIsAtTop(Memory *memory, void *pointer, size_t size);

9.8.6. Common Pitfalls

See the "Arena Allocator Lifetime Violations" section in the Development Environment chapter for:

  • Attempting to resize buffers not at top-of-stack

  • Calling FreeUntil() too late

  • Mixing arena lifetimes

9.8.7. Future Directions

Modern systems have abundant virtual memory. Possible improvements:

  1. Simplify overflow handling - Allocate larger initial arenas

  2. Separate lifetime management - Don’t mix temporary and long-lived allocations

  3. Consider alternatives - Linear allocators for some use cases

  4. Add debug modes - Track allocation patterns and detect violations

The experimental FlushableMemory type explores some of these ideas but hasn’t replaced the current implementation.

9.9. Cxfile

9.9.1. Responsibilities

Read and write the CXref database in "plain" text format.

9.9.2. File format

The current file format for the cross-reference data consists of records with the general format

<number><key>[<value>]

There are two important types of lines, a file information line and a symbol information line.

The actual keys are documented in cxfile.c, but here is an example file information line:

32571f  1715027668m 21:/usr/include/ctype.h

First we have two simple value/key pairs. The first, "32571f", indicates that this is file information for the file with file number 32571.

Secondly we have "1715027668m". This is the modification time of the file, stored so that we can see whether the file has been updated since the reference database was last written.

And the third part is "21:/usr/include/ctype.h", which is of a record type that is a bit more complex. The number is the length of the value. The ':' indicates that the record is a filename.
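A minimal sketch of reading the <number><key> prefix of such a record might look like this (readCxRecord and CxRecord are illustrative names; the actual key handling lives in cxfile.c):

```c
#include <ctype.h>
#include <string.h>

/* One parsed <number><key> pair from a cx-file record. */
typedef struct {
    long number;   /* the numeric prefix, e.g. 32571 */
    char key;      /* the key character, e.g. 'f', 'm' or ':' */
} CxRecord;

/* Read the numeric prefix and the key character; returns a pointer
   just past the key. For ':' (filename) records the number is the
   length of the string value that follows. */
const char *readCxRecord(const char *p, CxRecord *record) {
    long n = 0;
    while (isdigit((unsigned char)*p))
        n = n * 10 + (*p++ - '0');
    record->number = n;
    record->key = *p++;
    return p;
}
```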

9.10. c-xref.el

9.11. c-xrefactory.el

10. Data Structures

There are a lot of different data structures used in c-xrefactory. This is a first step towards visualising some of them.

10.1. ReferenceableItem and Reference: Core Domain Concepts

These are the fundamental cross-reference data structures that represent the "what" and "where" of code entities.

10.1.1. ReferenceableItem

A ReferenceableItem represents a referenceable entity in the codebase - something that can be referenced from multiple locations:

  • Functions and variables

  • Types (structs, unions, enums, typedefs)

  • Macros

  • Include directives (special case: TypeCppInclude)

  • Yacc non-terminals and rules

Each ReferenceableItem contains:

  • linkName - Fully qualified name (e.g., "MyClass::method")

  • type - What kind of entity (function, variable, type, etc.)

  • storage, scope, visibility - Language properties

  • includeFile - For TypeCppInclude items, which file is being included

  • references - Linked list of all Reference (occurrences) of this entity

ReferenceableItems are stored in the referenceableItemTable (hash table) and persisted to .cx files.

10.1.2. Reference (Occurrence)

A Reference represents a single occurrence of a ReferenceableItem at a specific location:

  • position - File, line, and column where this occurrence appears

  • usage - How it’s used (definition, declaration, usage, etc.)

  • next - Pointer to next occurrence in the list

Each ReferenceableItem maintains a linked list of all its References, allowing you to find every place that entity appears in the codebase.

Note: The term "Reference" in this context means "occurrence" - one specific use of an entity at one location. This is distinct from C++ references or reference semantics.

10.2. Symbol (Parser Symbol Table)

There is also a structure called Symbol. This is separate from ReferenceableItem and serves a different purpose:

Symbol - Parser-level symbol table entry (temporary, exists only during parsing):

  • Used by the C/Yacc parser for semantic analysis

  • Contains type information, position, storage class

  • Lives in symbolTable (hash table) during file parsing

  • Not persisted - discarded after parsing completes

ReferenceableItem - Cross-reference entity (persistent across entire codebase):

  • Created FROM Symbol properties when a referenceable construct is found

  • Stored in referenceableItemTable and saved to .cx files

  • Accumulates References from all files in the project

Relationship: During parsing, when the parser encounters a referenceable symbol (function, variable, etc.), it:

  1. Creates a Symbol in symbolTable for semantic analysis

  2. Creates or finds a ReferenceableItem by copying Symbol properties

  3. Adds a Reference to that ReferenceableItem’s list

  4. Discards the Symbol when parsing completes

This separation allows the parser to maintain its own temporary symbol table while building the persistent cross-reference database.
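The four-step flow above can be sketched with deliberately simplified structures (the field sets, the flat list standing in for referenceableItemTable, and the function names findOrCreateItem/addReference are illustrative, not the real definitions):

```c
#include <stdlib.h>
#include <string.h>

/* One occurrence of an entity at a position. */
typedef struct reference {
    int line, col;
    struct reference *next;
} Reference;

/* Persistent cross-reference entity, heavily simplified. */
typedef struct referenceableItem {
    char *linkName;
    Reference *references;            /* list of occurrences */
    struct referenceableItem *next;   /* flat list instead of hash table */
} ReferenceableItem;

static ReferenceableItem *table = NULL;  /* stand-in for referenceableItemTable */

/* Step 2: create or find the persistent item for a link name. */
ReferenceableItem *findOrCreateItem(const char *linkName) {
    for (ReferenceableItem *i = table; i != NULL; i = i->next)
        if (strcmp(i->linkName, linkName) == 0)
            return i;                 /* reuse the existing one */
    ReferenceableItem *item = calloc(1, sizeof(*item));
    item->linkName = malloc(strlen(linkName) + 1);
    strcpy(item->linkName, linkName);
    item->next = table;
    table = item;
    return item;
}

/* Step 3: record one occurrence on the item's reference list. */
void addReference(ReferenceableItem *item, int line, int col) {
    Reference *ref = calloc(1, sizeof(*ref));
    ref->line = line;
    ref->col = col;
    ref->next = item->references;
    item->references = ref;
}
```

Steps 1 and 4 (creating and discarding the parser's Symbol) are omitted here, since the point is that the ReferenceableItem and its References outlive the Symbol.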

Symbol structures
Figure 1. Symbol and Reference Data Structures

10.3. Files and Buffers

Many strange things are going on with reading files, so this is not completely understood yet.

Here is an initial attempt at illustrating how some of the file and text/lexem buffers are related.

Buffer structures
Figure 2. File Descriptor and Buffer Relationships
It would be nice if the LexemStream structure could point to a LexemBuffer instead of holding separate pointers for which it is impossible to know what they actually point to…​
This could be achieved if we could remove the CharacterBuffer from LexemBuffer and make it a reference instead of a composition. Then we’d need to add a CharacterBuffer to the structures that have a LexemBuffer as a component (if they use it).

10.4. Modes

c-xrefactory operates in different modes ("regimes" in original c-xref parlance):

  • xref - batch mode reference generation

  • server - editor server

  • refactory - refactory browser

The default mode is "xref". The command line options -server and -refactory select one of the other modes. Branching is done in the final lines of main().
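A minimal sketch of that branching (the Mode enum and selectMode are hypothetical names; the real dispatch in main() also handles option files and much more):

```c
#include <string.h>

/* The three modes ("regimes") described above. */
typedef enum { MODE_XREF, MODE_SERVER, MODE_REFACTORY } Mode;

/* Scan the command line for a mode-selecting option;
   default to batch xref mode. */
Mode selectMode(int argc, const char *argv[]) {
    Mode mode = MODE_XREF;
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-server") == 0)
            mode = MODE_SERVER;
        else if (strcmp(argv[i], "-refactory") == 0)
            mode = MODE_REFACTORY;
    }
    return mode;
}
```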

The code for the modes is intertwined, probably through re-use of already existing functionality when extending to a refactoring browser.

One piece of evidence for this is that the refactory module calls the "main task" as a "sub-task". This forces some intricate fiddling with the options data structure, such as copying it, which I don’t fully understand yet.

TODO?: Strip away the various "regimes" into more separated concerns and handle options differently.

10.5. Options

The Options data structure is used to collect options from the command line as well as from options/configuration files and piped options from the editor client using process-to-process communication.

It consists of a collection of fields of the types

  • elementary types (bool, int, …​)

  • string (pointers to strings)

  • lists of strings (linked lists of pointers to strings)

10.5.1. Allocation & Copying

The Options structure has its own allocator, optAlloc, which allocates from a separate area, currently part of the options structure itself, using "dynamic allocation" (the dm_ functions on the Memory structure).

The Options structure is copied multiple times during a session, both as a backup (savedOptions) and into a separate options structure used by the Refactorer (refactoringOptions).

Since the options memory is then also copied, all pointers into the options memory need to be updated. To be able to do this, the options structure contains lists of addresses that need to be "shifted".

When an option with a string or string list value is modified, the option is registered in either the list of string valued options or the list of string list valued options. Copying an options structure must therefore be done with a deep copy function that "shifts" those options and their values (areas in the options memory) so that they point into the memory area of the copy, not the original.

After the deep copy the following point into the option memory of the copy

  • the lists of string and string list valued options (option fields)

  • all string and string list valued option fields that are used (allocated)

  • all list nodes for the used options (allocated)

  • all list nodes for the string lists (allocated)
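The shifting idea can be illustrated with a toy structure (OptionsSketch, setStringOption and deepCopyOptions are invented for this sketch; the real deep copy handles whole lists of registered addresses):

```c
#include <string.h>

/* Toy options structure with an embedded "options memory" area. */
typedef struct {
    char memory[64];      /* embedded options memory */
    char *stringOption;   /* points into memory when set */
} OptionsSketch;

/* Set a string option by storing it inside the options memory. */
void setStringOption(OptionsSketch *options, const char *value) {
    strcpy(options->memory, value);
    options->stringOption = options->memory;
}

/* Deep copy: the shallow copy duplicates the memory area but leaves
   pointers aimed at the original, so each registered pointer must be
   shifted by its offset into the copy's memory area. */
void deepCopyOptions(OptionsSketch *dst, OptionsSketch *src) {
    *dst = *src;  /* copies memory area and (stale) pointers */
    if (src->stringOption != NULL)
        dst->stringOption = dst->memory + (src->stringOption - src->memory);
}
```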

10.6. Arena Allocators (Memory)

Arena allocators (also called region-based or bump allocators) are the fundamental memory management strategy used throughout c-xrefactory for performance-critical operations like macro expansion and lexical analysis.

10.6.1. The Memory Structure

typedef struct memory {
    char   *name;              // Arena name for diagnostics
    bool  (*overflowHandler)(int n); // Optional resize handler
    int     index;             // Next allocation offset (bump pointer)
    int     max;               // High-water mark
    size_t  size;              // Total arena size
    char   *area;              // Actual memory region
} Memory;

10.6.2. Allocation Model

Arena allocators use bump pointer allocation:

  1. Allocation: Return &area[index], then index += size

  2. Deallocation: Bulk rollback via FreeUntil(marker)

  3. Reallocation: Only possible for most recent allocation

This is extremely fast (O(1) allocation) but requires stack-like discipline for deallocation.
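The three steps above can be sketched as a minimal bump allocator (BumpArena, bumpAlloc and bumpFreeUntil are illustrative names, not the real memory.c API; alignment handling is omitted):

```c
#include <stdlib.h>

/* Minimal arena with a bump pointer. */
typedef struct {
    char  *area;
    size_t size;
    size_t index;   /* the bump pointer */
} BumpArena;

/* Step 1: return &area[index], then advance index. */
void *bumpAlloc(BumpArena *arena, size_t bytes) {
    if (arena->index + bytes > arena->size)
        return NULL;                 /* would trigger overflow handling */
    void *pointer = &arena->area[arena->index];
    arena->index += bytes;
    return pointer;
}

/* Step 2: bulk rollback to a saved marker. */
void bumpFreeUntil(BumpArena *arena, void *marker) {
    arena->index = (size_t)((char *)marker - arena->area);
}
```

A zero-sized allocation doubles as the marker, which is exactly the `ppmAllocc(0)` idiom shown in the stack-discipline example below.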

10.6.3. Stack-Like Discipline

Arenas follow LIFO (last-in-first-out) cleanup:

marker = ppmAllocc(0);        // Save current index
temp1 = ppmAllocc(100);       // Allocate
temp2 = ppmAllocc(200);       // Allocate
// Use temp1, temp2...
ppmFreeUntil(marker);         // Free both temp1 and temp2

10.6.4. Key Constraint: Top-of-Stack Reallocation

Only the most recent allocation can be resized:

buffer = ppmAllocc(1000);     // Allocate buffer
temp = ppmAllocc(500);        // Allocate temporary
ppmReallocc(buffer, ...);     // ❌ FAILS - buffer not at top

This constraint is enforced by guards in memory.c (see Development Environment chapter).

10.6.5. Memory Arena Types

c-xrefactory uses specialized arenas for different purposes (see Modules chapter for details):

  • cxMemory - Cross-reference database and symbol tables

  • ppmMemory - Preprocessor macro expansion (temporary)

  • macroBodyMemory - Macro body buffers

  • macroArgumentsMemory - Macro argument expansion

  • fileTableMemory - File table entries

  • optMemory - Option strings (with special pointer adjustment)

10.7. Preload Mechanism

The preload mechanism allows the server to work with editor buffer contents that haven’t been saved to disk. This is essential for providing real-time symbol navigation and completion while the user is actively editing.

10.7.1. How It Works

When an editor buffer is modified but not yet saved:

  1. Editor Action: The Emacs client writes the current buffer content to a temporary file

  2. Server Request: The client sends a request with -preload <filename> <tmpfile> options

  3. Buffer Association: The server creates an EditorBuffer structure linking the on-disk filename to the temporary file containing the actual content

  4. Transparent Parsing: When the server needs to parse the file, it transparently reads from the temporary file instead of the on-disk file

10.7.2. Why It’s Needed

Without preload, the server would only see the last saved version of the file. The preload mechanism ensures that:

  • Symbol navigation works with the current buffer state

  • Completion suggests symbols based on what’s actually typed

  • Refactorings operate on the current code, not stale saved content

  • Users get immediate feedback without having to save constantly

10.7.3. Reference Management

When a file is preloaded, the server must handle reference updates carefully:

  • Old references from the previous file version must be removed from the reference table before parsing

  • This prevents duplicate references (one set at old positions, another at new positions)

  • The removal happens in removeReferencesForFile() when preloaded content is detected

10.8. Browser Stack

The browser stack maintains navigation history for symbol references, allowing users to browse through code by pushing symbol lookups and navigating back through previous queries.

10.8.1. Structure

The browser stack is a linked list of OlcxReferences entries, where each entry represents a symbol lookup session:

  • Stack entries contain complete symbol information and reference lists for one navigation session

  • Top pointer indicates the current active entry being navigated

  • Root pointer tracks the base of the stack (most recent entry still available)

  • Entries between root and top are "future" navigation states that can be returned to via "next"

10.8.2. Lifecycle

  1. Push: When user requests symbol references (e.g., -olcxpush), a new empty entry is created on the stack

  2. Population: After parsing, the entry is filled with BrowserMenu structures containing references

  3. Navigation: Commands like -olcxnext and -olcxprevious move through references in the current entry

  4. Pop: User can pop back to previous entries to return to earlier symbol lookups
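The push/pop part of this lifecycle can be sketched as a linked stack (StackEntry and BrowserStack are invented for this sketch; real OlcxReferences entries also carry menus and reference lists, and popped entries remain reachable so "next" can return to them):

```c
#include <stdlib.h>

/* One symbol lookup session on the stack. */
typedef struct stackEntry {
    const char *symbolName;
    struct stackEntry *previous;
} StackEntry;

typedef struct {
    StackEntry *top;   /* current active entry */
} BrowserStack;

/* Push: create a new entry for a symbol lookup. */
void push(BrowserStack *stack, const char *symbolName) {
    StackEntry *entry = malloc(sizeof(*entry));
    entry->symbolName = symbolName;
    entry->previous = stack->top;
    stack->top = entry;
}

/* Pop: move top back to the previous lookup. The entry is kept
   allocated so it can be revisited, mirroring the "future" states
   between root and top described above. */
void pop(BrowserStack *stack) {
    if (stack->top != NULL)
        stack->top = stack->top->previous;
}
```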

10.8.3. Relationship to Parsing

The browser stack is populated in two stages:

  1. Parse-time: References are collected in the referenceableItemTable during file parsing

  2. Menu Creation: ReferenceableItems are wrapped in BrowserMenu structures and added to the browser stack entry via putOnLineLoadedReferences()

This separation means that browser stack entries can become stale if files are reparsed (e.g., with preloaded content) without refreshing the stack. Users typically need to pop and re-push to get fresh reference lists after edits.

10.9. Browser Menu

A browser menu is a navigable list of referenceable items with their occurrences, organized for presentation to the user in Emacs. Multiple items may appear in a single menu when name resolution finds several candidates (e.g., symbols with the same name in different scopes).

10.9.1. BrowserMenu Structure

Each BrowserMenu entry is a menu item wrapping a ReferenceableItem with UI presentation state:

  • referenceable: The embedded ReferenceableItem (the entity being browsed)

    • Contains linkName, type, storage, scope, visibility

    • Contains the list of references (occurrences) for this item

  • selected: Whether this item is currently selected for operations

  • visible: Whether this item passes current visibility filters

  • defaultPosition: The "best" occurrence to jump to (usually the definition)

  • defaultUsage: The usage type of the default occurrence

  • outOnLine: Display line number in the Emacs menu

  • markers: Editor markers for refactoring operations

  • next: Pointer to next menu item in the list

Key insight: BrowserMenu is not just a menu - it’s a menu item. A collection of BrowserMenu items forms the actual menu shown to the user.

10.9.2. Multiple Menu Items in One Session

A single browser stack entry (OlcxReferences) can contain multiple BrowserMenu items:

  • hkSelectedSym: Menu items that matched at the cursor position (after disambiguation)

  • symbolsMenu: Complete menu including related items (same name, similar signatures)

This allows users to:

  • See all candidates when a symbol is ambiguous

  • Navigate between related definitions (different scopes, include files)

  • Select specific items for refactoring operations

10.9.3. Menu Population

Browser menus are populated by scanning the referenceableItemTable:

  1. Symbol lookup: Find all ReferenceableItem entries matching the requested symbol

  2. Menu item creation: Wrap each matching item in a BrowserMenu structure

  3. Reference collection: References are already in the ReferenceableItem

  4. Sorting and filtering: Order items by relevance and apply visibility filters

  5. Selection: Mark items that best match the cursor context (e.g., same file)

10.10. Putting It All Together: Domain Model Summary

Understanding the complete flow from parsing to browsing:

10.10.1. During Parsing (Building the Cross-Reference Database)

  1. Parser creates Symbols: As the C/Yacc parser processes source code, it creates Symbol entries in symbolTable for semantic analysis

  2. ReferenceableItems are created/found: When encountering referenceable constructs (functions, variables, types, etc.):

    • Create a ReferenceableItem from Symbol properties

    • Check if it already exists in referenceableItemTable

    • If new, add it to the table; if exists, reuse the existing one

  3. References are recorded: Add a Reference (occurrence) to the ReferenceableItem’s list, recording position and usage

  4. Symbols are discarded: After parsing completes, the symbolTable is cleared (Symbols are temporary)

  5. Database is persisted: ReferenceableItems and their References are saved to .cx files

Result: A persistent database mapping each entity (ReferenceableItem) to all its occurrences (References) across the entire codebase.

10.10.2. During Browsing (Interactive Navigation)

  1. User requests symbol info: User places cursor on a symbol and invokes a command (e.g., "push to symbol")

  2. Symbol lookup: Server finds matching ReferenceableItem(s) in referenceableItemTable

  3. BrowserMenu creation: Each matching ReferenceableItem is wrapped in a BrowserMenu structure

    • Adds UI state (selected, visible, display position)

    • Marks best-fit match (e.g., same file as cursor)

  4. Stack push: BrowserMenu items are added to the browser stack (OlcxReferences entry)

  5. Display to user: Menu sent to Emacs showing all matching items and their occurrences

  6. Navigation: User can browse through references, select items, invoke refactorings

Result: Interactive navigation through the cross-reference database with selection and filtering.

10.10.3. Key Relationships

Symbol (parser)
    ↓ (creates during parsing)
ReferenceableItem (persistent entity)
    ├─→ references: Reference* (list of occurrences)
    └─→ stored in: referenceableItemTable
         ↓ (wrapped for browsing)
      BrowserMenu (UI wrapper)
         └─→ stored in: OlcxReferences.symbolsMenu (browser stack)

This architecture separates concerns:

  • Parser symbols - Temporary, for semantic analysis

  • Cross-reference database - Persistent, for finding all uses

  • Browser menus - Presentation layer, for user interaction

11. Algorithms

The code does not always explain the algorithms that it implements. This chapter will ultimately be a description of various algorithms used by c-xrefactory.

11.1. How is an Extract refactoring performed?

The region (mark and point/cursor positions) is sent to the c-xref server in a -refactory -rfct-extract command.

The server parses the relevant file and sets some information that can be used in some prechecks that are then performed, such as structure check, and then the server answers with

<extraction-dialog>
    <str .... /str>
    <str .... /str>
    <str .... /str>
</extraction-dialog>

The first string is the code that will replace the extracted code, such as a call to the extracted function. The second string is the header part that will precede the extracted code ("preamble"), and the third is of course any code that needs to go after the extracted code ("postamble").

The actual code in the region is never sent to, or returned from, the server. It is handled completely by the editor extension and used verbatim (except when a macro is extracted, in which case each line is terminated with a backslash), so no changes to that code can be made.

The pre- and post-ambles might be of varying complexity. E.g. when extracting a macro, the postamble can be completely empty. When extracting a function both may contain code to transfer and restore parameters into local variables to propagate in/out variables as required.

Finally, the editor requests a name from the user and uses it in a rename operation that renames the default-named function/macro/variable.

11.2. How does lexem stream management work?

Lexical analysis uses a stack of LexemStream structures to handle nested macro expansions. The key insight is that the stream type acts as a discriminator for buffer ownership and cleanup strategy.

11.2.1. The Stream Types

typedef enum {
    NORMAL_STREAM,              // File or local buffer
    MACRO_STREAM,               // Macro expansion
    MACRO_ARGUMENT_STREAM,      // Macro argument expansion
} LexemStreamType;

Historically, there was also a CACHED_STREAM type when the caching mechanism was still active. This confirms that stream types are fundamentally about buffer ownership and refill strategy - each type encodes where the buffer came from and how to handle it when exhausted.

NORMAL_STREAM

Buffer from file’s lexemBuffer or a local temporary. Not allocated from arena memory, so no cleanup needed when stream exhausted.

MACRO_STREAM

Buffer allocated from macroBodyMemory arena during macro expansion. Must call mbmFreeUntil(stream.begin) when popping from stack to free the arena allocation.

MACRO_ARGUMENT_STREAM

Buffer allocated from ppmMemory arena during macro argument expansion. Signals END_OF_MACRO_ARGUMENT_EXCEPTION when exhausted (cleanup handled by caller).

11.2.2. The Refill Algorithm

When currentInput runs out of lexems (read >= write), refillInputIfEmpty() uses the stream type to decide what to do:

while (currentInput.read >= currentInput.write) {
    LexemStreamType inputType = currentInput.streamType;

    if (insideMacro()) {  // Stack not empty
        if (inputType == MACRO_ARGUMENT_STREAM) {
            return END_OF_MACRO_ARGUMENT_EXCEPTION;
        }
        // Only free MACRO_STREAM buffers (allocated from macroBodyMemory)
        if (inputType == MACRO_STREAM) {
            mbmFreeUntil(currentInput.begin);
        }
        currentInput = macroInputStack[--macroStackIndex];  // Pop
    }
    else if (inputType == NORMAL_STREAM) {
        // Refill from file
        buildLexemFromCharacters(&currentFile.characterBuffer, ...);
    }
}

11.2.3. Key Invariant

Stream type must match buffer allocation:

  • MACRO_STREAM → buffer allocated from macroBodyMemory

  • NORMAL_STREAM → buffer NOT from macro arenas

  • MACRO_ARGUMENT_STREAM → buffer from ppmMemory

Violating this invariant causes fatal errors when trying to free buffers from the wrong arena.

11.2.4. Common Bug Pattern

Pushing a NORMAL_STREAM onto the macro stack, then trying to free it as if it were MACRO_STREAM:

// WRONG: Blindly freeing without checking type
mbmFreeUntil(currentInput.begin);  // Fails if currentInput is NORMAL_STREAM!

// CORRECT: Check type first
if (inputType == MACRO_STREAM) {
    mbmFreeUntil(currentInput.begin);
}

11.3. Editor Buffers and Incremental Updates

This section describes how c-xrefactory handles the reality that source code exists in two places: on disk as files, and in memory as editor buffers. It also explains the different update strategies and how references flow through the system.

11.3.1. Editor Buffer Abstraction

The Duality of Source Code

When a user edits code in Emacs, the source code exists in two forms:

  • Disk files: The saved state on the filesystem

  • Editor buffers: The current (possibly unsaved) state in the editor

For code analysis to be useful during active editing, c-xrefactory must treat editor buffers as the source of truth when they exist.

The Preloading Mechanism

When the Emacs client sends a command to the c-xref server (like PUSH or NEXT), it uses the -preload option to transmit modified buffers:

-preload <original-file> <temp-file>

For example:

-olcxnext -olcursor=5 /project/foo.c -preload /project/foo.c /tmp/emacs-xxx.tmp

The process:

  1. Emacs creates a temporary file containing the current buffer content

  2. The temp file’s modification time represents when the buffer was last modified

  3. The server loads this into an EditorBuffer structure

  4. When parsing, the server reads from the temp file (buffer content) instead of the disk file

This ensures that the server analyzes what the user sees in the editor, not the potentially stale disk file.

EditorBuffer Lifecycle

EditorBuffer {
    char *name;              // Original filename
    char *preLoadedFromFile; // Path to temp file with buffer content
    time_t modificationTime; // When buffer was last modified
    ...
}

  • Created by loadAllOpenedEditorBuffers() at the start of each server operation

  • Lives only for the duration of that operation

  • Destroyed by closeAllEditorBuffers() at operation end

11.3.2. Modification Time Tracking

To implement incremental updates, c-xrefactory tracks when each file/buffer was last parsed.

The Fields

Each FileItem in the file table has:

  • lastParsedMtime - The modification time when we last parsed this file (any update mode)

  • lastFullUpdateMtime - The modification time when we last did a FULL update (including header propagation)

These are time_t values (seconds since epoch).

Dual Semantics

The lastParsedMtime field has dual semantics depending on context:

  • For disk files: Stores the file’s mtime when it was parsed

  • For editor buffers: Stores the buffer’s mtime (from the preloaded temp file)

This works because editorFileModificationTime() abstracts over both:

time_t editorFileModificationTime(char *filename) {
    EditorBuffer *buffer = getEditorBuffer(filename);
    if (buffer != NULL && buffer->preLoadedFromFile != NULL) {
        return buffer->modificationTime;  // Buffer time
    } else {
        return fileModificationTime(filename);  // Disk time
    }
}

The abstraction is seamless: code can check "has this file changed?" without caring whether it’s a disk file or editor buffer.

Change Detection

To determine if a file/buffer needs reparsing:

if (editorFileModificationTime(fileItem->name) != fileItem->lastParsedMtime) {
    // File/buffer has changed since we last parsed it
    reparse(fileItem);
    fileItem->lastParsedMtime = editorFileModificationTime(fileItem->name);
}

This pattern appears in:

  • schedulingToUpdate() - Marks files needing update before batch processing

  • processModifiedFilesForNavigation() - Detects modified buffers during navigation

11.3.3. Update Strategies

C-xrefactory has two update strategies that trade off speed against completeness.

Fast Update (Default)

When used:

  • Automatic updates before PUSH operations (if enabled)

  • Explicit -fastupdate command

What it does:

  1. Checks which source files (.c) have changed (compares modification times)

  2. Reparses only those changed source files

  3. Updates the references database

Trade-off:

  • Fast: Only reparses files that actually changed

  • Incomplete: Doesn’t detect when header files change

Example:

foo.h modified → fast update → foo.h not reparsed
foo.c unchanged → foo.c not reparsed
Result: foo.c still has stale information about symbols from foo.h

Full Update

When used:

  • Explicit -update command

  • When -exactpositionresolve is enabled (forces full update)

What it does:

  1. Checks which files (source OR headers) have changed

  2. If a header changed, finds all source files that include it (transitively)

  3. Reparses all affected source files

  4. Updates the references database

The algorithm (makeIncludeClosureOfFilesToUpdate):

  1. Mark all changed files as scheduledToUpdate

  2. For each marked file:

    • Find all files that #include it (by looking up include references)

    • Mark those files as scheduledToUpdate too

  3. Repeat until no new files are added (transitive closure)

  4. Reparse all marked source files
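The transitive-closure marking (steps 1-3) can be sketched as a fixed-point loop (closeOverIncluders and the includedBy matrix are invented for this sketch; makeIncludeClosureOfFilesToUpdate works over the file table and include references instead):

```c
#include <stdbool.h>

#define FILE_COUNT 4   /* toy project size */

/* includedBy[i][j] == true means file j #includes file i.
   scheduledToUpdate starts with the directly changed files marked. */
void closeOverIncluders(bool includedBy[FILE_COUNT][FILE_COUNT],
                        bool scheduledToUpdate[FILE_COUNT]) {
    bool changed = true;
    while (changed) {                 /* repeat until fixed point */
        changed = false;
        for (int i = 0; i < FILE_COUNT; i++) {
            if (!scheduledToUpdate[i])
                continue;
            for (int j = 0; j < FILE_COUNT; j++) {
                /* mark every file that includes a marked file */
                if (includedBy[i][j] && !scheduledToUpdate[j]) {
                    scheduledToUpdate[j] = true;
                    changed = true;
                }
            }
        }
    }
}
```

With file 0 as a changed header included by file 1, itself included by file 2, the loop marks files 1 and 2 while leaving an unrelated file 3 untouched.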

Trade-off:

  • Complete: Catches header changes and propagates to all users

  • Slower: Can trigger reparsing of many source files if a common header changes

Example:

common.h modified → full update → finds 50 files that include it
Result: Reparses all 50 source files to pick up header changes

When Does the Difference Matter?

With modern CPUs and SSDs, the performance difference is often negligible for small to medium projects. The fast update’s header-blindness can lead to subtle bugs where changes don’t propagate. Full update is generally safer and more correct.

11.3.4. Reference Lifecycle

References flow through multiple stages in c-xrefactory, with different storage locations and ownership models at each stage.

Stage 1: Parsing

When a file is parsed:

  1. Symbols are discovered (functions, variables, types, etc.)

  2. For each symbol, a ReferenceableItem is created or looked up in the referenceableItemTable

  3. Each usage of that symbol creates a Reference with a position

  4. The reference is added to the referenceable’s reference list

Storage: cxMemory (a custom arena allocator)

Lifetime: Lives until the next update that reparses that file

Stage 2: The ReferenceableItemTable

The canonical storage for all references.

ReferenceableItem {
    char *linkName;        // Symbol identifier
    Type type;             // Function, variable, macro, etc.
    Reference *references; // Linked list of all uses
    ...
}

Key properties:

  • Allocated in cxMemory

  • Persistent across multiple server operations

  • Updated incrementally as files are reparsed

  • References are not individually free’d - they’re arena-allocated

Stage 3: Session Stacks

When a user performs a PUSH operation (browsing a symbol), a session is created:

SessionStackEntry {
    BrowserMenu *menu;       // Selected symbols
    Reference *references;   // COPY of references for navigation
    Reference *current;      // Current position in navigation
    ...
}

Key properties:

  • The references list is a copy (via malloc) of references from the referenceableItemTable

  • Each reference is individually allocated with malloc (see addReferenceToList)

  • When the session is destroyed, references are individually free’d (see freeReferences())

  • Sessions are snapshots in time - they don’t automatically update when the table changes
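
A minimal sketch of this copy-on-PUSH ownership model. The function names here are illustrative, not the actual addReferenceToList/freeReferences signatures, and only a line number stands in for the full position.

```c
#include <stdlib.h>

typedef struct Reference {
    int line;
    struct Reference *next;
} Reference;

/* The session gets its own malloc'ed copy of each reference, so
   destroying the session never touches the arena-allocated originals. */
Reference *copyReferenceList(const Reference *src) {
    Reference *head = NULL, **tail = &head;
    for (; src != NULL; src = src->next) {
        Reference *copy = malloc(sizeof(*copy));
        copy->line = src->line;
        copy->next = NULL;
        *tail = copy;
        tail = &copy->next;
    }
    return head;
}

void freeReferenceList(Reference *list) {
    while (list != NULL) {
        Reference *next = list->next;
        free(list);        /* individually freed, unlike arena memory */
        list = next;
    }
}
```
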

Why Separate Storage?

Memory ownership:

  • Table references: Arena-allocated, freed in bulk

  • Session references: Individually allocated, individually freed

If sessions pointed directly to table references, we’d have:

  • Dangling pointers when the table is updated

  • Double-free errors when sessions are destroyed

  • Memory corruption from mixed allocation strategies

Snapshots vs. live data:

  • The table is the "live" source of truth

  • Sessions are working copies for a specific browsing operation

  • Users expect their navigation stack to be stable during browsing

The Staleness Problem

The separation causes a problem: session references can become stale.

Scenario:

  1. User PUSHes symbol foo → Session created with references at lines 10, 50, 100

  2. User edits a file, adding lines

  3. User navigates with NEXT → Session still points to lines 10, 50, 100 (wrong!)

Solution: processModifiedFilesForNavigation()

When NEXT/PREVIOUS operations occur, the server:

  1. Detects which editor buffers have changed (modification time check)

  2. Reparses those buffers (updates referenceableItemTable)

  3. Rebuilds the current session’s reference list from the updated table

  4. Preserves the user’s navigation position by index

// Find user's position in old list
int currentIndex = 0;
Reference *ref = session->references;
while (ref != NULL && ref != session->current) {
    currentIndex++;
    ref = ref->next;
}

// Free old list and rebuild from table
freeReferences(session->references);
session->references = NULL;
for (BrowserMenu *menu = session->menu; menu != NULL; menu = menu->next) {
    if (menu->selected) {
        ReferenceableItem *updatedItem = lookupInTable(&menu->referenceable);
        addReferencesFromFileToList(updatedItem->references, ANY_FILE,
                                    &session->references);
    }
}

// Restore position by index
ref = session->references;
for (int i = 0; ref != NULL && i < currentIndex && ref->next != NULL; i++) {
    ref = ref->next;
}
session->current = ref;

This keeps navigation working correctly with live-edited code.

Trade-offs of the Incremental Approach

Advantages:

  • Minimal latency - only reparses changed buffers

  • Uses editor buffer content (user’s current view)

  • Preserves navigation position naturally

Limitations:

  • Like fast update: doesn’t reparse includers of changed headers

  • Only updates the current session (other sessions on the stack remain stale)

  • Only happens during NEXT/PREVIOUS (not other operations)

For typical usage (navigating within files being actively edited), these limitations rarely matter. A fresh PUSH creates a new session with fresh references.

11.3.5. Summary

The key insights:

  1. Editor buffers are the source of truth when they exist (via preloading)

  2. Modification times are tracked uniformly for files and buffers

  3. Fast update trades completeness for speed (doesn’t chase headers)

  4. Full update is more thorough but can reparse many files

  5. References live in two places: canonical table (arena memory) and session copies (malloc)

  6. Sessions are snapshots that can become stale, requiring incremental rebuilding during navigation

11.4. How does …​

TBD.

12. Development Environment

12.1. Developing, here be dragons…​

First, the code is terrible: lots of single and double character variables (cc, ccc, …​) and lots of bookkeeping in local variables rather than in the structures that are actually there. There are also a lot of macros. Unfortunately macros are hard to refactor to functions. (But I’m making progress…​)

As there is no general way to refactor a macro to a function, various techniques must be applied. I wrote a blog post about one that has been fairly successful.

But actually it’s rather fun to be able to make small changes and see the structure emerge, to hone your refactoring and design skills, and to work on a project that started 20 years ago and is still valuable, to me and, I hope, to others.

There should probably be a whole section on how to contribute and develop c-xrefactory but until then here’s a short list of what you need:

  • C development environment (GNU/Clang/Make/…​)

  • Unittests are written using Cgreen

  • Clean code and refactoring knowledge (to drive the code to a better and cleaner state)

Helpful would be:

  • Compiler construction knowledge (in the general sense; Yacc, ASTs and symbol table techniques are heavily used)

12.2. Setup

TBD.

12.3. Building

You should be able to build c-xref using something like this (details may have changed over time…​)

cd src
make
make unit
make test

But since the details of the building process are somewhat contrived and not so easy to see through, here’s the place where that should be described.

One step in the build process was generating initialization information for all the things in standard include files, which of course became very dependent on the system you are running this on. This has now moved into functions inside c-xref itself, like finding DEFINEs and include paths.

The initial recovered c-xrefactory relied on having a working c-xref for the current system. I don’t really know how they managed to do that for all the various systems they were supporting.

Modern thinking is that you should always be able to build from source, so this is something that needed change. We also want to distribute c-xref as an el-get library which requires building from source and should generate a version specific for the current system.

The strategy selected at the time was to try to build a c-xref.bs, if there wasn’t one already, from the sources in the repository and then use that to re-generate the definitions and rebuild a proper c-xref. See Bootstrapping.

We have managed to remove the complete bootstrapping step, so c-xrefactory now builds like any other project.

12.4. Versions

The current sources are in the 1.6.X range. This is the same as the original xrefactory and probably also the proprietary C++ supporting version.

There is an option, "-xrefactory-II", that might indicate that something new was going on. But currently the only difference seems to be whether the edit server protocol output is produced as unstructured fprintf:s or through functions in the ppc-family (either calling ppcGenRecord() or fprint-ing using some PPC-symbol). This, together with hints in how the Emacs part starts the server and some initial server option variables in refactory.c, indicates that the communication from the editor to the refactoring server uses this. It does not look like an attempt at a next-generation protocol.

What we should do is investigate if this switch actually is used anywhere but in the editor server context, and if so, if it can be made the default and the 'non-xrefactory-II' communication removed.

12.5. Coding

12.5.1. Naming

C-xref (probably) started as a cross-referencer for the supported languages (C, Java, C++). It originally had the name "xref", which became "xrefactory" when refactoring support was added. And when Mariàn released a "C only" version in 2009, most of the "xref" references and names were changed to "c-xref". So, as with most software, there is a history and a naming legacy to remember.

Here are some of the conventions in naming that are being used:

olcx

"On-line CX" (Cross-reference) ?

OLO

"On-line option" - some kind of option for the server

12.5.2. Modules and Include Files

The source code for c-xrefactory used a very old C style with a single proto.h where prototypes for all externally visible functions were placed. Definitions are all over the place and it was hard to see where data is actually declared.

Of course this has to change into the modern x.h/x.c externally visible interface model, so that we get clean modules that can be unittested.

The function prototypes have now been moved out to header files for each "module". Some of the types have also been moved, but this is still a work in progress.

12.6. Debugging

TBD. Attaching gdb, server-driver…​

yaccp from src/.gdbinit can ease the printing of Yacc semantic data fields…​

A helpful option is the recently added -commandlog=…​ which allows you to capture all command arguments sent to the server/xref process to a file. This makes it possible to capture command sequences and "replay" them. Useful both for debugging and creating tests.

12.6.1. Arena Allocator Lifetime Violations

The preprocessor macro expansion code uses arena allocators (ppmMemory, macroBodyMemory) with stack-like discipline. Arena allocators are fast (pointer bumping) but require careful lifetime management.
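
A minimal pointer-bumping arena with the stack discipline described above might look like the sketch below. The names are illustrative; the real ppmAllocc/ppmReallocc/ppmFreeUntil have overflow handling and alignment concerns that are elided here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal arena: a flat area plus an index marking the top of the stack */
typedef struct {
    char *area;
    size_t index;
    size_t size;
} Memory;

void *arenaAlloc(Memory *m, size_t n) {
    assert(m->index + n <= m->size);      /* real code reports overflow */
    void *p = m->area + m->index;
    m->index += n;                        /* just bump the pointer */
    return p;
}

bool arenaIsAtTop(Memory *m, void *p, size_t oldSize) {
    return (char *)p + oldSize == m->area + m->index;
}

/* Resizing is only legal for the most recent allocation */
void *arenaRealloc(Memory *m, void *p, size_t oldSize, size_t newSize) {
    assert(arenaIsAtTop(m, p, oldSize));  /* the lifetime-violation guard */
    m->index = (char *)p - m->area + newSize;
    return p;
}

/* Popping back to a marker frees everything allocated after it */
void arenaFreeUntil(Memory *m, void *marker) {
    assert((char *)marker >= m->area && (char *)marker <= m->area + m->index);
    m->index = (char *)marker - m->area;
}
```

Note how arenaRealloc can only grow the top-of-stack allocation by moving the index: that is exactly why temporaries must be freed before the buffer can be resized.
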

The Problem Pattern

Arena allocators can only resize the most recent allocation ("top-of-stack"). A common violation occurs when trying to resize a buffer after other allocations:

buffer = ppmAllocc(size);           // Allocate buffer
marker = ppmAllocc(0);              // Save marker for cleanup
temp = ppmAllocc(tempSize);         // Allocate temporary
ppmReallocc(buffer, newSize, ...);  // ❌ FAILS - buffer not at top!
ppmFreeUntil(marker);               // Free temporaries

The correct pattern frees temporaries before growing the buffer:

buffer = ppmAllocc(size);           // Allocate buffer
marker = ppmAllocc(0);              // Save marker
temp = ppmAllocc(tempSize);         // Allocate temporary (use it)
ppmFreeUntil(marker);               // Free temporaries FIRST
ppmReallocc(buffer, newSize, ...);  // ✅ Works - buffer now at top
Lifetime Violation Guards


The arena allocator includes guards that catch lifetime violations with detailed diagnostics:

Guard 1: Buffer Resize Guard (memory.c in memoryRealloc())

Checks buffer is at top-of-stack before resizing. Provides expected vs actual locations and suggests moving ppmFreeUntil() earlier.

Guard 2: FreeUntil Bounds Guard (memory.c in memoryFreeUntil())

Ensures marker is within valid allocated range. Catches corrupted or wrong-arena markers.

Guard 3: Top-of-Stack Helper (memoryIsAtTop())

Allows explicit verification before operations requiring top-of-stack:

assert(memoryIsAtTop(&ppmMemory, buffer, oldSize));
ppmReallocc(buffer, newSize, sizeof(char), oldSize);
Example: test_collation_long_expansion


This test case triggered the violation guard when macro expansion created a very large output (19 FLAG_STRING invocations). The collate() function was calling ppmFreeUntil() after copyRemainingLexems(), which needed to grow the caller’s buffer.

The fix: move ppmFreeUntil() to before copyRemainingLexems(). By this point, temporary allocations from macro expansion were already used and could be freed, allowing the buffer to become top-of-stack again.

Debugging With Guards

When a guard triggers:

  1. Read the fatal error messages - they explain what went wrong

  2. Look for the assertion location in the stack trace

  3. Check if ppmFreeUntil() is being called too late

  4. Verify buffer growth happens after temporaries are freed

The guards turn subtle crashes into clear diagnostics that point to the fix.

12.7. Testing

12.7.1. Unittests

There are not very many unittests at this point, only covering a quarter of the code. The "units" in this project are unclear and entangled, so creating unittests is hard since the code was not built to be tested, test-driven or even clearly modularized.

All unittests use Cgreen as the unittest framework. If you are unfamiliar with it the most important point is that it can mock functions, so you will find mock implementations of all external functions for a module in a corresponding <module>.mock file.

Many modules are at least under test, meaning there is a <module>_tests.c in the unittest directory, often containing only an empty test.

12.7.2. Acceptance Tests

In the tests directory you will find tests that exercise the external behaviour of c-xref, "acceptance tests" or "system tests". Some actually do only that; they wouldn’t really count as tests since there is no verification beyond the code being executed.

There are two basic strategies for the tests:

  • run a c-xref command, catch its output and verify

  • run a series of commands using the EDIT_SERVER_DRIVER, collect output and results and verify

Some tests do not even check their output in any meaningful way and only provide coverage.

Some tests do a very bad job of verifying, either because my understanding at the time was very low, or because it is hard to verify the output. E.g. a "test" for generating references might only grep the CXrefs files for some strings, without verifying that they actually point to the correct place.

Hopefully this will change as the code gets into a better state and the understanding grows.

12.7.3. Test Structure

Tests live in the tests directory and are auto-discovered by name: any directory starting with test_ will be recognized as a test case.

Each test typically includes:

  • source.c (or similar) - the code under test

  • expected - the expected output

  • Makefile - test runner that uses the boilerplate

Most tests use tests/Makefile.boilerplate which provides common macros:

include ../Makefile.boilerplate

$(TEST):
	$(COMMAND) source.c -o output.tmp
	$(NORMALIZE) output.tmp > output
	$(VERIFY)

The key macros are:

$(COMMAND)

Runs c-xref with standard options and the test’s .c-xrefrc

$(NORMALIZE)

Removes timestamps and other variable output

$(VERIFY)

Compares output with expected, removes output on success

When $(VERIFY) passes, the output file is removed. This means you can easily identify failing tests by looking for test directories that still contain an output file. The utils/failing script lists these.

To suspend a test (skip it during test runs), create a .suspended file in the test directory.

12.7.4. General Setup

Since all(?) c-xref operations rely on an options file which must contain absolute file paths (because the server runs as a separate process), it must be regenerated whenever the tests are to be run in a different location (new clone, test was renamed, …​).

This is done using a common template in tests and a target in tests/Makefile.boilerplate.

Each test should have a clean target that removes any temporary and generated files, including the .c-xrefrc file and generated references. This way it is easy to ensure that all tests have updated .c-xrefrc files.

12.7.5. Edit Server Driver Tests

Since many operations are performed from the editor, and the editor starts an "edit server" process, many tests need to emulate this behaviour.

The edit server session is mostly used for navigation. Refactorings are actually performed as separate invocations of c-xref.

In utils there is a server_driver.py script, which takes as input a file containing a sequence of commands. You can use this to start an edit, refactory or reference server session and then feed it commands in the same fashion as an editor would. The script also handles the communication through the buffer file (see the Editor Interface section).

12.7.6. Creating More Edit Server Tests

You can relatively easily re-create a sequence of interactions by using the sandboxed Emacs in tests/sandboxed_emacs.

There are two ways to use it, "make spy" or "make pure". With the "spy" an intermediate spy is injected between the editor and the edit server, capturing the interaction to a file.

With "pure" you just get the editor setup with c-xref-debug-mode and c-xref-debug-preserve-tmp-files on. This means that you can do whatever editor interactions you want and see the communication in the *Messages* buffer. See the Editor Interface section for details.

Once you have figured out which parts of the *Messages* buffer are interesting, you can copy them out to a file and run utils/messages2commands.py on it to get a file formatted as input for server_driver.py.

The messages2commands script converts all occurrences of the current directory to CURDIR, so it is handy to be in the same directory as the sources when you run the conversion.
The messages2commands script removes any -preload, so you need to take care that the positions inside the buffers do not change between interactions, lest the -olcursor and -olmark be wrong. (You can just undo the change after a refactoring or rename.) Of course this also applies if you want to mimic a sequence of refactorings, like the jexercise move method example. Sources will then change so that the next refactoring works from the content of buffers, so you have to handle this specifically.
-preload is the mechanism by which the editor can send modified buffers to c-xref so that you don’t have to save between refactorings. This is particularly important in the case of extract, since the extraction creates a default name which the editor then renames.

12.8. Utilities

12.8.1. Covers

utils/covers.py is a Python script that, in some environments, can list which test cases execute a particular line.

This is handy when you want to debug or step through a particular part of the code. Find a test that covers that particular line and run it using the debugger (usually make debug in the test directory).

Synopsis:

covers.py <file> <line>

12.8.2. Sandboxed

utils/sandboxed starts a sandboxed Emacs that uses the current elisp code and the c-xref from src. This allows you to test changes without having to pollute your own setup.

This actually runs the tests/sandboxed_emacs pure version, which also sets up a completely isolated Emacs environment with its own packages loaded, configuration etc. See below.

Synopsis:

sandboxed

12.9. Debugging the protocol

There is a "pipe spy" in tests/sandboxed_emacs. You can build the spy using

make spy

and then start a sandboxed Emacs which invokes the spy using

make

This Emacs will be sandboxed to use its own .emacs-files and have HOME set to this directory.

The spy will log the communication between Emacs and the real c-xref (src/c-xref) in log files in /tmp.

NOTE that Emacs will invoke several instances of what it believes is the real c-xref, so there will be several log files to inspect.

13. Deployment

TBD.

14. Decision Log

This chapter provides a historical record of significant design and architectural decisions made during the evolution of c-xrefactory.

14.1. Overview

The decision log documents choices that have shaped the architecture, implementation, and direction of c-xrefactory. Most decisions from the original 1990s development are lost to history, but where they can be deduced from the codebase and commit history, they are being retroactively documented.

All architectural decisions are recorded using the Architecture Decision Record (ADR) format and stored in the adr/ directory. These ADRs are automatically integrated into the Structurizr documentation system.

14.2. Viewing Decision Records

The ADRs can be accessed in several ways:

  1. Via Structurizr: When viewing the Structurizr documentation, navigate to the "Decisions" section to see all ADRs with cross-references and visualizations.

  2. Directly in the repository: Browse the adr/ directory for markdown files containing individual decision records.

  3. Command line: Use ls doc/adr/*.md from the project root to list all decisions.

14.3. Decision Categories

Current ADRs cover several categories:

  • Simplification decisions: Removing unused features (Java support, HTML generation, etc.)

  • Tooling decisions: Choice of ADR format, documentation system

  • Configuration decisions: Automatic config file discovery

  • Format decisions: Reference data storage format

For the complete list of decisions and their rationale, see the ADR directory or the Decisions section in the Structurizr documentation.

14.4. Creating New ADRs

When making significant architectural decisions:

  1. Copy the template from adr/templates/

  2. Number it sequentially (e.g., 0012-description.md)

  3. Fill in the context, decision, and consequences

  4. Commit it alongside the implementation

  5. Reference it in commit messages and pull requests

See ADR-0007 for details on the ADR format and process.

15. Roadmap

This chapter outlines the development roadmap for c-xrefactory, tracking completed work, current priorities, and future architectural goals.

15.1. Guiding Principles

The roadmap is guided by these principles:

  • Incremental improvement: Each step should provide immediate value while moving toward long-term goals

  • Test-driven modernization: Maintain 85%+ test coverage to enable confident refactoring

  • Backward compatibility: Preserve existing Emacs workflows while enabling modern IDE integration

  • Architectural simplification: Replace artificial mode distinctions with unified, smart on-demand behavior

  • Legacy code respect: Work with the existing 1990s codebase thoughtfully, not against it

15.2. Recent Accomplishments (2025 Q1)

15.2.1. C Preprocessor Compliance Fixes

Problem: c-xrefactory was incorrectly expanding function-like macros during ## token pasting, violating C99 §6.10.3.3 which mandates that operands of ## should NOT be macro-expanded before concatenation.

Impact: Caused SIGSEGV in test_ffmpeg when processing complex standard library macros like INT64_MAX which expands to (INT64_C(9223372036854775807)) where INT64_C(c) is defined as c ## L.

Fix: Removed macro expansion logic from collate() function in src/yylex.c:1749-1797. Now matches GCC behavior exactly.
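
The rule can be demonstrated with a small example. The PASTE/XPASTE names are illustrative, not from the codebase: an operand adjacent to ## is pasted exactly as written, while an extra macro level forces argument expansion first.

```c
#include <assert.h>

/* Operands of ## are NOT macro-expanded before pasting (C99 6.10.3.3).
   This is why INT64_C(c), defined as c ## L, must receive the literal
   constant unexpanded to form a valid token. */
#define PREFIX my
#define PASTE(a, b) a ## b
#define XPASTE(a, b) PASTE(a, b)   /* extra level: arguments get expanded */

int PASTE(PREFIX, _x) = 1;    /* pastes the tokens as written: PREFIX_x */
int XPASTE(PREFIX, _y) = 2;   /* PREFIX expands to my first: my_y */
```
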

Test Updates: Updated 5 test cases to match correct GCC preprocessor behavior:

  • test_collate_empty

  • test_token_pasting_with_expansion

  • test_token_pasting_with_expansion_lhs

  • test_token_pasting_with_expansion_rhs

  • test_token_pasting_with_illegal_expansion

15.2.2. Memory Scaling Improvements

Problem: Large projects like ffmpeg were hitting the 30MB stackMemory limit during symbol table construction.

Root Cause: Stack memory is a persistent arena allocator that accumulates symbol tables across all parsed files. Growth is expected and normal for large projects.

Fix: Increased SIZE_stackMemory from 30MB to 100MB in src/constants.h:19.

Diagnostic Additions: Added nesting level and stack index tracking in src/xref.c to monitor memory usage patterns and detect unbalanced block nesting.

15.2.3. Caching System Removal

Decision: Removed the complex lexem stream caching mechanism to enable architectural refactoring.

Impact: Performance regression for large multi-file parsing (6.4× slowdown on test_ffmpeg/test_systemd), but enables critical modernization work.

Rationale: The ~300 lines of cache code were deeply intertwined with parsing and memory management, blocking refactoring efforts. Removal provides clearer code structure at the cost of temporary performance regression.

See: ADR-0012: Remove Lexem Stream Caching for detailed rationale, trade-offs, and future optimization strategies.

15.3. Short-term Goals

15.3.1. Test Coverage Expansion

Current Status: 85%+ overall coverage achieved (January 2025)

Priority: MEDIUM (maintenance mode)

Ongoing Work:

  • Maintain 85%+ overall coverage as new features are added

  • Add focused tests for LSP integration points as they’re implemented

  • Systematic testing of error handling paths in new code

  • Monitor coverage regressions during refactoring

15.3.2. LSP Integration

Current Status: Experimental support for modern editor integration

Priority: HIGH - Opens c-xrefactory to wider audience beyond Emacs

Value: Enable VS Code, Neovim, and other LSP-capable editors to use c-xrefactory’s refactoring and navigation features.

Current Capabilities:

  • ✓ Go-to-definition for functions, global variables, types (single file)

  • ✓ Files parsed as opened, immediate symbol availability

Key Limitations:

  • Single-file scope only (must open file before navigating to it)

  • Local variables not supported

  • Only textDocument/definition implemented

Next Steps: See Chapter 16: LSP Integration for details and Chapter 17 for technical architecture.

15.3.3. Documentation Completeness

Current Status: Architecture documentation mostly complete

Priority: MEDIUM

Goals:

  • Complete Structurizr C4 model with all major components

  • Document memory management strategies across all arenas

  • Create user guide for LSP integration

  • Document LSP vs Emacs feature parity matrix

15.4. Medium-term Goals

Focused on internal code quality and architecture improvements that enable future features. Detailed in Chapter 17: Major Codebase Improvements.

15.4.1. Code Quality Improvements

LexemStream API: Reduce pointer parameter proliferation in macro expansion code (6+ params → 3-4). Foundation for further modularization.

Macro Module Extraction: Separate 800 lines of macro expansion logic from yylex.c for better testability and maintainability.

Header Re-parsing Optimization: Address performance regression from cache removal. Make incremental builds faster.

15.4.2. Architectural Improvements

Clean Persistence Store: Separate in-memory reference database from disk storage (.cx files). Enables LSP integration and testing without files.

Unified Symbol Database: Transition from batch-mode (-create required) to on-demand parsing. Enables zero cold-start for LSP.

Multi-file Support: Build include graph, enable cross-file navigation. Critical for full LSP feature parity.

15.5. Long-term Vision

User-facing features and modern IDE capabilities enabled by architectural improvements:

Additional Refactorings: Extract expression (not just statements), smart helper function detection, preview dialogs.

Full LSP Protocol: Complete textDocument/references, hover, completion, rename, document symbols, workspace search.

Modern IDE Showcase: VS Code extension, Neovim plugin, performance optimization for large projects.

For technical details on these items, see Chapter 17: Major Codebase Improvements and Chapter 16: Planned Features.

16. Planned Features

This chapter documents planned user-facing features—new refactorings, navigation capabilities, and editor integrations. These represent functionality users will interact with directly.

For internal architectural improvements and code quality work, see Chapter 17: Major Codebase Improvements.

For detailed specifications of refactoring operations (both existing and planned), see Chapter 19: Refactoring Recipes.

16.1. Move Function Between Files

Status: Phase 1 MVP complete (December 2024)

16.1.1. Use Case

Reorganize code by moving function definitions between source files while maintaining correctness. Essential for refactoring large codebases into better module structure.

Example: Extract a utility function from main.c to utils.c, automatically handling visibility changes and header declarations.

16.1.2. What Works Now

  • Move a function from one source file to another

  • Automatically removes static keyword (makes function externally accessible)

  • Adds extern declaration to target file’s header

  • Preserves comments and function decorations

  • Works for both C and Yacc files

16.1.3. Current Limitations

  • Header declaration placement may need manual adjustment

  • You must manually add required #include directives

  • Static helper functions aren’t moved automatically (Phase 3 goal)

16.1.4. Planned Improvements

Smarter header placement: Automatically find the right location in header files based on existing declarations and dependencies.

Helper function detection: Identify and optionally move static helper functions that the moved function depends on. Prevents broken builds.

Include management: Automatically add necessary #include directives based on dependencies.

Preview: Show what will be changed before applying the refactoring.

For technical implementation details, see Chapter 17: Clean Parser API.

16.2. LSP Integration

Status: EXPERIMENTAL (January 2026)

16.2.1. Use Case

Language Server Protocol (LSP) integration enables c-xrefactory to work with modern editors and IDEs (VS Code, vim/neovim with LSP plugins, etc.) beyond Emacs. This opens c-xrefactory’s refactoring and navigation capabilities to a much wider audience.

16.2.2. What Works

Go to Definition (textDocument/definition):

  • ✓ Functions - Jump to function definition from any call site

  • ✓ Global variables - Navigate to variable declarations

  • ✓ Types - Find typedef and struct definitions

File Parsing:

  • ✓ Files are parsed as they’re opened in the editor

  • ✓ Symbols become available for navigation immediately

16.2.3. Current Limitations

Single-file scope: Only symbols in the currently opened file are accessible. To navigate to a function in another file, you must first open that file.

No local variables: Go-to-definition doesn’t work for local variables or function parameters. This works in Emacs mode but requires architectural changes for LSP.

Missing LSP features: Only textDocument/definition is implemented. Coming features:

  • Find all references

  • Hover information

  • Code completion

  • Rename refactoring

  • Organize imports

16.2.4. How to Use

See the README for LSP client configuration examples. Basic setup:

  1. Build c-xrefactory with make

  2. Configure your editor’s LSP client to use c-xref -lsp as the server command

  3. Open C or Yacc files

  4. Use your editor’s "go to definition" command

17. Major Codebase Improvements

This chapter documents internal architectural changes and technical debt reduction efforts that improve code quality, maintainability, and performance. These are improvements to c-xrefactory’s own implementation, not features users interact with directly.

For planned user-facing features, see Chapter 16: Planned Features.

17.1. Clean Persistence Store Abstraction

17.1.1. Problem Statement

The cxfile module currently mixes multiple responsibilities that should be separated:

  • Persistence implementation (reading/writing .cx files) ✓ Correct level

  • Search and filter operations (matching search strings) ✗ Wrong level

  • Operation-specific loading (menu creation, macro completion, unused detection) ✗ Too specific

  • Treated as source of truth (when it should be just a cache) ✗ Architectural confusion

This mixing creates tight coupling, makes testing difficult, and prevents future architectural improvements (like LSP integration or alternative storage backends).

17.1.2. Architectural Confusion: In-Memory vs Persistence

The current code conflates two distinct concepts:

Reference Database (Smart Layer - Source of Truth)
  • Always in-memory: The referenceableItemTable hash table

  • Answers queries fast from RAM

  • Decides when to load from persistence

  • Decides when to save to persistence

  • Manages invalidation and staleness

  • This is the actual database

Persistence Store (Dumb Layer - Durable Cache)
  • Always on disk/database: The .cx files (or future SQLite, etc.)

  • Just knows how to save/load a specific format

  • Has NO in-memory state of its own

  • No business logic, no searching, no filtering

  • This is just the storage mechanism

The persistence layer is always about durable storage that survives restarts. "In-memory persistence" is a contradiction - persistence means surviving across sessions.

17.1.3. Current Architecture Problems

Mixed Abstraction Levels

The current cxfile.h public interface exposes:

// Operation-specific scanning (too high-level for storage)
extern void scanReferencesToCreateMenu(char *symbolName);
extern void scanForMacroUsage(char *symbolName);
extern void scanForGlobalUnused(char *cxrefFileName);
extern void scanForSearch(char *cxrefFileName);

// Implementation details leaked (partitioning is internal)
extern int cxFileHashNumberForSymbol(char *symbol);
extern void searchSymbolCheckReference(ReferenceableItem *item, Reference *ref);

// Generic API (already identified as "Abstract API")
extern bool loadFileNumbersFromStore(void);
extern void ensureReferencesAreLoadedFor(char *symbolName);
extern void saveReferencesToStore(bool updating, char *name);

Problems:

  • Four scan* functions: Encode specific use cases (menu, macro, unused, search) instead of providing generic operations

  • Implementation details exposed: cxFileHashNumberForSymbol() is a partitioning optimization that shouldn’t be public

  • Search logic in storage: searchSymbolCheckReference() mixes filtering with persistence

  • Side-effect heavy: All functions mutate global referenceableItemTable implicitly

Tight Coupling to Use Cases

Currently cxfile.c knows about:

  • Browser menus (scanReferencesToCreateMenu)

  • Macro completion (scanForMacroUsage)

  • Unused symbol detection (scanForGlobalUnused)

  • Symbol search (scanForSearch)

A persistence layer shouldn’t know about any of these! These are clients of storage, not responsibilities of storage.

Wrong Source of Truth

From the navigation debugging (see Insights chapter), we discovered:

// In cxref.c - loads FROM DISK directly
scanReferencesToCreateMenu(symbolName);  // Treats .cx files as truth

This bypasses the in-memory referenceableItemTable, treating disk as authoritative. But the actual truth is:

referenceableItemTable = Disk state (.cx files) + Preloaded editor buffers

The disk is just a cache of the last saved state, not the current truth.

17.1.4. Proposed Architecture

Two-Layer Design
┌──────────────────────────────────────────────────┐
│  Reference Database (Smart Layer)                │
│  Source of Truth: referenceableItemTable (RAM)   │
│                                                  │
│  - Fast in-memory queries                        │
│  - Knows what's current vs stale                 │
│  - Handles invalidation/refresh                  │
│  - Decides when to load/save                     │
│  - Unified interface for all clients             │
└──────────────────────────────────────────────────┘
                      ↓
        (only when loading/saving needed)
                      ↓
┌──────────────────────────────────────────────────┐
│  Persistence Store (Dumb Layer)                  │
│  Durable Cache: .cx files on disk                │
│                                                  │
│  - Just save/load binary format                  │
│  - No business logic                             │
│  - No in-memory state                            │
│  - Can be swapped (SQLite, Protobuf, etc.)       │
└──────────────────────────────────────────────────┘
Clean Persistence Store Interface
// persistence_store.h - Pure storage operations
#ifndef PERSISTENCE_STORE_H_INCLUDED
#define PERSISTENCE_STORE_H_INCLUDED

#include "referenceableitem.h"

/* ============================================
 * Store Lifecycle
 * ============================================ */

// Load metadata (file numbers, timestamps)
extern bool persistenceLoadMetadata(void);

// Persist all references to durable storage
extern void persistenceSaveAll(bool updating, char *location);

/* ============================================
 * Reference Loading (into referenceableItemTable)
 * ============================================ */

// Load references for specific symbol into memory table
// Returns true if symbol found in storage
extern bool persistenceLoadSymbol(char *symbolName);

/* ============================================
 * Bulk Scanning
 * ============================================ */

// Scan entire storage, calling visitor for each reference
// Visitor decides what to do with each reference
typedef void (*PersistenceVisitor)(ReferenceableItem *item,
                                    Reference *ref,
                                    void *context);
extern void persistenceScanAll(PersistenceVisitor visitor, void *context);

#endif
Reference Database Interface

This layer provides the "smart" operations that clients actually need:

// reference_database.h - Smart operations (initially thin wrapper)
#ifndef REFERENCE_DATABASE_H_INCLUDED
#define REFERENCE_DATABASE_H_INCLUDED

#include "referenceableitem.h"
#include "position.h"

// Initialize the in-memory reference database
extern bool refDbInitialize(void);

// Lookup symbol in the in-memory table
// Automatically loads from persistence if not in memory
extern ReferenceableItem* refDbLookupSymbol(char *name);

// Get all references for a symbol
// Ensures data is loaded and current
extern Reference* refDbGetReferences(char *symbolName);

// Mark a file's references as stale (needs reload)
extern void refDbInvalidateFile(int fileNumber);

// Persist the current in-memory state
extern void refDbSave(bool updating, char *location);

#endif
This interface starts simple but is where the Unified Symbol Database refactoring (next section) will add smart on-demand loading, dependency tracking, and invalidation logic.

17.1.5. Migration Strategy

Phase 1: Create Clean Boundaries (No Behavior Change)
  1. Rename cxfile.h → cxfile_internal.h (implementation detail)

  2. Create persistence_store.h with clean interface

  3. Create persistence_store.c that delegates to cxfile_* functions

  4. Update all callers to use persistence_store.h instead of cxfile.h

Result: Same behavior, cleaner names, clear intent.

Phase 2: Extract Operation Logic

Move operation-specific logic OUT of persistence layer:

Search Logic

searchSymbolCheckReference() → new search.c module

// Before: In cxfile.c (WRONG LEVEL)
void searchSymbolCheckReference(ReferenceableItem *item, Reference *ref) {
    if (searchStringMatch(...)) {
        reportMatch(item, ref);
    }
}

// After: In search.c (RIGHT LEVEL)
void searchForSymbol(char *pattern) {
    persistenceScanAll(checkSearchMatch, pattern);
}

static void checkSearchMatch(ReferenceableItem *item, Reference *ref, void *ctx) {
    if (searchStringMatch(item->linkName, (char *)ctx)) {
        reportMatch(item, ref);
    }
}
Unused Detection

Extract from cxfile.c to refactorings.c or new analysis.c

Menu Creation

Keep in cxref.c but use clean refDbGetReferences() instead of direct disk scanning

Phase 3: Hide Implementation Details

Make these functions static (internal to cxfile.c):

  • cxFileHashNumberForSymbol() - partitioning is implementation detail

  • searchSymbolCheckReference() - moved to search module

  • All internal scanning logic

Phase 4: Consolidate Scanning Functions

Replace four specific scanners with generic visitor pattern:

Old (Use-Case Specific)           New (Generic)
scanReferencesToCreateMenu(name)  refDbGetReferences(name)
scanForMacroUsage(name)           refDbGetReferences(name)
scanForGlobalUnused(location)     persistenceScanAll(checkUnused, ctx)
scanForSearch(location)           persistenceScanAll(checkMatch, pattern)

17.1.6. Benefits

Testability

Can mock persistence_store.h for testing navigation without .cx files

Future Storage Backends

Could implement with SQLite, in-memory testing mock, or different binary format

Clear Boundaries

Storage vs search vs navigation clearly separated

Simpler Interface

4 functions instead of 9, clearer responsibilities

Less Coupling

Parsing code doesn’t call storage directly, goes through reference database

LSP Readiness

Reference database layer is where LSP textDocument/didChange integration will live

17.1.7. Relationship to Unified Symbol Database

This refactoring is the foundation for the larger Unified Symbol Database Architecture (next section):

  • Clean persistence layer enables swapping storage backends without affecting clients

  • Reference database layer is where smart on-demand loading logic will live

  • Separation of concerns makes the unified architecture clearer

Migration path:

  1. First: Clean up persistence (this refactoring)

  2. Then: Add smart on-demand logic to reference database (next refactoring)

  3. Finally: Both Emacs and LSP use same unified code path

17.1.8. Implementation Checklist

  • Create persistence_store.h with proposed interface

  • Create persistence_store.c delegating to cxfile

  • Rename cxfile.h → cxfile_internal.h

  • Create reference_database.h (thin wrapper initially)

  • Create reference_database.c delegating to persistence

  • Update all callers to use new interfaces

  • Move search logic from cxfile to search module

  • Move unused detection logic to appropriate module

  • Make cxFileHashNumberForSymbol static

  • Make searchSymbolCheckReference static or remove

  • Add unit tests for both interfaces

  • Update documentation

17.1.9. References

  • Current implementation: src/cxfile.c, src/cxfile.h

  • In-memory table: src/reftab.c (referenceableItemTable)

  • Navigation architecture: Chapter 17 (Insights) - Navigation Architecture section

17.2. Unified Symbol Database Architecture

17.2.1. Problem Statement

The current symbol database evolved from a batch cross-referencer (like ctags), leaving behind artificial distinctions between "file-based" and "on-demand" modes. This creates:

  • Mode complexity: Different code paths for Emacs vs LSP clients

  • Cold start problems: Requires upfront -create operation before use

  • Manual updates: Users must remember to run -update after changes

  • Inconsistent behavior: Different modes provide different guarantees

  • Maintenance overhead: Multiple implementations to maintain and test

17.2.2. Current Architecture Limitations

The existing system distinguishes between:

Aspect               File-Based Mode          On-Demand Mode
Cold Start           Requires -create first   Parse file immediately
Warm Queries         O(1) hash lookup         O(file_size) parsing
Memory Usage         Low (streaming)          High (in-memory cache)
Incremental Updates  Smart file tracking      Per-file invalidation
Multi-project        Separate databases       Workspace-scoped

17.2.3. Proposed Solution

Core Insight: Unified On-Demand Architecture

Both Emacs and LSP clients want the same thing: up-to-date symbol and reference information. The distinction between "file-based" and "on-demand" modes is artificial complexity. Instead, c-xrefactory should provide a unified interface that:

  1. Always ensures information is current using existing dependency tracking

  2. Scans incrementally only what’s needed, when needed

  3. Uses .cx files as persistent cache for optimization

  4. Eliminates cold start problems by avoiding upfront full-project scanning

Simplified Interface Design
typedef struct SymbolDatabase SymbolDatabase;

typedef struct {
    // Unified operations for any client (Emacs or LSP)
    Symbol* (*lookupSymbol)(SymbolDatabase* db, const char* name, Position pos);
    ReferenceList* (*getReferences)(SymbolDatabase* db, const char* name, Position pos);
    ReferenceList* (*getOccurrences)(SymbolDatabase* db, const char* name, Position pos);

    // All complexity hidden in implementation:
    // - File modification checking (existing: checkFileModifiedTime)
    // - Include dependency tracking (existing: cachedIncludedFilePass)
    // - Incremental scanning (existing: makeIncludeClosureOfFilesToUpdate)
    // - Persistent caching (existing: .cx file system)
} SymbolDatabaseOperations;
Implementation Strategy: Smart On-Demand
This refactoring builds on the Clean Persistence Store foundation (previous section). The smart logic lives in the in-memory reference database layer, while .cx files remain a durable cache for persistence.

The implementation leverages existing sophisticated logic:

Symbol* lookupSymbol(const char* name, Position pos) {
    // 1. Check if in-memory table has current information
    if (refTableHasCurrent(name, pos)) {
        return lookupInMemoryTable(name, pos);  // Answer from RAM
    }

    // 2. Load from persistence if not in memory
    if (!refTableHasSymbol(name)) {
        persistenceLoadSymbol(name);  // Load .cx into referenceableItemTable
    }

    // 3. Check if loaded data is stale (file modified since load)
    if (refTableDataIsStale(name, pos)) {
        // Use existing dependency tracking to scan minimal set
        FileList* filesToScan = calculateDependencyClosure(pos.file);
        for (FileList *file = filesToScan; file != NULL; file = file->next) {
            if (checkFileModifiedTime(file->item->fileNumber)) {
                parseFileAndUpdateTable(file->item);  // Reparse into memory
            }
        }
    }

    // 4. Answer from in-memory table (now guaranteed current)
    return lookupInMemoryTable(name, pos);
}

Architectural Layers:

Query → Reference Database (in-memory, smart) → Persistence Store (disk, dumb)
        ↑                                         ↑
        Always answers from RAM                   Only for load/save

Key Benefits:

  • No artificial modes - same code path for all clients

  • No cold start - first lookup triggers minimal necessary scanning

  • Incremental by design - only scans files that need updating

  • In-memory speed - all queries answered from RAM (referenceableItemTable)

  • Durable cache - .cx files persist state across sessions

  • Existing logic reuse - leverages proven dependency tracking system

Legacy Architecture Recognition

c-xrefactory evolved from a batch cross-referencer (like ctags) and was enhanced for real-time use:

# Legacy batch workflow:
c-xref -create project.c     # Full scan, build .cx database
c-xref -update modified.c    # Incremental update
c-xref -olcxpush symbol      # Query pre-built database

# Unified approach:
c-xref -server               # Start server, scan on-demand as needed
c-xref -lsp                  # Same logic, different protocol
The .cx files are a durable cache that persists the in-memory referenceableItemTable between sessions. They are an optimization, not the database itself. The actual database is always in RAM.

17.2.4. Implementation Plan

Phase 1: Interface Unification
  • Create unified SymbolDatabase interface

  • Wrap existing logic in smart on-demand implementation

  • Replace explicit -create/-update commands with automatic dependency checking

  • Both Emacs and LSP use same code path

Phase 2: Optimization
  • Enhance existing dependency tracking for finer-grained invalidation

  • Optimize in-memory caching strategies

  • Background .cx file maintenance for long-running sessions

  • Performance tuning for large codebases

17.2.5. Benefits

Architectural Simplification
  • Single code path for both Emacs and LSP clients - eliminates maintenance overhead

  • No mode distinctions - same smart logic serves all use cases optimally

  • Leverages existing logic - reuses proven dependency tracking and caching systems

  • Reduced complexity - eliminates artificial FILE_BASED/ON_DEMAND/HYBRID modes

User Experience Improvements
  • Zero configuration - works immediately on any C project without setup

  • No cold start delay - first symbol lookup triggers minimal necessary scanning

  • Transparent caching - .cx files automatically maintained as performance optimization

  • Consistent behavior - same results whether using Emacs or modern IDE with LSP

Performance Characteristics
  • Minimal initial cost - avoids expensive upfront full-project scanning

  • Smart incremental updates - only rescans files that have actually changed

  • Automatic dependency tracking - files that include a changed file are updated as well

  • Persistent optimization - analysis results cached across sessions

Development Benefits
  • Backward compatibility - existing Emacs workflows continue unchanged

  • Forward compatibility - natural path to modern LSP integration

  • Reduced maintenance - single implementation instead of multiple modes

  • Enhanced testability - unified logic easier to test comprehensively

17.2.6. Existing Infrastructure

Sophisticated Dependency Tracking

The unified approach leverages c-xrefactory’s existing sophisticated dependency management that handles include file relationships automatically:

File Modification Tracking (filetable.h):

typedef struct fileItem {
    char *name;
    time_t lastModified;        // Last known modification time
    time_t lastInspected;       // Last time we checked
    time_t lastUpdateMtime;     // Last update cycle time
    time_t lastFullUpdateMtime; // Last full update time
    // ... scheduling and state flags
} FileItem;

bool checkFileModifiedTime(int fileNumber);

Include Dependency Tracking (yylex.c):

void pushInclude(FILE *file, EditorBuffer *buffer, char *name, char *prepend) {
    // ... setup include stack
    includeStack.stack[includeStack.pointer++] = currentFile;
    // Track include relationships for dependency analysis
}

Automatic Include Closure (xref.c:81-108):

static void makeIncludeClosureOfFilesToUpdate(void) {
    // If file A includes file B, and B is modified, A gets scheduled for update
    // This uses the reference database to track include relationships
    bool fileAddedFlag = true;
    while (fileAddedFlag) {
        // Iterative closure: keeps adding dependent files until stable
        for (all scheduled files) {
            find_all_files_that_include_this_file();
            schedule_them_for_update();
        }
    }
}
This dependency tracking infrastructure is already production-ready and handles the complex cases (transitive dependencies, modification time checking, include stack management). The unified symbol database can leverage this existing logic instead of reimplementing dependency management.

17.2.7. Open Questions

  1. Should we maintain backward compatibility with explicit -create/-update commands?

    • Probably yes, at least as no-ops or aliases to make transition easier

  2. How to handle very large projects (>1M LOC)?

    • May need workspace-level configuration for incremental scanning thresholds

    • Consider lazy loading of symbol data

  3. What’s the migration path for existing users?

    • Existing .cx files should continue to work

    • Auto-migrate on first run with new version

    • Provide clear documentation on new behavior

17.2.8. References

  • Current implementation: src/cxfile.c, src/xref.c

  • File tracking: src/filetable.h, src/filetable.c

  • Dependency tracking: src/xref.c lines 81-108

  • Current database description: See chapter 08 (Code) - Reference Database section

17.3. Extract Macro Expansion Module

17.3.1. Problem Statement

The yylex.c file is 2353 lines and combines multiple responsibilities:

  • Lexical analysis and token reading

  • File and buffer management

  • Preprocessor directive processing

  • Macro expansion system (~800 lines)

The macro expansion code is a substantial, cohesive subsystem that would benefit from extraction into its own module. Currently, it’s deeply embedded in yylex.c, making both lexing and macro expansion harder to understand and test in isolation.

17.3.2. Current Architecture

The macro expansion system in yylex.c comprises:

Core Responsibilities (~800 lines)
  • Macro call expansion - Main orchestration (expandMacroCall())

  • Argument processing - Collection and recursive expansion

  • Token collation - ## operator implementation

  • Stringification - # operator implementation

  • Memory management - Separate arenas for macro bodies (MBM) and arguments (PPM)

  • Cyclic detection - Preventing infinite macro recursion

Key State
int macroStackIndex;  // Current macro expansion depth
static LexemStream macroInputStack[MACRO_INPUT_STACK_SIZE];
static Memory macroBodyMemory;      // Long-lived: macro definitions
static Memory macroArgumentsMemory; // Short-lived: expansion temporaries
Memory Lifetime Separation

The system uses two distinct memory arenas with different lifetimes:

  • MBM (Macro Body Memory): Persistent storage for macro definitions throughout compilation

  • PPM (PreProcessor Memory): Temporary storage for expansion, collation, and argument processing

This separation is fundamental and should be preserved in any refactoring.

17.3.3. Proposed Solution

Extract macro expansion into a new module: macroexpansion.c/h

Public Interface

The new module would expose a minimal, focused API:

// Initialization
void initMacroExpansion(void);
int getMacroBodyMemoryIndex(void);
void setMacroBodyMemoryIndex(int index);

// Core expansion
bool expandMacroCall(Symbol *macroSymbol, Position position);
bool insideMacro(void);
int getMacroStackDepth(void);

// Memory allocation (exposed for macro definition processing)
void *macroBodyAlloc(size_t size);
void *macroBodyRealloc(void *ptr, size_t oldSize, size_t newSize);
void *macroArgumentAlloc(size_t size);
Module Boundaries

What moves to macroexpansion.c:

  • Macro call expansion and argument processing

  • Token collation (collate() and helpers)

  • Stringification (macroArgumentsToString())

  • Cyclic call detection

  • MBM/PPM memory management

  • Buffer expansion utilities (expandPreprocessorBufferIfOverflow(), etc.)

What remains in yylex.c:

  • Lexing and file input

  • Preprocessor directive processing (#define, #ifdef, etc.)

  • Include file handling

  • Main yylex() function

  • Macro symbol table operations

Dependencies:

The macro module would depend on:

  • Lexem stream operations (reading/writing)

  • Symbol lookup (findMacroSymbol())

  • Cross-referencing (for collation and expansion references)

  • Current input state (via accessor functions)

17.3.4. Benefits

Architectural
  • Separation of concerns: Lexing vs. preprocessing clearly separated

  • Reduced file size: yylex.c drops from 2353 → ~1550 lines (34% reduction)

  • Testability: Macro expansion can be unit tested independently

  • Clearer ownership: Macro state and memory management centralized

Maintainability
  • Focused modules: Each file has a single, clear purpose

  • Easier reasoning: Macro behavior isolated from lexer concerns

  • Better documentation: Module-level documentation for macro system

Future flexibility
  • Could support different macro systems (C vs. C++)

  • Easier to add macro debugging/tracing

  • Independent optimization of macro expansion

17.3.5. Implementation Strategy

Phase 1: Preparation (Already Complete)

✓ Create LexemBufferDescriptor type for buffer management
✓ Refactor buffer expansion functions to use descriptor
✓ Eliminate return values for size updates

Phase 2: Create Module Structure
  • Create macroexpansion.h with public interface

  • Create macroexpansion.c with initial implementations

  • Move LexemBufferDescriptor to appropriate header

  • Create accessor functions for currentInput state

Phase 3: Incremental Function Migration

Move functions in this order (lowest risk first):

  1. Memory management - MBM/PPM allocation functions

  2. Buffer expansion - expandPreprocessorBufferIfOverflow(), expandMacroBodyBufferIfOverflow()

  3. Support utilities - cyclicCall(), prependMacroInput()

  4. Token processing - collate(), resolveMacroArgument(), etc.

  5. Core expansion - expandMacroCall(), createMacroBodyAsNewStream(), etc.

Phase 4: Integration and Cleanup
  • Update yylex.c to use new interface

  • Run full test suite after each migration step

  • Add focused unit tests for macro expansion

  • Update build system

  • Document the new architecture

17.3.6. Risks and Mitigation

Risk: Complex dependencies

Mitigation:

  • Create clear accessor functions for shared state

  • Use incremental approach - one function group at a time

  • Validate with tests after each step

Risk: Performance overhead

Mitigation:

  • Keep critical functions inline where necessary

  • Profile before/after migration

  • Current code already has abstraction layers

Assessment: Low risk - macro operations are complex enough that function call overhead is negligible

Risk: Breaking existing tests

Mitigation:

  • Run test suite after every migration step

  • Keep interface behavior identical

  • Use compiler to catch interface mismatches

17.3.7. Success Metrics

  • All existing tests pass

  • yylex.c reduced to ~1550 lines

  • New focused tests for macro expansion added

  • No performance regression (< 5% overhead acceptable)

  • Code review confirms improved clarity

17.3.8. Open Questions

  1. Should findMacroSymbol() move to the macro module or stay in yylex.c?

    • It’s used by both lexer (for expansion triggering) and macro module (for nested expansions)

    • Probably belongs in a shared location or as part of symbol table operations

  2. How to handle currentInput global state?

    • Options: Pass explicitly, use accessor functions, or provide context structure

    • Accessor functions likely cleanest: getCurrentInput(), setCurrentInput()

  3. Should we extract preprocessor directives at the same time?

    • No - keep changes focused

    • Could be a future refactoring after macro extraction proves successful

17.3.9. References

  • Current code: src/yylex.c lines 1327-2089 (macro expansion system)

  • Memory management: src/memory.h, src/memory.c

  • Symbol operations: src/symbol.h

  • Related refactoring: [LexemStream API Improvements] addresses buffer management patterns


This refactoring is independent of the LexemStream API improvements but would benefit from them being completed first, as they simplify buffer management patterns throughout the macro expansion code.

17.4. Move Function Between Files

17.4.1. Problem Statement

Moving functions between C source files is a frequent manual refactoring task that requires:

  • Cutting function definition from source file

  • Pasting into target file

  • Changing static to extern (or vice versa)

  • Adding function declaration to appropriate header

  • Updating #include directives in both files

  • Ensuring all dependencies (headers, types, macros) are available in target file

This is error-prone and tedious, especially for functions with complex dependencies.

17.4.2. Discovery: Existing Move Refactorings in c-xref-java

The Java-supporting version at /home/thoni/Utveckling/c-xref-java contains substantial move refactoring infrastructure:

// From c-xref-java/src/refactorings.def
#define AVR_MOVE_STATIC_METHOD 80
#define AVR_MOVE_CLASS 160
#define AVR_MOVE_CLASS_TO_NEW_FILE 170
#define AVR_MOVE_STATIC_FIELD 70
#define AVR_MOVE_FIELD 90

Key implementations in c-xref-java/src/refactory.c:

  • moveStaticMethod() (line 2974) - Moves static methods between classes

  • moveClass() (line 3074+) - Moves entire classes

  • moveStaticObjectAndMakeItPublic() (line 2814) - Core move logic

  • applyExpandShortNames() (line 2300) - Dependency expansion before move

  • reduceNamesAndAddImports() (line 2645) - Import management after move

17.4.3. Leverageable Code from Java Version

Core Move Logic (Directly Reusable)

Text block extraction/movement is language-agnostic:

// From refactory.c:2865-2874
size = mend->offset - mstart->offset;
moveBlockInEditorBuffer(target, mstart, size, &editorUndo);

Reference finding and updating already works for C:

// From refactory.c:2843-2862
occs = getReferences(point, STANDARD_SELECT_SYMBOLS_MESSAGE, PPCV_BROWSER_TYPE_INFO);
LIST_MERGE_SORT(EditorMarkerList, occs, editorMarkerListBefore);
for (EditorMarkerList *ll = occs; ll != NULL; ll = ll->next) {
    // Update each call site
}

Access modifier changes map to C visibility:

// From refactory.c:2868
changeAccessModifier(point, limitIndex, "public");
// C equivalent: change "static" → "extern", add header declaration
Dependency Management Pattern (Needs C Adaptation)

Java approach:

// 1. Expand short names to fully qualified
applyExpandShortNames(point);
   // Before:  List myList;
   // After:   java.util.List myList;

// 2. Move the code
moveBlockInEditorBuffer(target, mstart, size, &editorUndo);

// 3. Reduce names and add imports
reduceNamesAndAddImports(&regions, INTERACTIVE_NO);
   // Before:  java.util.List myList;
   // After:   List myList; (with "import java.util.List;" added)

C equivalent approach:

// 1. Find required headers by analyzing dependencies
findRequiredHeaders(functionStart, functionEnd, &neededHeaders, &headerCount);
   // Scan function body for:
   // - Function calls → which header declares them?
   // - Type references → which header defines them?
   // - Macro usage → which header defines them?
   // Use symbol database to find declarations

// 2. Move the code (same as Java)
moveBlockInEditorBuffer(target, mstart, size, &editorUndo);

// 3. Add include directives to target file
addRequiredIncludes(targetFile, neededHeaders, headerCount);
   // Insert #include directives at top of target file

17.4.4. Proposed Implementation

Estimation Approach: We use relative complexity estimates (Story Points) with Phase 1 as the baseline (1×). This is more meaningful for hobby/OSS development than calendar time estimates. As phases complete, we can track velocity and refine future estimates.

Phase 1: Basic Move (MVP - Immediate Value) ✓ COMPLETED

Relative complexity: 1× (baseline for future estimates)

Status: Completed December 2024 / January 2025

Implementation: Handles 80% of common move function use cases with minimal user intervention.

What was implemented:

  1. Parser Integration (src/c_parser.y, src/yacc_parser.y):

    • Added semantic actions to capture function boundaries during parsing

    • Uses parsedPositions[IPP_FUNCTION_BEGIN/END] to store function extent

    • Includes closing brace and trailing newline in boundary detection

    • Server operation OLO_GET_FUNCTION_BOUNDS triggers boundary capture

  2. Function Boundary Detection (src/refactory.c):

    • getFunctionBoundariesForMoving() uses parser to find exact function extent

    • Handles comments and decorations before/after function

    • Works for both static and non-static functions

  3. Static Keyword Removal (src/refactory.c):

    • Simple string search for "static " from function start to name

    • Only removes static when moving between files (not within same file)

    • Automatically removes without user prompting (streamlined UX)

    • Removes 7 characters ("static ") before moving to avoid offset tracking

  4. UI Integration (src/cxref.c):

    • "Move Function" appears in refactoring menu only on function definitions (not call sites)

    • Uses isDefinitionUsage() to filter appropriately

    • Two-step workflow: set target, then invoke move

  5. Target Validation (src/refactory.c, src/yylex.c):

    • Validates target is at global scope (not inside function/struct)

    • Uses parsedInfo.moveTargetAccepted flag from parser

  6. Tests:

    • test_move_c_function_to_other_file/ - moves between files, removes static

    • test_move_c_function_inside_file/ - moves within file, keeps static

    • Both tests passing with deterministic output

What user manually handles (acceptable for MVP):

  • Adding function declaration to header files

  • Adding #include directives for dependencies

  • Moving tightly-coupled static helper functions

Files modified:

  • src/c_parser.y - parser integration for boundary detection

  • src/yacc_parser.y - same for yacc files

  • src/refactory.c - simplified moveStaticFunctionAndMakeItExtern()

  • src/cxref.c - UI filter for definitions only

  • src/misc.c - added OLO_GET_FUNCTION_BOUNDS to requiresCreatingRefs()

  • tests/test_move_c_function_*/ - enabled and passing

Phase 2: Add Function Declaration to Header

Relative complexity: ~0.5× Phase 1

Add automatic function declaration to target file’s header:

Approach:

Following C conventions, when moving a function to target.c, add its declaration to target.h so other files can call it. This is much simpler than analyzing all dependencies.

Algorithm:

  1. Determine target file path (e.g., /path/to/target.c)

  2. Derive header path by replacing .c with .h (e.g., /path/to/target.h)

  3. If header doesn’t exist, create it with include guards

  4. Find insertion point in header (after includes, before end of guard)

  5. Generate function declaration from Symbol: extern void func(int a);

  6. Insert declaration into header

  7. User manually adds any needed #include directives to target file

Implementation sketch:

char *getHeaderPathForSourceFile(char *sourcePath) {
    // "/path/to/target.c" -> "/path/to/target.h"
    char *header = stackMemoryAlloc(strlen(sourcePath) + 1);
    strcpy(header, sourcePath);
    char *ext = strrchr(header, '.');
    if (ext && strcmp(ext, ".c") == 0) {
        strcpy(ext, ".h");
    }
    return header;
}

void createHeaderIfNeeded(char *headerPath) {
    if (!fileExists(headerPath)) {
        // Create header with include guards
        // "#ifndef TARGET_H\n#define TARGET_H\n\n#endif\n"
        char *guardName = makeIncludeGuardName(headerPath);
        // Write to file or EditorBuffer
    }
}

char *generateFunctionSignature(Symbol *function) {
    // Reconstruct function signature from Symbol
    // Example: "extern void func(int a, char *b)"
    // Need to walk function->typeModifier to build type string
    // This is the trickiest part
    return NULL; /* sketch only */
}

void addFunctionDeclarationToHeader(Symbol *function, char *headerPath) {
    // 1. Open or load header into EditorBuffer
    // 2. Find insertion point (before final #endif)
    // 3. Generate signature: "extern void func(int a);\n"
    // 4. Insert declaration
}

// In moveFunction() after moving the function block:
if (movingBetweenFiles) {
    char *headerPath = getHeaderPathForSourceFile(targetFilePath);
    createHeaderIfNeeded(headerPath);
    addFunctionDeclarationToHeader(parsedInfo.function, headerPath);
}

Required infrastructure (already exists):

  • Symbol with function information (symbol.h)

  • Symbol.typeModifier - contains return type and parameter information

  • EditorBuffer manipulation for header file editing

  • File path utilities

Challenges:

  1. Generating function signature: Need to reconstruct void func(int a, char *b) from Symbol’s type information

    • Walk typeModifier chain to build type string

    • Handle pointers, arrays, function pointers correctly

    • Parameter names might not be available (use arg1, arg2, etc.)

  2. Finding insertion point in header: Where to add the declaration?

    • After #include directives

    • Before final #endif of include guard

    • After existing declarations (maintain some order)

  3. Creating header file: If target.h doesn’t exist

    • Generate include guard name from filename (TARGET_H)

    • Create basic header structure

    • Or warn user and skip?

  4. Header guard formats: Different styles exist

    • #ifndef HEADER_H / #define HEADER_H / #endif

    • #pragma once

    • Need to detect and handle both
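
As a concrete sketch of the include-guard handling discussed above, guard-name derivation could look like this (makeIncludeGuardName is hypothetical, as in the implementation sketch earlier in this section):

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* "/path/to/target.h" -> "TARGET_H"; caller frees the result. */
char *makeIncludeGuardName(const char *headerPath) {
    const char *base = strrchr(headerPath, '/');
    base = base ? base + 1 : headerPath;          /* strip directories */
    size_t len = strlen(base);
    char *guard = malloc(len + 1);
    for (size_t i = 0; i < len; i++)              /* upper-case, '_' for non-alnum */
        guard[i] = isalnum((unsigned char)base[i])
                       ? toupper((unsigned char)base[i])
                       : '_';
    guard[len] = '\0';
    return guard;
}
```

Detecting #pragma once, by contrast, is a simple prefix search over the first lines of the header.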

Value:

  • Automatically makes moved function callable from other files

  • Follows C convention of declarations in .h, definitions in .c

  • Much simpler than full dependency analysis

  • User still controls #include directives (maintains flexibility)

New code required: ~150 lines

  • Header path derivation: ~20 lines

  • Signature generation from Symbol: ~60 lines (complex type reconstruction)

  • Header file manipulation: ~50 lines

  • Include guard detection/creation: ~20 lines

Success Criteria:

  1. ✓ When moving function to target.c, declaration is added to target.h

  2. ✓ Function signature is correctly generated (return type, name, parameters)

  3. ✓ Declaration is inserted at appropriate location in header

  4. ✓ If header doesn’t exist, either create it or warn user

  5. ✓ Duplicate declarations are avoided (check if already present)

  6. ✓ Works with both #ifndef guards and #pragma once

  7. ✓ No regression in Phase 1 functionality

Acceptable limitations (defer to Phase 3/4):

  • User manually adds #include directives to target .c file

  • May not preserve exact parameter names if not in Symbol

  • Basic include guard format if creating new header

  • Doesn’t remove declaration from old header (user handles manually)
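
Challenge 1 above, reconstructing a type string, might be approached along these lines; the TypeModifier shape here is a deliberately simplified stand-in for c-xrefactory's real structure, just to illustrate walking the chain:

```c
#include <stdio.h>
#include <string.h>

typedef enum { TM_BASE, TM_POINTER } TypeModKind;

/* Simplified stand-in: a base type name, or a pointer to an inner type. */
typedef struct TypeModifier {
    TypeModKind kind;
    const char *baseName;            /* valid when kind == TM_BASE */
    struct TypeModifier *next;       /* inner type when kind == TM_POINTER */
} TypeModifier;

/* Writes e.g. "char *" for pointer-to-char into out. */
void buildTypeString(TypeModifier *t, char *out, size_t size) {
    if (t->kind == TM_BASE) {
        snprintf(out, size, "%s", t->baseName);
    } else {
        buildTypeString(t->next, out, size); /* inner type first... */
        strncat(out, " *", size - strlen(out) - 1); /* ...then the pointer */
    }
}
```

The real chain also has to cover arrays and function pointers, which is where most of the estimated ~60 lines would go.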

Phase 3: Static Helper Functions

Relative complexity: 1-1.5× Phase 1

Detect and optionally move tightly-coupled static helpers:

static void detectStaticHelpers(EditorMarker *functionStart,
                                 EditorMarker *functionEnd,
                                 Symbol ***helpersOut, int *countOut) {
    SymbolList *helpers = NULL;

    // Find all function calls within the moved function
    parseRegion(functionStart, functionEnd);
    for (Symbol *calledFunc = calledFunctions; calledFunc != NULL;
         calledFunc = calledFunc->next) {
        // Is it static in the same file?
        if (calledFunc->storage == StorageStatic &&
            calledFunc->pos.file == functionStart->buffer->fileNumber) {
            // Add to helpers list
            addSymbol(&helpers, calledFunc);
        }
    }

    *countOut = countList(helpers);
    *helpersOut = listToArray(helpers);
}

// In moveFunction():
if (staticHelperCount > 0) {
    if (promptUser("Move %d static helper function(s) too?", staticHelperCount)) {
        for (int i = 0; i < staticHelperCount; i++) {
            moveFunction(staticHelpers[i]->pos, target);
        }
    }
}

Value:

  • Prevents broken builds from missing static helpers

  • Maintains logical cohesion

  • Reduces manual follow-up work

Phase 4: Polish

Scope: Smaller effort, quality-of-life improvements

  • Preview dialog showing what will be moved

  • Better insertion point heuristics (group with related functions)

  • Handle comments and documentation

  • Update existing callers if needed

17.4.5. Code Locations

In c-xref-java (Leverage These)
Function Purpose Line

moveStaticObjectAndMakeItPublic

Core move logic with reference updating

2814

getMethodLimitsForMoving

Find function boundaries using parser

2908

applyExpandShortNames

Pattern for dependency analysis (adapt for C)

2300

reduceNamesAndAddImports

Pattern for import management (adapt for C)

2645

moveBlockInEditorBuffer

Language-agnostic text movement

Used at 2867

getReferences

Find all call sites (already works for C)

Used at 2843

changeAccessModifier

Change visibility (adapt for static/extern)

Used at 2868

In c-xrefactory (Already Available)
  • EditorMarker position tracking - src/editor.h

  • EditorBuffer text manipulation - src/editorbuffer.h

  • Undo/redo infrastructure - src/editor.c

  • Symbol database lookups - src/reftab.c

  • Reference tracking - src/reference.c

17.4.6. Implementation Summary

Phase Value

Phase 1: Basic Move

High - Eliminates manual cut/paste, handles 80% of cases

Phase 2: Include Management

High - Prevents build breakage, major time saver

Phase 3: Static Helpers

Medium - Nice to have, improves completeness

Phase 4: Polish

Low - Quality of life improvements

Phases 1-2 provide most of the value and are independently testable. Leveraging the existing Java refactoring code significantly reduces complexity.

17.4.7. Benefits

Developer Productivity
  • Frequent manual refactoring becomes automated

  • Reduces errors from missed dependencies

  • Maintains code quality during reorganization

Code Quality
  • Correct header management (no missing includes)

  • Proper visibility (static vs extern)

  • Preserves all references and call sites

Showcase for c-xrefactory
  • Demonstrates unique C-specific refactoring capabilities

  • Differentiates from generic LSP servers

  • Builds on existing Java refactoring experience

17.4.8. Risks and Mitigation

Risk Likelihood Mitigation

Complex header dependencies

Medium

Start with simple cases in Phase 1, iterate in Phase 2

Parser limitations finding function boundaries

Low

Already works for extract function refactoring

Missing static helper detection

Medium

Make it optional, user can manually move helpers

Breaking builds with incorrect includes

Medium

Generate preview, allow user to review before applying

17.4.9. Success Metrics

  • Phase 1 MVP: Successfully moves the function in 90% of cases without build errors

  • Phase 2: Correctly identifies and adds required headers 80% of the time

  • Phase 3: Detects static helpers with >95% accuracy

  • User adoption: Refactoring used frequently by active c-xrefactory users

17.4.10. Alternatives Considered

Manual refactoring with editor macros
  • Fragile, error-prone

  • No understanding of code structure

  • Can’t handle dependencies

Use generic LSP move/rename
  • LSP has no "move function" operation

  • Would need custom extension

  • Doesn’t understand C header dependencies

Wait for Clangd to implement
  • Clangd has no move function refactoring

  • C++-focused, may not handle C idioms well

  • We have opportunity to do it first and better

17.4.11. References

  • Java implementation: /home/thoni/Utveckling/c-xref-java/src/refactory.c

  • Current extract function: src/extract.c (proves function boundary detection works)

  • Editor infrastructure: src/editor.c, src/editorbuffer.c

  • Symbol database: src/reftab.c, src/referenceableitemtable.c

17.4.12. Next Steps

  1. After persistence store cleanup: Take a break from architectural work

  2. Implement Phase 1 MVP: Smaller effort, high value

  3. Gather usage feedback: Identify most common dependency patterns

  4. Implement Phase 2: Based on real-world usage patterns

  5. Document and showcase: Blog post, video demonstration


Even Phase 1 MVP provides significant value by eliminating manual text manipulation. The developer who does this refactoring "very often" confirms the high ROI potential.

17.5. Clean Parser API and Multi-File Definition Support

17.5.1. Current State: Recent LSP Definition Finding Success

The recent implementation of textDocument/definition for LSP created a clean, operation-based parser API in parsing.h/c that successfully decouples high-level operations from low-level parsing details. This API discovery presents an opportunity to gradually modernize legacy parsing infrastructure.

What works:

  • Single-file definition lookup via clean parsing.h interface

  • Operation-based configuration (ParserOperation enum)

  • No reliance on .cx files for immediate queries

What doesn’t work yet:

  • Multi-file definitions (symbol defined in different file than reference)

  • Cross-file type resolution

  • Include graph persistence

17.5.2. Problem Statement

The LSP definition finding currently fails when a symbol’s definition is in a different file from the reference. This requires:

  1. Symbol type information - Current reference database only stores name and position, not types needed for resolution

  2. Include graph - Don’t know which files include which headers to find definition files

  3. Multi-file parsing - Ability to efficiently reparse relevant files without loading entire project

More broadly, the legacy codebase still relies on magic string command routing to drive parsing behavior:

// Scattered throughout codebase (cxref.c, refactory.c, server.c, etc.)
parseBufferUsingServer(project, point, mark, "-olcxpush", NULL);
parseBufferUsingServer(project, point, mark, "-olcxmovetarget", NULL);
parseBufferUsingServer(project, point, mark, "-olcxmodemenu", NULL);
// ... 20+ more magic strings, each with unclear semantics

This makes the codebase fragile, hard to understand, and difficult to extend with new operations.

17.5.3. The Parser API Innovation

The new parsing.h interface provides operation-based configuration (see src/parsing.h for details):

Key innovations:

  • Enum-based operations - Discoverable list (IDE autocomplete shows all options)

  • Configuration struct - Single point of setup, no scattered global flags

  • Operation predicates - Self-documenting behavior (needsReferenceAtCursor, allowsDuplicateReferences)

  • Type-safe - Compile-time checking replaces runtime magic string matching

17.5.4. Benefits of Backporting This API

Code Clarity and Safety

Replace magic strings with discoverable enums:

/* Before: What does this do? Need to search for documentation */
parseBufferUsingServer(options.project, target, NULL, "-olcxmovetarget", NULL);

/* After: Type-safe and self-documenting via ParserOperation enum */
parsingConfig.operation = PARSE_TO_VALIDATE_MOVE_TARGET;
parsingConfig.positionOfSelectedReference = makePositionFromMarker(target);
parseCurrentInputFile(LANG_C);

Replace scattered global flags with structured configuration:

/* Before: Scattered throughout, unclear semantics */
options.olcxPushButton = 1;
options.olcxRenameOption = 1;
s_noParametersToExpand = 0;
originalFileNumber = someValue;

/* After: All configuration in one place, clear purpose */
parsingConfig.operation = PARSE_TO_COMPLETE;
parsingConfig.targetParameterIndex = 5;
parsingConfig.extractMode = EXTRACT_VARIABLE;

Foundation for Multi-File Definition Lookup

The clean API makes it easier to implement required enhancements:

/* New operation for multi-file definition lookup */
typedef enum {
    /* ... existing operations ... */
    PARSE_TO_RESOLVE_DEFINITION,  /* NEW: Find definition across files */
} ParserOperation;

/* Simpler to add new parsing modes when API is clean */
parsingConfig.operation = PARSE_TO_RESOLVE_DEFINITION;
parsingConfig.symbolName = "findReferenceableAt";
parsingConfig.startFile = fileNumber;

Easier Feature Development

Adding new operations becomes straightforward:

  1. Add to ParserOperation enum

  2. Implement predicate functions if needed

  3. Add semantic actions in grammar

  4. Done - no magic strings to maintain

17.5.5. Implementation Plan: Three Non-Blocking Phases

Phase 1: Expose New API Alongside Legacy Code

Scope: Purely additive - new API works alongside legacy code

Risk: LOW (no changes to existing code paths)

Value: Proof of concept, enables gradual migration

Approach:

  1. parseWithConfig() becomes primary entry point

  2. parseBufferUsingServer() internally bridges legacy magic strings to new ParsingConfig

  3. Both code paths work identically (verified with tests)

  4. No changes needed to existing functionality

Benefit: Can merge this without affecting legacy code at all

Phase 2: Migrate High-Value Callsites

Scope: Refactor 50%+ of parsing infrastructure

Risk: MEDIUM (refactoring, but with tests)

Value: Significant improvement to codebase clarity

Priority callsites:

  1. LSP handlers - Already partially using new API

  2. Definition finding - Multi-file lookup

  3. Extract refactoring - Complex logic, multiple operations

  4. Move function - Multiple parsing passes

Benefit: 50%+ of parsing infrastructure modernized, multi-file definitions enabled

Phase 3: Complete Legacy Code Cleanup

Scope: Remove all magic string command routing, consolidate parsing

Risk: MEDIUM (larger refactoring)

Value: Long-term maintainability and extensibility

Scope:

  • Remove all magic string command routing

  • Consolidate parsing entry points

  • Eliminate scattered global flags

  • Document parser behavior in code

Benefit: Codebase future-proof for new operations

Important: Phase 3 is optional - Phases 1-2 already provide most value

17.5.6. Multi-File Definition: Requirements

1. Symbol Type Information Persistence

Current: ReferenceableItemTable only stores name, position, kind

Needed: Minimal type info for cross-file resolution

Lightweight approach (preferred over full type serialization):

/* Extend ReferenceableItem with minimal type data */
struct ReferenceableItem {
    /* ... existing fields ... */
    char *returnType;      /* For functions: return type string */
    int   parameterCount;  /* For functions: number of parameters */
    bool  isDefinition;    /* True if this is the function/variable definition */
};

Scope: Extend ReferenceableItem, careful memory management

2. Include Graph Construction

Current: Include relationships processed during lexing, not persisted

Needed: Map of which files include which headers

Lightweight approach:

typedef struct {
    int   sourceFile;         /* File with #include directive */
    int   includedFile;       /* File being included */
    bool  isSystemInclude;    /* <...> vs "..." includes */
} IncludeEdge;

/* Stored in file's metadata, retrievable by file number */
int getIncludedFileCount(int fileNumber);
int getIncludedFile(int fileNumber, int index);
int *getIncludingFiles(int fileNumber);  /* Files that include this one */

Scope: Moderate effort, moderate complexity

3. Incremental Multi-File Parsing

Current: Full project parse or single-file parse

Needed: Efficient "parse files needed for definition lookup" operation

Approach:

  1. Use include graph to find "definition file candidates"

  2. Parse only those files (typically 1-5 for most projects)

  3. Store symbols in special "definition lookup" arena

  4. Return to in-memory reference table when done

Scope: Parse relevant files, update reference table
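
Step 1 of this approach could be sketched with the proposed include-graph API from the previous requirement (the graph below is a hard-coded stub standing in for real per-file metadata):

```c
#include <stddef.h>

/* Stub include graph: file 0 includes files 1 and 2; files 1 and 2
 * include nothing. A real implementation would read file metadata. */
static const int stubIncludes[3][2] = { {1, 2}, {0}, {0} };
static const int stubCounts[3]      = { 2, 0, 0 };

int getIncludedFileCount(int fileNumber) { return stubCounts[fileNumber]; }
int getIncludedFile(int fileNumber, int index) { return stubIncludes[fileNumber][index]; }

/* Candidates = the file itself plus everything it directly includes.
 * Returns the number of candidates written to out. */
int collectDefinitionCandidates(int fileNumber, int *out, int max) {
    int count = 0;
    if (count < max)
        out[count++] = fileNumber;
    for (int i = 0; i < getIncludedFileCount(fileNumber) && count < max; i++)
        out[count++] = getIncludedFile(fileNumber, i);
    return count;
}
```

A real version would probably follow the graph transitively, but even one level already covers the common "definition is in the directly included header" case.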

Total for full multi-file support: larger foundational work combining the three components above

17.5.7. Code Locations

New Parser API:

  • src/parsing.h - Public interface (167 lines)

  • src/parsing.c - Implementation with bridges (187 lines)

  • Functions: getParserOperation(), needsReferenceAtCursor(), parseToCreateReferences()

Legacy code currently using magic strings:

  • src/cxref.c - 20+ parseBufferUsingServer() calls

  • src/refactory.c - 10+ operations

  • src/extract.c - Multiple parsing passes

  • src/complete.c - Completion-specific parsing

  • src/server.c - Operation dispatch

Files for multi-file support:

  • src/reference_database.h/c - Add symbol type info

  • src/yylex.c - Build include graph

  • src/filetable.h/c - Persist include relationships

17.5.8. Success Metrics

Phase 1:

  • ✓ New ParsingConfig API works alongside legacy strings

  • ✓ No performance regression

  • ✓ Both code paths tested and working

Phase 2:

  • ✓ 50%+ of parsing callsites use new API

  • ✓ Multi-file definition lookup works

  • ✓ All existing tests still pass

Phase 3 (if pursued):

  • ✓ All parsing driven by enum operations

  • ✓ Magic strings completely removed

  • ✓ New developers understand control flow

17.5.9. Risks and Mitigation

Risk Likelihood Mitigation

Breaking existing workflows

Medium

Phase 1 keeps old API working, extensive testing, clear migration timeline

Incomplete refactoring

Medium

Phase 1-2 provide value even if Phase 3 never completes

Performance regression

Low

New API does same work, just different control flow; profile before/after

Memory overhead from type info

Medium

Use lightweight approach, store only essential fields, consider lazy loading

This refactoring supports:

  • ADR-0014 (On-Demand Parsing) - Single code path for all operations

  • Unified Symbol Database (next major refactoring) - Requires clean parsing API

  • LSP Multi-File Support - Immediately enables cross-file definition lookup

17.5.10. Notes

The key insight is that we can backport this gradually without breaking anything. The bridge pattern (new API calls old code for now) lets us improve incrementally, retiring magic strings and globals one section at a time.

Phase 1 proves the approach works with minimal risk. Phases 2-3 build on proven foundation, so failure to complete them doesn’t harm existing functionality.

Multi-file definitions are one motivating use case, but the real long-term benefit is a codebase where parsing behavior is discoverable and type-safe instead of driven by magic strings scattered throughout 25-year-old code.

18. Insights

This chapter contains notes of all insights, large and small, that I make as I work on this project. These insights should at some point be moved to some other, more structured, part of this document. But rather than trying to find a structure where each new finding fits, I’m making it easy to just dump them here. We can refactor these into a better and better structure as we go.

18.1. Yacc semantic data

As usual, a Yacc grammar requires each non-terminal to have a type. Those types are named after which types of data they collect and propagate. The names always start with ast_ followed by the data type. For example, if some non-terminal needs to propagate a Symbol and a Position, that structure would be called ast_symbolPositionPair ("Pair" being thrown in there for good measure…​).

Each of those structures also always carries a begin and end position for that structure. That means that any "ast" struct has three fields, begin, end and the data. The data are sometimes a struct, like in this case, but can also be a single value, like an int or a pointer to a Symbol.


class ast_symbolPositionPair {
Position begin
Position end
}

ast_symbolPositionPair *-- SymbolPositionPair : data

class SymbolPositionPair {
Symbol *symbol
Position position
}
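
Translated into C, the pattern might look like this (Position is a simplified stand-in here; the real definitions live in the grammar support code):

```c
/* Simplified stand-ins for the real c-xrefactory types */
typedef struct { int file, line, col; } Position;
typedef struct Symbol { const char *name; } Symbol;

typedef struct {
    Symbol   *symbol;
    Position  position;
} SymbolPositionPair;

/* Every "ast" struct has exactly three fields: begin, end, data. */
typedef struct {
    Position begin;               /* start of the construct in the source */
    Position end;                 /* end of the construct in the source */
    SymbolPositionPair data;      /* the propagated semantic data itself */
} Ast_symbolPositionPair;
```

For simpler non-terminals the data field is just a single value, such as an int or a Symbol pointer, but the begin/end pair is always present.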

18.2. Navigation Architecture and the Preloading Limitation

Date: 2025-12-22

18.2.1. How Symbol Navigation Works

Symbol navigation (PUSH/NEXT/PREVIOUS/POP) merges references from two sources:

  1. Disk CXrefs Database - Reflects saved files on disk

  2. In-Memory ReferenceableItem Table - Reflects current server state including preloaded editor buffers

When you PUSH on a symbol, the navigation menu creation (function createSelectionMenuForOperation) does:

  1. Load from disk - Scans CXrefs files for the symbol (via scanReferencesToCreateMenu)

    • Creates menu items with disk-based line numbers

  2. Merge from memory - Maps over the in-memory table (via putOnLineLoadedReferences)

    • Adds references from parsed/preloaded buffers with current line numbers

    • Duplicate detection prevents the same reference appearing twice

  3. Build session - Copies merged references to the navigation session

This dual-source approach allows navigation without full project parse while providing updated positions for modified files.
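
The merge-with-deduplication step described above can be sketched as follows (types and names are simplified stand-ins, not the real c-xrefactory structures):

```c
#include <stdbool.h>

/* Simplified stand-in for a reference: file number plus line */
typedef struct { int file; int line; } Ref;

static bool sameRef(Ref a, Ref b) {
    return a.file == b.file && a.line == b.line;
}

/* Appends src refs (e.g. from the in-memory table) to an existing menu
 * (e.g. loaded from disk), skipping exact duplicates. Returns new count. */
int mergeReferences(Ref *menu, int menuCount, int menuMax,
                    const Ref *src, int srcCount) {
    for (int i = 0; i < srcCount && menuCount < menuMax; i++) {
        bool duplicate = false;
        for (int j = 0; j < menuCount; j++)
            if (sameRef(menu[j], src[i])) { duplicate = true; break; }
        if (!duplicate)
            menu[menuCount++] = src[i];
    }
    return menuCount;
}
```

Note that this duplicate check compares positions, which is exactly why the limitation described below bites: a disk reference and a shifted in-memory reference for the same usage no longer look like duplicates.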

18.2.2. The Fundamental Limitation

The Emacs client only preloads the currently active file, not all modified editor buffers.

This creates incorrect navigation after editing non-current files:

1. PUSH on symbol in common.h → session has refs with disk line numbers
2. NEXT to usage in source1.c (becomes current, gets preloaded)
3. User adds line in source1.c → server knows via preload
4. NEXT to source2.c → source1.c NO LONGER preloaded
   - Server has no knowledge of source1.c modification
   - Session still has old line number from step 1
5. NEXT wraps to source1.c → WRONG LINE
   - Points to line before actual usage

18.2.3. Why Our Fix Attempt Didn’t Work

We attempted to fix this (commit 8052518a4) by detecting modified files via timestamp comparison and reparsing them in processModifiedFilesForNavigation. This failed because:

  • Can only detect modifications in preloaded files (current file)

  • Cannot know about changes in non-preloaded modified buffers

  • Client protocol doesn’t support sending multiple preloaded buffers

18.2.4. Potential Solutions

  • Client preloads all modified buffers - Requires Emacs client changes

  • LSP-style protocol (did_open/did_change) - Major architectural change

  • Accept the limitation - Document current behavior

  • Force save before cross-file navigation - Intrusive UX

As of 2025-12-22, the limitation remains documented but unfixed. The attempted fix was reverted to preserve the original working architecture.

18.2.5. Architectural Invariants

MUST be maintained:

  • Disk CXrefs = State of files on disk (from last tags generation)

  • ReferenceableItem Table = Disk state + preloaded editor buffers

  • Session references = Snapshot at PUSH time

The table can NEVER reflect non-preloaded modified files because the server doesn’t receive that information from the client.

19. Refactoring Recipes

This chapter documents mechanical steps for refactoring operations. Each recipe describes the algorithmic steps that an automated refactoring tool would perform.

For detailed discussions of refactoring feature architecture and implementation phases, see Chapter 16a: Planned Refactoring Features.

19.1. Existing Refactorings

Refactorings that are already implemented and available.

19.1.1. Rename Symbol

Implemented. TBD.

19.1.2. Extract Function

Implemented. TBD.

19.1.3. Reorder Parameters

Implemented. TBD.

19.1.4. Make Function Static

Purpose: Convert functions that are only used within their compilation unit to static storage class for better encapsulation and compiler optimization.

When to use:

  • Function has external linkage but no external callers

  • Want to make implementation details explicit

  • Enable compiler optimizations and reduce global namespace pollution

Input:

  • Non-static function definition

  • All references to that function

Availability:

When cursor is on a non-static function definition where all callers are in the same file.

Algorithm:

  1. Check current storage class - Skip if already static

  2. Find definition and all references

    • Locate function definition (not declarations)

    • Collect all call sites across project

  3. Verify all references are local

    • For each reference (excluding definition itself):

      • Check if in same file as definition

      • If any reference is in different file, abort

    • Check if declared in header files (public API), abort if yes

  4. Apply transformation

    • Find beginning of function definition

    • Insert "static " before return type
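
Step 3 of the algorithm might be sketched like this (Reference is a simplified stand-in for the real reference structure):

```c
#include <stdbool.h>

/* Simplified stand-in: which file a reference is in, and whether it
 * is the definition itself. */
typedef struct { int file; bool isDefinition; } Reference;

/* True iff every non-definition reference is in the definition's file. */
bool allReferencesLocal(const Reference *refs, int count, int definitionFile) {
    for (int i = 0; i < count; i++) {
        if (refs[i].isDefinition)
            continue;                 /* the definition itself is exempt */
        if (refs[i].file != definitionFile)
            return false;             /* external caller found: abort */
    }
    return true;
}
```

The header-declaration check in step 3 would be a similar scan over declaration references, aborting if any lives in a .h file.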

Output:

  • Function marked as static

  • Compiler can optimize more aggressively

  • Clear signal that function is internal

Example:

// Before - helper function with external linkage
int helperCompare(const void *a, const void *b) {
    return *(int*)a - *(int*)b;
}

void publicSort(int *array, size_t n) {
    qsort(array, n, sizeof(int), helperCompare);
}

// After - explicitly internal
static int helperCompare(const void *a, const void *b) {
    return *(int*)a - *(int*)b;
}

void publicSort(int *array, size_t n) {
    qsort(array, n, sizeof(int), helperCompare);
}

Benefits:

  • Better encapsulation and code clarity

  • Enables inlining and other compiler optimizations

  • Smaller symbol tables, no name collisions

  • Safe to refactor (can’t break external code)

Notes:

  • Similar to "Unused Symbols" detection but finds LOCAL-ONLY usage instead of NO usage

  • Cannot handle functions used via function pointers passed externally (requires manual verification)

19.2. Suggested Refactorings

Refactorings that have been proposed, designed, or partially implemented but are not yet available.

19.2.1. Move Type to New File

Input:

  • Type name to move

  • Source file containing type definition

  • Target file (new or existing)

Algorithm:

  1. Availability

    • Available when the selected symbol is a type; that symbol is the type to move

  2. Identify dependencies

    • Determine what types/macros the definition references/uses

  3. Create/update target file:

    • If new file: create with include guards and appropriate includes/forward declarations

    • If existing: open the file and find a suitable insertion location

  4. Move definition

    • Copy type definition to target file

    • Add necessary includes and forward declarations

  5. Replace in source file:

    • If target is new file: Replace type definition with #include "targetfile.h"

    • If target is existing file: Remove type definition, add #include "targetfile.h" if not already present in the source file

Output:

  • Type definition moved to target file

  • Source file includes the target file

  • Clean compilation

Notes:

  • For new header files, steps 3-5 are particularly simple: create the new header with the type and replace the definition in source with an include

  • For existing headers, must check if include is already present before adding

  • Forward declarations (e.g., struct foo;) are sufficient for pointer-only dependencies

  • Full type definitions or includes needed for non-pointer members


19.3. Introduce Semantic Type Aliases

Purpose: Make implicit semantic distinctions explicit by introducing type aliases for a single struct used for multiple purposes.

When to use:

  • A single struct/type is reused for semantically different purposes

  • Different usage contexts use different subsets of fields

  • Want to clarify intent without changing implementation

  • Want to prepare for future type divergence

Input:

  • Original type name (e.g., OlcxReferencesStack)

  • List of semantic contexts where type is used (e.g., "browser", "completion", "retrieval")

  • Target file for type aliases (new or existing header)

Algorithm:

  1. Analyze usage patterns - Identify distinct semantic contexts where type is used

    • Group usage sites by purpose/domain

    • Note which fields are used in each context

    • Verify that contexts are truly semantically different

  2. Create type aliases - In the appropriate header file, define semantic aliases:

typedef OriginalType SemanticName1;
typedef OriginalType SemanticName2;
/* etc. */

  3. Update structure declarations - Change struct/variable declarations to use semantic types:

    • Data structure fields

    • Global variables

    • Static variables

  4. Update function signatures - Change function parameters to use semantic types:

    • Functions operating on specific context → specific alias

    • Generic functions operating on any context → generic alias (if created)

  5. Update call sites - Verify all usages compile with new types

  6. Verify - Compile to ensure type compatibility

Output:

  • Multiple type aliases for same underlying type

  • Declarations and signatures use semantic types

  • Intent clarified through type system

  • Foundation for future divergence

Example:

Given a "kitchen sink" struct used for three purposes:

// Before - single type for everything
typedef struct OlcxReferencesStack {
    OlcxReferences *top;
    OlcxReferences *root;
} OlcxReferencesStack;

typedef struct SessionData {
    OlcxReferencesStack browserStack;      // Uses: references, symbolsMenu
    OlcxReferencesStack completionsStack;  // Uses: completions
    OlcxReferencesStack retrieverStack;    // Uses: completions
} SessionData;

void pushEmptySession(OlcxReferencesStack *stack);  // Generic

After introducing semantic aliases:

// After - semantic aliases make intent clear
typedef struct OlcxReferencesStack {
    OlcxReferences *top;
    OlcxReferences *root;
} OlcxReferencesStack;

// Semantic aliases
typedef OlcxReferencesStack ReferencesStack;   // Generic
typedef OlcxReferencesStack BrowserStack;      // For navigation
typedef OlcxReferencesStack CompletionStack;   // For completion
typedef OlcxReferencesStack RetrieverStack;    // For search

typedef struct SessionData {
    BrowserStack    browserStack;
    CompletionStack completionsStack;
    RetrieverStack  retrieverStack;
} SessionData;

void pushEmptySession(ReferencesStack *stack);  // Generic operation

Benefits:

  • Intent is immediately clear from type names

  • No runtime or ABI changes (aliases compile to same type)

  • Can add domain-specific operations per type later

  • Enables gradual migration toward separate types if needed

Notes:

  • Particularly useful in C where classes/interfaces are unavailable

  • Type aliases are compile-time only - no runtime overhead

  • Can coexist with original type name during migration

  • Common pattern when refactoring legacy C code


19.3.1. Rename Included File

Purpose: Rename a file appearing in an include and update all the include directives

When to use:

  • A (header) file is inappropriately named

  • Renaming a complete C "module" includes this as one step (until c-xrefactory can do all of that)

Input:

  • The old and new file names

  • All #include locations for the old file

Availability:

When the cursor is on an #include directive. The file it references will be the "source".

Algorithm:

  1. Rename the source file to the destination

  2. Update all include locations

    • This will often include multiple locations

Output:

  • New header file

  • All #include directives updated


19.3.2. Move Function to Different File

See Chapter 16a: Planned Refactoring Features for detailed design and implementation status.

Proposed refactoring to move a function definition from one C source file to another while automatically managing visibility (static vs extern) and potentially adding necessary declarations and includes.

Status: Phase 1 MVP complete.


19.3.3. Turn include guard into pragma once

Tentative.


19.3.4. Make Function Static

Tentative.

Purpose: Convert functions that are only used within their compilation unit to static storage class for better encapsulation and compiler optimization.

When to use:

  • Function has external linkage but no external callers

  • Want to make implementation details explicit

  • Enable compiler optimizations and reduce global namespace pollution

Input:

  • Non-static function definition

  • All references to that function

Availability:

When cursor is on a non-static function definition where all callers are in the same file.

Algorithm:

  1. Check current storage class - Skip if already static

  2. Find definition and all references

    • Locate function definition (not declarations)

    • Collect all call sites across project

  3. Verify all references are local

    • For each reference (excluding definition itself):

      • Check if in same file as definition

      • If any reference is in different file, abort

    • Check if declared in header files (public API), abort if yes

  4. Apply transformation

    • Find beginning of function definition

    • Insert "static " before return type

Output:

  • Function marked as static

  • Compiler can optimize more aggressively

  • Clear signal that function is internal

Example:

// Before - helper function with external linkage
int helperCompare(const void *a, const void *b) {
    return *(int*)a - *(int*)b;
}

void publicSort(int *array, size_t n) {
    qsort(array, n, sizeof(int), helperCompare);
}

// After - explicitly internal
static int helperCompare(const void *a, const void *b) {
    return *(int*)a - *(int*)b;
}

void publicSort(int *array, size_t n) {
    qsort(array, n, sizeof(int), helperCompare);
}

Benefits:

  • Better encapsulation and code clarity

  • Enables inlining and other compiler optimizations

  • Smaller symbol tables, no name collisions

  • Safe to refactor (can’t break external code)

Notes:

  • Similar to "Unused Symbols" detection but finds LOCAL-ONLY usage instead of NO usage

  • Cannot handle functions used via function pointers passed externally (requires manual verification)

  • Estimated complexity: ~0.3× Move Function Phase 1

20. Archive

In this section you can find descriptions and saved texts that describe how things were before. They are no longer true, since that quirk, magic or bad coding is gone, but they are kept here as an archive for those wanting to backtrack to the original sources.

20.1. Memory strategies

There were a multitude of specialized memory allocation functions. In principle there were two types, static and dynamic. The dynamic type could be extended using an overflow handler.

Also, one type had a struct where the actual memory area extended beyond the struct itself. This was very confusing…​

20.1.1. Static memory allocation

Static memory (SM_ prefix) consists of static areas allocated by the compiler, each indexed by a similarly named index variable (e.g. ftMemory and ftMemoryIndex), something the macros took advantage of. These are:

  • ftMemory

  • ppmMemory

  • mbMemory

One special case of static memory also exist:

  • stackMemory - synchronous with program structure and has CodeBlock markers, so there is a special stackMemoryInit() that initializes the outermost CodeBlock

These areas cannot be extended; when one overruns, the program stops.

20.2. Trivial Prechecks

The refactorer can call the server using parseBufferUsingServer() and add some extra options (in text form). One example is setMovingPrecheckStandardEnvironment(), which calls the server with -olcxtrivialprecheck.

However, parseBufferUsingServer() uses callServer(), which never calls answerEditAction().

In answerEditAction() the call to the (unused) olTrivialRefactoringPreCheck() also requires an options.trivialPreCheckCode, which is neither sent by setMovingPrecheckStandardEnvironment() nor parsed by processOptions().

The only guess I have is that previously all prechecks were handled by the -olcxtrivialprecheck option in calls to the server, and that they have since moved into their respective refactorings.

This theory should be checked by looking at the original source of the precheck functions and compare that with any possible checks in the corresponding refactoring code.

20.3. Caching System

The caching system described below has been archived as it is no longer part of the current architecture.

c-xrefactory included a sophisticated caching system that enabled incremental parsing by caching parsed input streams and parser state, and by tracking file modifications. This optimization allowed faster re-analysis when only portions of source files had changed. It also allowed the system to detect out-of-memory situations and to discard, flush and re-use memory during file processing.

20.3.1. Core Design Principles

Cache Point Model: The system placed strategic snapshots of parser state at external definition boundaries (functions, global variables, etc.). When files were re-processed, the system could validate cache integrity, recover from cache points, and resume parsing only from the first changed definition onward.

Separation of Concerns: Recent refactoring had separated file tracking from cache validation:

  • updateFileModificationTracking() - Updated file timestamps without side effects

  • isFileModifiedSinceCached() - Pure validation function for cache integrity

20.3.2. Key Components

Cache Point Management (caching.c):

  • placeCachePoint(bool) - Placed strategic parser state snapshots

  • recoverFromCache() - Restored parser state from cache points

  • recoverCachePointZero() - Reset to initial cache state

File Modification Tracking:

The FileItem structure maintained multiple timestamp fields for tracking file modification:

struct FileItem {
    time_t lastModified;    // File's actual modification time
    time_t lastInspected;   // When we last checked the file
    // ... other fields
};

Input Stream Caching:

  • cacheInput() - Cached tokenized input from lexer

  • cachingIsActive() - Checked if caching was currently enabled

  • activateCaching() / deactivateCaching() - Controlled caching state

20.3.3. Parser Integration

Parser Integration: Both the C and Yacc parsers placed cache points after each external_definition, but only when not processing include files (includeStack.pointer == 0).

Parser-Specific Behavior:

  • C Parser: Full caching enabled with regular cache point placement

  • Yacc Parser: Explicitly deactivated caching via deactivateCaching() but still placed strategic cache points

  • Include Files: Cache points skipped during include processing

20.3.4. System Dependencies

The caching system was deeply integrated throughout the parsing pipeline:

Component      Functions Used                                              Purpose

main.c         initCaching(), activateCaching(), recoverCachePointZero()   Lifecycle control
lexer.c        cacheInput(), cachingIsActive(), deactivateCaching()        Input processing
yylex.c        updateFileModificationTracking()                            File tracking
filetable.c    updateFileModificationTracking()                            File management
xref.c         recoverFromCache(), recovery functions                      Cross-reference coordination
c_parser.y     placeCachePoint()                                           C grammar integration
yacc_parser.y  deactivateCaching(), placeCachePoint()                      Yacc grammar integration

20.3.5. Performance Characteristics

Cache Hit Scenarios:

  1. Full Cache Hit: No file modifications since last parse - parser state recovered from cache point zero with minimal re-processing

  2. Partial Cache Hit: File modified after Nth definition - recovery from cache point N with re-parsing only from point of change onward

  3. Cache Miss: File structure changed or timestamps invalid - full re-parse with new cache points placed

Optimization Benefits:

  • Memory usage scales with number of definitions, not file size

  • File modification checking minimizes unnecessary re-reads

  • Input stream caching reduces lexer overhead

  • Strategic cache point placement enables clean recovery at definition boundaries

20.4. HUGE Memory

Previously a HUGE memory model was also available (by re-compilation) to reach file numbers, lines and columns above 22 bits. But if you have more than 4 million lines (or columns!) you should probably do something radical before attempting cross-referencing and refactoring.

20.5. Bootstrapping

20.5.1. BOOTSTRAP REMOVED!

Once the FILL-macros were removed, we could move the enum generation to use the actual c-xref. So from now on we build c-xref directly from the sources in the repo. Changes to any enums will trigger a re-generation of the enumTxt files, but since the enumTxt files only convert enum values to strings, any mismatch will not prevent compilation, and a manual update would even be possible. This is a big improvement over the previous situation!

20.5.2. FILLs REMOVED!

As indicated in FILL macros the bootstrapping of FILL-macros has finally and fully been removed.

Gone is also compiler_defines.h, which was simply removed without any obvious adverse effects. Maybe that will come back and bite me when we move to more platforms than Linux and macOS…​

What is left, at this point, is only the enumTxt generation, so most of the text below is kept for historical reasons.

20.5.3. Rationale

c-xref uses a load of structures, and lists of them, that need to be created and initialized in a lot of places (such as the parsers). To make this somewhat manageable, c-xref itself parses the structures and generates macros that can fill them with one call.

c-xref is also bootstrapped into reading in a lot of predefined header files to get system definitions as "preloaded definitions".

Why this pre-loading was necessary, I don’t exactly know. It might be an optimization, or an idea that was born early and then just kept on and on. In any case it creates an extra complexity building and maintaining and to the structure of c-xref.

So this must be removed, see below.

20.5.4. Mechanism

The bootstrapping uses c-xref's own ability to parse C code to read those structures and spit out the filling macros, among other things.

This is done using options like -task_regime_generate, which prints a lot of data structures on standard output; the Makefile then feeds that into generated versions of strFill, strTdef (no longer exists) and enumTxt.

The process starts with building a c-xref.bs executable from checked in sources. This compile uses a BOOTSTRAP define that causes some header files to include pre-generated versions of the generated files (currently strFill.bs.h and enumTxt.bs.h) which should work in all environments.

If you change the name of a field in a structure that is subject to FILL-generation, you will need to manually update strFill.bs.h, but a "make cleaner all" will show you where those places are.

After c-xref.bs has been built, it is used to generate strFill and enumTxt, which might include structures specific to the current environment.

HOWEVER: if FILL macros are used for structures that differ between platforms, say a FILE structure, that FILL macro will have a different number of arguments, so I’m not sure how smart this "smart" generation technique actually is.

TODO: Investigate alternative approaches to this generate "regime", perhaps move to a "class"-oriented structure with initialization functions for each "class" instead of macros.

20.5.5. Compiler defines

In options.h there are a number of definitions which are somehow sent to the compiler/preprocessor, or used so that standard settings match what a program would get when compiled with the standard compiler on the platform. At this point I don’t know exactly how this conversion from C declarations to compile-time definitions is done; maybe they are just entered as symbols in one of the many symbol tables?

Typical examples include "__linux" but also on some platforms things like "fpos_t=long".

I’ve implemented a mechanism that uses "gcc -E -dM" to print out and catch all compiler defines in compiler_defines.h. This was necessary because of such definitions on Darwin which were not among the "pre-programmed" ones.

TODO?: As this is a more general approach it should possibly completely replace the "programmed" ones in options.c?

20.5.6. EnumTxt generation REMOVED!

To be able to print the string values of enums, the module generate.c (called when the regime was RegimeGenerate) could also generate string arrays for all enums. By replacing that with some pre-processor magic for the few that were actually needed (mostly in log_trace() calls) we could do away with the whole "generate" functionality too.

20.5.7. enumTxt

In some cases the string representation of an enum value is needed. c-xref handles this using the "usual" 'parse code and generate' method; the module generate.c does this generation too.

20.5.8. Include paths

Also in options.h, some standard-like include paths are added, but there is a better attempt in getAndProcessGccOptions(), which uses the compiler/preprocessor itself to figure out those paths.

TODO?: This is much better and should really be the only way, I think.

20.5.9. Problems

Since at bootstrap there must exist FILL-macros with the correct field names, this strategy is an obstacle to cleaning up the code, since every field is referenced in the FILL macros. When a field (in a structure filled using a FILL macro) changes name, initial compilation becomes impossible until that field's name is also changed in the strFill.bs.h file.

One way to handle this is of course to use c-xrefactory itself and rename fields. This requires that the project settings also include a pass with BOOTSTRAP set, which it does.

20.5.10. Removing

I’ve started removing this step. In TODO.org I keep a hierarchical list of the actions to take (in a Mikado kind of style).

The basic strategy is to start with structures that no other structure depends on. Using the script utils/struct2dot.py you can generate a DOT graph that shows those dependencies.

Removal can be done in a couple of ways

  1. If it’s a very small structure you can replace a call to a FILL_XXX() macro with a compound literal.

  2. A better approach is usually to replace it with a fillXXX() function, or even better with a newXXX(), if it is consistently preceded by an allocation (in the same memory!). To see which fields vary you can grep all such calls, make a CSV file from them, and compare all rows.

20.5.11. strTdef.h

The strTdef.h was generated using the option -typedefs as a part of the old -task_regime_generate strategy and generated typedef declarations for all types found in the parsed files.

I also think that you could actually merge the struct definition with the typedef so that strTdef.h would not be needed. But it seems that this design exists because the structures in proto.h do not form an acyclic graph, so the loops make that impossible. Instead the typedefs are included before the structs:

#include "strTdef.h"
struct someNode {
    S_someOtherNode *this;
    ...
struct someOtherNode {
    S_someNode *that;
    ...

This is now idiomatically solved using the struct tags themselves:

struct someNode {
    struct someOtherNode *this;
    ...
struct someOtherNode {
    struct someNode *that;
    ...

20.6. FILL macros

The FILL macros are now fully replaced by native functions or some other, more refactoring-friendly, mechanism. Yeah!

During bootstrapping, a large number of macros named __FILL_xxxx were created. The intent was that you could fill a complete structure with one call, somewhat like a constructor, but here it was used more generally every time a complex struct needed to be initialized.

There were even _FILLF_xxx macros which allowed filling fields in sub-structures at the same time.

This is, in my mind, another catastrophic hack that makes understanding, and refactoring, c-xrefactory such a pain. Not to mention the extra bootstrap step.

I just discovered compound literals in C99, and I’ll experiment with replacing some of the FILL macros with compound-literal assignments instead.

FILL_symbolList(memb, pdd, NULL);

could become (I think):

memb = (SymbolList){.d = pdd, .next = NULL};

If successful, it would be much better, since we could probably get rid of the bootstrap, but primarily it would be more explicit about which fields are actually necessary to set.

20.7. Users

The -user option has now been removed, both in the tool and the editor adaptors, and with it one instance of a hashlist, the olcxTab, which is now a single structure, sessionData.

There was an option called -user which Emacs set to the frame-id. To me that indicated that the concept was that for each frame you created you got a different "user" with the c-xref server that you (Emacs) created.

The jEdit adapter seemed to do something similar:

options.add("-user");
options.add(s.getViewParameter(data.viewId));

Looking at the sources to find where the function olcxSetCurrentUser() was called, it seems that you could have different completions, refactorings, etc. going on at the same time in different frames.

Completions etc. require user interaction, so they are not controlled solely by the editor. At first glance, though, the editor (Emacs) seems to block multiple refactorings and reference-maintenance tasks from running at the same time.

This leaves just a few use cases for multiple "users", and I think it adds unnecessary complexity. By going for a "one user" approach, like the model in the Language Server Protocol, this could really be removed.