This document is very much a work in progress.

1. Intro

c-xrefactory is both a software tool and a project: the project aims at restoring the sources of that old, but very useful, tool to a state where it can be enhanced and serve as the foundation for a highly capable refactoring browser.

It is currently in excellent working condition, so you can use it in your daily work. I do. For information about how to do that, see the README.md.

1.1. Caution

As indicated by the README.md, this is a long-term restoration project, so anything you find in this document might be old, incorrect, a guess, or a temporary holder for thoughts or theories. Or actually true and useful.

Especially names of variables, functions and modules are prone to change as understanding of them increases. They might also be refactored into something entirely different.

This document has progressed from non-existent, via a collection of unstructured thoughts, guesses, historic anecdotes, ideas and pre-existing wiki pages, to something quite useful. Perhaps it will continue to be improved and "refactored" into something valuable for anyone who ventures into this project.

The last part of this document is an Archive where completely obsolete descriptions have been moved for future software archeologists to find.

1.2. Background

You will find some background about the project in the README.md.

This document tries to collect the knowledge and understanding of how c-xrefactory actually works, as well as plans for making it better: in terms of working with the source, its structure, and its features.

Hopefully over time this will be the design documentation of c-xrefactory, which, at that time, will be a fairly well structured and useful piece of software.

1.3. Goal

Ultimately c-xrefactory could become the refactoring browser for C, the one that everybody uses. As suggested by @tajmone in GitHub issue #39, by switching to a general protocol, we could possibly plug this in to many editors.

However, to do that we need to refactor out the protocol parts. And to do that we need a better structure, and to dare to change that, we need to understand more of the intricacies of this beast, and we need tests. So the normal legacy-code catch-22 applies…​

Test coverage is starting to look good, coming up to slightly above 80% at the time of writing this. Many "tests" are just "application level" execution, rather than actual tests, but also this is improving.

2. Context

c-xrefactory is designed to be an aid for programmers as they write, edit, inspect, read and improve the code they are working on.

The editor is used for the usual manual manipulation of the source code. c-xrefactory interacts with the editor to provide navigation and automated edits (refactorings).

[Diagram: SystemContext]

3. Functional Overview

The c-xref program is, or rather was, a mish-mash of a multitude of features baked into one program. This is the major cause of the mess that it is source-wise.

It was:

  • a generator for persistent cross-reference data

  • a reference server for editors, serving cross-reference, navigational and completion data over a protocol

  • a refactoring server (the world's first to cross the Refactoring Rubicon)

  • an HTML cross-reference generator (probably the root of the project) (REMOVED)

  • a C macro generator for structure fill (and other) functions (REMOVED)

It is the first three that are unique and constitute the great value of this project. The last two have been removed from the source: the structure-fill macro generator because it was a hack that prevented modern, tidy building, coding and refactoring, and the HTML cross-reference generator because it has been superseded by modern alternatives like Doxygen and is not at the core of this project's goal.

One might surmise that HTML cross-reference generation was the initial purpose of the tool that the original Xrefactory was based upon. Once that was in place, the other features followed, basically bolted on top without much re-architecting of the C sources.

What we’d like to do is partition the project into separate parts, each having a clear usage.

The following sections are aimed at describing various features of c-xrefactory.

3.1. Main functionality

A programmer constantly needs to navigate, understand and improve the source code in order to lessen the cognitive load for understanding and making changes.

c-xrefactory provides two sets of functions for this directly from within the editor:

  • navigation, searching and browsing symbols

  • automated refactorings, i.e. behaviour-preserving edits

C and Yacc source code is supported.

3.1.1. Navigation

A user can navigate all references of a symbol, limited to the semantic scope of that symbol, by "Goto Definition" and then navigate using "Next/previous Reference". This is a fast way to inspect where a symbol is used.

This also applies to non-terminals and semantic attributes in Yacc grammars!

3.1.2. Searching

A user can search for symbols by name using two operations:

Search Symbol

Finds all symbols whose name matches the search pattern. This includes functions, variables, types, macros — any symbol known to c-xrefactory, whether or not it is defined in the project.

Search Definition

Finds only symbols that have a definition within the project. Symbols that are merely used (e.g. called from an external library) are excluded.

Both operations accept wildcard patterns: * matches any sequence of characters and ? matches a single character. For example, parse* finds all symbols starting with "parse", while get_? finds symbols consisting of "get_" followed by exactly one character.
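The wildcard semantics can be sketched as a small recursive matcher. This is illustrative only, not the actual c-xrefactory implementation:

```c
#include <stdbool.h>

/* Minimal recursive wildcard matcher: '*' matches any sequence of
   characters (including the empty one), '?' matches exactly one
   character. A sketch of the search pattern semantics described
   above, not the real c-xrefactory code. */
static bool matches(const char *pattern, const char *name) {
    if (*pattern == '\0')
        return *name == '\0';
    if (*pattern == '*')
        /* Either '*' matches nothing, or it consumes one more character */
        return matches(pattern + 1, name)
            || (*name != '\0' && matches(pattern, name + 1));
    if (*name != '\0' && (*pattern == '?' || *pattern == *name))
        return matches(pattern + 1, name + 1);
    return false;
}
```

With this sketch, matches("parse*", "parseArguments") and matches("get_?", "get_x") hold, while matches("get_?", "get_xy") does not.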

The search results are presented in a list. Use the up/down arrow keys to move between entries. Press RET to inspect a symbol — this navigates to its definition and enters the reference browser where Next/Previous Reference can be used to visit all references.

Pressing p and n in the search results navigates the search history — returning to the results of a previous or next search, not moving between entries in the current list.

Search currently reads from the on-disk references database, not from the live in-memory state. The database is saved automatically when Emacs exits or when the server is restarted. If the database has never been saved, search will return no results.

3.1.3. Completion

As c-xrefactory has information about symbols and their semantic scope, it can also provide semantically informed completions and suggestions.

3.1.4. Automated refactorings

In his book "Refactoring", Martin Fowler describes a large number of refactorings: changes to source code that do not change its behaviour but improve its structure and readability. For each refactoring the book describes, step by step, which edits to make to apply the refactoring manually.

The natural next step was of course to attempt to automate this in editors or IDEs, which started to happen.

In an article from 2001, Martin Fowler pronounced Xref, the ancestor of c-xrefactory, to be the first tool to cross the "Refactoring Rubicon" by being able to extract a function in a semantically correct way.

The term "automated" means that some software can examine the source code and quickly and safely modify it using patterns from the list of possible refactorings, without user interaction. Many refactorings in the book, and on the website, are applicable mostly for OO-languages, but many also apply to C. c-xrefactory can perform some of them. More are considered for implementation.

  • "Rename Symbol" - change the name of a variable, type or function, only within the semantic scope of the symbol

  • "Extract Macro/Function" - a region of the code can be extracted to a new function or macro

  • "Organize Includes" - clean up a list of #include directives by partitioning and sorting them

  • "Rename Included File" - rename the file in the #include directive and update all other #include directives of that file

  • "Move Function To Other File" - move a function to another file, automatically add an extern declaration in an appropriate header file and ensure that it is included in the file where the function originally was

Using these automated refactorings it is much easier and safer to continuously maintain and improve the quality of any code base.

3.2. Options, option files and configuration

The current version of C-xrefactory allows only two possible sets of configuration/options.

The primary storage is (currently) the file $HOME/.c-xrefrc which stores the "standard" options for all projects. Each project has a separate section which is started with a section marker, the project name surrounded by square brackets, [project1].
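For illustration, such a file might look like the following sketch. The directory paths are placeholders and the per-project option lines are assumptions; only the section-marker syntax and the -prune option are described in this document:

```
[project1]
  /home/user/src/project1
  -prune .git

[project2]
  /home/user/src/project2
```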

When you start c-xref you can use the command line option -xrefrc to request that a particular option file should be used instead of the "standard options".

When running the edit server there seems to be no way to indicate different option files for different projects/files. Although you can start the server with -xrefrc, you will be stuck with that file for the whole session and for all projects.

3.3. LSP

LSP, the Language Server Protocol, is a common protocol for language servers such as clangd and c-xrefactory. It allows an editor (client) to ask a server for information, such as reference positions, and for operations, such as refactorings, without knowing exactly which server it talks to.

Recent versions of c-xrefactory have an initial implementation of a very small portion of the LSP protocol. The plan is to fully integrate the functionality of c-xrefactory into the LSP protocol. This will allow use of c-xrefactory from not only Emacs but also Visual Studio Code or any other editor that supports the LSP protocol.

3.3.1. LSP Protocol Limitations

The LSP protocol was designed for single-shot, non-interactive operations. This creates constraints for c-xrefactory’s advanced refactorings:

Interactive Refactorings: C-xrefactory’s extract/parameter operations require multi-step user input (names, positions, declarations). LSP’s textDocument/codeAction doesn’t support interactive dialogs.

Symbol Browsing: C-xrefactory provides interactive symbol browsers with filtering and keyboard navigation. LSP returns flat reference lists with no standard for interactive UI.

Strategy: The LSP implementation aims to:

  • Provide basic IDE features (definition, completion, simple refactorings) to modern editors

  • Expose c-xrefactory’s advanced refactoring capabilities where possible

  • Keep the Emacs client as the primary interface for full interactive features

LSP serves to make c-xrefactory more accessible, while the Emacs client will probably remain the gateway to its complete refactoring power.

4. Quality Attributes

The most important quality attributes are

  • correctness - a refactoring should never alter the behaviour of the refactored code

  • completeness - no reference to a symbol should ever be missed

  • performance - a refactoring should be sufficiently quick so the user keeps focus on the task at hand

5. Constraints

TBD.

6. Principles

6.1. Reference Database and Parsing

The reference database holds only externally visible identifiers, to ensure that references to an identifier can be found across all files of the source.

All symbols that are visible only inside a single unit are handled by reparsing the file of interest.

This describes the semantics of the persisted snapshot (.cx files), not the reference table in memory. The in-memory reference table holds all symbols encountered during parsing, including file-local ones. Only externally visible symbols are persisted to the snapshot because file-local symbols can always be reconstructed by reparsing. As the architecture moves toward "memory as truth" (see Roadmap), the distinction between "persisted" and "in-memory" symbols may evolve.

6.2. Terminology

The following terms are used throughout the documentation and codebase. Preferred terms are listed alphabetically; terms to avoid are noted to reduce ambiguity.

Cold start

Server startup with no persisted snapshot available. All compilation units must be fully parsed to populate the reference table. Same code path as warm start, just more work.

Compilation unit (CU)

A source file that is directly compiled (.c, .y). Discovered by globbing the project directory. Distinguished from header files, which are included transitively and not compiled independently.

Include structure

The graph of #include relationships between files, represented in the reference table as TypeCppInclude references. Cheap to build (text scanning for #include lines). Separate from symbol references, which require full parsing.

Initialization

The first-request setup: discover project, load options, interrogate compiler, restore snapshot, scan project structure. Happens once per session. Not the same as cold start — initialization happens on every session, cold start describes the absence of a snapshot.

Lightweight scan

Discovery of project structure by globbing for compilation units and text-scanning #include lines, without full parsing. Populates the file table and include structure. Replaces -create and the callXref() pre-refactoring pattern (ADR 22).

Persisted snapshot

The .cx files on disk. A point-in-time copy of the reference table from a previous session, loaded at startup to avoid a full parse. May be stale — reconciled against the filesystem via mtime comparison.
Avoid "disk db", "cache", "reference database" when referring to the .cx files. A snapshot is not queryable (no search operations on disk), has no invalidation semantics (unlike a cache), and is not the source of truth (unlike a database).

Reference table

The authoritative in-memory state during a running session. Comprises the referenceableItemTable (symbol references), the file table (file entries with modification tracking), and TypeCppInclude references (include structure). Populated by parsing, snapshot restoration, and lightweight scanning.
Avoid "in-memory db", "memory db". It is the live working set, not a database in the traditional sense.

Restoring

Loading a persisted snapshot into the reference table at startup. After restoration, mtime comparison determines which entries are fresh and which need reparsing.
Avoid "loading the database".

Saving / persisting

Writing the current reference table state to .cx files. Only disk-file-derived references are persisted; references from unsaved editor buffers are excluded so that mtime-based validation remains correct at next startup.
Avoid "writing the database".

Staleness

A file is stale when its content has changed since it was last parsed. Detected by comparing lastParsedMtime against the current file modification time (from disk or editor buffer). Stale files trigger reparsing during the entry refresh (ADR 20).
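The mtime comparison can be sketched as follows, with a hypothetical FileItem layout; the real structure in c-xrefactory differs:

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical FileItem fields for illustration; the real
   structure in c-xrefactory has more members. */
typedef struct {
    const char *fileName;
    time_t lastParsedMtime;
} FileItem;

/* A file is stale when its on-disk modification time is newer than
   the time recorded at its last parse. Sketch of the mtime
   comparison described above (disk case only; an editor buffer
   would supply its own modification time instead). */
static bool fileIsStale(const FileItem *item) {
    struct stat st;
    if (stat(item->fileName, &st) != 0)
        return true;              /* unreadable: treat as needing a reparse */
    return st.st_mtime > item->lastParsedMtime;
}
```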

Steady state

The server is initialized and processing requests. Staleness detection and incremental reparsing happen per-request. The reference table is authoritative.
Avoid "hot start" — ambiguous.

Symbol references

The detailed reference information (positions, usage types) for identifiers, created by full parsing. Expensive to produce. Distinguished from include structure, which is cheap.

Entry refresh

The per-request mechanism that ensures the reference table is up-to-date before executing an operation. Covers both staleness (content was known but is now outdated) and unknown content (file discovered by scan but never parsed). Uses the include structure to determine what needs reparsing. Runs on every request in callServer().
See Server Mode Flow in Code for implementation details.

Warm start

Server startup with an existing persisted snapshot. Most compilation units are fresh (snapshot mtime matches disk mtime); only stale ones need reparsing. Same code path as cold start, less work.

7. Software Architecture

7.1. Container View

[Diagram: ContainerView]

7.2. Containers

At this point the description of the internal structure of the containers is tentative. The actual interfaces are not particularly clean; most code files can and do include pretty much every other module.

Work is still ongoing to identify modules/components, which are not always directly mapped to source files, but based on higher-level responsibilities.

7.2.1. CxrefProgram

[Diagram: CxrefProgram]

cxrefProgram is the core container. It does all the work when it comes to finding and reporting references to symbols and handling refactoring requests, as well as storing reference information for long-term storage and caching.

Although c-xref can be used as a command line tool, which can be handy when debugging or exploring, it is normally used in "server" mode. In server mode the communication between the editor extension and the cxrefProgram container is a back-and-forth communication using a non-standard protocol over standard pipes.

The responsibilities of cxrefProgram can largely be divided into

  • parsing source files to create and maintain the references database, which stores all inter-module references

  • parsing source files to get important information, such as the positions of a function's beginning and end

  • managing editor buffer state (as it might differ from the file on disk)

  • performing symbol navigation

  • creating and serving completion suggestions

  • performing refactorings such as renames, extracts and parameter manipulation

At this point it seems like refactorings are performed as separate invocations of c-xref rather than through the server interface.

7.2.2. EditorExtension

[Diagram: EditorExtension]

The EditorExtension container is responsible for plugging into an editor of choice and for handling the user interface, buffer management and execution of the refactoring edit operations.

Currently only one such extension is supported, for Emacs. There also existed code for a jEdit extension, still available in the repo history, but it hasn't been updated, modified or checked for a long time and is no longer part of this project.

There is a proof of concept implementation of a rudimentary LSP adapter which would make it possible to use c-xrefactory from a wide range of editors and IDEs, at least for many operations.

7.2.3. ReferencesDB

The references database stores cross-referencing information for symbols visible outside the module they are defined in. Information about local/static symbols is not stored, but is gathered by parsing that particular source file on demand.

Currently this information is stored in a somewhat cryptic, optimized text format.

This storage can be divided into multiple files, probably for faster access. Symbols are then hashed to determine which of the "database" files they are stored in. As all cross-referencing information for a symbol is stored in the same "record", this allows reading only a single file when a symbol is looked up.
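The hashing scheme can be sketched like this; the concrete hash function (djb2 here) and the file count are assumptions, not the actual c-xrefactory code:

```c
/* Sketch of hashed reference-file selection using a generic string
   hash (djb2); the real c-xrefactory hash and file count differ.
   The point: a symbol's name alone determines which reference file
   holds all of its cross-reference records, so a lookup touches
   exactly one file. */
static unsigned hashSymbolName(const char *name) {
    unsigned hash = 5381;
    while (*name != '\0')
        hash = hash * 33 + (unsigned char)*name++;
    return hash;
}

static unsigned referenceFileNumberFor(const char *symbol, unsigned fileCount) {
    return hashSymbolName(symbol) % fileCount;
}
```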

8. Components

This chapter describes the components of the cxrefProgram container, as defined in the C4 component diagram. Each section documents a component’s responsibilities, architecture, and interface.

Some components have clear boundaries and well-defined interfaces; others are still being separated from the legacy monolithic structure. The descriptions reflect the current state — what each component actually does today.

8.1. Parsing

8.1.1. Responsibilities

Parse C and Yacc source files, producing symbol references and semantic information for the reference database and for feature-specific operations (completion, extraction, refactoring).

8.1.2. Internal Structure

The parsing component consists of several internal modules:

  • Lexer and integrated preprocessor (lexer.c, yylex.c) — transforms source text into lexem sequences, handles C preprocessor directives (macro definition and expansion, conditional compilation, #include processing), and manages include file contexts by pushing and popping read states

  • Grammar parsers — three yacc-generated parsers: C (c_parser.y), Yacc (yacc_parser.y), and C preprocessor expressions (cppexp_parser.y)

  • Semantic actions — modules that hook into grammar rules during parsing:

    • semact.c — core semantic actions: symbol tables, type checking, reference creation

    • extract.c — feature semantic actions for extract refactoring (if PARSE_TO_EXTRACT)

    • complete.c — feature semantic actions for completion (if PARSE_TO_COMPLETE)

  • Dispatch layer (parsers.c) — selects the parser based on file language

  • Configuration and orchestration (parsing.c, parsing.h) — sets up parsing state and provides the external entry points

The integrated preprocessor is a key architectural choice: by implementing its own preprocessor rather than using the system’s, c-xrefactory can navigate to macro definitions, show macro usage, refactor macro names, and complete macro identifiers. The trade-off is imperfect compatibility with all compiler-specific preprocessor extensions.

8.1.3. Parser Operations

The parser’s behavior is configured through ParserOperation, decoupling parsing from server-level concerns:

  • PARSE_TO_CREATE_REFERENCES - standard parse: create symbol references in the in-memory reference table

  • PARSE_TO_COMPLETE - build completion candidates at cursor position

  • PARSE_TO_EXTRACT - track blocks, variables, and control flow for extract refactoring

  • PARSE_TO_GET_FUNCTION_BOUNDS - record function start/end positions

  • PARSE_TO_VALIDATE_MOVE_TARGET - check if a position is a valid move-function target

  • PARSE_TO_TRACK_PARAMETERS - track parameter positions for argument manipulation

The server maps its ServerOperation to a ParserOperation via getParserOperation(), so the parser never needs to know about server-level operation enums.
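This mapping can be sketched as a plain switch. The ServerOperation member names below are hypothetical; only the ParserOperation names appear in this document:

```c
/* Hypothetical ServerOperation members for illustration; the real
   enum in c-xrefactory has different, and more, members. */
typedef enum {
    SERVER_COMPLETE,
    SERVER_EXTRACT,
    SERVER_GOTO_DEFINITION
} ServerOperation;

typedef enum {
    PARSE_TO_CREATE_REFERENCES,
    PARSE_TO_COMPLETE,
    PARSE_TO_EXTRACT
} ParserOperation;

/* The mapping layer: the parser sees only a ParserOperation and
   never needs to know about server-level operation enums. */
static ParserOperation getParserOperation(ServerOperation op) {
    switch (op) {
    case SERVER_COMPLETE: return PARSE_TO_COMPLETE;
    case SERVER_EXTRACT:  return PARSE_TO_EXTRACT;
    default:              return PARSE_TO_CREATE_REFERENCES;
    }
}
```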

8.1.4. Parser Generation

The parsers are generated using a patched Berkeley yacc (byacc-1.9). The patch modifies the skeleton to support error recovery and a recursive parsing feature (originally for Java). CPP macros rename parser data structures so that multiple parsers can coexist in the same executable. The Makefile generates and renames the parser output files.

8.1.5. Interface

Key entry points (see parsing.h):

// Parse a file and create references in the in-memory table
void parseToCreateReferences(const char *fileName);

// Configure parsing for a specific file (sets language, includes, etc.)
void setupParsingConfig(int fileNumber);

// Dispatch to the appropriate parser
void callParser(int fileNumber, Language language);

parseToCreateReferences() is the clean entry point used by LSP mode, the entry-point reparse loop, and navigation refresh. It takes a filename, determines the language, sets up configuration, and parses.

callParser() is the lower-level dispatch used by server mode’s parseInputFile(), where configuration is set up separately with cursor position and operation-specific state.

8.2. Xref

8.2.1. Responsibilities

Build and update the on-disk cross-reference database (.cx files) by parsing all scheduled project files. The xref component is the batch counterpart to the interactive server: where the server processes one file per editor request, xref processes all project files in a single invocation.

8.2.2. Operations

  • Create (-create): Parse every project file from scratch, generate the full .cx database

  • Fast update (-update): Re-parse only files whose modification time has changed

  • Full update (-update -fullupdate): Re-parse modified files and their include closure — all compilation units that transitively include a modified header (makeIncludeClosureOfFilesToUpdate())

8.2.3. Memory Overflow Handling

When the in-memory reference table overflows mid-parse, xref flushes the accumulated references to disk via saveReferences(), recovers memory, and continues parsing the remaining files. This setjmp/longjmp-based overflow mechanism allows xref to handle projects larger than available memory.
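The overflow-recovery pattern can be sketched as follows, with hypothetical names; the real xref loop and flush logic are more involved:

```c
#include <setjmp.h>
#include <stdbool.h>

/* Sketch of the setjmp/longjmp overflow recovery described above.
   When the reference table overflows mid-parse, control jumps back
   into the file loop, which flushes references, recovers memory,
   and retries the same file. All names here are illustrative. */
static jmp_buf memoryOverflowPoint;

static int flushCount = 0;            /* instrumentation for the sketch */
static bool tableNearlyFull = false;  /* stands in for real overflow detection */

static void reportMemoryOverflow(void) {
    longjmp(memoryOverflowPoint, 1);  /* unwinds back into parseAllFiles() */
}

static void saveReferencesAndRecoverMemory(void) {
    flushCount++;                     /* a real flush would write .cx data */
    tableNearlyFull = false;
}

static void parseOneFile(int fileNumber) {
    (void)fileNumber;
    if (tableNearlyFull)
        reportMemoryOverflow();       /* simulated mid-parse overflow */
}

static void parseAllFiles(int fileCount) {
    /* 'i' is volatile so its value is reliable after longjmp */
    for (volatile int i = 0; i < fileCount; i++) {
        if (setjmp(memoryOverflowPoint) != 0)
            saveReferencesAndRecoverMemory();  /* flush, then retry file i */
        parseOneFile(i);
    }
}
```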

8.2.4. Interface

void xref(ArgumentsVector args);                    // Top-level entry: open output, load buffers, call callXref, save
void callXref(ArgumentsVector args, XrefConfig *config);  // Core loop: schedule files, parse all, save references

callXref() is also called by the refactory to re-index files after refactoring operations.

8.3. Server

8.3.1. Responsibilities

Serve the editor extension by processing requests in an infinite loop. Each request is a set of command-line-style options received over stdin. The server:

  • Dispatches operations (navigation, completion, refactoring support, project management)

  • Ensures all operations see fresh in-memory references by reparsing stale files at the entry point

  • Manages the browsing session stack for navigation operations

  • Coordinates with the parser subsystem to process input files

8.3.2. Request Lifecycle

The server runs as a long-lived process started by the editor (c-xref -server). Communication is over stdin/stdout using a text protocol where commands look like command-line options.

server() — infinite request loop [server.c]
  │
  └─> FOR EACH REQUEST:
       ├─> Read options from pipe
       ├─> initServer() — process options, schedule input file
       ├─> callServer() — main dispatch
       │    ├─> loadAllOpenedEditorBuffers()
       │    ├─> Reparse stale preloaded files (Pass 1: CUs, Pass 2: header includers)
       │    ├─> prepareInputFileForRequest()
       │    ├─> FIRST REQUEST (GetProject):
       │    │    ├─ initializeProjectContext()
       │    │    ├─ processFileArguments() — discover CUs by globbing project dir
       │    │    ├─ loadFileNumbersFromStore() — load .cx snapshot into file table
       │    │    └─ parseDiscoveredCompilationUnits() — parse stale CUs, skip fresh
       │    ├─> Dispatch based on operation:
       │    │    ├─ Operations needing input file → processFile() → parse
       │    │    └─ Other operations (filter, pop, etc.)
       │    └─> Navigation operations use browsing stack
       ├─> answerEditorAction() — send response to editor
       └─> Cleanup (close buffers, close output)

8.3.3. Entry-Point Reparse (ADR 20)

Before dispatching any operation, the server reparses stale preloaded files so that all operations see fresh in-memory references. This separates the concern of "keeping data fresh" from individual operation logic.

The entry-point reparse loop:

  1. Iterates all editor buffers (files preloaded from the editor)

  2. For each stale compilation unit (.c, .y — determined by isCompilationUnit()): reparses the file, updates lastParsedMtime, and sets needsBrowsingStackRefresh on the FileItem

  3. For each stale header: walks the reverse-include graph (via TypeCppInclude references) to find compilation units that transitively include it, and reparses those CUs

  4. Sets options.cursorOffset = -1 during reparse to prevent the lexer from triggering on-line action handling
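The reverse-include walk in step 3 can be sketched with a hypothetical adjacency structure, where includedBy maps a file to the files that directly include it:

```c
#include <stdbool.h>

/* Sketch of the reverse-include walk. Starting from a modified
   header, every transitive includer is marked; the compilation
   units among the marked files would then be scheduled for
   reparsing. The fixed-size arrays are illustrative only; the real
   code walks TypeCppInclude references. */
#define MAX_FILES 64

static int includedBy[MAX_FILES][MAX_FILES];  /* includedBy[f] = direct includers of f */
static int includedByCount[MAX_FILES];

static void markTransitiveIncluders(int file, bool marked[]) {
    if (marked[file])
        return;               /* already visited; guards against include cycles */
    marked[file] = true;
    for (int i = 0; i < includedByCount[file]; i++)
        markTransitiveIncluders(includedBy[file][i], marked);
}
```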

The needsBrowsingStackRefresh flag bridges the entry-point (which handles parsing) and the navigation module (which handles browsing stack updates). Navigation checks this flag instead of fileNumberIsStale() and calls updateSessionReferencesForFile() to update only the browsing stack without re-parsing.

After a browsing stack refresh, the server restores its position by finding the nearest reference to where it was, then advances (NEXT) or retreats (PREVIOUS) in list order. This respects the definition-first navigation ordering without assuming any particular sort order.

8.3.4. File Discovery and the File Table

The server can only reparse and navigate files it knows about. "Knowing" a file means it has an entry in the file table — a file number and metadata (FileItem). But being known does not mean being parsed: a file in the table may have no references in the referenceableItemTable yet. This distinction matters because the entry-point reparse loop, the reverse-include graph walk, and navigation all operate only on known files.

Files enter the file table through three mechanisms:

Disk database (loadFileNumbersFromStore()). On the first request that triggers project initialization, the server loads file numbers from the .cx database. This is why -create was historically required before any interactive use: it populated the persistent database that seeds the file table. Without it, the server would only know about the single file in the current request.

Editor preloads. When the editor sends -preload <file> <tmpfile>, the file enters the file table via the editor buffer mechanism. Files created during a session — new source files the user opens — are discovered this way without any explicit rescan.

Project directory glob (processFileArguments()). At startup, after project context initialization, the server walks options.inputFiles (typically ., the project root) recursively, filtering by known source suffixes (.c, .y, etc.) and honoring -prune paths. This discovers all compilation units in the project directory without requiring a prior -create invocation.

The three mechanisms are complementary: the disk database provides historical knowledge, preloads provide real-time updates during the session, and the project directory glob provides a fresh scan of what actually exists on disk at startup.
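The suffix filtering used by the startup glob can be sketched as follows; the real suffix list is configurable, and the actual isCompilationUnit() in the source may differ:

```c
#include <stdbool.h>
#include <string.h>

/* Sketch of the suffix filter applied during the startup scan:
   only files with known compilation-unit suffixes enter the file
   table as CUs. Illustrative, not the real source. */
static bool hasSuffix(const char *name, const char *suffix) {
    size_t n = strlen(name), s = strlen(suffix);
    return n >= s && strcmp(name + n - s, suffix) == 0;
}

static bool isCompilationUnitName(const char *name) {
    return hasSuffix(name, ".c") || hasSuffix(name, ".y");
}
```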

Design boundary: Files that appear on disk but are not in the file table — because they weren’t in the disk database, weren’t opened in the editor, and weren’t present at startup — are invisible to the server until the next session. The startup glob combined with editor preloads covers the primary use cases: all project files are discovered at startup, and new files created during the session enter through the editor.

8.3.5. Operation Classification

Operations are classified by what they need:

  • Needs reference database (needsReferenceDatabase): navigation, refactoring support, unused symbol detection — these push a browsing session

  • Requires input file processing (requiresProcessingInputFile): completion, search, extraction, and all reference-database operations — these parse the request’s input file

  • Neither: project management, filter changes, browsing stack manipulation — these operate on existing session state

8.3.6. Interface

void server(ArgumentsVector args);       // Infinite request loop
void callServer(ArgumentsVector baseArgs, ArgumentsVector requestArgs);  // Single request dispatch
void initServer(ArgumentsVector args);   // Per-request initialization

8.4. LSP Adapter

8.4.1. Responsibilities

Implement the Language Server Protocol interface, allowing LSP-capable editors (VS Code, Emacs lsp-mode, etc.) to use c-xrefactory’s parsing and navigation capabilities. Currently at proof-of-concept stage with partial textDocument/definition support.

8.4.2. Internal Structure

The LSP adapter is a self-contained subsystem with clear internal layering:

  • Message loop (lsp.c) — reads LSP framed messages (Content-Length headers + JSON body) from stdin, delegates to the dispatcher, and runs until shutdown/exit

  • Dispatcher (lsp_dispatcher.c) — maps LSP method strings ("textDocument/definition", "initialize", etc.) to handler functions via a static dispatch table

  • Handlers (lsp_handler.c) — implement individual LSP requests and notifications: initialize (sets up file table, editor buffers, parsing subsystem, and reference database), textDocument/didOpen (loads file content and parses it), textDocument/definition (delegates to adapter), shutdown/exit (cleanup)

  • Adapter (lsp_adapter.c) — bridges LSP concepts to c-xrefactory internals. findDefinition() converts LSP URI and position to an internal Position, queries the ReferenceDatabase, and returns an LSP location JSON object

  • Sender (lsp_sender.c) — formats and sends JSON responses with Content-Length framing

  • Utilities (lsp_utils.c) — coordinate conversions: URI to file path, LSP line/character to byte offset and back
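Reading one framed message can be sketched as follows. This is an illustrative reader with minimal error handling, not the actual lsp.c code:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sketch of LSP base-protocol framing: parse the Content-Length
   header, skip the blank line that ends the headers, then read
   exactly that many bytes of JSON body. The caller frees the
   returned buffer; NULL signals a framing or read error. */
static char *readLspMessage(FILE *input) {
    char line[256];
    long contentLength = -1;
    while (fgets(line, sizeof(line), input) != NULL) {
        if (strncmp(line, "Content-Length:", 15) == 0)
            contentLength = strtol(line + 15, NULL, 10);
        else if (line[0] == '\r' || line[0] == '\n')
            break;                    /* blank line ends the header section */
    }
    if (contentLength < 0)
        return NULL;
    char *body = malloc(contentLength + 1);
    if (body == NULL
        || fread(body, 1, contentLength, input) != (size_t)contentLength) {
        free(body);
        return NULL;
    }
    body[contentLength] = '\0';
    return body;
}
```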

8.4.3. Architectural Differences from Server Mode

The LSP adapter takes a fundamentally different approach from the legacy editor server:

  • Protocol: server mode uses a custom text protocol (command-line options over a pipe); LSP mode uses standard LSP (JSON-RPC over Content-Length framing)

  • Initialization: server mode relies on pre-existing .cx database files; LSP mode builds the in-memory reference database from scratch via parsing

  • File handling: in server mode the editor sends preloads and the server uses file table scheduling; in LSP mode textDocument/didOpen loads content and parses immediately

  • Entry point: server mode goes through parseInputFile() via server dispatch; LSP mode uses parseToCreateReferences(), the clean parsing entry point

8.4.4. Current Limitations

The LSP adapter is a proof-of-concept. Key gaps:

  • Only textDocument/definition partially works — and only for files that have been opened (parsed) in the current session

  • No incremental updates: modifying a file after opening does not re-parse

  • No project-wide indexing: symbols from unopened files are invisible

  • The ReferenceDatabase abstraction is minimal and separate from the legacy in-memory reference table

8.4.5. Interface

// Top-level entry: detect -lsp flag and run the LSP message loop
bool want_lsp_server(ArgumentsVector args);
int lsp_server(FILE *input);

// Adapter: bridge LSP requests to c-xrefactory operations
JSON *findDefinition(const char *uri, JSON *position);

8.5. Refactory

8.5.1. Responsibilities

Coordinate refactoring operations: rename, move, extract, and argument manipulation. The refactory component receives a refactoring request, uses the parser and reference database to analyze the code, performs safety checks, and produces a sequence of edits for the editor to apply.

8.5.2. Invocation Model

Refactoring operations run as a separate c-xref invocation, not through the long-lived editor server. The editor starts a new c-xref process with -refactory and the specific refactoring flag (e.g. -rfct-rename, -rfct-extract-function). This separate process communicates results back via the protocol and exits when done.

This design means the refactoring process has its own option state (refactoringOptions), separate from the server’s. The check options.refactoringRegime == RegimeRefactory gates refactoring-specific code paths.

8.5.3. Operation Flow

A typical refactoring (e.g. rename):

  1. Editor invokes c-xref -refactory -rfct-rename -renameto=NEW_NAME -olcursor=POSITION FILE

  2. Refactory parses the file to identify the symbol at the cursor

  3. Safety checks verify the rename is valid (no collisions, scope analysis)

  4. For each occurrence: sends <goto> + <precheck> to verify the editor’s file matches

  5. Sends <replacement> instructions for each occurrence

  6. Editor applies the edits

8.5.4. Safety Checks

Before applying a refactoring, the refactory performs safety checks (OP_INTERNAL_SAFETY_CHECK). These use the reference database to verify that the transformation is semantically valid — for example, that a rename won’t collide with an existing symbol in scope.

8.6. Extract

8.6.1. Responsibilities

Analyze control flow for extract-function/macro/variable refactoring. The extract component operates in two phases: collection during parsing (registers synthetic labels, gathers references) and analysis after parsing (classifies variables, generates output).

8.6.2. Operation

Extraction uses a specialized parse operation (PARSE_TO_EXTRACT) that tracks:

  • Block structure and nesting

  • Variable definitions and uses within the selection

  • Control flow (return, break, continue) crossing the extraction boundary

After parsing, it classifies variables as inputs (passed as parameters), outputs (returned), or local (moved into the extracted function), and generates the function signature, call site, and body.

8.7. Cxref

8.7.1. Responsibilities

Manage the in-memory symbol reference tables and the browsing session stack. The cxref component is the runtime engine for symbol lookup: it loads references from the on-disk database (via cxfile), merges them with freshly parsed references, and provides the data structures that navigation and refactoring operations query.

The component boundary is not yet clean. Parts of this functionality are spread across cxref.c, session.c, navigation.c, and referenceableitemtable.c. The description below reflects the logical responsibilities, not a single module.

8.7.2. Architecture Overview

c-xrefactory’s core functionality relies on a symbol database that stores cross-references, definitions, and usage information for all symbols in a project. The database has two forms:

  • On-disk (.cx files) — persistent, hash-partitioned symbol records managed by the Cxfile component

  • In-memory (reference tables) — runtime tables populated by parsing and by loading from .cx files

8.7.3. Key Data Structures

Browsing Stack (sessionData.browsingStack)

The browsing stack is the runtime data structure for symbol navigation. Each push operation (triggered by a navigation request) creates a new session entry containing the references for the symbol under the cursor.

Referenceable Items

A ReferenceableItem represents a symbol (function, variable, macro, type, etc.) with its attributes: link name, type, storage class, scope, and visibility. Each referenceable item has a linked list of Reference entries recording every usage position.

References

A Reference records a single occurrence of a symbol: file, line, column, and usage kind (defined, declared, used, etc.). References are the fundamental unit that navigation, unused-symbol detection, and refactoring operate on.

8.7.4. Symbol Resolution Flow

When the user requests "go to definition" for a symbol:

  1. Parse the current file to identify the symbol at cursor position

  2. Load symbol data from .cx files (via cxfile) into the browsing stack

  3. Merge with any in-memory references from recently parsed files

  4. Order references by usage priority (definition > declaration > usage)

  5. Navigate to the best definition position

8.7.5. Database Operations

  • Create (-create): Parse all project files, generate reference items, write to .cx files

  • Update (-update): Re-parse modified files (with include-closure expansion for full updates), merge into existing database

  • Query (server operations): Load symbol data from .cx files into the browsing stack for navigation

8.8. Main

8.9. Memory

8.9.1. Responsibilities

The Memory module provides arena-based allocation for performance-critical and request-scoped operations:

  • Fast allocation for macro expansion and lexical analysis

  • Bulk deallocation for request-scoped cleanup

  • Multiple specialized arenas for different data lifetimes

  • Overflow detection and optional dynamic resizing

8.9.2. Design Rationale

Historical Context

In the 1990s when c-xrefactory originated, memory was scarce. The design had to:

  1. Minimize allocation overhead (no malloc/free per token)

  2. Support large projects despite limited RAM

  3. Allow overflow recovery via flushing and reuse

  4. Enable efficient bulk cleanup

Most memory arenas use statically allocated areas. Only cxMemory supports dynamic resizing to handle out-of-memory situations by discarding, flushing and reusing memory. This forced the implementation of a complex caching strategy, since overflow could happen mid-file.

Modern Benefits

Even with abundant modern memory, arena allocators provide:

  • Performance: Bump pointer allocation is ~10x faster than malloc

  • Cache locality: Related data allocated contiguously

  • Automatic cleanup: Bulk deallocation prevents leaks

  • Request scoping: Natural fit for parsing/expansion operations

8.9.3. Arena Types and Lifetimes

  • cxMemory — symbol database, reference tables, cross-reference data. Lifetime: file or session.

  • ppmMemory — preprocessor macro expansion buffers (temporary allocations). Lifetime: per macro expansion.

  • macroBodyMemory — macro definition storage. Lifetime: session.

  • macroArgumentsMemory — macro argument expansion. Lifetime: per macro invocation.

  • fileTableMemory — file metadata and paths. Lifetime: session.

  • optMemory — command-line and config option strings (with special pointer adjustment). Lifetime: session.

8.9.4. Key Design Patterns

Marker-Based Cleanup

Functions save a marker before temporary allocations:

char *marker = ppmAllocc(0);   // Save current index
// ... temporary allocations ...
ppmFreeUntil(marker);          // Bulk cleanup

Buffer Growth Pattern

Long-lived buffers that may need to grow:

// Allocate initial buffer
bufferDesc.buffer = ppmAllocc(initialSize);

// ... use buffer, may need growth ...

// Free temporaries FIRST
ppmFreeUntil(marker);

// NOW buffer can grow (it's at top-of-stack)
expandPreprocessorBufferIfOverflow(&bufferDesc, writePointer);

Overflow Handling

The cxMemory arena supports dynamic resizing:

bool cxMemoryOverflowHandler(int n) {
    // Attempt to resize arena
    // Return true if successful
}

memoryInit(&cxMemory, "cxMemory", cxMemoryOverflowHandler, initialSize);

When overflow occurs, handler can:

  1. Resize the arena (if within limits)

  2. Flush old data and reset

  3. Signal failure (fatal error)

8.9.5. Interface

Key functions (see memory.h):

// Initialization
void memoryInit(Memory *memory, char *name,
                bool (*overflowHandler)(int n), int size);

// Allocation
void *memoryAlloc(Memory *memory, size_t size);
void *memoryAllocc(Memory *memory, int count, size_t size);

// Reallocation (only for most recent allocation)
void *memoryRealloc(Memory *memory, void *pointer,
                    size_t oldSize, size_t newSize);

// Bulk deallocation
size_t memoryFreeUntil(Memory *memory, void *marker);

// Guards
bool memoryIsAtTop(Memory *memory, void *pointer, size_t size);

8.9.6. Common Pitfalls

See the "Arena Allocator Lifetime Violations" section in the Development Environment chapter for:

  • Attempting to resize buffers not at top-of-stack

  • Calling FreeUntil() too late

  • Mixing arena lifetimes

8.9.7. Future Directions

Modern systems have abundant virtual memory. Possible improvements:

  1. Simplify overflow handling - Allocate larger initial arenas

  2. Separate lifetime management - Don’t mix temporary and long-lived allocations

  3. Consider alternatives - Linear allocators for some use cases

  4. Add debug modes - Track allocation patterns and detect violations

The experimental FlushableMemory type explores some of these ideas but has not yet replaced the current implementation.

8.10. Content Buffers (EditorBuffer)

8.10.1. Responsibilities

Content buffers provide an in-memory cache of file content that transparently overrides disk file reading during parsing. When a content buffer exists for a file, the parser uses its in-memory content instead of reading from disk.

8.10.2. Three Roles

Content buffers (the EditorBuffer struct) serve three distinct roles:

  • Editor content — preLoadedFromFile: non-NULL (tmp file path). The client writes the unsaved buffer to a tmp file; the server reads the tmp file. The content represents the editor’s in-memory state, which may differ from the original file on disk.

  • Disk file cache — preLoadedFromFile: NULL. Created on demand by findOrCreateAndLoadEditorBufferForFile() when the lexer encounters an #include or other code needs file content. Avoids re-reading the same file from disk during a single parse or across multiple parses.

  • LSP document state — preLoadedFromFile: NULL. Content arrives in a textDocument/didOpen JSON message. No tmp file, no disk read. Lives from didOpen to didClose.

8.10.3. How Parsing Uses Buffers

The lexer reads from a character buffer (currentFile.characterBuffer), which is the common layer regardless of content source. Where the character buffer gets its data depends on whether a content buffer exists:

  • Content buffer path: initInputFromEditorBuffer() points the character buffer directly into the content buffer’s text memory. All content is available immediately — no file I/O.

  • File path: initInputFromFile() uses currentFile.characterBuffer.chars — a separate fixed-size buffer. Data is read from the FILE handle incrementally as the lexer consumes characters.

When the lexer processes an #include, it calls findOrCreateAndLoadEditorBufferForFile() for the included file. This function:

  1. Checks if a content buffer already exists (preloaded or cached) — if so, returns it

  2. Otherwise reads the file from disk into a new content buffer

If a content buffer is found, the character buffer is set up to read from it. If not (file doesn’t exist or is a directory), the fallback openFile() path reads from disk via a FILE handle. This means preloaded editor content transparently overrides disk content — the parser doesn’t know or care where the content came from.

8.10.4. Lifetime

Buffer lifetime depends on the server mode and buffer role:

  • Server mode, editor content — currently destroyed after each request (closeAllEditorBuffers). ADR-0023 proposes making these persistent across requests.

  • Server mode, disk cache — destroyed after each request. Appropriate since different requests may parse different files.

  • Xref mode, disk cache — preserved across file processing (closeAllEditorBuffersIfClosable). A header like commons.h included by many CUs is read once and reused.

  • LSP mode, document state — persistent from didOpen to didClose. Not affected by request boundaries.

The existing bufferIsCloseable() predicate distinguishes these cases: it returns false for preloaded buffers (preserving editor content) and true for loaded, unmodified, marker-free buffers (allowing cache cleanup).

8.10.5. Staleness Detection

For preloaded buffers, staleness is detected by comparing fileItem→lastParsedMtime with buffer→modificationTime. After parsing a preloaded buffer, lastParsedMtime is set to the buffer’s modification time. If the buffer is re-sent with a new tmp file (new mtime), the file appears stale and is reparsed.

See the Server component’s Entry-Point Reparse section for how staleness drives the sync phase.

8.11. Cxfile

8.11.1. Responsibilities

Read and write the CXref database (.cx files) in a compact text format. Cxfile is the persistence layer: it serializes symbol references to disk after parsing and loads them back during navigation and refactoring operations.

8.11.2. Database Structure

The database uses a hash-partitioned file structure:

cxrefs/
├── files        # File metadata and paths
├── 0000         # Symbol data for hash bucket 0
├── 0001         # Symbol data for hash bucket 1
└── ...          # Additional hash buckets (count set by -refnum)

All information about a symbol is stored in exactly one file, determined by hashing its link name. This means a single file read suffices to look up any symbol.

8.11.3. File Format

Records use the general format <number><key>[<value>]. The encoding uses single-character markers listed at the top of cxfile.c.

A record starts with a number followed by a character key: 4l means line 4, 23c means column 23. References are optimized to avoid repeating fields that haven’t changed — so 15l3cr7cr means two references on line 15, one at column 3 and the other at column 7 (references use the fsulc fields: file, symbol index, usage, line, column).

Some fields carry a length prefix: filenames use <length>:<path> (e.g. 84:/home/…​/file.c), version strings use <length>v.

Example file information line:

32571f  1715027668m 21:/usr/include/ctype.h

  • 32571f — file number 32571

  • 1715027668m — modification time (to detect stale entries)

  • 21:/usr/include/ctype.h — filename (21 characters)

8.11.4. Reading

Reading is controlled by scanFunctionTable arrays. Each entry maps a record key to a handler function. As the reader encounters a key in the file, it looks up the handler and calls it. This table-driven approach allows different consumers to register for different record types — for example, loading only symbol names vs. loading full reference lists.

8.12. Editor Extension

The Emacs editor extension is implemented in these components/files:

  • c-xref.el

  • c-xrefactory.el

  • c-xrefprotocol.el (auto-generated)

8.12.1. Responsibilities

The Emacs client extension provides the user-facing interface for navigation, refactoring, and completion. It starts the c-xref process on the first user interaction and communicates with the server process over stdin/stdout using a text protocol.

8.12.2. Preloading

To give the server access to unsaved content, the client sends -preload <file> <tmpfile> for each modified editor buffer. On PUSH, NEXT, and PREVIOUS, all modified buffers (including the current one if modified) are preloaded. Unmodified buffers are not preloaded — the server reads the disk file directly.

The preload mechanism writes the buffer content to a temporary file and passes both the original filename and the temp file path. The server creates an EditorBuffer with the content and the temp file’s modification time. After each request, all editor buffers are closed (closeAllEditorBuffers), so preloads are re-sent on every request.

Implementation Notes

This chapter collects cross-cutting implementation details that don’t belong to a single component: editor-server protocol, file processing orchestration, multi-pass configuration, and other observations about how the subsystems interact.

8.13. Commands

The editorExtension calls the server using command line options. These are first converted to a command enum whose values start with OP ("operation") or AVR ("available refactoring").

Sometimes the server needs to call the cross-referencer, which is done in the same manner, using command line options. This call is internal, so the wanted arguments are stored in a vector which is passed to xref() in the same way that main() passes the actual argc/argv.

Many of the commands require extra arguments, such as positions/markers, which are passed as additional options. For example, a rename requires the new name, which is sent in the -renameto= option, parsed by the argument handling and stored in the option structure.

Some of these extra arguments are fairly random, like -olcxparnum= and -olcxparnum2=. This should be cleaned up.

A move towards "events" with arguments would be helpful. This would mean that we need to:

  • List all "events" that c-xref need to handle

  • Define the parameters/extra info that each of them need

  • Clean up the command line options to follow this

  • Create event structures to match each event and pass this to server, xref and refactory

  • Rebuild the main command loop to parse command line options into event structures

8.14. Passes

c-xrefactory makes it possible to parse the analyzed sources in multiple passes, in case you compile the project sources with different C defines. In the project configuration file you specify -passN followed by the settings, typically C preprocessor defines, that are to be applied for that pass over the sources.
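For example, a project that compiles the same sources for two targets might configure two passes. This fragment is purely illustrative of the -passN convention; the define names are invented:

```
-pass1 -DUNIX
-pass2 -DWIN32
```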

8.15. File Processing Orchestration

The file processing architecture differs significantly between Server mode, Xref mode, and LSP mode, with confusing global state and naming inconsistencies that make the code hard to follow.

The current architecture is being restructured toward a unified server flow where lightweight file structure scanning replaces the legacy -create/callXref() pattern. See Target: Unified Server Flow (ADR 22) below for the target design. Annotations marked [TARGET] indicate planned changes.

8.15.1. File Scheduling - How Files Get Marked for Processing

All modes begin by marking files for processing using the isScheduled flag in the file table.

Initial Scheduling (All Modes)

Called from:

  • Server: initServer() → processFileArguments() [server.c:151]

  • Xref: mainTaskEntryInitialisations() → processFileArguments() [startup.c:706]

Flow:

processFileArguments() [options.c:1893]
  │
  └─> FOR each file in options.inputFiles
       │
       └─> processFileArgument(filename) [options.c:1864]
            │
            └─> dirInputFile(...) [options.c:465]
                 ├─ If directory: recursively map over files
                 ├─ If file: scheduleCommandLineEnteredFileToProcess(filename)
                 │           └─ SETS: fileItem->isScheduled = true [line 450]
                 └─ If wildcard: expand and recurse

Result: All command-line files (and directory contents if -r flag) are marked with isScheduled = true.

[TARGET] processFileArguments() will be replaced by scanProjectForFilesAndIncludes(), which discovers CUs by globbing the project directory rather than relying on command-line file lists. This also builds the include graph transitively by text-scanning #include lines and resolving paths via options.includeDirs. See Target: Unified Server Flow (ADR 22).

Additional Update Scheduling (Xref Mode Only)

Called from: callXref() → scheduleModifiedFilesToUpdate() [xref.c:296]

Flow:

scheduleModifiedFilesToUpdate(isRefactoring) [xref.c:207]
  │
  ├─> mapOverFileTable(schedulingToUpdate, isRefactoring)
  │    └─ For each file: if modified, SETS: fileItem->scheduledToUpdate = true
  │
  ├─> If UPDATE_FULL: makeIncludeClosureOfFilesToUpdate()
  │    └─ Expands scheduledToUpdate to include all files that #include updated files
  │
  └─> mapOverFileTable(schedulingUpdateToProcess)
       └─ For each file: if (scheduledToUpdate && isArgument)
           SETS: fileItem->isScheduled = true [line 115]

Result: In update mode, only modified files (and their includers) get isScheduled = true.

8.15.2. Server Mode Flow

SINGLE FILE PER REQUEST - Server processes one file per request.

server() - Infinite request loop
  │
  └─> FOR EACH REQUEST:
       │
       └─> callServer(baseArgs, requestArgs, &firstPass)
            │
            ├─> loadAllOpenedEditorBuffers()
            │
            ├─> prepareInputFileForRequest()
            │    └─ SETS: requestFileNumber
            │
            ├─> IF NOT projectContextInitialized (one-time):
            │    ├─ OP_ACTIVE_PROJECT: initializeProjectContext()
            │    │   → loadFileNumbersFromStore()
            │    │   → Legacy: processFileArguments()
            │    │             + parseDiscoveredCompilationUnits()
            │    └─ SETS: projectContextInitialized = true
            │
            ├─> Config-change-aware scan (auto-detect only):
            │    ├─ IF config changed: reloadProjectConfig()
            │    │   → re-reads .c-xrefrc, updates options.includeDirs
            │    └─ IF !scanDone OR config changed:
            │         scanProjectForFilesAndIncludes()
            │         + markMissingFilesAsDeleted()
            │         SETS: scanDone = true
            │
            ├─> Entry refresh (if projectContextInitialized):
            │    ├─ Pass 1: reparse stale CUs directly
            │    ├─ Pass 2: find CUs that include stale headers
            │    │           (via TypeCppInclude refs), reparse those
            │    └─ Pass 3: find unparsed CUs that share headers with
            │               the request file, parse those
            │
            └─> Dispatch operation
                 ├─ requiresProcessingInputFile: processFile()
                 └─ other: unschedule file

The entry refresh (see Terminology in Principles) ensures the reference table is up-to-date before each operation. Pass ordering is critical: Pass 1 reparses stale CUs, which updates their TypeCppInclude references. Pass 2 then queries those freshened references to find which CUs include stale headers. Pass 3 uses the include structure from the lightweight scan to find CUs sharing headers with the request file that have never been parsed. A user cannot create a new include edge without editing the includer, so the includer is always stale when the edge is new.

The per-request dispatch still follows the legacy pattern:

            └─> processFile(baseArgs, requestArgs, &firstPass) [server.c:199]
                 ├─ SETS: inputFileName = fileItem->name [line 205]
                 │
                 └─> singlePass(args, nargs, &firstPass) [server.c:155]
                      │
                      ├─> initializeFileProcessing(args, nargs, &firstPass) [startup.c:490]
                      │    ├─ READS: fileName = inputFileName [line 502]
                      │    ├─ USES: parsingConfig.fileNumber = currentFile.characterBuffer.fileNumber [line 161]
                      │    │
                      │    └─> computeAndOpenInputFile(inputFileName) [startup.c:112]
                      │         ├─ Gets EditorBuffer or opens file
                      │         │
                      │         └─> initInput(inputFile, inputBuffer, "\n", fileName) [yylex.c]
                      │              └─ Sets up currentFile global with CharacterBuffer
                      │
                      ├─> parseInputFile() [server.c:131]
                      │    ├─ USES: currentFile.fileName [line 133]
                      │    ├─ Calls setupParsingConfig(requestFileNumber) [line 136]
                      │    │
                      │    └─> callParser(parsingConfig.fileNumber, parsingConfig.language) [line 140]
                      │
                      └─> SPECIAL CASE: Completion in macro body [lines 183-196]
                           ├─ SETS: inputFileName = getFileItemWithFileNumber(...)->name [line 189]
                           ├─> initializeFileProcessing(args, nargs, &firstPass) [again]
                           └─> parseInputFile() [again]

8.15.3. Xref Mode Flow

PROCESSES ALL SCHEDULED FILES - Xref creates a list of all scheduled files and processes them in a loop.

xref(args) [xref.c:354]
  │
  └─> callXref(args, isRefactoring) [xref.c:283]
       │
       ├─> IF options.update:
       │    └─> scheduleModifiedFilesToUpdate(isRefactoring) [line 296]
       │         └─ Adds modified files to scheduled list
       │
       ├─> fileItem = createListOfInputFileItems() [line 298]
       │    └─ Creates linked list of ALL scheduled files (sorted by directory)
       │
       └─> FOR LOOP over fileItem list [line 314]
            │
            └─> oneWholeFileProcessing(args, fileItem, &firstPass, ...) [xref.c:179]
                 ├─ SETS: inputFileName = fileItem->name [line 181]
                 │
                 └─> processInputFile(args, &firstPass, &atLeastOneProcessed) [xref.c:149]
                      │
                      ├─> initializeFileProcessing(args, nargs, &firstPass) [startup.c:490]
                      │    ├─ READS: fileName = inputFileName [line 502]
                      │    │
                      │    └─> computeAndOpenInputFile(inputFileName) [startup.c:112]
                      │         └─> initInput(inputFile, inputBuffer, "\n", fileName) [yylex.c]
                      │              └─ Sets currentFile.characterBuffer.fileNumber
                      │
                      ├─ SETS: parsingConfig.fileNumber = currentFileNumber [line 160]
                      │         (NOTE: currentFileNumber is DIFFERENT global!)
                      │
                      └─> parseToCreateReferences(inputFileName) [parsing.c:165]
                           ├─ Gets EditorBuffer using fileName parameter
                           │
                           ├─> initInput(NULL, buffer, "\n", fileName) [line 181]
                           │    └─ DUPLICATE call! Already called in initializeFileProcessing!
                           │
                           ├─ Calls setupParsingConfig(currentFile.characterBuffer.fileNumber) [line 183]
                           │
                           └─> callParser(parsingConfig.fileNumber, parsingConfig.language) [line 190]

8.15.4. LSP Mode Flow (New, Simplified)

parseToCreateReferences(fileName) [parsing.c:165]
  ├─ Takes fileName as PARAMETER (not from global!)
  ├─ Gets EditorBuffer
  │
  ├─> initInput(NULL, buffer, "\n", fileName) [line 181]
  │    └─ Sets currentFile.characterBuffer.fileNumber
  │
  ├─> setupParsingConfig(currentFile.characterBuffer.fileNumber) [line 183]
  │
  └─> callParser(parsingConfig.fileNumber, parsingConfig.language) [line 190]

8.15.5. Key Observations

Assignments to `inputFileName`

Server Mode: Set once per file processed

  • processFile() line 205 - sets the file for the current request

  • Special case: macro completion may parse a different file (line 189) to resolve symbols in unexpanded macro bodies

Xref Mode: Set ONCE per file

  • oneWholeFileProcessing() line 181

`requestFileNumber` Only Used in Server Mode

  • Set in prepareInputFileForRequest() (lines 104, 121)

  • Used in parseInputFile() line 136: setupParsingConfig(requestFileNumber)

  • NOT used in Xref mode at all

Confusion: THREE Different File Number Globals!

inputFileName           // The file name being processed
requestFileNumber       // Server: file number from scheduled file (server.c)
currentFileNumber       // Xref: file number after parsing starts (parsing.c:26)

In xref.c line 160:

parsingConfig.fileNumber = currentFileNumber;

But currentFileNumber is defined in parsing.c:26:

int currentFileNumber = -1;     /* Currently parsed file, maybe a header file */

The comment reveals the distinction: currentFileNumber can change DURING parsing when entering #include files, while requestFileNumber stays constant as "the file we were asked to process."

Double initInput() Call in Xref Mode

In Xref mode, initInput() is called TWICE for the same file:

  1. First in initializeFileProcessing() → computeAndOpenInputFile() [startup.c:128]

  2. Second in parseToCreateReferences() [parsing.c:181]

This appears to be a bug or wasteful duplication.

`initializeFileProcessing` is Heavy Orchestration

This 500-line function does five major phases:

  • Phase 1: Project discovery (find .c-xrefrc)

  • Phase 2: Options processing

  • Phase 3: Compiler interrogation (expensive! runs gcc -v)

  • Phase 4: Memory checkpointing (to skip Phase 3 for same-project files)

  • Phase 5: Finally calls computeAndOpenInputFile() → initInput()

The firstPass parameter gates Phase 4’s memory checkpoint save/restore.

[TARGET] initializeProjectContext will be eliminated — it duplicates phases 1-4 of initializeFileProcessing. With the unified flow, all initialization goes through loadProjectSettings() once at startup. The multi-project fast-path optimization (checkpoint restore when same project) becomes dead code with single-project servers (ADR 21).

Naming Inconsistency

  • inputFileName - used in both Server and Xref modes, but set in different places

  • requestFileNumber - only Server mode, represents the file from the request

  • currentFileNumber - only Xref mode(?), set by initInput() after file opened

If they represent the same concept (the file being processed), they should have parallel names.

8.15.6. Summary: Multi-File vs Single-File Processing

Server Mode - Single File Per Request

  • Scheduling: All files scheduled ONCE in initServer() → processFileArguments()

  • Processing: Each request picks ONE file via prepareInputFileForRequest()

    • Uses getNextScheduledFile() to get first scheduled file

    • FLAWED: unschedules all higher-numbered files (works because c-xref . schedules all)

    • Sets both inputFileName and requestFileNumber

  • Loop: Infinite request loop in server() - different file per request

Xref Mode - All Files Per Invocation

  • Scheduling: All files scheduled in mainTaskEntryInitialisations() → processFileArguments()

  • Additional: In update mode, scheduleModifiedFilesToUpdate() adds modified files

  • Processing: createListOfInputFileItems() creates list of ALL scheduled files

    • Loops over entire list in callXref() [line 314]

    • Each file processed via oneWholeFileProcessing()

    • Sets inputFileName for each file

    • Uses currentFileNumber (different global!) instead of requestFileNumber

  • Loop: Single invocation processes all files

LSP Mode - Single File Per Request

  • No scheduling: Takes fileName as direct parameter

  • No global state: parseToCreateReferences(fileName) - clean interface

  • Processing: Direct call, no file table lookup needed

  • Modern design: Avoids the legacy scheduling/global state complexity

8.15.7. Target: Unified Server Flow (ADR 22)

The current architecture has two initialization paths (initializeProjectContext for OP_ACTIVE_PROJECT, initializeFileProcessing for legacy per-file) and relies on the disk database as the source of truth for project structure. The target design unifies these into a single path where the in-memory database is authoritative.

server() loop:
  │
  FOR EACH REQUEST:
  │
  ├─ 1. IF NOT initialized:
  │     ├─ Find project (.c-xrefrc), read options, discover compiler
  │     │  → loadProjectSettings(): options + compiler interrogation
  │     │  → Provides include paths (options.includeDirs)
  │     │
  │     ├─ Load disk db into memory
  │     │  → loadFileNumbersFromStore(): CU entries + lastParsedMtime
  │     │  → Cache/optimization, not source of truth
  │     │
  │     └─ projectContextInitialized = true
  │
  ├─ 2. Config-change-aware scan (re-runs when .c-xrefrc changes)
  │     ├─ IF config changed: reloadProjectConfig()
  │     │  → re-reads .c-xrefrc, updates options.includeDirs + savedOptions
  │     └─ IF !scanDone OR config changed:
  │        → scanProjectForFilesAndIncludes(): glob CUs (.c, .y),
  │          text-scan #include lines, resolve paths transitively
  │          (CU dir + options.includeDirs), populate file table +
  │          TypeCppInclude refs
  │        → markMissingFilesAsDeleted(): CUs only (glob-discoverable)
  │        → scanDone = true
  │
  ├─ 3. Update what changed (same two-pass as ADR 20)
  │     ├─ Pass 1: reparse stale CUs
  │     └─ Pass 2: reparse CUs that include stale headers
  │
  ├─ 4. Execute operation
  │
  └─ 5. Repeat from 2

Key principles:

  • "Cold start" is not a separate path. Same code, different amount of work. With disk db: most CUs are fresh, few need reparsing. Without disk db: all CUs are unknown, all get parsed.

  • Include structure and symbol references are separate layers. Include structure is cheap (text scanning for #include). Symbol references are expensive (full parsing). The lightweight scan provides the first; full parsing adds the second incrementally, on demand.

  • Conditional includes are conservative. Text scanning sees all #include lines regardless of #ifdef guards — a superset of the true include graph. This matches the multi-pass philosophy and is correct: a false include edge only causes one extra reparse on staleness.

This design eliminates the need for -create (replaced by the scan) and for callXref() before refactoring (replaced by scan + incremental reparse of stale files). It also makes the server viable for LSP, where blocking on a full project parse at startup is not acceptable.
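The lightweight include scan can be illustrated with a small sketch. This is not c-xref code; extractIncludeTarget() is a hypothetical helper that shows the conservative text-scan: every #include line is taken at face value, with no #ifdef evaluation.

```c
#include <stdbool.h>
#include <string.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: extract the target of a #include line by pure
   text scanning. Conservative by design: #ifdef guards are ignored. */
static bool extractIncludeTarget(const char *line, char *target, size_t size) {
    const char *p = line;
    while (isspace((unsigned char)*p)) p++;
    if (*p != '#') return false;          /* not a preprocessor line */
    p++;
    while (isspace((unsigned char)*p)) p++;
    if (strncmp(p, "include", 7) != 0) return false;
    p += 7;
    while (isspace((unsigned char)*p)) p++;
    char close;
    if (*p == '"') close = '"';           /* #include "local.h" */
    else if (*p == '<') close = '>';      /* #include <system.h> */
    else return false;
    p++;
    size_t i = 0;
    while (*p && *p != close && i + 1 < size)
        target[i++] = *p++;
    if (*p != close) return false;        /* unterminated */
    target[i] = '\0';
    return true;
}
```

A false positive from a guarded #include only costs one unnecessary reparse on staleness, which is why this superset approximation is acceptable.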

8.15.8. Opportunities for Alignment

The Server and Xref paths do similar things but with different names and structures, creating cognitive overhead. Potential improvements:

  • Naming consistency: processFile() (server) vs oneWholeFileProcessing() (xref) could both use consistent naming

  • Eliminate duplication: Fix the double initInput() call in xref path

  • Extract common logic: The inner "process one file" logic should be identical between modes

  • Make differences explicit:

    • Server: for (each request) { process one file }

    • Xref: for (each scheduled file) { process one file }

The paths are intertwined but different, making it hard to keep in your head which one you’re modifying. Making them more similar where possible would reduce cognitive load during refactoring work.
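A toy model of the two loop shapes, with illustrative names (not the actual c-xref identifiers), makes the contrast concrete:

```c
#include <stdbool.h>

/* Toy model; names are illustrative, not the actual c-xref identifiers. */
#define MAX_FILES 4

static bool scheduled[MAX_FILES];
static int processedCount;

static void processOneFile(int fileNumber) {
    processedCount++;                 /* stand-in for parsing the file */
    scheduled[fileNumber] = false;
}

/* Xref shape: one invocation walks ALL scheduled files. */
static void xrefInvocation(void) {
    for (int f = 0; f < MAX_FILES; f++)
        if (scheduled[f])
            processOneFile(f);
}

/* Server shape: each request processes exactly ONE file. */
static void serverRequest(int requestFileNumber) {
    processOneFile(requestFileNumber);
}
```

If the inner processOneFile() were truly shared between the modes, the only remaining difference would be which loop drives it.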

8.16. Parsers

See the Parsing component in the Components chapter.

8.17. Integrated Preprocessor

See the Parsing component’s Internal Structure section in the Components chapter.

8.18. Refactoring and the parsers

Some refactorings need more detailed information about the code, maybe all do?

One example, at least, is parameter manipulation. Then the refactorer calls the appropriate parser (serverEditParseBuffer()) which collects information in the corresponding semantic actions. This information is stored in various global variables, like parameterBeginPosition.

The parser is filling out a ParsedInfo structure which conveys information that can be used e.g. when extracting functions etc.

At this point I don’t understand exactly how this interaction is performed; there seems to be no way to parse only the appropriate parts, so the whole file needs to be re-parsed.

Findings:

  • some global variables are set as a result of command line and arguments parsing, depending on which "command" the server is acting on

  • the semantic rules in the parser(s) contain code that matches these global variables and then inserts special lexems into the lexem stream

One example is how a Java 'move static method' was performed. It requires a target position. That position is transferred from command line options to global variables. When the Java parser was parsing a class or similar, it (or rather the lexer) looked at that target position information and inserted an OL_MARKER_TOKEN into the stream.

TODO: What extra "operation" the parsing should perform and return data for should be packaged into some type of "command" or parameter object that should be passed to the parser, rather than relying on global variables.

8.19. Reading Files

Here are some speculations about how the complex file reading is structured.

Each file is identified by a filenumber, which is an index into the file table, and seems to have a lexBuffer tied to it so that you can just continue from wherever you were. That in turn contains a CharacterBuffer that handles the actual character reading.

And there is also an "editorBuffer"…​

The intricate interactions between these are hard to follow, as the code here is littered with short character names which are copies of fields in the structures, and infested with many macros, probably in an ignorant attempt at optimizing. ("The root of all evil is premature optimization" and "Make it work, make it right, make it fast".)

It seems that everything starts in initInput() in yylex.c, where the only existing call to fillFileDescriptor() is made. But you might wonder why this function does some initial reading; this should be pushed down to the buffers in the file descriptor.

8.19.1. Lexing/scanning

Lexing/scanning is performed in two layers, one in lexer.c which seems to be doing the actual lexing into lexems which are put in a lexembuffer. This contains a sequence of encoded and compressed symbols, each starting with a LexemCode followed by extra data, like a Position. These seem to always be added, but are not always necessary.

The higher level "scanning" is performed, as per usual, by yylex.c. lexembuffer defines some functions to put and get lexems, chars (identifiers and file names?) as well as integers and positions.

At this point the put/get lexem functions take a pointer to a pointer to chars (which presumably is the lexem stream in the lexembuffer) which they also advance. This requires the caller to manage the LexemBuffer’s internal pointers externally and finally set them right when done.

It would be much better to call the "putLexem()" functions with a LexemBuffer, but there seem to be a few cases where the destination (often dd) is not a lexem stream inside a LexemBuffer. These might be related to macro handling.

This is a work-in-progress. Currently most of the "normal" usages are prepared to use the LexemBuffer’s pointers. But the handling of macros and defines are cases where the lexems are not put in a LexemBuffer. See the TODO.org for current status of this Mikado sequence.
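The difference between the two API styles can be sketched as follows. The types and functions here are illustrative stand-ins, not the real lexembuffer API:

```c
/* Illustrative stand-ins, not the real lexembuffer API. */
typedef unsigned char LexemCode;

typedef struct {
    unsigned char area[256];
    unsigned char *write;   /* buffer-managed "next free" pointer */
} ToyLexemBuffer;

/* Current style: caller owns and advances a raw stream pointer, and
   must write it back into the LexemBuffer when done. */
static void putLexemToStream(unsigned char **dd, LexemCode code) {
    *(*dd)++ = code;
}

/* Target style: the LexemBuffer manages its own pointer. */
static void putLexemToBuffer(ToyLexemBuffer *lb, LexemCode code) {
    *lb->write++ = code;
}
```

With the second style the caller cannot forget to synchronize the buffer's internal pointer, which is exactly the bookkeeping the pointer-to-pointer style forces on every call site.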

8.19.2. Semantic information

As the refactoring functions need some amount of semantic information, in the sense of information gathered during parsing, this information is collected in various ways when c-xref calls the "sub-task" to do the required parsing.

Two structures hold information about various things, among which is the memory index at certain points of the parsing. Thus it is possible to verify e.g. that an editor region does not cover a break in block or function structure. This structure is, at the point of writing, called parsedInfo and definitely needs to be tidied up.

8.20. Reference Database

See the Cxref component (in-memory reference tables and symbol resolution) and the Cxfile component (on-disk database format and reading) in the Components chapter.

The architectural direction for cxfile.c is documented in Chapter 17: Incremental cxfile.c Cleanup. The high-level "Memory as Truth" vision is in Chapter 15: Roadmap.

8.21. Editor Plugin

The editor plugin has three different responsibilities:

  • serve as the UI for the user when interacting with certain c-xref related functions

  • query c-xref server for symbol references and support navigating these in the source

  • initiate source code operations ("refactorings") and execute the resulting edits

Basically, Emacs (and probably other editors) starts c-xref in "server-mode" using -server, which connects the editor with c-xref through stdout/stdin. If you have (setq c-xref-debug-mode t) this command is logged in the *Messages* buffer with the prefix "calling:".

Commands are sent from the editor to the server on its standard input. They look very much like normal command line options, and in fact c-xref will parse that input in the same way, using the same code. When the editor sends an end-of-options line, the server will start executing whatever was sent, and return some information in the file given as an -o option when the editor starts the c-xref server process. The file is named and created by the editor and usually resides in /tmp. With c-xref-debug-mode set to on this is logged as "sending:". If you (setq c-xref-debug-preserve-tmp-files t), Emacs will also not delete the temporary files it creates, so that you can inspect them afterwards.

When the server has finished processing the command and placed the output in the output file it sends a <sync> reply.

The editor can then pick up the result from the output file and do what it needs to do with it ("dispatching:").

8.21.1. Invocations

The editor invokes a new c-xref process for the following cases:

  • Refactoring

    Each refactoring operation calls a new instance of c-xref?

  • Create Project

    When a c-xref function is executed in the editor and there is no project covering that file, an interactive "create project" session is started, which is run by a separate c-xref process.

8.21.2. Buffers

There is some magical editor buffer management happening inside c-xref which is not clear to me at this point. Basically it looks like the editor side tries to keep the server in sync with which buffers are open with which file…

At this point I suspect that -preload <file1> <file2> means that the editor has saved a copy of <file1> in <file2> and requests the server to set up a "buffer" describing that file and use it instead of the <file1> that resides on disk.

This is essential when doing refactoring, since the current version of the file most likely only exists in the editor, so the editor has to tell the server the current content somehow; this is the -preload option.

8.22. Editor Server

When serving an editor the c-xrefactory application is divided into the server, c-xref, and the editor part. At this point only Emacs is supported, so that’s implemented in the editor/Emacs packages.

8.22.1. Interaction

The initial invocation of the edit server creates a process with which communication is over stdin/stdout using a protocol which from the editor is basically a version of the command line options.

When the editor has delivered all information to the server, it sends 'end-of-options' as a command; the edit server processes whatever it has and responds with <sync>, which means that the editor can fetch the result from the file it named as the output file using the '-o' option.

As long as the communication between the editor and the server is open, the same output file will be used. This makes it hard to catch some interactions, since an editor operation might result in multiple interactions, and the output file is then re-used.

Setting the emacs variable c-xref-debug-mode forces the editor to copy the content of such an output file to a separate temporary file before re-using it.

For some interactions the editor starts a completely new and fresh c-xref process, see below. And actually you can’t do refactorings using the server, they have to be separate calls. (Yes?) I have yet to discover why this design choice was made.

There are many things in the sources that handle refactorings separately, such as refactoring_options, which is a separate copy of the options structure used only when refactoring.

8.22.2. Protocol

Communication between the editor and the server is performed using text through standard input/output to/from c-xref. The protocol is defined in src/protocol.tc and must match editor/emacs/c-xrefprotocol.el.

The definition of the protocol only caters for the server→editor direction; the editor→server part consists of command lines resembling the command line options and arguments, and is actually handled by the same code.

The file protocol.tc is included in protocol.h and protocol.c, which generate definitions and declarations for the elements using some macros.

There is a similar structure with c-xrefprotocol.elt which includes protocol.tc to wrap the PROTOCOL_ITEMs into defvars.

There is also some Makefile trickery that ensures that the C and elisp implementations are in sync.

One notable detail of the protocol is that it carries strings in their native format, UTF-8. This means that lengths need to indicate characters, not bytes.
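Counting characters rather than bytes in UTF-8 amounts to skipping continuation bytes, which all match the bit pattern 10xxxxxx. A minimal sketch, not the function c-xref actually uses:

```c
#include <stddef.h>

/* Hypothetical helper: count UTF-8 characters (code points) by
   skipping continuation bytes (bit pattern 10xxxxxx). */
static size_t utf8Length(const char *s) {
    size_t count = 0;
    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)  /* not a continuation byte */
            count++;
    return count;
}
```

For example, "café" is five bytes in UTF-8 but a length field in the protocol should say four.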

8.22.3. Invocation of server

The editor fires up a server and keeps talking over the established channel (elisp function 'c-xref-start-server-process'). This probably puts extra demands on the memory management in the server, since it might need to handle multiple information sets and options (as read from a .c-xrefrc file) for multiple projects simultaneously over a longer period of time. (E.g. if the user enters the editor starting with one project and then continues to work on another, new project options need to be read, and new reference information generated, read and cached.)

TODO: Figure out and describe how this works by looking at the elisp-sources.

FINDINGS:

  • c-xref-start-server-process in c-xref.el

  • c-xref-send-data-to-running-process in c-xref.el

  • c-xref-server-call-refactoring-task in c-xref.el

8.22.4. Communication Protocol

The editor server is started using the appropriate command line option and then it keeps the communication over stdin/stdout open.

The editor part sends command line options to the server, which looks something like (from the read_xrefs test case):

-encoding=european -olcxpush -urldirect  "-preload" "<file>" "-olmark=0" "-olcursor=6" "<file>" -xrefrc ".c-xrefrc" -p "<project>"
end-of-options

In this case the "-olcxpush" is the operative command which results in the following output

<goto>
 <position-lc line=1 col=4 len=66>CURDIR/single_int1.c</position-lc>
</goto>

As we can see from this interaction, the server will handle (all?) input as a command line and manage the options as if it was a command line invocation.

This explains the intricate interactions between the main program and the option handling.

The reason behind this might be that a user of the editor might be editing files on multiple projects at once, so every interrogation/operation needs to clearly set the context of that operation, which is what a user would do with the command line options.

8.23. Refactoring

See the Refactory and Extract components in the Components chapter for the architecture and operation flow.

8.23.1. Refactoring Protocol Details

The refactoring protocol messages exchanged between the refactory process and the editor:

Rename Example

Invocation: -rfct-rename -renameto=NEW_NAME -olcursor=POSITION FILE

Result: A sequence of precheck/replacement pairs:

<goto>
 <position-off off=POSITION len=N>FILE</position-off>
</goto>
<precheck len=N>STRING</precheck>

followed by:

<goto>
 <position-off off=POSITION len=N>FILE</position-off>
</goto>
<replacement>
 <str len=N>ORIGINAL</str>  <str len=N>REPLACEMENT</str>
</replacement>

Extraction Example

Extraction (-rfct-extract-function) returns an <extraction-dialog> with three <str> parts: the call site replacement, the text to insert before the region (function header), and the text to insert after (closing brace). The editor applies these and then initiates a rename for the new function name.

Protocol Messages

<goto>{position-off}</goto>

Move cursor to the indicated position.

<precheck len={int}>{string}</precheck>

Verify that the text under the cursor matches the string.

<replacement>{str}{str}</replacement>

Replace string1 under cursor with string2.

<position-off off={int} len={int}>{path}</position-off>

Position in file ('off' is character offset).

8.24. Memory handling

c-xrefactory uses custom memory management via arena allocators rather than malloc/free for performance-critical operations.

See the Components chapter for the design and architecture of the Memory module, and the Data Structures chapter for details on the arena allocator data structure and allocation model.

For debugging memory issues, especially arena lifetime violations, see the Development Environment chapter.

8.24.1. The Memory Type

Memory allocation is managed through the Memory structure, which implements an arena/bump allocator. Different memory arenas serve different purposes:

  • cxMemory - Cross-reference database (with overflow handling)

  • ppmMemory - Preprocessor macro expansion

  • macroBodyMemory - Macro definition storage

  • macroArgumentsMemory - Macro argument expansion

  • fileTableMemory - File table entries

See Components chapter for detailed description of each arena’s purpose and lifetime.

8.24.2. Option Memory

The optMemory arena requires special handling because Options structures are copied during operation. When copying, all pointers into option memory must be adjusted to point into the target structure’s memory area, not the source’s.

Functions like copyOptions() perform this pointer adjustment through careful memory arithmetic, traversing a linked list of all memory locations that need updating.

The linked list nodes themselves are allocated in the Options structure’s dynamic memory.
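The pointer adjustment can be sketched with a toy structure. ToyOptions, shiftPointer() and copyToyOptions() are illustrative; the real copyOptions() traverses the registered lists of pointer locations rather than a hardcoded field:

```c
#include <string.h>

/* Illustrative toy; the real copyOptions() walks registered lists of
   pointer locations instead of a hardcoded field. */
typedef struct {
    char area[64];      /* embedded option memory */
    char *projectName;  /* points into area */
} ToyOptions;

/* Re-point a pointer from the source's area into the copy's area,
   keeping the same offset. */
static char *shiftPointer(char *p, const ToyOptions *src, ToyOptions *dst) {
    return dst->area + (p - src->area);
}

static void copyToyOptions(ToyOptions *dst, const ToyOptions *src) {
    memcpy(dst, src, sizeof(*dst));  /* shallow copy: pointers still wrong */
    dst->projectName = shiftPointer(src->projectName, src, dst);
}
```

Without the shift, the copy's projectName would still point into the source's memory area, which is exactly the aliasing bug the deep copy exists to prevent.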

8.25. Configuration

The legacy c-xref normally, in "production", uses a common configuration file in the user’s home directory, .c-xrefrc. When a new project is defined its options will be stored in this file as a new section.

It is possible to point to a specific configuration file using the command line option -xrefrc, which is used extensively in the tests to isolate them from the user’s configuration.

Each "project" or "section" requires a name for the "project", which is the argument to the -p command line option. And it may contain most other command line options, one per line. These are always read before anything else, unless -no-stdop is used. This allows for different "default" options for each project.

8.25.1. Options

There are three possible sources for options.

  • Configuration files (~/.c-xrefrc)

  • Command line options at invocation, including server

  • Piped options sent to the server in commands

Not all options are relevant in all cases.

All option sources use exactly the same format, so that the same code for decoding them can be used.

8.25.2. Logic

When the editor has a file open it needs to "belong" to a project. The logic for finding which one is very intricate and complicated.

In this code there are also checks for things like whether the file is already in the index, or whether the configuration file has changed since last time, indicating that there are scenarios that are more complicated (the server, obviously).

But I also think this code should be simplified a lot.

9. Data Structures

There are a lot of different data structures used in c-xrefactory. This is a first step towards visualising some of them.

9.1. ReferenceableItem and Reference: Core Domain Concepts

These are the fundamental cross-reference data structures that represent the "what" and "where" of code entities.

9.1.1. ReferenceableItem

A ReferenceableItem represents a referenceable entity in the codebase - something that can be referenced from multiple locations:

  • Functions and variables

  • Types (structs, unions, enums, typedefs)

  • Macros

  • Include directives (special case: TypeCppInclude)

  • Yacc non-terminals and rules

Each ReferenceableItem contains:

  • linkName - Fully qualified name (e.g., "MyClass::method")

  • type - What kind of entity (function, variable, type, etc.)

  • storage, scope, visibility - Language properties

  • includeFile - For TypeCppInclude items, which file is being included

  • references - Linked list of all Reference (occurrences) of this entity

ReferenceableItems are stored in the referenceableItemTable (hash table) and persisted to .cx files.

9.1.2. Reference (Occurrence)

A Reference represents a single occurrence of a ReferenceableItem at a specific location:

  • position - File, line, and column where this occurrence appears

  • usage - How it’s used (definition, declaration, usage, etc.)

  • next - Pointer to next occurrence in the list

Each ReferenceableItem maintains a linked list of all its References, allowing you to find every place that entity appears in the codebase.

Note: The term "Reference" in this context means "occurrence" - one specific use of an entity at one location. This is distinct from C++ references or reference semantics.
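A sketch of the two structures as described above (field sets abbreviated, names taken from the text and not necessarily matching the headers), plus a helper walking the occurrence list:

```c
#include <stddef.h>

/* Field sets abbreviated; names follow the text above. */
typedef struct reference {
    /* Position position; */      /* file, line, column */
    int usage;                    /* definition, declaration, use, ... */
    struct reference *next;       /* next occurrence of the same item */
} Reference;

typedef struct referenceableItem {
    const char *linkName;         /* fully qualified name */
    int type;                     /* function, variable, type, ... */
    Reference *references;        /* head of the occurrence list */
} ReferenceableItem;

/* Walk the occurrence list, e.g. to count all places an entity appears. */
static int countReferences(const ReferenceableItem *item) {
    int n = 0;
    for (const Reference *r = item->references; r != NULL; r = r->next)
        n++;
    return n;
}
```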

9.2. Symbol (Parser Symbol Table)

There is also a structure called Symbol. This is separate from ReferenceableItem and serves a different purpose:

Symbol - Parser-level symbol table entry (temporary, exists only during parsing):

  • Used by the C/Yacc parser for semantic analysis

  • Contains type information, position, storage class

  • Lives in symbolTable (hash table) during file parsing

  • Not persisted - discarded after parsing completes

ReferenceableItem - Cross-reference entity (persistent across entire codebase):

  • Created FROM Symbol properties when a referenceable construct is found

  • Stored in referenceableItemTable and saved to .cx files

  • Accumulates References from all files in the project

Relationship: During parsing, when the parser encounters a referenceable symbol (function, variable, etc.), it:

  1. Creates a Symbol in symbolTable for semantic analysis

  2. Creates or finds a ReferenceableItem by copying Symbol properties

  3. Adds a Reference to that ReferenceableItem’s list

  4. Discards the Symbol when parsing completes

This separation allows the parser to maintain its own temporary symbol table while building the persistent cross-reference database.

Symbol structures
Figure 1. Symbol and Reference Data Structures

9.3. Files and Buffers

Many strange things are going on with reading files so that is not completely understood yet.

Here is an initial attempt at illustrating how some of the file and text/lexem buffers are related.

Buffer structures
Figure 2. File Descriptor and Buffer Relationships
It would be nice if the LexemStream structure could point to a LexemBuffer instead of holding separate pointers, since it is impossible to know what those actually point to…
This could be achieved if we could remove the CharacterBuffer from LexemBuffer and make that a reference instead of a composition. Then we’d need to add a CharacterBuffer to the structures that have a LexemBuffer as a component (if they use it).

9.4. Modes

c-xrefactory operates in different modes ("regimes" in original c-xref parlance):

  • xref - batch mode reference generation

  • server - editor server

  • refactory - refactory browser

The default mode is "xref". The command line options -server and -refactory select one of the other modes. Branching is done in the final lines of main().

The code for the modes is intertwined, probably through re-use of already existing functionality when extending to a refactoring browser.

One piece of evidence for this is that the refactory module calls the "main task" as a "sub-task". This forces some intricate fiddling with the options data structure, like copying it, which I don’t fully understand yet.

TODO?: Strip away the various "regimes" into more separated concerns and handle options differently.

9.5. Options

The Options data structure is used to collect options from the command line, from options/configuration files, and from options piped from the editor client using process-to-process communication.

It consists of a collection of fields of the types

  • elementary types (bool, int, …​)

  • string (pointers to strings)

  • lists of strings (linked lists of pointers to strings)

9.5.1. Allocation & Copying

Options has its own allocation using optAlloc, which allocates in a separate area, currently part of the options structure, utilizing "dynamic allocation" (the dm_ functions on the Memory structure).

The Options structure is copied multiple times during a session, both as a backup (savedOptions) and into a separate options structure used by the Refactorer (refactoringOptions).

Since the options memory is then also copied, all pointers into the options memory need to be updated. To be able to do this, the options structure contains lists of addresses that need to be "shifted".

When an option with a string or string list value is modified, the option is registered in either the list of string valued options or the list of string list valued options. When an options structure is copied, it must be done with a deep copy function which "shifts" those options and their values (areas in the options memory) in the copy so that they point into the memory area of the copy, not the original.

After the deep copy the following point into the option memory of the copy

  • the lists of string and string list valued options (option fields)

  • all string and string list valued option fields that are used (allocated)

  • all list nodes for the used option (allocated)

  • all list nodes for the string lists (allocated)

9.6. Arena Allocators (Memory)

Arena allocators (also called region-based or bump allocators) are the fundamental memory management strategy used throughout c-xrefactory for performance-critical operations like macro expansion and lexical analysis.

9.6.1. The Memory Structure

typedef struct memory {
    char   *name;              // Arena name for diagnostics
    bool  (*overflowHandler)(int n); // Optional resize handler
    int     index;             // Next allocation offset (bump pointer)
    int     max;               // High-water mark
    size_t  size;              // Total arena size
    char   *area;              // Actual memory region
} Memory;

9.6.2. Allocation Model

Arena allocators use bump pointer allocation:

  1. Allocation: Return &area[index], then index += size

  2. Deallocation: Bulk rollback via FreeUntil(marker)

  3. Reallocation: Only possible for most recent allocation

This is extremely fast (O(1) allocation) but requires stack-like discipline for deallocation.
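A minimal bump allocator implementing this model (simplified: no overflow handler, no alignment; names mirror the text, not memory.c):

```c
#include <stddef.h>

/* Simplified arena: no overflow handler, no alignment. */
typedef struct {
    size_t index;       /* bump pointer: next free offset */
    size_t size;        /* total arena size */
    char area[1024];    /* the memory region */
} ToyMemory;

static void *toyAlloc(ToyMemory *m, size_t n) {
    if (m->index + n > m->size)
        return NULL;            /* real arenas invoke the overflow handler */
    void *p = &m->area[m->index];
    m->index += n;              /* O(1): just bump the index */
    return p;
}

static size_t toyMark(ToyMemory *m) {
    return m->index;            /* remember the current top of stack */
}

static void toyFreeUntil(ToyMemory *m, size_t marker) {
    m->index = marker;          /* bulk rollback of everything above marker */
}
```

Note that toyFreeUntil() frees every allocation made after the marker in one assignment, which is what makes the LIFO discipline mandatory.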

9.6.3. Stack-Like Discipline

Arenas follow LIFO (last-in-first-out) cleanup:

marker = ppmAllocc(0);        // Save current index
temp1 = ppmAllocc(100);       // Allocate
temp2 = ppmAllocc(200);       // Allocate
// Use temp1, temp2...
ppmFreeUntil(marker);         // Free both temp1 and temp2

9.6.4. Key Constraint: Top-of-Stack Reallocation

Only the most recent allocation can be resized:

buffer = ppmAllocc(1000);     // Allocate buffer
temp = ppmAllocc(500);        // Allocate temporary
ppmReallocc(buffer, ...);     // ❌ FAILS - buffer not at top

This constraint is enforced by guards in memory.c (see Development Environment chapter).

9.6.5. Memory Arena Types

c-xrefactory uses specialized arenas for different purposes (see Components chapter for details):

  • cxMemory - Cross-reference database and symbol tables

  • ppmMemory - Preprocessor macro expansion (temporary)

  • macroBodyMemory - Macro body buffers

  • macroArgumentsMemory - Macro argument expansion

  • fileTableMemory - File table entries

  • optMemory - Option strings (with special pointer adjustment)

9.7. Preload Mechanism

The preload mechanism allows the server to work with editor buffer contents that haven’t been saved to disk. This is essential for providing real-time symbol navigation and completion while the user is actively editing.

9.7.1. How It Works

When an editor buffer is modified but not yet saved:

  1. Editor Action: The Emacs client writes the current buffer content to a temporary file

  2. Server Request: The client sends a request with -preload <filename> <tmpfile> options

  3. Buffer Association: The server creates an EditorBuffer structure linking the on-disk filename to the temporary file containing the actual content

  4. Transparent Parsing: When the server needs to parse the file, it transparently reads from the temporary file instead of the on-disk file
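The association in the steps above can be sketched as a simple lookup that redirects reads for preloaded files. ToyEditorBuffer and fileToActuallyOpen() are illustrative, not the real EditorBuffer implementation:

```c
#include <string.h>
#include <stddef.h>

/* Illustrative only, not the real EditorBuffer implementation. */
typedef struct {
    const char *fileName;       /* the name the parser asks for */
    const char *preloadedFrom;  /* temp file with editor content, or NULL */
} ToyEditorBuffer;

/* Decide which file to actually read: the preloaded temp file if one is
   registered for this name, otherwise the on-disk file itself. */
static const char *fileToActuallyOpen(const ToyEditorBuffer *buffers,
                                      int count, const char *fileName) {
    for (int i = 0; i < count; i++)
        if (strcmp(buffers[i].fileName, fileName) == 0
            && buffers[i].preloadedFrom != NULL)
            return buffers[i].preloadedFrom;
    return fileName;
}
```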

9.7.2. Why It’s Needed

Without preload, the server would only see the last saved version of the file. The preload mechanism ensures that:

  • Symbol navigation works with the current buffer state

  • Completion suggests symbols based on what’s actually typed

  • Refactorings operate on the current code, not stale saved content

  • Users get immediate feedback without having to save constantly

9.7.3. Reference Management

When a file is preloaded, the server must handle reference updates carefully:

  • Old references from the previous file version must be removed from the reference table before parsing

  • This prevents duplicate references (one set at old positions, another at new positions)

  • The removal happens in removeReferencesForFile() when preloaded content is detected

9.8. Browser Stack

The browser stack maintains navigation history for symbol references, allowing users to browse through code by pushing symbol lookups and navigating back through previous queries.

9.8.1. Structure

The browser stack is a linked list of OlcxReferences entries, where each entry represents a symbol lookup session:

  • Stack entries contain complete symbol information and reference lists for one navigation session

  • Top pointer indicates the current active entry being navigated

  • Root pointer tracks the base of the stack (most recent entry still available)

  • Entries between root and top are "future" navigation states that can be returned to via "next"
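A toy model of the root/top arrangement (illustrative only; the real entries are OlcxReferences):

```c
#include <stddef.h>

/* Illustrative only; the real entries are OlcxReferences. */
typedef struct toyEntry {
    const char *symbolName;
    struct toyEntry *previous;  /* link toward the bottom of the stack */
} ToyEntry;

typedef struct {
    ToyEntry *top;   /* entry currently being navigated */
    ToyEntry *root;  /* most recent entry still available */
} ToyBrowserStack;

static void toyPush(ToyBrowserStack *stack, ToyEntry *entry, const char *name) {
    entry->symbolName = name;
    entry->previous = stack->top;
    stack->top = entry;
    stack->root = entry;        /* a push makes the new entry the newest */
}

static void toyPop(ToyBrowserStack *stack) {
    if (stack->top != NULL)
        stack->top = stack->top->previous;  /* root still remembers it */
}
```

After a pop, top moves down while root still points at the popped entry, which is what makes "future" entries between top and root reachable again via "next".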

9.8.2. Lifecycle

  1. Push: When user requests symbol references (e.g., -olcxpush), a new empty entry is created on the stack

  2. Population: After parsing, the entry is filled with BrowsingMenu structures containing references

  3. Navigation: Commands like -olcxnext and -olcxprevious move through references in the current entry

  4. Pop: User can pop back to previous entries to return to earlier symbol lookups

9.8.3. Relationship to Parsing

The browser stack is populated in two stages:

  1. Parse-time: References are collected in the referenceableItemTable during file parsing

  2. Menu Creation: ReferenceableItems are wrapped in BrowsingMenu structures and added to the browser stack entry via putOnLineLoadedReferences()

This separation means that browser stack entries can become stale if files are reparsed (e.g., with preloaded content) without refreshing the stack. Users typically need to pop and re-push to get fresh reference lists after edits.

9.9. Browser Menu

A browser menu is a navigable list of referenceable items with their occurrences, organized for presentation to the user in Emacs. Multiple items may appear in a single menu when name resolution finds several candidates (e.g., symbols with the same name in different scopes).

9.9.1. BrowsingMenu Structure

Each BrowsingMenu entry is a menu item wrapping a ReferenceableItem with UI presentation state:

  • referenceable: The embedded ReferenceableItem (the entity being browsed)

    • Contains linkName, type, storage, scope, visibility

    • Contains the list of references (occurrences) for this item

  • selected: Whether this item is currently selected for operations

  • visible: Whether this item passes current visibility filters

  • defaultPosition: The "best" occurrence to jump to (usually the definition)

  • defaultUsage: The usage type of the default occurrence

  • outOnLine: Display line number in the Emacs menu

  • markers: Editor markers for refactoring operations

  • next: Pointer to next menu item in the list

Key insight: BrowsingMenu is not just a menu - it’s a menu item. A collection of BrowsingMenu items forms the actual menu shown to the user.

9.9.2. Multiple Menu Items in One Session

A single browser stack entry (OlcxReferences) can contain multiple BrowsingMenu items:

  • hkSelectedSym: Menu items that matched at the cursor position (after disambiguation)

  • symbolsMenu: Complete menu including related items (same name, similar signatures)

This allows users to:

  • See all candidates when a symbol is ambiguous

  • Navigate between related definitions (different scopes, include files)

  • Select specific items for refactoring operations

9.9.3. Menu Population

Browser menus are populated by scanning the referenceableItemTable:

  1. Symbol lookup: Find all ReferenceableItem entries matching the requested symbol

  2. Menu item creation: Wrap each matching item in a BrowsingMenu structure

  3. Reference collection: References are already in the ReferenceableItem

  4. Sorting and filtering: Order items by relevance and apply visibility filters

  5. Selection: Mark items that best match the cursor context (e.g., same file)

9.10. Putting It All Together: Domain Model Summary

Understanding the complete flow from parsing to browsing:

9.10.1. During Parsing (Building the Cross-Reference Database)

  1. Parser creates Symbols: As the C/Yacc parser processes source code, it creates Symbol entries in symbolTable for semantic analysis

  2. ReferenceableItems are created/found: When encountering referenceable constructs (functions, variables, types, etc.):

    • Create a ReferenceableItem from Symbol properties

    • Check if it already exists in referenceableItemTable

    • If new, add it to the table; if exists, reuse the existing one

  3. References are recorded: Add a Reference (occurrence) to the ReferenceableItem’s list, recording position and usage

  4. Symbols are discarded: After parsing completes, the symbolTable is cleared (Symbols are temporary)

  5. Database is persisted: ReferenceableItems and their References are saved to .cx files

Result: A persistent database mapping each entity (ReferenceableItem) to all its occurrences (References) across the entire codebase.

9.10.2. During Browsing (Interactive Navigation)

  1. User requests symbol info: User places cursor on a symbol and invokes a command (e.g., "push to symbol")

  2. Symbol lookup: Server finds matching ReferenceableItem(s) in referenceableItemTable

  3. BrowsingMenu creation: Each matching ReferenceableItem is wrapped in a BrowsingMenu structure

    • Adds UI state (selected, visible, display position)

    • Marks best-fit match (e.g., same file as cursor)

  4. Stack push: BrowsingMenu items are added to the browser stack (OlcxReferences entry)

  5. Display to user: Menu sent to Emacs showing all matching items and their occurrences

  6. Navigation: User can browse through references, select items, invoke refactorings

Result: Interactive navigation through the cross-reference database with selection and filtering.

9.10.3. Key Relationships

Symbol (parser)
    ↓ (creates during parsing)
ReferenceableItem (persistent entity)
    ├─→ references: Reference* (list of occurrences)
    └─→ stored in: referenceableItemTable
         ↓ (wrapped for browsing)
      BrowsingMenu (UI wrapper)
         └─→ stored in: OlcxReferences.symbolsMenu (browser stack)

This architecture separates concerns:

  • Parser symbols - Temporary, for semantic analysis

  • Cross-reference database - Persistent, for finding all uses

  • Browser menus - Presentation layer, for user interaction

10. Algorithms

The code does not always explain the algorithms that it implements. This chapter will ultimately be a description of various algorithms used by c-xrefactory.

10.1. How is an Extract refactoring performed?

The region (mark and point/cursor positions) is sent to the c-xref server in a -refactory -rfct-extract command.

The server parses the relevant file and sets some information that can be used in some prechecks that are then performed, such as structure check, and then the server answers with

<extraction-dialog>
    <str .... /str>
    <str .... /str>
    <str .... /str>
</extraction-dialog>

The first string is the code that will replace the extracted code, such as a call to the extracted function. The second string is the header part that will precede the extracted code ("preamble"), and the third is any code that needs to go after the extracted code ("postamble").

The actual code in the region is never sent to, or returned from, the server. It is handled completely by the editor extension and used verbatim (except when a macro is extracted, in which case each line is terminated with a backslash), so no changes can be made to that code.

The pre- and post-ambles might be of varying complexity. E.g. when extracting a macro, the postamble can be completely empty. When extracting a function both may contain code to transfer and restore parameters into local variables to propagate in/out variables as required.

The editor then requests a name from the user, which it uses in a rename operation that renames the default-named function/macro/variable.

Two-Phase Architecture

Extraction operates in two phases:

  1. Collection Phase (during parsing): Parser semantic hooks track control flow by registering synthetic labels and collecting references within the marked region.

  2. Analysis Phase (after parsing): The extraction module analyzes the collected control flow data, classifies variables by usage patterns, and generates the refactored code (call site, function definition, postamble).

This separation keeps the parser clean and makes extraction logic independently testable.

10.2. How does lexem stream management work?

Lexical analysis uses a stack of LexemStream structures to handle nested macro expansions. The key insight is that the stream type acts as a discriminator for buffer ownership and cleanup strategy.

10.2.1. The Stream Types

typedef enum {
    NORMAL_STREAM,              // File or local buffer
    MACRO_STREAM,               // Macro expansion
    MACRO_ARGUMENT_STREAM,      // Macro argument expansion
} LexemStreamType;
Historically, there was also a CACHED_STREAM type when the caching mechanism was still active. This confirms that stream types are fundamentally about buffer ownership and refill strategy - each type encodes where the buffer came from and how to handle it when exhausted.
NORMAL_STREAM

Buffer from file’s lexemBuffer or a local temporary. Not allocated from arena memory, so no cleanup needed when stream exhausted.

MACRO_STREAM

Buffer allocated from macroBodyMemory arena during macro expansion. Must call mbmFreeUntil(stream.begin) when popping from stack to free the arena allocation.

MACRO_ARGUMENT_STREAM

Buffer allocated from ppmMemory arena during macro argument expansion. Signals END_OF_MACRO_ARGUMENT_EXCEPTION when exhausted (cleanup handled by caller).

10.2.2. The Refill Algorithm

When currentInput runs out of lexems (read >= write), refillInputIfEmpty() uses the stream type to decide what to do:

while (currentInput.read >= currentInput.write) {
    LexemStreamType inputType = currentInput.streamType;

    if (insideMacro()) {  // Stack not empty
        if (inputType == MACRO_ARGUMENT_STREAM) {
            return END_OF_MACRO_ARGUMENT_EXCEPTION;
        }
        // Only free MACRO_STREAM buffers (allocated from macroBodyMemory)
        if (inputType == MACRO_STREAM) {
            mbmFreeUntil(currentInput.begin);
        }
        currentInput = macroInputStack[--macroStackIndex];  // Pop
    }
    else if (inputType == NORMAL_STREAM) {
        // Refill from file
        buildLexemFromCharacters(&currentFile.characterBuffer, ...);
    }
}

10.2.3. Key Invariant

Stream type must match buffer allocation:

  • MACRO_STREAM → buffer allocated from macroBodyMemory

  • NORMAL_STREAM → buffer NOT from macro arenas

  • MACRO_ARGUMENT_STREAM → buffer from ppmMemory

Violating this invariant causes fatal errors when trying to free buffers from the wrong arena.

10.2.4. Common Bug Pattern

Pushing a NORMAL_STREAM onto the macro stack, then trying to free it as if it were MACRO_STREAM:

// WRONG: Blindly freeing without checking type
mbmFreeUntil(currentInput.begin);  // Fails if currentInput is NORMAL_STREAM!

// CORRECT: Check type first
if (inputType == MACRO_STREAM) {
    mbmFreeUntil(currentInput.begin);
}

10.3. Editor Buffers and Incremental Updates

This section describes how c-xrefactory handles the reality that source code exists in two places: on disk as files, and in memory as editor buffers. It also explains the different update strategies and how references flow through the system.

10.3.1. Editor Buffer Abstraction

The Duality of Source Code

When a user edits code in Emacs, the source code exists in two forms:

  • Disk files: The saved state on the filesystem

  • Editor buffers: The current (possibly unsaved) state in the editor

For code analysis to be useful during active editing, c-xrefactory must treat editor buffers as the source of truth when they exist.

The Preloading Mechanism

When the Emacs client sends a command to the c-xref server (like PUSH or NEXT), it uses the -preload option to transmit modified buffers:

-preload <original-file> <temp-file>

For example:

-olcxnext -olcursor=5 /project/foo.c -preload /project/foo.c /tmp/emacs-xxx.tmp

The process:

  1. Emacs creates a temporary file containing the current buffer content

  2. The temp file’s modification time represents when the buffer was last modified

  3. The server loads this into an EditorBuffer structure

  4. When parsing, the server reads from the temp file (buffer content) instead of the disk file

This ensures that the server analyzes what the user sees in the editor, not the potentially stale disk file.

EditorBuffer Lifecycle

EditorBuffer {
    char *name;              // Original filename
    char *preLoadedFromFile; // Path to temp file with buffer content
    time_t modificationTime; // When buffer was last modified
    ...
}
  • Created by loadAllOpenedEditorBuffers() at the start of each server operation

  • Lives only for the duration of that operation

  • Destroyed by closeAllEditorBuffers() at operation end

10.3.2. Modification Time Tracking

To implement incremental updates, c-xrefactory tracks when each file/buffer was last parsed.

The Fields

Each FileItem in the file table has:

  • lastParsedMtime - The modification time when we last parsed this file (any update mode)

  • lastFullUpdateMtime - The modification time when we last did a FULL update (including header propagation)

These are time_t values (seconds since epoch).

Dual Semantics

The lastParsedMtime field has dual semantics depending on context:

  • For disk files: Stores the file’s mtime when it was parsed

  • For editor buffers: Stores the buffer’s mtime (from the preloaded temp file)

This works because editorFileModificationTime() abstracts over both:

time_t editorFileModificationTime(char *filename) {
    EditorBuffer *buffer = getEditorBuffer(filename);
    if (buffer != NULL && buffer->preLoadedFromFile != NULL) {
        return buffer->modificationTime;  // Buffer time
    } else {
        return fileModificationTime(filename);  // Disk time
    }
}

The abstraction is seamless: code can check "has this file changed?" without caring whether it’s a disk file or editor buffer.

Change Detection

To determine if a file/buffer needs reparsing:

if (editorFileModificationTime(fileItem->name) != fileItem->lastParsedMtime) {
    // File/buffer has changed since we last parsed it
    reparse(fileItem);
    fileItem->lastParsedMtime = editorFileModificationTime(fileItem->name);
}

This pattern appears in:

  • schedulingToUpdate() - Marks files needing update before batch processing

  • processModifiedFilesForNavigation() - Detects modified buffers during navigation

10.3.3. Update Strategies

C-xrefactory has two update strategies that trade off speed against completeness.

Fast Update (Default)

When used:

  • Automatic updates before PUSH operations (if enabled)

  • Explicit -fastupdate command

What it does:

  1. Checks which source files (.c) have changed (compares modification times)

  2. Reparses only those changed source files

  3. Updates the references database

Trade-off:

  • Fast: Only reparses files that actually changed

  • Incomplete: Doesn’t detect when header files change

Example:

foo.h modified → fast update → foo.h not reparsed
foo.c unchanged → foo.c not reparsed
Result: foo.c still has stale information about symbols from foo.h

Full Update

When used:

  • Explicit -update command

  • When -exactpositionresolve is enabled (forces full update)

What it does:

  1. Checks which files (source OR headers) have changed

  2. If a header changed, finds all source files that include it (transitively)

  3. Reparses all affected source files

  4. Updates the references database

The algorithm (makeIncludeClosureOfFilesToUpdate):

  1. Mark all changed files as scheduledToUpdate

  2. For each marked file:

    • Find all files that #include it (by looking up include references)

    • Mark those files as scheduledToUpdate too

  3. Repeat until no new files are added (transitive closure)

  4. Reparse all marked source files

Trade-off:

  • Complete: Catches header changes and propagates to all users

  • Slower: Can trigger reparsing of many source files if a common header changes

Example:

common.h modified → full update → finds 50 files that include it
Result: Reparses all 50 source files to pick up header changes

When Does the Difference Matter?

With modern CPUs and SSDs, the performance difference is often negligible for small to medium projects. The fast update’s header-blindness can lead to subtle bugs where changes don’t propagate. Full update is generally safer and more correct.

10.3.4. Reference Lifecycle

References flow through multiple stages in c-xrefactory, with different storage locations and ownership models at each stage.

Stage 1: Parsing

When a file is parsed:

  1. Symbols are discovered (functions, variables, types, etc.)

  2. For each symbol, a ReferenceableItem is created or looked up in the referenceableItemTable

  3. Each usage of that symbol creates a Reference with a position

  4. The reference is added to the referenceable’s reference list

Storage: cxMemory (a custom arena allocator)

Lifetime: Lives until the next update that reparses that file

Stage 2: The ReferenceableItemTable

The canonical storage for all references.

ReferenceableItem {
    char *linkName;        // Symbol identifier
    Type type;             // Function, variable, macro, etc.
    Reference *references; // Linked list of all uses
    ...
}

Key properties:

  • Allocated in cxMemory

  • Persistent across multiple server operations

  • Updated incrementally as files are reparsed

  • References are not individually free’d - they’re arena-allocated

Stage 3: Session Stacks

When a user performs a PUSH operation (browsing a symbol), a session is created:

SessionStackEntry {
    BrowsingMenu *menu;      // Selected symbols
    Reference *references;   // COPY of references for navigation
    Reference *current;      // Current position in navigation
    ...
}

Key properties:

  • The references list is a copy (via malloc) of references from the referenceableItemTable

  • Each reference is individually allocated with malloc (see addReferenceToList)

  • When the session is destroyed, references are individually freed (see freeReferences())

  • Sessions are snapshots in time - they don’t automatically update when the table changes

Why Separate Storage?

Memory ownership:

  • Table references: Arena-allocated, freed in bulk

  • Session references: Individually allocated, individually freed

If sessions pointed directly to table references, we’d have:

  • Dangling pointers when the table is updated

  • Double-free errors when sessions are destroyed

  • Memory corruption from mixed allocation strategies

Snapshots vs. live data:

  • The table is the "live" source of truth

  • Sessions are working copies for a specific browsing operation

  • Users expect their navigation stack to be stable during browsing

The Staleness Problem

The separation causes a problem: session references can become stale.

Scenario:

  1. User PUSHes symbol foo → Session created with references at lines 10, 50, 100

  2. User edits a file, adding lines

  3. User navigates with NEXT → Session still points to lines 10, 50, 100 (wrong!)

Solution: processModifiedFilesForNavigation()

When NEXT/PREVIOUS operations occur, the server:

  1. Detects which editor buffers have changed (modification time check)

  2. Reparses those buffers (updates referenceableItemTable)

  3. Rebuilds the current session’s reference list from the updated table

  4. Preserves the user’s navigation position by index

// Find user's position in old list
int currentIndex = 0;
Reference *ref = session->references;
while (ref != NULL && ref != session->current) {
    currentIndex++;
    ref = ref->next;
}

// Free old list and rebuild from table
freeReferences(session->references);
session->references = NULL;
for (BrowsingMenu *menu = session->menu; menu != NULL; menu = menu->next) {
    if (menu->selected) {
        ReferenceableItem *updatedItem = lookupInTable(&menu->referenceable);
        addReferencesFromFileToList(updatedItem->references, ANY_FILE,
                                    &session->references);
    }
}

// Restore position by index
ref = session->references;
for (int i = 0; i < currentIndex && ref->next != NULL; i++) {
    ref = ref->next;
}
session->current = ref;

This keeps navigation working correctly with live-edited code.

Trade-offs of the Incremental Approach

Advantages:

  • Minimal latency - only reparses changed buffers

  • Uses editor buffer content (user’s current view)

  • Preserves navigation position naturally

Limitations:

  • Like fast update: doesn’t reparse includers of changed headers

  • Only updates the current session (other sessions on the stack remain stale)

  • Only happens during NEXT/PREVIOUS (not other operations)

For typical usage (navigating within files being actively edited), these limitations rarely matter. A fresh PUSH creates a new session with fresh references.

10.3.5. Summary

The key insights:

  1. Editor buffers are the source of truth when they exist (via preloading)

  2. Modification times are tracked uniformly for files and buffers

  3. Fast update trades completeness for speed (doesn’t chase headers)

  4. Full update is more thorough but can reparse many files

  5. References live in two places: canonical table (arena memory) and session copies (malloc)

  6. Sessions are snapshots that can become stale, requiring incremental rebuilding during navigation

10.4. How does …​

TBD.

11. Development Environment

11.1. Developing, here be dragons…​

First, the code is terrible: lots of single- and double-character variables (cc, ccc, …​) and lots of bookkeeping in local variables rather than in the structures that are actually there. There are also a lot of macros, and macros are unfortunately hard to refactor into functions. (But I’m making progress…​)

As there is no general way to refactor a macro to a function, various techniques must be applied. I wrote a blog post about one that has been fairly successful.

But it is actually rather fun to make small changes and see the structure emerge, to hone your refactoring and design skills, and to work on a project that started 20 years ago and is still valuable, to me and, I hope, to others.

There should probably be a whole section on how to contribute and develop c-xrefactory but until then here’s a short list of what you need:

  • C development environment (GNU/Clang/Make/…​)

  • Unittests are written using Cgreen

  • Clean code and refactoring knowledge (to drive the code to a better and cleaner state)

Helpful would be:

  • Compiler construction knowledge (in the general sense; Yacc, ASTs and symbol tables are heavily used)

11.2. Setup

TBD.

11.3. Building

You should be able to build c-xref using something like this (details may have changed over time…​):

cd src
make
make unit
make test

But since the details of the build process are somewhat contrived and not so easy to see through, this is the place where they should be described.

One step in the build process was generating initialization information for all the things in standard include files, which of course became very dependent on the system you are running this on. This has now moved into functions inside c-xref itself, like finding DEFINEs and include paths.

The initial recovered c-xrefactory relied on having a working c-xref for the current system. I don’t really know how they managed to do that for all the various systems they were supporting.

Modern thinking is that you should always be able to build from source, so this is something that needed to change. We also want to distribute c-xref as an el-get library, which requires building from source and should generate a version specific to the current system.

The strategy selected, until some better idea comes along, is to try to build a c-xref.bs, if there isn’t one already, from the sources in the repository and then use that to re-generate the definitions and rebuild a proper c-xref. See Bootstrapping.

We have managed to remove the complete bootstrapping step, so c-xrefactory now builds like any other project.

11.4. Versions

The current sources are in the 1.6.X range. This is the same as the original xrefactory and probably also the proprietary C++-supporting version.

There is an option, "-xrefactory-II", that might indicate that something was going on. But currently the only difference seems to be whether the edit server protocol output is produced with unstructured fprintf:s or with functions in the ppc family (either calling ppcGenRecord() or fprint-ing using some PPC symbol). This, together with hints in how the Emacs part starts the server and some initial server option variables in refactory.c, indicates that the communication between the editor and the refactoring server uses this. It does not look like this was an attempt at a next-generation protocol.

What we should do is investigate whether this switch is actually used anywhere but in the editor server context, and if so, whether it can be made the default and the 'non-xrefactory-II' communication removed.

11.5. Coding

11.5.1. Naming

C-xref (probably) started as a cross-referencer for the supported languages (C, Java, C++). It originally had the name "xref", which became "xrefactory" when refactoring support was added. When Mariàn released a "C only" version in 2009, most of the "xref" references and names were changed to "c-xref". So, as with most software, there is a history and a naming legacy to remember.

11.5.2. Modules and Include Files

The source code for c-xrefactory used a very old C style with a separate proto.h containing the prototypes of all externally visible functions. Definitions were all over the place, and it was hard to see where data was actually declared. This must change into a module-oriented include strategy.

Of course this will have to change into the modern x.h/x.c externally visible interface model so that we get clean modules that can be unittested.

The function prototypes have now been moved out to header files for each "module". Some of the types have also been moved, but this is still a work in progress.

11.6. Debugging

TBD. Attaching gdb, server-driver…​

yaccp from src/.gdbinit can ease the printing of Yacc semantic data fields…​

A helpful option is the recently added -commandlog=…​ which allows you to capture all command arguments sent to the server/xref process to a file. This makes it possible to capture command sequences and "replay" them. Useful both for debugging and creating tests.

11.6.1. Arena Allocator Lifetime Violations

The preprocessor macro expansion code uses arena allocators (ppmMemory, macroBodyMemory) with stack-like discipline. Arena allocators are fast (pointer bumping) but require careful lifetime management.

The Problem Pattern

Arena allocators can only resize the most recent allocation ("top-of-stack"). A common violation occurs when trying to resize a buffer after other allocations:

buffer = ppmAllocc(size);           // Allocate buffer
marker = ppmAllocc(0);              // Save marker for cleanup
temp = ppmAllocc(tempSize);         // Allocate temporary
ppmReallocc(buffer, newSize, ...);  // ❌ FAILS - buffer not at top!
ppmFreeUntil(marker);               // Free temporaries

The correct pattern frees temporaries before growing the buffer:

buffer = ppmAllocc(size);           // Allocate buffer
marker = ppmAllocc(0);              // Save marker
temp = ppmAllocc(tempSize);         // Allocate temporary (use it)
ppmFreeUntil(marker);               // Free temporaries FIRST
ppmReallocc(buffer, newSize, ...);  // ✅ Works - buffer now at top

Lifetime Violation Guards

The arena allocator includes guards that catch lifetime violations with detailed diagnostics:

Guard 1: Buffer Resize Guard (memory.c in memoryRealloc())

Checks buffer is at top-of-stack before resizing. Provides expected vs actual locations and suggests moving ppmFreeUntil() earlier.

Guard 2: FreeUntil Bounds Guard (memory.c in memoryFreeUntil())

Ensures marker is within valid allocated range. Catches corrupted or wrong-arena markers.

Guard 3: Top-of-Stack Helper (memoryIsAtTop())

Allows explicit verification before operations requiring top-of-stack:

assert(memoryIsAtTop(&ppmMemory, buffer, oldSize));
ppmReallocc(buffer, newSize, sizeof(char), oldSize);

Example: test_collation_long_expansion

This test case triggered the violation guard when macro expansion created a very large output (19 FLAG_STRING invocations). The collate() function was calling ppmFreeUntil() after copyRemainingLexems(), which needed to grow the caller’s buffer.

The fix: move ppmFreeUntil() to before copyRemainingLexems(). By this point, temporary allocations from macro expansion were already used and could be freed, allowing the buffer to become top-of-stack again.

Debugging With Guards

When a guard triggers:

  1. Read the fatal error messages - they explain what went wrong

  2. Look for the assertion location in the stack trace

  3. Check if ppmFreeUntil() is being called too late

  4. Verify buffer growth happens after temporaries are freed

The guards turn subtle crashes into clear diagnostics that point to the fix.

11.7. Testing

11.7.1. Unittests

There are not very many unittests at this point; they cover only about a quarter of the code. The "units" in this project are unclear and entangled, so creating unittests is hard: the code was not built to be tested, test-driven, or even clearly modularized.

All unittests use Cgreen as the unittest framework. If you are unfamiliar with it the most important point is that it can mock functions, so you will find mock implementations of all external functions for a module in a corresponding <module>.mock file.

Many modules are at least under test, meaning there is a <module>_tests.c in the unittest directory, often containing only an empty test.

11.7.2. Acceptance Tests

In the tests directory you will find tests that exercise the external behaviour of c-xref, "acceptance tests" or "system tests". Some tests actually do only that; they wouldn’t really count as tests since there is no verification beyond the code being executed.

There are two basic strategies for the tests:

  • run a c-xref command, catch its output and verify

  • run a series of commands using the EDIT_SERVER_DRIVER, collect output and results, and verify

Some tests do not test their output in any meaningful way and only provide coverage.

Some tests do a very bad job of verifying, either because my understanding at the time was very low, or because the output is hard to verify. E.g. a "test" for generating references might only grep the CXrefs files for some strings, not verifying that they actually point to the correct place.

Hopefully this will change as the code gets into a better state and the understanding grows.

11.7.3. Test Structure

Tests live in the tests directory and are auto-discovered by name: any directory starting with test_ will be recognized as a test case.

Each test typically includes:

  • source.c (or similar) - the code under test

  • expected - the expected output

  • Makefile - test runner that uses the boilerplate

Most tests use tests/Makefile.boilerplate which provides common macros:

include ../Makefile.boilerplate

$(TEST):
	$(COMMAND) source.c -o output.tmp
	$(NORMALIZE) output.tmp > output
	$(VERIFY)

The key macros are:

$(COMMAND)

Runs c-xref with standard options and the test’s .c-xrefrc

$(NORMALIZE)

Removes timestamps and other variable output

$(VERIFY)

Compares output with expected, removes output on success

When $(VERIFY) passes, the output file is removed. This means you can easily identify failing tests by looking for test directories that still contain an output file. The utils/failing script lists these.

To suspend a test (skip it during test runs), create a .suspended file in the test directory.

11.7.4. General Setup

Since all(?) c-xref operations rely on an options file which must contain absolute file paths (because the server runs as a separate process), it must be regenerated whenever the tests are to be run in a different location (new clone, renamed test, …​).

This is performed by using a common template in tests and a target in tests/Makefile.boilerplate.

Each test should have a clean target that removes any temporary and generated files, including the .c-xrefrc file and generated references. This way it is easy to ensure that all tests have updated .c-xrefrc files.

11.7.5. Edit Server Driver Tests

Since many operations are performed from the editor, and the editor starts an "edit server" process, many tests need to emulate this behaviour.

The edit server session is mostly used for navigation. Refactorings are actually performed as separate invocations of c-xref.

In utils there is a server_driver.py script, which will take as input a file containing a sequence of commands. You can use this to start an edit, refactory or reference server session and then feed it with commands in the same fashion as an editor would do. The script also handles the communication through the buffer file (see [Editor Interface](./Design:-Editor-Interface)).

11.7.6. Creating More Edit Server Tests

You can relatively easily re-create a sequence of interactions by using the sandboxed Emacs in tests/sandboxed_emacs.

There are two ways to use it, "make spy" or "make pure". With the "spy" an intermediate spy is injected between the editor and the edit server, capturing the interaction to a file.

With "pure" you just get the editor setup with c-xref-debug-mode and c-xref-debug-preserve-tmp-files on. This means that you can do whatever editor interactions you want and see the communication in the *Messages* buffer. See [Editor Interface](./Design:-Editor-Interface) for details.

Once you have figured out which parts of the *Messages* buffer are interesting, you can copy them out to a file and run utils/messages2commands.py on it to get a file formatted for input to server_driver.py.

The messages2commands script converts all occurrences of the current directory to CURDIR, so it is handy to be in the same directory as the sources when you run the conversion.
The messages2commands script removes any -preload, so you need to take care that the positions inside the buffers are not changed between interactions, lest the -olcursor and -olmark become wrong. (You can just undo the change after a refactoring or rename.) Of course this also applies if you want to mimic a sequence of refactorings, like the jexercise move method example. Sources will then change, so the next refactoring works from the content of the buffers, and you have to handle this specifically.
-preload is the mechanism by which the editor can send modified buffers to c-xref so that you don’t have to save between refactorings. This is particularly important in the case of extract, since the extraction creates a default name which the editor then renames.

11.8. Utilities

11.8.1. Covers

utils/covers.py is a Python script that, in some environments, can list which test cases execute a particular line.

This is handy when you want to debug or step through a particular part of the code. Find a test that covers that particular line and run it using the debugger (usually make debug in the test directory).

Synopsis:

covers.py <file> <line>

11.8.2. Sandboxed

utils/sandboxed starts a sandboxed Emacs that uses the current elisp code and the c-xref from src. This allows you to test changes without having to pollute your own setup.

This actually runs the tests/sandboxed_emacs pure version, which also sets up a completely isolated Emacs environment with its own packages loaded, configuration etc. See below.

Synopsis:

sandboxed

11.9. Debugging the protocol

There is a "pipe spy" in tests/sandboxed_emacs. You can build the spy using

make spy

and then start a sandboxed Emacs which invokes the spy using

make

This Emacs will be sandboxed to use its own .emacs-files and have HOME set to this directory.

The spy will log the communication between Emacs and the real c-xref (src/c-xref) in log files in /tmp.

NOTE that Emacs will invoke several instances of what it believes is the real c-xref, so there will be several log files to inspect.

12. Deployment

TBD.

13. About the Decision Log

13.1. Overview

The decision log documents choices that have shaped the architecture, implementation, and direction of c-xrefactory. Most decisions from the original 1990s development are lost to history, but as they can be deduced from the codebase and commit history, they are being retroactively documented.

All architectural decisions are recorded using the Architecture Decision Record (ADR) format and stored in the adr/ directory. These ADRs are automatically integrated into the Structurizr documentation system.

13.2. Viewing Decision Records

The ADRs can be accessed in several ways:

  1. Via Structurizr: When viewing the Structurizr documentation, navigate to the "Decisions" section to see all ADRs with cross-references and visualizations.

  2. Directly in the repository: Browse the adr/ directory for markdown files containing individual decision records.

  3. Command line: Use ls doc/adr/*.md from the project root to list all decisions.

13.3. Decision Categories

Current ADRs cover several categories:

  • Simplification decisions: Removing unused features (Java support, HTML generation, etc.)

  • Tooling decisions: Choice of ADR format, documentation system

  • Configuration decisions: Automatic config file discovery

  • Format decisions: Reference data storage format

For the complete list of decisions and their rationale, see the ADR directory or the Decisions section in the Structurizr documentation.

13.4. Creating New ADRs

When making significant architectural decisions:

  1. Copy the template from adr/templates/

  2. Number it sequentially (e.g., 0012-description.md)

  3. Fill in the context, decision, and consequences

  4. Commit it alongside the implementation

  5. Reference it in commit messages and pull requests

See ADR-0007 for details on the ADR format and process.

14. Roadmap

This chapter outlines the high-level architectural and feature goals for c-xrefactory. For implementation details, see Chapter 17: Major Codebase Improvements.

14.1. Guiding Principles

  • Incremental improvement: Each step should provide immediate value while moving toward long-term goals

  • Test-driven modernization: Maintain 85%+ test coverage to enable confident refactoring

  • Backward compatibility: Preserve existing Emacs workflows while enabling modern IDE integration

  • Architectural simplification: Replace artificial mode distinctions with unified, smart on-demand behavior

  • Legacy code respect: Work with the existing 1990s codebase thoughtfully, not against it

14.2. Architectural Vision: Memory as Truth

14.2.1. The Goal

In-memory references become the single source of truth. Currently, the system has two sources (in-memory referenceableItemTable and .cx files on disk) with different code paths for navigation vs. refactoring. This creates complexity, bugs, and makes preloaded editor buffers work inconsistently.

14.2.2. Dependency Chain

The following diagram shows how the remaining architectural pieces depend on each other. Work proceeds from bottom to top.

                  Memory is truth
                  (convergence point)
                   /              \
                  v                v
    No callXref in           .cx becomes
    refactoring           startup snapshot
           |                      |
           v                      v
    ADR 22: Lightweight     Index-based sessions
    file structure scan    (keys + positions,
    (replaces -create        not copies)
     and callXref)                |
           |                      |
           v                      v
    ┌─────────────────────────────────────┐
    │          FOUNDATION (done)          │
    │  - ADR-0020: Sync before dispatch   │
    │  - .cx loaded as startup snapshot   │
    │  - Client sends only dirty buffers  │
    │  - Single-project server (ADR-0021) │
    │  - Project-local config (ADR-0005)  │
    │  - Unified startup (shared init)    │
    └─────────────────────────────────────┘

14.2.3. Component Summary

Component Description

Separate sync from dispatch (done)

Server-side restructuring (ADR-0020): the server entry point reparses stale files before dispatching any operation. Pass 1 reparses stale CUs, Pass 2 walks the reverse-include graph (via TypeCppInclude references) to find and reparse CUs that include stale headers. At startup, the server loads the .cx snapshot (loadFileNumbersFromStore), discovers CUs by globbing the project directory, and parses only stale CUs (mtime changed since snapshot). The Emacs client only preloads modified buffers, so unmodified files don’t trigger unnecessary reparsing. Operations can assume the in-memory table is current.

Lightweight file structure scanning

Replace the expensive full-project -create and callXref() with lightweight scanning (ADR-0022): discover CUs by globbing the project directory, extract include structure from #include lines, populate the same TypeCppInclude references that full parsing creates. The scan/mtime-check/incremental-reparse pipeline built for ADR 20 is the same pipeline needed here — just triggered at a different point (server startup or before refactoring, instead of per-request).

Index-based sessions

Session stores symbol key + current position instead of copying all references. Navigation reads directly from in-memory table, eliminating the "stale session copies" problem. Depends on upfront refresh (ADR 20) so that the table is always current when sessions read from it.

No callXref in refactoring

Refactoring uses same in-memory table as navigation, with lightweight scanning replacing callXref(). The existing graduated update strategy in computeUpdateOptionForSymbol() already decides the right scope per refactoring: no update for local symbols, fast update for multi-file globals, full update for symbols in headers. This logic maps directly onto the lightweight scan approach — only the update mechanism changes (scan + incremental reparse instead of callXref()). Include closure (ADR-0013) finds affected files; parse on demand. See Chapter 18: RefactoryMode Internally Calls XrefMode for the current architecture this replaces.

Disk snapshot strategy (read side operational)

The .cx file transitions from a live database to a startup snapshot. The read side is operational: at startup, the server loads the snapshot into the file table and skips CUs whose disk mtime matches the stored mtime. Remaining work: snapshot write strategy (see below) and initial creation without a prior -create. See below for target semantics.

Search reads from in-memory table

OP_SEARCH currently bypasses the in-memory referenceableItemTable entirely — scanForSearch() reads directly from disk .cx files. This is the last operation that treats the disk database as a live query target rather than a startup snapshot. Once memory is truth, search should query the in-memory table with the same pattern matching (shellMatch, substring fitness). This aligns with the Chapter 17 goal of converting scanForSearch to a visitor over in-memory data. No new infrastructure needed — the data is already there after entry refresh.

Snapshot write triggers

The save mechanism exists (saveReferences() in xref.c) but is only called from the legacy -create/callXref() path. It needs to be wired to triggers during normal server operation. Three triggers identified:

All buffers saved: when no editor buffers are modified, the in-memory state is consistent with disk — a clean moment to snapshot. The snapshot must only persist disk-file-derived references (not modified-buffer data), so "all saved" is the natural condition. This is the primary trigger during normal workflow.

Clean exit: the -exit option exists server-side but just calls exit() — no snapshot save. The Emacs client currently never sends -exit on kill-emacs (it sets set-process-query-on-exit-flag nil so the process is just killed). A kill-emacs-hook should send -exit, and the server’s exit handling should save the snapshot.

Project change: c-xref-server-dispatch-project-mismatch (c-xref.el:2414) kills the server process via c-xref-kill-xref-process when the user switches projects. Currently this is an abrupt delete-process — the server gets no chance to save. Should send -exit before killing, or handle SIGHUP.

Dirty kills (crash, kill -9, terminal closed) lose in-memory state — the next cold start reparses from scratch. Acceptable if triggers 1-3 fire during normal workflow.
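The Pass 2 reverse-include walk described in the component summary above is essentially a breadth-first traversal of "included by" edges until compilation units are reached. The following Python sketch is illustrative only — the data structures and function names are hypothetical, not the actual c-xref implementation:

```python
from collections import deque

def cus_to_reparse(stale_headers, included_by):
    # included_by maps a file to the set of files that #include it
    # (the reverse of the TypeCppInclude edges created during parsing)
    seen = set()
    queue = deque(stale_headers)
    while queue:
        f = queue.popleft()
        for includer in included_by.get(f, ()):
            if includer not in seen:
                seen.add(includer)
                queue.append(includer)  # an includer may itself be a header
    # only compilation units are reparsed, not the headers themselves
    return {f for f in seen if f.endswith((".c", ".y"))}

# Toy data: util.h is included by common.h, which a.c and b.c include.
graph = {"util.h": {"common.h"}, "common.h": {"a.c", "b.c"}}
print(sorted(cus_to_reparse(["util.h"], graph)))  # → ['a.c', 'b.c']
```

Note that the traversal must pass through intermediate headers (common.h above) so that transitively affected CUs are found, matching the "walks the reverse-include graph" behavior described for Pass 2.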

14.2.4. Incremental Implementation Path

Both navigation and refactoring already use the same in-memory ReferenceableItemTable (see Chapter 18: Unified In-Memory Table Discovery for details). The infrastructure for memory-as-truth is largely already in place:

  • Include structure is already tracked via TypeCppInclude references in the reference table — the same mechanism that full parsing creates. Lightweight scanning (ADR 22) would populate these without full parsing.

  • Staleness detection is already working: mtime checks on editor buffers, entry-point reparse with transitive include-graph walking (ADR 20).

  • Graduated update scope already exists: computeUpdateOptionForSymbol() decides per-refactoring whether no update, fast update, or full update is needed based on symbol visibility and scope.

The path forward is:

  1. Lightweight scanning replaces -create — discover CUs and include structure without full parsing. Populate the same TypeCppInclude references. No disk db write needed.

  2. Same scanning replaces callXref() before refactoring — scan + mtime check + reparse stale files in the include closure. The graduated scope decision (computeUpdateOptionForSymbol) stays the same.

  3. .cx becomes startup snapshot — loaded once at startup for fast warm start, never consulted during operation. The in-memory table is authoritative.
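The lightweight scan in step 1 amounts to extracting #include lines without running the full parser. A regex-based Python sketch of the quoted-form extraction (illustrative only — the real scanner is C code, and as noted under Optimization the <…​> form is deliberately not matched yet):

```python
import re

# Matches #include "file.h" (the <...> form is intentionally excluded,
# mirroring the current scan's behavior described in this document).
INCLUDE_RE = re.compile(r'^\s*#\s*include\s+"([^"]+)"', re.MULTILINE)

def scan_includes(source_text):
    """Extract quoted include targets from a compilation unit's text."""
    return INCLUDE_RE.findall(source_text)

src = '#include "config.h"\n#include <stdio.h>\n  # include "util.h"\n'
print(scan_includes(src))  # → ['config.h', 'util.h']
```

These extracted targets would then be resolved against the project tree and recorded as the same TypeCppInclude references that full parsing creates.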

14.2.5. .cx File: Startup Snapshot

Once memory is truth, the .cx file is a startup snapshot — a serialized copy of previously-computed references that avoids a full project parse on server start.

Startup sequence:

  1. Load snapshot from .cx (if it exists)

  2. Compare each entry’s stored mtime against the current disk file

  3. Re-parse any files that changed since the snapshot was written

  4. Client connects, sends preloaded buffers — those are parsed into memory, overriding disk state

After startup, the .cx file is not consulted. All operations read from the in-memory table.
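Step 2 of the startup sequence above is essentially an mtime comparison per snapshot entry. A minimal Python sketch, with a hypothetical function name (the actual logic lives in the C server):

```python
import os

def stale_files(snapshot):
    """Given {path: mtime recorded in the .cx snapshot}, return the
    files that must be reparsed because disk no longer matches."""
    stale = []
    for path, recorded_mtime in snapshot.items():
        try:
            if os.path.getmtime(path) != recorded_mtime:
                stale.append(path)  # changed since the snapshot was written
        except FileNotFoundError:
            stale.append(path)      # deleted files also need handling
    return stale
```

Files not present in the snapshot at all (newly added CUs) would be discovered by the project glob and parsed cold.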

Writing the snapshot:

The snapshot is written from disk-file-derived references only. Modified-buffer data must never be persisted — the snapshot reflects disk state, so that startup validation (step 2) remains correct.

Cold start (no .cx file): All project source files are parsed. This is slower but self-healing — no manual action needed to recover from a missing or corrupt snapshot.

  • ADR-0005: Automatically find config files — enables project discovery by upward search

  • ADR-0013: No archaic extern patterns — limits symbol visibility to include-based, making include closure complete

  • ADR-0014: Adopt on-demand parsing architecture — formalizes this direction

  • ADR-0020: Separate sync from dispatch — implemented, entry-point reparse with include-graph walking

  • ADR-0021: Single-project server policy — foundation for index-based sessions

  • ADR-0022: Lightweight file structure scanning — replaces -create and callXref()

  • Chapter 17: Incremental cxfile.c cleanup, parseBufferUsingServer bridge removal

14.3. Optimization

Once the core architectural path (lightweight scan, entry refresh, memory-as-truth) is functional, targeted optimizations can reduce cold-start latency and improve responsiveness on large projects.

Optimization Description

Header-filtered sibling parsing

Entry refresh pass 3 currently parses all CUs sharing any header with the request file. On large projects this pulls in a large fraction of all CUs (e.g. on ffmpeg, navigating AudioFIRContext from af_afir.c triggers 2481 sibling CUs because of widely-shared utility headers). When the operation is navigation, only CUs sharing the header that defines the navigated symbol are relevant. Filtering to that header reduces the scope dramatically: AudioFIRContext is defined in af_afir.h, which only 3 CUs include — so 2481 → 2 sibling CUs to parse. Challenge: pass 3 runs during sync (before dispatch), but the symbol is only known during the operation. The solution must bridge this gap — e.g. by triggering targeted sibling parsing from the operation after the symbol’s defining header is identified. Measured cold-start PUSH on ffmpeg af_afir.c: scan 2.7s, pass 3 round 1 19s, pass 3 round 2 19s (from <…​> includes discovered by full parse), total 42s. Parsing dominates — the scan is not the bottleneck. Worst case: core API headers like avcodec.h (614 CUs) or internal.h (1171 CUs) — parallel parsing may be needed for these.

Extend scan to capture <…​> includes

The lightweight scan currently only matches #include "…​". Projects like ffmpeg use angle-bracket includes for internal headers (with -I paths), so the scan misses roughly half the include structure. Extending to <…​> requires header-filtered sibling parsing first — without that filter, ubiquitous system headers (string.h, stdlib.h) would cause pass 3 to pull in nearly every CU in the project. Also needs care to avoid recursively scanning system headers outside the project tree.

Lexing cache re-introduction

The lexing cache (pre-tokenized file content) was present in the original codebase but lost during restructuring. Re-enabling it avoids repeated lexing of the same file across multiple parse passes.

Parallel parsing

Even with header-filtered sibling parsing, core API headers in large projects can fan out to hundreds of CUs (e.g. avcodec.h in ffmpeg is directly included by 614 CUs, internal.h by 1171). Parallel parsing of independent CUs could reduce wall-clock time roughly linearly with core count. The main challenges are: the global mutable state throughout the parser and symbol table (would require per-thread working sets with merge), CX memory allocation (currently a single arena), and the file table (shared, needs synchronization or partitioning). This is a significant engineering effort but may be the only way to achieve acceptable latency for navigation targets in widely-included headers.
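To make the header-filtering reduction concrete, here is a toy Python sketch contrasting the two sibling-selection rules. The data is illustrative (af_afir_ext.c is invented for the example; the real numbers from ffmpeg are in the table above):

```python
def all_siblings(request_cu, includes_of):
    """Current pass 3: every CU sharing *any* header with the request CU."""
    shared = includes_of[request_cu]
    return {cu for cu, hs in includes_of.items()
            if cu != request_cu and hs & shared}

def filtered_siblings(defining_header, includes_of):
    """Header-filtered: only CUs that include the symbol's defining header."""
    return {cu for cu, hs in includes_of.items() if defining_header in hs}

# Toy project: avfilter.h is a widely-shared utility header,
# af_afir.h is the header defining the navigated symbol.
includes_of = {
    "af_afir.c": {"af_afir.h", "avfilter.h"},
    "af_acopy.c": {"avfilter.h"},      # shares only the utility header
    "af_afir_ext.c": {"af_afir.h"},    # shares the defining header
}
print(sorted(all_siblings("af_afir.c", includes_of)))
# → ['af_acopy.c', 'af_afir_ext.c']
print(sorted(filtered_siblings("af_afir.h", includes_of)))
# → ['af_afir.c', 'af_afir_ext.c'] (the request CU plus one true sibling)
```

The filter drops CUs that are reachable only through widely-shared headers, which is exactly where the 2481 → 2 reduction comes from.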

14.4. Known Limitations

The following are known consequences of the current entry refresh implementation. Each is deliberately deferred because it either has low practical impact or dissolves naturally with planned architectural changes.

Limitation Impact and reasoning Dissolves with

Re-scan doesn’t clear old include refs

When a config change triggers a re-scan, new TypeCppInclude / UsageUsed refs are created but old ones are not removed. For the common case — adding -I flags — this is harmless: old refs are still valid, the scan just adds missing ones. The problematic case is removing or changing flags, where phantom include edges could cause Pass 2 or Pass 3 to chase stale headers, resulting in unnecessary (but not incorrect) reparsing. Low practical impact since mid-session config edits are almost always additive.

Could be fixed independently by clearing outgoing include refs per CU before re-scanning. Not urgent enough to justify the API addition.

Double progress bar on cold start

The lightweight scan only captures #include "…​", not #include <…​>. On the first request, Pass 3 finds siblings via "…​" edges. On the second request, Pass 1 reparses the request CU with the full parser, which discovers <…​> edges too — Pass 3 then finds additional siblings, producing a second progress bar. Both rounds are roughly equal (ffmpeg: ~40s total, ~20s each). Functionally correct — all siblings get parsed, just in two steps.

Extending scan to <…​> (requires header-filtered sibling parsing first, otherwise system headers cause near-full-project explosion)

Pass 3 parses too many siblings

Pass 3 parses all CUs sharing any header with the request file. On large projects with widely-shared utility headers this pulls in a large fraction of all CUs (e.g. ffmpeg aac_ac3_parser.c: 978 siblings via shared headers, but only 7 share the relevant aac_ac3_parser.h). This is correct but wasteful — cold-start latency is dominated by this over-parsing.

Header-filtered sibling parsing: identify the navigated symbol’s defining header during dispatch, then parse only CUs sharing that header. Challenge: Pass 3 runs during sync (before dispatch), so this requires bridging the sync/dispatch boundary.

Stale sessions after POP

Sessions hold copies of references at the time the session was created. Only the top session is refreshed by entry refresh. After editing a header and navigating back (POP), the previous session may show stale positions — e.g. an old location kept in parallel with the new one. Functionally annoying but not data-corrupting.

Index-based sessions (keys + positions instead of copies) — sessions would read directly from the in-memory table, which is always current after entry refresh.

14.5. Feature Vision: Full LSP Support

14.5.1. The Goal

Enable VS Code, Neovim, and other LSP-capable editors to use c-xrefactory’s refactoring and navigation features with full feature parity to the Emacs integration.

14.5.2. Current State

  • Go-to-definition for functions, global variables, types (single file only)

  • Files parsed on didOpen only, populating the global referenceableItemTable

  • No project initialization — no .c-xrefrc, no compiler discovery, no disk db

  • No didChange or didSave handlers

  • Known bug: file table size changes cause wrong positions (hash distribution)

14.5.3. Architectural Insight: LSP Document Events as Entry Refresh

The native Emacs path has a mature sync/execute separation (ADR 20): before each operation, reparse stale files in three passes, then execute with a current in-memory table. The LSP protocol provides the same separation naturally — document lifecycle events (didOpen, didChange, didSave) are sync points, and requests (definition, references) are operations that assume the table is current.

This means the LSP path does not need its own architecture. It can reuse the same initialization and reparsing infrastructure as the native path, triggered by different events.

14.5.4. Dependency Chain

              Multi-file navigation
              (definition, references,
               rename across files)
                      |
                      v
          Use referenceableItemTable
          instead of separate db (done)
             /                \
            v                  v
    Reparse on              Project init
    document events         on LSP startup
    (didOpen/didChange      (find project,
     → parse file,           load .c-xrefrc,
     didSave → pass 2       compiler discovery,
     includer reparse)      disk db + scan)
            |                      |
            v                      v
    ┌─────────────────────────────────────┐
    │          FOUNDATION (done)          │
    │  - parseToCreateReferences()        │
    │  - referenceableItemTable           │
    │  - Lightweight scan (ADR 22)        │
    │  - Entry refresh passes 1-3         │
    │  - initializeProjectContext()       │
    │  - loadFileNumbersFromStore()       │
    └─────────────────────────────────────┘

14.5.5. Component Summary

Component Description

Use referenceableItemTable (step 1, done)

The separate referenceDatabase (reference_database.h/c) has been removed. findDefinition() in lsp_adapter.c now queries referenceableItemTable directly via mapOverReferenceableItemTableWithPointer. The LSP didOpen handler calls parseToCreateReferences(), which populates the same global table used by the native path.

Project initialization on LSP startup (step 2)

On initialize or first didOpen, run the same project setup as the native path: initializeProjectContext(fileName, emptyArgs, emptyArgs) → loadFileNumbersFromStore() → scanProjectForFilesAndIncludes(). This gives the LSP server: project config from .c-xrefrc, compiler-discovered include paths and predefined macros, the disk db snapshot (references from all previously-parsed files), and include structure from the lightweight scan. The ArgumentsVector parameters accept empty vectors — command-line and piped options are a native-path concern. Care needed: initializeParsingSubsystem() and initializeProjectContext() initialize overlapping state (options, file table); the LSP init sequence must be reconciled so they don’t conflict.

Reparse on document events (step 3)

didOpen and didChange: reparse the changed file via parseToCreateReferences() (already done for didOpen). didSave: trigger pass 2 — walk the reverse-include graph (TypeCppInclude refs) to find and reparse CUs that include the saved file as a header. This is the same reparseStalePreloadedFiles() logic from the native path. The LSP path is strictly better here: changes arrive in real-time instead of being discovered at request time.

Multi-file navigation (step 4)

With steps 1-3, the referenceableItemTable contains references from all project files (disk db snapshot + freshly parsed opens/changes). definition requests query this table — multi-file lookup works without any new query mechanism. Pass 3 (sibling CU parsing) can trigger lazily on the first didOpen in a cold area, or on a definition request that finds no result.
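The mapping from LSP document events to sync actions (steps 3 and 4 above) can be sketched as a small dispatcher. The handler names echo the C functions mentioned in the text (parseToCreateReferences, the pass 2 includer reparse), but this Python shape is illustrative only, not the actual LSP server code:

```python
def make_dispatcher(parse_file, reparse_includers):
    """Map LSP document lifecycle events onto the sync actions
    described above. parse_file/reparse_includers are stand-ins for
    the real C-side functions."""
    handlers = {
        # didOpen/didChange: reparse just the changed file
        "textDocument/didOpen": lambda p: parse_file(p["uri"]),
        "textDocument/didChange": lambda p: parse_file(p["uri"]),
        # didSave: also walk the reverse-include graph (pass 2)
        "textDocument/didSave": lambda p: (parse_file(p["uri"]),
                                           reparse_includers(p["uri"])),
    }
    def dispatch(method, params):
        handler = handlers.get(method)
        if handler:
            handler(params)
    return dispatch

log = []
dispatch = make_dispatcher(lambda uri: log.append(("parse", uri)),
                           lambda uri: log.append(("pass2", uri)))
dispatch("textDocument/didSave", {"uri": "a.h"})
print(log)  # → [('parse', 'a.h'), ('pass2', 'a.h')]
```

Requests like textDocument/definition would then run against the in-memory table with no sync work of their own, mirroring the native path's sync/execute separation.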

14.5.6. Planned Capabilities

Once the foundation above is in place, the following LSP features become possible:

Feature Enables

Multi-file navigation

Jump to definitions in other files without opening them first

Find all references

Show all usages of a symbol across the project

Hover information

Display type and documentation on mouse hover

Code completion

Context-aware symbol completion

Rename refactoring

Safe rename across all files

Workspace search

Find symbols by name across project

14.5.7. Dependencies

Steps 1-3 above are independent of the "Memory as Truth" convergence — they reuse existing infrastructure as-is. Full feature parity (rename, extract) additionally depends on:

  • Rename refactoring requires the "no callXref" path

  • All features benefit from single-project server simplicity (ADR 21)

14.6. Feature Vision: Modern Refactorings

14.6.1. The Goal

Provide refactoring capabilities that understand C semantics, going beyond what generic text-based tools can offer.

14.6.2. Current Capabilities

  • Rename (symbols, parameters, macros)

  • Extract function/macro/variable

  • Add/delete/reorder parameters

  • Move function between files (Phase 1 MVP)

14.6.3. Planned Improvements

Feature Value

Move function Phase 2

Automatically add function declaration to target header file

Move function Phase 3

Detect and optionally move tightly-coupled static helper functions

Smart include management

Automatically add/remove #include directives based on dependencies

Refactoring preview

Show what will be changed before applying

14.6.4. Details

See Chapter 16: Move Function Between Files for implementation details and phase breakdown.

15. Planned Features

This chapter documents planned user-facing features—new refactorings, navigation capabilities, and editor integrations. These represent functionality users will interact with directly.

For internal architectural improvements and code quality work, see Chapter 17: Major Codebase Improvements.

For detailed specifications of refactoring operations (both existing and planned), see Chapter 19: Refactoring Recipes.

15.1. Move Function Between Files

Status: Phase 1 complete (December 2024)

15.1.1. Use Case

Reorganize code by moving function definitions between source files while maintaining correctness. Essential for refactoring large codebases into better module structure.

Example: Extract a utility function from main.c to utils.c, automatically handling visibility changes and header declarations.

15.1.2. What Works Now

  • Move a function from one source file to another

  • Automatically removes possible static keyword (makes function externally accessible)

  • Adds extern declaration to target file’s header

  • Preserves comments and function decorations

  • Works for both C and Yacc files

15.1.3. Next steps

Remove extern declaration: Remove the moved function’s extern declaration from the source file’s header, if the function was not static.

Include management: Automatically add necessary #include directives based on dependencies.

Helper function detection: Identify and optionally move static helper functions that the moved function depends on. Prevents broken builds.

Smarter header placement: Automatically find the right location in header files based on existing declarations and dependencies.

Preview: Show what will be changed before applying the refactoring.

15.2. LSP Integration

Status: EXPERIMENTAL (January 2026)

15.2.1. Use Case


Language Server Protocol (LSP) integration enables c-xrefactory to work with modern editors and IDEs (VS Code, vim/neovim with LSP plugins, etc.) beyond Emacs. This opens c-xrefactory’s refactoring and navigation capabilities to a much wider audience.

15.2.2. What Works

Go to Definition (textDocument/definition):

  • Functions - Jump to function definition from any call site

  • Global variables - Navigate to variable declarations

  • Types - Find typedef and struct definitions

File Parsing:

  • Files are parsed as they’re opened in the editor

  • Symbols become available for navigation immediately

15.2.3. Current Limitations

Single-file scope: Only symbols in the currently opened file are accessible. To navigate to a function in another file, you must first open that file.

No local variables: Go-to-definition doesn’t work for local variables or function parameters. This works in Emacs mode but requires architectural changes for LSP.

Missing LSP features: Only textDocument/definition is implemented. Coming features:

  • Find all references

  • Hover information

  • Code completion

  • Rename refactoring

  • Organize imports

15.2.4. How to Use

See the README for LSP client configuration examples. Basic setup:

  1. Build c-xrefactory with make

  2. Configure your editor’s LSP client to use c-xref -lsp as the server command

  3. Open C or Yacc files

  4. Use your editor’s "go to definition" command

15.3. Rename function handles expect

Many unittest frameworks have a feature for isolating a unit, often referred to as "mocks", in C the "unit" of mocking/stubbing/doubling is almost always a function. These mocks need to understand how they should respond to a call. As they have no logic they are "programmed" by expressing conditions and responses.

In particular, Cgreen, has an expect() which takes the function name as the first parameter, and then constraints to apply and values to return. This is implemented using CPP macros so the function name is just text, and cannot be detected using normal C-parsing/analysis.

Renaming a function that appears in an expect() will not rename that reference. This will usually cause the unit test to fail, because the expectation now refers to a function that no longer exists.

A handy extension of the "Rename Function" would be to find these using some special magic and replace them too.
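Since expect(name, …) is plain text to the C analyzer, finding these references would have to be textual. A hedged sketch of what such a scan could look like — the function name and return shape here are hypothetical; only the Cgreen expect() call form is taken from the description above:

```python
import re

def find_expect_references(source, function_name):
    """Find expect(<function_name>, ...) occurrences that a C-level
    rename would miss; returns (line number, line text) pairs."""
    pattern = re.compile(r'\bexpect\(\s*' + re.escape(function_name) + r'\b')
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if pattern.search(line):
            hits.append((lineno, line.strip()))
    return hits

src = 'expect(read_sensor, will_return(42));\ncall_something_else();\n'
print(find_expect_references(src, "read_sensor"))
# → [(1, 'expect(read_sensor, will_return(42));')]
```

A real implementation would restrict the scan to test files and present the hits for confirmation, since a textual match cannot prove the token is really the mocked function.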

15.4. Migrate to project local config

Previously c-xrefactory promoted a user-central config file, ~/.c-xrefrc which contained the config for all projects by containing a section for each project. The "New Project" wizard in the Emacs client created new sections in this file.

Since 1.9 the promoted model is a project local config file, a .c-xrefrc in the root of the project tree. The main advantage is that it can be checked in to repo and it will not contain absolute file paths.

Legacy project configs will continue to work, but a user might want to migrate a project. It’s fairly easy to do by hand, but as a polishing touch, providing a client operation to do that would be nice.

There is a natural trigger for this: since the Emacs client no longer sends -xrefrc and the server’s upward search for .c-xrefrc stops before the home directory, a legacy user with only ~/.c-xrefrc will get a "No project found" prompt. Instead of only offering "Create new project?", the client could check ~/.c-xrefrc for a section matching the current file, and if found, offer to migrate that section to a project-local .c-xrefrc.
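The migration itself is mostly a matter of pulling the matching section out of the central file. A Python sketch — note the loud assumption here: this pretends the legacy ~/.c-xrefrc groups options under a "[<path>]" header line per project, which must be verified against the real file format before any of this is trusted:

```python
def extract_project_section(config_text, project_path):
    """ASSUMPTION: legacy ~/.c-xrefrc groups options under a
    '[<path>]' header line per project. Verify against the real
    format before relying on this sketch."""
    collected, inside = [], False
    for line in config_text.splitlines():
        if line.startswith("["):
            inside = line.strip() == f"[{project_path}]"
            continue
        if inside and line.strip():
            collected.append(line.strip())
    return collected

legacy = ("[/home/alice/proj]\n-I/home/alice/proj/include\n"
          "[/home/alice/other]\n-DWHATEVER\n")
print(extract_project_section(legacy, "/home/alice/proj"))
# → ['-I/home/alice/proj/include']
```

A complete migration would also rewrite absolute paths relative to the project root, since avoiding absolute paths is the main advantage of the project-local file.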

15.5. Indexing Log Buffer

Status: Planned

15.5.1. Background

When the server does a cold start (no disk database), it parses all discovered compilation units to populate in-memory references. During this parsing, warnings and errors may occur (e.g., missing include files, syntax errors from missing -D defines). These are expected and harmless — c-xrefactory continues best-effort — but users benefit from seeing them to tune their .c-xrefrc (adding -I paths, -D defines, etc.).

The legacy -create command in Emacs offered "View log file?" after completion, showing a file with all parsing messages. The cold start server path needs an equivalent.

There are also other situations when a re-parse or re-discovery might throw the same kind of errors.

15.5.2. Problem

The current protocol has no non-modal message channel. Every message type either shows a modal dialog (PPC_WARNING, PPC_ERROR, PPC_INFORMATION) requiring "Press a key to continue", or writes a transient line to the minibuffer (PPC_BOTTOM_INFORMATION). Sending per-file warnings during cold start floods the user with one modal per warning, effectively locking up the editor.

15.5.3. Design Options

Option A: New protocol record type (recommended) — Add a PPC_LOG record that the Emacs client silently appends to a c-xref-log buffer. After cold start completes, send a PPC_BOTTOM_INFORMATION saying "Indexing done: N warnings (see c-xref-log)". Clean separation, no modals, inspectable at leisure. Touches protocol definition, C server code, and Emacs client.

Option B: Server-side batching — Collect all warnings during cold start into a single string, send as one PPC_INFORMATION after parsing completes. One modal instead of many. Simpler but still modal.

Option C: Summary only — Send a PPC_BOTTOM_WARNING summary to the minibuffer (e.g., "3 files could not be opened during indexing"). Details go only to the log file. Minimal change but no inspectable buffer in Emacs.
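Option A could be sketched roughly as below. The record syntax and the formatLogRecord function are invented for illustration; the real format would live in the shared protocol definition, next to the existing PPC_* records.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch of Option A: a PPC_LOG record emitter. The record
   syntax and function name are illustrative, not the actual protocol. */
static int formatLogRecord(char *out, size_t size, const char *message) {
    /* The Emacs client would append the payload silently to a
       c-xref-log buffer instead of popping a modal dialog. */
    return snprintf(out, size, "<log len=%d>%s</log>",
                    (int)strlen(message), message);
}
```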

15.6. Browsing includes

15.6.1. Background

The #include preprocessor statements are referenceable items just like type names and variables. It is therefore logical to expect that navigation with the cursor on an #include line would follow the same UX as for a variable: PUSH moves the cursor to the definition (in this case the file itself) and NEXT to the next "reference", which here would be the next #include statement for that file.

This does not happen.

15.6.2. Design

This needs to be investigated and understood before an attempt to design a solution can be made.

16. Major Codebase Improvements

This chapter documents internal architectural changes and technical debt reduction efforts that improve code quality, maintainability, and performance. These are improvements to c-xrefactory’s own implementation, not features users interact with directly.

For planned user-facing features, see Chapter 16: Planned Features.

16.1. Incremental cxfile.c Cleanup

16.1.1. Background

The cxfile module was designed when the disk database was the source of truth. The .cx files were not just a cache — they were the reference database. Operations like "show all references to symbol X" or "find unused symbols" were implemented as table-driven scans over the .cx file format: hash the symbol name to find the right partition, stream through it, apply a per-operation filter callback. This was efficient for the batch cross-referencer that c-xrefactory evolved from, where the workflow was c-xref -create → c-xref -update → query the on-disk database.

With the shift toward memory-as-truth (Chapter 15: Roadmap), the in-memory referenceableItemTable is becoming the authoritative source, and .cx files are becoming a startup snapshot. But cxfile.c still carries the old design: it combines disk I/O with operation-specific filtering logic, and several operations still bypass the in-memory table to read from disk directly.

16.1.2. Current Interface

The public interface (cxfile.h) reflects this mixed heritage:

// Generic persistence — correct level
extern bool loadFileNumbersFromStore(void);
extern void ensureReferencesAreLoadedFor(char *symbolName);
extern void saveReferencesToStore(bool updating, char *name);

// Operation-specific scanning — wrong level
extern void scanReferencesToCreateMenu(char *symbolName);
extern void scanForMacroUsage(char *symbolName);
extern void scanForGlobalUnused(char *cxrefFileName);
extern void scanForSearch(char *cxrefFileName);

// Implementation details leaked
extern int cxFileHashNumberForSymbol(char *symbol);
extern void searchSymbolCheckReference(ReferenceableItem *item, Reference *ref);

The four scan* functions read directly from disk .cx files and merge results into the in-memory referenceableItemTable. Each encodes a specific use case (menu creation, macro completion, unused detection, symbol search) that a storage module shouldn’t know about.

The ensureReferencesAreLoadedFor function is the right pattern: load from disk into memory, then let the caller query memory. The scan* functions bypass this by combining load + filter in one step.
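The load-then-query pattern can be illustrated with this runnable mini-model. Item and the table are stand-ins for the real ReferenceableItem types, and the disk load is faked; the shape of the control flow is the point.

```c
#include <string.h>
#include <stdbool.h>
#include <stddef.h>

/* Mini-model: Item and table stand in for ReferenceableItem and
   referenceableItemTable; loadFromDisk fakes the .cx partition read. */
typedef struct { const char *name; bool loaded; int references; } Item;

static Item table[] = {{"foo", false, 0}, {"bar", false, 0}};
enum { TABLE_SIZE = 2 };

static void loadFromDisk(Item *item) {
    item->loaded = true;
    item->references = 3;   /* stand-in for streaming the partition */
}

/* The right pattern: load into memory if needed, then let the caller
   query memory. Any filtering stays with the caller. */
static Item *ensureReferencesAreLoadedFor(const char *name) {
    for (int i = 0; i < TABLE_SIZE; i++)
        if (strcmp(table[i].name, name) == 0) {
            if (!table[i].loaded)
                loadFromDisk(&table[i]);  /* disk touched at most once */
            return &table[i];
        }
    return NULL;
}
```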

16.1.3. What’s Already Changing

With the memory-as-truth direction (Chapter 15), cxfile.c is already shrinking in importance:

  • loadFileNumbersFromStore() — used at startup, stays as-is

  • saveReferencesToStore() — needed for snapshot writes, stays as-is

  • ensureReferencesAreLoadedFor() — called 4 times (server.c, xref.c, move_function.c) to load include refs from disk db; becomes unnecessary once all include refs come from lightweight scan + full parse

  • scanReferencesToCreateMenu() — called from cxref.c for browsing menus; currently reads from disk, should read from in-memory table instead

  • scanForMacroUsage() — called from server.c for macro completion; same issue

  • scanForGlobalUnused() — called from cxref.c; bulk scan, reasonable to keep in cxfile but should use visitor pattern

  • scanForSearch() — called from cxref.c; same as above

16.1.4. Incremental Steps

These are independent improvements, not a phased plan:

Make cxFileHashNumberForSymbol static. It’s a partitioning implementation detail. Only called internally and from one external site that can be rerouted.

Make searchSymbolCheckReference static. Move the search matching logic to the caller (search.c already exists). The storage layer shouldn’t know about search string matching.

Replace scanReferencesToCreateMenu and scanForMacroUsage with in-memory queries. Both load a symbol’s references from disk and then filter them. With entry refresh populating the in-memory table, these should query referenceableItemTable directly — the data is already there. ensureReferencesAreLoadedFor can serve as the fallback if the symbol hasn’t been loaded yet.

Convert scanForGlobalUnused and scanForSearch to visitor pattern. These need to scan all symbols, not just one — a bulk operation that makes sense as a generic scanAllReferences(visitor, context) function. The visitor decides what to do with each item (check for unused, match search pattern). The operation-specific logic moves to the callers.
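A sketch of what the proposed generic scanAllReferences could look like, with ReferenceableItem reduced to a stand-in and a caller-supplied visitor doing the "global unused" check. The signature and the unused heuristic are illustrative, not the final design.

```c
#include <stddef.h>

/* Sketch: ReferenceableItem is a stand-in; scanAllReferences would
   iterate whatever storage backs the real table. */
typedef struct { const char *name; int referenceCount; } ReferenceableItem;

typedef void (*ReferenceableItemVisitor)(ReferenceableItem *item, void *context);

static void scanAllReferences(ReferenceableItem *items, size_t count,
                              ReferenceableItemVisitor visit, void *context) {
    for (size_t i = 0; i < count; i++)
        visit(&items[i], context);   /* the visitor owns per-item logic */
}

/* Caller-side visitor for the "global unused" case: a symbol with only
   its definition reference counts as unused (illustrative heuristic). */
static void countUnused(ReferenceableItem *item, void *context) {
    if (item->referenceCount <= 1)
        (*(int *)context)++;
}
```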

16.1.5. References

  • src/cxfile.h, src/cxfile.c — current implementation

  • src/search.c — already exists, natural home for search matching logic

  • src/cxref.c — main caller of scan functions (lines 863, 1515, 1709, 1938)

  • src/server.c — calls scanForMacroUsage (line 83) and ensureReferencesAreLoadedFor (lines 238, 330)

16.2. Extract Macro Expansion Module

16.2.1. Problem Statement

The yylex.c file is 2353 lines and combines multiple responsibilities:

  • Lexical analysis and token reading

  • File and buffer management

  • Preprocessor directive processing

  • Macro expansion system (~800 lines)

The macro expansion code is a substantial, cohesive subsystem that would benefit from extraction into its own module. Currently, it’s deeply embedded in yylex.c, making both lexing and macro expansion harder to understand and test in isolation.

16.2.2. Current Architecture

The macro expansion system in yylex.c comprises:

Core Responsibilities (~800 lines)

  • Macro call expansion - Main orchestration (expandMacroCall())

  • Argument processing - Collection and recursive expansion

  • Token collation - ## operator implementation

  • Stringification - # operator implementation

  • Memory management - Separate arenas for macro bodies (MBM) and arguments (PPM)

  • Cyclic detection - Preventing infinite macro recursion

Key State

int macroStackIndex;  // Current macro expansion depth
static LexemStream macroInputStack[MACRO_INPUT_STACK_SIZE];
static Memory macroBodyMemory;      // Long-lived: macro definitions
static Memory macroArgumentsMemory; // Short-lived: expansion temporaries

Memory Lifetime Separation

The system uses two distinct memory arenas with different lifetimes:

  • MBM (Macro Body Memory): Persistent storage for macro definitions throughout compilation

  • PPM (PreProcessor Memory): Temporary storage for expansion, collation, and argument processing

This separation is fundamental and should be preserved in any refactoring.
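The lifetime separation can be illustrated with two toy bump arenas. The real Memory type in memory.h is richer than these stand-ins; the point is only that PPM can be dropped wholesale after each expansion while MBM survives.

```c
#include <stddef.h>

/* Toy illustration of the MBM/PPM split; Arena is a stand-in for the
   real Memory type in memory.h. */
typedef struct { char buffer[4096]; size_t used; } Arena;

static Arena macroBodyMemory;       /* MBM: survives whole compilation */
static Arena macroArgumentsMemory;  /* PPM: reset after each expansion */

static void *arenaAlloc(Arena *arena, size_t size) {
    if (arena->used + size > sizeof(arena->buffer))
        return NULL;                 /* out of arena space */
    void *p = &arena->buffer[arena->used];
    arena->used += size;
    return p;
}

/* Once a macro call is fully expanded, its temporaries can be dropped
   wholesale while the macro definitions stay intact. */
static void finishExpansion(void) {
    macroArgumentsMemory.used = 0;
}
```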

16.2.3. Proposed Solution

Extract macro expansion into a new module: macroexpansion.c/h

Public Interface

The new module would expose a minimal, focused API:

// Initialization
void initMacroExpansion(void);
int getMacroBodyMemoryIndex(void);
void setMacroBodyMemoryIndex(int index);

// Core expansion
bool expandMacroCall(Symbol *macroSymbol, Position position);
bool insideMacro(void);
int getMacroStackDepth(void);

// Memory allocation (exposed for macro definition processing)
void *macroBodyAlloc(size_t size);
void *macroBodyRealloc(void *ptr, size_t oldSize, size_t newSize);
void *macroArgumentAlloc(size_t size);

Module Boundaries

What moves to macroexpansion.c:

  • Macro call expansion and argument processing

  • Token collation (collate() and helpers)

  • Stringification (macroArgumentsToString())

  • Cyclic call detection

  • MBM/PPM memory management

  • Buffer expansion utilities (expandPreprocessorBufferIfOverflow(), etc.)

What remains in yylex.c:

  • Lexing and file input

  • Preprocessor directive processing (#define, #ifdef, etc.)

  • Include file handling

  • Main yylex() function

  • Macro symbol table operations

Dependencies:

The macro module would depend on:

  • Lexem stream operations (reading/writing)

  • Symbol lookup (findMacroSymbol())

  • Cross-referencing (for collation and expansion references)

  • Current input state (via accessor functions)

16.2.4. Benefits

Architectural

  • Separation of concerns: Lexing vs. preprocessing clearly separated

  • Reduced file size: yylex.c drops from 2353 → ~1550 lines (34% reduction)

  • Testability: Macro expansion can be unit tested independently

  • Clearer ownership: Macro state and memory management centralized

Maintainability

  • Focused modules: Each file has a single, clear purpose

  • Easier reasoning: Macro behavior isolated from lexer concerns

  • Better documentation: Module-level documentation for macro system

Future flexibility

  • Could support different macro systems (C vs. C++)

  • Easier to add macro debugging/tracing

  • Independent optimization of macro expansion

16.2.5. Implementation Strategy

Phase 1: Preparation (Already Complete)

✓ Create LexemBufferDescriptor type for buffer management
✓ Refactor buffer expansion functions to use descriptor
✓ Eliminate return values for size updates

Phase 2: Create Module Structure

  • Create macroexpansion.h with public interface

  • Create macroexpansion.c with initial implementations

  • Move LexemBufferDescriptor to appropriate header

  • Create accessor functions for currentInput state

Phase 3: Incremental Function Migration

Move functions in this order (lowest risk first):

  1. Memory management - MBM/PPM allocation functions

  2. Buffer expansion - expandPreprocessorBufferIfOverflow(), expandMacroBodyBufferIfOverflow()

  3. Support utilities - cyclicCall(), prependMacroInput()

  4. Token processing - collate(), resolveMacroArgument(), etc.

  5. Core expansion - expandMacroCall(), createMacroBodyAsNewStream(), etc.

Phase 4: Integration and Cleanup

  • Update yylex.c to use new interface

  • Run full test suite after each migration step

  • Add focused unit tests for macro expansion

  • Update build system

  • Document the new architecture

16.2.6. Risks and Mitigation

Risk: Complex dependencies

Mitigation:

  • Create clear accessor functions for shared state

  • Use incremental approach - one function group at a time

  • Validate with tests after each step

Risk: Performance overhead

Mitigation:

  • Keep critical functions inline where necessary

  • Profile before/after migration

  • Current code already has abstraction layers

Assessment: Low risk - macro operations are complex enough that function call overhead is negligible

Risk: Breaking existing tests

Mitigation:

  • Run test suite after every migration step

  • Keep interface behavior identical

  • Use compiler to catch interface mismatches

16.2.7. Success Metrics

  • All existing tests pass

  • yylex.c reduced to ~1550 lines

  • New focused tests for macro expansion added

  • No performance regression (< 5% overhead acceptable)

  • Code review confirms improved clarity

16.2.8. Open Questions

  1. Should findMacroSymbol() move to the macro module or stay in yylex.c?

    • It’s used by both lexer (for expansion triggering) and macro module (for nested expansions)

    • Probably belongs in a shared location or as part of symbol table operations

  2. How to handle currentInput global state?

    • Options: Pass explicitly, use accessor functions, or provide context structure

    • Accessor functions likely cleanest: getCurrentInput(), setCurrentInput()

  3. Should we extract preprocessor directives at the same time?

    • No - keep changes focused

    • Could be a future refactoring after macro extraction proves successful
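The accessor-function option from question 2 could be sketched like this, with LexInput standing in for the real input-state type. The point is that currentInput stays private to yylex.c while the macro module reaches it only through functions.

```c
/* Sketch: LexInput stands in for the real input-state type. */
typedef struct { const char *fileName; int position; } LexInput;

static LexInput currentInput;   /* remains file-local in yylex.c */

LexInput *getCurrentInput(void) {
    return &currentInput;
}

void setCurrentInput(LexInput input) {
    currentInput = input;
}
```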

16.2.9. References

  • Current code: src/yylex.c lines 1327-2089 (macro expansion system)

  • Memory management: src/memory.h, src/memory.c

  • Symbol operations: src/symbol.h

  • Related refactoring: [LexemStream API Improvements] addresses buffer management patterns


This refactoring is independent of the LexemStream API improvements but would benefit from them being completed first, as they simplify buffer management patterns throughout the macro expansion code.

16.3. Remove parseBufferUsingServer Bridge

16.3.1. Problem

Refactoring operations that need structural information from parsing (function boundaries, move target validation, parameter positions, etc.) do not call the parser directly. Instead, they go through parseBufferUsingServer() — a function that constructs magic string arguments (like "-olcxmovetarget", "-olcxgetfunctionbounds") and runs a full initServer() + callServer() cycle. This is the internal equivalent of the editor sending a command over the protocol.

// refactory.c — 7 calls like this:
parseBufferUsingServer(refactoringOptions.project, point, NULL, "-olcxpush", NULL);
parseBufferUsingServer(refactoringOptions.project, point, NULL, "-olcxsafetycheck", NULL);
parseBufferUsingServer(refactoringOptions.project, point, mark, "-olcxextract", "-olexmacro");

// parsing.c — 2 bridge calls:
parseBufferUsingServer(options.project, target, NULL, "-olcxmovetarget", NULL);
parseBufferUsingServer(options.project, marker, NULL, "-olcxgetfunctionbounds", NULL);

This makes the control flow circular and hard to follow: a refactoring operation (already running inside the server) constructs fake arguments, re-enters the server, which dispatches the operation, which calls the parser, which produces a side-effect in global state, which the original caller then reads.

16.3.2. What’s Done

A clean parser API in parsing.h/c already exists and handles all call sites in server.c:

  • ParserOperation enum — type-safe operations (PARSE_TO_CREATE_REFERENCES, PARSE_TO_GET_FUNCTION_BOUNDS, PARSE_TO_VALIDATE_MOVE_TARGET, PARSE_TO_EXTRACT, PARSE_TO_TRACK_PARAMETERS, PARSE_TO_COMPLETE)

  • ParsingConfig struct — centralizes what was scattered across global flags (cursorOffset, markOffset, extractMode, targetParameterIndex, etc.)

  • Operation predicates — needsReferenceAtCursor(), allowsDuplicateReferences() replace implicit per-operation behavior

  • parseToCreateReferences(fileName) — the completed pattern: direct parsing without any bridge, used by entry refresh (ADR 20) and LSP

Two convenience functions have clean APIs but still bridge internally:

  • isValidMoveTarget(target) — sets up ParsingConfig, then calls parseBufferUsingServer with "-olcxmovetarget"

  • getFunctionBoundaries(marker) — sets up ParsingConfig, then calls parseBufferUsingServer with "-olcxgetfunctionbounds"

These demonstrate the migration pattern: the public API is already clean, only the implementation still bridges.

16.3.3. What Remains

9 bridge calls to eliminate (2 in parsing.c, 7 in refactory.c):

Location          Magic string                          Purpose
parsing.c:99      -olcxmovetarget                       Validate move target position
parsing.c:124     -olcxgetfunctionbounds                Find function boundaries around cursor
refactory.c:294   -olcxpush (with rename/safety flags)  Push reference for rename safety check
refactory.c:309   -olcxsafetycheck                      Verify rename safety
refactory.c:909   -olcxpush (with parameter flags)      Navigate to parameter for add/delete
refactory.c:934   -olcxpush (with parameter flags)      Get parameter coordinates
refactory.c:1252  -olcxextract                          Extract function
refactory.c:1256  -olcxextract -olexmacro               Extract macro
refactory.c:1260  -olcxextract -olexvariable            Extract variable
Each call follows the same pattern: set global flags → call parseBufferUsingServer → read results from global state. The migration for each is: set ParsingConfig fields → call parser directly → read results from the same global state.

16.3.4. Migration Pattern

The two parsing.c bridges show the pattern clearly. Today:

/* isValidMoveTarget — bridge still present */
syncParsingConfigFromOptions(options);
parsingConfig.operation = PARSE_TO_VALIDATE_MOVE_TARGET;
parsingConfig.positionOfSelectedReference = makePositionFromEditorMarker(target);
parsedInfo.moveTargetAccepted = false;
parseBufferUsingServer(options.project, target, NULL, "-olcxmovetarget", NULL);
result.valid = parsedInfo.moveTargetAccepted;

The ParsingConfig is already set up — the bridge call is redundant, it just re-enters the server which re-reads the same config. Replacing it with a direct callParser() (similar to parseToCreateReferences) is the straightforward next step.
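A runnable mini-model of the migrated shape is sketched below. The names mirror the document, but the types are stubbed down to essentials, and callParser() is an assumed direct entry point in the spirit of parseToCreateReferences(), not an existing function.

```c
#include <stdbool.h>

/* Mini-model of the migrated isValidMoveTarget: stubbed types, and
   callParser() as an assumed direct entry point (hypothetical). */
typedef enum { PARSE_TO_VALIDATE_MOVE_TARGET } ParserOperation;
typedef struct { ParserOperation operation; } ParsingConfig;
typedef struct { bool moveTargetAccepted; } ParsedInfo;

static ParsingConfig parsingConfig;
static ParsedInfo parsedInfo;

static void callParser(void) {
    /* Stub: the real parser reads parsingConfig and records its verdict
       in parsedInfo, just as the bridged version does today. */
    if (parsingConfig.operation == PARSE_TO_VALIDATE_MOVE_TARGET)
        parsedInfo.moveTargetAccepted = true;
}

static bool isValidMoveTarget(void) {
    parsingConfig.operation = PARSE_TO_VALIDATE_MOVE_TARGET;
    parsedInfo.moveTargetAccepted = false;
    callParser();               /* direct call, no server re-entry */
    return parsedInfo.moveTargetAccepted;
}
```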

The refactory.c calls are slightly more involved because they also need buffer preloading (the editor marker’s buffer must be set as input for parsing) and some set additional global flags. But the mechanism is the same.

16.3.5. Independence

This refactoring is independent of the memory-as-truth architecture. The bridge removal is about how refactoring operations invoke parsing internally — not about where references come from or how they’re stored. It can proceed at any time, one call site at a time.

16.3.6. Code Locations

  • src/parsing.h — public API (ParserOperation, ParsingConfig, convenience functions)

  • src/parsing.c — implementation, 2 remaining bridges

  • src/refactory.c — parseBufferUsingServer definition (line 199) and 7 call sites

16.4. Remove Hashtab by Migrating to Hashlist

16.4.1. Problem Statement

The codebase has two hash table implementations generated by C macro templates:

  • hashtab (hashtab.th/hashtab.tc): Open addressing with linear probing (HASH_SHIFT = 211). No deletion support. Overflow at ~89% fill triggers a fatal error.

  • hashlist (hashlist.th/hashlist.tc): Chaining with linked lists. Supports deletion (Delete, DeleteExact). No overflow risk.

Only two tables still use hashtab:

Table               Module                Element Type
fileTable           filetable.c           FileItem
macroArgumentTable  macroargumenttable.c  MacroArgumentTableElement

All other tables (editorBufferTable, symbolTable, referenceableItemTable) already use hashlist.

16.4.2. Why This Matters Now

The lightweight file structure scanning work (ADR 22) exposed hashtab’s limitations:

  • No deletion: We added an isDeleted bitfield to FileItem and a markFileAsDeleted() function as a workaround because hashtab cannot remove entries. With hashlist, actual deletion would be possible.

  • Overflow risk: fileTable is initialized with a fixed size. As projects grow or the lightweight scan discovers more files, the open-addressing table can overflow. Hashlist’s chaining has no such limit.

  • FileItem already has next: The FileItem struct already declares struct fileItem *next — currently ignored by hashtab but exactly what hashlist requires for chaining.

16.4.3. Proposed Change

Migrate fileTable and macroArgumentTable from hashtab to hashlist, then remove hashtab entirely.

fileTable migration:

  • Change #include "hashtab.th" / #include "hashtab.tc" to hashlist equivalents in filetable.c

  • FileItem.next is already present — no struct change needed

  • Replace isDeleted workaround with actual Delete() where appropriate

  • The file number concept (index into array) changes semantics — hashlist doesn’t provide stable array indices. This needs careful thought: file numbers are used pervasively as compact identifiers. Options include maintaining a separate index array alongside the hash, or keeping the array and using hashlist only for lookup.

macroArgumentTable migration:

  • MacroArgumentTableElement currently has no next field — one must be added

  • Simpler table with straightforward usage, lower risk

Remove hashtab:

  • Delete hashtab.th and hashtab.tc

  • One fewer template pattern to understand and maintain

16.4.4. File Number Semantics — The Key Challenge

The file table serves a dual purpose: hash lookup by name AND array indexing by file number. File numbers are used throughout the codebase as compact identifiers (stored in references, positions, sessions, etc.).

Hashtab provides both naturally — the hash slot index is the file number. Hashlist doesn’t — elements live in chains, not at stable array positions.

Possible approaches:

  • Parallel array: Keep a FileItem *filesByNumber[] array for O(1) index lookup, use hashlist for name-based lookup. Two data structures, but each is simple.

  • Sequence number on FileItem: Add an int fileNumber field, allocate sequentially. Hashlist for lookup, linear scan or secondary array for index access.

  • Keep current array, add chaining: Use the existing array as the primary storage, but add chain pointers for collision resolution instead of open-addressing probing. Essentially a custom hybrid.

The parallel array approach is likely cleanest — it separates the two concerns (name lookup vs number lookup) explicitly.
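The parallel-array idea could be sketched like this. FileItem is trimmed to essentials, and the hash function and sizes are illustrative; the point is that chaining handles name lookup while the array keeps file numbers stable.

```c
#include <string.h>

/* Sketch of the parallel-array approach: chained hash for name lookup,
   plain array for stable file numbers. Types and sizes are stand-ins. */
typedef struct fileItem {
    const char *name;
    int fileNumber;
    struct fileItem *next;       /* already present in the real FileItem */
} FileItem;

enum { HASH_SIZE = 64, MAX_FILES = 1024 };
static FileItem *nameHash[HASH_SIZE];      /* chained lookup by name */
static FileItem *filesByNumber[MAX_FILES]; /* O(1) lookup by number */
static int fileCount;

static unsigned hashName(const char *s) {
    unsigned h = 0;
    while (*s) h = h * 31 + (unsigned char)*s++;
    return h % HASH_SIZE;
}

static int addFile(FileItem *item) {
    unsigned h = hashName(item->name);
    item->fileNumber = fileCount;
    item->next = nameHash[h];              /* chain: no overflow, deletable */
    nameHash[h] = item;
    filesByNumber[fileCount] = item;       /* number stays a stable index */
    return fileCount++;
}

static FileItem *lookupByName(const char *name) {
    for (FileItem *f = nameHash[hashName(name)]; f != NULL; f = f->next)
        if (strcmp(f->name, name) == 0)
            return f;
    return NULL;
}
```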

16.4.5. Benefits

  • One hash pattern instead of two: Reduces cognitive load and maintenance surface

  • Real deletion: No more isDeleted workarounds — entries can be properly removed

  • No overflow risk: Chaining grows naturally; no fatal error on high fill

  • Consistent codebase: All hash tables use the same pattern

16.4.6. Notes

  • The isDeleted field and markFileAsDeleted() function were added during TDD of projectstructure.c. They work as an interim solution but add complexity to every iteration over the file table (must check isDeleted).

  • Migration should be done after the lightweight scanning integration is complete, since it touches the same code.

16.5. Split the Editor Module

16.5.1. Problem Statement

The editor.c module (630+ lines) bundles at least four unrelated responsibilities under the misleading name "editor". The name suggests it’s about the external editor (Emacs), but most of the code has nothing to do with editor integration — it’s internal text manipulation infrastructure used by refactoring.

16.5.2. Current Responsibilities

1. Text memory allocator (editorMemory[], allocateNewEditorBufferTextSpace, freeTextSpace)

A power-of-2 block allocator for buffer text content. Custom malloc pool with free lists indexed by size class. No relationship to "editing" — this is a memory management concern.

2. In-buffer text editing (replaceStringInEditorBuffer, moveBlockInEditorBuffer, removeBlanksAtEditorMarker)

The refactoring engine’s hands — insert, delete, replace, and move text within buffers, with undo tracking and marker adjustment. This is the actual "editing" code, but it operates on the server’s internal buffers, not the external editor.

3. Reference/marker coordinate conversion (convertReferencesToEditorMarkers, convertEditorMarkersToReferences)

Bridges two coordinate systems: the reference world (file number, line, column) and the buffer world (buffer pointer, byte offset). Walks buffer text to convert line/col to offset and vice versa. Used by refactoring to go from "where are the references" to "where do I edit".
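One direction of the conversion (line/column to byte offset) can be sketched as a plain walk over the buffer text. The 1-based line and 0-based column convention used here is an assumption for illustration, as is the function name.

```c
#include <stddef.h>

/* Sketch of line/column -> byte offset by walking the buffer text.
   Conventions (1-based line, 0-based column) are illustrative. */
static long offsetForLineAndColumn(const char *text, size_t length,
                                   int line, int column) {
    int currentLine = 1;
    for (size_t offset = 0; offset < length; offset++) {
        if (currentLine == line)
            return (long)offset + column;  /* caller stays within the line */
        if (text[offset] == '\n')
            currentLine++;
    }
    return -1;                             /* line beyond end of buffer */
}
```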

4. Content buffer lifecycle (loadFileIntoEditorBuffer, loadAllOpenedEditorBuffers, closeAllEditorBuffers, quasiSaveModifiedEditorBuffers, editorFileModificationTime)

Buffer loading, closing, quasi-save (mtime bookkeeping for refactoring), and file-stat wrappers that check buffers before falling back to filesystem. This is the content buffer management that belongs with the ContentBuffer type.

16.5.3. The Naming Problem

The "Editor" prefix pervades the codebase beyond just this module:

  • EditorMarker / EditorMarkerList — positional bookmarks into mutable buffer text

  • EditorRegion — a span between two markers

  • EditorUndo — undo records for text operations

None of these are about the external editor. They’re the server’s internal text manipulation primitives. TextMarker, TextRegion, TextUndo would be more accurate names, but renaming has cascading effects throughout the codebase.

16.5.4. Proposed Split

  • textalloc.c — Power-of-2 block allocator for buffer text: allocateNewEditorBufferTextSpace, freeTextSpace, editorMemory[]

  • textedit.c — In-buffer text manipulation with undo and marker tracking: replaceStringInEditorBuffer, moveBlockInEditorBuffer, removeBlanksAtEditorMarker

  • markerconvert.c — Reference ↔ marker coordinate conversion: convertReferencesToEditorMarkers, convertEditorMarkersToReferences

  • contentbuffer.c (existing) — Buffer lifecycle, loading, closing, quasi-save: loadFileIntoEditorBuffer, loadAllOpenedEditorBuffers, closeAll*, quasiSave*, editorFile*

editorInit() would move to wherever initialization belongs (it currently just calls initEditorBufferTable()).

editorMapOnNonExistantFiles is a completion helper that iterates all buffers to find files existing in buffers but not on disk. It belongs with completion or content buffer management.

16.5.5. Notes

  • The split is purely mechanical — no behavior change, no renaming. Each group of functions has minimal coupling to the others.

  • The misleading "Editor" prefix on types (EditorMarker, EditorUndo, etc.) is a separate concern. Renaming those types has large cascading effects and is not worth doing until the refactoring performance issue is resolved (currently each rename requires two full project parses via callXref).

  • editor.h currently re-exports headers for contentbuffer.h, editormarker.h, undo.h — after the split, callers would include the specific headers they need.

16.6. Rename Server Operations

16.6.1. Problem Statement

Server operations use the prefix olcx (likely "on-line cross-references"), a legacy abbreviation that conveys no meaning to anyone reading the code or protocol today. Option names like -olcxpush, -olcxtagsearch, -olcxgetprojectname are opaque compared to what they could be: -push, -search, -get-project-name.

The word "tag" in search-related names (-olcxtagsearch, c-xref-search-in-tag-file, c-xref-tag-results-buffer) is a leftover from ctags and friends. There is no ctags compatibility — the term is purely misleading. The two menu entries "Search Definition in Tags" and "Search Symbol" both trigger the same OP_SEARCH operation, differing only in a filter flag (-searchdef). The word "tag" suggests they are fundamentally different, when they are not.

16.6.2. Scope

The rename touches three layers:

Protocol options (server C code + Emacs client): ~60 operations prefixed with -olcx* in options.c. These become plain descriptive names: -push, -pop, -goto, -complete, -search, -rename, -extract, -filter, -get-project-name, etc.

Emacs function and variable names: Functions like c-xref-search-in-tag-file, c-xref-get-tags, c-xref-interactive-tag-search-* and variables like c-xref-tag-results-buffer, c-xref-tag-search-mode-map.

System tests: Every commands.input file uses these option names. Mechanical update.

16.6.3. Design Decisions

Shortest clear name: Prefer -push over -browse-push. Stack operations (push, pop) are domain concepts — without understanding them you can’t understand browsing anyway. No grouping prefix needed.

Two kinds of filter: The browsing UI has two filter concepts that should remain distinct:

  • Reference filter (-filter): narrows the reference list for the selected symbol by usage kind (all → exclude reads → definitions only)

  • Menu filter (-menu-filter): narrows the symbol menu when a name is ambiguous (all with name → exact match → exact match same file)

These could become -reference-filter and -symbol-filter for extra clarity, but -filter and -menu-filter are adequate.

No compatibility shim needed: Distribution is via git repo updates, and the Emacs upgrade menu entry kills the server, reloads elisp, and starts a new server on the next operation. Old and new never need to coexist.

16.6.4. Implementation

Support both old and new names in the option parser temporarily (two strcmp entries per option). Rename everywhere else (client, tests, docs). Remove the old names in a later commit. This eliminates any breakage window.
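The transitional dual-name matching could be sketched as a simple alias table. The table contents and operation codes are illustrative; the real parser would carry one such pair per option until the old names are retired.

```c
#include <string.h>
#include <stddef.h>

/* Sketch of transitional matching: each operation temporarily accepts
   both its legacy -olcx* spelling and the new plain name. Contents and
   operation codes are illustrative. */
typedef struct { const char *oldName; const char *newName; int operation; } OptionAlias;

static const OptionAlias aliases[] = {
    {"-olcxpush",      "-push",   1},
    {"-olcxtagsearch", "-search", 2},
};

static int operationForOption(const char *arg) {
    for (size_t i = 0; i < sizeof(aliases) / sizeof(aliases[0]); i++)
        if (strcmp(arg, aliases[i].oldName) == 0
            || strcmp(arg, aliases[i].newName) == 0)
            return aliases[i].operation;
    return 0;   /* unknown option */
}
```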

16.6.5. Priority

Low urgency, high value for readability. Can be done incrementally — start with the "tag" removal (search-related names only), then expand to the full olcx rename later.

17. Insights

This chapter contains notes of all insights, large and small, that I make as I work on this project. These insights should at some point be moved to some other, more structured, part of this document. But rather than trying to find a structure where each new finding fits, I’m making it easy to just dump them here. We can refactor these into a better and better structure as we go.

17.1. Yacc semantic data

As usual, a Yacc grammar requires each non-terminal to have a type. Those types are named after the types of data they collect and propagate. The names always start with ast_ followed by the data type. For example, if some non-terminal needs to propagate a Symbol and a Position, that structure would be called ast_symbolParameterPair ("Pair" being thrown in there for good measure…​).

Each of those structures also always carries a begin and end position for that construct. That means that any "ast" struct has three fields: begin, end and the data. The data is sometimes a struct, like in this case, but can also be a single value, like an int or a pointer to a Symbol.
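The described shape could look roughly like this sketch; Position and Symbol are stand-ins for the real types, and the field layout is inferred from the description above.

```c
/* Sketch of the described ast_* shape; Position and Symbol are
   stand-ins for the real types. */
typedef struct { int file, line, col; } Position;
typedef struct symbol Symbol;    /* opaque here */

typedef struct {
    Position begin;              /* where the construct starts */
    Position end;                /* where it ends */
    struct {
        Symbol  *symbol;
        Position position;
    } data;                      /* the propagated Symbol/Position pair */
} ast_symbolParameterPair;
```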


17.2. Navigation Architecture and Preloading

Date: 2025-12-22, updated 2026-02-21

17.2.1. How Symbol Navigation Works

Symbol navigation (PUSH/NEXT/PREVIOUS/POP) merges references from two sources:

  1. Disk CXrefs Database - Reflects saved files on disk

  2. In-Memory ReferenceableItem Table - Reflects current server state including preloaded editor buffers

When you PUSH on a symbol, the navigation menu creation (createSelectionMenuForOperation) does:

  1. Load from disk - Scans CXrefs files for the symbol (via scanReferencesToCreateMenu)

    • Creates menu items with disk-based line numbers

  2. Merge from memory - Maps over the in-memory table (via putOnLineLoadedReferences)

    • Adds references from parsed/preloaded buffers with current line numbers

    • Duplicate detection prevents the same reference appearing twice

  3. Build session - Copies merged references to the navigation session

This dual-source approach allows navigation without full project parse while providing updated positions for modified files.
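The duplicate detection in step 2 can be modelled as a position-keyed membership check: a memory-sourced reference is merged only if no entry from the disk pass already covers the same position. Ref and the fixed-size array are stand-ins for the real session structures.

```c
#include <stdbool.h>

/* Mini-model of step 2 duplicate detection; Ref and the fixed-size
   array are stand-ins for the real session structures. */
typedef struct { int file, line, col; } Ref;

enum { MAX_REFS = 16 };
static Ref merged[MAX_REFS];
static int mergedCount;

static bool samePosition(Ref a, Ref b) {
    return a.file == b.file && a.line == b.line && a.col == b.col;
}

static bool addIfAbsent(Ref ref) {
    for (int i = 0; i < mergedCount; i++)
        if (samePosition(merged[i], ref))
            return false;        /* already present from the disk pass */
    if (mergedCount >= MAX_REFS)
        return false;            /* stand-in capacity limit */
    merged[mergedCount++] = ref;
    return true;
}
```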

Client preloading behavior and stale file detection are now documented in Chapter 8: Components under Server (Entry-Point Reparse) and Editor Extension (Preloading).

17.2.2. Architectural Invariants

MUST be maintained:

  • Disk CXrefs = State of files on disk (from last tags generation)

  • ReferenceableItem Table = Disk state + preloaded editor buffers

  • Session references = Snapshot at PUSH time, refreshed on stale detection during NEXT/PREVIOUS

17.3. RefactoryMode Internally Calls XrefMode

Date: 2025-01-10

This section describes the refactoring architecture. The callXref() pattern described here is still used by refactoring operations. However, the server mode description below is partially outdated — the server now reparses stale preloaded files at entry and parses discovered CUs at startup (ADR-0020). See Chapter 15: Roadmap for the path toward eliminating callXref() from refactoring as well.

17.3.1. The Key Architectural Insight (Current Architecture)

RefactoryMode runs XrefMode as an internal sub-task to update the reference database before performing refactorings.

This is critical to understand because:

  • Refactorings need current cross-reference data to find all symbol occurrences

  • Server mode never creates/updates references (it only serves queries)

  • RefactoryMode is a separate process invocation, not part of the interactive server

17.3.2. How It Works

When a refactoring is invoked:

refactory() [refactory.c:1337]
  │
  ├─ 1. Compute update strategy based on symbol scope/visibility
  │    updateOption = computeUpdateOptionForSymbol(point)
  │      • Local symbols: "" (no update - not in database)
  │      • Header symbols: "-update" (full update required)
  │      • Multi-file global: "-fastupdate" (incremental)
  │      • Single-file global: "" (no update needed)
  │
  ├─ 2. Perform the refactoring operation
  │    renameAtPoint() / parameterManipulation() / etc.
  │
  └─ 3. Update references via internal XrefMode call
       ensureReferencesAreUpdated(project) [line 146]
         ├─ Save current options
         ├─ Build argument vector with updateOption
         ├─ mainTaskEntryInitialisations(args)
         └─ callXref(args, true)  ← NESTS XREF MODE!

17.3.3. Why Refactorings Are Separate Processes

The Emacs client spawns a separate RefactoryMode process rather than using the interactive server because:

  1. Xref update can be slow - Would block the interactive server for user operations

  2. Options isolation - The nested callXref() "messes all static variables including options" (code comment, line 143-145)

  3. Memory requirements - Refactorings need more memory than interactive operations (see mainTaskEntryInitialisations allocation logic)

17.3.4. The Three Modes Relationship

ServerMode (c-xref -server)
  - Loads .cx snapshot at startup, parses stale CUs
  - Reparses stale preloaded files before each request (ADR 20)
  - Long-running interactive process
  - Handles: completion, goto-definition, push/pop navigation

XrefMode (c-xref -create / -update)
  - Creates and updates .cx reference files
  - Batch operation over all scheduled files
  - Exits when complete

RefactoryMode (c-xref -refactory ...)
  - Separate one-shot process per refactoring
  - INTERNALLY calls XrefMode (via callXref) to update references
  - Applies source code edits
  - Exits when complete

17.3.5. Code Evidence

// From refactory.c:143-145
// be very careful when calling this function as it is messing all static variables
// including options, ...
// call to this function MUST be followed by a pushing action, to refresh options
void ensureReferencesAreUpdated(char *project) {
    // ...
    deepCopyOptionsFromTo(&options, &savedOptions);

    // Build xref arguments including update option
    argumentVector[argumentCount++] = updateOption;

    // Re-initialize as if starting fresh xref task
    mainTaskEntryInitialisations(args);

    // THE KEY CALL: Run XrefMode nested inside RefactoryMode
    callXref(args, true);

    // Restore options after the nested task
    deepCopyOptionsFromTo(&savedOptions, &options);
}

17.3.6. Implications

  • You cannot do refactorings in ServerMode - the architecture doesn’t support it

  • ServerMode never receives -update or -create - those are XrefMode/RefactoryMode only

  • Refactorings are always consistent - they get fresh reference data before executing

  • Performance trade-off - Smart update strategies (local vs header vs multi-file) minimize update cost

  • Process isolation - Separate RefactoryMode process prevents server state corruption

17.4. The lastParsedMtime Optimization and Refactoring Conflict

Date: 2025-01-21

17.4.1. Problem Discovery

When implementing staleness detection for navigation (refreshing references when files are modified), we added an optimization to server.c:

// After parsing with preload, update lastParsedMtime
if (buffer != NULL && buffer->preLoadedFromFile != NULL) {
    FileItem *fileItem = getFileItemWithFileNumber(parsingConfig.fileNumber);
    fileItem->lastParsedMtime = buffer->modificationTime;
}

This optimization prevents navigation from thinking a file is "stale" immediately after parsing it with preloaded content.

17.4.2. The Conflict

This broke refactoring operations (test_delete_parameter_with_preload). The problem:

  1. Server parses with preload → In-memory references have correct positions (e.g., line 3)

  2. lastParsedMtime updated → File marked as "freshly parsed"

  3. Refactoring triggers callXref() → xref checks timestamps

  4. xref sees editorFileModificationTime() == lastParsedMtime → Concludes "file already indexed"

  5. xref skips re-indexing → Persistent .cx keeps old positions (e.g., line 2)

  6. Refactoring uses stale .cx data → Wrong positions, operation fails

17.4.3. Root Cause: Two Sources of Truth

The architecture has two sources of references:

  1. In-memory references (ReferenceableItemTable) - populated by parsing, used by navigation

  2. Persistent .cx database - populated by xref, used by refactoring

The lastParsedMtime optimization conflated these:

  • It correctly told navigation "in-memory is current"

  • But it incorrectly told xref "persistent database is current" (when it wasn’t)

17.4.4. The Workaround

Exclude refactoring operations from the optimization:

if (buffer != NULL && buffer->preLoadedFromFile != NULL
    && options.serverOperation != OP_RENAME
    && options.serverOperation != OP_ARGUMENT_MANIPULATION
    && options.serverOperation != OP_SAFETY_CHECK) {
    fileItem->lastParsedMtime = buffer->modificationTime;
}

This allows:

  • Navigation to use the optimization (in-memory is sufficient)

  • Refactoring to trigger xref re-indexing (reads preload, updates .cx correctly)

17.4.5. Architectural Lesson

This workaround highlights the fundamental tension: navigation and refactoring use different sources of truth. The proper fix is ADR-0014's unified on-demand architecture where both use the same in-memory reference database, eliminating the need for callXref() entirely.

  • ADR-0014: Adopt On-Demand Parsing Architecture

  • Section 17.2: Navigation Architecture and Preloading

  • Section 17.3: RefactoryMode Internally Calls XrefMode (describes current architecture)

17.5. Stale File Refresh and Cross-File References

Date: 2025-01-29 (Updated: 2026-01-29)

17.5.1. Stale Detection

When navigating (PUSH/NEXT), the server detects if the current file is "stale" by comparing:

  • fileItem→lastParsedMtime (from disk database or last parse)

  • buffer→modificationTime (from preloaded editor buffer)

If buffer→modificationTime > lastParsedMtime, the file is considered stale and refreshStaleReferencesInSession() is called.

17.5.2. The Refresh Algorithm

refreshStaleReferencesInSession() in cxref.c must preserve cross-file references while updating references from the modified file:

  1. removeReferenceableItemsForFile() - remove old refs from in-memory table

  2. parseToCreateReferences() - re-parse with preload, add fresh refs to in-memory table

  3. Remove stale-file refs from menu - preserves cross-file refs, removes outdated positions

  4. extendBrowsingMenuWithReferences() - merge fresh refs from in-memory table

  5. Mark file as freshly parsed (lastParsedMtime = buffer→modificationTime)

  6. recomputeSelectedReferenceable() - rebuild session’s reference list from updated menu
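
The six steps can be summarized in a hedged, simplified sketch (the names follow the text above, but the bodies and signatures are stand-ins, not the real cxref.c code):

```c
#include <time.h>

typedef struct { time_t modificationTime; } EditorBuffer;
typedef struct { time_t lastParsedMtime; } FileItem;

/* Stand-in stubs for the operations named in steps 1-4 and 6. */
static void removeReferenceableItemsForFile(int fileNumber) { (void)fileNumber; }
static void parseToCreateReferences(int fileNumber) { (void)fileNumber; }
static void removeStaleFileRefsFromMenu(int fileNumber) { (void)fileNumber; }
static void extendBrowsingMenuWithReferences(void) {}
static void recomputeSelectedReferenceable(void) {}

static void refreshStaleReferencesInSession(int fileNumber, FileItem *item,
                                            EditorBuffer *buffer) {
    removeReferenceableItemsForFile(fileNumber);      /* 1: drop old in-memory refs */
    parseToCreateReferences(fileNumber);              /* 2: re-parse with preload */
    removeStaleFileRefsFromMenu(fileNumber);          /* 3: keep cross-file refs */
    extendBrowsingMenuWithReferences();               /* 4: merge fresh refs */
    item->lastParsedMtime = buffer->modificationTime; /* 5: mark freshly parsed */
    recomputeSelectedReferenceable();                 /* 6: rebuild session list */
}
```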

Key insights:

  • The menu already has cross-file refs from the original PUSH operation (which scanned disk)

  • We do NOT clear menu refs and re-scan from disk during refresh

  • Scanning from disk creates problems: addReferenceableToBrowsingMenu creates NEW menu items when includeFileNumber differs, and these new items have selected=false

  • Only SELECTED menu items contribute refs to the session’s reference list

  • So scanning would add refs to new (unselected) items, leaving the original (selected) item empty

17.5.3. Why Scanning From Disk Fails

During PUSH, createSelectionMenu creates menu items and adds them to sessionEntry→menu. These items are marked as selected. When you NEXT, the menu’s reference list is what gets used for navigation (via recomputeSelectedReferenceable).

If during refresh we were to:

  1. Clear menu refs

  2. Scan from disk

The scan calls createSelectionMenu which in turn calls addReferenceableToBrowsingMenu. This function compares incoming items by (linkName, includeFileNumber, type). If the includeFileNumber from the disk scan differs from the existing menu item’s value, a new menu item is created with selected=false.

The refs are then added to this new unselected item, while the original selected item has empty refs. recomputeSelectedReferenceable only processes selected items, so navigation loses most of its references.

17.5.4. Architectural Note

This is a manifestation of the "two sources of truth" architecture. The long-term fix (per roadmap) is to parse all project files at startup, keeping complete references in memory. Then stale refresh would just re-parse the single file and the in-memory table would already have all cross-file references.

17.6. Unified In-Memory Table Discovery

Date: 2025-01-25

Investigation revealed that both XrefMode and ServerMode use the same in-memory ReferenceableItemTable. This is the key enabler for the incremental path to "Memory as Truth" (see Chapter 15: Roadmap).

handleFoundSymbolReference() in cxref.c handles references found during parsing. It calls isMemberInReferenceableItemTable() to check if a symbol exists; if not found, creates a new entry; if found, adds a reference to the existing entry. This same table is used by both modes — there is no separate "server table" vs "xref table".
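
The check-then-create-or-extend logic can be sketched as a tiny upsert (a hypothetical miniature; the real ReferenceableItemTable is a hash table with much richer entries):

```c
#include <string.h>

/* Miniature of the table upsert: look the symbol up, create an
   entry if missing, otherwise add a reference to the existing one. */
typedef struct {
    char linkName[32];
    int referenceCount;
} Item;

static Item *findItem(Item *table, int count, const char *name) {
    for (int i = 0; i < count; i++)
        if (strcmp(table[i].linkName, name) == 0)
            return &table[i];
    return NULL;
}

/* Returns the new entry count. */
static int addReference(Item *table, int count, const char *name) {
    Item *item = findItem(table, count, name);  /* membership check */
    if (item == NULL) {                         /* not found: new entry */
        strncpy(table[count].linkName, name, sizeof(table[count].linkName) - 1);
        table[count].linkName[sizeof(table[count].linkName) - 1] = '\0';
        table[count].referenceCount = 1;
        return count + 1;
    }
    item->referenceCount++;                     /* found: add a reference */
    return count;
}
```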

The disk scan operations (ensureReferencesAreLoadedFor, scanReferencesToCreateMenu, etc.) read from .cx files on disk and also call isMemberInReferenceableItemTable() to check/merge with existing entries. If parsing has already populated the table, these operations find data already present — making them effective no-ops once all files are parsed.

18. Refactoring Recipes

This chapter documents mechanical steps for refactoring operations. Each recipe describes the algorithmic steps that an automated refactoring tool would perform.

For detailed discussions of refactoring feature architecture and implementation phases, see Chapter 16a: Planned Refactoring Features.

18.1. Existing Refactorings

Refactorings that are already implemented and available.

18.1.1. Rename Symbol

Implemented. TBD.

18.1.2. Extract Function

Implemented. TBD.

18.1.3. Reorder Parameters

Implemented. TBD.

18.1.4. Make Function Static

Purpose: Convert functions that are only used within their compilation unit to static storage class for better encapsulation and compiler optimization.

When to use:

  • Function has external linkage but no external callers

  • Want to make implementation details explicit

  • Enable compiler optimizations and reduce global namespace pollution

Input:

  • Non-static function definition

  • All references to that function

Availability:

When cursor is on a non-static function definition where all callers are in the same file.

Algorithm:

  1. Check current storage class - Skip if already static

  2. Find definition and all references

    • Locate function definition (not declarations)

    • Collect all call sites across project

  3. Verify all references are local

    • For each reference (excluding definition itself):

      • Check if in same file as definition

      • If any reference is in different file, abort

    • Check if declared in header files (public API), abort if yes

  4. Apply transformation

    • Find beginning of function definition

    • Insert "static " before return type

Output:

  • Function marked as static

  • Compiler can optimize more aggressively

  • Clear signal that function is internal

Example:

// Before - helper function with external linkage
int helperCompare(const void *a, const void *b) {
    return *(int*)a - *(int*)b;
}

void publicSort(int *array, size_t n) {
    qsort(array, n, sizeof(int), helperCompare);
}

// After - explicitly internal
static int helperCompare(const void *a, const void *b) {
    return *(int*)a - *(int*)b;
}

void publicSort(int *array, size_t n) {
    qsort(array, n, sizeof(int), helperCompare);
}

Benefits:

  • Better encapsulation and code clarity

  • Enables inlining and other compiler optimizations

  • Smaller symbol tables, no name collisions

  • Safe to refactor (can’t break external code)

Notes:

  • Similar to "Unused Symbols" detection but finds LOCAL-ONLY usage instead of NO usage

  • Cannot handle functions used via function pointers passed externally (requires manual verification)

18.2. Suggested Refactorings

Refactorings that have been proposed, designed, or partially implemented but are not yet available.

18.2.1. Move Type to New File

Input:

  • Type name to move

  • Source file containing type definition

  • Target file (new or existing)

Algorithm:

  1. Availability

    • Available when the selected symbol is a type; that symbol is the type to move

  2. Identify dependencies

    • Determine what types/macros the definition references/uses

  3. Create/update target file:

    • If new file: create with include guards and appropriate includes/forward declarations

    • If existing: open the file and find a suitable insertion location

  4. Move definition

    • Copy type definition to target file

    • Add necessary includes and forward declarations

  5. Replace in source file:

    • If target is new file: Replace type definition with #include "targetfile.h"

    • If target is existing file: Remove type definition, add #include "targetfile.h" if not already present in the source file

Output:

  • Type definition moved to target file

  • Source file includes the target file

  • Clean compilation

Notes:

  • For new header files, steps 3-5 are particularly simple: create the new header with the type and replace the definition in source with an include

  • For existing headers, must check if include is already present before adding

  • Forward declarations (e.g., struct foo;) are sufficient for pointer-only dependencies

  • Full type definitions or includes needed for non-pointer members
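
A minimal before/after illustration of the new-header case (file and type names are hypothetical):

```c
/* Before: point.c defines the type locally.
 *
 *     typedef struct Point { int x, y; } Point;
 *
 * After: the definition moves to a new header, point.h,
 * created with an include guard (step 3): */
#ifndef POINT_H
#define POINT_H

typedef struct Point {
    int x, y;
} Point;

#endif

/* ...and point.c replaces the definition with (step 5):
 *
 *     #include "point.h"
 */
```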


18.2.2. Introduce Semantic Type Aliases

Purpose: Make implicit semantic distinctions explicit by introducing type aliases for a single struct used for multiple purposes.

When to use:

  • A single struct/type is reused for semantically different purposes

  • Different usage contexts use different subsets of fields

  • Want to clarify intent without changing implementation

  • Want to prepare for future type divergence

Input:

  • Original type name (e.g., OlcxReferencesStack)

  • List of semantic contexts where type is used (e.g., "browser", "completion", "retrieval")

  • Target file for type aliases (new or existing header)

Algorithm:

  1. Analyze usage patterns - Identify distinct semantic contexts where type is used

    • Group usage sites by purpose/domain

    • Note which fields are used in each context

    • Verify that contexts are truly semantically different

  2. Create type aliases - In an appropriate header file, define semantic aliases:

    typedef OriginalType SemanticName1;
    typedef OriginalType SemanticName2;
    // etc.

  3. Update structure declarations - Change struct/variable declarations to use semantic types:

    • Data structure fields

    • Global variables

    • Static variables

  4. Update function signatures - Change function parameters to use semantic types:

    • Functions operating on specific context → specific alias

    • Generic functions operating on any context → generic alias (if created)

  5. Update call sites - Verify all usages compile with new types

  6. Verify - Compile to ensure type compatibility

Output:

  • Multiple type aliases for same underlying type

  • Declarations and signatures use semantic types

  • Intent clarified through type system

  • Foundation for future divergence

Example:

Given a "kitchen sink" struct used for three purposes:

// Before - single type for everything
typedef struct OlcxReferencesStack {
    OlcxReferences *top;
    OlcxReferences *root;
} OlcxReferencesStack;

typedef struct SessionData {
    OlcxReferencesStack browserStack;      // Uses: references, symbolsMenu
    OlcxReferencesStack completionsStack;  // Uses: completions
    OlcxReferencesStack retrieverStack;    // Uses: completions
} SessionData;

void pushEmptySession(OlcxReferencesStack *stack);  // Generic

After introducing semantic aliases:

// After - semantic aliases make intent clear
typedef struct OlcxReferencesStack {
    OlcxReferences *top;
    OlcxReferences *root;
} OlcxReferencesStack;

// Semantic aliases
typedef OlcxReferencesStack ReferencesStack;   // Generic
typedef OlcxReferencesStack BrowserStack;      // For navigation
typedef OlcxReferencesStack CompletionStack;   // For completion
typedef OlcxReferencesStack RetrieverStack;    // For search

typedef struct SessionData {
    BrowserStack    browserStack;
    CompletionStack completionsStack;
    RetrieverStack  retrieverStack;
} SessionData;

void pushEmptySession(ReferencesStack *stack);  // Generic operation

Benefits:

  • Intent is immediately clear from type names

  • No runtime or ABI changes (aliases compile to same type)

  • Can add domain-specific operations per type later

  • Enables gradual migration toward separate types if needed

Notes:

  • Particularly useful in C where classes/interfaces are unavailable

  • Type aliases are compile-time only - no runtime overhead

  • Can coexist with original type name during migration

  • Common pattern when refactoring legacy C code


18.2.3. Rename Included File

Purpose: Rename a file appearing in an include and update all the include directives

When to use:

  • A (header) file is inappropriately named

  • In the process of renaming a complete C "module" this is one step (until c-xrefactory can do all of that)

Input:

  • The old and new file names

  • All #include locations for the old file

Availability:

When the cursor is on an #include directive. The file it references will be the "source".

Algorithm:

  1. Rename the source file to the destination

  2. Update all include locations

    • This will often include multiple locations

Output:

  • New header file

  • All #include directives updated


18.2.4. Move Function to Different File

See Chapter 16a: Planned Refactoring Features for detailed design and implementation status.

Proposed refactoring to move a function definition from one C source file to another while automatically managing visibility (static vs extern) and potentially adding necessary declarations and includes.

Status: Phase 1 MVP complete.


18.2.5. Turn include guard into pragma once

Tentative.


18.2.6. Change return type

Tentative.


19. Archive

In this section you can find descriptions and saved texts about how things were before. They are no longer true, since that quirk, magic or bad coding is gone, but they are kept here as an archive for those wanting to trace back to the original sources.

19.1. Memory strategies

There was a multitude of specialized memory allocation functions. In principle there were two types, static and dynamic. The dynamic ones could be extended using an overflow handler.

Also one type had a struct where the actual area was extended beyond the actual struct. This was very confusing…​

19.1.1. Static memory allocation

Static memory (SM_ prefix) consists of static areas allocated by the compiler, each indexed using a similarly named index variable (e.g. ftMemory and ftMemoryIndex), something the macros took advantage of. These are:

  • ftMemory

  • ppmMemory

  • mbMemory

One special case of static memory also exist:

  • stackMemory - synchronous with program structure and has CodeBlock markers, so there is a special stackMemoryInit() that initializes the outermost CodeBlock

These areas cannot be extended; when one overruns, the program stops.
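
The pattern can be sketched like this (a hedged reconstruction; the real area names existed, but the sizes, macros and error handling shown here are made up):

```c
#include <stdio.h>
#include <stdlib.h>

/* Sketch of an SM_-style area: a fixed, compiler-allocated buffer
   paired with a similarly named index variable. */
static char ftMemory[4096];
static int ftMemoryIndex = 0;

static void *ftAlloc(int size) {
    if (ftMemoryIndex + size > (int)sizeof(ftMemory)) {
        /* The area cannot be extended: an overrun stops the program. */
        fprintf(stderr, "ftMemory overflow\n");
        exit(1);
    }
    void *p = &ftMemory[ftMemoryIndex];
    ftMemoryIndex += size;
    return p;
}
```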

19.2. Trivial Prechecks

The refactorer can call the server using parseBufferUsingServer() and add some extra options (in text form). One example is setMovingPrecheckStandardEnvironment(), which calls the server with -olcxtrivialprecheck.

However, parseBufferUsingServer() uses callServer(), which never calls answerEditAction().

In answerEditAction() the call to the (unused) olTrivialRefactoringPreCheck() also requires an options.trivialPreCheckCode, which is neither sent by setMovingPrecheckStandardEnvironment() nor parsed by processOptions().

My only guess is that previously all prechecks were handled by the -olcxtrivialprecheck option in calls to the server, and that they have since moved into their respective refactorings.

This theory should be checked by looking at the original source of the precheck functions and compare that with any possible checks in the corresponding refactoring code.

19.3. Caching System

The caching system described below has been archived as it is no longer part of the current architecture.

c-xrefactory included a sophisticated caching system that enabled incremental parsing by caching parsed input streams, parser state, and file modification tracking. This optimization allowed faster re-analysis when only portions of source files had changed. It also allowed the system to detect out-of-memory situations and to discard, flush and re-use memory during file processing.

19.3.1. Core Design Principles

Cache Point Model: The system placed strategic snapshots of parser state at external definition boundaries (functions, global variables, etc.). When files were re-processed, the system could validate cache integrity, recover from cache points, and resume parsing only from the first changed definition onward.

Separation of Concerns: Recent refactoring had separated file tracking from cache validation:

  • updateFileModificationTracking() - Updated file timestamps without side effects

  • isFileModifiedSinceCached() - Pure validation function for cache integrity

19.3.2. Key Components

Cache Point Management (caching.c):

  • placeCachePoint(bool) - Placed strategic parser state snapshots

  • recoverFromCache() - Restored parser state from cache points

  • recoverCachePointZero() - Reset to initial cache state

File Modification Tracking:

The FileItem structure maintained multiple timestamp fields for tracking file modification:

struct FileItem {
    time_t lastModified;    // File's actual modification time
    time_t lastInspected;   // When we last checked the file
    // ... other fields
};

Input Stream Caching:

  • cacheInput() - Cached tokenized input from lexer

  • cachingIsActive() - Checked if caching was currently enabled

  • activateCaching() / deactivateCaching() - Controlled caching state

19.3.3. Parser Integration

Parser Integration: Both the C and Yacc parsers placed cache points after each external_definition, but only when not processing include files (includeStack.pointer == 0).

Parser-Specific Behavior:

  • C Parser: Full caching enabled with regular cache point placement

  • Yacc Parser: Explicitly deactivated caching via deactivateCaching() but still placed strategic cache points

  • Include Files: Cache points skipped during include processing

19.3.4. System Dependencies

The caching system was deeply integrated throughout the parsing pipeline:

Component       Functions Used                                              Purpose

main.c          initCaching(), activateCaching(), recoverCachePointZero()   Lifecycle control

lexer.c         cacheInput(), cachingIsActive(), deactivateCaching()        Input processing

yylex.c         updateFileModificationTracking()                            File tracking

filetable.c     updateFileModificationTracking()                            File management

xref.c          recoverFromCache(), recovery functions                      Cross-reference coordination

c_parser.y      placeCachePoint()                                           C grammar integration

yacc_parser.y   deactivateCaching(), placeCachePoint()                      Yacc grammar integration

19.3.5. Performance Characteristics

Cache Hit Scenarios:

  1. Full Cache Hit: No file modifications since last parse - parser state recovered from cache point zero with minimal re-processing

  2. Partial Cache Hit: File modified after Nth definition - recovery from cache point N with re-parsing only from point of change onward

  3. Cache Miss: File structure changed or timestamps invalid - full re-parse with new cache points placed

Optimization Benefits:

  • Memory usage scales with number of definitions, not file size

  • File modification checking minimizes unnecessary re-reads

  • Input stream caching reduces lexer overhead

  • Strategic cache point placement enables clean recovery at definition boundaries

19.4. HUGE Memory

Previously a HUGE model was also available (by re-compilation) to reach file numbers, lines and columns above 22 bits. But if you have more than 4 million lines (or columns!) you should probably do something radical before attempting cross referencing and refactoring.

19.5. Bootstrapping

19.5.1. BOOTSTRAP REMOVED!

Once the FILL-macros were removed, we could move the enum-generation to use the actual c-xref. So from now on we build c-xref directly from the sources in the repo. Changes to any enums will trigger a re-generation of the enumTxt-files, but since the enumTxt-files only convert enum values to strings, any mismatch will not prevent compilation, and a manual update would even be possible. This is a big improvement over the previous situation!

19.5.2. FILLs REMOVED!

As indicated in FILL macros the bootstrapping of FILL-macros has finally and fully been removed.

Gone is also compiler_defines.h, which was simply removed without any obvious adverse effects. Maybe that will come back and bite me when we move to platforms other than Linux and macOS…​

What is left, at this point, is only the enumTxt generation, so most of the text below is kept for historical reasons.

19.5.3. Rationale

c-xref uses a load of structures, and lists of them, that need to be created and initialized in a lot of places (such as the parsers). To make this somewhat manageable, c-xref itself parses the structures and generates macros that can be used to fill them with one call.

c-xref is also bootstrapped into reading in a lot of predefined header files to get system definitions as "preloaded definitions".

Why this pre-loading was necessary, I don’t exactly know. It might be an optimization, or an idea that was born early and then just kept on and on. In any case it adds extra complexity to building and maintaining c-xref, and to its structure.

So this must be removed, see below.

19.5.4. Mechanism

The bootstrapping uses c-xref's own capability to parse C code: it parses those structures and spits out filling macros, and some other stuff.

This is done using options like -task_regime_generate, which prints a lot of data structures on standard output; the Makefile then turns this into generated versions of strFill, strTdef (no longer exists) and enumTxt.

The process starts with building a c-xref.bs executable from checked in sources. This compile uses a BOOTSTRAP define that causes some header files to include pre-generated versions of the generated files (currently strFill.bs.h and enumTxt.bs.h) which should work in all environments.

If you change the name of a field in a structure that is subject to FILL-generation, you will need to manually update strFill.bs.h, but a "make cleaner all" will show you where those places are.

After the c-xref.bs has been built, it is used to generate strFill and enumTxt which might include specific structures for the current environment.

HOWEVER: if FILL macros are used for structures which differ between platforms, say a FILE structure, that FILL macro will have a different number of arguments, so I’m not sure how smart this "smart" generation technique actually is.

TODO: Investigate alternative approaches to this generate "regime", perhaps move to a "class"-oriented structure with initialization functions for each "class" instead of macros.

19.5.5. Compiler defines

In options.h there are a number of definitions which somehow are sent to the compiler/preprocessor, or used so that standard settings are the same as if a program were compiled using the standard compiler on the platform. At this point I don’t know exactly how this conversion from C declarations to compile-time definitions is done; maybe they are just entered as symbols in one of the many symbol tables?

Typical examples include "__linux" but also on some platforms things like "fpos_t=long".

I’ve implemented a mechanism that uses "gcc -E -dM" to print out and catch all compiler defines in compiler_defines.h. This was necessary because of such definitions on Darwin which were not among the "pre-programmed" ones.

TODO?: As this is a more general approach it should possibly completely replace the "programmed" ones in options.c?

19.5.6. EnumTxt generation REMOVED!

To be able to print the string values of enums, the module generate.c (called when the regime was RegimeGenerate) could also generate string arrays for all enums. By replacing that with some pre-processor magic for the few that were actually needed (mostly in log_trace() calls), we could do away with that whole "generate" functionality too.

19.5.7. enumTxt

For some cases the string representing the value of an enum is needed. c-xref handled this using the "usual" 'parse code and generate' method; the module generate.c did this generation too.

19.5.8. Include paths

Also in options.h some standard-like include paths are added, but there is a better attempt in getAndProcessGccOptions() which uses the compiler/preprocessor itself to figure out those paths.

TODO?: This is much better and should really be the only way, I think.

19.5.9. Problems

Since at bootstrap there must exist FILL macros with the correct field names, this strategy is an obstacle to cleaning up the code, since every field is referenced in the FILL macros. When a field (in a structure which is filled using a FILL macro) changes name, initial compilation becomes impossible until the name of that field is also changed in the strFill.bs.h file.

One way to handle this is of course to use c-xrefactory itself and rename fields. This requires that the project settings also include a pass with BOOTSTRAP set, which it does.

19.5.10. Removing

I’ve started removing this step. In TODO.org I keep a hierarchical list of the actions to take (in a Mikado kind of style).

The basic strategy is to start with structures that no other structure depends on. Using the script utils/struct2dot.py you can generate a DOT graph that shows those dependencies.

Removal can be done in a couple of ways:

  1. If it’s a very small structure you can replace a call to a FILL_XXX() macro with a compound literal.

  2. A better approach is usually to replace it with a fillXXX() function, or even better with a newXXX(), if it is consistently preceded by an allocation (in the same memory!). To see which fields vary you can grep all such calls, make a CSV file from that, and compare all rows.

19.5.11. strTdef.h

strTdef.h was generated using the option -typedefs as a part of the old -task_regime_generate strategy, and contained typedef declarations for all types found in the parsed files.

I also think that you could actually merge the struct definitions with the typedefs so that strTdef.h would not be needed. But it seems that this design exists because the structure references in proto.h do not form an acyclic graph, and the loops make that impossible. Instead the typedefs are included before the structs:

#include "strTdef.h"
struct someNode {
    S_someOtherNode *this;
    ...
};
struct someOtherNode {
    S_someNode *that;
    ...
};

This is now idiomatically solved using the structs themselves:

struct someNode {
    struct someOtherNode *this;
    ...
};
struct someOtherNode {
    struct someNode *that;
    ...
};

19.6. FILL macros

The FILL macros are now fully replaced by native functions or some other, more refactoring-friendly, mechanism. Yeah!

During bootstrapping a large number of macros named __FILL_xxxx were created. The intent was that you could fill a complete structure with one call, somewhat like a constructor, but here it was used more generally every time a complex struct needed to be initialized.

There were even _FILLF_xxx macros which allowed filling fields in sub-structures at the same time.

This was, in my mind, another catastrophic hack that made understanding, and refactoring, c-xrefactory such a pain. Not to mention the extra bootstrap step.

I just discovered the compound literals of C99, and I’ll experiment with replacing some of the FILL macros with compound literal assignments instead.

FILL_symbolList(memb, pdd, NULL);

could become (I think):

memb = (SymbolList){.d = pdd, .next = NULL};

If successful, it would be much better, since we could probably get rid of the bootstrap, but primarily it would be more explicit about which fields are actually necessary to set.

19.7. Users

The -user option has now been removed, both in the tool and in the editor adaptors, and with it one instance of a hashlist, the olcxTab, which is now a single structure, the sessionData.

There was an option called -user which Emacs set to the frame-id. To me that indicates that the concept was that for each frame you create, you get a different "user" with the c-xref server that you (Emacs) created.

The jEdit adapter seemed to do something similar:

options.add("-user");
options.add(s.getViewParameter(data.viewId));

Looking at the sources to find out when the function olcxSetCurrentUser() is called, it seems that you could have different completions, refactorings, etc. going on at the same time in different frames.

Completions etc. require user interaction, so they are not controlled solely by the editor itself. At first glance, though, the editor (Emacs) seems to block multiple refactorings and reference maintenance tasks from running at the same time.

This leaves just a few use cases for multiple "users", and I think it adds unnecessary complexity. By going for a more "one user" approach, like the model in the Language Server Protocol, this could really be removed.