Reproducible Research with LaTeX and Git

Technology | 18 May 2026

So far, this series has focused on code, models, and data.

There is one more piece that usually sits next to all of that, but is handled very differently. The paper.

Author: Hirad Emamialagha

While the engineering side at least has the tools to be structured (Git for versioning, pipelines for builds), the paper often follows a completely different workflow. Files get passed around, edits are merged manually, and eventually, a PDF is produced and shared.

It works. But it does not really align with how the rest of the system is built.

That mismatch becomes more obvious when projects grow. You can trace a model back to a dataset and a configuration, but connecting that to the exact version of the paper that describes it is not always straightforward.

At some point, it starts to feel inconsistent.

Papers tend to lag behind the rest of the system, even though they describe it.

Treating Papers Like Part of the System

A research paper is not that different from the rest of the project.

You have source files, figures, references. You run a build step and get a PDF. That is not far from compiling code into an artifact.

Once you look at it this way, the gap is mostly in how it is managed.

Code lives in repositories, with history, structure, and some level of automation. Papers often live in folders, sometimes versioned, sometimes not, with a build process that depends on a local setup that is hard to reproduce.

So the idea is not complicated.

Keep the paper in version control. Treat the source as the main artifact. Make the build reproducible. That alone already changes how things behave.

Now when you update a section, change a figure, or fix a reference, it is tracked like any other change. You can go back, compare versions, understand how the paper evolved.

A paper behaves like the rest of the project once you treat its source as the main artifact.

The Build Step Matters More Than Expected

The problem with building a paper is usually the environment. Locally, everything works because you have the right LaTeX setup, the necessary packages, and perhaps some cached files. Then someone else clones the repository and cannot compile the document.

Our approach was to treat it like any other build and move it into a controlled environment: the paper is compiled by a CI pipeline whose configuration lives in the repository.

Once the paper builds through CI, the environment becomes part of the project. You know exactly what generated the PDF. Every change triggers a new build, and the output is always available.

It also removes a constant source of friction. Contributors do not need to configure LaTeX locally to make changes. They focus on content while the system handles the build.
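As a rough illustration of what "the system handles the build" can mean, here is a minimal Python sketch that assembles a containerized build command. It assumes Docker and a TeX Live image that ships latexmk; the function name, the /work mount point, and the image name are illustrative choices, not part of any specific setup described in this series.

```python
from pathlib import Path


def containerized_build_command(repo_dir: Path, main_tex: str, image: str) -> list[str]:
    """Return a docker command that compiles main_tex inside the given image.

    Hypothetical helper: the /work mount point is an arbitrary choice.
    """
    return [
        "docker", "run", "--rm",
        # Mount the repository so the container sees the paper sources.
        "-v", f"{repo_dir.resolve()}:/work",
        "-w", "/work",
        image,
        # latexmk reruns LaTeX and the bibliography tool as often as needed.
        "latexmk", "-pdf", "-interaction=nonstopmode", "-halt-on-error", main_tex,
    ]


if __name__ == "__main__":
    # Assumption: in a real project the image would be pinned to a fixed tag or digest.
    cmd = containerized_build_command(Path("."), "main.tex", "texlive/texlive:latest")
    print(" ".join(cmd))
```

A CI job and a contributor's laptop can both run the same command, which is the point: the image, not the local machine, defines the LaTeX environment.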

Eventually, it feels normal. You push a change, and a few minutes later a new PDF is attached to it.

If the build is not reproducible, the paper is not either.

Connecting Papers to Code and Experiments

This is where things start to line up with the previous articles.

A paper describes experiments. Those experiments depend on code, models, and data. If those pieces are versioned, the paper should point to those exact versions.

Otherwise, you end up with small mismatches.

A figure in the paper comes from a model that has since been updated. Metrics change slightly. A dataset is modified. The paper still exists, but the connection to the underlying system weakens over time.

Keeping references explicit helps here.

The version of the code. The dataset snapshot. The model checkpoint. Not everything needs to be embedded in the paper itself, but it should be clear where it comes from.
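One way to make those references explicit is to generate a small file of LaTeX macros from the versions a build actually used, and include it in the paper. The sketch below is only an assumption about how that could look: the macro names, the example values, and the provenance.tex file name are all hypothetical.

```python
def latex_escape(text: str) -> str:
    """Escape the characters most likely to appear in paths and identifiers."""
    replacements = {
        "\\": r"\textbackslash{}",
        "_": r"\_",
        "%": r"\%",
        "#": r"\#",
        "&": r"\&",
        "$": r"\$",
    }
    return "".join(replacements.get(ch, ch) for ch in text)


def provenance_macros(versions: dict[str, str]) -> str:
    """Render one \\newcommand per entry, sorted by macro name."""
    lines = ["% Generated file; do not edit by hand."]
    for name, value in sorted(versions.items()):
        # LaTeX control words may only contain letters.
        if not (name.isascii() and name.isalpha()):
            raise ValueError(f"macro name must contain only letters: {name!r}")
        lines.append(f"\\newcommand{{\\{name}}}{{{latex_escape(value)}}}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical values; a real build would read them from Git and the data store.
    print(provenance_macros({
        "codeCommit": "a1b2c3d",
        "datasetSnapshot": "snapshots/2026-05-01",
        "modelCheckpoint": "model_v3.ckpt",
    }))
```

If the output is written to a file such as provenance.tex, the paper can pull it in with \input{provenance} and cite \codeCommit in the text, so the PDF itself states which versions it describes.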

A paper only stays accurate if it points to the exact versions behind it.

Where This Starts to Converge

At some point, we started standardizing this internally. Same structure, same build setup, same expectations across different papers. It made starting new work easier, but more importantly, it made older work easier to revisit.

That effort eventually turned into a small project, something we now refer to as Research PaperOps. The idea is straightforward. Apply the same discipline used for code to research papers.

Not a heavy system. Just enough structure so that papers behave like the rest of the project.

Once papers follow the same workflow as code, the technical friction disappears. You still have to deal with the actual difficulty of writing and editing, but you're no longer fighting the build system or hunting for lost figures.

Where This Turns Into Something Concrete

This approach evolved beyond just a way to think about papers.

We set up several repositories with the same structure, similar CI configuration, and consistent expectations for building and version tracking. This made starting new projects faster, but the real win was that older papers didn't break over time. They stayed buildable and linked to the right version of the code, even years later.

After repeating the setup multiple times, rebuilding it felt redundant.

We extracted these patterns into a small public project, Research PaperOps. It is not a framework but a baseline: a template, a build pipeline, and a structure that reflect how we actually work.

It follows the same ideas described here. The paper lives in version control. Builds happen in a controlled environment. Outputs are reproducible. And each version of the paper can be traced back to the system behind it.

It is not trying to cover every workflow. It just removes the repetitive parts so the focus stays on the content.

At some point, it is worth turning the workflow itself into something reusable.

Reproducibility Shows Up Here Too

The same issues discussed earlier come back.

Reproducibility is not only about code and models. It also includes the paper that reports the results.

When someone reads a paper and wants to know exactly where a specific number came from, they should be able to find that path. It does not have to be a perfect trail, but it should be clear enough to follow.

Version control helps. Build automation helps. Explicit links to code and data help.

Without those, the paper becomes a static snapshot. Useful, but harder to connect to the system that produced it.

Reproducibility includes the paper, not just the system behind it.

A Quick Check

A simple check here is similar to the previous ones.

Clone the paper repository. Try to build it. See what happens.

If the build works without manual fixes, most of the structure is in place. If not, the missing pieces tend to be obvious. A dependency is not declared. A file is not included. A step is assumed.
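Part of this check can be done before even running LaTeX. The following Python sketch scans a main file for \input, \include, and \includegraphics and reports targets that are not present in the checkout. It is a simplification under stated assumptions: it does not follow nested includes, it ignores comments and \graphicspath, and the list of implicit extensions is only a guess at what a typical pdfLaTeX build would try.

```python
import re
from pathlib import Path

# Matches \input{...}, \include{...} and \includegraphics[...]{...}.
REFERENCE = re.compile(r"\\(input|include|includegraphics)(?:\[[^\]]*\])?\{([^}]+)\}")

# Extensions tried when the source omits one (an assumption, not exhaustive).
IMPLICIT_EXTENSIONS = {
    "input": [".tex"],
    "include": [".tex"],
    "includegraphics": [".pdf", ".png", ".jpg", ".eps"],
}


def missing_references(main_tex: Path) -> list[str]:
    """List targets referenced from main_tex that do not exist next to it."""
    root = main_tex.parent
    missing = []
    for command, target in REFERENCE.findall(main_tex.read_text(encoding="utf-8")):
        target = target.strip()
        candidates = [root / target]
        candidates += [root / (target + ext) for ext in IMPLICIT_EXTENSIONS[command]]
        if not any(path.is_file() for path in candidates):
            missing.append(target)
    return missing


if __name__ == "__main__":
    main_file = Path("main.tex")  # hypothetical entry point of the paper
    if main_file.is_file():
        for target in missing_references(main_file):
            print(f"referenced but not found: {target}")
```

Run on a fresh clone, a non-empty result usually points at a figure or section that exists on the author's machine but was never committed.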

If the paper cannot be built cleanly, something important is missing.

Final Thoughts

At some point, code, models, data, and papers stop feeling like separate pieces.

They are all parts of the same system. They evolve together, and they depend on each other more than it seems at first.

Treating papers in the same way as the rest of the project does not add much overhead. It mostly removes friction. Things become easier to track, easier to rebuild, easier to understand later.

That is usually enough.

Closing the Series

This series started with a simple idea. Open source is not something you do as an afterthought; it affects how you structure things from the beginning.

That applies to code, to models, to data, and also to research.

Once those pieces are aligned, sharing the project becomes a natural step, not a separate effort.

Together, these parts take "Open Source" from being a button you click on GitHub to a design philosophy. They raise the bar from "here is my code" to "here is a system you can actually trust and build upon."

Open source works best when the whole system is designed to be shared from the start.