Repositories
When we talk about data in Oxen, we usually talk about "Repositories". A Repository lives within your working directory of data in a hidden .oxen directory. You can think of a Repository as a series of snapshots of your data at any given point in time.
Each snapshot contains a "mini filesystem" representing all the files and folders in that snapshot. Each mini filesystem is represented by a commit, and is stored in the .oxen directory so that we can return to it at any point in time.
To see this in action let's instantiate a local oxen repository and see what it looks like.
$ oxen init
$ ls -trla
total 0
drwxr-xr-x 23 bessie staff 736 May 22 16:41 ../
drwxr-xr-x 3 bessie staff 96 May 22 16:41 ./
drwxr-xr-x 10 bessie staff 320 May 22 16:41 .oxen/
This magic .oxen directory is what will hold all the snapshots of your data. Think of it as a local database that lets you roll back your data to any point in time.
Content Addressable File System
How are the different versions stored on disk? Let's add and commit some files to the repository and see what happens.
$ echo "Hello" > hello.txt
$ echo "World" > world.txt
$ oxen add hello.txt world.txt
$ oxen commit -m "Add hello.txt and world.txt"
Each file that gets added and committed to Oxen gets stored in a Content Addressable File System in the .oxen/versions directory. Oxen first computes a hash of the file's contents, then stores the file in a subdirectory whose path mirrors the hash. This means that the file can be retrieved by its hash at any time.
$ tree .oxen/versions
.oxen/versions
└── files
    ├── 18
    │   └── 066113d946cfa640ffc8773c83f61b
    │       └── data
    └── a7
        └── 666c8f5aaf946ca629d9d20c29aa6a
            └── data
6 directories, 2 files
What's up with these funky hexadecimal directory names? Read together, each pair of nested directory names spells out the hash of a file's contents. To see this in action, use Oxen's handy command for inspecting information about an individual file.
$ oxen info -v world.txt
hash size data_type mime_type extension last_updated_commit_id
18066113d946cfa640ffc8773c83f61b 6 text text/plain txt 2c610ae8e424a4c8
oxen info prints out a tab-separated list of the hash, size, data type, mime type, extension, and the last updated commit id of the file.
In this case, the hash for the world.txt file is 18066113d946cfa640ffc8773c83f61b. As for the directory structure above, you can see that we split the hash: the first two characters (18) become the name of the parent directory, and the remaining characters become the name of the subdirectory that holds the data file. This is a common pattern in content addressable file systems; it keeps any single directory from accumulating too many subdirectories.
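The split described above can be sketched in a few lines of Rust. This is only an illustration of the directory layout shown in the tree output: version_path is a hypothetical helper name, not a function taken from the Oxen codebase, and it assumes the hash is already available as a hex string.

```rust
use std::path::{Path, PathBuf};

/// Builds the on-disk location of a stored file version from its hash.
/// Hypothetical helper for illustration; not part of the Oxen codebase.
/// Returns `None` if the hash is too short to split into prefix and suffix.
fn version_path(repo_root: &Path, hash: &str) -> Option<PathBuf> {
    // Hex digests are ASCII, so splitting at byte offset 2 is safe.
    if hash.len() <= 2 || !hash.is_ascii() {
        return None;
    }
    // First two characters name the parent directory, the rest name the child.
    let (prefix, suffix) = hash.split_at(2);
    Some(
        repo_root
            .join(".oxen")
            .join("versions")
            .join("files")
            .join(prefix)
            .join(suffix)
            .join("data"),
    )
}
```

Calling version_path with the hash of world.txt from the oxen info output yields a path ending in .oxen/versions/files/18/066113d946cfa640ffc8773c83f61b/data, which matches the tree listing.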
Manually Inspect Older Versions
Currently the files in Oxen are uncompressed in the versions directory, so you can simply cat the file to see the contents.
$ cat .oxen/versions/files/a7/666c8f5aaf946ca629d9d20c29aa6a/data
Hello
Note: Compression is on our list of possible future improvements, but keeping the files uncompressed is a nice property of the system. It lets us take advantage of the native format of the files on disk without additional compression / decompression steps.
Storing New Versions
Let's change the hello.txt file and commit it again.
$ echo "Hello, World!" > hello.txt
$ oxen add hello.txt
$ oxen commit -m "Update hello.txt"
Now look at the .oxen/versions directory. You will see that we have a new hashed directory for the file. This means that the file has been updated and a new snapshot has been created.
$ tree .oxen/versions
.oxen/versions
└── files
    ├── 18
    │   └── 066113d946cfa640ffc8773c83f61b
    │       └── data
    ├── a7
    │   └── 666c8f5aaf946ca629d9d20c29aa6a
    │       └── data
    └── ce
        └── 1931b6136c7ad3e2a42fb0521986ba
            └── data
8 directories, 3 files
Let's look at each individual file in the versions dir.
$ cat .oxen/versions/files/a7/666c8f5aaf946ca629d9d20c29aa6a/data
Hello
$ cat .oxen/versions/files/18/066113d946cfa640ffc8773c83f61b/data
World
$ cat .oxen/versions/files/ce/1931b6136c7ad3e2a42fb0521986ba/data
Hello, World!
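To make the behavior above concrete, here is a minimal in-memory sketch of a content-addressed store. It is not Oxen's implementation: VersionStore and content_hash are hypothetical names, and the standard library's DefaultHasher is used only as a stand-in hash function, so the hashes it produces are shorter than the ones shown above and will not match them.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in content hash. This is NOT the hash function Oxen uses.
fn content_hash(bytes: &[u8]) -> String {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

/// Hypothetical in-memory version store keyed by content hash.
#[derive(Default)]
struct VersionStore {
    versions: HashMap<String, Vec<u8>>,
}

impl VersionStore {
    /// Stores the bytes under their hash and returns that hash.
    /// Identical content maps to the same key, so it is stored only once.
    fn put(&mut self, bytes: &[u8]) -> String {
        let hash = content_hash(bytes);
        self.versions
            .entry(hash.clone())
            .or_insert_with(|| bytes.to_vec());
        hash
    }

    /// Retrieves a stored version by its hash.
    fn get(&self, hash: &str) -> Option<&[u8]> {
        self.versions.get(hash).map(|v| v.as_slice())
    }

    /// Number of distinct versions currently stored.
    fn len(&self) -> usize {
        self.versions.len()
    }
}
```

Putting "Hello", "World", and "Hello, World!" into this store produces three entries, mirroring the three data files above. Putting "Hello" a second time returns the same hash and adds nothing, which is the property that lets unchanged files be shared between snapshots.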
While this doesn't give you the full picture of how Oxen works, hopefully it gives you a starting point for understanding the Content Addressable File System that Oxen uses to store all versions of the files. We will get into the details of the commit databases and other data structures as we dive into more domains.
LocalRepository
Since all of the data for all of the versions is simply stored in a hidden subdirectory, the first object we introduce is the LocalRepository. This object simply represents the path to the repository so that we know where to look for subsequent objects.
src/lib/src/model/repository/local_repository.rs
pub struct LocalRepository {
    pub path: PathBuf,
    // Optional remotes to sync the data to
    remote_name: Option<String>,
    pub remotes: Vec<Remote>,
}
Whenever we start down a code path within the CLI, the first thing we do is find where the .oxen directory is and instantiate our LocalRepository object.
There is a handy helper method to get a repo from the current dir. This recursively traverses up the directory structure to find a .oxen directory and instantiates the LocalRepository object.
let repository = LocalRepository::from_current_dir()?;
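The upward search can be sketched with the standard library alone. The function below is a hypothetical stand-in, not the body of from_current_dir; it only illustrates the idea of walking from a starting directory toward the filesystem root until a .oxen directory is found.

```rust
use std::path::{Path, PathBuf};

/// Walks from `start` up through its parent directories and returns the
/// first directory that contains a `.oxen` subdirectory.
/// Hypothetical helper for illustration; not Oxen's actual implementation.
fn find_repo_root(start: &Path) -> Option<PathBuf> {
    start
        // `ancestors` yields `start` itself first, then each parent in turn.
        .ancestors()
        .find(|dir| dir.join(".oxen").is_dir())
        .map(Path::to_path_buf)
}
```

In practice you would pass an absolute path, for example the result of std::env::current_dir(), so that the walk can reach every parent directory.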
You may want to reference the code for the add command to see how instantiating a LocalRepository works in practice.
You will notice that not only does a LocalRepository have a path, but it also has a remote_name and remotes. These are read from .oxen/config.toml and tell Oxen where to sync the data to.
Remotes
A remote in the context of Oxen is simply a name and a url. The name is a human readable representation and the url is the actual location of the remote repository.
pub struct Remote {
    pub name: String,
    pub url: String,
}
The remotes can be set through the oxen config command.
oxen config --set-remote origin http://localhost:3001/my-namespace/my-repo
If you look in the .oxen/config.toml file you will see the remotes listed there.
remote_name = "origin"
[[remotes]]
name = "origin"
url = "http://localhost:3001/my-namespace/my-repo"
You can have multiple remotes as well as a default remote specified by remote_name. The default remote is the remote that will be used when you run oxen push or oxen pull without specifying a remote.
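The lookup rule for the default remote can be sketched as follows. The Remote struct repeats the fields shown earlier; resolve_remote is a hypothetical helper written for this example, and how Oxen actually performs the lookup is an assumption here.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Remote {
    pub name: String,
    pub url: String,
}

/// Picks the remote to sync with: the explicitly requested one if a name was
/// given, otherwise the one named by `default_name` (the `remote_name` entry
/// in the config). Hypothetical helper; not taken from the Oxen codebase.
fn resolve_remote<'a>(
    remotes: &'a [Remote],
    default_name: Option<&str>,
    requested: Option<&str>,
) -> Option<&'a Remote> {
    // An explicit request wins; fall back to the configured default.
    let wanted = requested.or(default_name)?;
    remotes.iter().find(|remote| remote.name == wanted)
}
```

With the config above, resolve_remote(&remotes, Some("origin"), None) would return the origin remote, while passing no default and no requested name returns None.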
RemoteRepository
On the other end of the LocalRepository is the RemoteRepository. This object represents the remote repository that the LocalRepository is connected to. It has the same url as the Remote object.
pub struct RemoteRepository {
    pub namespace: String,
    pub name: String,
    pub remote: Remote,
}
All repositories that are stored on the oxen-server have a namespace and name. This helps us organize the repositories on disk in a way that is also meaningful to the user.
In order to create a RemoteRepository we will first need to spin up an oxen-server instance. From your debug build you can do something like the following.
export SYNC_DIR=/path/to/sync/dir
./target/debug/oxen-server start
This will start a server on the default host 0.0.0.0 and port 3000. The environment variable SYNC_DIR tells the server where to write the data on disk.
Then we can use the oxen create-remote command from the CLI.
oxen create-remote --name my-namespace/my-repo --host 0.0.0.0:3000 --scheme http
If you look in the SYNC_DIR you will see a directory structure that mirrors the namespace/repo-name of the repository you just created. There will be a .oxen directory with the remote repository created for you as well.
ls -trla /path/to/sync/dir/my-namespace/my-repo/.oxen
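The mapping from a namespace/name pair to a directory under SYNC_DIR can be sketched like this. remote_repo_dir is a hypothetical helper for illustration only, and the validation it performs is an assumption rather than a description of what oxen-server checks.

```rust
use std::path::{Path, PathBuf};

/// Splits "namespace/name" and returns where that repository would live
/// under the server's sync directory.
/// Hypothetical helper for illustration; not part of the Oxen codebase.
fn remote_repo_dir(sync_dir: &Path, full_name: &str) -> Option<PathBuf> {
    let (namespace, name) = full_name.split_once('/')?;
    // Assumed validation: both parts present, and exactly one separator.
    if namespace.is_empty() || name.is_empty() || name.contains('/') {
        return None;
    }
    Some(sync_dir.join(namespace).join(name))
}
```

For the repository created above, remote_repo_dir(Path::new("/path/to/sync/dir"), "my-namespace/my-repo") gives /path/to/sync/dir/my-namespace/my-repo, and joining .oxen onto the result gives the directory listed by the ls command.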
What's cool is that on disk the RemoteRepository is the same structure as the LocalRepository. This means that we can use the same code to manipulate the RemoteRepository on the server as we can the LocalRepository on the client.
If you didn't configure the remote earlier, you can do so now.
oxen config --set-remote origin http://0.0.0.0:3000/my-namespace/my-repo
Then simply push the data to the remote.
oxen push
This copies all the data from the local .oxen directory to the remote repository. Remember the versions directory from before? Let's see what it looks like on the remote.
$ cat /path/to/sync/dir/my-namespace/my-repo/.oxen/versions/files/ce/1931b6136c7ad3e2a42fb0521986ba/data
Hello, World!
There we go! The data is intact on the remote server. This is the beauty of Oxen. There are not too many fancy bells and whistles when you look under the hood. Just a content addressable file system with a library that is shared between the client and server.
Next up we will look at Commits. These objects represent the group of files that are in a single snapshot, and we will learn how Oxen knows which versions were added, removed, or changed in the repository and when.