The Directory Class

The Directory class allows us to compare a directory on the filesystem to a representation of a directory structure that we expect to be present. A Directory instance stores a list of further Directory instances that implicitly encode its subdirectories. As a result, most of the class's methods use some form of recursion to check compatibility or equality (amongst other things).

Directory instances also contain metadata from the assignment specification about which files (or types of files) they should contain, whether they should be the root of a git repository, and whether their name can be variable. A top-level Directory is identified by its parent attribute being None. Such an instance is typically created by the Assignment.structure property of the Assignment class.

class Directory(name: str, directory_structure: Dict[str, Any] = {}, parent: Directory | None = None)
check_against_directory(directory: Path, do_not_set_name: bool = False, *substitutes_for_main_branch: str) → Tuple[AssignmentCheckerError | None, List[str], List[str]]

Given a directory on the machine, determine if the contents of the directory are compatible with the expected setup of this Directory instance.

This method will check (in order):

  • If the given directory exists on the machine.

  • Check the name of the given directory against self.name.
    • If self.variable_name is True, infer the variable name (possibly issuing warnings to the user).

    • Otherwise, confirm that the directory name matches self.name.

  • Check if the directory is the git root.
    • If self.git_root is False, confirm the directory is not a git repository, then skip the remaining git checks.

    • Otherwise:
      • Determine if a git repo is present at the given directory.

      • Determine if the working tree is clean.

      • Attempt to switch to the marking branch (main, or one of substitutes_for_main_branch if main is absent).

  • Check the files present in the directory.
    • Check that all compulsory files are present in the directory.

    • Check that there are no unexpected files in the directory.

  • Delegate the checking process to the subdirectories of this Directory instance.

During this process, the checking algorithm may encounter FATAL errors or WARNINGs, and may also gather INFORMATION.

FATAL errors make subsequent steps in the algorithm impossible:
  • When the directory does not exist on the filesystem.

  • When the name of the directory given does not match the fixed name of this instance.

  • When this directory should be a git repository, but is not.

  • When there are untracked files within a git repository.

  • When there are uncommitted changes within a git repository.

  • When switching to main or another acceptable branch is impossible.

  • When a reference could not be checked out in the repository.

  • When a git repository is present in a directory that should not be a repository.

  • When a compulsory subdirectory is not present.

  • When a compulsory subdirectory with a variable name cannot be matched to a folder on the filesystem.

WARNINGs report errors in the submission that do not force the algorithm to halt:
  • If the repository was not on the main branch when submitted.

  • If the repository does not have a main branch, but an alternative (e.g. master) was identified and successfully checked out.

  • If compulsory files are missing from the directory.

  • If unexpected files are present in the directory.

INFORMATION reports other miscellaneous information gathered during the algorithm:
  • When optional files, or data files, are identified in a directory.

  • When a directory with a variable name is matched to a folder on the filesystem.

  • When optional folders are not found within the submission.

  • When optional subfolders with variable names are not matched to a folder on the filesystem.

Returns the following values, in the order given, as a tuple:

  1. An AssignmentCheckerError instance that reports the FATAL error encountered. This value is None if no FATAL errors were encountered.

  2. A list of strings, each containing the text of a WARNING to be issued.

  3. A list of strings that contain the text to be issued as INFORMATION.

Note that if the first return value is None, and the second is an empty list, then the Directory instance is compatible with the directory on the filesystem that was passed in.
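The compatibility rule above can be expressed as a small check on the returned tuple. This is a sketch only: summarise_check is a hypothetical helper, and the FATAL error is typed as a plain Exception because AssignmentCheckerError is not reproduced here.

```python
from typing import List, Optional, Tuple

# Shape of the tuple returned by check_against_directory (error typed loosely here).
CheckResult = Tuple[Optional[Exception], List[str], List[str]]


def summarise_check(result: CheckResult) -> Tuple[bool, List[str]]:
    """Hypothetical helper: report compatibility plus the messages to show the user."""
    fatal, warnings, information = result
    messages = [f"FATAL: {fatal}"] if fatal is not None else []
    messages += [f"WARNING: {w}" for w in warnings]
    messages += [f"INFO: {i}" for i in information]
    # Compatible only when there is no FATAL error and no WARNINGs;
    # INFORMATION messages never affect compatibility.
    compatible = fatal is None and not warnings
    return compatible, messages


ok, msgs = summarise_check((None, [], ["Found optional file notes.txt"]))
print(ok, msgs)
```

INFORMATION messages are still collected for display, but they play no part in the pass/fail decision.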

do_not_set_name can be set to True to prevent variable-named folders from inheriting the pattern-matched name from the filesystem. This is exclusively used when attempting to match variable-named subdirectories to those on the filesystem.

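substitutes_for_main_branch collects any additional positional arguments as branch names that will be used instead if main is not present in the expected git repository. Because it follows do_not_set_name in the signature, do_not_set_name must be passed positionally whenever substitutes are given.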
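The fallback to substitute branches might be implemented along these lines. choose_marking_branch is hypothetical and works on a plain list of branch names rather than a real repository. It also assumes substitutes are tried in the order they are given, which the documentation does not specify.

```python
from typing import Iterable, Optional


def choose_marking_branch(available: Iterable[str], *substitutes_for_main_branch: str) -> Optional[str]:
    """Return main if it exists, otherwise the first substitute that exists, else None."""
    branches = set(available)
    if "main" in branches:
        return "main"
    # Assumption: substitutes are tried in the order they were passed in.
    for candidate in substitutes_for_main_branch:
        if candidate in branches:
            return candidate
    # No acceptable branch: the checker would treat this as a FATAL error.
    return None
```

Collecting the substitutes with *args mirrors the variadic parameter in the documented signature, so a caller can list any number of alternatives (such as master) without wrapping them in a list.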

check_files(directory: Path) → Tuple[Set[str], Set[str], Set[str]]

Check the files that are present in the directory, returning:

  1. A set of compulsory files that are missing.

  2. A set of files that were not expected to be found.

  3. A set of optional files that were found.
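The three sets can be computed with ordinary set operations. In this sketch, classify_files and files_in are hypothetical helpers. The sketch assumes that compulsory and optional hold exact file names and that data_file_patterns holds shell-style patterns (matched with fnmatch). The real class may treat these attributes differently, for example by only accepting data files when is_data_dir is True.

```python
from fnmatch import fnmatchcase
from pathlib import Path
from typing import Iterable, List, Set, Tuple


def classify_files(
    present: Iterable[str],
    compulsory: List[str],
    optional: List[str],
    data_file_patterns: List[str],
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (missing compulsory, unexpected, optional found) for some file names."""
    # Assumption: compulsory/optional are exact names; data_file_patterns are globs.
    present_set = set(present)
    missing = set(compulsory) - present_set
    allowed = set(compulsory) | set(optional)
    # Anything not compulsory, not optional and not matching a data pattern is unexpected.
    unexpected = {
        name
        for name in present_set - allowed
        if not any(fnmatchcase(name, pattern) for pattern in data_file_patterns)
    }
    optional_found = present_set & set(optional)
    return missing, unexpected, optional_found


def files_in(directory: Path) -> Set[str]:
    """Names of the regular files directly inside directory (subdirectories ignored)."""
    return {p.name for p in directory.iterdir() if p.is_file()}
```

Keeping the classification separate from the filesystem scan makes the logic easy to test with plain lists of names.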

check_git_repo(directory: Path, *allowable_other_branches: str) → Tuple[AssignmentCheckerError | None, str | None]

Check whether the folder on the filesystem is (or is not) a git repository, as expected by the instance.

The method returns two values, in the following order:

1. An AssignmentCheckerError (corresponding to a FATAL error in check_against_directory) in the following cases (otherwise None):

  • The instance expects a git repository on the filesystem, but does not detect one.

  • The instance expects a git repository on the filesystem, and a repository is present but…
    • There are untracked files in the repository.

    • There are unstaged changes to files in the repository.

    • There are uncommitted changes in the repository.

    • The repository could not check out main or another of the allowable_other_branches.

  • The instance does not expect a git repository on the filesystem, but there is one.

2. A string containing WARNING information, that can be passed back to check_against_directory.
    • None is returned if there are no warnings to record.
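One way to detect the untracked, unstaged and uncommitted states listed above is to parse the output of `git status --porcelain`, where each line is "XY PATH" with X giving the index status and Y the working-tree status. This is a sketch of that approach, not necessarily how check_git_repo does it; classify_porcelain and git_status are hypothetical names.

```python
import subprocess
from pathlib import Path
from typing import Dict, List


def classify_porcelain(output: str) -> Dict[str, List[str]]:
    """Sort `git status --porcelain` (v1) lines into untracked, unstaged and staged paths."""
    result: Dict[str, List[str]] = {"untracked": [], "unstaged": [], "staged": []}
    for line in output.splitlines():
        if len(line) < 4:
            continue
        index_status, worktree_status, path = line[0], line[1], line[3:]
        if index_status == "?" and worktree_status == "?":
            result["untracked"].append(path)
            continue
        if index_status not in " !":
            result["staged"].append(path)  # staged in the index, not yet committed
        if worktree_status not in " !":
            result["unstaged"].append(path)  # changed in the working tree, not staged
    return result


def git_status(directory: Path) -> Dict[str, List[str]]:
    """Run git status in directory; raises CalledProcessError outside a repository."""
    completed = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=directory, capture_output=True, text=True, check=True,
    )
    return classify_porcelain(completed.stdout)
```

Under this scheme the working tree is clean exactly when all three lists are empty, and a non-zero exit from git (as raised by check=True) indicates that no repository is present.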

check_name(directory_name: str, do_not_set_name: bool = False) → bool

Check that the directory name given is compatible with this instance.

If self.variable_name is False, the directory_name must match self.name. If self.variable_name is True:

  • If self.variable_name_match is None, take the name provided as a match.

  • Otherwise, the directory_name must match the shell expression given in self.variable_name_match.

In the case of a variable name and a matching directory name, self.name will be set to the matched value. This can be suppressed by passing do_not_set_name=True.
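The naming rules above translate fairly directly into code. Below is a minimal sketch. It assumes that "shell expression" means an fnmatch-style glob, matched case-sensitively here. NamedDirectory is a hypothetical stand-in that holds only the attributes this check consults, not the real class.

```python
from fnmatch import fnmatchcase
from typing import Optional


class NamedDirectory:
    """Hypothetical stand-in holding only the attributes the name check needs."""

    def __init__(self, name: str, variable_name: bool, variable_name_match: Optional[str] = None):
        self.name = name
        self.variable_name = variable_name
        self.variable_name_match = variable_name_match

    def check_name(self, directory_name: str, do_not_set_name: bool = False) -> bool:
        if not self.variable_name:
            # Fixed name: must match exactly.
            return directory_name == self.name
        # Variable name: with no pattern anything matches; otherwise use a glob match.
        # Assumption: the "shell expression" is an fnmatch-style pattern.
        pattern = self.variable_name_match
        if pattern is not None and not fnmatchcase(directory_name, pattern):
            return False
        if not do_not_set_name:
            self.name = directory_name  # adopt the name found on the filesystem
        return True
```

Passing do_not_set_name=True lets a caller test a candidate folder without committing to it, which is what matching variable-named subdirectories requires.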

compulsory: List[str]
data_file_patterns: List[str]
property fixed_name_subdirs: List[Directory]

Subdirectories of this Directory that do not have variable names.

git_root: bool = False
investigate_subdir(path_to_subdir: Path, subdir: Directory, do_not_set_name: bool = False) → Tuple[AssignmentCheckerError | None, List[str], List[str]]

Essentially wraps check_against_directory when called on a subdirectory of the instance. This is useful within check_against_directory(), as it lets the bodies of two for loops be refactored into this function:

  • When we investigate subdirectories with fixed names.

  • When we investigate subdirectories with variable names, after having matched these to folders on the filesystem.

Note that path_to_subdir should point to the folder that is to be compared to subdir, unlike its counterpart in check_against_directory, where directory points to the folder that is being compared to self.

The returned values, and remaining arguments, are identical to those of check_against_directory.
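Because investigate_subdir returns the same triple as check_against_directory, both loops can share a single way of combining results. The sketch below assumes that results are merged by concatenating messages and stopping at the first FATAL error. check_all is a hypothetical helper, and the real merging rule may differ.

```python
from typing import Callable, Iterable, List, Optional, Tuple, TypeVar

T = TypeVar("T")
Result = Tuple[Optional[Exception], List[str], List[str]]


def check_all(items: Iterable[T], check: Callable[[T], Result]) -> Result:
    """Run check on each item, pooling messages and stopping at the first FATAL error."""
    warnings: List[str] = []
    information: List[str] = []
    for item in items:
        fatal, item_warnings, item_information = check(item)
        warnings.extend(item_warnings)
        information.extend(item_information)
        # Assumption: a FATAL error in any subdirectory halts the whole check.
        if fatal is not None:
            return fatal, warnings, information
    return None, warnings, information
```

With this shape, the fixed-name loop and the variable-name loop differ only in the items they pass in, for example (path, subdir) pairs handed to investigate_subdir.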

property is_data_dir: bool

Whether this directory is a ‘data directory’, which may contain data files with user-defined names.

property is_optional: bool

Returns True if the directory is an optional inclusion in the submission, and returns False otherwise.

A directory is optional if it contains no compulsory files.

match_variable_name_subdirs(directory: Path) → Tuple[Dict[str, Directory], List[Directory]]

Handles cases where an instance has (potentially multiple) subdirectories that have variable names, meaning that we have to attempt to match directories based on their structure, not their names alone.

The method will attempt to match compulsory directories first, before attempting to match optional directories (if any exist).

The method returns two values, in the following order:

  1. A dictionary whose keys are the names of the subdirectories on the filesystem, and whose values are the Directory instances within self.subdirs that these match to.

  2. A list of Directories in self.subdirs that were not matched to directories on the filesystem.

Note that the second return value potentially includes optional subdirectories.
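A greedy version of this matching is sketched below. Each expected subdirectory, taking compulsory ones first, claims the first unclaimed folder that fits it. The fits predicate stands in for a full structural check, for example running check_against_directory with do_not_set_name=True and treating a None error as a fit. match_variable_subdirs and fits are hypothetical, and a greedy pass can miss assignments that a full bipartite matching would find.

```python
from typing import Callable, Dict, List, Sequence, Tuple, TypeVar

D = TypeVar("D")


def match_variable_subdirs(
    folder_names: Sequence[str],
    compulsory: List[D],
    optional: List[D],
    fits: Callable[[str, D], bool],
) -> Tuple[Dict[str, D], List[D]]:
    """Greedily pair folders with expected subdirectories, compulsory ones first."""
    matched: Dict[str, D] = {}
    unmatched: List[D] = []
    for expected in list(compulsory) + list(optional):
        # Take the first folder not already claimed that fits this expected subdirectory.
        folder = next((f for f in folder_names if f not in matched and fits(f, expected)), None)
        if folder is None:
            unmatched.append(expected)
        else:
            matched[folder] = expected
    return matched, unmatched
```

The two return values mirror those described above: a mapping from folder names to expected subdirectories, and the expected subdirectories left unmatched (which may include optional ones).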

name: str
name_pattern: str
optional: List[str]
parent: Directory
property path_from_root: Path

Path to this directory, from the root of the directory tree.

If self.parent is None, this Directory is assumed to be the root of the tree.
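Following the parent links upwards is enough to build this path. In the sketch below, Node is a hypothetical minimal stand-in for Directory. The sketch assumes that the root's own name is included as the first path component, which the description above does not settle.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical minimal stand-in for Directory: a name plus a parent link."""

    name: str
    parent: Optional["Node"] = None

    @property
    def path_from_root(self) -> Path:
        # Walk up the parent links until reaching the root (parent is None).
        # Assumption: the root's own name is kept as the first component.
        parts: List[str] = []
        node: Optional[Node] = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return Path(*reversed(parts))
```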

subdirs: List[Directory]
traverse() → Generator[Directory]

Traverse down the directory tree, yielding self first then descending into subdirectories.
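A pre-order, depth-first traversal matching this description can be written as a recursive generator. TreeNode is a hypothetical stand-in for Directory. The sketch assumes that subdirectories are visited in the order they appear in subdirs.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class TreeNode:
    """Hypothetical stand-in for Directory with only a name and its subdirectories."""

    name: str
    subdirs: List["TreeNode"] = field(default_factory=list)

    def traverse(self) -> Iterator["TreeNode"]:
        # Pre-order: yield this node first, then recurse into each subdirectory in turn.
        yield self
        for subdir in self.subdirs:
            yield from subdir.traverse()
```

For example, `[n.name for n in tree.traverse()]` lists every directory name in the tree, with each parent appearing before its children.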

property variable_name: bool

Whether this instance needs to match a name pattern.

property variable_name_subdirs: List[Directory]

Subdirectories of this Directory that have variable names.
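Together, fixed_name_subdirs and variable_name_subdirs partition subdirs according to each subdirectory's variable_name flag. A minimal sketch follows, assuming both properties preserve the order of self.subdirs. Sub and split_subdirs are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Sub:
    """Hypothetical stand-in: a subdirectory name and whether that name is variable."""

    name: str
    variable_name: bool


def split_subdirs(subdirs: List[Sub]) -> Tuple[List[Sub], List[Sub]]:
    """Return (fixed-name subdirs, variable-name subdirs), preserving the original order."""
    fixed = [s for s in subdirs if not s.variable_name]
    variable = [s for s in subdirs if s.variable_name]
    return fixed, variable
```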