1.4: Working with Files and Directories - Biology


Now that we know how to locate files and directories in the filesystem, let’s learn a handful of important tools for working with them and the system in general.

Viewing the Contents of a (Text) File

Although there are many tools to view and edit text files, one of the most efficient for viewing them is called less, which takes as a parameter a path to the file to view, which of course may just be a file name in the present working directory (which is a type of relative path).[1]

The invocation of less on the file p450s.fasta opens an “interactive window” within the terminal window, wherein we can scroll up and down (and left and right) in the file with the arrow keys. (As usual, the mouse is not very useful on the command line.) We can also search for a pattern by typing / and then typing the pattern before pressing Enter.

When finished with less, pressing q will exit and return control to the shell or command line. Many of the text formats used in computational biology include long lines; by default, less will wrap these lines around the terminal so they can be viewed in their entirety. Using less -S will turn off this line wrapping, allowing us to view the file without any reformatting. Here’s what the file above looks like when viewed with less -S p450s.fasta:

Notice that the first long line has not been wrapped, though we can still use the arrow keys to scroll left or right to see the remainder of this line.

Creating a New Directory

The mkdir command creates a new directory (unless a file or directory of the same name already exists), and takes as a parameter the path to the directory to create. This is usually a simple file name as a relative path inside the present working directory.
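As a minimal sketch of this, the commands below create a projects directory. The scratch directory made with mktemp is an assumption of this sketch, used only so that nothing in a real home directory is touched.

```shell
# Work in a throwaway scratch directory (an assumption for this sketch)
workdir=$(mktemp -d)
cd "$workdir"

# Create a new directory named projects inside the present working directory
mkdir projects

# List the directory itself (-d) to confirm that it now exists
ls -d projects
```

Running mkdir projects a second time would report an error, because the name is already taken.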

Move or Rename a File or Directory

The mv utility serves to both move and rename files and directories. The simplest usage works like mv <source_path> <destination_path>, where <source_path> is the path (absolute or relative) of the file/directory to rename, and <destination_path> is the new name or location to give it.

In this example, we’ll rename p450s.fasta to p450s.fa, move it into the projects folder, and then rename the projects folder to projects_dir.
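The three steps just described might be typed as follows. This is a sketch under assumptions: the scratch directory and the one-line placeholder file contents are invented here for illustration, while the file and folder names come from the text.

```shell
# Set up a scratch directory containing the file and folder from the text
workdir=$(mktemp -d)
cd "$workdir"
echo ">placeholder_sequence" > p450s.fasta   # stand-in contents, not real data
mkdir projects

# 1. Rename the file (the destination does not exist yet, so this is a rename)
mv p450s.fasta p450s.fa

# 2. Move it into the existing projects folder
mv p450s.fa projects

# 3. Rename the projects folder to projects_dir
mv projects projects_dir

# The renamed file should now be listed inside projects_dir
ls projects_dir
```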

Because mv serves a dual role, the semantics are important to remember:

  • If <destination_path> doesn’t exist, it is created (so long as all of the containing folders exist).
  • If <destination_path> does exist:
    • If <destination_path> is a directory, the source is moved inside of that location.
    • If <destination_path> is a file, that file is overwritten with the source.

Said another way, mv attempts to guess what it should do, on the basis of whether the destination already exists. Let’s quickly undo the moves above:
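One possible sequence for undoing the moves is sketched below. The setup lines rebuild the renamed state (projects_dir containing p450s.fa) in a scratch directory, which is an assumption of this sketch, so that the block can be run on its own.

```shell
# Recreate the state left by the earlier example, inside a scratch directory
workdir=$(mktemp -d)
cd "$workdir"
mkdir projects_dir
echo ">placeholder_sequence" > projects_dir/p450s.fa   # stand-in contents

# Rename the folder back; projects does not exist, so this is a rename
mv projects_dir projects

# Move the file out of the folder into the present working directory
mv projects/p450s.fa .

# Give the file its original name back
mv p450s.fa p450s.fasta

ls
```

The last two steps could also be combined into a single mv projects/p450s.fa p450s.fasta, since the destination does not exist and is therefore treated as the new name.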

A few other notes: First, when specifying a path that is a directory, the trailing / is optional: mv projects_dir/ projects is the same as mv projects_dir projects if projects_dir is a directory (similarly, projects could have been specified as projects/). Second, it is possible to move multiple files into the same directory, for example, with mv p450s.fasta todo_list.txt projects. Third, it is quite common to see . referring to the present working directory as the destination, as in mv ../file.txt . for example, which would move file.txt from the directory above the present working directory (..) into the present working directory (., or “here”).
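The second and third notes can be tried out together. In this sketch the scratch directory and the empty files created with touch are assumptions made for illustration.

```shell
# Scratch directory with a projects folder and three empty stand-in files
workdir=$(mktemp -d)
cd "$workdir"
mkdir projects
touch p450s.fasta todo_list.txt file.txt

# Move two files into the projects directory with a single command
mv p450s.fasta todo_list.txt projects

# Step into projects, then pull file.txt from the directory above (..)
# into the present working directory (.)
cd projects
mv ../file.txt .

# All three files should now be listed here
ls
```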

Copy a File or Directory

Copying files and directories is similar to moving them, except that the original is not removed as part of the operation. The command for copying is cp, and the syntax is cp <source_path> <destination_path>. There is one caveat, however: cp will not copy an entire directory and all of its contents unless you add the -r flag to the command to indicate the operation should be recursive.

Forgetting the -r when attempting to copy a directory results in an “omitting directory” warning.
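A recursive copy might look like the sketch below. The destination name projects_copy and the scratch directory are hypothetical choices made for this illustration.

```shell
# Scratch directory with a projects folder holding one stand-in file
workdir=$(mktemp -d)
cd "$workdir"
mkdir projects
echo ">placeholder_sequence" > projects/p450s.fasta

# Copy the directory and everything inside it; -r makes the copy recursive
cp -r projects projects_copy

# Both the original and the copy exist afterward
ls projects projects_copy
```

Because projects_copy did not exist beforehand, it becomes a copy of projects itself; had it already existed as a directory, the copy would have been placed inside it, mirroring the behavior of mv.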

It is possible to simultaneously copy and move (and remove, etc.) many files by specifying multiple sources. For example, instead of cp ../todo_list.txt ., we could have copied both the to-do list and the p450s.fasta file with the same command:
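A sketch of such a command, run from inside the projects folder, is shown here. The scratch directory and empty stand-in files are assumptions of the example.

```shell
# Scratch directory with two stand-in files and a projects folder
workdir=$(mktemp -d)
cd "$workdir"
touch todo_list.txt p450s.fasta
mkdir projects
cd projects

# Copy both files from the directory above into the present working directory
cp ../todo_list.txt ../p450s.fasta .

# The copies are listed here; the originals remain one level up
ls
ls ..
```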

Remove (Delete) a File or Directory

Files may be deleted with the rm command, as in rm <target_file>. If you wish to remove an entire directory and everything inside, you need to specify the -r flag for recursive, as in rm -r <target_dir>. Depending on the configuration of your system, you may be asked “are you sure?” for each file, to which you can reply with a y. To avoid this checking, you can also specify the -f (force) flag, as in rm -r -f <target_dir> or rm -rf <target_dir>. Let’s create a temporary directory alongside the file copies from above, inside the projects folder, and then remove the p450s.fasta file and the todo_list.txt file as well as the temporary folder.
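These steps might look like the following sketch. The directory name tempdir is hypothetical, and the whole example runs inside a scratch directory (an assumption) so that no real files are at risk.

```shell
# Scratch directory with a projects folder holding copies of the two files
workdir=$(mktemp -d)
cd "$workdir"
mkdir projects
touch projects/p450s.fasta projects/todo_list.txt
cd projects

# Create a temporary directory alongside the file copies
mkdir tempdir

# Remove the two files; several targets may be given at once
rm p450s.fasta todo_list.txt

# Remove the directory; -r is required because the target is a directory
rm -r tempdir

# Nothing should be left in projects
ls
```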

Beware! Deleted files are gone forever. There is no undo, and there is no recycle bin. Whenever you use the rm command, double-check your syntax. There’s a world of difference between rm -rf project_copy (which deletes the folder project_copy) and rm -rf project _copy (which removes the folders project and _copy, if they exist).

Checking the Size of a File or Directory

Although ls -lh can show the sizes of files, this command will not summarize how much disk space a directory and all of its contents take up. To find out this information, there is the du (disk usage) command, which is almost always combined with the -s (summarize) and -h (show sizes in human-readable format) options.

As always, . is a handy target, here helping to determine the file space used by the present working directory.
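A short sketch of both uses follows. The scratch directory and its small stand-in file are assumptions, and the sizes reported will differ from system to system.

```shell
# Scratch directory with a projects folder holding one small stand-in file
workdir=$(mktemp -d)
cd "$workdir"
mkdir projects
echo ">placeholder_sequence" > projects/p450s.fasta

# Summarized, human-readable disk usage of the projects directory
du -sh projects

# The same for the present working directory and everything under it
du -sh .
```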

Editing a (Text) File

There is no shortage of command-line text editors, and while some of them (like vi and emacs) are powerful and can enhance productivity in the long run, they also take a reasonable amount of time to become familiar with. (Entire books have been written about each of these editors.)

In the meantime, a simple text editor available on most systems is nano; to run it, we simply specify a file name to edit:

If the file doesn’t exist already, it will be created when it is first saved, or “written out.” The nano editor opens up an interactive window much like less, but the file contents can be changed. When done, the key sequence Control-o will save the current edits to the file specified (you’ll have to press Enter to confirm), and then Control-x will exit and return control to the command prompt. This information is even presented in a small help menu at the bottom.

Although nano is not as sophisticated as vi or emacs, it does support a number of features, including editing multiple files, cut/copy/paste, find and replace by pattern, and syntax highlighting of code files.

Code files are the types of files that we will usually want to edit with nano, rather than essays or short stories. By default, on most systems, nano automatically “wraps” long lines (i.e., automatically presses Enter) if they would be longer than the screen width. Unfortunately, this feature would cause an error for most lines of code! To disable it, nano can be started with the -w flag, as in nano -w todo_list.txt.

Command-Line Efficiency

While the shell provides a powerful interface for computing, it is certainly true that the heavy reliance on typing can be tedious and prone to errors. Fortunately, most shells provide a number of features that dramatically reduce the amount of typing needed.

First, wildcard characters like * (which matches any number of arbitrary characters) and ? (which matches any single arbitrary character) allow us to refer to a group of files. Suppose we want to move three files ending in .temp into a temp directory. We could run mv listing the files individually, as in mv fileA.temp fileB.temp fileAA.temp temp.

Alternatively, we could use mv file*.temp temp; the shell will expand file*.temp into the list of files specified above before passing the expanded list to mv.[2]

Similarly, we could move only fileA.temp and fileB.temp (but not fileAA.temp) using mv file?.temp temp, because the ? wildcard will only match one of any character. These wildcards may be used anywhere in an absolute or relative path, and more than one may be used in a single path. For example, ls /home/*/*.txt will inspect all files ending in .txt in all users’ home directories (if they are accessible for reading).
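The difference between the two wildcards can be seen in a small sketch. The scratch directory and the empty stand-in files are assumptions; the file names are the ones used in the text.

```shell
# Scratch directory with three empty stand-in files and a temp directory
workdir=$(mktemp -d)
cd "$workdir"
touch fileA.temp fileB.temp fileAA.temp
mkdir temp

# echo shows what the shell expands the pattern to before any command runs;
# here * matches all three files
echo file*.temp

# ? matches exactly one character, so only fileA.temp and fileB.temp are moved
mv file?.temp temp

# fileAA.temp is left behind, next to the temp directory
ls
ls temp
```

Previewing a pattern with echo before handing it to mv or rm is a cheap way to check which files it will actually match.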

Second, if you want to rerun a command, or run a command similar to a previously run command, you can access the command history by pressing the up arrow. Once you’ve identified which command you want to run or modify, you can modify it using the left and right arrows, backspace or delete, and then typing and pressing Enter again when you are ready to execute the modified command. (You don’t even need to have the cursor at the end of the line to press Enter and execute the command.) For a given login session, you can see part of your command history by running the history command.

Finally, one of the best ways to navigate the shell and the filesystem is by using tab completion. When typing a path (either absolute or relative), file or directory name, or even a program name, you can press Tab, and the shell will autocomplete the portion of the path or command, up until the autocompletion becomes ambiguous. When the options are ambiguous, the shell will present you with the various matching options so that you can inspect them and keep typing. (If you want to see all options even if you haven’t started typing the next part of a path, you can quickly hit Tab twice.) You can hit Tab as many times as you like while entering a command. Expert command-line users use the Tab key many times a minute!

Getting Help on a Command or Program

Although we’ve discussed a few of the options (also known as arguments, or flags) for programs like ls, cp, nano, and others, there are many more you might wish to learn about. Most of these basic commands come with “man pages,” short for “manual pages,” that can be accessed with the man command, as in man ls.

This command opens up a help page for the command in question (usually in less or a program similar to it), showing the various parameters and flags and what they do, as well as a variety of other information such as related commands and examples. For some commands, there are also “info” pages; try running info ls to read a more complete overview of ls. Either way, as in less, pressing q will exit the help page and return you to the command prompt.

Viewing the Top Running Programs

The top utility is invaluable for checking what programs are consuming resources on a machine; it shows in an interactive window the various processes (running programs) sorted by the percentage of CPU time they are consuming, as well as which user is running them and how much RAM they are consuming. Running top produces a window like this:

From a user’s perspective, the list of processes below the dark line is most useful. In this example, no processes are currently using a significant amount of CPU or memory (and those processes that are running are owned by the administrator root). But if any user were running processes that required more than a tiny bit of CPU, they would likely be shown. To instead sort by RAM usage in the common Linux version of top, press M (an uppercase M, that is, Shift-m). When finished, q will quit top and return you to the command prompt.

Of particular importance are the %CPU and %MEM columns. The first may vary from 0 up to 100 (percent) times the number of CPU cores on the system; thus a value of 3200 would indicate a program using 100% of 32 CPU cores (or perhaps 50% of 64 cores). The %MEM column ranges from 0 to 100 (percent). It is generally a bad thing for the system when the total memory used by all processes is near or over 100%; this indicates that the system doesn’t have enough “working memory” and it may be attempting to use the much slower hard drive as working memory. This situation is known as swapping, and the computer may run so slowly as to have effectively crashed.

Killing Rogue Programs

It sometimes happens that programs that should run quickly, don’t. Perhaps they are in an internal error state, looping forever, or perhaps the data analysis task you had estimated to take a minute or two is taking much longer. Until the program ends, the command prompt will be inaccessible.

There are two ways to stop such running programs: the “soft” way and the “hard” way. The soft way consists of pressing the key combination Control-c, which sends an interrupt signal (SIGINT) to the running process, asking it to end.

But if the rogue program is in a particularly bad error state, it won’t stop even with a Control-c, and a “hard” kill is necessary. To do this requires logging in to the same machine with another terminal window to regain some command-line access. Run top, and note the PID (process ID) of the offending process. If you don’t see it in the top window, you can also try running ps augx, which prints a table of all running processes. Suppose the PID for the process to kill is 24516; killing this process can be done by running kill -9 24516. The -9 option specifies that the operating system should stop the process in its tracks and immediately clean up any resources used by it. Processes that don’t stop via a kill -9 are rare (though you can’t kill a process being run by another user), and likely require either a machine reboot or administrator intervention.
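The hard kill can be rehearsed safely with a harmless stand-in. In this sketch, which departs from the two-terminal scenario above, sleep 300 plays the role of the rogue program, it is started in the background of the same shell, and its PID is taken from the shell variable $! rather than read from top; all of these are assumptions made for illustration.

```shell
# Start a long-running stand-in for a rogue program in the background
sleep 300 &
pid=$!   # $! holds the PID of the most recent background process
echo "PID to kill: $pid"

# The "hard" kill: signal 9 cannot be caught or ignored by the process
kill -9 "$pid"

# Collect the ended background job; its nonzero status is expected here
wait "$pid" 2>/dev/null || true

# kill -0 sends no signal; it only checks whether the process still exists
if kill -0 "$pid" 2>/dev/null; then
    echo "process $pid is still running"
else
    echo "process $pid is gone"
fi
```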


Exercises

  1. Create the following directories inside your home directory, if you don’t already have them: downloads, local, and projects. Inside of local, create a directory called bin. These folders are a common and useful set to have in your home directory; we’ll be using them in future chapters to work in, download files to, and install software to.
  2. Open not one but two login windows, and log in to a remote machine in each. This gives you two present working directories, one in each window. You can use one to work, and another to make notes, edit files, or watch the output of top.
  3. Create a hidden directory inside of your home directory called .hidden. Inside of this directory, create a file called notes. Edit the file to contain tricks and information you fear you might forget.[3]
  4. Spend a few minutes just practicing tab completion while moving around the filesystem using absolute and relative paths. Getting around efficiently via tab-completion is a surprisingly necessary skill.
  5. Skim the man page for ls, and try out a few of the options listed there. Read a bit of the info page for nano.

