GNU Linux

History

UNIX

A couple of engineers at AT&T initially worked on an operating system called Multics, which was to be a time-sharing operating system. However, the project started falling apart, and these two engineers branched off to make their own, smaller version of it. This turned out to be UNIX, whose name was a sarcastic pun: the "uniplexed" version of Multics.

BSD

AT&T was trying to standardize its System V version of UNIX. But then UC Berkeley made its own OS based off of UNIX, which it called BSD. BSD shipped with networking capabilities. This period of competition to standardize on one version of the OS is termed the UNIX wars. AT&T also sued Berkeley over the UNIX code in BSD.

GNU

To avoid legal issues, Stallman made an operating system modeled on UNIX by reimplementing its code from scratch. So, GNU doesn't contain any code from UNIX. Its name even stands for "GNU's Not UNIX".

GNU/Linux

GNU had its own kernel called GNU Hurd, but it was incomplete. Around the same time, Linus Torvalds was working on his hobby project, the Linux kernel. This kernel was then integrated with GNU to give us GNU/Linux.
What we commonly refer to everyday as Linux is actually GNU/Linux. GNU/Linux was free, both free as in cost and free as in freedom.

Distros

There was no central organization making decisions. The Linux kernel is free, and anyone can make their own distro. So, distros started popping up, simply because people could make them. Softlanding (SLS) and Yggdrasil were among the earliest distributions. Slackware was based off of Softlanding, and openSUSE traces its lineage back to Slackware.

Network management

NetworkManager

NetworkManager is a service that continuously scans for networks and connects to the most preferred one. Wired networks are preferred over wireless ones. This program is used a lot on laptops, where wireless network management is a necessity and we keep switching between networks. For servers, it is usually unnecessary.

Enterprise distributions

NetworkManager has been widely adopted as the default on lots of distributions, such as Debian and Ubuntu. Enterprise distributions like CentOS and Red Hat historically did not install or configure it by default (though recent RHEL releases do use it).

Network configuration

Hostname and IP address

The oldest and most widely accepted way of doing this is through the /etc/hosts file. This file contains line-by-line mappings between IP addresses and hostnames in the following format:

127.0.0.1	localhost
127.0.1.1	zephyrus-ubuntu

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

This file however, contains only local mappings and is best reserved for mappings that are required during boot. Other mappings can be found using DNS or LDAP.
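To see which mapping actually wins for a given name (the lookup order between /etc/hosts and DNS is set in /etc/nsswitch.conf), the glibc getent tool can be queried; a small sketch:

```shell
# Query the "hosts" name-service database; on most systems
# /etc/hosts is consulted before DNS (order set in /etc/nsswitch.conf)
getent hosts localhost
```

This prints the address and name(s) that currently resolve for localhost, letting us confirm an /etc/hosts entry is actually in effect.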

On Linux, the configuration file for the DHCP client can be found at /etc/dhcp/dhclient.conf.

Network interface

A network interface is just hardware that can connect to a network. Machines with multiple ethernet ports, for example, will have a different network interface controlling each port. Every hardware (as opposed to virtual) network interface has a unique MAC address (also called the hardware address), assigned at manufacture time, to identify itself.

Every machine has the lo network interface, which is virtual and is a loopback. A loopback interface is used by a machine to connect to itself. The other interfaces are hardware dependent. On my laptop for example:

wahid@zephyrus-ubuntu:~$ ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eno1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN mode DEFAULT group default qlen 1000
    link/ether 50:eb:f6:e1:84:26 brd ff:ff:ff:ff:ff:ff
    altname enp2s0
3: wlp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DORMANT group default qlen 1000
    link/ether b4:8c:9d:5b:1d:6b brd ff:ff:ff:ff:ff:ff

There is the aforementioned lo interface. Then there is an eno1 interface that controls the ethernet port, and a wlp3s0 wireless network interface.

These network interfaces can be brought up and down using

ip link set <device> <up|down>

Routing

When a network packet bound for some other host is to be sent, the packet's destination IP is looked up in the kernel's routing table. From here, one of two things can happen:
(To understand how the packet is sent from a device to the router via Ethernet : ARP)

  • The destination IP matches an entry in the kernel's routing table; that is, the destination address is either local or can be reached via a local gateway. The packet is then forwarded to the appropriate destination.
  • The destination IP matches no specific entry. The default route (0.0.0.0) is used, and the packet is sent to the default gateway.
    An example kernel routing table:
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
0.0.0.0         192.168.1.1     0.0.0.0         UG        0 0          0 wlp3s0
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 wlp3s0
192.168.1.0     0.0.0.0         255.255.255.0   U         0 0          0 wlp3s0

Network statistics

The ip -s link command gives us network statistics for each network interface on a device.

SSH

Secure SHell (SSH) is a protocol for remote logins. It is a client-server protocol that requires the remote machine to be running an SSH daemon. OpenSSH, an open-source implementation of SSH, is the standard on almost all versions of UNIX and GNU/Linux.
Basic usage: ssh user@host or ssh -l user host

Key fingerprint

When we use ssh to connect to a remote host that our machine does not recognize (one we have not previously connected to), the server sends a hash of its public key, called the key fingerprint. If we choose to accept this "unknown" remote server's fingerprint, then it is added to ~/.ssh/known_hosts.

Private and public keys

These are stored in ~/.ssh. However, if this directory doesn't have its permissions set to 0700, it is ignored. The SSH protocol can make use of public key cryptography for authentication (it also supports other means of authentication).

Here is how SSH is usually configured and used:

  1. The user creates a private-public key pair with ssh-keygen
    1. ssh-keygen can be used to generate specific types of keys with the -t flag.
    2. ssh-keygen -t rsa -b 4096 generates an RSA key with 4096 bits.
    3. We are also prompted to set an optional passphrase for the keys.
  2. The private and public keys are stored as ~/.ssh/id_rsa and ~/.ssh/id_rsa.pub respectively.
  3. Now, if the server has our public key listed in its ~/.ssh/authorized_keys, then we are allowed to login to this account via SSH.
    Important

    It is necessary that our remote user's ~/.ssh directory have its permissions set to 0700 and that ~/.ssh/authorized_keys (which contains our public key) have its permissions set to 0600.

ssh-agent

ssh-agent is a daemon that caches private keys. When using ssh, the cached private keys are tried automatically. It is also possible to have multiple private keys cached.

  • A private key can be loaded using the ssh-add command as follows - ssh-add ~/.ssh/id_rsa
  • And currently loaded private keys can be listed with ssh-add -l
  • To remove a private key, we do ssh-add -d ~/.ssh/id_rsa (This one is weird, it apparently requires that the public key be present in the same directory as the key we are trying to remove. If we don't have the public key, it can be extracted by doing ssh-keygen -yf ~/.ssh/id_rsa)
  • ssh-add -D purges all cached keys
    Arguably, the most important feature of ssh-agent is that cached keys can be forwarded to remote machines. This means we don't have to copy our private keys around to jump between remote hosts. The cached keys are carried forward with us whenever we ssh, provided we use ssh -A (-A stands for agent forwarding).

Port forwarding

Local port forwarding is done with the -L flag like this -

ssh -L local_port:remote_address:remote_port user@host

This is a little dense to read, but what's happening is:

  • We connect to localhost:local_port on our own machine.
  • The ssh client carries whatever we send there over the encrypted connection to host.
  • host then connects to remote_address:remote_port on our behalf.

Basically, the host tries to access the remote service instead of us, and just forwards us the data on local_port. And whatever we send to local_port gets forwarded to remote_address:remote_port.

We can also specify -R for remote port forwarding. This is where we allow a remote server to access our local ports.
Example
If we have a webserver running on port 80 on our machine and we have done ssh -R 9000:localhost:80 user@host, then any requests to the host at port 9000 will be forwarded to our localhost on port 80.

Text manipulation

sed

The sed command is used to modify text content line by line.
Example
sed '1,4 s/x/y/5g' file replaces, on lines 1 through 4, the 5th and later occurrences of x on each line with y.

Some things to note:

  • s means substitute
  • d means delete
  • \b means word boundary
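A couple of self-contained runs of the s and d commands (input is piped in, so no file is needed; the Ng combination shown is GNU sed behaviour):

```shell
# Replace the 2nd and later occurrences of 'x' on each line with 'y'
printf 'x x x x\n' | sed 's/x/y/2g'     # -> x y y y

# Delete lines 2 through 3
printf 'a\nb\nc\nd\n' | sed '2,3d'      # -> a, then d
```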

awk

The awk command can split each line of input into fields based on specified delimiters, and then perform functions on these fields. Some of the most commonly used operations are print and boolean comparisons.
Example

ps lax | awk -F' ' '$3 == 1'

This command splits each line into fields using ' ' as the delimiter and prints only the lines whose third field equals 1. The 3rd field of ps lax is the PID. So this would print -

4     0       1       0  20   0  22256 12408 -      Ss   ?          0:05 /sbin/init
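For a feel of field handling without involving ps, here is a self-contained run on made-up input:

```shell
# The default field separator is whitespace; $1 and $3 are the 1st and 3rd fields
printf 'alice 23 admin\nbob 31 dev\n' | awk '{print $1, $3}'
# -> alice admin
#    bob dev
```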

cut

The cut command is used to display a specific field (portion) of output given a delimiter. Given a delimiter, it splits up each line accordingly and numbers each field from 1.

  • -d flag is used to specify delimiter
  • -f is used to specify field number
    Example
wahid@zephyrus-ubuntu:~$ echo 'f1,f2,f3:f4,f5,f6:f7,f8,f9' | cut -d: -f2
f4,f5,f6
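-f also accepts comma-separated lists and ranges of field numbers:

```shell
echo 'a:b:c:d:e' | cut -d: -f2,4   # fields 2 and 4      -> b:d
echo 'a:b:c:d:e' | cut -d: -f2-4   # fields 2 through 4  -> b:c:d
```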

wc

This command outputs the number of lines, words and characters in a given file. It is almost always used with the corresponding flags -l, -w, -c to output only a single number, typically inside backticks for command substitution.
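A small sketch of the command-substitution pattern (the /tmp path is just a scratch file):

```shell
printf 'one\ntwo\nthree\n' > /tmp/wc_demo.txt
# Redirecting stdin makes wc print only the number, with no filename after it
lines=$(wc -l < /tmp/wc_demo.txt)
echo "$lines"    # -> 3
```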

tee

tee is basically like a T-fitting in plumbing. It copies its input to both STDOUT and a specified file.
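A minimal sketch (the scratch file path is made up):

```shell
# The line goes to STDOUT and into the file at the same time
echo 'hello' | tee /tmp/tee_demo.txt   # prints: hello
cat /tmp/tee_demo.txt                  # prints: hello
# tee -a appends to the file instead of truncating it
```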

Filesystem

The filesystem can be thought of as an explorable hierarchy that starts off at /, the root directory. Each component of a pathname can be at most 255 characters long, and the entire pathname at most 4095 bytes on Linux.

mount

Think of our filesystem as a tree and a filesystem on another disk as a branch. We can add external filesystems to our "tree" by using the mount command. mount is a wrapper for filesystem specific mount commands.
Example usage

mount /dev/nvme0n1p5 /mnt

This 'mounts' (or attaches) the filesystem on /dev/nvme0n1p5 at /mnt. So we can cd into /mnt and browse it.
umount is used to unmount a filesystem.

Filesystem table

The contents of the file /etc/fstab dictate which filesystems are to be mounted and fsck'ed at boot time. It turns out that fsck, like mount, is a wrapper for filesystem-specific fsck commands. These are present in the /sbin directory.

wahid@zephyrus-ubuntu:~$ ls /sbin/ | grep fsck
dosfsck
e2fsck
fsck
fsck.cramfs
fsck.ext2
fsck.ext3
fsck.ext4
fsck.fat
fsck.minix
fsck.msdos
fsck.vfat

Organization

Info

sbin and bin were historically used to differentiate between statically and dynamically linked binaries. But these days there is no real difference between them.

  • /bin and /sbin contain system critical utils
  • /var contains log files and other files that change rapidly
  • /tmp is for temp files
  • /usr is where standard important files are kept, but these are not system critical
  • /lib and /lib64 is for shared libraries. These are often symlinks to /usr/lib and /usr/lib64
  • /etc is for system critical configuration files
  • /boot is where the file containing the kernel is located along with the bootloader
  • /dev holds device files; on modern systems it is a virtual filesystem (devtmpfs) populated by the kernel
    Tip

    man hier tells us general info about the filesystem hierarchy. It may not be followed exactly on a given system; it is just a general overview.

File types

ls -la lists out the file types and permissions. The characters given below can be used as a reference to this command's output.

Filetype Symbol
Regular file -
Directory d
Character device file c
Block device file b
Local domain socket s
Named pipe p
Symlink l

Regular files

These are just a blob of data, a series of bytes. They could be text, libraries, executables and so on.

Directory

A directory contains a list of references (hardlinks) to files. . and .. are special entries that mean the current and parent directory. The name of a file is stored in the directory it is in, not in the file itself. So, it is possible for multiple directories to contain a reference to the same file.

Important

A file with write permissions disabled can still be deleted, as it is the directory that contains the file name to data mapping. So, to delete a file, we would require write and execute permissions on the directory and not the file itself (read perms too if we do not know the file name).

A directory stores the name of a file and its inode number.
Related: Directory permission
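This is easy to demonstrate in a scratch directory (the /tmp paths here are made up):

```shell
mkdir -p /tmp/perm_demo
touch /tmp/perm_demo/victim
chmod 000 /tmp/perm_demo/victim   # no read/write/execute on the file at all
rm -f /tmp/perm_demo/victim       # still succeeds: we have w+x on the directory
ls /tmp/perm_demo                 # victim is gone
```

Without -f, rm would prompt before removing the write-protected file, but the removal itself only needs directory permissions.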

Device files

Device files act as a gateway between the filesystem and device drivers. When a request is made to a device file, it just forwards the request to the appropriate device driver. Each device file has a major and minor device number, to represent which driver and which unit of that driver to refer to.
Example
/dev/tty0 and /dev/tty1 would have the same major number but different minor numbers.

Sockets

Sockets allow processes to communicate. The sockets that appear in the filesystem are called Unix Domain Sockets (UDS); these are specifically used for communication between processes running on the same host OS, unlike TCP connections, which are used for communication between processes across a network.
Because UDS know both ends are on the same host, they can avoid a few checks that TCP/IP packets go through, and can therefore be lighter and faster.[1]
A socket is also bidirectional, and can be used by multiple processes simultaneously.

FIFOs

Named pipes / FIFOs are basically super primitive local domain sockets. They have nothing to do with a network and are not bidirectional. We would require two pipes, one for reading and one for writing, for each client process that wants to communicate.[2]

Symlinks

Symlinks or softlinks are pointers to a file by name. If the target file is moved or removed, then the symlink becomes invalid (dangling).

Types of filesystems

There's a few different types of filesystems, each aiming to be good at specific things. ext4 is the standard on Debian and Ubuntu, while CentOS and Red Hat use XFS.
Filesystems like ext4 and XFS are "traditional" in the sense that they do not handle things like volume management and RAID. Those are handled separately, away from the filesystem. Non-traditional filesystems would be ones like BTRFS and ZFS, which take an integrated approach to managing RAID, volumes, and even snapshots.

ext4

The ext4 filesystem is what is called a journaling filesystem. Journaling is somewhat like transactions in SQL: whatever fs operation is to be performed is first written to a journal and committed. Only then is the actual modification made.
If for some reason the actual operation fails, the filesystem can just look at the journal and safely replay the last committed operations.

Info

ext3 was the first in the ext family to introduce journaling. ext4 was just a small revision over ext3 that extended a few limits.

BTRFS

BTRFS and ZFS follow what is called Copy On Write (COW). With COW, the entire filesystem moves from one consistent state to another. Instead of modifying data in place on the disk, an in-memory copy is modified and then written to some vacant block. Whatever was pointing to the original data is also rewritten to point to the new block, and so on: the parent's parent is rewritten, up to the topmost level.

Software management

Package managers make managing software easier. The packages they install often generate new config files and add new groups.

Version numbers

The version number in the name of a package might not always match the version number of the actual software it contains.

Debian has dpkg to install .deb files. dpkg --install package.deb installs the package if it is not already installed. If it is already installed, then dpkg removes the existing package before installing the new one.

Software management on Debian based distros

Debian uses the Advanced Package Tool (APT) as its package manager. APT started out on the Debian/dpkg side, but nowadays it can also handle RPMs.

Dummy packages

Sometimes an empty package is listed with a bunch of dependencies, so that all these dependencies can be installed under a single alias. For example, gnome-desktop-environment is not actually an existing software package, but is a dummy package with a list of dependencies that are required to run GNOME.

Repositories

The configuration file for apt is located at /etc/apt/sources.list. This file contains information about where to find packages.
Each line is supposed to follow this format:

  • Type: Can be deb, deb-src, rpm, rpm-src
  • A URL that points to a file or an http server or an ftp server from which to fetch packages
  • A release name (that is often called a "distribution")
  • A list of components that are basically categories of packages. universe is one such component that has lots of open-source software. main, multiverse are some other examples

Commonly used apt commands

apt is actually a wrapper for a bunch of other low-level commands like apt-get

  • update forces apt to download and update package information from configured repositories
  • upgrade installs available upgraded packages. If an older dependency is no longer required, it will NOT be removed
  • autoremove removes all orphan dependencies. If however, this orphan dependency is something that is useful (it could be a standalone package that we manually installed), then it can be marked with apt-mark for autoremove to exclude it
  • install searches for a given package name and installs it
  • remove just removes an installed package but leaves behind configuration files
  • purge is remove but also removes config files. NOT the config files in /home. purge also works on already remove'd packages

File permissions and attributes

3 Triplets

Every file has attributes. Nine of these are the standard permission bits - read, write and execute for the user (owner), group and others (everyone else).
They are structured as (rwx)(rwx)(rwx) for user, group and others, in that order.

Special bits

Each file also has 3 other bits -

  • SUID (set user id)
  • SGID (set group id)
  • Sticky bit
    These bits are shown as the third bit in each triplet.
    s implies that x is also set. S implies that only the setuid/setgid bit is set and not the x bit.
    t implies that x is also set. T implies that only the sticky bit is set and not the x bit.[3]
  • When the third bit in the user triplet is s, then it is the setuid bit.
  • When the third bit in the group triplet is s, then it is the setgid bit.
  • The other triplet's third bit can't be s.

Effects on files

SUID: When set on an executable file, the file is executed with the permissions of the file's owner, not the invoking user.
SGID: When set on an executable file, the file is executed with the permissions of the file's group.
Sticky: Was used to retain programs in memory. Is now obsolete for files.

Effects on Directory

A directory just stores a filename and the file's inode number. Read permissions on a dir allow for accessing the filenames only. Write permissions allow for adding, renaming and deleting the files in it.[4]
The "execute" bit is reused as the search permission. When it is set, we are allowed to search for a file's inode. We can only read the files inside a directory, if we have read perms on the file itself and search (execute) perms on the directory. The execute bit is also required for the stat() syscall, which is called during opening and deletion of a file.

SUID: Has no effect.
SGID: Files created in the directory will inherit the group of the directory.
Sticky: If set, only the owner and the super user are allowed to rename or delete files in it.

Managing users and permissions

chown

The chown command is used to change the ownership (owner and group) of a file. Usual syntax is

chown <user>:<group> path/to/file 

chmod

The chmod command is used to change the permissions of a file. The syntax is pretty memorable (u g o = user group other; r w x s = read write execute setuid/setgid).
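A quick sketch of both forms (the scratch file path is made up); the octal digits are just the rwx bits of each triplet read as binary:

```shell
touch /tmp/chmod_demo
chmod 640 /tmp/chmod_demo        # octal: rw- r-- ---
chmod u+x,g-r /tmp/chmod_demo    # symbolic: add x for user, drop r for group
stat -c '%A %a' /tmp/chmod_demo  # -> -rwx------ 700 (GNU stat)
```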

useradd and adduser

The useradd command is used to add a user. The defaults for this command (such as the default shell) are stored in /etc/default/useradd, with some site-wide policy in /etc/login.defs. Using -D, we can view or change the default values.
Example
useradd -D -s /bin/tcsh changes the default shell for any users that will be added in the future.
useradd -D lists the default values. On my laptop, these are the defaults (haven't been modified)

wahid@zephyrus-ubuntu:~$ useradd -D
GROUP=100
HOME=/home
INACTIVE=-1
EXPIRE=
SHELL=/bin/sh
SKEL=/etc/skel
CREATE_MAIL_SPOOL=no

adduser is a more user friendly and interactive command (perl script really) that is available by default on Debian/Ubuntu.

Process management

Components of a process

PID and PPID

Each process is assigned a unique ID by the kernel called a Process ID (PID). When a process wants to launch a sub-process, it first clones itself (fork), and the clone then replaces itself with a different program (exec). The clone is called the child process, and the original process is the parent process. The child carries an attribute called the Parent PID (PPID), which is the PID of the parent process.

UID, GID and EUID, EGID

  • UID is the ID of the user under which the process was invoked
  • EUID is the ID of the user with whose perms the process is being run. It is usually the same as the UID, but when a process is run with sudo, its EUID will be root's UID
  • GID and EGID are related in the same way as UID and EUID

Niceness and priority

A process's share of CPU time is determined by how much CPU time it has most recently used, how long it has been waiting, and something called the nice value or niceness. The higher the niceness, the lower the priority: a nicer process yields to others.

We can observe two values - PR (priority) and NI (niceness) - when we use a process explorer like top. PR and NI are related as - [5]
PR = 20 + NI

  • The kernel allows for 140 priority values, from -100 to 39, both included. The priorities from -100 to -1 (100 total) are called real-time priorities, and these take precedence over normal processes.
  • As for the niceness values, the range is -20 to 19, both included. So, the priority level for a nice'd process maps to 0 through 39, the rest of the 140 priorities.
    We can set a process's niceness before it starts with the nice command as shown here -
nice -n <value> ./process

We can also set niceness to a running process with renice as follows -

renice -n 10 -p $(pgrep firefox)

(The -p flag is to specify that we will pass in PIDs).

We CAN set niceness to a negative value to raise a process's priority, but this requires higher perms (sudo or the root user). Note that even a negative niceness still maps into the normal 0-39 range; actual real-time processes have negative PRs, as discussed earlier, and some of them show rt as their PR, which means -100, the highest priority.
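The effect of nice can be observed without root for positive (lower-priority) values. Only the difference from the shell's own niceness is meaningful, so no absolute numbers are assumed here:

```shell
ps -o ni= -p $$                     # the current shell's niceness
nice -n 5 sh -c 'ps -o ni= -p $$'   # the child runs 5 steps nicer
```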

Management

Stopping

Processes with a known PID can be stopped using the kill command. This command also takes in a signal as an argument that is sent to the process.
^C sends a SIGINT signal, which is an interrupt signal; the program can handle this interrupt, but in most cases it terminates.

Processes can be stopped usually with a SIGINT, if however that fails, we can use -

  • kill -9 <PID> to send a SIGKILL. If this fails for some reason, the only way to kill it properly is to reboot. For other possible signals, try kill -l
  • killall <name> is very similar to kill, but instead of PIDs it takes names and kills all processes by that name. Default signal is SIGTERM but we can change it just like with kill.
  • pgrep or process grep, basically greps processes by name and lists their PIDs. pkill kills them instead of listing them. We can also pkill by owner of the process using the -u flag and specifying user.

Suspending and resuming

^Z sends a SIGTSTP signal which suspends a process.
Suspended processes can be viewed using the jobs command, and can be resumed to foreground or background with the fg and bg commands followed by the index that the jobs command provides.

Monitoring

Processes can be monitored using the ps command. This command's output is highly configurable and the most important and common usage is -

  • ps aux, a for all processes, x for including processes without a control terminal, u for user oriented output
  • One drawback of the u flag is that it converts all the UIDs to usernames, which might be very slightly computationally expensive. So, a "more" efficient way is ps lax which lists the output in (l) long format
  • Additionally add ww to the args to allow for wrapping
    How can a process not have a control terminal?

    Processes that are started by the system's init system (systemd or init ...) don't have a control terminal.
    Daemons start up by forking a child process, after which the parent process exits. The daemon is now an orphan and is adopted by the init system. Therefore, a daemon will also not have a control terminal.

strace can be used to attach to a process by doing strace -p <PID> and it outputs all syscalls made by that process.

Scheduling with cron

cron is used to schedule tasks. It is configured by modifying the cron table (crontab). When a cron service is active, it will run all tasks in the crontab on schedule. The actual location of crontabs depends on the implementation that is being used.

crontab format

Each line in crontab follows this format

minute hour day_of_month month day_of_week command
  • All the time fields start at 0 except for day_of_month and month, which start at 1
  • day_of_week starts with 0 as Sunday
    Example [6]
*/5 9-16 * 1-5,9-12 1-5 ~/bin/i_love_cron.sh

This would run the script

  • Every 5 minutes
  • From 09:00 to 16:55
  • Every day of the month
  • Except for the months - 6,7,8 (June, July and August)
  • On the weekdays (1-5)

Init systems

When the kernel finishes its initialization process, it starts the init process. init is (was lol) a system management daemon. It has a number of important responsibilities, including (but not limited to) -

  • Setting hostname, timezone
  • fsck'ing filesystems and mounting the filesystems in fstab
  • Configuring network interfaces
  • Starting up other daemons
    The most commonly used init systems presently are -
  • "Traditional" init, which is based on AT&T's SystemV init. This one was the standard until systemd came along
  • The init that is used in BSD systems, and is based off of BSD UNIX
  • systemd, which aims to handle everything about daemons and system management

Related: initd

Other init systems

macOS uses launchd, which replaced its older BSD-style init (it never switched to systemd)
Ubuntu used Upstart up until 15.04, after which it switched to systemd
Void Linux uses runit

systemd

systemd handles more than just processes. It handles network connections, kernel logs, logins. It has all the features that traditional init did, but adds on to it so much more and unifies them.

systemd controversies

systemd seemingly goes against the UNIX philosophy, which is to keep components as small as possible. So, of course, there is an anti-systemd fandom. For the funny: https://without-systemd.org

systemd manages entities that it calls units. A unit can be a lot of things, so I won't list all of them here, but some of the important ones are a service, a socket, a device, a mount point.
We can differentiate them with their "extension". A service, for example would end with .service.
Example unit file

[Unit]
Description=fast remote file copy program daemon
ConditionPathExists=/etc/rsyncd.conf

[Service]
ExecStart=/usr/bin/rsync --daemon --no-detach

[Install]
WantedBy=multi-user.target

Unit files can be found at -

  • /usr/lib/systemd/system
  • /lib/systemd/system
  • /etc/systemd/system (Highest priority)

systemctl

To manage systemd units, we have the systemctl command. This can be used to -

  • start and stop a service
  • enable and disable a service to start at boot
  • list-units to list all installed units
    • Additionally we can provide the unit name
    • Or the type argument, --type=service for example, to filter between units
  • check status

Bash scripting

Bourne shell

The Bourne shell is the original UNIX shell, developed at AT&T. The sh that we use today is usually the Almquist shell (dash on Debian), a reimplementation. The Bourne-again shell, bash, is pretty much the standard for both login shells and scripting.

Scripting

Pipes and redirection

Every process has at least three channels available - STDIN, STDOUT and STDERR. These are assigned numbers called file descriptors: 0, 1 and 2 respectively. We are allowed to redirect these channels among processes in a number of ways by using a few symbols. [7]

  • < connects STDIN to an existing file
  • > and >> write and append to a file respectively. They connect STDOUT to a file.
  • >& is used to redirect both STDOUT and STDERR to the same location.
  • 2> is used to redirect only STDERR
    • A common usecase for this is to dump all errors to /dev/null
  • 1>&2 redirects STDOUT to STDERR
  • 2>&1 redirects STDERR to STDOUT
  • | connects the STDOUT of one command to the STDIN of another
  • && executes the second command if the first one succeeds
  • || executes the second command if the first one fails
  • ; can be used to write a bunch of commands in one line
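Most of the list above can be exercised in a few lines (the paths are scratch locations):

```shell
ls /nonexistent 2>/dev/null            # the error message is discarded
echo 'ok'   > /tmp/redir_demo.txt      # > truncates, then writes
echo 'more' >> /tmp/redir_demo.txt     # >> appends
false || echo 'first command failed'   # || branch runs
true  && echo 'first one succeeded'    # && branch runs
```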

Variables

Variables are assigned as var='text' (no spaces around the =). They are referenced prefixed with a $ sign, like $var. When a variable has to be referenced inside a string, we additionally surround the variable name with {}, like ${var}.
Anything that is wrapped in backticks is treated like a command, and the output of the command is substituted.
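A short sketch; note that $( ) is the modern, nestable equivalent of backticks:

```shell
name='world'
echo "hello ${name}"    # -> hello world
when=`date +%Y`         # backticks substitute the command's output
same=$(date +%Y)        # same thing, modern syntax
[ "$when" = "$same" ] && echo 'both forms agree'
```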

Environment variables

When a process starts, it receives all command line arguments along with these "pre-existing" variables called environment variables. Based on environment variables, a program can decide to change its behaviour.
Example
When the MOZ_ENABLE_WAYLAND environment variable is set to 1, Firefox will try to use Wayland instead of X11.

Viewing

The currently set environment variables can be listed with the printenv command.
Example

wahid@zephyrus-ubuntu:~$ printenv
SHELL=/bin/bash
SESSION_MANAGER=local/zephyrus-ubuntu:@/tmp/.ICE-unix/1901,unix/zephyrus-ubuntu:/tmp/.ICE-unix/1901
QT_ACCESSIBILITY=1
COLORTERM=truecolor
XDG_CONFIG_DIRS=/etc/xdg/xdg-ubuntu:/etc/xdg
...
Export

The export keyword can be used to "promote" a shell variable to an environment variable. Environment variables that are to be set up at login are export'ed in ~/.profile or ~/.bash_profile.

Writing scripts

^P in an active terminal session brings up the last used command. By using the fc command, we can open our last command in our preferred editor (the editor is set with an env variable), edit it, and save it to a file.

Handling I/O

printf respects escape characters, unlike echo (whose behaviour varies between shells), and can be used whenever something has to be output.
read var can be used to get a string of text from STDIN and store it in a variable var.
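Both in one short sketch:

```shell
# printf interprets \t and \n itself, portably
printf 'col1\tcol2\nval1\tval2\n'

# read takes one line from STDIN into the named variable
printf 'typed input\n' | { read var; echo "got: $var"; }   # -> got: typed input
```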

Command line arguments

Any arguments that were passed to the script can be accessed using $ followed by the index of the argument (starting from 1, 0 is the actual command that was used to invoke the script).
Important example

example arg1 arg2 arg3
  • $1 would be arg1, $2 would be arg2 and so on
  • $# contains the total number of arguments (doesn't count $0)
  • $* contains all the arguments
  • $? contains the exit code of the previously executed command or script
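A throwaway script (made-up /tmp path) shows all of these at once:

```shell
cat > /tmp/args_demo.sh <<'EOF'
#!/bin/sh
echo "invoked as: $0"
echo "first arg:  $1"
echo "arg count:  $#"
echo "all args:   $*"
EOF
chmod +x /tmp/args_demo.sh
/tmp/args_demo.sh foo bar baz
echo $?   # exit code of the script above -> 0
```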
Control flow

The terminator for an if statement is fi. Else-if clauses are written as elif and else clauses are just else.
Syntax example

if [ "$base" -eq 1 ] && [ "$dm" -eq 1 ]; then
	installDMBase
elif [ "$base" -ne 1 ] && [ "$dm" -eq 1 ]; then
	installBase
elif [ "$base" -eq 1 ] && [ "$dm" -ne 1 ]; then
	installDM
else
	echo '==> Installing nothing'
fi

The [] are used to invoke test. test is a command that is used to evaluate an expression, such as a comparison between two arguments. It returns the evaluated boolean as an exit code, 0 for true and 1 (non-zero) for false.
Example

wahid@zephyrus-ubuntu:~$ test 23 -eq 23
wahid@zephyrus-ubuntu:~$ echo $?
0
wahid@zephyrus-ubuntu:~$ test 23 -le 2
wahid@zephyrus-ubuntu:~$ echo $?
1

However, nowadays, test's functionality is built into the shell and /bin/test is not actually invoked.

Operators

Comparison operators are not listed here as there are very few and they are easy to remember. Some that should be noted are

  • -n x evaluates to true if x is not empty
  • -z x evaluates to true if x is empty
  • <, > when used must be double bracketed or escaped so they don't get misinterpreted as redirection operators

sh also has operators for files. They are -
  • -d to check if file exists and is a directory
  • -f to check if file exists and is a regular file
  • -r to check if read perms
  • -w to check if write perms
  • -s to check if file exists and is not empty
  • -e to check if file exists
  • -nt and -ot for newer than and older than
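A small sketch of the string and file operators, probed against a throwaway temp file (created with mktemp so no real path is assumed):

```shell
#!/bin/sh
# String tests
s='hello'
[ -n "$s" ] && echo "s is non-empty"
[ -z "" ]   && echo "detected an empty string"

# File tests
tmp=$(mktemp)
[ -e "$tmp" ] && echo "file exists"
[ -f "$tmp" ] && echo "regular file"
[ -s "$tmp" ] || echo "file is empty"
rm -f "$tmp"
```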
Case statement

The unusual thing to note here is that each pattern to be matched ends with a ) and after the case is handled, it is closed with ;;. A value is compared against the patterns using the in keyword. After all cases have been handled, the statement is closed with esac.
Example

case "$1" in
1) echo "Sending SIGHUP signal....."
    kill -SIGHUP $2 ;;
2) echo  "Sending SIGINT signal....."
    kill -SIGINT $2 ;;
3) echo  "Sending SIGQUIT signal....."
    kill -SIGQUIT $2 ;;
4) echo  "Sending SIGKILL signals....."
   kill -SIGKILL $2 ;;
*) echo "Signal number $1 has not been processed" ;;
esac
Loops
For loops

for var in ...; do syntax can be used for iterative loops. Space separated arguments after the in keyword are treated as a list and are iterated over as var. A for loop is closed with the done keyword.
If the in ... part is left out, the script's command line arguments are iterated over. If however, the for loop is part of a function, then the function's arguments are iterated over instead.
The iterative list that is provided also supports wildcards just like the command line does.
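A minimal sketch of both forms (the list items are illustrative; /etc/*.conf is just a handy glob that exists on most systems):

```shell
#!/bin/sh
# Iterate over a space separated list
for fruit in apple banana cherry; do
	echo "fruit: $fruit"
done

# Wildcards expand before the loop runs; here, every .conf entry in /etc
for f in /etc/*.conf; do
	echo "config: $f"
done
```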
Bash also supports traditional for loops like -

for (( i=0 ; i < $CPU_COUNT ; i++ )); do
	CPU_LIST="$CPU_LIST $i"
done
While loops

The general syntax is -

while command; do
	...
done

We can pass in a command to a while clause, and it will loop until the command returns a non-zero exit code. This allows us to do things like the following, for example -

while read line; do
	echo "$line"
done
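The loop above reads from STDIN; redirecting a file into done makes it process that file line by line. A sketch, using a temporary file so no real path is assumed:

```shell
#!/bin/sh
tmp=$(mktemp)
printf 'one\ntwo\nthree\n' > "$tmp"

# The redirection after 'done' feeds the loop's read from the file
while read line; do
	echo "got: $line"
done < "$tmp"

rm -f "$tmp"
```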
Arithmetic

To force arithmetic between variables instead of plain text substitution, we use $((...)). $a+$b would just expand to the two values joined by a literal +, whereas $((a+b)) would add them. As a general rule of thumb, use double parentheses whenever arithmetic is needed.
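A small sketch of the difference (the variable names and values are illustrative):

```shell
#!/bin/sh
a=5
b=3

# Without $(( )), the + is just literal text between the two values
echo "$a+$b"       # prints: 5+3

# Inside $(( )), the expression is evaluated arithmetically
echo "$((a + b))"  # prints: 8
echo "$((a * b))"  # prints: 15
```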

Execution

A shell script (sh) can consist of nothing but commands. The first line has what is called a shebang - #! followed by the location of the interpreter that should be used to execute the script.

  • #!/bin/sh would execute the script using sh
  • We can also use other interpreters or commands to run our script, provided they're in PATH, by doing #!/usr/bin/env python, for example

We will also have to make the script executable by setting its execute bit. It is bad practice to use setuid bits for managing security; sudo should be used instead.
If the shebang is set, we can invoke the script directly by its name to run it. Or we can pass the script path as an argument to sh or bash or whatever interpreter we are using.
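A sketch of the whole flow, generating a tiny script in a temp file (the path comes from mktemp, so it is illustrative, not a real location):

```shell
#!/bin/sh
script=$(mktemp)
printf '#!/bin/sh\necho "hello from the script"\n' > "$script"

chmod +x "$script"   # set the execute bit
"$script"            # direct invocation; the kernel reads the shebang
sh "$script"         # or hand the path to an interpreter explicitly

rm -f "$script"
```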

Disk partitioning

Partitioning schemes

There are two partitioning schemes -

  • MBR (Master Boot Record), the older scheme, limited to 4 primary partitions and 2 TB disks
  • GPT (GUID Partition Table), the modern scheme, used on UEFI systems

Listing block devices

Block devices are special types of files that allow for buffered access to any hardware. To list all block devices, we use the lsblk utility.
Example from my laptop

❯ lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
nvme0n1     259:0    0 953.9G  0 disk 
├─nvme0n1p1 259:1    0   260M  0 part 
├─nvme0n1p2 259:2    0    16M  0 part 
├─nvme0n1p3 259:3    0 554.7G  0 part 
├─nvme0n1p4 259:4    0    16G  0 part [SWAP]
├─nvme0n1p5 259:5    0   240G  0 part /
├─nvme0n1p6 259:6    0   750M  0 part 
├─nvme0n1p7 259:7    0    22G  0 part 
├─nvme0n1p8 259:8    0   200M  0 part 
└─nvme0n1p9 259:9    0   120G  0 part 

Partitioning disks

Once we identify the device file corresponding to the disk we want to partition, we can use utilities like fdisk, cfdisk, or gparted. gparted is a GUI based tool; it is also what Ubuntu's installer uses.

Using fdisk

  • fdisk is a safe bet, as most Linux distros ship with it. Basic usage is fdisk /dev/sdx
  • fdisk has a bunch of commands. These also have their shorthand notations, usually as one letter
    • g : GPT partition table
    • n : Create new partition
    • And so on, a complete list can be viewed by using help as a command
  • When creating a new partition, it asks for
    1. Partition number
    2. First sector
    3. Last sector
  • It is smart enough to figure out the partition number and auto increment it for successive partitions. It is also smart enough to figure out the first sector based on the partitions that have already been made. The last sector part is what we really need to focus on.
  • If the last sector is input to be +20G for example, the last sector will be set to first_sector + 20GBytes. Basically, +size will create a new partition starting at the first sector we provide, that is as big as the size we provide.

Formatting partitions

Once we have partitioned the disk, we format each partition based on what its function should be. There are lots of different filesystem types, a couple of them are discussed at types of filesystems. Partitions are formatted with the mkfs.<type> commands. A list of available commands can be found at /sbin.

ls /sbin/ | grep mkfs
mkfs
mkfs.bfs
mkfs.btrfs
mkfs.cramfs
mkfs.ext2
mkfs.ext3
mkfs.ext4
mkfs.minix
mkfs.ntfs

Example
To format a partition, say /dev/sda1, as ext4, we would run mkfs.ext4 /dev/sda1.

For mounting partitions, refer mount.