Linux snippets
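A collection of useful command-line snippets for data science workflows on Linux and macOS. The commands run in both bash and zsh, but a few options differ between the GNU tools on Linux and the BSD tools on macOS (for example ps --sort, du --max-depth, and sed -i), so check the man page on your platform if a command is rejected.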
List files by size (descending)
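No command survives for this entry. A minimal sketch, assuming the goal is to list the current directory with the largest files first:

```shell
# -l: long listing, -S: sort by size (largest first), -h: human-readable sizes
ls -lSh
```

Add -r to reverse the order so the largest files end up at the bottom of the listing.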
Find files larger than 100MB
| find . -type f -size +100M
|
Count lines in all CSV files in a directory
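The command for this entry is missing. One way to do it is wc -l with a glob; the sketch below builds a throwaway directory first (demo_dir and both file names are hypothetical) so it can be run anywhere:

```shell
# Hypothetical sample data so the command has something to count
demo_dir=$(mktemp -d)
printf 'id,value\n1,a\n2,b\n' > "$demo_dir/first.csv"
printf 'id,value\n3,c\n' > "$demo_dir/second.csv"

# Prints one line count per file, then a grand total on the last line
csv_counts=$(wc -l "$demo_dir"/*.csv)
echo "$csv_counts"
```

From inside the directory of interest, wc -l *.csv is all that is needed. The counts include each file's header row.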
Show top 10 memory-consuming processes
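| ps aux --sort=-%mem | head -n 11   # Linux (procps); 11 lines = header + 10 processes
ps aux -m | head -n 11               # macOS/BSD: -m sorts by memory usage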
|
Search for a pattern in all Python files
| grep -rnw . -e 'pattern' --include='*.py'
|
Replace text in multiple files (in-place)
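| sed -i 's/oldtext/newtext/g' *.txt      # GNU sed (Linux)
sed -i '' 's/oldtext/newtext/g' *.txt     # BSD sed (macOS) needs an explicit, possibly empty, backup suffix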
|
Download a file from the internet
| curl -O https://example.com/file.csv
|
Monitor disk usage in current directory
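The command for this entry is missing. A minimal sketch using du, assuming a one-off summary is what is wanted rather than a continuously refreshing view:

```shell
# Total size of the current directory, human-readable
du -sh .

# Breakdown by subdirectory, largest first (-d 1 limits the depth to one level)
du -h -d 1 . | sort -rh
```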
Show GPU usage (NVIDIA)
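The command for this entry is missing. The sample output carries an NVIDIA-SMI header, so it was presumably produced by nvidia-smi, which ships with the NVIDIA driver. A guarded sketch that does nothing harmful on machines without the driver:

```shell
# nvidia-smi is installed with the NVIDIA driver; skip gracefully if it is absent
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi
else
  echo "nvidia-smi not found (is the NVIDIA driver installed?)" >&2
fi
```

On Linux, watch -n 1 nvidia-smi refreshes the view every second.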
Sample output:
| Wed Jun 11 12:18:15 2025
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA H100 PCIe               Off | 00000000:00:08.0 Off |                    0 |
| N/A   37C    P0             117W / 350W |  27085MiB / 81559MiB |     51%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA H100 PCIe               Off | 00000000:00:09.0 Off |                    0 |
| N/A   38C    P0              53W / 350W |      3MiB / 81559MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A    213705      C   /usr/bin/python3                          27076MiB |
+---------------------------------------------------------------------------------------+
|
Check Python package versions in environment
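The command for this entry is missing. A minimal sketch, assuming pip is installed for the python3 interpreter on your PATH:

```shell
# List every installed package with its version, using the pip that belongs to python3
python3 -m pip list

# Show details, including the version, for a single package (pip itself here)
python3 -m pip show pip
```

Calling pip through python3 -m pip ensures the packages reported belong to the interpreter you are actually running, which matters once a virtual environment is active.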
Create a virtual environment (Python 3)
| python3 -m venv venv
source venv/bin/activate
|
Kill a process by name
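The command for this entry is missing. One common tool for the job is pkill. The sketch below starts its own target (a background sleep) and limits the match to children of the current shell with -P $$, purely so the demo cannot touch unrelated processes; in everyday use, pkill -x name is the whole command.

```shell
# Hypothetical target: a background process named "sleep"
sleep 30 &
target_pid=$!

# -x requires an exact name match; -P $$ restricts the match to children of this shell.
# The default signal is SIGTERM (add -9 only if the process ignores it).
pkill -P $$ -x sleep

# Collect the terminated job's exit status (128 + 15 for SIGTERM)
wait "$target_pid" 2>/dev/null
kill_status=$?
echo "exit status: $kill_status"
```

Without -x, pkill treats the name as a regular expression and also matches longer names that contain it, so check with pgrep first when in doubt.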
Count unique values in a CSV column
| cut -d, -f2 file.csv | sort | uniq -c | sort -nr
|
Preview a CSV file (first 5 rows)
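The command for this entry is missing. A minimal sketch with head, assuming the file has a header line so that 6 lines are needed to show 5 data rows; demo_csv is a hypothetical throwaway file:

```shell
# Hypothetical sample file: one header line plus seven data rows
demo_csv=$(mktemp)
printf 'id,value\n' > "$demo_csv"
for i in 1 2 3 4 5 6 7; do
  printf '%s,row%s\n' "$i" "$i" >> "$demo_csv"
done

# Header plus the first 5 data rows (use -n 5 if the file has no header)
preview=$(head -n 6 "$demo_csv")
echo "$preview"
```

For your own data this reduces to head -n 6 file.csv. Piping the result through column -s, -t aligns the fields into columns where that tool is available.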
Check open ports
| lsof -i -P -n | grep LISTEN
|
Download all images from a webpage
| wget -nd -r -P ./images -A jpg,jpeg,png,gif http://example.com
|
Show the 10 largest files in a directory tree
| find . -type f -exec du -h {} + | sort -rh | head -n 10
|
Remove all files except .csv in a directory tree (recursive and irreversible)
| find . ! -name '*.csv' -type f -delete
|
Split a large CSV into smaller files (1000 lines each)
| split -l 1000 bigfile.csv smallfile_
|
Find the 10 largest directories in the current directory
| du -h -d 1 | sort -hr | head -n 10
|
Feel free to copy, modify, and combine these snippets for your data science projects!