Text Processing on Linux: Time to Stop Manual Work
Opening a 5GB log file with vim just to find a single error line? Or manually updating IPs in 50 configuration files? If you’re still doing that, you’re wasting your time. When I first started, I once spent an entire afternoon manually editing configs. Meanwhile, a senior engineer took only 3 seconds with a single sed command.
In the Linux world, sed and awk are an inseparable duo. sed (Stream Editor) excels at find-and-replace tasks. Conversely, awk is incredibly efficient at extracting column-based data and generating reports. Mastering this pair gives you complete control over server data.
Try It Now: Instant Results in 5 Minutes
Let’s put aside the dry theory. Try these two common tasks below to see the difference.
1. Batch Editing File Content with sed
Suppose you need to change localhost to 127.0.0.1 in a config.txt file:
sed 's/localhost/127.0.0.1/g' config.txt
This command only prints the result to the screen for verification. To overwrite the file directly, add the -i flag:
sed -i 's/localhost/127.0.0.1/g' config.txt
2. Extracting Column Data with awk
Get a list of usernames from the /etc/passwd file (fields are delimited by :):
awk -F: '{print $1}' /etc/passwd
Very clean. Now, let’s dive deeper into practical applications for each tool.
Using sed to Edit Files Without Opening an Editor
sed works by reading input line by line, applying rules, and outputting the result. The classic syntax is s/search/replace/flags.
Practical sed Tips
- Clean up log files: instantly delete blank lines for better visibility.
  sed -i '/^$/d' system.log
- Fix config file errors: delete lines 10 through 20 if the configuration is incorrect.
  sed '10,20d' server.conf
- Add new configurations: insert ServerAlias immediately after the line containing ServerName.
  sed '/ServerName/a ServerAlias www.myapp.com' vhost.conf
Important Note: Be careful with sed -i when you’re unsure. A small typo in your regex can wreck an entire Nginx configuration file. Always perform a dry run without -i first. Or, for extra safety, use sed -i.bak to automatically create a backup file.
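Here is a minimal sketch of both safeguards in action, practiced on a throwaway file (the name demo.conf is made up for illustration; the -i.bak suffix behavior assumes GNU sed):

```shell
# Create a throwaway config to practice on (hypothetical file).
printf 'listen localhost:8080\n' > demo.conf

# 1. Dry run: prints the result to the screen, leaves the file untouched.
sed 's/localhost/127.0.0.1/g' demo.conf

# 2. Apply in place, keeping the original as demo.conf.bak.
sed -i.bak 's/localhost/127.0.0.1/g' demo.conf

cat demo.conf       # modified:  listen 127.0.0.1:8080
cat demo.conf.bak   # original:  listen localhost:8080
```

If the substitution turns out to be wrong, restoring is just `mv demo.conf.bak demo.conf`.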
Using awk to Analyze Data Like a Pro
If sed is a text editor, then awk is a data analyst. It treats each line as a record and each word as a data field.
Essential awk Variables
- $1, $2, ...: Column 1, Column 2, and so on.
- $NF: The last column (very useful when the number of columns is unknown).
- NR: The current line number.
Advanced Example: Resource Monitoring
Find processes consuming more than 10% RAM:
ps aux | awk '$4 > 10.0 {print $1, $11, $4}'
This command filters the process list, printing only the User, Process name, and %RAM for those exceeding the limit.
Calculate the total size of logs in a directory:
ls -l | awk '{sum+=$5} END {print "Total:", sum/1024/1024, "MB"}'
awk supports mathematical calculations and loops. It is essentially a mini-programming language right on your terminal.
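As a small taste of that, here is an accumulator with an END block, plus a for loop, fed sample input via printf purely for demonstration:

```shell
# Sum a column and compute the average in an END block.
printf '10\n20\n30\n' | awk '{sum += $1} END {print "sum:", sum, "avg:", sum/NR}'
# -> sum: 60 avg: 20

# A for loop inside awk: print the fields of each line in reverse order.
echo 'a b c' | awk '{for (i = NF; i > 1; i--) printf "%s ", $i; print $1}'
# -> c b a
```

Variables like sum need no declaration; awk initializes them to zero on first use.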
Collaboration Trick: When awk and sed Join Forces
The real power emerges when you combine them via pipes (|). A real-world scenario: Find the top 10 IP addresses accessing your server from an Nginx log file and reformat them.
awk '{print $1}' access.log | sort | uniq -c | sort -nr | head -n 10
Once you have this IP list, you can use sed to wrap them into iptables block commands. Everything happens automatically.
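A hedged sketch of that wrapping step, using a fabricated three-line access.log (the log format assumes the client IP is the first field, as in Nginx's default combined format). Note that the pipeline only prints the iptables commands; review them before piping anything into a shell:

```shell
# Fabricated sample log: first field is the client IP.
printf '1.2.3.4 - -\n1.2.3.4 - -\n5.6.7.8 - -\n' > access.log

# Count IPs, keep the top 10, strip the counts, then let sed wrap each
# bare IP into an iptables DROP rule. Commands are printed, not executed.
awk '{print $1}' access.log | sort | uniq -c | sort -nr | head -n 10 \
  | awk '{print $2}' \
  | sed 's/^/iptables -A INPUT -s /; s/$/ -j DROP/'
# -> iptables -A INPUT -s 1.2.3.4 -j DROP
#    iptables -A INPUT -s 5.6.7.8 -j DROP
```

Once you trust the output, appending `| sh` (as root) would apply the rules, but keeping that step manual is the safer habit.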
Pro-tips for Cleaner Commands
1. Changing Delimiters in sed
When editing file paths, the forward slash / often gets messy because of the required \/ escape. Replace it with | or : for better readability:
# Hard to read: sed 's/\/var\/www\/html/\/data\/www/g'
# Easy to read: sed 's|/var/www/html|/data/www|g'
2. Quickly Getting the Last Column
No need to count columns; $NF will always return the last value of the line. This is extremely useful when processing log files with inconsistent line lengths.
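For example, with two lines of different lengths (sample input invented for illustration, standing in for log entries whose final field is a status code):

```shell
# $NF grabs the final field regardless of how many fields each line has.
printf 'GET /index.html 200\nPOST /api/login form-data 500\n' | awk '{print $NF}'
# -> 200
#    500
```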
Final Thoughts
Mastering awk and sed won’t make you an expert overnight. However, it will make your work more efficient and accurate. Instead of struggling manually, spend a few minutes writing a command. Once you’re comfortable, you’ll see the terminal as an incredibly powerful tool. Good luck implementing this in your systems!
