Because of where AppScope sits in an application, we can easily observe all the traffic going to and from the filesystem. This is useful for basic observability, letting us answer questions like: "Why is this application consuming so much filesystem I/O?" "What files does this application open?" "What configuration files does this application use?" As noted earlier, we have nginx running in the background. Let's use the scope CLI paired with some basic command line utilities to answer the question "What files has this application opened?"
Nginx Open Files
- In Terminal 1, run:
scope events --id 1 -s fs.open
This gives us the last 20 file-open events from nginx. We're using --id to tell scope to access session 1, which is running nginx in the background. The -s flag tells scope to filter to source fs.open, which contains only filesystem open events. We can scroll through lists of files that nginx has opened, but we can also pair the scope CLI with other utilities like jq and classic utilities like sort and uniq to answer "What files has this application opened?" more definitively.
Nginx Open Files
- In Terminal 1, run:
scope events --id 1 -a -j | jq -r '.data.file' | sort | uniq
A few important things to note about this slightly more complex command:
- -a says to output all events, not just the default last 20.
- -j outputs events as JSON.
- jq filters down to just the file names out of those events.
- sort and uniq help us find only the unique filenames that have been opened.
This is very powerful. Using basic utilities, we can take structured data and answer an interesting question. Instead of guessing, we know that nginx is opening SSL certificate keys, OpenSSL configurations, /etc/passwd, its own configurations, and log files. In addition, we're doing this on one of the most popular web servers, written in C, which has historically been extremely difficult to instrument. Lastly, to get this information, again, we only needed to prepend scope to the nginx command.
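For readers who would rather post-process the JSON in a script than in a shell pipeline, here is a minimal Python sketch of the same idea. It assumes, as the jq filter above implies, that each line of output is one JSON object with the file name under data.file. The sample events are hypothetical stand-ins, not real scope output.

```python
import json


def unique_files(lines):
    """Return the sorted, de-duplicated file names found in JSON event lines."""
    files = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        event = json.loads(line)
        data = event.get("data")
        # Skip events that carry no file name instead of emitting "null".
        if isinstance(data, dict) and data.get("file") is not None:
            files.add(data["file"])
    return sorted(files)


# Hypothetical sample events, shaped the way the jq filter assumes.
sample = [
    '{"source": "fs.open", "data": {"file": "/etc/passwd"}}',
    '{"source": "fs.open", "data": {"file": "/etc/nginx/nginx.conf"}}',
    '{"source": "fs.open", "data": {"file": "/etc/passwd"}}',
    '{"source": "net.open", "data": {"port": 80}}',
]

print("\n".join(unique_files(sample)))
```

One difference from the shell pipeline: jq prints null for events without a data.file field, while this sketch skips them. Passing sys.stdin instead of the sample list should let the function consume piped output directly.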
Log Files
One of the most important ways to observe an application is by reading its log files. Without AppScope, usually this data is collected by a log agent configured to tail log files written by the application. This is tried and true and it's working well on millions upon millions of hosts. But a few problems with this approach are emerging. In containerized workloads, we can't easily have one agent collect application logs for every application, because those logs are buried in a container.
As a result, we're seeing a pattern emerge of attaching sidecars to running containers to pick up the log files inside them. This works, but it takes an agent that was designed to collect many log files from one OS instance and runs many copies of it instead, one per application. Sidecars consume significant resources in containerized deployments because log agents weren't designed to scale down that small.
AppScope makes this significantly better. Because AppScope is inside the application, it sees all the bytes the application writes to the filesystem as it's writing them. With simple heuristics, we can detect that data is log data, and write it to disk in a structured way or forward it along to a logging tool. In the preview section, as we were looking at scope.yml, there was a relevant configuration.
This is not an action to copy/paste or execute; it merely shows an example configuration:
watch:
  - type: file
    name: '[\s\/\\\.]log[s]?[\/\\\.]?'
    value: .*
This regular expression configures AppScope to watch for file traffic to files whose path or filename contains log or logs immediately after a slash, backslash, dot, or whitespace character. Let's see how this works in practice. We have a simple script in our environment, log.py, to demonstrate it. Let's look at log.py:
Simple Python Script
- In Terminal 1, run:
bat log.py
You may not know Python, but the script you've just listed should be pretty easy to read. This script outputs data to two files. One file, wontsee.txt, does not contain log in the name, so AppScope will see the traffic but not the contents. But willsee.log does contain log in the name, so AppScope will output its contents as well. Let's see it in action by running the script we just listed under scope.
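Before running it, the claim about the two file names can be checked offline. The sketch below copies the name pattern from the example configuration and tries it against a few paths. Two assumptions to keep in mind: it uses Python's re module, which may differ in details from the regex engine AppScope uses, and it assumes an unanchored search over whatever path string is given. The two paths beyond willsee.log and wontsee.txt are hypothetical.

```python
import re

# The name pattern from the example configuration, copied verbatim.
LOG_NAME = re.compile(r"[\s\/\\\.]log[s]?[\/\\\.]?")


def looks_like_log(path):
    """Return True if the pattern matches anywhere in the given path."""
    return LOG_NAME.search(path) is not None


paths = [
    "willsee.log",                # from the tutorial script
    "wontsee.txt",                # from the tutorial script
    "/var/log/nginx/access.log",  # hypothetical path
    "catalog.txt",                # hypothetical: "log" follows a letter
]

for path in paths:
    print(path, looks_like_log(path))
```

The last path shows why the pattern requires a separator before log: a name that merely contains those three letters in the middle of a word is not treated as a log file.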
Scoping Python
- In Terminal 1, run:
scope python3 log.py
- In Terminal 1, run:
scope events -s fs.close
There are a number of notable things from this scope session. First, python opens a lot of files, and we can see that easily without needing to do anything to python to observe it. Second, we can see that our script does open willsee.log and wontsee.txt. Now, let's look at the file event and see the contents:
Log Event
- In Terminal 1, run:
scope events -t file
There, we can clearly see the log message output by our simple script. Using libscope.so as an agent, we can now easily pick up log data with essentially zero configuration. Often, as operators or security people, we don't even know where our application is writing log data. With AppScope, we can eliminate the need to know every log file being written because we can see it as it's being written. We can also eliminate the resource waste created by sidecars with our lightweight instrumentation library.
Filesystem and log traffic is interesting, but scope really starts to shine when we observe an application's interactions with others over the network. Next, we'll start looking at one of the simplest network programs, nc.