Building the Easyplotlib Structure
The very first step I took while developing Easyplotlib was to define the communication protocol between the frontend and backend. To briefly explain this project, its main function is to automatically generate code according to the user’s requirements (though it’s not AI), execute it, and then show the results. Since the project dynamically generates and executes code, I was very mindful of the risk of introducing ACE (Arbitrary Code Execution) vulnerabilities if the structure was poorly designed.
Arbitrary Code Execution Vulnerability
Generating code directly in the browser and sending it to the server for execution would be the simplest approach, but anyone with even a basic understanding of security knows to avoid this development method absolutely. The server might expect harmless code like `matplotlib.pyplot.plot()`, but in reality, code like the following could be sent:

```python
# A classic destructive payload: recursively delete the entire filesystem
import subprocess
subprocess.run(['rm', '-rf', '/', '--no-preserve-root'])
```
Python 2’s input() Is Dangerous
For example, Python 2 had two functions for reading user input from the terminal: `raw_input()` and `input()`. `raw_input()` always returns a `str`, just like Python 3's `input()`, but `input()` would automatically cast the input when possible, which made it popular when I first learned Python.
```python
# Python 2
>>> raw_input()
3      # input
'3'    # output: always a str
>>> input()
3      # input
3      # output: cast to an int
```
But the shocking truth is that Python 2's `input()` was actually implemented as `eval(raw_input())`! The security issues that could arise from this are almost unimaginable.
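To make the danger concrete, here is a hypothetical Python 2 session; the payload is simply whatever expression a user decides to type at the prompt:

```python
# Python 2 -- input() evaluates whatever the user types
>>> age = input("How old are you? ")
How old are you? __import__('os').system('cat /etc/passwd')  # typed by the user: runs a shell command
```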
Another example of ACE is a buffer overflow attack in languages like C or C++ that allow direct memory access. This, too, can be considered a type of ACE vulnerability, since the concept is the same: attacker-controlled input ends up being executed as code that was never meant to run.
Of course, there are many ways to prevent the ACE vulnerabilities above. In Python, you could run a custom interpreter built with file and networking features removed, add a virtualization layer using something like AWS, or restrict dangerous functions with pattern matching or an external package like RestrictedPython (a minimal sketch follows below). But the best approach is to rule out ACE from the start, unless executing user code is the whole point, as with online judges like Baekjoon.
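As a rough illustration of the RestrictedPython option, the sketch below relies on its documented `compile_restricted()` and `safe_globals`; everything else (the payload, the error handling) is made up for this example and is not Easyplotlib's actual code:

```python
# A minimal sketch: rejecting dangerous code with RestrictedPython
from RestrictedPython import compile_restricted, safe_globals

user_source = "import subprocess"  # the malicious payload from above

try:
    byte_code = compile_restricted(user_source, filename="<user-code>", mode="exec")
    # safe_globals provides no __import__, so the import fails at execution time
    exec(byte_code, dict(safe_globals))
except (SyntaxError, ImportError) as exc:
    print("Rejected:", exc)
```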
Easyplotlib Request Design
Actually, Easyplotlib never considered such a problematic design in the first place, since the data could be structured. The only issue was that the structured data was so complex that writing code to validate it from scratch seemed quite tedious.
Below is a 'rough version' of the data definition. It's not structured as a formal JSON Schema, and since it's taken directly from the design document, it's not very clean. A hypothetical example instance follows the definition.
- request_id : Is a uuid4
- figure
  - size
    - row : Is numeric; plt.subplot(row, _)
    - column : Is numeric; plt.subplot(_, column)
  - axes [List] : length of figure.size.row
    - [List] : length of figure.size.column
      - Is one of axes[].name, or null. A null slot is not rendered, as if it didn't exist
  - style
    - All possible key-value pairs are defined in figure-style
- axes [List]
  - name : Is a string
  - plot [List]
    - Is one of plot[].name
  - style
    - All possible key-value pairs are defined in axes-style
- plot [List]
  - name : Is a string
  - format : All possible values are defined in plot-format-list
  - data
    - key : Depending on plot[].format, there are different required and optional keys; check plot-format-list
    - value : Is one of data[].name
  - style
    - All possible key-value pairs are defined in plot-style
- data [List]
  - name : Is a string
  - value [List]
    - Is numeric
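Here is a hypothetical request conforming to the rough definition above; the names, format, and style keys are invented for illustration and are not taken from the actual design document:

```json
{
  "request_id": "7f0c2a4e-9b1d-4e5a-8c3f-2d6b1a0e9f47",
  "figure": {
    "size": { "row": 1, "column": 2 },
    "axes": [["main", null]],
    "style": {}
  },
  "axes": [
    { "name": "main", "plot": ["line-1"], "style": {} }
  ],
  "plot": [
    { "name": "line-1", "format": "plot", "data": { "x": "xs", "y": "ys" }, "style": {} }
  ],
  "data": [
    { "name": "xs", "value": [1, 2, 3] },
    { "name": "ys", "value": [2, 4, 8] }
  ]
}
```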
Manually validating this long JSON would be error-prone, so I soon started researching the Pydantic package, which I had read is recommended for receiving clean data during FastAPI development.
Validation with the Pydantic Package
Pydantic offers many features, but its main selling point is data validation. To avoid making things too complicated, let’s consider the following situation:
```python
from typing import List, Optional

class User:
    name: str
    age: int
    spouse: Optional["User"]   # forward reference: User is not fully defined yet
    friends: List["User"]
```
In Python, type hints can help linters, but they don't guarantee that the data actually has those types at runtime. In the extreme case above, code like `User(spouse=None).spouse.name` raises no complaint from the hints themselves and fails only at runtime, unless you validate it explicitly. Some use cases for Pydantic include:
- Type enforcement
  - Forcing fields to actually follow their type hints
  - Automatic casting when possible (e.g., `'23' -> 23`)
  - Disabling auto-casting, or raising `ValidationError` when casting is impossible
  - Handling unspecified entries with an error, by ignoring them, or by custom conditions
- Complex validation
  - Adding conditions beyond type checks easily via arguments
    - For integers: min/max; for strings: length, regex, etc.
    - For overly complex conditions, you can write custom validation functions
- Model validation
  - Validating not just individual fields but also inter-field relationships, via custom functions
    - For example, `if self.spouse: assert self.spouse.name != self.name` can be registered as a validator (see the sketch below)
- Class-JSON conversion support
  - Converting class instances to JSON and vice versa
- All validation runs automatically at class instantiation and everywhere else, with no extra function calls
In fact, you can use it anywhere user input sanitization is needed. I regret not learning this sooner, as it would have saved me a lot of trouble in past projects.
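As a minimal sketch of these features, here is the `User` example rewritten as a Pydantic model (v2-style API); the specific constraints and the validator are assumptions for illustration, not Easyplotlib's actual models:

```python
from typing import List, Optional

from pydantic import BaseModel, Field, ValidationError, model_validator

class User(BaseModel):
    name: str = Field(min_length=1)   # complex validation via arguments
    age: int = Field(ge=0)
    spouse: Optional["User"] = None
    friends: List["User"] = []

    @model_validator(mode="after")
    def spouse_is_not_self(self) -> "User":
        # Model validation: an inter-field relationship check
        if self.spouse is not None:
            assert self.spouse.name != self.name, "you cannot be your own spouse"
        return self

print(User(name="Alice", age="23").age)   # auto-cast: '23' -> 23

try:
    User(name="Bob", age=-1)              # violates ge=0 -> ValidationError
except ValidationError as exc:
    print(exc)

print(User(name="Carol", age=30).model_dump_json())  # class -> JSON
```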
Integrating Logfire and Pydantic
On the Pydantic official site, there's a logging tool called Logfire that integrates with Pydantic. I found this package interesting, and after looking back at code from an old project, it was easy to see why.
```python
import logging

# Old project code (partially modified); get_current_time_ISO() was a project helper
def putLog(name, action=None, data=None):
    if action is None:
        logging.info("{:<29} | {:<8} |".format(
            get_current_time_ISO(), name
        ))
    elif data is None:
        logging.info("{:<29} | {:<8} | action = {:<20}".format(
            get_current_time_ISO(), name, action
        ))
    else:
        logging.info("{:<29} | {:<8} | action = {:<20} | data = {}".format(
            get_current_time_ISO(), name, action, data
        ))

putLog("SystemComponentA", "auto-sync", "[sync-ratio = {:.3f}]".format(
    theory_max
))
```
I used to log like this, but there was too much repetitive code and a high possibility of errors, which was not ideal.
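For contrast, here is a minimal sketch of the same log line using Logfire's documented `logfire.configure()` / `logfire.info()` API; the component name and values are hypothetical:

```python
import logfire

logfire.configure()  # timestamps and formatting are handled by the library
logfire.info(
    "{name} | action = {action}",
    name="SystemComponentA",
    action="auto-sync",
    sync_ratio=0.987,  # hypothetical value; extra kwargs become structured attributes
)
```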
What I Learned…
The reason I hadn’t learned Pydantic until now was simply that I never needed such complex data validation logic before. As always, when business logic gets complicated, there’s a limit to ad-hoc solutions, and that’s when you have to learn new technologies. And after learning something new, I always wonder, “Why didn’t I use this earlier?”
But in the end, you learn when you feel the need, and the fastest place to feel that need is always a project. Whether it's the single responsibility principle, testing, coverage, AWS, Python, WebSocket, Agile, or operating systems, I learned them because I needed them. Inductively, I guess my future self will learn Logfire too, when the need arises. I think it is a great blessing to enjoy learning.
(Update: 2025.05.20.) I actually did end up learning it. Unlike my earlier ad-hoc Python development, using the Java Spring Boot framework showed me that these problems already have formalized solutions: the Spring equivalent of Pydantic is its Validation framework (Bean Validation), and in place of Logfire you can use Logback.
I also realized that these issues are ultimately connected to operations. Unlike a simple development environment, a real operating service makes logging, observability, and error handling extremely important. This experience was a big help when I first learned Spring Boot, in understanding Aspect-Oriented Programming, which separates exactly these concerns from business logic. ■