Sunday, November 12, 2023

On SQLAlchemy

I've been using SQLAlchemy and reading about it for a few months now, and I don't get it. I don't mean I don't get SQLAlchemy, I've learned how to use it. But I don't get why people use it, and are actually enthousiastic about it. 

I'm just writing this article because I can't seem to find any articles on the internet that are critical towards it, and I feel there should be one.

To explain why I don't like SQLAlchemy I will just go through all the advantages and disadvantages that are mentioned and respond to them.

Let's abbreviate SQLAlchemy to Alchemy and explain what Alchemy should do. It is an ORM (Object-Relation-Mapper, so it converts database records to Python objects, and the other way around), and it has a query generation language that allows you to write queries using method chaining. Alchemy changed it's syntax in version 2.

Is there an alternative to Alchemy? Yes you can use psycopg to perform plain SQL queries and manually write simple functions that create objects from database records.

I read SQLAlchemy 2 In Practice which is a good book, but the author doesn't even ask the question why we should use Alchemy, and implies that it is the only way to work with database. That got me thinking. 

Grinberg actually wrote the book because he created a tutorial on Flask and found that most questions were asked about Alchemy:

Many of the questions are from developers that need to know how to do something a little more complex than what I present in the article, but are unable to figure it out how to do it on their own.

Advantages

Easy to set up

If you know what to do, creating a few simple models to set up the ORM, is indeed easy to do, and creating ORM functionality manually also takes some time. So setting up the models and generating database tables is easy enough. It gets more tricky when you need to set up a many-to-many table, especially when it has extra columns. 

Alchemy gives abstraction to the backend database. So, an average developer does not have to worry about SQL statements.

This is not only not true, it's deceptive. You need to know SQL, and you need to know the the syntax of Alchemy as well. If you know how to do something in SQL and want to do it in Alchemy, you have to find out how to do it, and its syntax has many quirks and idiosyncrasies.

The transition to other databases becomes easier.

This is true, if you have used SQLite and want to switch to Postgres, and you haven't used any SQLite specific constructs, you can switch without changing any code. But this is not a big advantage. Most of the syntax is standard SQL anyway. And how often does it occur that you change one database for another? Finally, changing one database for another and expecting this has no side effects is naive. In fact it may bring quite substantial problems with it.

Readability and Maintainability: SQLAlchemy code tends to be more readable and easier to maintain than raw SQL queries. It leverages Python classes and objects to represent database tables and rows, making your code intuitive and self-explanatory.

It is true that adding a join to a query requires less code. Except from that, plain SQL is just as readable and just as easy to maintain.

Security: SQLAlchemy includes built-in protection against SQL injection. It automatically escapes user input, effectively thwarting malicious SQL code execution.

Every modern database library includes protection against SQL injection.

Queries are optimized and may work better than SQL if you wrote it yourself unless you are an SQL veteran.

This is also deceptive. Alchemy is slower by default, simply because it forms a wrapper around SQL. Furthermore, since Alchemy has a different conceptual model than SQL, it is not clear what SQL queries are actually performed. Only by learning the ins and outs of Alchemy, and tweaking many of the model declarations, will you be able to ensure that Alchemy performs efficient queries. Of course there will be some edge cases in which Alchemy queries are actually more efficient than plain SQL queries, but only the ones written by someone with little experience in SQL.

On StackOverflow, there are, at the time of writing, 16,000 questions with SQLAlchemy tag. Therefore, you should not be concerned about unsolvable bugs when using this tool.

Quite a relief!

Disadvantages

There could be instances where Alchemy might become inefficient. Therefore, knowing SQL is always desired.

When performance matters you will need to learn Alchemy from A to Z. Alchemy still allows you to use plain SQL queries, so this may be a way to escape.

Performance overhead: SQLAlchemy introduces a slight performance overhead compared to raw SQL queries. While this overhead is generally inconsequential for most applications, it may become a concern for high-performance scenarios.

True

Learning Curve: SQLAlchemy has a learning curve, particularly for developers new to ORMs. It takes some time to become proficient in SQLAlchemy’s API and concepts.

Believe me, it takes a lot of time. Alchemy has many fixes for problems it created itself. It is a completely different mental model you need to acquire and you need to switch between this model and the plain SQL model.

Additional disadvantages

These disadvantages come from me.

Alchemy diffuses domain model code with storage code.

It's good practice to separate domain model code with database storage code. It makes you aware of database queries and allows you to focus on them. Alchemy combines the two. It may create a database request when you just get an attribute. This is not a problem for a simple operation. But when you operate on thousands of objects, you will need to know exactly what Alchemy does internally, and make tweaks to the model to ensure that it performs the right kind of queries.

Asynchronous access

When developing asynchronous code for Alchemy, you need to make changes in a number of places to prevent Alchemy from making implicit database calls. Reading about that in Grinberg's book positively scared me.

Additional advantages

Alembic

To be fair, Alchemy comes with a tool that is actually quite nice, and which is not often mentioned in the advantages I read. It's Alembic, the database migration tool that generates queries based on the changes to the model you make. This allows you to keep your model up to date and have Alembic take care of migrations.

Conclusion

As this is a critical article, I recommend against using SQLAlchemy. Writing plain SQL queries is just easier, and gives you more control. Creating simple ORM functionality and writing migration scripts doesn't need to be complicated either.

References

https://www.analyticsvidhya.com/blog/2022/07/a-brief-introduction-to-sql-alchemy/

https://python.plainenglish.io/sqlalchemy-vs-raw-sql-queries-in-python-a-comparative-example-b7a838ebef82

https://trantriducs.medium.com/orm-and-sqlalchemy-the-magic-wand-in-database-management-a2d02877a57a


Monday, November 6, 2023

Some remarks on "Puzzels of Reference"

Intro

I'm reading "Puzzels of Reference" by Herman Cappelen & Josh Dever, and I have some remarks about it.

Chapter 2: A Puzzle about Belief, Reference, and Agency

The chapter is about two names that refer to the same entity, but at the same time seem to refer to two different entities.

Examples: 
  • Marilyn Monroe and Norma Mortenson
  • Clark Kent and Superman
Why is it that you can't just replace "Norma Mortenson" for "Marilyn Monroe" in the sentence

Jill just discovered that Marilyn Monroe is Norma Mortenson

It surprises me that the book doesn't even name the distinction between sense and reference that was created by Frege. They must know about it. Frege gives as example the Morning Star and Evening Star that both refer to the planet Venus.

So the answer to the puzzle is that Marilyn Monroe and Norma Mortenson are two names of senses, that have the same referent. The sense of a name is the thought it expresses.

The authors come closest to this option in their "believing under a guise", but it's much less clear than the sense solution.

To stay with Kripke's Causal-Communicative Model, you might say that if there are two senses to a name, that there are not one, but two causal chains from name to referent.

Postscript 1:

About the sentence: "Clark Kent is Superman", what does it mean? If it would mean that they are identical, you should be able to replace one name by the next. Also it proves hard to implement such identity in a system. I came up with this idea: "Clark Kent is Superman" should be compared to sentences as "Clark Kent is male". Element belongs to set. To be precise: the entity named "Clark Kent" belongs to the set of entities named "Superman".

Postscript 2:

"Superman" may also be the name of a thief pretending to be him

Superman has been seen taking gold and gems from a jewelry store. 

What would be the meaning of "Superman" in this sentence? The sense of "Superman" here is the real Superman (the thought it expresses is that it is really Superman), but the referent is the thief. The causal chain takes a twist after the sense.

Postscript 3:

Elon Musk changed Twitter into X.

Are Twitter and X two names for the same entity? Are they still the same thing? If not, what about the Morning Star and the Evening Star? Don't they have different characteristics? What does it mean they are the same, really? Is it the physical aspect? What about Superman and Clark Kent, are they fictionally physically the same? What about the caterpillar and the butterfly? Are they the same? Is it the individual? What about the tree trunk that's turned into a boat?

Old Man Oak was chopped down and turned into an oak. Now he spent his time transporting small Indians to the other shore.

Old Man Oak was chopped down and turned into millions of tiny matches. Now he spent his time lighting candles and starting furnace fires.

Isn't identity just a man-made concept that is just as fluent as every other?

To say that sense S1 and S2 are the same, means that they have the same referent. Is that important? For a computer system? Only if there is a distinction between the sense layer and the referent layer. Most systems just have one layer. Is this the sense layer or the referent layer? Is the name of a customer in a database the sense of the person or the referent? It's usually taken to represent the referent. But we could also agree that the database is about senses. It depends on the application. However to represent the sentence "Clark Kent is Superman" in a database, we must not only represent Clark Kent and Superman as entities in the database, but also a third, more abstract entity $9822811. Only then we can say: 

name($797483, "Clark Kent")
name($188294, "Superman") 
referent(
$797483, $9822811)
referent($188294, $9822811)

and this would then represent the sentence

Clark Kent is Superman

Postscript 4:

I now think a name has a meaning, a subjective referent and an collective referent. And that none of these are required. 

The meaning is defined by a logical expression. So this is different from the sense of Frege, which is just a thought, and thus individual to a person.

The referent is the entity the name refers to. One can hold that the referent can't be fictional, but to me this is just a matter of choice, and we could just as well agree that fictional entities can be referents.

A subjective referent is the entity that an individual means by a name. A collective referent is what's meant by the name in general. Usually they are the same, of course.

Some examples:

  • No meaning, only referent: Proper names like John McCarthy
  • No referent, only meaning: Jack the Ripper (before we knew who it was; well there always was the subjective referent of the murderer himself)
  • Both meaning and referent: The Morning Star (meaning: seen in the eastern sky before or at sunrise), Superman (meaning: the guy with the cape and the letter S on his chest), Octavianus (meaning: before 27 BC)
  • No meaning, no referent: Jabberwocky
Octavianus was later called Augustus, so when we name the man we must take into account the time of the event. Same for Cassius Clay / Muhammad Ali.

Postscript 5:

There's also a solution of this problem possible that I would call the cognitive variant. It would say that a referent is just a theoretical construct that has no basis in the human mind and no relevance to ordinary communication.

To a human, a name belongs to an idea. John McCarthy is an idea, and so is Superman and Jabberwocky. It's a mental entity with a name and possibly associated with perceptions and knowledge.

Leaving referents to scientists and logicians, we continue with just ideas. Can we solve all our problems with them?

Clark Kent is Superman: this means: there exists a two-way relationship between CK and S that inherits information from the other idea. Starting out with two ideas, Clark Kent and Superman, we learn that they are the same thing. At that point we form a mental connection between the two ideas, the connection is-a. This connection is an active connection. Now, whenever we need information about Clark Kent, and we don't find it, we also make a deduction: since Superman is-a Clark Kent, we can also take this information from him. It also works the other way. It may be called two-way prototypical inheritance.

Superman has been seen taking gold and gems from a jewelry store: this is not problematic at all: the idea of Superman in the minds of the newspaper readers is just extended with an extra, be it shocking, event. There is no referent in the mind, so there's no problem either.

After 27BC, Octavius was called Augustus: with the new name, Augustus, we also create a new entity. Everything we knew about Octavius is copied to the new entity. And we create the relation is. We may be able to do a shallow copy, which means that we don't actually copy anything, but when something changes in one entity, it won't change in the other entity.

So, is a name a causal chain (Kripke) or a description (Frege)? Well, there is a chain of ideas, but it doesn't end in the referent. It ends with the first person that had the idea. And there are also descriptions of ideas, but they don't describe the idea uniquely, and there may be many descriptions for the same idea.

An idea may be linked to a perception. And we form an idea based on our perception or on our thoughts. And we give this idea a name, to make it easier to talk about. This is the origin of the name, but it's one step away of the real referent, which is the object or person we perceived. But referents are problematic. Does a rainbow have a referent? It has no objective existence. Does that matter? Only when you're interested in objective existence.

Postscript 6


People use constructs to think. Constructs may be real, which means that they have a substrate in physical reality, or fictional. 

The relationship between a construct and reality is an informational relation, no other connection exists. This relation is neither physical, nor imaginary, but exists nevertheless.

A construct can be an entity or a statement or relation about multiple entities.

A name is an attribute of a construct. 

A construct refers to an entity. I think we must let go of the idea that a referent must be real. The entity may be real or fictional. A real entity has a physical substrate.  A fictional entity has a mental substrate (it exists in the imagination of people).

The truth of a real construct should be found in reality. Only reality determines the truth value of a real construct. 

The truth of a fictional construct is decided by the author of the entity. Only J.K. Rowling may decide what happens next to Harry Potter.

The identification of constructs (i.e. "Clark Kent is Superman") doesn't change the relationship between the constructs and the object in reality/imagination, because this relationship is purely informational, and it already existed. Instead, both constructs are extended with an equals relationship. This relationship can be used in deductions about the constructs from that moment on ("Since Clark Kent is Superman, Clark Kent was in London when Superman met his nemesis").

Chapter 3: A Puzzle about Referring to What Isn't There

The chapter is about names like "Sherlock Holmes". Sherlock Holmes doesn't exist, and so how can you refer to him?

Well, existence (reality) is contrasted with fiction. So you can't refer to the real Sherlock Holmes, but you can still refer to the fictional Sherlock Holmes. And anything you say about Sherlock Holmes can have a truth value in the fictional world:

  • Sherlock Holmes lived at 221B Baker Street (true)
  • Sherlock Holmes doesn't exist (true)
  • Sherlock Holmes is a detective (true)
  • Sherlock Holmes was first introduced in A Study in Scarlet (true)
  • Sherlock Holmes was a fictional character who was the basis for the television character Gregory House (I have no idea, but whatever)

In "Sherlock Holmes is a detective" the word "is" is a copula, that has no relation to existence.

In "Sherlock Holmes lived at 221B Baker Street", living seems to presuppose existence, but it is simply a very creative verb. "the living planet", "his ideas are still very much alive", "the life cycle of software components".

The book gives us 3 "Realist Views" (that claim that these names refer). Of these "Creationism" comes close to what I think is the case: "N-names refer to fictional characters". But then the book says "Because Sherlock Holmes is an abstract entity created by human activity, Sherlock Holmes is not a detective. Detectives are concrete, not abstract." Which sounds logical, but as abstract is simply another word for fictional in this paragraph, it flatly denies exactly what I am trying to propose, without reason. Then follows the odd claim that "creationists" don't believe in their own stance: "Creationists take typical in-fiction claims to be false." 

The puzzle can simply be solved by allowing references to be about fictional entities. A thought that's simply unacceptable to the authors, it seems. 

The reason for this probably goes back to Frege, who made reference about existing entities, because he aimed at scientific truths and reasoning. However, in the common use of language, references can very well be about fictional entities.



Sunday, September 10, 2023

Combining the logical and imperative paradigm in a programming language

For my natural language execution engine I created a logical programming language to define the meaning of natural language phrases, and to specify the details of the execution. The language was a simplified version of Prolog, but I added expressions, conditionals, loops and mutable variables later, to make it easier to use. It got easier to use, but it also became more unpredictable to the application programmer (which is just me for the moment). 

The problem was that the logical constructs could create many bindings and one would just not expect that from an imperative language. A simple for loop would add multiple variables to the application and each goal of the body would create not just one set of bindings, but one for every iteration of the loop. The number of bindings could easily become unwieldy and extra care had to be taken to suppress them. This could be done by adding an extra set of mutable variables. It was possible, but very uncomfortable.      

After rethinking the language I decided to split the logical part and the imperative part and gave them each a separate language construct. 

Logical


A logical programming language has facts and inference rules, like this:

male(peter).
male(john).
female(mary).
parent(peter, john).
parent(mary, john).
father(X, Y) :- male(X), parent(X, Y).
mother(X, Y) :- female(X), parent(X, Y).

When a goal like father(X, john) is executed, X yields multiple variable bindings, one for each occurrence of male(X) in the system. If no males can be found, the inference rule stops, and the goal fails.

The essence of this paradigm is thus that each combination of variables in some scope, can have zero, one, or multiple bindings (values). A condition that can fail thus serves as a conditional for the rest of the goals. A condition that can have multiple bindings serves as a for loop for the rest of the goals.

My current syntax of the fact and inference rule are:

too_old(A) :- [ birth(A, Birth) A := age(Birth) A > 40 ]

The conditions can be only literals (birth(A, Birth)), assignments, and boolean expressions. It has no loops or conditionals. The variables are immutable and can hold multiple bindings.

This construct is great for definitions, and to interact with the database.

Imperative

A imperative programming language has functions (procedures) like this:

function hypothenuse(Width, Height) {
    WidthSquared = Width * Width
    HeightSquared = Height * Height
    Hypo = sqrt(WidthSquared + HeightSquared)
    return Hypo
}

Importantly, this paradigm yields only one binding for a function call. For loops and conditional much be named explicitly.

My current syntax of the function is:

hypothenuse(Width, Height) => Hypo {
    WidthSquared := Width * Width
    HeightSquared := Height * Height
    Hypo := go:sqrt(WidthSquared + HeightSquared)
}

Note that the returned value is specified by its variable, not its type, and that no explicit return statement is needed.

The body of the function can contain assignments, loops, conditional. It has no literals. All variables are mutable.

This construct is great for calculations.

Combining logical and imperative

To use logical literals in a function, use the explicit for-loop. This way it is clear that there can be multiple bindings, and any new variables have only the scope of the for loop.

for [size(E, S) E > 5] {
    /* statements */
}




On SQLAlchemy

I've been using SQLAlchemy and reading about it for a few months now, and I don't get it. I don't mean I don't get SQLAlchem...