Data mapping describes the process of defining Mapper objects, which associate table metadata with user-defined classes. The Mapper's role is to perform SQL operations upon the database, associating individual table rows with instances of those classes, and individual database columns with properties upon those instances, to transparently associate in-memory objects with a persistent database representation.
When a Mapper is created to associate a Table object with a class, all of the columns defined in the Table object are associated with the class via property accessors, which add overriding functionality to the normal process of setting and getting object attributes. These property accessors also keep track of changes to object attributes; these changes will be stored to the database when the application "commits" the current transactional context (known as a Unit of Work). The __init__() method of the object is also decorated to communicate changes when new instances of the object are created.
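The change-tracking idea behind these property accessors can be illustrated with a plain Python descriptor. This is only a minimal sketch, not SQLAlchemy's implementation: the `TrackedAttribute` class and the `_dirty` set are hypothetical names invented for the example.

```python
class TrackedAttribute:
    """Hypothetical descriptor that records which attributes were modified."""

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.__dict__.get(self.name)

    def __set__(self, instance, value):
        # Remember the attribute as changed, then store the value.
        instance.__dict__.setdefault("_dirty", set()).add(self.name)
        instance.__dict__[self.name] = value


class User:
    user_name = TrackedAttribute()
    password = TrackedAttribute()


user = User()
user.user_name = "fred"
dirty = sorted(user.__dict__["_dirty"])  # names of modified attributes
```

A commit step could then consult the `_dirty` set to decide which objects and columns need to be written.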
The Mapper also provides the interface by which instances of the object are loaded from the database. The primary method for this is its select() method, which takes arguments similar to those of a sqlalchemy.sql.Select object. Unlike a Select, however, this method executes immediately and returns results rather than awaiting an execute() call, and instead of a cursor-like object it returns a list of objects.
The three elements to be defined, i.e. the Table metadata, the user-defined class, and the Mapper, are typically defined as module-level variables, and may be defined in any fashion suitable to the application, with the only requirement being that the class and table metadata are described before the mapper. For the sake of example, we will be defining these elements close together, but this should not be construed as a requirement; since SQLAlchemy is not a framework, those decisions are left to the developer or an external framework.
This is the simplest form of a full "round trip" of creating table metadata, creating a class, mapping the class to the table, getting some results, and saving changes. For each concept, the following sections will dig deeper into the available capabilities.
    from sqlalchemy import *

    # engine
    engine = create_engine("sqlite://mydb.db")

    # table metadata
    users = Table('users', engine,
        Column('user_id', Integer, primary_key=True),
        Column('user_name', String(16)),
        Column('password', String(20))
    )

    # class definition
    class User(object):
        pass

    # create a mapper
    usermapper = mapper(User, users)

    # select
    user = usermapper.select_by(user_name='fred')[0]

    # modify
    user.user_name = 'fred jones'

    # commit - saves everything that changed
    objectstore.commit()
For convenience's sake, the Mapper can be attached as an attribute on the class itself as well:
    User.mapper = mapper(User, users)

    userlist = User.mapper.select_by(user_id=12)
There is also a full-blown "monkeypatch" function that creates a primary mapper, attaches the above mapper class property, and also the methods get, get_by, select, select_by, selectone, commit and delete:
    assign_mapper(User, users)

    userlist = User.select_by(user_id=12)
Other methods of associating mappers and finder methods with their corresponding classes, such as via common base classes or mixins, can be devised as well. SQLAlchemy does not aim to dictate application architecture and will always allow the broadest variety of architectural patterns, but may include more helper objects and suggested architectures in the future.
A common request is the ability to create custom class properties that override the behavior of setting/getting an attribute. Currently, the easiest way to do this in SQLAlchemy is the way it's normally done in Python: define your attribute with a different name, such as "_attribute", and use a property to get and set its value. The mapper just needs to be told of the special name:
    class MyClass(object):
        def _set_email(self, email):
            self._email = email

        # a property getter receives only the instance
        def _get_email(self):
            return self._email

        email = property(_get_email, _set_email)

    m = mapper(MyClass, mytable, properties = {
            # map the '_email' attribute to the "email" column
            # on the table
            '_email': mytable.c.email
    })
In a later release, SQLAlchemy will also allow _get_email and _set_email to be attached directly to the "email" property created by the mapper, and will also allow this association to occur via decorators.
There are a variety of ways to select from a mapper. These range from minimalist to explicit. Below is a synopsis of these methods:
    # select_by, using property names or column names as keys
    # the keys are grouped together by an AND operator
    result = mapper.select_by(name='john', street='123 green street')

    # select_by can also combine SQL criterion with key/value properties
    result = mapper.select_by(users.c.user_name=='john',
            addresses.c.zip_code=='12345', street='123 green street')

    # get_by, which takes the same arguments as select_by
    # returns a single scalar result or None if no results
    user = mapper.get_by(id=12)

    # "dynamic" versions of select_by and get_by - everything past the
    # "select_by_" or "get_by_" is used as the key, and the function argument
    # as the value
    result = mapper.select_by_name('fred')
    u = mapper.get_by_name('fred')

    # get an object directly from its primary key. this will bypass the SQL
    # call if the object has already been loaded
    u = mapper.get(15)

    # get an object that has a composite primary key of three columns.
    # the order of the arguments matches that of the table meta data.
    myobj = mapper.get(27, 3, 'receipts')

    # using a WHERE criterion
    result = mapper.select(or_(users.c.user_name == 'john', users.c.user_name=='fred'))

    # using a WHERE criterion to get a scalar
    u = mapper.selectone(users.c.user_name=='john')

    # using a full select object
    result = mapper.select(users.select(users.c.user_name=='john'))

    # using straight text
    result = mapper.select_text("select * from users where user_name='fred'")

    # or using a "text" object
    result = mapper.select(text("select * from users where user_name='fred'", engine=engine))
The last few examples above show the usage of the mapper's table object to provide the columns for a WHERE Clause. These columns are also accessible off of the mapped class directly. When a mapper is assigned to a class, it also attaches a special property accessor c to the class itself, which can be used just like the table metadata to access the columns of the table:
    User.mapper = mapper(User, users)

    userlist = User.mapper.select(User.c.user_id==12)
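The "dynamic" finder names described earlier (select_by_name, get_by_name and so on) rely on intercepting lookups of attributes that do not exist. The following is a rough sketch of that technique in plain Python, searching an in-memory list rather than a database; the `Finder` class is a hypothetical stand-in for a mapper and does not reflect SQLAlchemy's internals.

```python
class Finder:
    """Hypothetical stand-in for a mapper, searching an in-memory list of dicts."""

    def __init__(self, rows):
        self.rows = rows

    def select_by(self, **criteria):
        # Keys are combined with AND: every key must match.
        return [r for r in self.rows
                if all(r.get(k) == v for k, v in criteria.items())]

    def get_by(self, **criteria):
        found = self.select_by(**criteria)
        return found[0] if found else None

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        for prefix in ("select_by_", "get_by_"):
            if name.startswith(prefix):
                key = name[len(prefix):]
                method = getattr(self, prefix[:-1])
                return lambda value: method(**{key: value})
        raise AttributeError(name)


finder = Finder([{"name": "fred", "id": 1}, {"name": "ed", "id": 2}])
fred_rows = finder.select_by_name("fred")   # same as select_by(name="fred")
ed = finder.get_by_id(2)                    # same as get_by(id=2)
missing = finder.get_by_name("nobody")
```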
When objects corresponding to mapped classes are created or manipulated, all changes are logged by a package called sqlalchemy.mapping.objectstore. The changes are then written to the database when an application calls objectstore.commit(). This pattern is known as a Unit of Work, and has many advantages over saving individual objects or attributes on those objects with individual method invocations. Domain models can be built with far greater complexity with no concern over the order of saves and deletes, excessive database round-trips and write operations, or deadlocking issues. The commit() operation uses a transaction as well, and will also perform "concurrency checking" to ensure the proper number of rows were in fact affected (not supported with the current MySQL drivers). Transactional resources are used effectively in all cases; the unit of work handles all the details.
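As a rough illustration of the Unit of Work pattern, the sketch below queues pending inserts and writes them all in one transaction using the standard library's sqlite3 module. The `UnitOfWork` class and its `register_new` method are hypothetical names for this example; the real objectstore also tracks modifications, deletions and dependency ordering, which are omitted here.

```python
import sqlite3


class UnitOfWork:
    """Hypothetical change log that flushes everything in one transaction."""

    def __init__(self, connection):
        self.connection = connection
        self.new = []

    def register_new(self, user_name, password):
        self.new.append((user_name, password))

    def commit(self):
        # "with connection" commits on success and rolls back on error.
        with self.connection:
            self.connection.executemany(
                "INSERT INTO users (user_name, password) VALUES (?, ?)",
                self.new,
            )
        flushed = len(self.new)
        self.new = []
        return flushed


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_name TEXT, password TEXT)"
)
uow = UnitOfWork(conn)
uow.register_new("jane", "hello123")
uow.register_new("ed", "lalalala")
rows_before = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
flushed = uow.commit()
rows_after = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

Nothing reaches the database until `commit()` is called, which is the property that lets a unit of work batch and order its writes.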
When a mapper is created, the target class has its mapped properties decorated by specialized property accessors that track changes, and its __init__() method is also decorated to mark new objects as "new".
    User.mapper = mapper(User, users)

    # create a new User
    myuser = User()
    myuser.user_name = 'jane'
    myuser.password = 'hello123'

    # create another new User
    myuser2 = User()
    myuser2.user_name = 'ed'
    myuser2.password = 'lalalala'

    # load a third User from the database
    myuser3 = User.mapper.select(User.c.user_name=='fred')[0]
    myuser3.user_name = 'fredjones'

    # save all changes
    objectstore.commit()
In the examples above, we defined a User class with basically no properties or methods. There's no particular reason it has to be this way; the class can explicitly set up whatever properties it wants, whether or not they will be managed by the mapper. It can also specify a constructor, with the restriction that the constructor is able to function with no arguments being passed to it (this restriction can be lifted with some extra parameters to the mapper; more on that later):
    class User(object):
        def __init__(self, user_name = None, password = None):
            self.user_id = None
            self.user_name = user_name
            self.password = password

        def get_name(self):
            return self.user_name

        def __repr__(self):
            return "User id %s name %s password %s" % (repr(self.user_id),
                repr(self.user_name), repr(self.password))

    User.mapper = mapper(User, users)

    u = User('john', 'foo')
    objectstore.commit()

    >>> u
    User id 1 name 'john' password 'foo'
Recent versions of SQLAlchemy will only put modified object attributes into the UPDATE statements generated upon commit. This conserves database traffic and also allows successful interaction with a "deferred" attribute, which is a mapped object attribute against the mapper's primary table that isn't loaded until referenced by the application.
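To make the idea concrete, here is a small sketch of how an UPDATE containing only the changed columns might be assembled by comparing an object's original and current state. The `build_update` helper is hypothetical and is not part of SQLAlchemy; it simply shows the comparison step.

```python
def build_update(table, primary_key, original, current):
    """Return (sql, params) updating only the columns whose values changed."""
    changed = {k: v for k, v in current.items()
               if k != primary_key and original.get(k) != v}
    if not changed:
        return None
    assignments = ", ".join(f"{col} = ?" for col in changed)
    sql = f"UPDATE {table} SET {assignments} WHERE {primary_key} = ?"
    return sql, list(changed.values()) + [current[primary_key]]


original = {"user_id": 7, "user_name": "fred", "password": "secret"}
current = {"user_id": 7, "user_name": "fred jones", "password": "secret"}
statement = build_update("users", "user_id", original, current)
unchanged = build_update("users", "user_id", original, dict(original))
```

A column that was never loaded, such as a deferred one, would simply be absent from the current state in this scheme and so would never appear in the SET clause.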
So that covers how to map the columns in a table to an object, how to load objects, create new ones, and save changes. The next step is how to define an object's relationships to other database-persisted objects. This is done via the relation function provided by the mapper module. So with our User class, let's also define the User as having one or more mailing addresses. First, the table metadata:
    from sqlalchemy import *
    engine = create_engine('sqlite', {'filename':'mydb'})

    # define user table
    users = Table('users', engine,
        Column('user_id', Integer, primary_key=True),
        Column('user_name', String(16)),
        Column('password', String(20))
    )

    # define user address table
    addresses = Table('addresses', engine,
        Column('address_id', Integer, primary_key=True),
        Column('user_id', Integer, ForeignKey("users.user_id")),
        Column('street', String(100)),
        Column('city', String(80)),
        Column('state', String(2)),
        Column('zip', String(10))
    )
Of importance here is the addresses table's definition of a foreign key relationship to the users table, relating the user_id column into a parent-child relationship. When a Mapper wants to indicate a relation of one object to another, this ForeignKey object is the default method by which the relationship is determined (although if you didn't define ForeignKeys, or you want to specify explicit relationship columns, that is available as well).
So then let's define two classes, the familiar User class, as well as an Address class:
    class User(object):
        def __init__(self, user_name = None, password = None):
            self.user_name = user_name
            self.password = password

    class Address(object):
        def __init__(self, street=None, city=None, state=None, zip=None):
            self.street = street
            self.city = city
            self.state = state
            self.zip = zip
And then a Mapper that will define a relationship of the User and the Address classes to each other as well as their table metadata. We will add an additional mapper keyword argument properties which is a dictionary relating the name of an object property to a database relationship, in this case a relation object against a newly defined mapper for the Address class:
    User.mapper = mapper(User, users, properties = {
                        'addresses' : relation(mapper(Address, addresses))
                    }
                  )
Let's do some operations with these classes and see what happens:
    u = User('jane', 'hihilala')
    u.addresses.append(Address('123 anywhere street', 'big city', 'UT', '76543'))
    u.addresses.append(Address('1 Park Place', 'some other city', 'OK', '83923'))

    objectstore.commit()
A lot just happened there! The Mapper object figured out how to relate rows in the addresses table to the users table, and also upon commit had to determine the proper order in which to insert rows. After the insert, all the User and Address objects have all their new primary and foreign keys populated.
Also notice that when we created a Mapper on the User class which defined an 'addresses' relation, the newly created User instance magically had an "addresses" attribute which behaved like a list. This list is in reality a property accessor function, which returns an instance of sqlalchemy.util.HistoryArraySet, which fulfills the full set of Python list accessors, but maintains a unique set of objects (based on their in-memory identity), and also tracks additions and deletions to the list:
    del u.addresses[1]
    u.addresses.append(Address('27 New Place', 'Houston', 'TX', '34839'))

    objectstore.commit()
So the one address that was removed from the list was updated to have a user_id of None, and a new address row was inserted to correspond to the new Address added to the User. But now there's a mailing address with no user_id floating around in the database, of no use to anyone. How can we avoid this? This is achieved by using the private=True parameter of relation:
    User.mapper = mapper(User, users, properties = {
                        'addresses' : relation(mapper(Address, addresses), private=True)
                    }
                  )

    del u.addresses[1]
    u.addresses.append(Address('27 New Place', 'Houston', 'TX', '34839'))

    objectstore.commit()
In this case, with the private flag set, the element that was removed from the addresses list was also removed from the database. By specifying the private flag on a relation, it is indicated to the Mapper that these related objects exist only as children of the parent object, and should otherwise be deleted.
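The list behavior described in this section (unique membership by identity, plus a record of additions and deletions) can be sketched with a small list subclass. This is an illustrative approximation only: `HistoryList` is a hypothetical name, it overrides just `append` and `del`, and it is not the actual sqlalchemy.util.HistoryArraySet.

```python
class HistoryList(list):
    """Hypothetical list that remembers which items were added or removed."""

    def __init__(self, items=()):
        super().__init__(items)
        self.added = []
        self.removed = []

    def append(self, item):
        if any(item is existing for existing in self):
            return  # keep membership unique by in-memory identity
        super().append(item)
        self.added.append(item)

    def __delitem__(self, index):
        # Normalize single indexes and slices to a list of removed items.
        items = self[index] if isinstance(index, slice) else [self[index]]
        super().__delitem__(index)
        self.removed.extend(items)


addresses = HistoryList(["123 anywhere street", "1 Park Place"])
del addresses[1]
addresses.append("27 New Place")
```

At commit time, the `added` entries would correspond to rows to insert, while the `removed` entries would be either detached from the parent or, for a private relation, deleted.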
By creating relations with the backref keyword, a bi-directional relationship can be created which will keep both ends of the relationship updated automatically, even without any database queries being executed. Below, the User mapper is created with an "addresses" property, and the corresponding Address mapper receives a "backreference" to the User object via the property name "user":
    Address.mapper = mapper(Address, addresses)
    User.mapper = mapper(User, users, properties = {
                    'addresses' : relation(Address.mapper, backref='user')
                }
              )

    u = User('fred', 'hi')
    a1 = Address('123 anywhere street', 'big city', 'UT', '76543')
    a2 = Address('1 Park Place', 'some other city', 'OK', '83923')

    # append a1 to u
    u.addresses.append(a1)

    # attach u to a2
    a2.user = u

    # the bi-directional relation is maintained
    >>> u.addresses == [a1, a2]
    True
    >>> a1.user is u and a2.user is u
    True
The backreference feature also works with many-to-many relationships, which are described later. When creating a backreference, a corresponding property is placed on the child mapper. The default arguments to this property can be overridden using the backref() function:
    Address.mapper = mapper(Address, addresses)

    User.mapper = mapper(User, users, properties = {
                    'addresses' : relation(Address.mapper,
                        backref=backref('user', lazy=False, private=True))
                }
              )
Note that when overriding a backreferenced property, we re-specify the backreference as well. This will not override the existing 'addresses' property on the User class, but just sends a message to the attribute-management system that it should continue to maintain this backreference.
The mapper package has a helper function cascade_mappers() which can simplify the task of linking several mappers together. Given a list of classes and/or mappers, it identifies the foreign key relationships between the given mappers or corresponding class mappers, and creates relation() objects representing those relationships, including a backreference. It also attempts to find the "secondary" table in a many-to-many relationship. The names of the relations are a lowercase version of the related class name. In the case of one-to-many or many-to-many, the name is "pluralized", which currently is based on the English language (i.e. an 's' or 'es' is added to it):
    # create two mappers. the 'users' and 'addresses' tables have a foreign key
    # relationship
    mapper1 = mapper(User, users)
    mapper2 = mapper(Address, addresses)

    # cascade the two mappers together (can also specify User, Address as the arguments)
    cascade_mappers(mapper1, mapper2)

    # two new object instances
    u = User('user1')
    a = Address('test')

    # "addresses" and "user" property are automatically added
    u.addresses.append(a)
    print(a.user)
We've seen how the relation specifier affects the saving of an object and its child items; how does it affect selecting them? By default, the relation keyword indicates that the related property should have a Lazy Loader attached to it when instances of the parent object are loaded from the database; this is just a callable function that, when accessed, will invoke a second SQL query to load the child objects of the parent.
    # define a mapper
    User.mapper = mapper(User, users, properties = {
                  'addresses' : relation(mapper(Address, addresses), private=True)
                })

    # select users where username is 'jane', get the first element of the list
    # this will incur a load operation for the parent table
    user = User.mapper.select_by(user_name='jane')[0]

    # iterate through the User object's addresses. this will incur an
    # immediate load of those child items
    for a in user.addresses:
        print(repr(a))
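The lazy-loading behavior shown above, where the child query runs only on first access, can be approximated in plain Python with a non-data descriptor. This is a minimal sketch under assumed names: `LazyAttribute`, `load_addresses` and the `queries` log are hypothetical, and the SQL string is only recorded for illustration, never executed.

```python
class LazyAttribute:
    """Hypothetical descriptor that runs a loader on first access only."""

    def __init__(self, loader):
        self.loader = loader

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        # Non-data descriptor: once the value is stored in the instance
        # __dict__, later lookups find it there and skip this method.
        value = self.loader(instance)
        instance.__dict__[self.name] = value
        return value


queries = []  # log of the "SQL" that would have been issued


def load_addresses(user):
    queries.append(f"SELECT * FROM addresses WHERE user_id = {user.user_id}")
    return ["123 anywhere street"]


class User:
    addresses = LazyAttribute(load_addresses)

    def __init__(self, user_id):
        self.user_id = user_id


user = User(12)
queries_before = len(queries)   # nothing loaded yet
first = user.addresses          # triggers the loader
second = user.addresses         # served from the instance, no second load
```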
In mappers that have relationships, the select_by method and its cousins include special functionality that can be used to create joins. Just specify a key in the argument list which is not present in the primary mapper's list of properties or columns, but *is* present in the property list of one of its relationships:
    l = User.mapper.select_by(street='123 Green Street')
The above example is shorthand for:
    l = User.mapper.select(and_(
             Address.c.user_id==User.c.user_id,
             Address.c.street=='123 Green Street')
       )
Once the child list of Address objects is loaded, it is done loading for the lifetime of the object instance. Changes to the list will not be interfered with by subsequent loads, and upon commit those changes will be saved. Similarly, if a new User object is created and child Address objects added, a subsequent select operation which happens to touch upon that User instance, will also not affect the child list, since it is already loaded.
The issue of when the mapper actually gets brand new objects from the database, versus when it assumes the in-memory version is fine the way it is, is a subject of transactional scope. This is described in more detail in the Unit of Work section; for now it should be noted that the total storage of all newly created and selected objects, within the scope of the current thread, can be reset by releasing or otherwise disregarding all current object instances, and calling:
objectstore.clear()
This operation will clear out all currently mapped object instances, and subsequent select statements will load fresh copies from the database.
To operate upon a single object, just use the remove function:
    # (this function coming soon)
    objectstore.remove(myobject)
With just a single parameter "lazy=False" specified to the relation object, the parent and child SQL queries can be joined together.
    Address.mapper = mapper(Address, addresses)
    User.mapper = mapper(User, users, properties = {
                    'addresses' : relation(Address.mapper, lazy=False)
                }
              )

    user = User.mapper.get_by(user_name='jane')

    for a in user.addresses:
        print(repr(a))
Above, a pretty ambitious query is generated just by specifying that the User should be loaded with its child Addresses in one query. When the mapper processes the results, it uses an Identity Map to keep track of objects that were already loaded, based on their primary key identity. Through this method, the redundant rows produced by the join are organized into the distinct object instances they represent.
The generation of this query is also immune to the effects of additional joins being specified in the original query. To use our select_by example above, joining against the "addresses" table to locate users with a certain street results in this behavior:
users = User.mapper.select_by(street='123 Green Street')
The join implied by passing the "street" parameter is converted into an "aliasized" clause by the eager loader, so that it does not conflict with the join used to eager load the child address objects.
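The way an identity map turns the redundant rows of a join into distinct objects can be demonstrated with the standard library's sqlite3 module. This is a simplified sketch, not the mapper's actual row-processing code: plain dictionaries stand in for mapped instances, and the table contents are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_name TEXT);
    CREATE TABLE addresses (address_id INTEGER PRIMARY KEY,
                            user_id INTEGER REFERENCES users(user_id),
                            street TEXT);
    INSERT INTO users VALUES (1, 'jane');
    INSERT INTO addresses VALUES (1, 1, '123 anywhere street');
    INSERT INTO addresses VALUES (2, 1, '1 Park Place');
""")

# One query loads parents and children; the user columns repeat per address.
rows = conn.execute("""
    SELECT u.user_id, u.user_name, a.address_id, a.street
    FROM users u LEFT OUTER JOIN addresses a ON a.user_id = u.user_id
    ORDER BY u.user_id, a.address_id
""").fetchall()

identity_map = {}
for user_id, user_name, address_id, street in rows:
    # Reuse the object already built for this primary key, if any.
    user = identity_map.setdefault(
        user_id, {"user_name": user_name, "addresses": []})
    if address_id is not None:
        user["addresses"].append(street)
```

Two result rows come back for the single user, but keying on the primary key leaves one user object holding both addresses.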
The options method of mapper provides an easy way to get alternate forms of a mapper from an original one. The most common use of this feature is to change the "eager/lazy" loading behavior of a particular mapper, via the functions eagerload(), lazyload() and noload():
    # user mapper with lazy addresses
    User.mapper = mapper(User, users, properties = {
                 'addresses' : relation(mapper(Address, addresses))
             }
    )

    # make an eager loader
    eagermapper = User.mapper.options(eagerload('addresses'))
    u = eagermapper.select()

    # make another mapper that wont load the addresses at all
    plainmapper = User.mapper.options(noload('addresses'))

    # multiple options can be specified
    mymapper = oldmapper.options(lazyload('tracker'), noload('streets'), eagerload('members'))

    # to specify a relation on a relation, separate the property names by a "."
    mymapper = oldmapper.options(eagerload('orders.items'))
The above examples focused on the "one-to-many" relationship. Other forms of relationship are just as easy to set up, as the relation function can usually figure out what you want:
    # a table to store a user's preferences for a site
    prefs = Table('user_prefs', engine,
        Column('pref_id', Integer, primary_key = True),
        Column('stylename', String(20)),
        Column('save_password', Boolean, nullable = False),
        Column('timezone', CHAR(3), nullable = False)
    )

    # user table gets 'preference_id' column added
    # (the foreign key names the table as it is known to the database, 'user_prefs')
    users = Table('users', engine,
        Column('user_id', Integer, primary_key = True),
        Column('user_name', String(16), nullable = False),
        Column('password', String(20), nullable = False),
        Column('preference_id', Integer, ForeignKey("user_prefs.pref_id"))
    )

    # class definition for preferences
    class UserPrefs(object):
        pass
    UserPrefs.mapper = mapper(UserPrefs, prefs)

    # address mapper
    Address.mapper = mapper(Address, addresses)

    # make a new mapper referencing everything.
    m = mapper(User, users, properties = dict(
        addresses = relation(Address.mapper, lazy=True, private=True),
        preferences = relation(UserPrefs.mapper, lazy=False, private=True),
    ))

    # select
    user = m.get_by(user_name='fred')
    save_password = user.preferences.save_password

    # modify
    user.preferences.stylename = 'bluesteel'
    user.addresses.append(Address('freddy@hi.org'))

    # commit
    objectstore.commit()
The relation function handles a basic many-to-many relationship when you specify the association table:
    articles = Table('articles', engine,
        Column('article_id', Integer, primary_key = True),
        Column('headline', String(150), key='headline'),
        Column('body', TEXT, key='body'),
    )

    keywords = Table('keywords', engine,
        Column('keyword_id', Integer, primary_key = True),
        Column('keyword_name', String(50))
    )

    itemkeywords = Table('article_keywords', engine,
        Column('article_id', Integer, ForeignKey("articles.article_id")),
        Column('keyword_id', Integer, ForeignKey("keywords.keyword_id"))
    )

    # class definitions
    class Keyword(object):
        def __init__(self, name = None):
            self.keyword_name = name

    class Article(object):
        pass

    # define a mapper that does many-to-many on the 'itemkeywords' association
    # table
    Article.mapper = mapper(Article, articles, properties = dict(
        keywords = relation(mapper(Keyword, keywords), itemkeywords, lazy=False)
        )
    )

    article = Article()
    article.headline = 'a headline'
    article.body = 'this is the body'
    article.keywords.append(Keyword('politics'))
    article.keywords.append(Keyword('entertainment'))
    objectstore.commit()

    # select articles based on a keyword. select_by will handle the extra joins.
    articles = Article.mapper.select_by(keyword_name='politics')

    # modify
    a = articles[0]
    del a.keywords[:]
    a.keywords.append(Keyword('topstories'))
    a.keywords.append(Keyword('government'))

    # commit. individual INSERT/DELETE operations will take place only for the list
    # elements that changed.
    objectstore.commit()
Many to Many can also be done with an association object, that adds additional information about how two items are related. This association object is set up in basically the same way as any other mapped object. However, since an association table typically has no primary key columns, you have to tell the mapper what columns will compose its "primary key", which are the two (or more) columns involved in the association. Also, the relation function needs an additional hint as to the fact that this mapped object is an association object, via the "association" argument which points to the class or mapper representing the other side of the association.
    # add "attached_by" column which will reference the user who attached this keyword
    itemkeywords = Table('article_keywords', engine,
        Column('article_id', Integer, ForeignKey("articles.article_id")),
        Column('keyword_id', Integer, ForeignKey("keywords.keyword_id")),
        Column('attached_by', Integer, ForeignKey("users.user_id"))
    )

    # define an association class
    class KeywordAssociation(object):
        pass

    # mapper for KeywordAssociation
    # specify "primary key" columns manually
    KeywordAssociation.mapper = mapper(KeywordAssociation, itemkeywords,
        primary_key = [itemkeywords.c.article_id, itemkeywords.c.keyword_id],
        properties={
            'keyword' : relation(Keyword, lazy = False), # uses primary Keyword mapper
            'user' : relation(User, lazy = True) # uses primary User mapper
        }
    )

    # mappers for Users, Keywords
    User.mapper = mapper(User, users)
    Keyword.mapper = mapper(Keyword, keywords)

    # define the mapper.
    m = mapper(Article, articles, properties={
        'keywords':relation(KeywordAssociation.mapper, lazy=False, association=Keyword)
        }
    )

    # bonus step - well, we do want to load the users in one shot,
    # so modify the mapper via an option.
    # this returns a new mapper with the option switched on.
    m2 = m.options(eagerload('keywords.user'))

    # select by keyword again
    alist = m2.select_by(keyword_name='jacks_stories')

    # user is available
    for a in alist:
        for k in a.keywords:
            # the Keyword class stores its name as 'keyword_name'
            if k.keyword.keyword_name == 'jacks_stories':
                print(k.user.user_name)