Data Mapping
Version: 0.2.3 Last Updated: 06/16/06 21:59:41
View: Paged  |  One Page
Basic Data Mapping

Data mapping describes the process of defining Mapper objects, which associate table metadata with user-defined classes.

The Mapper's role is to perform SQL operations upon the database, associating individual table rows with instances of those classes, and individual database columns with properties upon those instances, to transparently associate in-memory objects with a persistent database representation.

When a Mapper is created to associate a Table object with a class, all of the columns defined in the Table object are associated with the class via property accessors, which add overriding functionality to the normal process of setting and getting object attributes. These property accessors keep track of changes to object attributes; these changes will be stored to the database when the application "flushes" the current state of objects (known as a Unit of Work).

Two objects provide the primary interface for interacting with Mappers and the "unit of work" in SA 0.2, which are the Query object and the Session object. Query deals with selecting objects from the database, whereas Session provides a context for loaded objects and the ability to communicate changes on those objects back to the database.

The primary method on Query for loading objects is its select() method, which has similar arguments to a sqlalchemy.sql.Select object. But this select method executes automatically and returns results, instead of awaiting an execute() call. Instead of returning a cursor-like object, it returns an array of objects.

The three configurational elements to be defined, i.e. the Table metadata, the user-defined class, and the Mapper, are typically defined as module-level variables, and may be defined in any fashion suitable to the application, with the only requirement being that the class and table metadata are described before the mapper. For the sake of example, we will be defining these elements close together, but this should not be construed as a requirement; since SQLAlchemy is not a framework, those decisions are left to the developer or an external framework.

Also, keep in mind that the examples in this section deal with explicit Session objects mapped directly to Engine objects, which represents the most explicit style of using the ORM. Options exist for how this is configured, including binding Table objects directly to Engines (described in Binding MetaData to an Engine), as well as using the "Threadlocal" plugin which provides various code shortcuts by using an implicit Session associated to the current thread (described in threadlocal).

back to section top
Synopsis

First, the metadata/mapper configuration code:

from sqlalchemy import *

# metadata
meta = MetaData()

# table object
users_table = Table('users', meta, 
    Column('user_id', Integer, primary_key=True),
    Column('user_name', String(16)),
    Column('password', String(20))
)

# class definition 
class User(object):
    pass

# create a mapper and associate it with the User class.
# technically we dont really need the 'usermapper' variable.
usermapper = mapper(User, users_table)

Note that no database definitions are required. Next we will define an Engine and connect a Session to it, and perform a simple select:

# engine
engine = create_engine("sqlite://mydb.db")

# session
session = create_session(bind_to=engine)

# select
sqluser = session.query(User).select_by(user_name='fred')[0]
# modify
user.user_name = 'fred jones'

# flush - saves everything that changed
sqlsession.flush()
back to section top
The Query Object

The method session.query(class_or_mapper) returns a Query object. Below is a synopsis of things you can do with Query:

# get a query from a Session based on class:
query = session.query(User)

# get a query from a Session given a Mapper:
query = session.query(usermapper)

# select_by, which takes keyword arguments.  the
# keyword arguments represent property names and the values
# represent values which will be compared via the = operator.
# the comparisons are joined together via "AND".
result = query.select_by(name='john', street='123 green street')

# select_by can also combine ClauseElements with key/value properties.
# all ClauseElements and keyword-based criterion are combined together
# via "AND". 
result = query.select_by(users_table.c.user_name=='john', 
        addresses_table.c.zip_code=='12345', street='123 green street')

# get_by, which takes the same arguments as select_by
# returns a single scalar result or None if no results
user = query.get_by(id=12)

# "dynamic" versions of select_by and get_by - everything past the 
# "select_by_" or "get_by_" is used as the key, and the function argument
# as the value
result = query.select_by_name('fred')
u = query.get_by_name('fred')

# get an object directly from its primary key.  this will bypass the SQL
# call if the object has already been loaded
u = query.get(15)

# get an object that has a composite primary key of three columns.
# the order of the arguments matches that of the table meta data.
myobj = query.get((27, 3, 'receipts'))

# using a WHERE criterion
result = query.select(or_(users_table.c.user_name == 'john', users_table.c.user_name=='fred'))

# using a WHERE criterion to get a scalar
u = query.selectfirst(users_table.c.user_name=='john')

# selectone() is a stricter version of selectfirst() which
# will raise an exception if there is not exactly one row
u = query.selectone(users_table.c.user_name=='john')

# using a full select object
result = query.select(users_table.select(users_table.c.user_name=='john'))

Some of the above examples above illustrate the usage of the mapper's Table object to provide the columns for a WHERE Clause. These columns are also accessible off of the mapped class directly. When a mapper is assigned to a class, it also attaches a special property accessor c to the class itself, which can be used just like the table metadata to access the columns of the table:

userlist = session.query(User).select(User.c.user_id==12)
back to section top
Saving Objects

When objects corresponding to mapped classes are created or manipulated, all changes are logged by the Session object. The changes are then written to the database when an application calls flush(). This pattern is known as a Unit of Work, and has many advantages over saving individual objects or attributes on those objects with individual method invocations. Domain models can be built with far greater complexity with no concern over the order of saves and deletes, excessive database round-trips and write operations, or deadlocking issues. The flush() operation batches its SQL statements into a transaction, and can also perform optimistic concurrency checks (using a version id column) to insure the proper number of rows were in fact affected (not supported with the current MySQL drivers).

The Unit of Work is a powerful tool, and has some important concepts that should be understood in order to use it effectively. See the Session / Unit of Work section for a full description on all its operations.

When a mapper is created, the target class has its mapped properties decorated by specialized property accessors that track changes. New objects by default must be explicitly added to the Session, however this can be made automatic by using threadlocal or SessionContext.

mapper(User, users_table)

# create a new User
myuser = User()
myuser.user_name = 'jane'
myuser.password = 'hello123'

# create another new User      
myuser2 = User()
myuser2.user_name = 'ed'
myuser2.password = 'lalalala'

# create a Session and save them
sess = create_session()
sess.save(myuser)
sess.save(myuser2)

# load a third User from the database            
sqlmyuser3 = sess.query(User).select(User.c.user_name=='fred')[0]
myuser3.user_name = 'fredjones'

# save all changes            
sqlsession.flush()

The mapped class can also specify whatever methods and/or constructor it wants:

class User(object):
    def __init__(self, user_name, password):
        self.user_id = None
        self.user_name = user_name
        self.password = password
    def get_name(self):
        return self.user_name
    def __repr__(self):
        return "User id %s name %s password %s" % (repr(self.user_id), 
            repr(self.user_name), repr(self.password))
mapper(User, users_table)

sess = create_session()
u = User('john', 'foo')
sess.save(u)
sqlsession.flush()
>>> u
User id 1 name 'john' password 'foo'

SQLAlchemy will only put modified object attributes columns into the UPDATE statements generated upon flush. This is to conserve database traffic and also to successfully interact with a "deferred" attribute, which is a mapped object attribute against the mapper's primary table that isnt loaded until referenced by the application.

back to section top
Defining and Using Relationships

So that covers how to map the columns in a table to an object, how to load objects, create new ones, and save changes. The next step is how to define an object's relationships to other database-persisted objects. This is done via the relation function provided by the orm module.

One to Many

So with our User class, lets also define the User has having one or more mailing addresses. First, the table metadata:

from sqlalchemy import *

metadata = MetaData()

# define user table
users_table = Table('users', metadata, 
    Column('user_id', Integer, primary_key=True),
    Column('user_name', String(16)),
    Column('password', String(20))
)

# define user address table
addresses_table = Table('addresses', metadata,
    Column('address_id', Integer, primary_key=True),
    Column('user_id', Integer, ForeignKey("users.user_id")),
    Column('street', String(100)),
    Column('city', String(80)),
    Column('state', String(2)),
    Column('zip', String(10))
)

Of importance here is the addresses table's definition of a foreign key relationship to the users table, relating the user_id column into a parent-child relationship. When a Mapper wants to indicate a relation of one object to another, the ForeignKey relationships are the default method by which the relationship is determined (options also exist to describe the relationships explicitly).

So then lets define two classes, the familiar User class, as well as an Address class:

class User(object):
    def __init__(self, user_name, password):
        self.user_name = user_name
        self.password = password

class Address(object):
    def __init__(self, street, city, state, zip):
        self.street = street
        self.city = city
        self.state = state
        self.zip = zip

And then a Mapper that will define a relationship of the User and the Address classes to each other as well as their table metadata. We will add an additional mapper keyword argument properties which is a dictionary relating the names of class attributes to database relationships, in this case a relation object against a newly defined mapper for the Address class:

mapper(Address, addresses_table)
mapper(User, users_table, properties = {
        'addresses' : relation(Address)
    }
  )

Lets do some operations with these classes and see what happens:

engine = create_engine('sqlite:///mydb.db')
metadata.create_all(engine)

session = create_session(bind_to=engine)

u = User('jane', 'hihilala')
u.addresses.append(Address('123 anywhere street', 'big city', 'UT', '76543'))
u.addresses.append(Address('1 Park Place', 'some other city', 'OK', '83923'))

session.save(u)
session.flush()
INSERT INTO users (user_name, password) VALUES (:user_name, :password)
{'password': 'hihilala', 'user_name': 'jane'}
INSERT INTO addresses (user_id, street, city, state, zip) VALUES (:user_id, :street, :city, :state, :zip)
{'city': 'big city', 'state': 'UT', 'street': '123 anywhere street', 'user_id':1, 'zip': '76543'}
INSERT INTO addresses (user_id, street, city, state, zip) VALUES (:user_id, :street, :city, :state, :zip)
{'city': 'some other city', 'state': 'OK', 'street': '1 Park Place', 'user_id':1, 'zip': '83923'}

A lot just happened there! The Mapper figured out how to relate rows in the addresses table to the users table, and also upon flush had to determine the proper order in which to insert rows. After the insert, all the User and Address objects have their new primary and foreign key attributes populated.

Also notice that when we created a Mapper on the User class which defined an addresses relation, the newly created User instance magically had an "addresses" attribute which behaved like a list. This list is in reality a property function which returns an instance of sqlalchemy.util.HistoryArraySet. This object fulfills the full set of Python list accessors, but maintains a unique set of objects (based on their in-memory identity), and also tracks additions and deletions to the list:

del u.addresses[1]
u.addresses.append(Address('27 New Place', 'Houston', 'TX', '34839'))

session.flush()
UPDATE addresses SET user_id=:user_id
WHERE addresses.address_id = :addresses_address_id
[{'user_id': None, 'addresses_address_id': 2}]
INSERT INTO addresses (user_id, street, city, state, zip)
VALUES (:user_id, :street, :city, :state, :zip)
{'city': 'Houston', 'state': 'TX', 'street': '27 New Place', 'user_id': 1, 'zip': '34839'}

Note that when creating a relation with the relation() function, the target can either be a class, in which case the primary mapper for that class is used as the target, or a Mapper instance itself, as returned by the mapper() function.

back to section top
Lifecycle Relations

In the previous example, a single address was removed from the addresses attribute of a User object, resulting in the corresponding database row being updated to have a user_id of None. But now, theres a mailing address with no user_id floating around in the database of no use to anyone. How can we avoid this ? This is acheived by using the cascade parameter of relation:

clear_mappers()  # clear mappers from the previous example
mapper(Address, addresses_table)
mapper(User, users_table, properties = {
        'addresses' : relation(Address, cascade="all, delete-orphan")
    }
  )

del u.addresses[1]
u.addresses.append(Address('27 New Place', 'Houston', 'TX', '34839'))

session.flush()
INSERT INTO addresses (user_id, street, city, state, zip)
VALUES (:user_id, :street, :city, :state, :zip)
{'city': 'Houston', 'state': 'TX', 'street': '27 New Place', 'user_id': 1, 'zip': '34839'}
DELETE FROM addresses WHERE addresses.address_id = :address_id
[{'address_id': 2}]

In this case, with the delete-orphan cascade rule set, the element that was removed from the addresses list was also removed from the database. Specifying cascade="all, delete-orphan" means that every persistence operation performed on the parent object will be cascaded to the child object or objects handled by the relation, and additionally that each child object cannot exist without being attached to a parent. Such a relationship indicates that the lifecycle of the Address objects are bounded by that of their parent User object.

Cascading is described fully in Cascade rules.

back to section top
Backreferences

By creating relations with the backref keyword, a bi-directional relationship can be created which will keep both ends of the relationship updated automatically, independently of database operations. Below, the User mapper is created with an addresses property, and the corresponding Address mapper receives a "backreference" to the User object via the property name user:

Address = mapper(Address, addresses_table)
User = mapper(User, users_table, properties = {
                'addresses' : relation(Address, backref='user')
            }
          )

u = User('fred', 'hi')
a1 = Address('123 anywhere street', 'big city', 'UT', '76543')
a2 = Address('1 Park Place', 'some other city', 'OK', '83923')

# append a1 to u
u.addresses.append(a1)

# attach u to a2
a2.user = u

# the bi-directional relation is maintained
>>> u.addresses == [a1, a2]
True
>>> a1.user is user and a2.user is user
True

The backreference feature also works with many-to-many relationships, which are described later. When creating a backreference, a corresponding property (i.e. a second relation()) is placed on the child mapper. The default arguments to this property can be overridden using the backref() function:

mapper(User, users_table)
mapper(Address, addresses_table, properties={
    'user':relation(User, backref=backref('addresses', cascade="all, delete-orphan"))
})

The backref() function is often used to set up a bi-directional one-to-one relationship. This is because the relation() function by default creates a "one-to-many" relationship when presented with a primary key/foreign key relationship, but the backref() function can redefine the uselist property to make it a scalar:

mapper(User, users_table)
mapper(Address, addresses_table, properties={
    'user' : relation(User, backref=backref('address', uselist=False))
})
back to section top
Selecting from Relationships

We've seen how the relation specifier affects the saving of an object and its child items, how does it affect selecting them? By default, the relation keyword indicates that the related property should be attached a lazy loader when instances of the parent object are loaded from the database; this is just a callable function that when accessed will invoke a second SQL query to load the child objects of the parent.

# define a mapper
mapper(User, users_table, properties = {
      'addresses' : relation(mapper(Address, addresses_table))
    })

# select users where username is 'jane', get the first element of the list
# this will incur a load operation for the parent table
sqluser = session.query(User).select(User.c.user_name=='jane')[0]
# iterate through the User object's addresses.  this will incur an
# immediate load of those child items
sqlfor a in user.addresses:
print repr(a)
Creating Joins Across Relations

For mappers that have relationships, the select_by method of the Query object can create queries that include automatically created joins. Just specify a key in the argument list which is not present in the primary mapper's list of properties or columns, but is present in the property list of one of its relationships:

sqll = session.query(User).select_by(street='123 Green Street')

The above example is shorthand for:

l = session.query(User).select(and_(
         Address.c.user_id==User.c.user_id, 
         Address.c.street=='123 Green Street')
   )

All keyword arguments sent to select_by are used to create query criterion. This means that familiar select keyword options like order_by and limit are not directly available. To enable these options with select_by, you can try the SelectResults extension which offers methods off the result of a select or select_by such as order_by() and array slicing functions that generate new queries.

Also, select_by will not create joins derived from Column-based expressions (i.e. ClauseElement objects); the reason is that a Column-based expression may include many columns, and select_by has no way to know which columns in the expression correspond to properties and which don't (it also prefers not to dig into column expressions which may be very complex). The next section describes some ways to combine Column expressions with select_by's auto-joining capabilities.

back to section top
More Granular Join Control Using join_to, join_via

Feature Status: Alpha API

The join_to method of Query is a component of the select_by operation, and is given a keyname in order to return a "join path" from the Query's mapper to the mapper which is referenced by a relation() of the given name:

>>> q = session.query(User)
>>> j = q.join_to('addresses')
>>> print j
users.user_id=addresses.user_id

join_to can also be given the name of a column-based property, in which case it will locate a path to the nearest mapper which has that property as a column:

>>> q = session.query(User)
>>> j = q.join_to('street')
>>> print j
users.user_id=addresses.user_id

Also available is the join_via function, which is similar to join_to, except instead of traversing through all properties to find a path to the given key, its given an explicit path to the target property:

>>> q = session.query(User)
>>> j = q.join_via(['orders', 'items'])
>>> print j
users.c.user_id==orders.c.user_id AND orders.c.item_id==items.c.item_id

Expressions produced by join_to and join_via can be used with select to create more complicated query criterion across multiple relations:

>>> l = q.select(
    (addresses_table.c.street=='some address') &
    (items_table.c.item_name=='item #4') &
    q.join_to('addresses') &
    q.join_via(['orders', 'items'])
    )
back to section top
Eager Loading

With just a single parameter lazy=False specified to the relation object, the parent and child SQL queries can be joined together.

mapper(Address, addresses_table)
mapper(User, users_table, properties = {
        'addresses' : relation(Address, lazy=False)
    }
  )

sqlusers = session.query(User).select(User.c.user_name=='Jane')
for u in users:
    print repr(u)
    for a in u.addresses:
        print repr(a)

Above, a pretty ambitious query is generated just by specifying that the User should be loaded with its child Addresses in one query. When the mapper processes the results, it uses an Identity Map to keep track of objects that were already loaded, based on their primary key identity. Through this method, the redundant rows produced by the join are organized into the distinct object instances they represent.

The generation of this query is also immune to the effects of additional joins being specified in the original query. To use our select_by example above, joining against the "addresses" table to locate users with a certain street results in this behavior:

sqlusers = session.query(User).select_by(street='123 Green Street')

The join implied by passing the "street" parameter is stated as an additional join between the addresses and users tables. Also, since the eager join is "aliasized", no name conflict occurs.

back to section top
Using Options to Change the Loading Strategy

The options method on the Query object provides an easy way to get alternate forms of a mapper query from an original one. The most common use of this feature is to change the "eager/lazy" loading behavior of a particular mapper, via the functions eagerload(), lazyload() and noload():

# user mapper with lazy addresses
mapper(User, users_table, properties = {
             'addresses' : relation(mapper(Address, addresses_table))
         }
)

# query object
query = session.query(User)

# make an eager loading query
eagerquery = query.options(eagerload('addresses'))
u = eagerquery.select()

# make another query that wont load the addresses at all
plainquery = query.options(noload('addresses'))

# multiple options can be specified
myquery = oldquery.options(lazyload('tracker'), noload('streets'), eagerload('members'))

# to specify a relation on a relation, separate the property names by a "."
myquery = oldquery.options(eagerload('orders.items'))
back to section top
One to One/Many to One

The above examples focused on the "one-to-many" relationship. To do other forms of relationship is easy, as the relation function can usually figure out what you want:

metadata = MetaData()

# a table to store a user's preferences for a site
prefs_table = Table('user_prefs', metadata,
    Column('pref_id', Integer, primary_key = True),
    Column('stylename', String(20)),
    Column('save_password', Boolean, nullable = False),
    Column('timezone', CHAR(3), nullable = False)
)

# user table with a 'preference_id' column
users_table = Table('users', metadata, 
    Column('user_id', Integer, primary_key = True),
    Column('user_name', String(16), nullable = False),
    Column('password', String(20), nullable = False),
    Column('preference_id', Integer, ForeignKey("user_prefs.pref_id"))
)

# engine and some test data
engine = create_engine('sqlite:///', echo=True)
metadata.create_all(engine)
engine.execute(prefs_table.insert(), dict(pref_id=1, stylename='green', save_password=1, timezone='EST'))
engine.execute(users_table.insert(), dict(user_name = 'fred', password='45nfss', preference_id=1))

# classes
class User(object):
    def __init__(self, user_name, password):
        self.user_name = user_name
        self.password = password

class UserPrefs(object):
    pass

mapper(UserPrefs, prefs_table)

mapper(User, users_table, properties = dict(
    preferences = relation(UserPrefs, lazy=False, cascade="all, delete-orphan"),
))

# select
session = create_session(bind_to=engine)
sqluser = session.query(User).get_by(user_name='fred')
save_password = user.preferences.save_password

# modify
user.preferences.stylename = 'bluesteel'

# flush
sqlsession.flush()
back to section top
Many to Many

The relation function handles a basic many-to-many relationship when you specify the association table:

metadata = MetaData()

articles_table = Table('articles', metadata,
    Column('article_id', Integer, primary_key = True),
    Column('headline', String(150), key='headline'),
    Column('body', TEXT, key='body'),
)

keywords_table = Table('keywords', metadata,
    Column('keyword_id', Integer, primary_key = True),
    Column('keyword_name', String(50))
)

itemkeywords_table = Table('article_keywords', metadata,
    Column('article_id', Integer, ForeignKey("articles.article_id")),
    Column('keyword_id', Integer, ForeignKey("keywords.keyword_id"))
)

engine = create_engine('sqlite:///')
metadata.create_all(engine)

# class definitions
class Keyword(object):
    def __init__(self, name):
        self.keyword_name = name

class Article(object):
    pass

mapper(Keyword, keywords_table)

# define a mapper that does many-to-many on the 'itemkeywords' association 
# table
mapper(Article, articles_table, properties = dict(
    keywords = relation(Keyword, secondary=itemkeywords_table, lazy=False)
    )
)

session = create_session(bind_to=engine)

article = Article()
article.headline = 'a headline'
article.body = 'this is the body'
article.keywords.append(Keyword('politics'))
article.keywords.append(Keyword('entertainment'))
session.save(article)

sqlsession.flush()
# select articles based on a keyword.  select_by will handle the extra joins.
sqlarticles = session.query(Article).select_by(keyword_name='politics')
a = articles[0]

# clear out keywords with a new list
a.keywords = []
a.keywords.append(Keyword('topstories'))
a.keywords.append(Keyword('government'))

# flush
sqlsession.flush()
back to section top
Association Object

Many to Many can also be done with an association object, that adds additional information about how two items are related. This association object is set up in basically the same way as any other mapped object. However, since an association table typically has no primary key columns, you have to tell the mapper what columns will compose its "primary key", which are the two (or more) columns involved in the association. Also, the relation function needs an additional hint as to the fact that this mapped object is an association object, via the "association" argument which points to the class or mapper representing the other side of the association.

from sqlalchemy import *
metadata = MetaData()

users_table = Table('users', metadata, 
    Column('user_id', Integer, primary_key = True),
    Column('user_name', String(16), nullable = False),
)

articles_table = Table('articles', metadata,
    Column('article_id', Integer, primary_key = True),
    Column('headline', String(150), key='headline'),
    Column('body', TEXT, key='body'),
)

keywords_table = Table('keywords', metadata,
    Column('keyword_id', Integer, primary_key = True),
    Column('keyword_name', String(50))
)

# add "attached_by" column which will reference the user who attached this keyword
itemkeywords_table = Table('article_keywords', metadata,
    Column('article_id', Integer, ForeignKey("articles.article_id")),
    Column('keyword_id', Integer, ForeignKey("keywords.keyword_id")),
    Column('attached_by', Integer, ForeignKey("users.user_id"))
)

engine = create_engine('sqlite:///', echo=True)
metadata.create_all(engine)

# class definitions
class User(object):
    pass
class Keyword(object):
    def __init__(self, name):
        self.keyword_name = name
class Article(object):
    pass
class KeywordAssociation(object):
    pass

mapper(User, users_table)
mapper(Keyword, keywords_table)

# mapper for KeywordAssociation
# specify "primary key" columns manually
mapper(KeywordAssociation, itemkeywords_table,
    primary_key = [itemkeywords_table.c.article_id, itemkeywords_table.c.keyword_id],
    properties={
        'keyword' : relation(Keyword, lazy = False), 
        'user' : relation(User, lazy = False) 
    }
)

# Article mapper, relates to Keyword via KeywordAssociation
mapper(Article, articles_table, properties={
    'keywords':relation(KeywordAssociation, lazy=False, association=Keyword)
    }
)

session = create_session(bind_to=engine)
# select by keyword
sqlalist = session.query(Article).select_by(keyword_name='jacks_stories')
# user is available
for a in alist:
    for k in a.keywords:
        if k.keyword.name == 'jacks_stories':
            print k.user.user_name

Keep in mind that the association object works a little differently from a plain many-to-many relationship. Members have to be added to the list via instances of the association object, which in turn point to the associated object:

user = User()
user.user_name = 'some user'

article = Article()

assoc = KeywordAssociation()
assoc.keyword = Keyword('blue')
assoc.user = user

assoc2 = KeywordAssociation()
assoc2.keyword = Keyword('green')
assoc2.user = user

article.keywords.append(assoc)
article.keywords.append(assoc2)

session.flush()
back to section top