Pulp v1.0.4 Release

April 11th, 2012 by

A bug fix release has been built to the v1 stable repository (http://repos.fedorapeople.org/repos/pulp/pulp/v1/stable/). The fixed bug list can be found below.

Upgrade

One of the bug fixes in this release addresses a situation that can arise when two similarly named packages have similar checksums (the chance is small, but we actually managed to run into it). As part of that fix, existing packages need to be migrated to a new directory structure. This release contains a script to perform that migration. Instructions on running that script can be found on the Pulp wiki.

The migration script must be run before attempting to run a sync with the new release to avoid duplicating content.

Bug Fixes

  • 809628 – fixed error message formatting
  • 809195 – added post_sync dequeue hook to scheduled syncs
  • 807332 – the continuing story of the post sync url firing off actual POST
  • 807516 – deleting a repository does not affect repoids associated with a package
  • 806976 – fix overlap in /etc/pulp files between main and sub-packages
  • 805740 – generate metadata on re-promotion with filters
  • 805922 – adding sync logic to look for server directory during clone/local
  • 802454 – added post_sync_url and post_sync dequeue hook for POSTing task
  • 798656 – include full checksum when constructing package paths
  • 801184 – error messages when installing selinux RPM after it’s already been installed
  • 799120 – added package filtering at package import stage
  • 797929 – add requirement on semanage command
  • 795819 – account for the relativepath from the metadata location field

Version 1.1 In Development

Bug/RFE triage and planning for our 1.1 release has completed and work will begin this sprint towards the included fixes. Aligned bugs can be found using this bugzilla query and while there are no known changes to be made to that list, it is, as always, subject to change. Current plans are to target a June 1 release date. More information will be available on this blog as it becomes available.


Categories: Releases

Pulp Beyond 1.0

April 3rd, 2012 by

With Pulp 1.0 a few weeks old now, I wanted to send out an update on the next steps for the project.

Version 1.0.x

Our recent work on the v1 stream has been bug fixes for a related project in Red Hat (Katello). Our current (internal-only for now) v1 release is 1.0.2 and contains about a dozen bug fixes. The current plan is to move that into the v1/stable repository sometime next week. More information, including a list of bug fixes, coming in the next few days.

Version 1.1

This may or may not even exist :)

The original thought was that we’d be a few months out from being able to do a version 2.0 release and would probably want to do an interim release with critical bug fixes. That role seems to have been filled with the 1.0.x builds. As a team we’re not exactly swimming in free time, so it’s possible that we may push forward to get version 2.0 in an alpha state and skip 1.1 entirely. We should have a clearer idea of this in the next few weeks as Katello gets closer to a release.

Version 2.0

Not surprisingly, our big push as a team is to get the 2.0 release to a usable state as quickly as possible. It’s been talked about on this site before but never as an overall checklist of features and goals. Now that 1.0 is out and in use, it makes sense to start to look to what’s coming in 2.0.

  • New architecture to allow for user-defined content types to be managed.
  • Plugin-based architecture allows content synchronization from non-yum external sources. Plugins are able to leverage Pulp’s platform capabilities such as concurrency management, sync scheduling, permissions, and more.
  • A clearer distinction between external content synchronization and the publishing of that content from the Pulp server. As with sync, the architecture revolves around a plugin model which allows new publishing mechanisms to be easily added to the platform.
  • Revamped concurrency layer to ensure safety across multiple requests through postponing or outright rejecting conflicting requests.
  • Closer adherence to REST practices including the proper usage of HTTP response codes and consistent data and format for exception conditions.
  • Rewritten client using an extension mechanism to easily add new commands by leveraging the client’s pre-configured server bindings. Not happy with Pulp client output? In 5 minutes you can write a new extension that formats the data however you see fit.

Everything in the above list is already implemented, just not yet in a totally clean state. The following are planned changes for version 2.0, though they may obviously change over the next few months:

  • Separation of the server->consumer communications, allowing the existing consumer agent infrastructure to be used outside of Pulp for custom needs.
  • Revamped CDS functionality. There are a lot of ideas here, many revolving around allowing the CDS to function as more than a simple repository mirror. One such ability is to act as a proxy to the Pulp server in the case of large geographic differences between server and consumers or firewall and security considerations.


Categories: 2.0

Pulp v1.0 Release

February 24th, 2012 by

The team is proud to announce the availability of the Pulp 1.0 release.

What is Pulp?

Pulp is a repository and consumer management system. On the repository management side of things, the Pulp server will synchronize yum repositories from external sources, either manually initiated or at recurring scheduled intervals. Repositories can also be manually created and populated with user-uploaded content when there is no external source. These repositories are then hosted by either the Pulp server directly or synchronized to an external server (called a Content Delivery Server or “CDS”) to better control the bandwidth required for the repository. Repositories can be made available to anybody or secured, requiring yum clients to present a valid X.509 certificate to be granted access.

Pulp also provides a number of features designed to facilitate the management of clients (called “consumers”) using Pulp-provided repositories. Using the Pulp consumer scripts, a client machine registers itself with the Pulp server and uploads a profile describing its installed packages. As the consumer binds itself to Pulp repositories, the server keeps track and can then provide information to administrators about applicable errata available to a given consumer. Each consumer is connected to a message bus that the Pulp server can use to remotely trigger operations on the consumer such as package installations and updates.

Going Forward

As with any release, we recognize issues are going to arise. We’ve already set up a 1.1 version in bugzilla so we can begin to plan out what support for the v1 stream will look like.

Outside of that, the team is focused on the re-architecting of Pulp for v2. The theme for v2 is “Pulp as a Platform”, providing mechanisms for developers to use Pulp infrastructure for non-RPM content types. Of course, Pulp v2 will use this architecture to continue to handle its existing content types, including RPMs and errata, and will include many refinements to its existing subsystems. More information and progress on the v2 stream can be found in the 2.0 category of this blog.

Thank You

The pulp.spec file was first committed to git on May 20, 2010, which I suppose represents as good a birth date for Pulp as any. In the time since that commit our small 4 person team has more than doubled in size. But more importantly, a community has sprung up in the process. Today, the chat room and mailing list are buzzing with comments ranging from using Pulp as a standalone product to embedding Pulp as part of a larger initiative (see below) to hacking up the pieces of Pulp to meet more specific needs.

It’s been awesome to watch it all take shape. I hesitate to thank individuals by name for fear of missing any, but realize that all contributions, from feature requests to bug reports (and everything in between), are greatly appreciated by the team. The community has helped Pulp become a better product; that’s simply a fact and there’s no disputing it. So thank you to everyone for your interest in Pulp, feedback on what you like (and don’t), keeping us company in the chat room, and general patience as so few balance the needs of so many. Y’all rock.

Powered by Pulp

Below are a few examples of projects using Pulp as part of the back end:

Upgrades from Community Releases

Upgrading a Community Release to the 1.0 release is supported. Be sure to run pulp-migrate and restart Apache once the new packages are installed.


Categories: Releases

Unit Associations

January 10th, 2012 by

I spent the bulk of last sprint working on the relationship (read: association) between a content unit and a repository. A lot of the changes and functionality were driven by community feedback. Given their use cases, in 2.0 we’ll be able to support a lot of queries that aren’t possible in the 1.0 architecture.

Association Metadata

In 1.0, the association between a repository and a package is maintained by a simple list of packages on the repository. In 2.0, this relationship is its own entity, allowing us to capture metadata about the association itself. This metadata can be easily expanded in the future, but for now we track the following:

Association Creation Timestamp

Timestamp of when the association between repository and unit was first created.

Association Update Timestamp

The method to create an association is idempotent; if the association already exists, a second isn’t created and no error is raised. The one change that occurs is that the “updated” field on the association is changed to reflect when the association call was made.

This can be used as a mechanism for tracking the last time an association was confirmed, which can then be used to ask questions such as “Show me all units that have not been present in the external source in the last 3 months.”
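
To make the idempotent behavior concrete, here is a minimal sketch in Python. It assumes an in-memory dictionary in place of the real database, and the field names "created" and "updated" are only taken from the description above; the helper names are made up for illustration and are not Pulp APIs.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical in-memory stand-in for the association collection;
# the real Pulp storage and schema may differ.
associations = {}


def associate(repo_id, unit_id, now=None):
    """Create the association if it is missing; always refresh 'updated'."""
    now = now or datetime.now(timezone.utc)
    key = (repo_id, unit_id)
    # setdefault keeps the existing record, so 'created' is never overwritten.
    record = associations.setdefault(key, {"created": now, "updated": now})
    record["updated"] = now
    return record


def stale_units(repo_id, cutoff):
    """Unit IDs in the repository whose association was last confirmed before cutoff."""
    return sorted(
        unit for (repo, unit), record in associations.items()
        if repo == repo_id and record["updated"] < cutoff
    )
```

Calling associate a second time for the same pair leaves "created" alone and only moves "updated", which is what makes the "not seen in the last 3 months" question answerable.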

Owner

I’m starting to work through the scenarios surrounding user associations v. those created through an importer. User associations can be created either by a unit upload or by copying the associations from an existing repository. I wanted to be able to track both the type of owner (user v. importer) and the ID of the owner for auditing purposes.

This has other implications as well. An importer can only affect its own associations, preventing the possibility that the importer will undo explicit repository additions made by a user. It is possible that a unit will be associated multiple times with a single repository, each time by a different owner. The unit association query provides the ability to filter on owner metadata or to have Pulp remove the duplicate associations before returning the results.
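
A rough sketch of the owner scoping, assuming each association is just a (unit_id, owner_type, owner_id) tuple. The tuple layout and both function names are illustrative only and do not come from the Pulp code base.

```python
def remove_by_owner(associations, unit_id, owner_type, owner_id):
    """Drop only the association created by the given owner."""
    target = (unit_id, owner_type, owner_id)
    return [a for a in associations if a != target]


def distinct_units(associations):
    """Collapse duplicate associations to one entry per unit, keeping order."""
    seen = set()
    result = []
    for unit_id, _owner_type, _owner_id in associations:
        if unit_id not in seen:
            seen.add(unit_id)
            result.append(unit_id)
    return result
```

If an importer removes its own association for a unit that a user also added, the user's association survives and the unit stays in the repository.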

Unit Association Query

The unit association query, or more specifically the criteria that drives the query, is used in a number of places beyond showing a user the contents of a repository. At the very least, the query is used in the following places:

  • Providing users with the ability to list or search for specific units in a repository.
  • Accessible to importers through the sync conduit to be used when determining the current state of the repository during a sync.
  • Similarly, accessible to distributors through the publish conduit to give the plugin the flexibility it needs to get the data necessary to publish the repository.
  • The criteria document is used in the 2.0 evolution of repository cloning to let users select a custom set of units to migrate to the new repository.

I’ve mentioned the criteria a number of times now. The criteria refers to a JSON document passed to Pulp in the unit association query to drive the query and returned results. The following capabilities are provided:

  • Filtering – Filtering can be done on both the association metadata as well as the unit metadata itself. MongoDB syntax is used directly in the criteria document, giving users access to useful constructs such as $and, $or, $in, $gt, $lt — you get the idea. If you can filter on it using Mongo, you can filter on it for the unit association query too.
  • Sorting – Sorting can be done on either the association metadata or the unit metadata; within a particular metadata category, however, multiple sort fields can be provided.
  • Skip & Limit – Used in conjunction with each other, the skip and limit fields can be used to achieve server-side pagination driven at the database level.
  • Field Limiting – A list of association metadata and/or unit metadata fields can be provided to limit the returned data to save on bandwidth and memory costs.

Examples

While the above list describes the supported functionality, I’ve found it’s often easier to think in terms of the sorts of questions that can be supported by it. Below are a few of the query use cases (and the respective JSON criteria document) I wanted to answer with the query functionality.

The NEVRA for all RPMs in the repository, sorted by name (a basic web UI for RPMs in a repo):

{ "type_ids" : ["rpm"],
  "fields" : { "unit" : ["name", "epoch", "version", "release", "arch"] },
  "sort" : { "unit" : { "name" : "ascending"} }
}

All units added to the repository more than a month ago, regardless of type (too lazy to type out the date subtraction code):

{ "filters" : { "association" : {"created" : {"$lt" : <now - 30 days>}}} }
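
For the date subtraction itself, a minimal Python sketch follows. It assumes timestamps are passed as ISO 8601 strings, which the post does not confirm, and older_than_criteria is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone


def older_than_criteria(days, now=None):
    """Build a criteria document matching associations created before the cutoff."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    # Assumption: the server accepts ISO 8601 strings for timestamp comparisons.
    return {"filters": {"association": {"created": {"$lt": cutoff.isoformat()}}}}
```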

Basic pagination (on page 4 showing 25 items per page):

{ "limit" : 25, "skip" : 75 }
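
The skip value is simply the number of items on the preceding pages. A small sketch, assuming 1-based page numbers (page_criteria is a made-up helper, not part of Pulp):

```python
def page_criteria(page, per_page):
    """Translate a 1-based page number into skip/limit fields."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    return {"limit": per_page, "skip": (page - 1) * per_page}
```

For page 4 at 25 items per page, this skips the 75 items shown on pages 1 through 3.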

All noarch RPMs:

{ "type_ids" : ["rpm"],
  "filters" : { "unit" : { "arch" : "noarch" }}
}

User selected multiple sort columns to sort RPMs by name with the most recent version first:

{ "type_ids" : ["rpm"],
  "sort" : { "unit" : {"name" : 1, "version" : -1, "epoch" : -1}}
}

All user-uploaded RPMs in a repository, showing the most recently uploaded ones first:

{ "type_ids" : ["rpm"],
  "filters" : { "association" : { "owner_type" : "user" }},
  "sort" : { "association" : { "created" : "descending" }}
}

Units uploaded by a system admin sometime in November:

{ "filters" : { "association" : { "owner_type" : "user", "created" : { "$gte" : "November 1", "$lte" : "November 30"}}}}

Units uploaded by a rogue system admin:

{ "filters" : { "association" : { "owner_id" : "evil-jdob" } } }

Association from Repository

Let’s get this out of the way now: I haven’t thought of a good name for this feature. It’s a new spin on cloning, allowing the user to specify a criteria document to dictate which units from a source repository should be imported into a destination repository. I’m open to suggestions :)

As I’ve mentioned a few times now, the criteria document format is the same as for the unit association query. This lets a user tweak the criteria and see matching units in the source repository. Once the criteria is acceptable, the same document is then fed back into Pulp to be used to associate those units into a new repository.

It’s important to note that the destination repository’s importer is notified of the new associations. This gives the importer a hook to do whatever setup work it needs to do to accommodate the newly associated units. I won’t go into the details of the plugin implementation here, but the docstrings for the Importer base class should provide enough guidance if you’re curious.

Next Steps

When I get the time, it really shouldn’t take long to add the infrastructure to be able to save named copies of criteria documents to the database. Once that is in place, adapting the query to accept a previously saved criteria document instead of respecifying it each time should be a pretty trivial change.

At that point, I’d like to add some small variable support to the criteria document. This is mostly necessary for any searches that focus around dates. A saved query that shows all units added since January 1st starts to lose its usefulness over time. However, the ability to specify “in the last 30 days” and have Pulp resolve that at execution time is much more useful.
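
One way the variable support could work is sketched below in Python. The "$days_ago" placeholder is purely hypothetical syntax invented for this example, and the ISO 8601 output format is likewise an assumption; nothing here reflects a decided design.

```python
from datetime import datetime, timedelta, timezone


def resolve(node, now):
    """Recursively replace {"$days_ago": N} placeholders with a concrete timestamp."""
    if isinstance(node, dict):
        if set(node) == {"$days_ago"}:
            return (now - timedelta(days=node["$days_ago"])).isoformat()
        return {key: resolve(value, now) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve(value, now) for value in node]
    return node


# A saved criteria document stays relative; it is resolved at execution time.
saved = {"filters": {"association": {"created": {"$gte": {"$days_ago": 30}}}}}
resolved = resolve(saved, datetime(2012, 1, 31, tzinfo=timezone.utc))
```

Because the saved document never contains an absolute date, running it next month yields a different concrete query without anyone editing it.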

I haven’t gotten any of this on the Pulp backlog yet, much less scheduled for a sprint, but it’s definitely functionality that’s in my head and pretty achievable given the way the current code is written.


Categories: 2.0

Pulp Community Release 19

December 15th, 2011 by

Installation & Upgrade

Installation instructions can be found in the Pulp User Guide at the Pulp project web site. As usual, upgraded environments must run pulp-migrate to upgrade to the latest database changes. This release includes upgraded versions of both the server and client RPMs (admin and consumer), which should be upgraded together.

Upgrade Database Migration

Last release, changes were made with respect to requiring the relative path of a repository to be unique. This sprint saw further refinements that prevent one repository from being nested within another. For example, hosting a repository at “/foo/bar” and another at “/foo/bar/baz” causes a number of issues and will be prevented going forward. However, repositories “/foo/bar/wombat” and “/foo/bar/zombie” are valid as long as “/foo/bar” is not itself a repository.
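
The nesting rule can be illustrated with a short Python sketch that compares paths component by component. This is only an illustration of the rule as described, not the check Pulp itself performs, and the function names are made up.

```python
def _parts(relative_path):
    """Split a relative path into its non-empty components."""
    return [part for part in relative_path.split("/") if part]


def conflicts(new_path, existing_paths):
    """Return existing paths that equal, contain, or are contained by new_path."""
    new = _parts(new_path)
    found = []
    for path in existing_paths:
        old = _parts(path)
        shorter = min(len(new), len(old))
        # One path is a prefix of the other when the shared leading components match.
        if new[:shorter] == old[:shorter]:
            found.append(path)
    return found
```

Comparing components rather than raw strings means “/foo/barn” is not mistaken for something nested under “/foo/bar”.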

If your installation contains repositories with relative paths that violate this rule, the pulp-migrate script will display a warning. The migration will complete and the Pulp server will continue to run, however the repositories in question should be manually fixed as soon as possible.

Custom Built Dependencies

Due to a bug in python-oauth, the Pulp repositories now contain a patched version: python-oauth2-1.5.170-2.pulp.fc16.noarch.rpm. This version is a dependency of the Pulp RPM, and the version from the Fedora repositories (1.5.170-2.fc16) will be upgraded during the Pulp installation. Appropriate care should be taken for systems using this library.

Additionally, the Pulp-built version of python-isodate has been upgraded to python-isodate-0.4.4-4.pulp.fc16.noarch.rpm.

Supported Platforms

In keeping with Pulp’s policy of building for Fedora’s current and previous releases, builds for Fedora 14 will no longer be provided. Builds for this Community Release are provided for Fedora 15 and Fedora 16 (RHEL builds remain unchanged).

Features & Changes

  • Updated SELinux Policy
    • Improved upon security rules from previous releases.
    • Pulp files are now labeled with the httpd_sys_content_t context.
    • Renamed the semodule from pulp to pulp-server.
    • More details can be found on the SELinux design wiki page.
  • Repository Filters Enhancements
    • Repository filters are now applied to manually uploaded content in addition to synchronized content.
  • Bulk Repository Status APIs
    • Added API calls to retrieve sync status and history for all or a specified set of repositories in a single API call.
    • More details can be found on the Repository Bulk API wiki page.
  • Repository Enhancements & Changes
    • Added the ability to update checksum type on a repository. Doing so will cause the repository metadata to be regenerated.
    • Added uniqueness constraint to ensure relative paths are unique across repositories.
    • Added checks to ensure that the relative path of a new repository will not allow the repository to be nested within another repository. See the user guide for more information.
  • Repository Clone Enhancements & Changes
    • Overall performance tweaks to greatly improve the speed of the clone operation.
    • Cloning now has the ability to resolve duplicate packages, distribution tree missing sub-directories, and metadata checksum mismatches between filesystem and database.
    • Changed cloned repository symlinks to refer to the central package location instead of the parent repository.
    • Added the clone operation to the persistent task framework, allowing an in-progress clone to be resumed on server restart.
  • Distribution Enhancements
    • Added an arch field to the distribution model.
    • Distribution API root changed from /distribution to /distributions to be consistent with the Pulp API conventions.
    • A number of changes have been made to the distribution API to be more REST-like.
  • Added a new AMQP event that is raised when a task is dequeued. Currently, this only applies to tasks related to repository synchronization. More information can be found on Pulp’s AMQP events wiki page.
  • Added support for updating packages on a consumer. Package update is modeled after yum behavior in that when packages are specified, only those packages are updated. When no packages are specified, all upgradable packages are updated. See the User Guide and API documentation for more details.
  • Fixed 21 bugs.


Categories: Releases

Importer Sync APIs

November 21st, 2011 by

One of my stories for the sprint is to revisit the APIs for interacting back and forth between a plugin and the Pulp server during a sync operation. The goal is to make sure they are not only sufficient from a functionality perspective, but designed in such a way as to not absolutely destroy performance. As you can imagine, that last part takes a little bit of thought and creativity. I want to avoid making an API that is awesome for RPMs and near unusable for everything else (I had to work with a guy once who wrote APIs that only he would want to use; I’m not getting back into that situation again), so I figured I’d lay out how I see most sync operations taking place and use that as a starting point to talk with current plugin writers on their needs.

Below is what I imagine is a rough outline of what most importer plugins will look like. It will reference how the APIs from importer back into Pulp* look today, but realize the driving force behind this is refining those APIs, so they’ll be changing in the next few days as I digest this.

* The term “conduit” is used to refer to the object passed to the plugin that exposes the Pulp functionality it will want to use. Each conduit instance is scoped to the repository being synchronized, so don’t be surprised to not see the repository ID in any of the conduit signatures below.

Step 1: Query External Feed

I figure the first step just about any importer will want to do is query the external source from which units will be imported. Already in place is the ability for each importer type to accept a custom set of configuration options on a per repo basis, so the importer should already have everything it needs to find out what work it has to do.

In the RPM world, this involves fetching the repository’s metadata and starting to break down the file list.

Step 2: Current State of the Repository

Once we know what the external source says the repository should look like, we need to know what the repository currently looks like. On first sync this will be empty, but the common case is incremental updates to a previously synchronized repository.

Conduit Call: get_unit_keys_for_repo
A unit key is a dictionary of key-value pairs that uniquely identifies a content unit in the context of its content type. The keys contained within will vary by content type depending on what makes sense. I suppose you could call it the “natural” key as compared to the unit ID which is talked about later.

This is a single call the plugin will use to determine the unit keys of all units currently associated with the repository. The “single call” part of that is important since I’m trying to keep plugin writers from having to utterly smash the database, a theme you’ll see repeated throughout this write up.

Step 3: Resolve Repository Changes

Using the above two pieces of information (what should be in the repository and what currently is), the plugin will want to resolve the differences. It should be noted that I’m expecting this to happen almost entirely in memory at this point and not incur massive amounts of Pulp database hits (this is the driving reason behind the single call in step 2 rather than giving the plugin writer calls to the database to test units on a case by case basis).

In some cases, I expect this to be an early exit. If you know there’s no chance of content metadata changing and needing to update the unit’s metadata, you can punch out early if there are no new units.

By now, I expect the plugin writer to have two lists: content units to be added (and updated, you’ll see what I mean in a minute) in the repository and units to be unassociated from the repository.
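
A minimal in-memory sketch of this step in Python. It assumes unit keys are flat dictionaries with hashable values and that unit metadata never changes, so only genuinely new units need adding; freeze and resolve_changes are hypothetical helper names.

```python
def freeze(unit_key):
    """Turn a unit key dict into a hashable, order-independent form."""
    return tuple(sorted(unit_key.items()))


def resolve_changes(feed_keys, repo_keys):
    """Return (units to add, units to unassociate) as lists of unit key dicts."""
    current = {freeze(key) for key in repo_keys}
    wanted = {freeze(key) for key in feed_keys}
    to_add = [key for key in feed_keys if freeze(key) not in current]
    to_remove = [key for key in repo_keys if freeze(key) not in wanted]
    return to_add, to_remove
```

Both lookups are set membership tests, so the whole comparison costs one pass over each list and no database calls beyond the single fetch in step 2. An empty to_add list is the early exit described above.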

Step 4: Add or Update Units

My gut reaction* is that in most cases, the plugin won’t care if a unit is being added or updated. It just wants the correct metadata in the database and the unit associated to the repository. So the current API has a single idempotent call that does either an add or an update; the Pulp server makes the call and the plugin writer’s life is easier.

* My gut may be totally wrong, so feel free to argue this point.

I suspect I’ll be changing this to provide both fine-grained semantics as well as this sort of utility combination call. For now though, the idea is that the plugin will:

  1. Add (or update) the unit to Pulp. This will create an entry in the database for the unit if one didn’t already exist, but it is orphaned (doesn’t belong to a repository) by default.
  2. Associate the unit to the repository being synchronized.

Even now as I look at the above list I wonder why I didn’t just take the extra step of adding another aggregate call that will add/update and associate all in one, leveraging the fact that they are all idempotent so the plugin writer only really needs to concern themselves with the post-conditions and not the correct incantation of conduit calls to get there.

The other big operation taking place in this step is the downloading of the unit’s bits. This is left entirely to the plugin itself to implement, though in the future if there are any utilities we can provide they may be in some form of supplemental package.

The noteworthy part of the download operation is that the plugin asks Pulp for the final say on where to store the unit. The plugin determines the relative path that will keep one unit’s bits from conflicting with another’s (segregated by type), but Pulp is asked for the actual location on disk.

Also notice that the download bits part is optional. If a plugin wants to use Pulp strictly for cataloging content and not actually doing any bits movement, that’s totally possible with these APIs.

This area is my biggest concern in terms of Pulp overhead’s influence on the overall performance. So far it seems like the add/update call can’t cleanly be batched, but I’m also ok with the concept of 2 database hits per unit being added (at some point we’re going to have to actually use the database). What we have batched are the associate calls, which means that until that uber add/update/associate call exists, it’s only a single call to associate the running list of added/updated units to the repository.

Conduit Call: request_unit_filename(content_type, relative_path)
Not much else to explain here. This prevents the plugin from having to care where the admin configured Pulp to store content and lets the plugin focus on the important part: uniquely storing unit bits without duplication.

Conduit Call: add_or_update_content_unit(type_id, unit_key, filename, custom_unit_data)
(This isn’t exactly how it appears in git right now, but the git version is definitely busted, so I’ll talk to where I want to go with it.)

The unit_key has been discussed earlier. The custom_unit_data field is a dictionary containing whatever metadata the plugin wants to store for that unit; there is no set schema for what goes in here. The filename may seem odd, but I’m expecting the purging of orphaned packages to be a Pulp operation (in other words, not contacting a plugin). That means Pulp needs to know where on disk the bits are stored. In the case of metadata-only units, this will be None and Pulp will simply remove the database entry on orphaned unit cleanup.

Conduit Call: associate_content_unit(type_id, unit_id)
The unit ID is Pulp’s unique ID for the content unit (as compared to the unit key which is the natural key). This is returned from the add/update call which gives that method the side effect of translating unit_key into database-level ID. Again, a batch operation will reduce the database hits for making all of these associations.
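
Putting the three conduit calls together, the per-unit flow might look like the Python sketch below. FakeConduit is an in-memory stand-in written only for this example: the method names and arguments follow the signatures above (which are still in flux), while the storage root, the ID scheme, and import_units are assumptions.

```python
import posixpath


class FakeConduit:
    """In-memory stand-in for the conduit; not the real Pulp object."""

    def __init__(self, storage_root="/srv/example"):  # hypothetical location
        self.storage_root = storage_root
        self.ids = {}            # natural key -> unit_id
        self.units = {}          # unit_id -> (unit_key, filename, custom_unit_data)
        self.associated = set()  # (type_id, unit_id) pairs for this repository

    def request_unit_filename(self, content_type, relative_path):
        return posixpath.join(self.storage_root, content_type, relative_path)

    def add_or_update_content_unit(self, type_id, unit_key, filename, custom_unit_data):
        natural = (type_id, tuple(sorted(unit_key.items())))
        # Reuse the existing ID on update; allocate a new one on add.
        unit_id = self.ids.setdefault(natural, len(self.ids) + 1)
        self.units[unit_id] = (unit_key, filename, custom_unit_data)
        return unit_id

    def associate_content_unit(self, type_id, unit_id):
        self.associated.add((type_id, unit_id))


def import_units(conduit, type_id, units):
    """units is an iterable of (unit_key, relative_path, metadata) tuples."""
    unit_ids = []
    for unit_key, relative_path, metadata in units:
        filename = conduit.request_unit_filename(type_id, relative_path)
        # A real plugin would download the unit's bits to `filename` here.
        unit_id = conduit.add_or_update_content_unit(type_id, unit_key, filename, metadata)
        conduit.associate_content_unit(type_id, unit_id)
        unit_ids.append(unit_id)
    return unit_ids
```

Because every call is idempotent, running import_units twice with the same input leaves the stand-in in the same state, which is the post-condition style of thinking described in step 4.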

Step 5: Unassociate Removed Units

Using the data from early on, the plugin will know which units are in the repository that are not in the external source. I still have to figure out a solution for how to handle units manually associated with a repository; either the plugin should support a flag that says do not unassociate unknown units or the manually associated units will have a flag indicating they were explicitly associated and the plugin should use that information and not undo user-initiated changes.

Conduit Call: unassociate_content_unit(type_id, unit_id)
As I write this I realize that, given my explanation, the plugin writer doesn’t have the unit_id for units that are supposed to be removed at this point. My expectation is that they used the get_unit_keys_for_repo to determine which units need to be removed, but that doesn’t include the database IDs. So I either need to enhance these APIs to be able to unassociate by unit_key or give an explicit ID lookup call (that call has the smell of giving a plugin writer enough rope to hang himself with). Like I said, this is a work in progress and I’m half using this to think out loud :)

Step 6: Return a Report

A successful sync_repo call is expected to return an instance of SyncReport. That’s stored in the newly added sync history tracking on the repository and can be accessed through the REST API. The sync report currently includes three pieces of data specified by the plugin:

  • Number of units added
  • Number of units removed
  • Arbitrary log of the sync

The log is meant to give the user visibility into the sync process itself. Added/removed counts are meant to, among other things, be used as triggers for other things that may need to happen when a repository’s contents change.
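
As a rough illustration of how the three pieces of data could be assembled and used as a trigger, here is a Python sketch. ExampleSyncReport is a made-up class for this example; the actual SyncReport constructor is not shown in this post and may look different.

```python
from dataclasses import dataclass


@dataclass
class ExampleSyncReport:
    """Hypothetical mirror of the three plugin-supplied fields."""
    added_count: int
    removed_count: int
    log: str


def build_report(added, removed, messages):
    """Summarize a finished sync from the lists the plugin kept along the way."""
    return ExampleSyncReport(len(added), len(removed), "\n".join(messages))


def contents_changed(report):
    """Example trigger: did this sync change the repository's contents?"""
    return report.added_count > 0 or report.removed_count > 0
```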

Other: Repository Working Directory

I didn’t know where to fit this in, but Pulp will provide each importer with a working directory for each repository. This is meant to store temporary files needed during the synchronization process. For instance, our current RPM sync operation utilizes the repository metadata which can be downloaded and unpacked to this directory. Pulp will take care of cleaning up this directory on repository delete.

Next Steps

I’m focused on this for the next day or so, depending on how early I mentally check out for Thanksgiving. I can’t stress enough how any input is appreciated. Ping me in chat (jdob @ #pulp on Freenode), e-mail pulp-list, comment to this blog, send me a carrier pigeon… whatever it takes to let me know what you think.


Categories: 2.0

Pulp Fedora 16 Builds

November 9th, 2011 by

I forgot to mention in the last announcement that Community Release 18 will be the last build of Pulp offered on Fedora 14. Going forward support will continue on Fedora 15 and 16.

The latest testing build for Pulp is available on Fedora 16 and can be found in our testing repositories. Instructions for using our testing repositories can be found in the installation portion of our user guide. As you can imagine by the term “testing build”, we make no guarantees as to its stability. :)


Categories: Releases

Pulp Community Release 18

November 8th, 2011 by

Installation & Upgrade

Installation instructions can be found in the Pulp User Guide at the Pulp project web site. As usual, upgraded environments must run pulp-migrate to upgrade to the latest database changes.

Upgrade Database Migration

The pulp-migrate script might warn about repositories with the same relative paths. Relative paths need to be unique across repositories in Pulp. Please make sure to clean up these repositories as this functionality is deprecated and will be unsupported in upcoming releases.

Server Configuration

The configuration option:

[tasking]
max_concurrent

has changed to:

[tasking]
concurrency_threshold

This is in preparation for the administrator to “weight” tasks. Currently, the semantics of the property have not changed. Details can be found here on the weighted tasks design wiki.
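For anyone scripting the upgrade, the rename could be sketched roughly as follows. This is an illustration using Python's standard configparser module, not a tool shipped with Pulp; the function name rename_tasking_option and the example value 4 are assumptions.

```python
import configparser
import io


def rename_tasking_option(config_text):
    """Return config_text with [tasking] max_concurrent renamed to concurrency_threshold."""
    # Interpolation is disabled so values containing '%' pass through untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    if parser.has_option("tasking", "max_concurrent"):
        value = parser.get("tasking", "max_concurrent")
        parser.remove_option("tasking", "max_concurrent")
        parser.set("tasking", "concurrency_threshold", value)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


old = "[tasking]\nmax_concurrent = 4\n"
print(rename_tasking_option(old))
```

Note that configparser drops comments when it writes a file back out, so for a hand-maintained configuration file editing the one line manually is probably the safer route.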

Consumers

The pulp consumer certificate location is now configurable. As part of this change, the default location was changed from:

/etc/pki/consumer/

to:

/etc/pki/consumer/pulp

to avoid collisions with RHSM certificates. After upgrading Pulp on consumer systems, the pulp-agent daemon must be restarted and the consumer re-registered. Alternatively, the certificate may be manually migrated to the new directory.

Features & Changes

  • Added support for package uninstallation through the API and CLI.
  • Added support for package group uninstallation through the API and CLI.
  • Enhanced progress reporting of repository synchronization. Progress is now reported on each individual item being downloaded.
  • Exporter (continued from last sprint):
    • Added support for deltarpms and prestodelta metadata export.
    • Added ability to export repository groups to target location.
  • Distribution Enhancements:
    • Distributions now use a new id format: ks-$family-$variant-$version-$arch
    • New selective sync support to add and remove distributions between repositories.
    • Added new timestamp field to distribution model to capture when the treeinfo was last updated.
    • Pulp now stores distributions in a central location (/var/lib/pulp/distribution) for space optimization and to allow a distribution to be associated with multiple repositories.
    • The distribution relativepath is now the path of the distro location.
    • The distribution URL will now use the repository’s relative path. A distribution can be associated with multiple repositories using the same distribution ID.
  • Local Discovery: Support for discovering repositories from local file-based URL paths.
  • Selective operations such as adding or removing packages or errata on a repository will not automatically trigger a metadata generation task. Users need to trigger one manually by running generate metadata on the repository after the associations are complete (pulp-admin repo generate_metadata).
  • Repository Notes: Repository notes are similar to key-value pairs on consumers. The ability to add, update and delete notes from a repository has been added. Repositories may also be queried on note data. For more information, see the repository section of the user guide.
  • Repo Synchronization Schedule
    • REST API changed to separate sync schedule management from repository creation and update. A new sub-collection has been added (/pulp/api/repositories//schedules/sync/) that accepts GET, PUT, and DELETE calls. Details can be found in the repository section of the user guide.
    • The pulp-admin script has been changed to utilize the new REST API. Schedule creation and update are no longer part of the pulp-admin repo create and pulp-admin repo update actions. Repository scheduled syncs can now be viewed and managed with the pulp-admin repo sync action. The --help flag should be used for more details. Details can be found here: https://fedorahosted.org/pulp/wiki/UGRepo#sync


Categories: Releases | No Comments »

Preview: Pulp v2.0 Repository APIs

November 4th, 2011 by

The last post talked about the process of adding a v2.0 plugin to a Pulp server. This post will provide an overview of the currently available APIs. These APIs were developed with the focus of a sprint demo, so there are some holes in functionality. There is a story this sprint to revisit these and flesh them out (along with correcting some conventions), but for now I wanted to provide a quick list for those interested in playing around with the plugin model.

The APIs can be found after the jump.


Categories: 2.0, Articles | 1 Comment »

Preview: Creating a Pulp Plugin

November 2nd, 2011 by

Buried in Community Release 17 is the first working implementation of the new plugin mechanism for Pulp 2.0. I didn’t make a big deal about it since, well, it’s still really, really early. I figured I’d wait to start to highlight it until we got some more work done.

That changed with the introduction of the pulpdist project, which is looking to leverage Pulp for synchronizing and publishing mirrors within Red Hat. It’s awesome to have such a fleshed-out use case for Pulp that’s going to be written at the same time we’re designing the features, and it should prove really helpful in getting an outside contributor’s view on our approach.

Since I needed to write up some information for the pulpdist team on how to get started writing a plugin, I figured I’d do it here in case others are interested in playing with the new architecture as it takes shape. Again, it’s still really early in terms of code and API stability, so realize things are likely to still be tweaked. Feedback is appreciated if you do start to poke around.

This post will cover the basics in terms of creating and installing Pulp plugins. A later post will go into more details on the plugin and conduit APIs with some examples of the sorts of things I’ve done as demo plugins.

Type Definitions

The first step is to make sure type definitions exist for the types of content you are looking to manage with Pulp. A type definition informs Pulp of the details of how to store and manage content units of that type. This metadata can be user-focused (display name, description) or used by Pulp to optimize storage of a content unit’s metadata (search indexes). A type definition may also reference other types to establish an aggregate relationship. For instance, the errata type will reference the RPM type to express that an errata may contain one or more RPM units.

Type definitions are written in type descriptors which are located in /var/lib/pulp/plugins/types. A type descriptor may contain more than one definition; the intention is that related definitions will be defined in the same file for ease of managing them. Type descriptors are JSON documents.

Let’s start with an example and break it down. To be clear, this is a slimmed down version of what RPM and errata definitions would look like, so don’t get caught up in what may be missing.

{"types": [
    {"id" : "rpm",
     "display_name" : "RPM",
     "description" : "RPM Package",
     "unique_indexes" : ["name", "version"],
     "search_indexes" : [
         ["name", "epoch", "version", "release", "architecture"]
      ]},
    {"id" : "errata",
     "display_name" : "Errata",
     "description" : "Errata",
     "unique_indexes" : "id",
     "search_indexes" : [
         "title",
         "type",
         "severity"
      ],
     "child_types" : ["rpm"]}
   ]}

The above example contains two type definitions. When the Pulp server is started, it will parse the type definitions and ensure syntax validity and integrity (e.g. does a type reference a child type that doesn’t exist). Pulp will alter its database accordingly to be able to store units of these types.
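As a rough illustration of the kind of integrity check described above, the sketch below parses a descriptor and reports duplicate type IDs and child types that do not exist. It is not Pulp's actual validation code; the function name check_descriptor and the "srpm" child type are made up for the example, and it only looks at a single descriptor rather than every file in the types directory.

```python
import json


def check_descriptor(text):
    """Return a list of problems found in a type descriptor (empty if none)."""
    definitions = json.loads(text)["types"]
    ids = [d["id"] for d in definitions]
    problems = []
    # Type IDs must be unique across all definitions.
    for type_id in sorted(set(ids)):
        if ids.count(type_id) > 1:
            problems.append("duplicate type id: %s" % type_id)
    # Every child type must refer to a type that is actually defined.
    for d in definitions:
        for child in d.get("child_types", []):
            if child not in ids:
                problems.append("%s references unknown child type %s" % (d["id"], child))
    return problems


descriptor = """
{"types": [
    {"id": "rpm", "display_name": "RPM", "description": "RPM Package",
     "unique_indexes": ["name", "version"]},
    {"id": "errata", "display_name": "Errata", "description": "Errata",
     "unique_indexes": "id", "child_types": ["rpm", "srpm"]}
]}
"""
print(check_descriptor(descriptor))
```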

The fields in a type definition are as follows:

  • id – Programmatically identifies the content type. This must be unique across all type definitions.
  • display_name – User-friendly name for the type.
  • description – User-friendly details on what the type represents.
  • unique_indexes – Identifies the set of fields that dictate uniqueness for units of this type. This may be either a single field (e.g. “name”) or a compound index for uniqueness when paired together (e.g. “name”, “version”). Pulp will configure the database with unique indexes on these fields and will enforce uniqueness accordingly when handling units of this type.
  • search_indexes – A list of additional indexes to create for units of this type. This allows the type definition to optimize Pulp’s storage of units for expected queries. Again, each entry in here may either be a single field or a list of fields for a compound index.
  • child_types – (optional) List of type IDs that may be referenced from units of this type. Pulp uses this information to create the necessary links in the database to track this relationship.

The unique_indexes and search_indexes fields map pretty closely to the same concepts in MongoDB. More information on how compound indexes are handled in MongoDB can be found in their documentation and should be consulted when defining these indexes in a type definition.
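One way to read the two index fields is sketched below: every index, single or compound, is normalized into a tuple of field names. This reflects my reading of the field descriptions rather than Pulp's real loading code, and the helper names as_field_tuple and index_plan are hypothetical.

```python
def as_field_tuple(entry):
    """Turn a single field name or a list of field names into a tuple."""
    if isinstance(entry, str):
        return (entry,)
    return tuple(entry)


def index_plan(definition):
    """Summarize the indexes a type definition asks for."""
    # unique_indexes describes exactly one (possibly compound) unique index.
    unique = as_field_tuple(definition["unique_indexes"])
    # search_indexes is a list; each entry is its own index.
    search = [as_field_tuple(e) for e in definition.get("search_indexes", [])]
    return {"unique": unique, "search": search}


rpm = {
    "unique_indexes": ["name", "version"],
    "search_indexes": [["name", "epoch", "version", "release", "architecture"]],
}
errata = {
    "unique_indexes": "id",
    "search_indexes": ["title", "type", "severity"],
}
print(index_plan(rpm))
print(index_plan(errata))
```

With the example descriptor, the RPM type ends up with one compound unique index and one compound search index, while the errata type gets a single-field unique index and three single-field search indexes.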

If a type definition cannot be parsed or loaded, the Pulp application will fail to start. Information on what went wrong can be found in /var/log/httpd/error_log rather than the Pulp logs.

Plugins

Writing importers and distributors follows a similar model; the most notable difference is the use of different APIs depending on what you want to do. For the purpose of this post, I’ll describe how to write an importer and add a note at the end as to the differences in writing a distributor.

Importers

Each importer is a separate directory located in /var/lib/pulp/plugins/importers. This directory must be a Python package (simply drop __init__.py in the directory). Pulp will attempt to load the plugin by looking for a file in that directory named importer.py. The Pulp hook into the plugin must be located in that module; all other code can be organized into separate modules and packages.

Pulp will look in importer.py for any classes that subclass the pulp.server.content.plugins.importer.Importer class. The Importer class defines the API the plugin may implement (I still have to go back in and clean up the docs in there).
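The lookup described above can be approximated with the standard inspect module. The sketch below is only a guess at the general technique, not Pulp's loader: the Importer class here is a local stand-in for the real base class, and find_importer_classes and the in-memory module are made up for the demonstration.

```python
import inspect
import types


class Importer:
    """Hypothetical stand-in for pulp.server.content.plugins.importer.Importer."""


def find_importer_classes(module):
    """Return classes in module that subclass Importer (excluding Importer itself)."""
    found = []
    for _name, obj in inspect.getmembers(module, inspect.isclass):
        if issubclass(obj, Importer) and obj is not Importer:
            found.append(obj)
    return found


class LocalRpmImporter(Importer):
    pass


class Helper:
    """An unrelated class that should be ignored by the lookup."""


# Build a throwaway module in memory to play the role of importer.py.
fake_module = types.ModuleType("importer")
fake_module.Importer = Importer
fake_module.LocalRpmImporter = LocalRpmImporter
fake_module.Helper = Helper

print([cls.__name__ for cls in find_importer_classes(fake_module)])
```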

At the very least, subclasses must implement the class method metadata to allow Pulp to retrieve the required metadata describing the importer. This method must return a dictionary containing the following information:

  • id – Programmatically identifies the importer. This must be unique across all importers.
  • display_name – User-friendly identification of the importer.
  • types – List of type IDs to inform Pulp of the types of content handled by the plugin. Pulp application boot will fail if a plugin references a type ID that does not exist.

Below is the metadata implementation for an example importer:

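# Base class that every importer plugin must subclass.
from pulp.server.content.plugins.importer import Importer


class LocalRpmImporter(Importer):

    @classmethod
    def metadata(cls):
        # Describes this importer to Pulp; all three keys are required.
        metadata = {
            'id'           : 'local-rpm',
            'display_name' : 'Local RPM Importer',
            'types'        : ['rpm'],
        }
        return metadata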
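Since the three keys are required, a quick sanity check before dropping a plugin onto a server can save a failed boot. The sketch below is an assumption about how such a check might look, not something Pulp provides: the Importer class is a local stand-in so the snippet runs without Pulp installed, and missing_metadata_keys and unknown_types are hypothetical helpers.

```python
REQUIRED_KEYS = ("id", "display_name", "types")


class Importer:
    """Hypothetical stand-in for the Pulp Importer base class."""


class LocalRpmImporter(Importer):

    @classmethod
    def metadata(cls):
        return {
            'id': 'local-rpm',
            'display_name': 'Local RPM Importer',
            'types': ['rpm'],
        }


def missing_metadata_keys(plugin_class):
    """Return the required keys absent from the plugin's metadata."""
    metadata = plugin_class.metadata()
    return [key for key in REQUIRED_KEYS if key not in metadata]


def unknown_types(plugin_class, known_type_ids):
    """Return type IDs the plugin claims to handle that are not defined."""
    return [t for t in plugin_class.metadata().get('types', [])
            if t not in known_type_ids]


print(missing_metadata_keys(LocalRpmImporter))
print(unknown_types(LocalRpmImporter, {'errata'}))
```

The second call passes a set of type IDs that deliberately omits rpm, which is the situation that would cause the Pulp application boot to fail.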

At that point, the only thing left to do is override the base class methods to provide the actual functionality of the plugin. Again, I’ll cover those APIs more thoroughly in a future post. For now, the documentation in the plugin base classes themselves should be consulted for more information. These classes may be viewed in our git repository here.

Distributors

Distributor plugins follow the same model as importers. A distributor must also implement the metadata class method and the same fields are required. The following differences are probably self-explanatory, but I’ll list them for completeness:

  • Distributors are installed to /var/lib/pulp/plugins/distributors.
  • The distributor package must include a file named distributor.py.
  • Distributors must subclass pulp.server.content.plugins.distributor.Distributor.

One last note: since each plugin directory is itself a Python package that will be loaded onto the Python path, an importer and a distributor may not share the exact same plugin package name even though they are stored in separate directories.
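For comparison with the importer example, a distributor's metadata might look like the sketch below. The Distributor class here is a local stand-in for pulp.server.content.plugins.distributor.Distributor so the snippet runs on its own, and the class name, id, and display name are invented for illustration.

```python
class Distributor:
    """Hypothetical stand-in for the Pulp Distributor base class."""


class LocalRpmDistributor(Distributor):

    @classmethod
    def metadata(cls):
        # Same three required fields as an importer.
        return {
            'id': 'local-rpm-distributor',
            'display_name': 'Local RPM Distributor',
            'types': ['rpm'],
        }


print(LocalRpmDistributor.metadata()['id'])
```

In a real plugin this class would live in distributor.py inside its own package directory, with a package name that differs from any importer's.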

Summary

I tend to get long-winded when I write, so for reference here is the short version of the above wall of text:

  • Create type definitions
    • Installed to /var/lib/pulp/plugins/types/ (file name and extension do not matter).
    • JSON documents; see explanation above for required fields.
  • Create importer plugins
    • Each plugin is a Python package under /var/lib/pulp/plugins/importers/.
    • Each plugin package must contain a file named importer.py.
    • The importer.py module must define a class that subclasses pulp.server.content.plugins.importer.Importer.
    • The importer subclass must implement the class method metadata (required fields can be found above).
    • The plugin implementation is provided by overriding the methods in the Importer base class.
  • Create distributor plugins
    • Each plugin is a Python package under /var/lib/pulp/plugins/distributors/.
    • Each plugin package must contain a file named distributor.py.
    • The distributor.py module must define a class that subclasses pulp.server.content.plugins.distributor.Distributor.
    • The distributor subclass must implement the class method metadata (required fields can be found above).
    • The plugin implementation is provided by overriding the methods in the Distributor base class.

Next Steps

I'll be fleshing out the documentation in the plugin base classes over the next few days. When I do, I'll do another post that outlines the sorts of hooks provided to plugins.

Currently, none of the functionality for creating v2.0 repositories is exposed through the CLI. REST APIs exist for repository creation and configuration, triggering synchronize and publish operations, and querying for content units in Pulp. Unfortunately, none of that is documented yet, so I'll be posting on that in the next few days as well.


Categories: 2.0, Articles | No Comments »