Snowfakery Documentation

Snowfakery is a tool for generating fake data that has relations between tables. Every row is faked data, but also unique and random, like a snowflake.

To tell Snowfakery what data to generate, write a recipe file in YAML.

Snowfakery can write its output to stdout, or any database accessible to SQLAlchemy. When it is embedded in CumulusCI, it can output to a Salesforce org. Adding new output formats is a fairly straightforward process.

Installation

Installation for Salesforce Users

If you intend to use Snowfakery with Salesforce, we recommend installing CumulusCI, which includes Snowfakery in the installation. To install CumulusCI, follow the steps in the Get Started section of the CumulusCI documentation. (Don't forget to install Python, too!)

After installation, use the snowfakery task command to invoke Snowfakery.

$ snowfakery somefile.yml
...

Installation for Non-Salesforce Users

If you are not interested in using Snowfakery with Salesforce, directly install Snowfakery via pipx.

$ pip3 install pipx
...
$ pipx install snowfakery
...

After installation, use the snowfakery task command to invoke Snowfakery.

$ snowfakery somefile.yml
...

Central Concepts

Snowfakery recipes are specified in the simple, human-readable YAML format. A YAML file is designated by the .yml extension.

Let's start with a stupidly simple recipe named simple_static.yml. In this example, you can see that YAML uses indentation to define what parts of the file are related to each other.

- object: Person
  fields:
    name: Buster Bluth
    age: 35

To create a record with this data, run the snowfakery task command with the recipe name.

$ snowfakery docs/examples/simple_static.yml
...

This simple example generates a single record, a Person with three fields.

Person(id=1, name=Buster Bluth, age=35)

Two of the fields, name and age, include data from the YAML file. The id field is auto-generated by Snowfakery.

Field id name age
Value 1 Buster Bluth 35

.. note:: Snowfakery only works for models that are amenable to including an id field for every record. Your tools can use the id column, ignore the id column, or exchange the id column for another kind of ID (like CumulusCI does with Salesforce), but Snowfakery always generates an ID for each record, and refers between tables with IDs. (Future versions are expected to include a command line option to turn this behavior on or off.)

Let's make a new recipe named persons_of_interest.yml that requires Snowfakery to do some automation.

- object: Person
  count: 3
  fields:
    name:
      fake: name
    age:
      random_number:
        min: 12
        max: 95

Here's a breakdown for each step of this recipe.

  • object: Person: Creates a record template for any rows generated in the Person table representing real-world Person objects.
  • count: 3: Creates three Person records.
  • name: fake: name: Generates a fake name for the name column instead of requiring you to manually enter one. The list of things you can fake is based on a library called Faker, which is discussed later in this documentation.
  • age: random_number: Generates a random number between the min and the max values.

To run this recipe:

$ snowfakery docs/examples/persons_of_interest.yml
...

Let's look at what this recipe (randomly) generates.

Person(id=1, name=Allison Garcia, age=94)
Person(id=2, name=Megan Campos, age=67)
Person(id=3, name=Katherine Nelson, age=92)

We've created people! Or at least fake personas for people! And every time you run the recipe, you get a different set of "people".

While that's pretty cool, it doesn't use much of Snowfakery's power. Let's go deeper with a recipe named pet_stories.yml.

- object: Person
  count: 3
  fields:
    name:
      fake: name
    age:
      random_number:
        min: 12
        max: 95
    dog:
      - object: Animal
        fields:
          name:
            fake: FirstName
          species: canine
          food:
            - object: PetFood
              fields:
                name: Pets Choice
                cost: $10

    cat:
      - object: Animal
        fields:
          name:
            fake: FirstName
          species: feline
          food:
            - object: PetFood
              fields:
                name: Pets Choice
                cost: $1

If you're lost, don't worry. There are a lot of new concepts in this recipe, so let's tackle them one by one. In this example:

  • Three Person objects are created with the count function.
  • Each Person has a name, an age, a dog, and a cat.
  • Each dog or cat is an Animal row.
  • If each Person has two animals, there are a total of six Animal objects.
  • If each animal has a favorite food, there are a total of six PetFood objects.

To run this recipe:

$ snowfakery docs/examples/pet_stories.yml
...

Here's a visual example of the output.

Relationship diagram

Later, we discuss how to create two PetFood objects that are shared, and how to generate a random value for species and food.

Outputs

Snowfakery builds on a tool called SQLAlchemy, which provides a variety of database connectors.

When integrated with CumulusCI, it's possible for Snowfakery to output to a Salesforce instance. To learn more about integration with CumulusCI, see Advanced Features.

Snowfakery can also output JSON, SQL, directories of CSV, and object diagrams. CSV output goes to a directory with one CSV file per table and a JSON manifest file in the csvw format.

This is a complete list of file-based (as opposed to database-based) formats.

  • JSON: A custom JSON dialect
  • TXT: A debugging-style output
  • CSV: A directory of CSV files plus a csvw file
  • SQL: A SQL file with CREATE TABLE and INSERT statements
  • DOT: A graphviz file for use with the graphviz command line or web-based tools
  • SVG, SVGZ, JPEG, and PS PNG: Graphic formats that can be created if graphviz is installed

Objects

The core concept of Snowfakery is an object template. The object template represents instructions on how to create a row (or multiple rows) in a database. Rows, also known as records, in turn represent real-world entities such as people, places, or things, which is why we use the keyword object.

Each row has a type, a value that represents the name of the table that the row is added to in a relational database, or which CSV file the row is placed in. You declare the type after the keyword object.

As in an Object Relational system, think of objects or rows as synonyms. Rows are how we represent objects in tables or files. Records is another word synonymous with objects and rows in Salesforce.

Each row that is generated by Snowfakery is given a unique ID.

Let's revisit the person_of_interest.yml recipe.

- object: Person
  count: 10
  fields:
    name:
      fake: name
    age:
      random_number:
        min: 12
        max: 95

These are the main properties of this Object Template.

  • object is a type declaration that determines what table or file the row goes in.
  • count determines how many rows are made. It can be randomized or computed using Function Blocks or the Formula Language.
  • fields determine what data values to put in the row.

You can also have more than one Object Template for any row type, such as a relational table, CSV file, Salesforce Object, and so on.

Let's add a second object to the person_of_interest.yml recipe.

- object: Person
  count: 3
  fields:
    name:
      fake: name
    age:
      random_number:
        min: 12
        max: 95
- object: Person
  count: 3
  fields:
    name:
      fake: name
    age:
      random_number:
        min: 0
        max: 12

Now there are six Person rows generated in the Person table (or file)--three between ages 0 and 12, and three between ages 12 and 95.

Fake Data

To generate fake data with the fake function:

- object: Account
  fields:
    Name:
      fake: Company
    Description:
      fake: CatchPhrase
    BillingStreet:
      fake: StreetAddress
    BillingCity:
      fake: City
    BillingState:
      fake: State

You can fake all sorts of data: names, addresses, Latin text, English sentences, URLs, and so much more. To see the complete list, along with other related features, see the Fake Data Tutorial

friends

To create a rule such as "For every Person created, create two Animals", use the friends feature.

- object: Person
  count: 3
  fields:
    name:
      fake: name
    age:
      random_number:
        min: 12
        max: 95
  friends: # I get by with a little help from my...
    - object: Animal
      count: 2
      fields:
        name:
          fake: FirstName

This recipe outputs two animals per person.

Person(id=1, name=Sierra Ortega, age=91)
Animal(id=1, name=Sarah)
Animal(id=2, name=Brian)
Person(id=2, name=Robert Haley, age=42)
Animal(id=3, name=Michele)
Animal(id=4, name=Jonathan)
Person(id=3, name=Jason White, age=35)
Animal(id=5, name=John)
Animal(id=6, name=Kimberly)

Relationship diagram

There is no explicit relationship between the animals and the people in this example, but sometimes you need an implicit "relationship" between the number of one object created and the number of the other.

You can also use this feature for Many to One.

Relationships

Relationships are a big part of what separates Snowfakery from the dozens(!) of data generation tools out there.

Let's relate pets to their owners with ease in a recipe named person_pet.yml.

- object: Person
  count: 3
  fields:
    name:
      fake: name
    pet:
      - object: Animal
        fields:
          name:
            fake: FirstName
          species: Petaurus Breviceps

Now each of the Person rows has a Sugar glider for a pet (which is a good thing, as long as they know how to take care of them).

Let's look at what this recipe (randomly) generates.

Person(id=1, name=Rachel Thompson, pet=Animal(1))
Animal(id=2, name=Charles, species=Petaurus Breviceps)
Person(id=2, name=Alexander Zhang, pet=Animal(2))
Animal(id=3, name=Thomas, species=Petaurus Breviceps)
Person(id=3, name=Lisa King, pet=Animal(3))

Relationship Diagram

In addition, we can relate pets and owners "bidirectionally" with the reference feature.

Let's use the reference feature in a recipe named secret_life_of_pets.yml.

# snowfakery/docs/examples/secret_life_of_pets.yml
- object: Person
  count: 3
  fields:
    name:
      fake: name
    pet:
      - object: Animal
        fields:
          name:
            fake: FirstName
          owner:
            reference: Person

Now Person has a pet field that refers to an Animal row. Each Animal row has an owner field that refers back to the Personrow. It's called a bidirectional relationship--because it goes both ways. Of course, not all relationships are bi, but sometimes it's what your schema demands.

Let's look at what this recipe (randomly) generates.

Animal(id=1, name=Nicole, owner=Person(1))
Person(id=1, name=Steven Ellis, pet=Animal(1))
Animal(id=2, name=Richard, owner=Person(2))
Person(id=2, name=Chad Delacruz, pet=Animal(2))
Animal(id=3, name=Tammie, owner=Person(3))
Person(id=3, name=Corey Zamora, pet=Animal(3))

Relationship Diagram

The relationship from Person to Animal is called pet. It's expressed by embedding the Object Template for Animal in the field named pet.

The relationship from Animal to Person is called owner. It's expressed by the reference function. This function searches the YAML tree for the relevant Person row.

You can also use the nickname property to express a relationship between two rows that are not directly related in the hierarchy. For example, here's a recipe named pet_stories_2.yml.

- object: PetFood
  nickname: petschoice
  fields:
    name: Pets Choice
    cost: $10

- object: PetFood
  nickname: vetschoice
  fields:
    name: Vets Choice
    cost: $12

- object: Person
  count: 3
  fields:
    name:
      fake: name
    dog:
      - object: Animal
        nickname: dog
        fields:
          owner:
            reference: Person
          name:
            fake: FirstName
          species: canine
          food:
            reference: petschoice

    cat:
      - object: Animal
        fields:
          owner: Person
          name:
            fake: FirstName
          species: feline
          nemesis: dog
          food:
            reference: vetschoice

Let's look at what this recipe (randomly) generates.

PetFood(id=1, name=Pets Choice, cost=$10)
PetFood(id=2, name=Vets Choice, cost=$12)
Animal(id=1, owner=Person, name=Dustin, species=canine, food=PetFood(1))
Animal(id=2, owner=Person, name=Edwin, species=feline, nemesis=dog, food=PetFood(2))
Person(id=1, name=Antonio Martin, dog=Animal(1), cat=Animal(2))
Animal(id=3, owner=Person, name=Kristy, species=canine, food=PetFood(1))
Animal(id=4, owner=Person, name=Bryan, species=feline, nemesis=dog, food=PetFood(2))
Person(id=2, name=Hunter Wright, dog=Animal(3), cat=Animal(4))
Animal(id=5, owner=Person, name=Gregory, species=canine, food=PetFood(1))
Animal(id=6, owner=Person, name=Veronica, species=feline, nemesis=dog, food=PetFood(2))
Person(id=3, name=Rebecca Williams, dog=Animal(5), cat=Animal(6))

Relationship Diagram

Funky!

The basic rule is that the last row (object) created with the nickname is the one that's referenced.

Function Blocks

Fields can refer to functions that randomize, compute or look up data. You do that by nesting the function name under the field name, or with formulas.

reference

The reference function lets you look up another row (object) and make a reference to it. For example:

- object: Animal
  fields:
    name:
      fake: FirstName
    owner:
      reference: Person

The reference function looks for another object by table name (Person, in this example), or a nicknamed object by nickname.

If an object is created earlier in the recipe, and it has the appropriate nickname or tablename property, that object is the target reference.

Otherwise, the reference can be to an object hasn't been created yet. Snowfakery generates an ID for the object so that the current row can be generated--only no other properties of this new object can be referred to, because it does not exist yet.

If the reference value contains periods, Snowfakery follows field names to get to the final name. The first name before the first "." can either be a field name or an object name.

- object: cat
  nickname: Fluffy
  fields:
    color: black

- object: fiance
  nickname: sam
  fields:
    pet:
      reference: Fluffy

- object: betrothed
  fields:
    spouse:
      reference: sam
    pet:
      reference: spouse.pet
    color: ${{pet.color}}

Snowfakery (and CumulusCI) lets you loop over a recipe many times to generate multiple rows. In this case, a reference is always to an object created within the current iteration of a recipe. The exception to this rule are objects marked with the just_once property. These objects are created only in the first iteration, and any references to them are solely to the objects created in that first iteration.

random_choice

The random_choice function randomly chooses an option from a list.

Payment_Method:
  random_choice:
    - Cash
    - Cheque
    - Credit Card

random_choice evens the odds of its random selection unless you enter specific odds as a percentage.

StageName:
  random_choice:
    Closed Won: 60%
    In Progress: 20%
    New: 20%

To learn how to do more sophisticated randomness with features, read the Random Weights That Are Not Percentages section.

random_choice can also be used to randomly select potential child objects.

- object: Task
  fields:
    person_doing_the_task:
      random_choice:
        - object: Contact
          fields:
            FirstName: Bart
            LastName: Simpson
        - object: Lead
          fields:
            FirstName: Marge
            LastName: Simpson

A more sophisticated syntax lets you combine probabilities with values that are more complex than simple strings.

Let's explore this syntax with a recipe named random-choice-complex.yml.

# examples/random-choice-complex.yml
- object: Task
  count:
    random_choice:
      - choice:
          probability: 30%
          pick: 1
      - choice:
          probability: 30%
          pick: 3
      - choice:
          probability: 30%
          pick: 10
  fields:
    person_doing_the_task:
      random_choice:
        - choice:
            probability: 40%
            pick:
              - object: Contact
                fields:
                  FirstName: Bart
                  LastName: Simpson
        - choice:
            probability: 40%
            pick:
              - object: Contact
                fields:
                  FirstName: Marge
                  LastName: Simpson
        - choice:
            probability: 20%
            pick:
              - object: Contact
                fields:
                  FirstName: Lisa
                  LastName: Simpson

random_reference

The random_reference property creates a reference to a random, existing row from a table.

- object: Owner
  count: 10
  fields:
    name:
      fake: Name
- object: Pet
  count: 10
  fields:
    ownedBy:
      random_reference: Owner

The selected row can be any one that matches the object type and was already created in the current iteration of the recipe. For example, this previous recipe was executed twenty times (iterations) to generate 200 Pets and Owners. The selected rows in the first iteration would be one of the first ten Owners, and the selected rows in the last iteration would be one of the last ten.

Snowfakery cannot currently generate a random reference based on a nickname or to a row created in a previous or future iteration of the recipe. If you need these features, contact the Snowfakery team through a GitHub issue.

fake

The fake function generates fake data. This function is defined further in the Fake Data Tutorial

date_between

The date_between function picks a random date in some date range. For example, this recipe picks a date between Y2K and the present day.

- object: OBJ
    fields:
    date:
        date_between:
            start_date: 2000-01-01
            end_date: today

The options start_date and end_date can take these forms, too.

  • YYYY-MM-DD
  • +<number>d: number of days in the future, such as +10d
  • -<number>d: number of days in the past, such as -10d
  • +<number>M: number of months in the future, such as +10M
  • -<number>M: number of months in the past, such as -10M
  • +<number>y: number of years in the future, such as +10y
  • -<number>y: number of years in the past, such as -10y
  • +<number>w: number of weeks in the future, such as +10w
  • -<number>w: number of weeks in the past, such as -10w
  • today: The date the template is evaluated

Case is relevant. The "M" for months is uppercase. The rest are lowercase.

Let's pick a date between 30 days ago and 108 days in the future.

Payment_Date:
  date_between:
    start_date: -30d
    end_date: +180d

The date_between function can also be used in formulas.

wedding_date: Our big day is ${{date_between(start_date="2022-01-31", end_date="2022-12-31")}}

random_number

The random_number function picks a number in a range specified by min and max.

age:
  random_number:
    min: 12
    max: 95

If the number has to be divsible by another number (such as only multiples of 10), supply a step.

age:
  random_number:
    min: 10
    max: 90
    step: 10

As the name step implies, and to be a bit more precise, the output number minus min is a multiple of the step, which generates 12, 17 or 22.

age:
  random_number:
    min: 12
    max: 23
    step: 5

The random_number function can also be used as a function in formulas.

some_number: A number ${{random_number(min=5, max=10)}}

if and when

if lets you make field values conditional on other field values.

when can be a Python formula. It's even interpreted as a boolean similar to how Python does it: the first when clause that matches is selected. The final choice clause in the sequence has no when clause; it's selected if there's no match to the previous choices.

Let's use if and when in a recipe named conditional.yml.

# examples/conditional.yml
- object: Person
  fields:
    gender:
      random_choice:
        Male: 40%
        Female: 40%
        Other: 20%
    name:
      if:
        - choice:
            when: ${{gender=='Male'}}
            pick:
              fake: FirstNameMale

        - choice:
            when: ${{gender=='Female'}}
            pick:
              fake: FirstNameFemale

        - choice:
            pick:
              fake: FirstNameNonBinary

Macros

Macros re-uses groups of fields instead of requiring you to repeat them manually.

Let's use a macro in a recipe named evolution.yml.

- macro: canine
  fields:
    sound: barks
    legs: 4
    family: Caninae

- object: Animal
  include: canine
  fields:
    species: dog
    home: inside

- object: Animal
  include: canine
  fields:
    species: wolf
    home: outside

This recipe generates:

Animal(id=1, sound=barks, legs=4.0, family=Caninae, species=dog, home=inside)
Animal(id=2, sound=barks, legs=4.0, family=Caninae, species=wolf, home=outside)

Let's include more than one group of macros in a recipe named evolution_2.yml.

- macro: canine
  fields:
    sound: barks
    legs: 4
    family: Caninae

- macro: domestic
  fields:
    home: inside
    eats: petfood

- object: Animal
  count: 2
  include: canine, domestic
  fields:
    name: dog

This recipe generates:

Animal(id=1, sound=barks, legs=4.0, family=Caninae, home=inside, eats=petfood, species=dog)
Animal(id=2, sound=barks, legs=4.0, family=Caninae, home=inside, eats=petfood, species=dog)

Macros can include other macros. In fact, macros are especially powerful if you combine them with the include_file feature that lets one file include another. We even suggest you make a library of the most common object types you work with and then override fields to combine or specialize them.

.. note:: fields or friends declared in the macros listed later override those listed earlier. fields or friends declared in the Object Template override those declared in macros.

Debug

You can use Snowfakery's debug function to output values to stderr (usually, the command line) for debugging.

The debug function can "wrap" any formula expression and return its value.

# tests/debug.recipe.yml
- object: Example
  fields:
    value1: ${{ debug(399 + 21) / 10 }}
    value2: ${{ debug((399 + 21) / 10) }}
    value3:
      debug:
        - fake: datetime

In addition to its usual output, this would output:

DEBUG - 420 (<class 'int'>)
DEBUG - 42.0 (<class 'float'>)
DEBUG - 2017-06-19 22:10:13+00:00 (<class 'datetime.datetime'>)

Define Variables

To generate a value shared by multiple templates, create a variable with var.

Let's create a variable named "lastname.yml" that uses a shared var.

- var: lastname_var
  value:
    fake: LastName
- object: person
  fields:
    FirstName:
      fake: FirstName
    LastName: ${{lastname_var}}
- object: spouse
  fields:
    FirstName:
      fake: FirstName
    LastName: ${{lastname_var}}

.. note:: This works both at the top level of your recipe and in friends lists.

To group several fields together, create a "hidden" object.

- var: shared_address
  value:
    - object: __shared_address
      fields:
        street:
          fake: StreetAddress
        city:
          fake: city
        state:
          fake: state

Include Files

The include_file feature pulls in all declarations from the declared file. The file can itself include other files.

To include a file by a relative path:

- include_file: child.yml

Formulas

To insert data from one field into into another, use a formula.

# examples/price_math.recipe.yml
- object: Sale
  fields:
    num_items:
      random_number:
        min: 10
        max: 20
    per_item_price:
      random_number:
        min: 10
        max: 20
    message: Thanks for buying ${{num_items}} items @ $${{per_item_price}} each!

You can also make your data more dynamic with formulas. Formulas even use the same functions described in Function Blocks, but they can also be used inline.

- object: Sale
  count: 2
  fields:
    per_item_price: ${{random_number(20, 50)}}
    number_of_items: 3
    total: ${{per_item_price * number_of_items}}
    message: Thank you for buying $${{total}} items!

Formula Language

Some general principles about formulas:

  • Use ${{ to start a formula and }} to end it.
  • Use Python expression syntax in the middle.
  • Field values defined earlier on this object are available as names.
  • Generate fake values in formulas with dots ("."). For example, Name: ${{fake.FirstName}} Johnson
  • Parent (or ancestor) values are available through the parent's object name, such as Opportunity.amount.

Formulas are based on a similar language called Jinja2, but we use ${{ and }} to be more compatible with YAML.

The relevant section of the Jinja document is called Expressions. It includes information about Literals, Math, Comparisons, Logic, Other Operators, If Expressions and Python Methods.

Formula Functions and Variables

"Filters" allow you to do transformations of strings and numbers like this:

# examples/jinja_filters.recipe.yml
- object: Campaign
  fields:
    Name: ${{fake.bs | title}} Campaign
    UpperCaseName: ${{fake.bs | upper}}
    LowerCaseName: ${{fake.bs | lower}}

The list of filters is in the Jinja documentation.

Formula Functions and Variables

The functions below are designed to be used in formulas.

child_index

The child_index function returns a counter of how many objects from this template were generated during the execution of the nearest parent template. It resets each time the parent template is executed.

child_index: ${{child_index}}

unique_id

The unique_id variable returns a unique number that can be used to distinguish any record from others. For example, we can incorporate a unique ID into an email address or an employee ID.

# examples/unique_id/test_unique_id.yml
- object: Contact
  count: 3
  fields:
    FirstName:
      fake: FirstName
    LastName:
      fake: LastName
    EmployeeNum: ${{unique_id}}
    Email: ${{unique_id}}_${{fake.email}}

By default, Snowfakery works in "small id" mode, which means that IDs start as 6-7 digits, and grow longer as you generate more records.

$ snowfakery examples/unique_id/test_unique_id.yml
Contact(id=1, FirstName=Melody, LastName=Sherman, EmployeeNum=1371010, Email=9782010_cowens@example.org)
Contact(id=2, FirstName=Christian, LastName=Owens, EmployeeNum=2433010, Email=2414010_jimmy21@example.com)
Contact(id=3, FirstName=Tami, LastName=Stevenson, EmployeeNum=6375010, Email=8126010_leah50@example.net)
...

"Big ID" mode allows you to make 22+ digit IDs that are very likely to be unique across time and space. It is described more in unique_id and Big IDs.

Snowfakery may change its default mode to Big IDs at some point in the future. Do not assume that your IDs will always be small.

If you do not care about uniqueness, use random_number instead.

If you want the numbers to be generated in a predictable fashion with no gaps, use counter instead.

If you want the numbers to have a certain number of digits, use the UniqueId plugin.

unique_alpha_code

The unique_alpha_code functions generates unique alphanumeric codes similar to those used in some bureaucracies were space is an issue. Snowfakery alphanumeric codes are about 11 characters long in small ID mode and about 17 in Big ID mode. The more you use, the bigger they will grow.

# examples/unique_id/test_unique_alpha_code.yml
- object: Contact
  count: 3
  fields:
    FirstName:
      fake: FirstName
    LastName:
      fake: LastName
    DepartmentCode: ${{unique_alpha_code}}
Contact(id=1, FirstName=Jeremy, LastName=Higgins, DepartmentCode=2AUHHSZN)
Contact(id=2, FirstName=Vicki, LastName=Vazquez, DepartmentCode=GDH9OYLN)
Contact(id=3, FirstName=Gene, LastName=Wall, DepartmentCode=42Q3XX3N)

today

The today variable returns a date representing the current date. This date will not chanage during the execution of a single recipe. See the date function to learn more about this type of object.

now

The now variable returns a datetime representing the current moment. It outputs microsecond precision and is in the UTC timezone.

Using this recipe, you can see three different ways of outputting the timestamp:

# tests/test_now.yml
- object: Times
  fields:
    current_datetime: ${{now}}
    current_datetime_as_number: ${{now.timestamp()}}
    current_datetime_without_microseconds: ${{int(now.timestamp())}}

This would generate field values similar to these:

current_datetime=2022-02-23 15:39:49.513086+00:00, current_datetime_as_number=1645630789.513975, current_datetime_without_microseconds=1645630789

Experimentally, this variable is not guaranteed to return a unique value each time, especially on Windows.

fake: and fake.

The fake: block function and fake. namespace both generate fake data as described in the Fake Data Tutorial.

Let's use fake: and fake. in a recipe named two_fakers.yml.

# examples/two_fakers.yml
- object: Contact
  fields:
    FirstName:
      fake: FirstName
    LastName: ${{fake.LastName}}

snowfakery_filename

The snowfakery_filename variable represents the file that contains the template. This is useful for relative paths.

date

The date function can coerce a string into a date object for calculations, or generate a new date object from year/month/day designations. For example:

the_date: ${{date("2018-10-30")}}
another_date: ${{date(year=2018, month=11, day=30)}}

These objects have the following attributes:

  • date.year: the year
  • date.month: between 1 and 12 inclusive.
  • date.day: between 1 and the number of days in the given month of the given year.

And these methods:

  • date.weekday(): the day of the week as an integer, where Monday is 0 and Sunday is 6. For example, date(2002, 12, 4).weekday() == 2, a Wednesday.
  • date.isoformat() - string representing the date in ISO 8601 format, YYYY-MM-DD
  • strftime(format) - create a string representing the time under the control of an explicit format string.

datetime

The datetime function can generate a new datetime object from year/month/day parts:

# tests/test_datetime.yml
- object: Datetimes
  fields:
    from_date: ${{datetime(year=2000, month=1, day=1)}}
    from_datetime: ${{datetime(year=2000, month=1, day=1, hour=1, minute=1, second=1)}}
    right_now: ${{now}}
    hour: ${{now.hour}}
    minute: ${{now.minute}}
    second: ${{now.second}}
    right_now_with_timezone: ${{now.astimezone()}}
    to_date: ${{now.date()}}

These objects have the following attributes:

  • datetime.year: the year
  • datetime.month: between 1 and 12 inclusive.
  • datetime.day: between 1 and the number of days in the given month of the given year.
  • datetime.hour: the hour
  • datetime.minute: the minute
  • datetime.second: the second

And these methods:

  • date.weekday(): the day of the week as an integer, where Monday is 0 and Sunday is 6. For example, date(2002, 12, 4).weekday() == 2, a Wednesday.
  • date.isoformat() - string representing the date in ISO 8601 format, YYYY-MM-DD
  • strftime(format) - create a string representing the time under the control of an explicit format string.
  • astimezone() - create a timezone-aware datetime for your current timezone
  • date() - convert a datetime into a date.

relativedelta

The relativedelta function from dateutil is available for use in calculations. For example:

${{ date(Date_Established__c) + relativedelta(months=) }}

Some plugins are also interested in a template variable that has an id attribute that represents a unique identifier for the current template. Look at datasets.py to see one use-case where the template's ID can used to differentiate between two otherwise identical datasets.

NULL

The NULL value represents a missing value.

Let's use this value in a recipe named nullvalue.yml.

- object: foo
  count: 10
  fields:
    EndDate:
      random_choice:
        - date_between:
            start_date: 2018-01-01
            end_date: 2021-12-12
        - NULL
    DateSupplied:
      if:
        - choice:
            when: ${{ EndDate!=NULL }}
            pick: "Yes"
        - choice:
            pick: "No"

In this example, NULL becomes a random_choice option in the EndDate field. Then in the DateSupplied field, you test if the value is NULL or not.

Command Line Interface

Use the command line interface to learn the list of options available in the latest version of Snowfakery.

Usage: snowfakery [OPTIONS] YAML_FILE

      Generates records from a YAML file

      Records can go to:
          * stdout (default)
          * JSON file (--output-format=json --output-file=foo.json)
          * diagram file (--output-format=png --output-file=foo.png)
          * a database identified by --dburl (such as --dburl sqlite:////tmp/foo.db)
          * or to a directory as a set of CSV files (--output-format=csv --output-folder=csvfiles)

      Diagram output depends on the installation of graphviz
      (https://www.graphviz.org/download/)

      Full documentation here:

          * https://snowfakery.readthedocs.io/en/docs/

Options:
  --dburl TEXT                    URL for database to save data to. Use
                                  sqlite:///foo.db if you don't have one set
                                  up.

  --output-format [png|svg|svgz|jpeg|jpg|ps|dot|json|txt|csv|sql]
  --output-folder PATH
  -o, --output-file PATH
  --option EVAL_ARG...            Option to send to the recipe YAML in a
                                  format like 'OptName OptValue'. Specify
                                  multiple times if needed.

  --target-number, --target-count TEXT...
                                  Target record count for the recipe YAML in
                                  the form of 'number tablename'. For example:
                                  '50 Account' to generate roughly 50
                                  accounts.

  --reps INTEGER                  Target repetition count for the recipe YAML.
                                  Use as an alternative to --target-number

  --debug-internals / --no-debug-internals
  --generate-cci-mapping-file FILENAME
                                  Generate a CumulusCI mapping file for the
                                  dataset

  --generate-continuation-file FILENAME
                                  A file that captures information about how
                                  to continue a multi-batch data generation
                                  process

  --continuation-file FILENAME    Continue generating a dataset where
                                  'continuation-file' left off

  --plugin-option EVAL_ARG...     Option to send to a plugin in a format like
                                  'OptName OptValue'. Specify multiple times
                                  if needed.

  --load-declarations FILE        Declarations to mix into the generated
                                  mapping file

  --version                       Show the version and exit.
  --help                          Show this message and exit.

Scale Up Recipe Execution

From the command line you can control how many rows a recipe generates.

The simplest method:

snowfakery accounts.yml --reps 1000

This runs the recipe 1000 times. Easy!

But let's say you want to run a test against 50,000 accounts. You can calculate out how many reps to run in the command line, but this can be complex because Snowfakery recipes can also generate a random number of accounts in each run. Even for simple recipes, doing the math can be a headache, especially if you frequently make edits to the recipe.

A better solution is to specify the target-number and target tablename. For example:

snowfakery accounts.yml --target-number 1000 Account

The count works like this:

  • Snowfakery always executes a complete recipe. It never stops halfway through.
  • At the end of recipe execution, Snowfakery checks if it created enough of the object type defined by target-number.
  • If so, Snowfakery finishes the recipe. If not, it runs the recipe again.

If your recipe creates 10 Accounts, 5 Contacts and 15 Opportunities, and the previous command runs the recipe 100 times (1000/10=100), it generates 1000 Accounts, 500 Contacts, and 1500 Opportunites.

CSV Output

To create a CSV directory:

$ snowfakery template.yml --output-format csv --output-folder csvfiles
...

This generates a directory that looks like:

Animal.csv
Person.csv
PetFood.csv
csvw_metadata.json

.. note:: If you do not specify an output-folder, the files are created in the current folder.

The CSVW JSON file is a sort of manifest for all of the CSV files.

Advanced Features

Singletons with the just_once Feature

Snowfakery scales up to larger data volumes by evaluating your recipe over and over again. Each evaluation is called an iteration.

Some objects are meant to be produced only once, regardless of how many times the recipe executes. The programming language term for this is a "singleton". For example, an accounting system generates a dataset that has exactly two Ledger objects, Checking and Savings. You can have dozens of Entries per Ledger, or even billions. But the recipe only generates two Ledger rows.

Let's generate two--and only two!--Ledger rows with the just_once function in a recipe named twoledgers.yml.

- object: Ledger
  just_once: True
  nickname: Checking
  fields:
    Name: Checking

- object: Ledger
  just_once: True
  nickname: Savings
  fields:
    Name: Savings

- object: Entry
  count: 1000
  fields:
    Ledger: Checking
    ...

- object: Entry
  count: 1000
  fields:
    Ledger: Savings
    ...

To execute this recipe from the Snowfakery command line:

$ snowfakery accounting.yml --target-number 10_000 Entry
...

This recipe creates two Ledger rows, and 5000 Entry rows attached to each of the Ledger rows. If you scale up the recipe to one million, you still have only two Ledger rows.

Hidden Fields and Objects

As described earlier, fields can refer to each other. For example, field c can be the sum of fields a and b. Or perhaps you only want to output PersonLastName if PersonFirstName was set, and PersonFirstName is set randomly.

To create a value that gets used in computations but does not become output in the final database or CSV, create a field value prefixed by two underscores.

You can even do this with Object Templates to generate "objects" that are never saved as rows to your database, Salesforce org, or output file.

Let's hide several fields and objects in a recipe named hidden_fields.yml.

- object: Dates
  fields:
    __total_months: 48
    __first_month: ${{today - relativedelta(months=__total_months)}}
    __end_of_first_quarter: ${{date(__first_month) + relativedelta(months=3)}}
    ProgramStartDate: ${{__first_month}}
    FirstEvent:
      date_between:
        start_date: ${{__first_month}}
        end_date: ${{__end_of_first_quarter}}
    ProgramEndDate: ${{date(__first_month) + relativedelta(months=12)}}

This generates:

Dates(id=1, ProgramStartDate=2016-11-30, FirstEvent=2017-02-24, ProgramEndDate=2017-11-30)

Random Weights that are Not Percentages

Consider this field definition.

StageName:
  random_choice:
    Closed Won: 5
    In Progress: 3
    New: 4

Observant readers note that the values do not add up to 100. That's fine. Closed Won is selected 5/12 of the time, In Progress is selected 3/12, and New is selected 4/12 of the time. They are just weights, not necessarily percentage weights.

Many to One Relationships

In relational databases, child records typically have a reference to their parent record, but the opposite is not true.

Let's examine the relationships featured in the recipe named company.yml. You see that the Company object (row) relates to many Employee objects (rows).

# examples/company.yml
- object: Company
  fields:
    Name:
      fake: company

- object: Employee
  nickname: Employee 1
  fields:
    Name:
      fake: name
    EmployedBy:
      - object: Company

- object: Employee
  nickname: Employee 1
  fields:
    Name:
      fake: name
    EmployedBy:
      - object: Company

This generates:

Company(id=1, Name=Nelson-Sampson)
Company(id=2)
Employee(id=1, Name=Julie Turner, EmployedBy=Company(2))
Company(id=3)
Employee(id=2, Name=Amanda Martin, EmployedBy=Company(3))

Now let's say you generate 10 companies with 100 employees per company. It's easy with Snowfakery's friends or hidden field feature.

Let's use the friends feature in a recipe named company2.yml.

# examples/company2.yml
- object: Company
  count: 10
  fields:
    Name:
      fake: company
  friends:
    - object: Employee
      count: 100
      nickname: Employee 1
      fields:
        Name:
          fake: name
        EmployedBy:
          reference: Company

And let's use the hidden fields([#hidden-fields-and-objects]) feature in a recipe named company3.yml.

# examples/company3.yml
- object: Company
  count: 10
  fields:
    Name:
      fake: company
    __employees:
      - object: Employee
        count: 100
        nickname: Employee 1
        fields:
          Name:
            fake: name
          EmployedBy:
            reference: Company

Advanced Unique ID Generators

Snowfakery has two built-in functions for generating UniqueIDs plus the more advanced objects described below. The simple functions are unique_id and unique_alpha_code. More advanced generators ID generators can also be created.

unique_id and Big IDs

Snowfakery's default mode for ID generation is "Small ID" mode, wherein IDs start out as single-digit numbers and grow over time.

"Big ID" mode allows you to make IDs that are very likely to be unique across time and space. Specifically: it incorporates a timestamp, information about the current process and about 8 bits of randomness. The generated IDs are about 22 digits long. Unless you are running Snowfakery on thousands of computers at the exact same time, the chances of a clash are miniscule, and in any case, a clash only matters if the unique IDs are destined to end up in a common Salesforce Org or database.

CumulusCI manages Big ID mode for you automatically.

If you are using Snowfakery outside of CumulusCI, you can turn on Big ID mode from the command line:

$ snowfakery examples/test_unique_id.yml --plugin-option big_ids True
Contact(id=1, FirstName=Stephen, LastName=Parrish, Employee=763534209134265915391, Email=763534209134265915392_brittany17@example.org)

If you do need even higher levels of uniqueness, you can inject "unique execution ids" from some external source through the plugin option pid. Those execution ids would replace the timestamp and process information in your Big ID's calculation.

 $ snowfakery examples/test_unique_id.yml --plugin-option big_ids True --plugin-option pid 111
Contact(id=1, FirstName=Melody, LastName=Marquez, Employee=157937691, Email=157937692_emmahoover@example.org)

The 111 value is not literally included in the ID, but it is an input into the unique ID generation algorithm. As an example, you can inject about 15 digits of randomness and a timestamp like this on Unix-ish systems:

snowfakery examples/test_unique_id.yml --plugin-option big_ids True --plugin-option pid `date +"%s"`$RANDOM$RANDOM$RANDOM
Contact(id=1, FirstName=Cheryl, LastName=Estes, Employee=11014765223046647500920591, Email=11014765223046647500920592_solisroy@example.org)

If you pause and resume Snowfakery recipe generation using the continuation file feature, indexes will restart, so you should inject an external unique context ID through pid as shown above or through the API.

Counters

Snowfakery has a plugin called Counters which can generate numeric counters (1,2,3,4 or 3,6,9,12) and date counters (2021-01-01, 2021-01-02, 2021-01-03).

Counters are reset every iteration of the recipe. Counters are reset at the beginning of every "iteration". This makes them inappropriate for generating unique identifiers. Use unique_id instead.

Numeric Counters

Snowfakery can generate incrementing numbers like this:

# examples/counters/number_counter.recipe.yml
- plugin: snowfakery.standard_plugins.Counters
- object: Example
  count: 10
  fields:
    count:
      Counters.NumberCounter:
        start: 1
        name: foo

This would output:

$ snowfakery examples/counters/number_counter.recipe.yml
Example(id=1, count=1)
Example(id=2, count=2)
Example(id=3, count=3)
Example(id=4, count=4)
Example(id=5, count=5)
Example(id=6, count=6)
Example(id=7, count=7)
Example(id=8, count=8)
Example(id=9, count=9)
Example(id=10, count=10)

You can also control the start position and "step" like this:

# examples/counters/counter_start.recipe.yml
- plugin: snowfakery.standard_plugins.Counters
- object: Example
  count: 10
  fields:
    count:
      Counters.NumberCounter:
        start: 11
        step: 3

This generates:

Example(id=1, count=11)
Example(id=2, count=14)
Example(id=3, count=17)
Example(id=4, count=20)
Example(id=5, count=23)
Example(id=6, count=26)
Example(id=7, count=29)
Example(id=8, count=32)
Example(id=9, count=35)
Example(id=10, count=38)

As described above, counters are reset each iteration, as described above:

snowfakery examples/counters/number_counter.recipe.yml --reps 2
Example(id=1, count=1)
Example(id=2, count=2)
Example(id=3, count=3)
Example(id=4, count=4)
Example(id=5, count=5)
Example(id=6, count=6)
Example(id=7, count=7)
Example(id=8, count=8)
Example(id=9, count=9)
Example(id=10, count=10)
Example(id=11, count=1)
Example(id=12, count=2)
Example(id=13, count=3)
Example(id=14, count=4)
Example(id=15, count=5)
Example(id=16, count=6)
Example(id=17, count=7)
Example(id=18, count=8)
Example(id=19, count=9)
Example(id=20, count=10)

Date Counters for Schedules

Snowfakery can generate incrementing numbers like this:

# examples/counters/simple_date_counter.recipe.yml
- plugin: snowfakery.standard_plugins.Counters
- var: EveryTwoWeeks
  value:
    Counters.DateCounter:
      start_date: 2021-01-01
      step: +2w
- object: Meetings
  count: 4
  fields:
    Date: ${{EveryTwoWeeks.next}}
    Topic:
      fake: catchphrase

This generates:

Meetings(id=1, Date=2021-01-01, Topic=Innovative dedicated solution)
Meetings(id=2, Date=2021-01-15, Topic=Open-architected tangible artificial intelligence)
Meetings(id=3, Date=2021-01-29, Topic=Compatible global definition)
Meetings(id=4, Date=2021-02-12, Topic=Optimized real-time archive)

The start_date can be a particular date in YYYY-MM-DD or the word today.

The step is based on a syntax from the Python Faker library. It can be:

  • +<number>d : number days between steps, e.g. +10d (10 days)
  • +<number>w: number weeks between steps, e.g. +10w (70 days)
  • +<number>M: number months between steps, e.g. +10M (304 days)
  • +<number>y: number years between steps, e.g. +10y (3625 days)

Recipe Options

Instead of manually entering the exact number of records to create into a template file, pass an option (such as numbers, strings, booleans, and so on) to your generator recipe from the command line.

First, declare the option in your template file. For example:

- option: num_accounts
  default: 10

.. note:: If you do not specify a default, the option itself becomes mandatory, and the template is not processed without it.

To use an option value by referring to it in a formula:

- object: Account
  count: ${{num_accounts}}

Of course, you can do any math you want in the formula...

- object: Account
  count: ${{num_accounts / 2}}
    field:
        type: A
- object: Account
  count: ${{num_accounts / 2}}
    field:
        type: B

...Then pass that option.

--option numaccounts 10

Use Snowfakery in Python

You can embed Snowfakery in a Python application.

from snowfakery import generate_data

generate_data(
    yaml_file="examples/company.yml",
    option=[("A", "B")],
    target_number=(20, "Employee"),
    debug_internals=True,
    output_format="json",
    output_file=outfile,
)

To learn more about using Snowfakery in Python, see Embedding Snowfakery into Python Applications

Use Snowfakery with Databases

Snowfakery is built on top of a very flexible engine called SQLAlchemy. This engine lets it connect to many different databases subject to the limitations described in this section.

First, install Snowfakery in a context that makes it easy to use the Python command pip to manage your Python environment. For example, you can install Python using the standard installers from python.org, and then run these commands to create and use a venv with the Postgres package.


# create a new directory for our experiment
$ mkdir experiment_with_postgres
# cd into it
$ cd experiment_with_postgres
# create a new database:
# https://www.postgresql.org/docs/9.1/app-createdb.html
$ createdb snowfakerydb
# create a virtual environment. A good habit to get into.
# https://docs.python.org/3/library/venv.html
$ python3 -m venv myvenv
# activate the venv
$ source myvenv/bin/activate
# install Snowfakery in this venv
$ pip install snowfakery
# install the Postgres library for Python
# https://pypi.org/project/psycopg2/
$ pip install psycopg2
# let's use it!
$ snowfakery --dburl='postgresql://localhost:5432/snowfakerydb' ~/code/Snowfakery/examples/company.yml --target-number 1000 Employee
# and check the results
# https://www.postgresql.org/docs/9.3/app-psql.html
$ echo 'select * from "Employee"' | psql snowfakerydb

A limitation of this process is that Snowfakery currently can only create new tables rather than import into existing ones. The table has an id column in addition to columns for every field that was generated by the recipe. All columns are type text.

The list of databases supported by our underlying infrastructure (SQLAlchemy) is listed here and here.

Snowfakery is not proactively tested with every output database. We encourage you to keep in touch with the Snowfakery team about your experience of using Snowfakery with your databases. It gives us a better sense of what works well and what does not. We also appreciate any bug reports and pull requests related to problems you discover along the way.

Plugins and Providers

Plugins and providers extend Snowfakery with Python code. A plugin adds new functions to Snowfakery. A provider adds new capabilities to the Faker library, which is exposed to Snowfakery users through the fake: keyword.

To include plugins or providers in a Snowfakery file:

- plugin: package.module.classname

Built-in Plugins

Advanced Math

Snowfakery has a Math plugin that offers access to all features from Python's math, min, max, and round.

For example:

- plugin: snowfakery.standard_plugins.Math
- object: OBJ
  fields:
    twelve:
      Math.sqrt: ${{Math.min(144, 169)}}

Or:

- plugin: snowfakery.standard_plugins.Math
- object: OBJ
  fields:
    twelve: ${Math.sqrt}

Advanced Unique IDs with the UniqueId plugin

There is a plugin which gives you more control over the generation of Unique IDs and Alphanumeric Codes.

Unique ID Alphabets

One of the features of the UniqueId plugin is controlling the alphabet used for generating alphanumeric codes. One could make the alphabet bigger or smaller as below:

# examples/unique_id/alphabet.recipe.yml
- plugin: snowfakery.standard_plugins.UniqueId
- var: LargeAlphabetGenerator
  value:
    UniqueId.AlphaCodeGenerator:
      alphabet: 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
- var: DNAGenerator
  value:
    UniqueId.AlphaCodeGenerator:
      alphabet: ACGT
- var: NumberGenerator
  value:
    UniqueId.AlphaCodeGenerator:
      alphabet: 0123456789

- object: DemonstrateAlphabets
  count: 5
  fields:
    big_alpha_example: ${{LargeAlphabetGenerator.unique_id}}
    dna_example: ${{DNAGenerator.unique_id}}
    num_example: ${{NumberGenerator.unique_id}}

This would generate:

DemonstrateAlphabets(id=1, big_alpha_example=2r0NXGiBY, dna_example=GACGCCCTACCA, num_example=6843813157)
DemonstrateAlphabets(id=2, big_alpha_example=4LTepQ9TE, dna_example=TGTGTTTTTATGA, num_example=56465396322)
DemonstrateAlphabets(id=3, big_alpha_example=2yRu7l5Rk, dna_example=TGTCTGGTCTTA, num_example=22377438298)
DemonstrateAlphabets(id=4, big_alpha_example=1WCE6hDIO, dna_example=TGGTTTGTTAGA, num_example=22299878992)
DemonstrateAlphabets(id=5, big_alpha_example=1I25wfsXE, dna_example=GCGTGTCGTATTA, num_example=39979614438)

Code length

You can control the minimum (but not the maximum) length of a unique code:

# examples/unique_id/min_length.recipe.yml
- plugin: snowfakery.standard_plugins.UniqueId
- var: MySmallCodeGenerator
  value:
    UniqueId.AlphaCodeGenerator:
      min_chars: 6
- object: Example
  count: 10
  fields:
    unique: ${{MySmallCodeGenerator.unique_id}}

The code will grow over time as a consequence of keeping every value unique.

External datasets

Snowfakery can incorporate data from external CSV files or databases as datasets.

Iterate Over CSV Datasets

Here's an example of a simple CSV-based dataset named addresses.csv. Note that the column names come from the first row.

Number,Street,City,Postcode
420,Kings Ave,Burnaby,85633
421,Granville Street,White Rock,85633
422,Kingsway Road,Richmond,85633

To iterate over this data with Snowfakery:

- plugin: snowfakery.standard_plugins.datasets.Dataset
- object: Person
  count: 10
  fields:
    Name:
      fake: name

    __address_from_csv:
      Dataset.iterate:
        dataset: addresses.csv
    StreetAddress: ${{__address_from_csv.Number}} ${{__address_from_csv.Street}}
    City: ${{__address_from_csv.City}}

examples/datasets/datasets.recipe.yml

The plugin only needs to be declared once per recipe.

The double-underscore ("dunder") syntax is described in [#hidden-fields-and-objects]. In this context, it grabs an entire CSV row and gives it a name so that specific columns can later be called by that name.

The Dataset.iterate function block pulls each row from the dataset in order, one for each Person. It returns to the top after it has used every row.

Let's look at what this dataset (randomly) generates.

$ snowfakery examples/datasets/datasets.recipe.yml
Person(id=1, Name=Dawn Gray, StreetAddress=420 Kings Ave, City=Burnaby)
Person(id=2, Name=Melissa Walker, StreetAddress=421 Granville Street, City=White Rock)
Person(id=3, Name=Roberto Simon, StreetAddress=422 Kingsway Road, City=Richmond)
Person(id=4, Name=Miss Jessica Ramsey, StreetAddress=420 Kings Ave, City=Burnaby)
Person(id=5, Name=Bobby Sutton, StreetAddress=421 Granville Street, City=White Rock)
Person(id=6, Name=Tiffany Cox, StreetAddress=422 Kingsway Road, City=Richmond)
Person(id=7, Name=Ethan Potts, StreetAddress=420 Kings Ave, City=Burnaby)
Person(id=8, Name=Tamara Farley, StreetAddress=421 Granville Street, City=White Rock)
Person(id=9, Name=Austin Wong, StreetAddress=422 Kingsway Road, City=Richmond)
Person(id=10, Name=Kelly Jones, StreetAddress=420 Kings Ave, City=Burnaby)

You can use this feature to create "enriched" data by combining real or pre-generated data in a CSV file with fake data from Snowfakery.

You can execute a template exactly once for every line in a CSV file (or database table, or Salesforce Query) using the for_each keyword:

# examples/datasets/simple_for_each.yml
- plugin: snowfakery.standard_plugins.datasets.Dataset

- object: Person
  for_each:
    var: current_address
    value:
      Dataset.iterate:
        dataset: addresses.csv
  fields:
    FirstName:
      fake: FirstName
    LastName:
      fake: LastName
    StreetAddress: ${{current_address.Number}} ${{current_address.Street}}
    City: ${{current_address.City}}

This binds a new row from the CSV to the variable named current_address over and over, expanding the template for each row. The for_each replaces the count in a template that uses it.

Iterate over Salesforce Datasets

CumulusCI can be used to download CSV data for enrichment.

Let's download CSV data in a recipe named CumulusCI.yml.

contacts_for_accounts:
  steps:
    1:
      task: query
      options:
        object: Account
        query: select Id from Account
        result_file: accounts.csv
    2:
      task: generate_and_load_from_yaml
      options:
        generator_yaml: examples/salesforce/ContactsForAccounts.recipe.yml
        num_records: 100
        num_records_tablename: Contact

As you can see, CumulusCI.yml uses the generate_and_load_from_yaml task to parse data from the recipe named ContactsForAccounts.recipe.yml.

- plugin: snowfakery.standard_plugins.datasets.Dataset
- object: Contact
  fields:
    __accounts:
      Dataset.shuffle:
        dataset: ../../accounts.csv
    FirstName:
      fake: FirstName
    LastName:
      fake: LastName
    AccountId: ${{__accounts.Id}}

Of course, adjust the paths based on your directory structure.

It's a matter of time until Snowfakery has built-in features to query SOQL, so we recommend that you evaluate those features as an alternative to this technique.

Iterate over SQL Database Datasets

If the reference to a dataset is a URL, Snowfakery attempts to use it as a reference to a database. Snowfakery supports sqlite:/// urls as described in the documentation for SQL Alchemy.

That looks exactly like above except for this line.

dataset: sqlite:///addresses.db

Depending on the context, other database URLs work as well if the context has appropriate drivers installed. Only SQLite is part of our test suite, however.

If a SQL dataset has more than one table, specify what table to use. For example:

__address_from_csv:
  Dataset.iterate:
    dataset: addresses.csv
    table: addresses

Shuffle Data

If you prefer the rows be pulled out in random order, you have two options. The simplest thing is to shuffle it in your Snowfakery recipe.

__address_from_csv:
  Dataset.shuffle:
    dataset: addresses.csv

If you experience performance or memory usage problems with big datasets, your next option is to shuffle the data into a random order before using it in Snowfakery. Snowfakery's source repository includes a simplistic tool called shufflecsv.py that shuffles data for CSV files. You feed it a CSV file on stdin, and it generates another CSV file on stdout.

To shuffle data with shufflecsv.py in the command line:

$ python tools/shufflecsv.py < examples/datasets/addresses.csv > examples/datasets/shuffled.csv
...

.. note:: This script does not currently support CSVs that include newlines within fields.

To shuffle for SQLite or another database, write a script that uses the clause order by random(). This is also how Snowfakery does its randomization internally, so there's not much incentive to only run the Snowfakery recipe once. However, it can save you time if you run the Snowfakery recipe over and over because the shuffling happens once.

Read Files

To read and include Unicode files:

- plugin: snowfakery.standard_plugins.file.File

- object: TextData
  fields:
    encoded_data:
      - File.file_data:
          encoding: utf-8
          file: ../CODE_OF_CONDUCT.md

utf-8 is the default encoding, so you don't actually need to declare it here. That said, other popular text encodings include ascii, big5, latin-1 and shift_jis. There's even more encodings featured on the 100 encodings list.

To read and include Binary files:

- plugin: snowfakery.standard_plugins.base64.Base64
- plugin: snowfakery.standard_plugins.file.File

- object: BinaryData
  fields:
    encoded_data:
      Base64.encode:
        - File.file_data:
            encoding: binary
            file: salesforce/example.pdf

Other encodings of binary data are not currently supported, and output streams generally don't support raw binary data being written to them. It is relatively easy to make a plugin that does other encodings by building on the code in File.py.

Custom Plugins and Providers

Snowfakery can be extended with custom plugins and fake data providers as described in Extending Snowfakery with Python Code.

Update mode

Snowfakery has a special mode that lets you loop over every row of an input, update it, and then output the result with minimal recipe code. For example, if you have a CSV of contacts, and you want to add an address for every one, you do it like this:

# examples/updates/update_contacts.recipe.yml
- object: Contact
  fields:
    FirstName:
      fake: FirstName
    LastName:
      fake: LastName
    BillingStreet: ${{input.Number}} ${{input.Street}}
    BillingCity: ${{input.City}}
    BillingState: Texas
    BillingPostalCode: ${{input.Postcode}}
    BillingCountry: US

Given an input file like this:

Number,Street,City,Postcode
420,Kings Ave,Burnaby,85633
421,Granville Street,White Rock,85633
422,Kingsway Road,Richmond,85633

You can run:

$ snowfakery examples/updates/update_contacts.recipe.yml --update-input-file examples/datasets/addresses.csv --output-format csv

This generates output like this:

FirstName,LastName,BillingStreet,BillingCity,BillingState,BillingPostalCode,BillingCountry,id
Tommy,Nash,420 Kings Ave,Burnaby,Texas,85633,US
Lindsey,Callahan,421 Granville Street,White Rock,Texas,85633,US
Greg,Sellers,422 Kingsway Road,Richmond,Texas,85633,US

An update recipe should have a single top-level object with no count on it. The recipe can take options if needed. It will generate the same number of output rows as input rows.

To do updates in a Salesforce org, refer to the CumulusCI documentation.

Use Snowfakery with Salesforce

Snowfakery recipes that generate Salesforce records are like any other Snowfakery recipes, but instead use SObject names for the objects. There are several examples in the Snowfakery repository.

Salesforce-specific patterns and tools are described in Use Snowfakery with Salesforce

Snowfakery 3

Moving to Snowfakery 3 might require you to make very minor updates to a few recipes. In Snowfakery 2, all formulas are evaluated to either a string or number. In Snowfakery 3, other types are possible.

In most cases, this will enable things that were previously difficult or impossible. For example, this recipe does relative date calculations between fields:

- snowfakery_version: 3
- object: OBJ
  fields:
    basedate: ${{datetime(year=2000, month=1, day=1)}}
    dateplus: ${{basedate + relativedelta(years=22)}}

Note the snowfakery_version declaration which triggers the Snowfakery 3 interpretation of the recipe. Becauses of this interpretation basedate is a datetime, rather than a string.

In very obscure cases this might cause changes in the meaning of a recipe which depended on values being flattened to strings. For example, a recipe might append a string to a date, which would generate an error in Snowfakery 3.

These errors are expected to be rare, but you can validate your own recipes by adding the snowfakery_version: 3 declaration.

In June of 2022, Snowfakery 3 formula interpretation will become the default. You should test all your recipes with the snowfakery_version: 3 declaration before then, so that you have time to fix any recipes needed.

Appendices

Snowfakery Glossary

  • Object: A type of data, or in this case a row of data, that represents real-world entities like people, places, or things. See also: Rows.
  • Object Template: Instructions on how to create a row, or multiple rows, in a database. Each row represents a real-world object.
  • Rows: A unit of related information to an object listed in a database, often called records. For example, in Salesforce (which includes a database) a Contact record has a first name, last name, phone number, and so on. In a database each Contact is listed as a separate row. Contact is the object type of each of those rows, and rows represent real-world objects. See also: Objects.
  • Recipe: A Snowfakery YAML file that instructs Snowfakery on what to generate.
  • Iteration: Snowfakery recipes can be "scaled up" to generate more data when executed over and over again. These executions are called iterations.
  • Singleton: A singleton is an Object Template that generates a single row regardless of how many times the recipe is iterated.
  • YAML: A relatively simple, human-readable format. Learn more at yaml.org.

Appendix: Security Profile of Snowfakery

Snowfakery is a domain-specific programming language with access to most of the power of Python. It can load Python plugins and call Python methods. It's unwise to run untrusted recipes in an environment that has access to secure resources, such as passwords, network connections, and so on.

Appendix: The Age Old Puzzle

# snowfakery/docs/examples/st_ives.yml
# As I was going to St. Ives,
# I met a man with seven wives,
# Each wife had seven sacks,
# Each sack had seven cats,
# Each cat had seven kits:
# Kits, cats, sacks, and wives,
# How many were there going to St. Ives?
#
# https://en.wikipedia.org/wiki/As_I_was_going_to_St_Ives
- object: narrator
- object: man
  fields:
    wives:
      - object: woman
        count: 7
        fields:
          husband:
            reference: man
          luggage:
            - object: sack
              count: 7
              fields:
                holder:
                  reference: woman
                contents:
                  - object: cat
                    count: 7
                    fields:
                      container:
                        reference: sack
                      offspring:
                        - object: kit
                          count: 7
                          fields:
                            parent:
                              reference: cat
- object: stats
  fields:
    num_narrators: ${{ man.id }}
    num_men: ${{ man.id }}
    num_women: ${{ woman.id }}
    num_sack: ${{ sack.id }}
    num_cat: ${{ cat.id }}
    num_kittens: ${{ kit.id }}
    everyone: ${{  num_men + num_narrators + num_women + num_sack + num_cat + num_kittens }}
    going_to_st_ives: ${{ num_narrators }}

What does it output as its last row?

This (incomplete) picture probably won't help....

Silly diagram