Snowfakery Documentation¶
Snowfakery is a tool for generating fake data that has relations between tables. Every row is faked data, but also unique and random, like a snowflake.
To tell Snowfakery what data to generate, write a recipe file in YAML.
Snowfakery can write its output to stdout
, or any database accessible to SQLAlchemy. When it is embedded in CumulusCI, it can output to a Salesforce org. Adding new output formats is a fairly straightforward process.
Installation¶
Installation for Salesforce Users¶
If you intend to use Snowfakery with Salesforce, we recommend installing CumulusCI, which includes Snowfakery in the installation. To install CumulusCI, follow the steps in the Get Started section of the CumulusCI documentation. (Don't forget to install Python, too!)
After installation, use the snowfakery
task command to invoke Snowfakery.
$ snowfakery somefile.yml
...
Installation for Non-Salesforce Users¶
If you are not interested in using Snowfakery with Salesforce, directly install Snowfakery via pipx.
$ pip3 install pipx
...
$ pipx install snowfakery
...
After installation, use the snowfakery
task command to invoke Snowfakery.
$ snowfakery somefile.yml
...
Central Concepts¶
Snowfakery recipes are specified in the simple, human-readable YAML format. A YAML file is designated by the .yml
extension.
Let's start with a stupidly simple recipe named simple_static.yml
. In this example, you can see that YAML uses indentation to define what parts of the file are related to each other.
- object: Person
fields:
name: Buster Bluth
age: 35
To create a record with this data, run the snowfakery
task command with the recipe name.
$ snowfakery docs/examples/simple_static.yml
...
This simple example generates a single record, a Person
with three fields.
Person(id=1, name=Buster Bluth, age=35)
Two of the fields, name
and age
, include data from the YAML file. The id
field is auto-generated by Snowfakery.
Field | id | name | age |
---|---|---|---|
Value | 1 | Buster Bluth | 35 |
.. note::
Snowfakery only works for models that are amenable to including an id
field for every record. Your tools can use the id
column, ignore the id
column, or exchange the id
column for another kind of ID (like CumulusCI does with Salesforce), but Snowfakery always generates an ID for each record, and refers between tables with IDs. (Future versions are expected to include a command line option to turn this behavior on or off.)
Let's make a new recipe named persons_of_interest.yml
that requires Snowfakery to do some automation.
- object: Person
count: 3
fields:
name:
fake: name
age:
random_number:
min: 12
max: 95
Here's a breakdown for each step of this recipe.
object: Person
: Creates a record template for any rows generated in thePerson
table representing real-worldPerson
objects.count: 3
: Creates threePerson
records.name: fake: name
: Generates a fake name for thename
column instead of requiring you to manually enter one. The list of things you can fake is based on a library called Faker, which is discussed later in this documentation.age: random_number
: Generates a random number between themin
and themax
values.
To run this recipe:
$ snowfakery docs/examples/persons_of_interest.yml
...
Let's look at what this recipe (randomly) generates.
Person(id=1, name=Allison Garcia, age=94)
Person(id=2, name=Megan Campos, age=67)
Person(id=3, name=Katherine Nelson, age=92)
We've created people! Or at least fake personas for people! And every time you run the recipe, you get a different set of "people".
While that's pretty cool, it doesn't use much of Snowfakery's power. Let's go deeper with a recipe named pet_stories.yml
.
- object: Person
count: 3
fields:
name:
fake: name
age:
random_number:
min: 12
max: 95
dog:
- object: Animal
fields:
name:
fake: FirstName
species: canine
food:
- object: PetFood
fields:
name: Pets Choice
cost: $10
cat:
- object: Animal
fields:
name:
fake: FirstName
species: feline
food:
- object: PetFood
fields:
name: Pets Choice
cost: $1
If you're lost, don't worry. There are a lot of new concepts in this recipe, so let's tackle them one by one. In this example:
- Three
Person
objects are created with thecount
function. - Each
Person
has aname
, anage
, adog
, and acat
. - Each
dog
orcat
is anAnimal
row. - If each
Person
has two animals, there are a total of sixAnimal
objects. - If each animal has a favorite food, there are a total of six
PetFood
objects.
To run this recipe:
$ snowfakery docs/examples/pet_stories.yml
...
Here's a visual example of the output.
Later, we discuss how to create two PetFood
objects that are shared, and how to generate a random value for species
and food
.
Outputs¶
Snowfakery builds on a tool called SQLAlchemy, which provides a variety of database connectors.
When integrated with CumulusCI, it's possible for Snowfakery to output to a Salesforce instance. To learn more about integration with CumulusCI, see Advanced Features.
Snowfakery can also output JSON, SQL, directories of CSV, and object diagrams. CSV output goes to a directory with one CSV file per table and a JSON manifest file in the csvw format.
This is a complete list of file-based (as opposed to database-based) formats.
- JSON: A custom JSON dialect
- TXT: A debugging-style output
- CSV: A directory of CSV files plus a csvw file
- SQL: A SQL file with
CREATE TABLE
andINSERT
statements - DOT: A graphviz file for use with the graphviz command line or web-based tools
- SVG, SVGZ, JPEG, and PS PNG: Graphic formats that can be created if graphviz is installed
Objects¶
The core concept of Snowfakery is an object template. The object template represents instructions on how to create a row (or multiple rows) in a database. Rows, also known as records, in turn represent real-world entities such as people, places, or things, which is why we use the keyword object
.
Each row has a type, a value that represents the name of the table that the row is added to in a relational database, or which CSV file the row is placed in. You declare the type after the keyword object
.
As in an Object Relational system, think of objects or rows as synonyms. Rows are how we represent objects in tables or files. Records is another word synonymous with objects and rows in Salesforce.
Each row that is generated by Snowfakery is given a unique ID.
Let's revisit the person_of_interest.yml
recipe.
- object: Person
count: 10
fields:
name:
fake: name
age:
random_number:
min: 12
max: 95
These are the main properties of this Object Template.
object
is a type declaration that determines what table or file the row goes in.count
determines how many rows are made. It can be randomized or computed using Function Blocks or the Formula Language.fields
determine what data values to put in the row.
You can also have more than one Object Template for any row type, such as a relational table, CSV file, Salesforce Object, and so on.
Let's add a second object
to the person_of_interest.yml
recipe.
- object: Person
count: 3
fields:
name:
fake: name
age:
random_number:
min: 12
max: 95
- object: Person
count: 3
fields:
name:
fake: name
age:
random_number:
min: 0
max: 12
Now there are six Person
rows generated in the Person
table (or file)--three between ages 0 and 12, and three between ages 12 and 95.
Fake Data¶
To generate fake data with the fake
function:
- object: Account
fields:
Name:
fake: Company
Description:
fake: CatchPhrase
BillingStreet:
fake: StreetAddress
BillingCity:
fake: City
BillingState:
fake: State
You can fake all sorts of data: names, addresses, Latin text, English sentences, URLs, and so much more. To see the complete list, along with other related features, see the Fake Data Tutorial
friends
¶
To create a rule such as "For every Person
created, create two Animals
", use the friends
feature.
- object: Person
count: 3
fields:
name:
fake: name
age:
random_number:
min: 12
max: 95
friends: # I get by with a little help from my...
- object: Animal
count: 2
fields:
name:
fake: FirstName
This recipe outputs two animals per person.
Person(id=1, name=Sierra Ortega, age=91)
Animal(id=1, name=Sarah)
Animal(id=2, name=Brian)
Person(id=2, name=Robert Haley, age=42)
Animal(id=3, name=Michele)
Animal(id=4, name=Jonathan)
Person(id=3, name=Jason White, age=35)
Animal(id=5, name=John)
Animal(id=6, name=Kimberly)
There is no explicit relationship between the animals and the people in this example, but sometimes you need an implicit "relationship" between the number of one object created and the number of the other.
You can also use this feature for Many to One.
Relationships¶
Relationships are a big part of what separates Snowfakery from the dozens(!) of data generation tools out there.
Let's relate pets to their owners with ease in a recipe named person_pet.yml
.
- object: Person
count: 3
fields:
name:
fake: name
pet:
- object: Animal
fields:
name:
fake: FirstName
species: Petaurus Breviceps
Now each of the Person
rows has a Sugar glider for a pet (which is a good thing, as long as they know how to take care of them).
Let's look at what this recipe (randomly) generates.
Person(id=1, name=Rachel Thompson, pet=Animal(1))
Animal(id=2, name=Charles, species=Petaurus Breviceps)
Person(id=2, name=Alexander Zhang, pet=Animal(2))
Animal(id=3, name=Thomas, species=Petaurus Breviceps)
Person(id=3, name=Lisa King, pet=Animal(3))
In addition, we can relate pets and owners "bidirectionally" with the reference
feature.
Let's use the reference
feature in a recipe named secret_life_of_pets.yml
.
# snowfakery/docs/examples/secret_life_of_pets.yml
- object: Person
count: 3
fields:
name:
fake: name
pet:
- object: Animal
fields:
name:
fake: FirstName
owner:
reference: Person
Now Person
has a pet
field that refers to an Animal
row. Each Animal
row has an owner
field that refers back to the Person
row. It's called a bidirectional relationship--because it goes both ways. Of course, not all relationships are bi, but sometimes it's what your schema demands.
Let's look at what this recipe (randomly) generates.
Animal(id=1, name=Nicole, owner=Person(1))
Person(id=1, name=Steven Ellis, pet=Animal(1))
Animal(id=2, name=Richard, owner=Person(2))
Person(id=2, name=Chad Delacruz, pet=Animal(2))
Animal(id=3, name=Tammie, owner=Person(3))
Person(id=3, name=Corey Zamora, pet=Animal(3))
The relationship from Person
to Animal
is called pet
. It's expressed by embedding the Object Template for Animal
in the field named pet
.
The relationship from Animal
to Person
is called owner
. It's expressed by the reference
function. This function searches the YAML tree for the relevant Person
row.
You can also use the nickname
property to express a relationship between two rows that are not directly related in the hierarchy. For example, here's a recipe named pet_stories_2.yml
.
- object: PetFood
nickname: petschoice
fields:
name: Pets Choice
cost: $10
- object: PetFood
nickname: vetschoice
fields:
name: Vets Choice
cost: $12
- object: Person
count: 3
fields:
name:
fake: name
dog:
- object: Animal
nickname: dog
fields:
owner:
reference: Person
name:
fake: FirstName
species: canine
food:
reference: petschoice
cat:
- object: Animal
fields:
owner: Person
name:
fake: FirstName
species: feline
nemesis: dog
food:
reference: vetschoice
Let's look at what this recipe (randomly) generates.
PetFood(id=1, name=Pets Choice, cost=$10)
PetFood(id=2, name=Vets Choice, cost=$12)
Animal(id=1, owner=Person, name=Dustin, species=canine, food=PetFood(1))
Animal(id=2, owner=Person, name=Edwin, species=feline, nemesis=dog, food=PetFood(2))
Person(id=1, name=Antonio Martin, dog=Animal(1), cat=Animal(2))
Animal(id=3, owner=Person, name=Kristy, species=canine, food=PetFood(1))
Animal(id=4, owner=Person, name=Bryan, species=feline, nemesis=dog, food=PetFood(2))
Person(id=2, name=Hunter Wright, dog=Animal(3), cat=Animal(4))
Animal(id=5, owner=Person, name=Gregory, species=canine, food=PetFood(1))
Animal(id=6, owner=Person, name=Veronica, species=feline, nemesis=dog, food=PetFood(2))
Person(id=3, name=Rebecca Williams, dog=Animal(5), cat=Animal(6))
Funky!
The basic rule is that the last row (object) created with the nickname is the one that's referenced.
Function Blocks¶
Fields can refer to functions that randomize, compute or look up data. You do that by nesting the function name under the field name, or with formulas.
reference
¶
The reference
function lets you look up another row (object) and make a reference to it. For example:
- object: Animal
fields:
name:
fake: FirstName
owner:
reference: Person
The reference
function looks for another object by table name (Person
, in this example), or a nicknamed object by nickname
.
If an object is created earlier in the recipe, and it has the appropriate nickname
or tablename
property, that object is the target reference.
Otherwise, the reference can be to an object hasn't been created yet. Snowfakery generates an ID for the object so that the current row can be generated--only no other properties of this new object can be referred to, because it does not exist yet.
If the reference
value contains periods, Snowfakery follows field names to get to the final name. The first name before the first "." can either be a field name or an object name.
- object: cat
nickname: Fluffy
fields:
color: black
- object: fiance
nickname: sam
fields:
pet:
reference: Fluffy
- object: betrothed
fields:
spouse:
reference: sam
pet:
reference: spouse.pet
color: ${{pet.color}}
Snowfakery (and CumulusCI) lets you loop over a recipe many times to generate multiple rows. In this case, a reference
is always to an object created within the current iteration of a recipe. The exception to this rule are objects marked with the just_once
property. These objects are created only in the first iteration, and any references to them are solely to the objects created in that first iteration.
random_choice
¶
The random_choice
function randomly chooses an option from a list.
Payment_Method:
random_choice:
- Cash
- Cheque
- Credit Card
random_choice
evens the odds of its random selection unless you enter specific odds as a percentage.
StageName:
random_choice:
Closed Won: 60%
In Progress: 20%
New: 20%
To learn how to do more sophisticated randomness with features, read the Random Weights That Are Not Percentages section.
random_choice
can also be used to randomly select potential child objects.
- object: Task
fields:
person_doing_the_task:
random_choice:
- object: Contact
fields:
FirstName: Bart
LastName: Simpson
- object: Lead
fields:
FirstName: Marge
LastName: Simpson
A more sophisticated syntax lets you combine probabilities with values that are more complex than simple strings.
Let's explore this syntax with a recipe named random-choice-complex.yml
.
# examples/random-choice-complex.yml
- object: Task
count:
random_choice:
- choice:
probability: 30%
pick: 1
- choice:
probability: 30%
pick: 3
- choice:
probability: 30%
pick: 10
fields:
person_doing_the_task:
random_choice:
- choice:
probability: 40%
pick:
- object: Contact
fields:
FirstName: Bart
LastName: Simpson
- choice:
probability: 40%
pick:
- object: Contact
fields:
FirstName: Marge
LastName: Simpson
- choice:
probability: 20%
pick:
- object: Contact
fields:
FirstName: Lisa
LastName: Simpson
random_reference
¶
The random_reference
property creates a reference to a random, existing row from a table.
- object: Owner
count: 10
fields:
name:
fake: Name
- object: Pet
count: 10
fields:
ownedBy:
random_reference: Owner
The selected row can be any one that matches the object type and was already created in the current iteration of the recipe. For example, this previous recipe was executed twenty times (iterations) to generate 200 Pets
and Owners
. The selected rows in the first iteration would be one of the first ten Owners
, and the selected rows in the last iteration would be one of the last ten.
Snowfakery cannot currently generate a random reference based on a nickname or to a row created in a previous or future iteration of the recipe. If you need these features, contact the Snowfakery team through a GitHub issue.
fake
¶
The fake
function generates fake data. This function is defined further in the Fake Data Tutorial
date_between
¶
The date_between
function picks a random date in some date range. For example, this recipe picks a date between Y2K and the present day.
- object: OBJ
fields:
date:
date_between:
start_date: 2000-01-01
end_date: today
The options start_date
and end_date
can take these forms, too.
YYYY-MM-DD
+<number>d
:number
of days in the future, such as+10d
-<number>d
:number
of days in the past, such as-10d
+<number>M
:number
of months in the future, such as+10M
-<number>M
:number
of months in the past, such as-10M
+<number>y
:number
of years in the future, such as+10y
-<number>y
:number
of years in the past, such as-10y
+<number>w
:number
of weeks in the future, such as+10w
-<number>w
:number
of weeks in the past, such as-10w
today
: The date the template is evaluated
Case is relevant. The "M" for months is uppercase. The rest are lowercase.
Let's pick a date between 30 days ago and 108 days in the future.
Payment_Date:
date_between:
start_date: -30d
end_date: +180d
The date_between
function can also be used in formulas.
wedding_date: Our big day is ${{date_between(start_date="2022-01-31", end_date="2022-12-31")}}
random_number
¶
The random_number
function picks a number in a range specified by min
and max
.
age:
random_number:
min: 12
max: 95
If the number has to be divsible by another number (such as only multiples of 10), supply a step
.
age:
random_number:
min: 10
max: 90
step: 10
As the name step
implies, and to be a bit more precise, the output number minus min
is a multiple of the step
, which generates 12, 17 or 22.
age:
random_number:
min: 12
max: 23
step: 5
The random_number
function can also be used as a function in formulas.
some_number: A number ${{random_number(min=5, max=10)}}
if
and when
¶
if
lets you make field values conditional on other field values.
when
can be a Python formula. It's even interpreted as a boolean similar to how Python does it: the first when
clause that matches is selected. The final choice
clause in the sequence has no when
clause; it's selected if there's no match to the previous choices.
Let's use if
and when
in a recipe named conditional.yml
.
# examples/conditional.yml
- object: Person
fields:
gender:
random_choice:
Male: 40%
Female: 40%
Other: 20%
name:
if:
- choice:
when: ${{gender=='Male'}}
pick:
fake: FirstNameMale
- choice:
when: ${{gender=='Female'}}
pick:
fake: FirstNameFemale
- choice:
pick:
fake: FirstNameNonBinary
Macros¶
Macros re-uses groups of fields instead of requiring you to repeat them manually.
Let's use a macro in a recipe named evolution.yml
.
- macro: canine
fields:
sound: barks
legs: 4
family: Caninae
- object: Animal
include: canine
fields:
species: dog
home: inside
- object: Animal
include: canine
fields:
species: wolf
home: outside
This recipe generates:
Animal(id=1, sound=barks, legs=4.0, family=Caninae, species=dog, home=inside)
Animal(id=2, sound=barks, legs=4.0, family=Caninae, species=wolf, home=outside)
Let's include more than one group of macros in a recipe named evolution_2.yml
.
- macro: canine
fields:
sound: barks
legs: 4
family: Caninae
- macro: domestic
fields:
home: inside
eats: petfood
- object: Animal
count: 2
include: canine, domestic
fields:
name: dog
This recipe generates:
Animal(id=1, sound=barks, legs=4.0, family=Caninae, home=inside, eats=petfood, species=dog)
Animal(id=2, sound=barks, legs=4.0, family=Caninae, home=inside, eats=petfood, species=dog)
Macros can include other macros. In fact, macros are especially powerful if you combine them with the include_file
feature that lets one file include another. We even suggest you make a library of the most common object types you work with and then override fields to combine or specialize them.
.. note::
fields
or friends
declared in the macros listed later override those listed earlier. fields
or friends
declared in the Object Template override those declared in macros.
Define Variables¶
To generate a value shared by multiple templates, create a variable with var
.
Let's create a variable named "lastname.yml" that uses a shared var
.
- var: lastname_var
value:
fake: LastName
- object: person
fields:
FirstName:
fake: FirstName
LastName: ${{lastname_var}}
- object: spouse
fields:
FirstName:
fake: FirstName
LastName: ${{lastname_var}}
.. note::
This works both at the top level of your recipe and in friends
lists.
To group several fields together, create a "hidden" object.
- var: shared_address
value:
- object: __shared_address
fields:
street:
fake: StreetAddress
city:
fake: city
state:
fake: state
Include Files¶
The include_file
feature pulls in all declarations from the declared file. The file can itself include other files.
To include a file by a relative path:
- include_file: child.yml
Formulas¶
To insert data from one field into into another, use a formula.
# examples/price_math.recipe.yml
- object: Sale
fields:
num_items:
random_number:
min: 10
max: 20
per_item_price:
random_number:
min: 10
max: 20
message: Thanks for buying ${{num_items}} items @ $${{per_item_price}} each!
You can also make your data more dynamic with formulas. Formulas even use the same functions described in Function Blocks, but they can also be used inline.
- object: Sale
count: 2
fields:
per_item_price: ${{random_number(20, 50)}}
number_of_items: 3
total: ${{per_item_price * number_of_items}}
message: Thank you for buying $${{total}} items!
Formula Language¶
Some general principles about formulas:
- Use
${{
to start a formula and}}
to end it. - Use Python expression syntax in the middle.
- Field values defined earlier on this object are available as names.
- Generate
fake
values in formulas with dots ("."). For example,Name: ${{fake.FirstName}} Johnson
- Parent (or ancestor) values are available through the parent's
object
name, such asOpportunity.amount
.
Formulas are based on a similar language called Jinja2, but we use ${{
and }}
to be more compatible with YAML.
The relevant section of the Jinja document is called Expressions. It includes information about Literals, Math, Comparisons, Logic, Other Operators, If Expressions and Python Methods.
Formula Functions and Variables¶
"Filters" allow you to do transformations of strings and numbers like this:
# examples/jinja_filters.recipe.yml
- object: Campaign
fields:
Name: ${{fake.bs | title}} Campaign
UpperCaseName: ${{fake.bs | upper}}
LowerCaseName: ${{fake.bs | lower}}
The list of filters is in the Jinja documentation.
Formula Functions and Variables¶
The functions below are designed to be used in formulas.
child_index
¶
The child_index
function returns a counter of how many objects from this template were generated during the execution of the nearest parent template. It resets each time the parent template is executed.
child_index: ${{child_index}}
unique_id
¶
The unique_id
variable returns a unique number that can
be used to distinguish any record from others. For example,
we can incorporate a unique ID into an email address or
an employee ID.
# examples/unique_id/test_unique_id.yml
- object: Contact
count: 3
fields:
FirstName:
fake: FirstName
LastName:
fake: LastName
EmployeeNum: ${{unique_id}}
Email: ${{unique_id}}_${{fake.email}}
By default, Snowfakery works in "small id" mode, which means that IDs start as 6-7 digits, and grow longer as you generate more records.
$ snowfakery examples/unique_id/test_unique_id.yml
Contact(id=1, FirstName=Melody, LastName=Sherman, EmployeeNum=1371010, Email=9782010_cowens@example.org)
Contact(id=2, FirstName=Christian, LastName=Owens, EmployeeNum=2433010, Email=2414010_jimmy21@example.com)
Contact(id=3, FirstName=Tami, LastName=Stevenson, EmployeeNum=6375010, Email=8126010_leah50@example.net)
...
"Big ID" mode allows you to make 22+ digit IDs that are very likely to
be unique across time and space. It is
described more in unique_id
and Big IDs.
Snowfakery may change its default mode to Big IDs at some point in the future. Do not assume that your IDs will always be small.
If you do not care about uniqueness, use random_number
instead.
If you want the numbers to be generated in
a predictable fashion with no gaps, use
counter
instead.
If you want the numbers to have a certain
number of digits, use the UniqueId
plugin.
unique_alpha_code
¶
The unique_alpha_code
functions generates unique alphanumeric codes similar to those
used in some bureaucracies were space is an issue. Snowfakery alphanumeric codes are
about 11 characters long in small ID mode and about 17 in Big ID mode. The more you use,
the bigger they will grow.
# examples/unique_id/test_unique_alpha_code.yml
- object: Contact
count: 3
fields:
FirstName:
fake: FirstName
LastName:
fake: LastName
DepartmentCode: ${{unique_alpha_code}}
Contact(id=1, FirstName=Jeremy, LastName=Higgins, DepartmentCode=2AUHHSZN)
Contact(id=2, FirstName=Vicki, LastName=Vazquez, DepartmentCode=GDH9OYLN)
Contact(id=3, FirstName=Gene, LastName=Wall, DepartmentCode=42Q3XX3N)
today
¶
The today
variable returns a date representing the current date. This date does not change chanage during the execution of a single recipe.
fake:
and fake.
¶
The fake:
block function and fake.
namespace both generate fake data as described in the Fake Data Tutorial.
Let's use fake:
and fake.
in a recipe named two_fakers.yml
.
# examples/two_fakers.yml
- object: Contact
fields:
FirstName:
fake: FirstName
LastName: ${{fake.LastName}}
snowfakery_filename
¶
The snowfakery_filename
variable represents the file that contains the template. This is useful for relative paths.
date
¶
The date
function can coerce a string into a date object for calculations, or generate a new date object from year/month/day designations. For example:
the_date: ${{date("2018-10-30")}}
another_date: ${{date(year=2018, month=11, day=30)}}
relativedelta
¶
The relativedelta
function from dateutil
is available for use in calculations. For example:
${{ date(Date_Established__c) + relativedelta(months=child_index) }}
Some plugins are also interested in a template
variable that has an id
attribute that represents a unique identifier for the current template. Look at
datasets.py to see one use-case where the template's ID can used to differentiate between two otherwise identical datasets.
NULL
¶
The NULL
value represents a missing value.
Let's use this value in a recipe named nullvalue.yml
.
- object: foo
count: 10
fields:
EndDate:
random_choice:
- date_between:
start_date: 2018-01-01
end_date: 2021-12-12
- NULL
DateSupplied:
if:
- choice:
when: ${{ EndDate!=NULL }}
pick: "Yes"
- choice:
pick: "No"
In this example, NULL
becomes a random_choice
option in the EndDate
field. Then in the DateSupplied
field, you test if the value is NULL
or not.
Command Line Interface¶
Use the command line interface to learn the list of options available in the latest version of Snowfakery.
Usage: snowfakery [OPTIONS] YAML_FILE
Generates records from a YAML file
Records can go to:
* stdout (default)
* JSON file (--output-format=json --output-file=foo.json)
* diagram file (--output-format=png --output-file=foo.png)
* a database identified by --dburl (such as --dburl sqlite:////tmp/foo.db)
* or to a directory as a set of CSV files (--output-format=csv --output-folder=csvfiles)
Diagram output depends on the installation of graphviz
(https://www.graphviz.org/download/)
Full documentation here:
* https://snowfakery.readthedocs.io/en/docs/
Options:
--dburl TEXT URL for database to save data to. Use
sqlite:///foo.db if you don't have one set
up.
--output-format [png|svg|svgz|jpeg|jpg|ps|dot|json|txt|csv|sql]
--output-folder PATH
-o, --output-file PATH
--option EVAL_ARG... Option to send to the recipe YAML in a
format like 'OptName OptValue'. Specify
multiple times if needed.
--target-number, --target-count TEXT...
Target record count for the recipe YAML in
the form of 'number tablename'. For example:
'50 Account' to generate roughly 50
accounts.
--reps INTEGER Target repetition count for the recipe YAML.
Use as an alternative to --target-number
--debug-internals / --no-debug-internals
--generate-cci-mapping-file FILENAME
Generate a CumulusCI mapping file for the
dataset
--generate-continuation-file FILENAME
A file that captures information about how
to continue a multi-batch data generation
process
--continuation-file FILENAME Continue generating a dataset where
'continuation-file' left off
--plugin-option EVAL_ARG... Option to send to a plugin in a format like
'OptName OptValue'. Specify multiple times
if needed.
--load-declarations FILE Declarations to mix into the generated
mapping file
--version Show the version and exit.
--help Show this message and exit.
Scale Up Recipe Execution¶
From the command line you can control how many rows a recipe generates.
The simplest method:
snowfakery accounts.yml --reps 1000
This runs the recipe 1000 times. Easy!
But let's say you want to run a test against 50,000 accounts. You can calculate out how many reps
to run in the command line, but this can be complex because Snowfakery recipes can also generate a random number of accounts in each run. Even for simple recipes, doing the math can be a headache, especially if you frequently make edits to the recipe.
A better solution is to specify the target-number
and target tablename
. For example:
snowfakery accounts.yml --target-number 1000 Account
The count works like this:
- Snowfakery always executes a complete recipe. It never stops halfway through.
- At the end of recipe execution, Snowfakery checks if it created enough of the object type defined by
target-number
. - If so, Snowfakery finishes the recipe. If not, it runs the recipe again.
If your recipe creates 10 Accounts, 5 Contacts and 15 Opportunities, and the previous command runs the recipe 100 times (1000/10=100), it generates 1000 Accounts, 500 Contacts, and 1500 Opportunites.
CSV Output¶
To create a CSV directory:
$ snowfakery template.yml --output-format csv --output-folder csvfiles
...
This generates a directory that looks like:
Animal.csv
Person.csv
PetFood.csv
csvw_metadata.json
.. note::
If you do not specify an output-folder
, the files are created in the current folder.
The CSVW JSON file is a sort of manifest for all of the CSV files.
Advanced Features¶
Singletons with the just_once
Feature¶
Snowfakery scales up to larger data volumes by evaluating your recipe over and over again. Each evaluation is called an iteration
.
Some objects are meant to be produced only once, regardless of how many times the recipe executes. The programming language term for this is a "singleton". For example, an accounting system generates a dataset that has exactly two Ledger
objects, Checking
and Savings
. You can have dozens of Entries
per Ledger
, or even billions. But the recipe only generates two Ledger
rows.
Let's generate two--and only two!--Ledger
rows with the just_once
function in a recipe named twoledgers.yml
.
- object: Ledger
just_once: True
nickname: Checking
fields:
Name: Checking
- object: Ledger
just_once: True
nickname: Savings
fields:
Name: Savings
- object: Entry
count: 1000
fields:
Ledger: Checking
...
- object: Entry
count: 1000
fields:
Ledger: Savings
...
To execute this recipe from the Snowfakery command line:
$ snowfakery accounting.yml --target-number 10_000 Entry
...
This recipe creates two Ledger
rows, and 5000 Entry
rows attached to each of the Ledger
rows. If you scale up the recipe to one million, you still have only two Ledger
rows.
Hidden Fields and Objects¶
As described earlier, fields can refer to each other. For example, field c
can be the sum of fields a
and b
. Or perhaps you only want to output PersonLastName
if PersonFirstName
was set, and PersonFirstName
is set randomly.
To create a value that gets used in computations but does not become output in the final database or CSV, create a field value prefixed by two underscores.
You can even do this with Object Templates to generate "objects" that are never saved as rows to your database, Salesforce org, or output file.
Let's hide several fields and objects in a recipe named hidden_fields.yml
.
- object: Dates
fields:
__total_months: 48
__first_month: ${{today - relativedelta(months=__total_months)}}
__end_of_first_quarter: ${{date(__first_month) + relativedelta(months=3)}}
ProgramStartDate: ${{__first_month}}
FirstEvent:
date_between:
start_date: ${{__first_month}}
end_date: ${{__end_of_first_quarter}}
ProgramEndDate: ${{date(__first_month) + relativedelta(months=12)}}
This generates:
Dates(id=1, ProgramStartDate=2016-11-30, FirstEvent=2017-02-24, ProgramEndDate=2017-11-30)
Random Weights that are Not Percentages¶
Consider this field definition.
StageName:
random_choice:
Closed Won: 5
In Progress: 3
New: 4
Observant readers note that the values do not add up to 100. That's fine. Closed Won
is selected 5/12 of the time, In Progress
is selected 3/12, and New
is selected 4/12 of the time. They are just weights, not necessarily percentage weights.
Many to One Relationships¶
In relational databases, child records typically have a reference to their parent record, but the opposite is not true.
Let's examine the relationships featured in the recipe named company.yml
. You see that the Company
object (row) relates to many Employee
objects (rows).
# examples/company.yml
- object: Company
fields:
Name:
fake: company
- object: Employee
nickname: Employee 1
fields:
Name:
fake: name
EmployedBy:
- object: Company
- object: Employee
nickname: Employee 1
fields:
Name:
fake: name
EmployedBy:
- object: Company
This generates:
Company(id=1, Name=Nelson-Sampson)
Company(id=2)
Employee(id=1, Name=Julie Turner, EmployedBy=Company(2))
Company(id=3)
Employee(id=2, Name=Amanda Martin, EmployedBy=Company(3))
Now let's say you generate 10 companies with 100 employees per company. It's easy with Snowfakery's friends
or hidden field
feature.
Let's use the friends
feature in a recipe named company2.yml
.
# examples/company2.yml
- object: Company
count: 10
fields:
Name:
fake: company
friends:
- object: Employee
count: 100
nickname: Employee 1
fields:
Name:
fake: name
EmployedBy:
reference: Company
And let's use the hidden fields
([#hidden-fields-and-objects]) feature in a recipe named company3.yml
.
# examples/company3.yml
- object: Company
count: 10
fields:
Name:
fake: company
__employees:
- object: Employee
count: 100
nickname: Employee 1
fields:
Name:
fake: name
EmployedBy:
reference: Company
Advanced Unique ID Generators¶
Snowfakery has two built-in functions for generating UniqueIDs plus the more advanced
objects described below. The simple functions are unique_id
and
unique_alpha_code
. More advanced generators ID generators
can also be created.
unique_id
and Big IDs¶
Snowfakery's default mode for ID generation is "Small ID" mode, wherein IDs start out as single-digit numbers and grow over time.
"Big ID" mode allows you to make IDs that are very likely to be unique across time and space. Specifically: it incorporates a timestamp, information about the current process and about 8 bits of randomness. The generated IDs are about 22 digits long. Unless you are running Snowfakery on thousands of computers at the exact same time, the chances of a clash are miniscule, and in any case, a clash only matters if the unique IDs are destined to end up in a common Salesforce Org or database.
CumulusCI manages Big ID mode for you automatically.
If you are using Snowfakery outside of CumulusCI, you can turn on Big ID mode from the command line:
$ snowfakery examples/test_unique_id.yml --plugin-option big_ids True
Contact(id=1, FirstName=Stephen, LastName=Parrish, Employee=763534209134265915391, Email=763534209134265915392_brittany17@example.org)
If you do need even higher levels of uniqueness, you can
inject "unique execution ids" from some external source through the
plugin option pid
. Those execution
ids would replace the timestamp and process information in your Big ID's calculation.
$ snowfakery examples/test_unique_id.yml --plugin-option big_ids True --plugin-option pid 111
Contact(id=1, FirstName=Melody, LastName=Marquez, Employee=157937691, Email=157937692_emmahoover@example.org)
The 111
value is not literally included in the ID, but it is an input into
the unique ID generation algorithm. As an example, you can inject
about 15 digits of randomness and a timestamp like this on Unix-ish systems:
snowfakery examples/test_unique_id.yml --plugin-option big_ids True --plugin-option pid `date +"%s"`$RANDOM$RANDOM$RANDOM
Contact(id=1, FirstName=Cheryl, LastName=Estes, Employee=11014765223046647500920591, Email=11014765223046647500920592_solisroy@example.org)
If you pause and resume Snowfakery recipe generation using the continuation
file feature, indexes will restart, so you should inject an external unique
context ID through pid
as shown above or through the API.
Counters¶
Snowfakery has a plugin called Counters
which can generate
numeric counters (1,2,3,4 or 3,6,9,12) and date counters
(2021-01-01, 2021-01-02, 2021-01-03).
Counters are reset
every iteration of the recipe.
Counters are reset at the beginning of every "iteration". This makes them
inappropriate for generating unique identifiers. Use unique_id
instead.
Numeric Counters¶
Snowfakery can generate incrementing numbers like this:
# examples/counters/number_counter.recipe.yml
- plugin: snowfakery.standard_plugins.Counters
- object: Example
count: 10
fields:
count:
Counters.NumberCounter:
start: 1
name: foo
This would output:
$ snowfakery examples/counters/number_counter.recipe.yml
Example(id=1, count=1)
Example(id=2, count=2)
Example(id=3, count=3)
Example(id=4, count=4)
Example(id=5, count=5)
Example(id=6, count=6)
Example(id=7, count=7)
Example(id=8, count=8)
Example(id=9, count=9)
Example(id=10, count=10)
You can also control the start position and "step" like this:
# examples/counters/counter_start.recipe.yml
- plugin: snowfakery.standard_plugins.Counters
- object: Example
count: 10
fields:
count:
Counters.NumberCounter:
start: 11
step: 3
This generates:
Example(id=1, count=11)
Example(id=2, count=14)
Example(id=3, count=17)
Example(id=4, count=20)
Example(id=5, count=23)
Example(id=6, count=26)
Example(id=7, count=29)
Example(id=8, count=32)
Example(id=9, count=35)
Example(id=10, count=38)
As described above, counters are reset each iteration, as described above:
snowfakery examples/counters/number_counter.recipe.yml --reps 2
Example(id=1, count=1)
Example(id=2, count=2)
Example(id=3, count=3)
Example(id=4, count=4)
Example(id=5, count=5)
Example(id=6, count=6)
Example(id=7, count=7)
Example(id=8, count=8)
Example(id=9, count=9)
Example(id=10, count=10)
Example(id=11, count=1)
Example(id=12, count=2)
Example(id=13, count=3)
Example(id=14, count=4)
Example(id=15, count=5)
Example(id=16, count=6)
Example(id=17, count=7)
Example(id=18, count=8)
Example(id=19, count=9)
Example(id=20, count=10)
Date Counters for Schedules¶
Snowfakery can generate incrementing numbers like this:
# examples/counters/simple_date_counter.recipe.yml
- plugin: snowfakery.standard_plugins.Counters
- var: EveryTwoWeeks
value:
Counters.DateCounter:
start_date: 2021-01-01
step: +2w
- object: Meetings
count: 4
fields:
Date: ${{EveryTwoWeeks.next}}
Topic:
fake: catchphrase
This generates:
Meetings(id=1, Date=2021-01-01, Topic=Innovative dedicated solution)
Meetings(id=2, Date=2021-01-15, Topic=Open-architected tangible artificial intelligence)
Meetings(id=3, Date=2021-01-29, Topic=Compatible global definition)
Meetings(id=4, Date=2021-02-12, Topic=Optimized real-time archive)
The start_date
can be a particular date in YYYY-MM-DD
or the word today
.
The step
is based on a syntax from the Python Faker library. It can be:
+<number>d
:number
days between steps, e.g.+10d
(10 days)+<number>w
:number
weeks between steps, e.g.+10w
(70 days)+<number>M
:number
months between steps, e.g.+10M
(304 days)+<number>y
:number
years between steps, e.g.+10y
(3625 days)
Recipe Options¶
Instead of manually entering the exact number of records to create into a template file, pass an option
(such as numbers, strings, booleans, and so on) to your generator recipe from the command line.
First, declare the option
in your template file. For example:
- option: num_accounts
default: 10
.. note::
If you do not specify a default
, the option
itself becomes mandatory, and the template is not processed without it.
To use an option
value by referring to it in a formula:
- object: Account
count: ${{num_accounts}}
Of course, you can do any math you want in the formula...
- object: Account
count: ${{num_accounts / 2}}
field:
type: A
- object: Account
count: ${{num_accounts / 2}}
field:
type: B
...Then pass that option.
--option numaccounts 10
Use Snowfakery in Python¶
You can embed Snowfakery in a Python application.
from snowfakery import generate_data
generate_data(
yaml_file="examples/company.yml",
option=[("A", "B")],
target_number=(20, "Employee"),
debug_internals=True,
output_format="json",
output_file=outfile,
)
To learn more about using Snowfakery in Python, see Embedding Snowfakery into Python Applications
Use Snowfakery with Databases¶
Snowfakery is built on top of a very flexible engine called SQLAlchemy. This engine lets it connect to many different databases subject to the limitations described in this section.
First, install Snowfakery in a context that makes it easy to use the Python command pip
to manage your Python environment. For example, you can install Python
using the standard installers from python.org
, and then run these commands to create and use a venv
with the Postgres package.
# create a new directory for our experiment
$ mkdir experiment_with_postgres
# cd into it
$ cd experiment_with_postgres
# create a new database:
# https://www.postgresql.org/docs/9.1/app-createdb.html
$ createdb snowfakerydb
# create a virtual environment. A good habit to get into.
# https://docs.python.org/3/library/venv.html
$ python3 -m venv myvenv
# activate the venv
$ source myvenv/bin/activate
# install Snowfakery in this venv
$ pip install snowfakery
# install the Postgres library for Python
# https://pypi.org/project/psycopg2/
$ pip install psycopg2
# let's use it!
$ snowfakery --dburl='postgresql://localhost:5432/snowfakerydb' ~/code/Snowfakery/examples/company.yml --target-number 1000 Employee
# and check the results
# https://www.postgresql.org/docs/9.3/app-psql.html
$ echo 'select * from "Employee"' | psql snowfakerydb
A limitation of this process is that Snowfakery currently can only create new tables rather than import into existing ones. The table has an id
column in addition to columns for every field that was generated by the recipe. All columns are type text
.
The list of databases supported by our underlying infrastructure (SQLAlchemy) is listed here and here.
Snowfakery is not proactively tested with every output database. We encourage you to keep in touch with the Snowfakery team about your experience of using Snowfakery with your databases. It gives us a better sense of what works well and what does not. We also appreciate any bug reports and pull requests related to problems you discover along the way.
Plugins and Providers¶
Plugins and providers extend Snowfakery with Python code. A plugin adds new functions to Snowfakery. A provider adds new capabilities to the Faker library, which is exposed to Snowfakery users through the fake:
keyword.
To include plugins or providers in a Snowfakery file:
- plugin: package.module.classname
Built-in Plugins¶
Advanced Math¶
Snowfakery has a Math
plugin that offers access to all features from Python's math
, min
, max
, and round
.
For example:
- plugin: snowfakery.standard_plugins.Math
- object: OBJ
fields:
twelve:
Math.sqrt: ${{Math.min(144, 169)}}
Or:
- plugin: snowfakery.standard_plugins.Math
- object: OBJ
fields:
twelve: ${Math.sqrt}
Advanced Unique IDs with the UniqueId plugin¶
There is a plugin which gives you more control over the generation of Unique IDs and Alphanumeric Codes.
Unique ID Alphabets¶
One of the features of the UniqueId plugin is controlling the alphabet used for generating alphanumeric codes. One could make the alphabet bigger or smaller as below:
# examples/unique_id/alphabet.recipe.yml
- plugin: snowfakery.standard_plugins.UniqueId
- var: LargeAlphabetGenerator
value:
UniqueId.AlphaCodeGenerator:
alphabet: 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
- var: DNAGenerator
value:
UniqueId.AlphaCodeGenerator:
alphabet: ACGT
- var: NumberGenerator
value:
UniqueId.AlphaCodeGenerator:
alphabet: 0123456789
- object: DemonstrateAlphabets
count: 5
fields:
big_alpha_example: ${{LargeAlphabetGenerator.unique_id}}
dna_example: ${{DNAGenerator.unique_id}}
num_example: ${{NumberGenerator.unique_id}}
This would generate:
DemonstrateAlphabets(id=1, big_alpha_example=2r0NXGiBY, dna_example=GACGCCCTACCA, num_example=6843813157)
DemonstrateAlphabets(id=2, big_alpha_example=4LTepQ9TE, dna_example=TGTGTTTTTATGA, num_example=56465396322)
DemonstrateAlphabets(id=3, big_alpha_example=2yRu7l5Rk, dna_example=TGTCTGGTCTTA, num_example=22377438298)
DemonstrateAlphabets(id=4, big_alpha_example=1WCE6hDIO, dna_example=TGGTTTGTTAGA, num_example=22299878992)
DemonstrateAlphabets(id=5, big_alpha_example=1I25wfsXE, dna_example=GCGTGTCGTATTA, num_example=39979614438)
Code length¶
You can control the minimum (but not the maximum) length of a unique code:
# examples/unique_id/min_length.recipe.yml
- plugin: snowfakery.standard_plugins.UniqueId
- var: MySmallCodeGenerator
value:
UniqueId.AlphaCodeGenerator:
min_chars: 6
- object: Example
count: 10
fields:
unique: ${{MySmallCodeGenerator.unique_id}}
The code will grow over time as a consequence of keeping every value unique.
External datasets¶
Snowfakery can incorporate data from external CSV files or databases as datasets.
Iterate Over CSV Datasets¶
Here's an example of a simple CSV-based dataset named addresses.csv
. Note that the column names come from the first row.
Number,Street,City,Postcode
420,Kings Ave,Burnaby,85633
421,Granville Street,White Rock,85633
422,Kingsway Road,Richmond,85633
To iterate over this data with Snowfakery:
- plugin: snowfakery.standard_plugins.datasets.Dataset
- object: Person
count: 10
fields:
Name:
fake: name
__address_from_csv:
Dataset.iterate:
dataset: addresses.csv
StreetAddress: ${{__address_from_csv.Number}} ${{__address_from_csv.Street}}
City: ${{__address_from_csv.City}}
examples/datasets/datasets.recipe.yml
The plugin only needs to be declared once per recipe.
The double-underscore ("dunder") syntax is described in [#hidden-fields-and-objects]. In this context, it grabs an entire CSV row and gives it a name so that specific columns can later be called by that name.
The Dataset.iterate
function block pulls each row from the dataset in order, one for each Person
. It returns to the top after it has used every row.
Let's look at what this dataset (randomly) generates.
$ snowfakery examples/datasets/datasets.recipe.yml
Person(id=1, Name=Dawn Gray, StreetAddress=420 Kings Ave, City=Burnaby)
Person(id=2, Name=Melissa Walker, StreetAddress=421 Granville Street, City=White Rock)
Person(id=3, Name=Roberto Simon, StreetAddress=422 Kingsway Road, City=Richmond)
Person(id=4, Name=Miss Jessica Ramsey, StreetAddress=420 Kings Ave, City=Burnaby)
Person(id=5, Name=Bobby Sutton, StreetAddress=421 Granville Street, City=White Rock)
Person(id=6, Name=Tiffany Cox, StreetAddress=422 Kingsway Road, City=Richmond)
Person(id=7, Name=Ethan Potts, StreetAddress=420 Kings Ave, City=Burnaby)
Person(id=8, Name=Tamara Farley, StreetAddress=421 Granville Street, City=White Rock)
Person(id=9, Name=Austin Wong, StreetAddress=422 Kingsway Road, City=Richmond)
Person(id=10, Name=Kelly Jones, StreetAddress=420 Kings Ave, City=Burnaby)
It's best to use this data with:
- A CSV file to "enrich" your fake data with real or pre-generated data
- Snowfakery to "enrich" your real or pre-generated data with fake data
Iterate over Salesforce Datasets¶
CumulusCI can be used to download CSV data for enrichment.
Let's download CSV data in a recipe named CumulusCI.yml
.
contacts_for_accounts:
steps:
1:
task: query
options:
object: Account
query: select Id from Account
result_file: accounts.csv
2:
task: generate_and_load_from_yaml
options:
generator_yaml: examples/salesforce/ContactsForAccounts.recipe.yml
num_records: 100
num_records_tablename: Contact
As you can see, CumulusCI.yml
uses the generate_and_load_from_yaml
task to parse data from the recipe named ContactsForAccounts.recipe.yml
.
- plugin: snowfakery.standard_plugins.datasets.Dataset
- object: Contact
fields:
__accounts:
Dataset.shuffle:
dataset: ../../accounts.csv
FirstName:
fake: FirstName
LastName:
fake: LastName
AccountId: ${{__accounts.Id}}
Of course, adjust the paths based on your directory structure.
It's a matter of time until Snowfakery has built-in features to query SOQL, so we recommend that you evaluate those features as an alternative to this technique.
Iterate over SQL Database Datasets¶
If the reference to a dataset is a URL, Snowfakery attempts to use it as a reference to a database. Snowfakery supports sqlite:///
urls as described in the documentation for SQL Alchemy.
That looks exactly like above except for this line.
dataset: sqlite:///addresses.db
Depending on the context, other database URLs work as well if the context has appropriate drivers installed. Only SQLite is part of our test suite, however.
If a SQL dataset has more than one table, specify what table to use. For example:
__address_from_csv:
Dataset.iterate:
dataset: addresses.csv
table: addresses
Shuffle Data¶
If you prefer the rows be pulled out in random order, you have two options. The simplest thing is to shuffle it in your Snowfakery recipe.
__address_from_csv:
Dataset.shuffle:
dataset: addresses.csv
If you experience performance or memory usage problems with big datasets, your next option is to shuffle the data into a random order before using it in Snowfakery. Snowfakery's source repository includes a simplistic tool called shufflecsv.py
that shuffles data for CSV files. You feed it a CSV file on stdin
, and it generates another CSV file on stdout
.
To shuffle data with shufflecsv.py
in the command line:
$ python tools/shufflecsv.py < examples/datasets/addresses.csv > examples/datasets/shuffled.csv
...
.. note:: This script does not currently support CSVs that include newlines within fields.
To shuffle for SQLite or another database, write a script that uses the clause order by random()
. This is also how Snowfakery does its randomization internally, so there's not much incentive to only run the Snowfakery recipe once. However, it can save you time if you run the Snowfakery recipe over and over because the shuffling happens once.
Read Files¶
To read and include Unicode files:
- plugin: snowfakery.standard_plugins.file.File
- object: TextData
fields:
encoded_data:
- File.file_data:
encoding: utf-8
file: ../CODE_OF_CONDUCT.md
utf-8
is the default encoding, so you don't actually need to declare it here. That said, other popular text encodings include ascii
, big5
, latin-1
and shift_jis
. There's even more encodings featured on the 100 encodings list.
To read and include Binary files:
- plugin: snowfakery.standard_plugins.base64.Base64
- plugin: snowfakery.standard_plugins.file.File
- object: BinaryData
fields:
encoded_data:
Base64.encode:
- File.file_data:
encoding: binary
file: salesforce/example.pdf
Other encodings of binary data are not currently supported, and output streams generally don't support raw binary data being written to them. It is relatively easy to make a plugin that does other encodings by building on the code in File.py
.
Custom Plugins and Providers¶
Snowfakery can be extended with custom plugins and fake data providers as described in Extending Snowfakery with Python Code.
Use Snowfakery with Salesforce¶
Snowfakery recipes that generate Salesforce records are like any other Snowfakery recipes, but instead use SObject
names for the objects
. There are several examples in the Snowfakery repository.
Salesforce-specific patterns and tools are described in Use Snowfakery with Salesforce
Appendices¶
Snowfakery Glossary¶
- Object: A type of data, or in this case a row of data, that represents real-world entities like people, places, or things. See also: Rows.
- Object Template: Instructions on how to create a row, or multiple rows, in a database. Each row represents a real-world object.
- Rows: A unit of related information to an object listed in a database, often called records. For example, in Salesforce (which includes a database) a
Contact
record has a first name, last name, phone number, and so on. In a database eachContact
is listed as a separate row.Contact
is the object type of each of those rows, and rows represent real-world objects. See also: Objects. - Recipe: A Snowfakery YAML file that instructs Snowfakery on what to generate.
- Iteration: Snowfakery recipes can be "scaled up" to generate more data when executed over and over again. These executions are called iterations.
- Singleton: A singleton is an Object Template that generates a single row regardless of how many times the recipe is iterated.
- YAML: A relatively simple, human-readable format. Learn more at yaml.org.
Appendix: Security Profile of Snowfakery¶
Snowfakery is a domain-specific programming language with access to most of the power of Python. It can load Python plugins and call Python methods. It's unwise to run untrusted recipes in an environment that has access to secure resources, such as passwords, network connections, and so on.
Appendix: The Age Old Puzzle¶
# snowfakery/docs/examples/st_ives.yml
# As I was going to St. Ives,
# I met a man with seven wives,
# Each wife had seven sacks,
# Each sack had seven cats,
# Each cat had seven kits:
# Kits, cats, sacks, and wives,
# How many were there going to St. Ives?
#
# https://en.wikipedia.org/wiki/As_I_was_going_to_St_Ives
- object: narrator
- object: man
fields:
wives:
- object: woman
count: 7
fields:
husband:
reference: man
luggage:
- object: sack
count: 7
fields:
holder:
reference: woman
contents:
- object: cat
count: 7
fields:
container:
reference: sack
offspring:
- object: kit
count: 7
fields:
parent:
reference: cat
- object: stats
fields:
num_narrators: ${{ man.id }}
num_men: ${{ man.id }}
num_women: ${{ woman.id }}
num_sack: ${{ sack.id }}
num_cat: ${{ cat.id }}
num_kittens: ${{ kit.id }}
everyone: ${{ num_men + num_narrators + num_women + num_sack + num_cat + num_kittens }}
going_to_st_ives: ${{ num_narrators }}
What does it output as its last row?
This (incomplete) picture probably won't help....