Bulk Creates
zachdaniel:
Bulk creates are released! Just like generic actions, it's still a bit experimental, but it's in a good place to play with. The main thing that is likely to be problematic currently is that it seems to swallow up certain kinds of errors. Would be great to get some hands on bulk actions and see how it goes!
https://ash-hq.org/docs/guides/ash/latest/topics/bulk-actions
victorbjorklund:
Played around with it and it works when creating something that doesn't exist yet, but upserts do not seem to work. Are upserts supported already?
zachdaniel:
They should be yes
zachdaniel:
what was the issue that you found?
zachdaniel:
I’ve got another issue to look into today around bulk upserts specifically, so perhaps you’re running into the same issue
zachdaniel:
any details would be very useful
victorbjorklund:
Alright. Maybe I defined it wrong. If I give it a long list it gives a kind of unreadable error, but if I provide a list of just one:
iex(44)> TheSchoolApp.Courses.bulk_create([%{name: "Acme Inc", pam_id: "666"}], TheSchoolApp.Courses.Customer, :create)
[debug] QUERY OK db=0.9ms idle=1562.5ms
begin []
↳ anonymous fn/7 in Ash.Actions.Create.Bulk.do_run/5, at: lib/ash/actions/create/bulk.ex:190
[debug] QUERY ERROR db=1.0ms
INSERT INTO "customers" ("id","inserted_at","name","pam_id","updated_at") VALUES ($1,$2,$3,$4,$5)
["03ec4e84-6f0f-42c6-8287-7ba21064f926", ~U[2023-05-04 14:06:05.794095Z], #Ash.CiString<"Acme Inc">, #Ash.CiString<"666">, ~U[2023-05-04 14:06:05.794095Z]]
↳ AshPostgres.DataLayer.create/2, at: lib/data_layer.ex:1037
[debug] QUERY OK db=1.3ms
rollback []
↳ anonymous fn/7 in Ash.Actions.Create.Bulk.do_run/5, at: lib/ash/actions/create/bulk.ex:190
%Ash.BulkResult{
  status: :error,
  errors: [:rollback],
  records: [],
  notifications: nil
}
It seems that it just creates SQL without any conflict handling.
The action:
create :create do
  accept [:pam_id, :name]
  upsert? true
  upsert_identity :unique_pam_id
  upsert_fields [:name]
end
Upserts work, though, if I just do a traditional create! per entry.
zachdaniel:
oh, interesting
zachdaniel:
have you updated ash_postgres also?
zachdaniel:
If core sees a data layer that doesn't support bulk, it just loops and creates (simulates an upsert), and it kinda looks like that's what it is doing there
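(For context, that fallback behaves roughly like the hypothetical sketch below — not the actual Ash internals, just the shape of the degraded path: one create per input instead of a single bulk statement.)

```elixir
# Hypothetical sketch of the fallback path (not real Ash code):
# when the data layer can't do native bulk operations, core loops
# and issues one create per input, accumulating results as it goes.
def simulated_bulk(inputs, create_fun) do
  Enum.reduce(inputs, %{records: [], errors: []}, fn input, acc ->
    case create_fun.(input) do
      {:ok, record} -> %{acc | records: [record | acc.records]}
      {:error, error} -> %{acc | errors: [error | acc.errors]}
    end
  end)
end
```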
zachdaniel:
and yeah, errors: [:rollback] is weird. I think we are missing some kind of error handling there.
victorbjorklund:
let me try
victorbjorklund:
Interesting. After updating ash_postgres the error disappears but I get some strange behaviour:
TheSchoolApp.Courses.bulk_create([%{name: "The Pope", pam_id: "2220"}], TheSchoolApp.Courses.Customer, :create)
results in sql
INSERT INTO "customers" ("id","inserted_at","name","pam_id","updated_at") VALUES ($1,$2,$3,$4,$5) ON CONFLICT ("id") DO UPDATE SET "name" = EXCLUDED."name"
["02e95fc3-e685-4a96-8c77-939903a4915a", ~U[2023-05-04 14:33:43.485236Z], #Ash.CiString<"The Pope">, #Ash.CiString<"2220">, ~U[2023-05-04 14:33:43.485236Z]]
while
Customer.create!(%{name: "The Pope", pam_id: "2220"})
results in sql:
INSERT INTO "customers" AS c0 ("id","inserted_at","name","pam_id","updated_at") VALUES ($1,$2,$3,$4,$5) ON CONFLICT ("pam_id") DO UPDATE SET "name" = $6, "pam_id" = $7 RETURNING "updated_at","inserted_at","pam_id","name","id"
["66e6fde9-4487-4313-bd23-0f44cdc96de0", ~U[2023-05-04 14:33:23.873013Z], #Ash.CiString<"The Pope">, #Ash.CiString<"2220">, ~U[2023-05-04 14:33:23.873013Z], #Ash.CiString<"The Pope">, #Ash.CiString<"2220">]
zachdaniel:
What am I looking for?
zachdaniel:
Oh
zachdaniel:
It’s not using the correct conflict targets?
victorbjorklund:
Exactly. And I'm not sure if = EXCLUDED makes a difference too
victorbjorklund:
nevermind EXCLUDED. I get the purpose of that now
zachdaniel:
Okay, I'll investigate, thanks!
victorbjorklund:
Great! Bulk actions will make ash even better!
zachdaniel:
okay, can you try main? I think I found the issue 🙂
Eduardo B. Alexandre:
Not sure if I’m doing something wrong, but when I try to use the bulk_create function it will always rollback:
iex(1)> Pacman.Markets.bulk_create([%{full_address: "hue", external_id: "djiweo", external_id_type: :attom_id, state: "FL", type: :residential, publication_date: ~D[2022-01-01], upload_date: ~D[2022-01-01]}], Pacman.Markets.Property, :create, return_records?: true)
[debug] QUERY OK db=9.1ms queue=0.1ms idle=898.9ms
begin []
[debug] QUERY ERROR db=0.0ms
INSERT INTO "properties" ("country","external_id","external_id_type","full_address","id","inserted_at","publication_date","state","type","updated_at","upload_date") VALUES ($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11) RETURNING "updated_at","inserted_at","upload_date","publication_date","rooms","bedrooms","partial_baths","baths","lot_width","lot_depth","lot_square_feet","year_built","type","geography","country","zip_4","zip","state","county_fips","county","city","unit_value","unit_suffix","street_suffix","street_name","street_post_direction","street_direction","house_number","full_address","minor_civil_division_code","msa_code","cbsa_code","external_id_type","external_id","id" ["USA", "djiweo", :attom_id, "hue", "782fd086-4b02-46f8-a0f8-b4b5466371da", ~U[2023-05-07 16:16:18.306570Z], ~D[2022-01-01], "FL", :residential, ~U[2023-05-07 16:16:18.306570Z], ~D[2022-01-01]]
[debug] QUERY OK db=0.2ms
rollback []
%Ash.BulkResult{
status: :error,
errors: [:rollback],
records: [],
notifications: nil
}
If I just use the resource's create function directly with the same params, it works fine.
(I tried with both the latest version of ash and ash_postgres and also with the main branch)
zachdaniel:
There is probably a bug somewhere in there 😦
zachdaniel:
Can you spot any issues with the generated SQL?
Eduardo B. Alexandre:
Yes, actually I changed a field name and didn't run the migration. The problem is the geography field in the RETURNING part; running the migration fixed it.
zachdaniel:
Interesting. So we’re definitely swallowing up some errors. Will be looking into it tomorrow or the day after
Eduardo B. Alexandre:
Yeah, I also get the same generic error if I don’t send all the non-null fields for each row being added.
Eduardo B. Alexandre:
So, for example, if I add 2 rows, one valid and one invalid, I expected the first one to be added and a specific error for the second one, but I just got the generic rollback error.
zachdaniel:
Gotcha, so the error message not showing is definitely a problem
zachdaniel:
however, if you want individual things to succeed you're going to need to set your batch_size to 1
zachdaniel:
because each batch fails as a batch
zachdaniel:
(there is no way, AFAIK, to do a partially successful INSERT INTO in Postgres, for example)
Eduardo B. Alexandre:
Hmm, I thought it would be something similar to what the insert_all call from Ecto does, where it doesn't return a specific error but does return the list of inserted ids, so you can diff that list to get the rows that were inserted correctly and the ones that weren't.
I also thought it would be something like that since the documentation uses an Enum.reduce with the bulk_create function's return. I'm not sure what exactly the errors are in this case.
zachdaniel:
🤔 how sure are you that that is how ecto’s insert_all works?
zachdaniel:
You can have it return ids/inserted count, yes, but are you sure it supports partial success? To the best of my knowledge, the insert call either succeeds or fails.
Eduardo B. Alexandre:
Hmmm, let me check that, I thought I saw an example that did that for Broadway
Eduardo B. Alexandre:
Just tested it, you are correct, it will rollback if any row fails
zachdaniel:
Yeah, so if you want individual row failure, you can still use bulk actions with a batch_size of one. But keep in mind that does each call to the data layer as an individual insert. That's not necessarily a problem, just something you want to be aware of.
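(In practice that might look like the sketch below, reusing the resource from earlier in the thread; batch_size is the option named above, and the rest is assumed to match the call shape already shown in this conversation.)

```elixir
# With batch_size: 1 each input becomes its own single-row INSERT,
# so a bad row only fails its own batch instead of rolling back
# every row in a multi-row statement.
TheSchoolApp.Courses.bulk_create(
  inputs,
  TheSchoolApp.Courses.Customer,
  :create,
  batch_size: 1,
  return_records?: true
)
```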
\ ឵឵឵:
Depends on the error condition; malformed queries notwithstanding, supporting on conflict do nothing would be a useful counterpart to upsert?.
\ ឵឵឵:
There are ways to structure the query to tag the results that were and were not inserted separately as well.
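(One such structure, sketched in Ecto terms under the assumption of a Customer schema with a unique index: with ON CONFLICT DO NOTHING, Postgres skips conflicting rows without rolling back the statement, and RETURNING only yields rows that were actually inserted, so you can diff against the input to tag the skipped ones.)

```elixir
# Conflicting rows are silently skipped; `inserted` contains only
# the rows Postgres actually wrote, so input -- inserted = skipped.
{_count, inserted} =
  Repo.insert_all(Customer, rows,
    on_conflict: :nothing,
    returning: [:id]
  )
```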
zachdaniel:
Agreed 👍 we also need to add that to upserts; right now we set no metadata telling you whether an insert occurred
zachdaniel:
okay, the issues w/ bulk creates not properly showing errors have been resolved 🙂
zachdaniel:
Hey everyone! There was a “bug” insofar as the initially planned return_errors? option wasn’t actually supported and doing the right thing. This may be a minor breaking change for some use cases.
zachdaniel:
essentially you’ll need to pass return_errors?: true if you ever want errors to be anything other than []
zachdaniel:
To supplement this, however (unless you are using return_stream?: true), the result will accumulate an error count regardless
zachdaniel:
this is in main, will be releasing soon
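(Putting the above together, a hedged sketch of the new behavior, again reusing the resource from earlier in the thread; the error_count field name is taken from the "accumulate an error count" remark above and should be checked against the released docs.)

```elixir
# Without return_errors?: true, result.errors stays [];
# the accumulated error count is populated either way.
result =
  TheSchoolApp.Courses.bulk_create(
    inputs,
    TheSchoolApp.Courses.Customer,
    :create,
    return_errors?: true
  )

result.errors      # the actual errors, now that return_errors?: true is set
result.error_count # accumulated regardless of the option
```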