{@thread.name}

zachdaniel
2023-05-03

zachdaniel:

Bulk creates are released! Just like generic actions, it's still a bit experimental, but it's in a good place to play with. The main thing that is likely to be problematic currently is that it seems to swallow up certain kinds of errors. Would be great to get some hands on bulk actions and see how it goes!

https://ash-hq.org/docs/guides/ash/latest/topics/bulk-actions
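For anyone who wants to try it, a minimal sketch of the call shape (the api module, resource, action, and input fields here are hypothetical; see the guide above for the real options):

```elixir
# Hypothetical names: MyApp.Shop is an Ash api module, MyApp.Shop.Product a resource
# with a :create action accepting :name and :price.
# Each map in the list becomes one input to the :create action.
MyApp.Shop.bulk_create(
  [
    %{name: "Widget", price: 100},
    %{name: "Gadget", price: 200}
  ],
  MyApp.Shop.Product,
  :create,
  return_records?: true
)
# Returns an %Ash.BulkResult{} with status, records, errors, and notifications.
```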

victorbjorklund:

Played around with it and it works when creating something that doesn't exist yet, but upserts don't seem to work. Are upserts supported already?

zachdaniel:

They should be yes

zachdaniel:

what was the issue that you found?

zachdaniel:

I’ve got another issue to look into today around bulk upserts specifically, so perhaps you’re running into the same issue

zachdaniel:

any details would be very useful

victorbjorklund:

Alright. Maybe I defined it wrong. If I give it a long list it gives a kind of unreadable error, but here's what I get if I provide a list of just one:

iex(44)> TheSchoolApp.Courses.bulk_create([%{name: "Acme Inc", pam_id: "666"}], TheSchoolApp.Courses.Customer, :create)
[debug] QUERY OK db=0.9ms idle=1562.5ms
begin []
↳ anonymous fn/7 in Ash.Actions.Create.Bulk.do_run/5, at: lib/ash/actions/create/bulk.ex:190
[debug] QUERY ERROR db=1.0ms
INSERT INTO "customers" ("id","inserted_at","name","pam_id","updated_at") VALUES ($1,$2,$3,$4,$5) ["03ec4e84-6f0f-42c6-8287-7ba21064f926", ~U[2023-05-04 14:06:05.794095Z], #Ash.CiString<"Acme Inc">, #Ash.CiString<"666">, ~U[2023-05-04 14:06:05.794095Z]]
↳ AshPostgres.DataLayer.create/2, at: lib/data_layer.ex:1037
[debug] QUERY OK db=1.3ms
rollback []
↳ anonymous fn/7 in Ash.Actions.Create.Bulk.do_run/5, at: lib/ash/actions/create/bulk.ex:190
%Ash.BulkResult{
  status: :error,
  errors: [:rollback],
  records: [],
  notifications: nil
}

It seems that it just generates SQL without any conflict handling.

The action:

create :create do
  accept [:pam_id, :name]
  upsert? true
  upsert_identity :unique_pam_id
  upsert_fields [:name]
end

Upserts work, though, if I just do a traditional create! per entry.

zachdaniel:

oh, interesting

zachdaniel:

have you updated ash_postgres also?

zachdaniel:

If core sees a data layer that doesn't support bulk, it just loops and creates (simulates an upsert), and it kinda looks like that's what it is doing there

zachdaniel:

and yeah errors: [:rollback] is weird. I think we are missing some kind of error handling there.

victorbjorklund:

let me try

victorbjorklund:

Interesting. After updating ash_postgres the error disappears, but I get some strange behavior:

TheSchoolApp.Courses.bulk_create([%{name: "The Pope", pam_id: "2220"}], TheSchoolApp.Courses.Customer, :create) results in sql

INSERT INTO "customers" ("id","inserted_at","name","pam_id","updated_at") VALUES ($1,$2,$3,$4,$5) ON CONFLICT ("id") DO UPDATE SET "name" = EXCLUDED."name" ["02e95fc3-e685-4a96-8c77-939903a4915a", ~U[2023-05-04 14:33:43.485236Z], #Ash.CiString<"The Pope">, #Ash.CiString<"2220">, ~U[2023-05-04 14:33:43.485236Z]]

while

Customer.create!(%{name: "The Pope", pam_id: "2220"})

results in the SQL:

INSERT INTO "customers" AS c0 ("id","inserted_at","name","pam_id","updated_at") VALUES ($1,$2,$3,$4,$5) ON CONFLICT ("pam_id") DO UPDATE SET "name" = $6, "pam_id" = $7 RETURNING "updated_at","inserted_at","pam_id","name","id" ["66e6fde9-4487-4313-bd23-0f44cdc96de0", ~U[2023-05-04 14:33:23.873013Z], #Ash.CiString<"The Pope">, #Ash.CiString<"2220">, ~U[2023-05-04 14:33:23.873013Z], #Ash.CiString<"The Pope">, #Ash.CiString<"2220">]

zachdaniel:

What am I looking for?

zachdaniel:

Oh

zachdaniel:

It’s not using the correct conflict targets?

victorbjorklund:

Exactly. And I'm not sure if = EXCLUDED makes a difference too

victorbjorklund:

Never mind EXCLUDED. I get the purpose of that now

zachdaniel:

Okay, I'll investigate, thanks!

victorbjorklund:

Great! Bulk actions will make ash even better!

zachdaniel:

okay, can you try main? I think I found the issue 🙂

Eduardo B. Alexandre:

Not sure if I’m doing something wrong, but when I try to use the bulk_create function it will always rollback:

iex(1)> Pacman.Markets.bulk_create([%{full_address: "hue", external_id: "djiweo", external_id_type: :attom_id, state: "FL", type: :residential, publication_date: ~D[2022-01-01], upload_date: ~D[2022-01-01]}], Pacman.Markets.Property, :create, return_records?: true)
[debug] QUERY OK db=9.1ms queue=0.1ms idle=898.9ms
begin []
[debug] QUERY ERROR db=0.0ms
INSERT INTO "properties" ("country","external_id","external_id_type","full_address","id","inserted_at","publication_date","state","type","updated_at","upload_date") VALUES ($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11) RETURNING "updated_at","inserted_at","upload_date","publication_date","rooms","bedrooms","partial_baths","baths","lot_width","lot_depth","lot_square_feet","year_built","type","geography","country","zip_4","zip","state","county_fips","county","city","unit_value","unit_suffix","street_suffix","street_name","street_post_direction","street_direction","house_number","full_address","minor_civil_division_code","msa_code","cbsa_code","external_id_type","external_id","id" ["USA", "djiweo", :attom_id, "hue", "782fd086-4b02-46f8-a0f8-b4b5466371da", ~U[2023-05-07 16:16:18.306570Z], ~D[2022-01-01], "FL", :residential, ~U[2023-05-07 16:16:18.306570Z], ~D[2022-01-01]]
[debug] QUERY OK db=0.2ms
rollback []
%Ash.BulkResult{
  status: :error,
  errors: [:rollback],
  records: [],
  notifications: nil
}

If I just use resource create function directly with the same params, it works fine.

(I tried with both the latest version of ash and ash_postgres and also with the main branch)

zachdaniel:

There is probably a bug somewhere in there 😦

zachdaniel:

Can you spot any issues with the generated SQL?

Eduardo B. Alexandre:

Yes, actually. I changed a field name and didn't run the migration; the problem was the geography field in the RETURNING part. Running the migration fixed it.

zachdaniel:

Interesting. So we’re definitely swallowing up some errors. Will be looking into it tomorrow or the day after

Eduardo B. Alexandre:

Yeah, I also get the same generic error if I don’t send all the non-null fields for each row being added.

Eduardo B. Alexandre:

So, for example, if I add 2 rows, and one is valid and the other one is not, I expected the first one to be added and get an error specific for the second one, but I just got the rollback generic error.

zachdaniel:

Gotcha, so the error message not showing is definitely a problem

zachdaniel:

however, if you want individual things to succeed you’re going to need to set your batch_size to 1

zachdaniel:

because each batch fails as a batch

zachdaniel:

(there is no way AFAIK to do a partially successful INSERT INTO in postgres, for example)
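Putting those two messages together, a sketch of the per-row-failure workaround using the names from earlier in the thread (batch_size: 1 turns each input into its own insert, so one bad row no longer rolls back the rest of its batch):

```elixir
# inputs is a list of attribute maps, as in the earlier examples.
# batch_size: 1 trades bulk-insert performance for per-row success/failure.
TheSchoolApp.Courses.bulk_create(
  inputs,
  TheSchoolApp.Courses.Customer,
  :create,
  batch_size: 1,
  return_records?: true
)
```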

Eduardo B. Alexandre:

Hmm, I thought it would be something similar to what the insert_all call from Ecto does where it doesn’t return a specific error, but it returns the list of inserted ids, so you can diff that list and get the rows that are inserted correctly and the ones it didn’t.

I also thought it would be something like that, since the documentation uses an Enum.reduce with the bulk_create function's return. I'm not sure what exactly the errors are in this case.
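For reference, Ecto's insert_all call shape looks roughly like this (repo and table names hypothetical); note it returns a {count, rows} pair for the statement as a whole, not per-row results:

```elixir
# Hypothetical repo/table. insert_all issues a single INSERT statement,
# so it either inserts every row or the whole statement fails.
{count, rows} =
  MyApp.Repo.insert_all(
    "customers",
    [%{name: "A"}, %{name: "B"}],
    returning: [:id]
  )
```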

zachdaniel:

🤔 how sure are you that that is how ecto’s insert_all works?

zachdaniel:

You can have it return ids/inserted count, yes, but you’re sure that it supports partial success? To the best of my knowledge, the insert call either succeeds or fails

Eduardo B. Alexandre:

Hmmm, let me check that, I thought I saw an example that did that for Broadway

Eduardo B. Alexandre:

Just tested it, you are correct, it will rollback if any row fails

zachdaniel:

Yeah, so if you want individual row failure, you can still use bulk actions with a batch size of one. But keep in mind that does each call to the data layer as an individual insert. That's not necessarily a problem, just something you want to be aware of.

\ ឵឵឵:

Depends on the error condition; malformed queries notwithstanding, supporting ON CONFLICT DO NOTHING would be a useful counterpart to the upsert? option.

\ ឵឵឵:

There are ways to structure the query to tag the results that were and were not inserted separately as well.

zachdaniel:

Agreed 👍 we also need to add that to upserts, right now we set no metadata telling you if an insert occurred

zachdaniel:

okay, the issues w/ bulk creates not properly showing errors have been resolved 🙂

zachdaniel:

Hey everyone! There was a “bug” insofar as the initially planned return_errors? option wasn't actually supported or doing the right thing. This may be a minor breaking change for some use cases.

zachdaniel:

essentially you’ll need to pass return_errors?: true if you ever want errors to be anything other than []

zachdaniel:

To supplement this, however (unless you are using return_stream?: true), the result will accumulate an error count regardless
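Under that change, an error-surfacing call would look roughly like this (reusing the thread's resource names; a sketch, not the final API):

```elixir
# Without return_errors?: true, errors would be [] even on failure;
# error_count is accumulated either way (unless streaming results).
%Ash.BulkResult{status: status, errors: errors, error_count: error_count} =
  TheSchoolApp.Courses.bulk_create(
    inputs,
    TheSchoolApp.Courses.Customer,
    :create,
    return_errors?: true
  )
```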

zachdaniel:

this is in main, will be releasing soon