2 Realistic Portrait Tools and an All-in-One GenAI Transcription Platform

A GPT-based DALLE-3 portrait builder, RenderNet, and Notta.ai

Feb 08, 2024

Morning y’all!

Hope the day is off to a good start! I reviewed a handful of tools yesterday and 3 of them stood out as “you should try this” material: two are “realistic” portrait builders (one of them a GPT) and the third is a transcription tool that has a number of practical uses (and a pretty wide offering).

As always, you’ll probably get varied results so experimenting and finding the right solution for your own needs is important. Cheers!

※\(^o^)/※

— Summer

The first is a realistic people GPT that bills itself as a tool to generate (wait for it) realistic human portraits using DALLE-3.

So, I gave it a whirl with this prompt:

a 30 year old korean woman with light purple streaks in her hair sitting at a coffee shop with the beach in the background in the early morning, people in the background and a coffee cup at the table.

And, here were the results:

Not exactly the best result and certainly not realistic. So, I went directly to DALLE-3 myself and tried the same prompt there, thinking I might get better results:

I kept at it in both the direct app and the GPT, and the results kept getting worse. Perhaps you’ll have better luck, but I was generally unimpressed by the outcomes, which were definitely not what I was expecting.

But, if you’re in need of a cartoon-ish version of a portrait or even of yourself then it might be what you want. I didn’t hate the outputs though:

Good luck with this one!

The second, RenderNet, did a much better job but will require you to pay after a few iterations:

Some of the more interesting and useful tools available are ControlNet and Facelock: the latter lets you tell the tool to “lock” in on the face, giving you more control over the face output, while the former gives you control over the body.

I gave it a whirl and the results were a lot better:

Here are some of the final outputs that I could see myself using in the future:

I hit the limit of free credits pretty quickly so I moved back to my free Stable Diffusion XL build on my local macOS machine to give it a go there too. I used the exact same prompt and included the same base image as well:

And the results were much, much better:

Experiment, iterate, experiment, iterate!
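If you'd rather try the local Stable Diffusion XL route from a script than from an app, here's a rough sketch of what "same prompt plus a base image" can look like. To be clear, this is not my exact setup: it assumes the Hugging Face `diffusers` library, the public `stabilityai/stable-diffusion-xl-base-1.0` checkpoint, and an Apple Silicon Mac. The helper names (`build_settings`, `generate`), the file paths, and the default values are made up for illustration.

```python
# A minimal sketch under stated assumptions, not a definitive recipe.
# Assumes: pip install torch diffusers transformers accelerate
# and an Apple Silicon Mac (the "mps" device).

PROMPT = (
    "a 30 year old korean woman with light purple streaks in her hair "
    "sitting at a coffee shop with the beach in the background in the "
    "early morning, people in the background and a coffee cup at the table."
)

# Assumed checkpoint; swap in whichever SDXL model you have locally.
MODEL_ID = "stabilityai/stable-diffusion-xl-base-1.0"


def build_settings(prompt, strength=0.6, steps=30, guidance_scale=7.0):
    """Collect the generation settings in one place so they are easy to tweak."""
    if not 0.0 < strength <= 1.0:
        raise ValueError("strength must be in (0, 1]")
    return {
        "prompt": prompt,
        "strength": strength,  # how far the output may drift from the base image
        "num_inference_steps": steps,
        "guidance_scale": guidance_scale,  # how closely to follow the prompt
    }


def generate(base_image_path, output_path, settings):
    """Run SDXL image-to-image on a base image and save the result."""
    # Imported here so build_settings works without the heavy dependencies.
    import torch
    from diffusers import StableDiffusionXLImg2ImgPipeline
    from diffusers.utils import load_image

    # float16 saves memory; if the output comes back black, try torch.float32.
    pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
        MODEL_ID, torch_dtype=torch.float16
    )
    pipe = pipe.to("mps")

    base_image = load_image(base_image_path).resize((1024, 1024))
    result = pipe(image=base_image, **settings).images[0]
    result.save(output_path)
    return output_path


# Example call (hypothetical file names):
# generate("base.png", "portrait.png", build_settings(PROMPT, strength=0.5))
```

Lower `strength` values stay closer to the base image and higher ones give the prompt more room, so that's the first knob I'd play with when iterating.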

Transcription services using generative AI are fast becoming a commodity, as the technology has been around for a while and the (business) use cases are obvious. But finding a good user experience in a non-confusing user interface is still the hill to climb, and Notta didn’t offend me too badly.

What’s nice about this service is that it has a clean workflow and can do a number of useful tasks like transcribing uploaded files, joining a live meeting, and even recording a video within its interface (doesn’t work in Safari, only Chrome).

I uploaded a very old file of a MacBook Air unboxing video I did more than 10 years ago and it was able to pull out the text quite easily:

Of course they’re going to try to upsell you into a subscription but they have a generous trial available with 120 minutes to get you started, more than enough to get a sense of whether it’ll be something you want to pay for.

I’d also add that the ability to translate and edit the results as well as export is a nice touch too since a lot of other tools don’t have these options right out of the box.

Not too shabby!

Have a good one folks!

※\(^o^)/※

— Summer

