autoedit is a Python audio tool that creates automagic edits of too-long or too-short audio files. Randomly, no repeat. Generate and test. Good news is, they always sound good. How? With big use of small AI. Now available as web app, no install required.

tl;dr autoedit

autoedit is a tool that creates an edit from your input file. Edit means, the input file is broken into chunks and then some chunks are chosen and arranged in some order to make the output file.

Quick start

autoedit is available as a Python project as part of my smp audio codebase in Github. If you are on good terms with setting up a Python environment and installing a set of dependencies you are good to go with this.

If you are not inclined to setup the Python project, we have prepared autoedit to be used as a web app. To use it, you need to enter an API key. First time round, please be in touch with me to get a key. Once you have got the key, go to to https://api.jetpack.cl and paste your key into the form field you see on that site.

On the the site you will find four red tabs, Drop, Files, Actions, Tasks. Upload some files via the drop tab, either from your hard disk or from the internet via a Youtube or similar URL.

Go to the file tab to make sure your files are there.

Go to the Actions tab and select autoedit. In the autoedit configuration form, add an input file, set the duration to something small, 30 - 60 seconds. Set the assembly mode to random. Set the number of segments (numsegs) to something between 20 and 200.

Click start. Wait a bit and download the .wav result and ignore the other outputs. Don’t like it, rerun it, like it. Generate and test.

Why autoedit?

A lot more music is created every day than is actually published.

To some extent that is about the confidence that people have about the relevance of the music they make in their own sociocultural island. They play it safe and say, I’m just making this for my own pleasure. Granted.

Even for music producers that would describe themselves as such, and who have previously published music to critical acclaim, a large proportion of what is created remains dark in private storage and is not published, for a number of reasons.

One of these reasons is the effort of going the last mile, so to say, the effort to take a raw piece of music with some good bits in it, like a session recording, and from that make refined and publishable form. The reluctance to do so is understandable because it can become a lot of effort, even just screening the footage to get an overview of what is in there in a longer session.

What then, if this part of the production/publication cycle could be automated? Would we hear even more cool music than we do already because just more of what is made is actually published?

Why autoedit? II

In many areas of professional and recreational activity, you find yourself in front of long media files, often recordings of some session or event. Common to all these cases is the wish to shorten these files for publishing and easier consumption. Shortening means to cut the original recording into slices, leaving some or most of them out and picking a few to remain in the resulting shortened version.

Most often again, there is parts which do not contain any relevant information. While there is other parts where a lot of relevant information is packed into a small amount of time (think for example biomonitoring and wildlife recordings).

In all of these cases, the transition from one part to the next is marked by an onset of something new, which hasn’t been there before. If you automagically slice the original recording into chunks on each of these onsets, then select the relevant chunks, and glue them back together to form an output, that would help a lot. Example areas where this is the daily business is music production, publishing, event organization, or journalism.

Turns out that can be done.

How does it work?

TODO

What is autoedit?

Autoedit was invented collectively observing a session based production workflow. Visiting friends in the stu on a regular basis, we observed that capture (recording the stuff) was being handled, yet consolidating or even publishing it was not.

A lot of thinking ticks later, we arrived at a workable proposition consisting of a collection of Python scripts that do the job. A couple of run-ins with potential users and going through their individual installation processes told us to create a web app and provide autoedit in style of Software-as-a-Service (SaaS).

If you’re in a hurry, enter https://api.jetpack.cl

Questions

what happens if you tell autoedit to create a track longer than what you give it? does it just repeat segments?

yes

how do i credit droptrack on my release?

please use

“Sensorimotor Primitives Auditory Perception Software by Jetpack Cognition Lab”

Design philosophy

What we are talking about is music cognition and what we use to create that is a set of libraries from the Music Information Retrieval (MIR) community [1]. There exists significant amount of music cognition ability in these libraries.

We set out to create a segmentation of the input file into a set of different parts. Part in the sense of where you and I would agree that these do constitute different parts of any given piece of audio recording. Twenty seconds of the main voice active, 40 seconds of no bass drum, 12 seconds of a single voice. That kinda stuff, mute on / off on the desk. In addition we want the large scale segmentation to be aligned on the beat grid, if the audio has a beat (every audio has some degree of beat but don’t tell anyone, else confusion and cost rained down on us).

Now there is some realization from that we can put into action:

1/ combine the functionally overlapping feature detectors of different MIR libraries to obtain an improved result

2/ use philosophy to utilize onset type features on two different time scales

3/ combine two time scales to make beat aligned segment boundary inferences

4/ use random strategies, they are cheap and always worth it.

(Idea can be extended to other domains, that’s another post).

Footnotes

[1] ISMIR Community, TODO insert link

Comments

tag cloud

robotics AI music books research psychology intelligence feed ethology computational startups sound jcl audio brain organization motivation models micro management jetpack funding dsp testing test synthesis sonfication smp scope risk principles musician motion mapping language gt fail exploration evolution epistemology digital decision datadriven computing complexity algorithms aesthetics wayfinding visualization tools theory temporal sustainability stuff sonic-art sonic-ambience society signal-processing self score robots robot-learning robot python pxp priors predictive policies philosophy perception organization-of-behavior open-world open-culture neuroscience networking network navigation movies minecraft midi measures math locomotion linux learning kpi internet init health hacker growth grounding graphical generative gaming games explanation event-representation embedding economy discrete development definitions cyberspace culture creativity computer-music computer compmus cognition business birds biology bio-inspiration android agents action